Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 17:11, David Woodhouse wrote:
> On 7 October 2020 16:57:36 BST, Thomas Gleixner  wrote:
>>There is not lots of nastiness.
>
> OK, but I think we do have to cope with the fact that the limit is
> dynamic, and a CPU might be added which widens the mask. I think
> that's fundamental and not x86-specific.

Yes, but it needs some thoughts vs. serialization against CPU hotplug.

The stability of that cpumask is architecture specific if the calling
context cannot serialize against CPU hotplug. The hot-unplug case is
less interesting because the mask does not shrink, it only gets wider.

That's why I want a callback instead of a pointer assignment and that
callback cannot return a pointer to something which can be modified
concurrently, it needs to update a caller provided mask so that the
result is at least consistent in itself.
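
For illustration, a minimal sketch of what such a callback-based interface
could look like, in the spirit of the irqdomain_get_possible_affinity()
snippet later in this thread; the 'get_possible_affinity' member and the
helper name are assumptions for the example, not code from a posted patch:

typedef void (*irq_domain_get_affinity_t)(struct irq_domain *d,
                                          struct cpumask *mask);

/*
 * Sketch only: walk the domain hierarchy and let the first domain which
 * implements the (hypothetical) callback fill a caller-provided mask, so
 * the caller gets a snapshot which is consistent in itself even if the
 * underlying limit widens later due to CPU hotplug.
 */
static void irqdomain_copy_possible_affinity(struct irq_domain *d,
                                             struct cpumask *mask)
{
        while (d) {
                if (d->get_possible_affinity) {
                        d->get_possible_affinity(d, mask);
                        return;
                }
                d = d->parent;
        }
        cpumask_copy(mask, cpu_possible_mask);
}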

It just occurred to me that there is something which needs some more
thought vs. CPU hotunplug as well:

  Right now we just check whether there are enough vectors available on
  the remaining online CPUs so that the interrupts which have an
  effective affinity directed to the outgoing CPU can be migrated away
  (modulo the managed ones).

  That check obviously needs to take the target CPU restrictions into
  account to prevent that devices end up with no target at the end.
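
  For illustration only, roughly what that check could look like once
  per-interrupt target restrictions exist; irq_get_possible_affinity() and
  vectors_available_on() are hypothetical helpers used for the sketch, not
  existing kernel API:

/*
 * Sketch, not existing kernel code: before taking @dying_cpu offline,
 * check that every non-managed interrupt whose effective affinity
 * points at @dying_cpu still has a reachable online CPU (and a free
 * vector there) to migrate to.
 */
static bool cpu_offline_irqs_ok(unsigned int dying_cpu)
{
        struct irq_desc *desc;
        unsigned int irq;
        bool ok = true;
        cpumask_var_t targets;

        if (!zalloc_cpumask_var(&targets, GFP_KERNEL))
                return false;

        for_each_irq_desc(irq, desc) {
                struct irq_data *d = irq_desc_get_irq_data(desc);

                if (irqd_affinity_is_managed(d))
                        continue;
                if (!cpumask_test_cpu(dying_cpu,
                                      irq_data_get_effective_affinity_mask(d)))
                        continue;

                /* CPUs which stay online *and* which this irq can target */
                cpumask_and(targets, cpu_online_mask,
                            irq_get_possible_affinity(d));      /* hypothetical */
                cpumask_clear_cpu(dying_cpu, targets);

                if (cpumask_empty(targets) ||
                    !vectors_available_on(targets)) {           /* hypothetical */
                        ok = false;
                        break;
                }
        }
        free_cpumask_var(targets);
        return ok;
}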

I bet there are some other interesting bits and pieces which need
some care...

Thanks,

tglx




Re: xen-swiotlb vs phys_to_dma

2020-10-07 Thread Stefano Stabellini
On Wed, 7 Oct 2020, Christoph Hellwig wrote:
> On Tue, Oct 06, 2020 at 01:46:12PM -0700, Stefano Stabellini wrote:
> > OK, this makes a lot of sense, and I like the patch because it makes the
> > swiotlb interface clearer.
> > 
> > Just one comment below.
> > 
> 
> > > +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> > > +		size_t mapping_size, size_t alloc_size,
> > > +		enum dma_data_direction dir, unsigned long attrs)
> > >  {
> > > +	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start);
> > 
> > This is supposed to be hwdev, not dev
> 
> Yeah, the compiler would be rather unhappy otherwise.
> 
> I'll resend it after the dma-mapping and Xen trees are merged by Linus
> to avoid a merge conflict.

Sounds good, thanks. Please add

Reviewed-by: Stefano Stabellini 


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 19:23 +0200, Thomas Gleixner wrote:
> > It so happens that in Linux, we don't really architect the software
> > like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
> > *own* message composer which has the same limits and composes basically
> > the same messages as if it was *their* format, not dictated to them by
> > the APIC upstream. And that's what we're both getting our panties in a
> > knot about, I think.
> 
> Are you actually reading what I write and caring to look at the code?
> 
> PCI-MSI does not have a compose message callback in the irq chip. The
> message is composed by the underlying parent domain. Same for HPET.
> 
> The only dodgy part is the IO/APIC for hysterical raisins and because
> I did not come around yet to sort that out.

Right. And the problem is that the "underlying parent domain" in this
case is actually x86_vector_domain. Whereas it probably makes more
sense, at least in theory, for there to be an *intermediate* domain
responsible for composing the Compat MSI messages.

Both the Compat-MSI and the IR-MSI domains would be children of the
generic x86_vector_domain, and then any given HPET, IOAPIC or PCI-MSI
domain would be a child of either one of those generic MSI domains.

> > It really doesn't matter that much to the underlying generic irqdomain
> > support for limited affinities. Except that you want to make the
> > generic code support the concept of a child domain supporting *more*
> > CPUs than its parent, which really doesn't make much sense if you think
> > about it.
> 
> Right. So we really want to stick the restriction into a compat-MSI
> domain to make stuff match reality and to avoid banging the head against
> the wall sooner than later.

Right.



Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 16:46, David Woodhouse wrote:
> The PCI MSI domain, HPET, and even the IOAPIC are just the things out
> there on the bus which might perform those physical address cycles. And
> yes, as you say they're just a message store sending exactly the
> message that was composed for them. They know absolutely nothing about
> what the message means and how it is composed.

That's what I said.

> It so happens that in Linux, we don't really architect the software
> like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
> *own* message composer which has the same limits and composes basically
> the same messages as if it was *their* format, not dictated to them by
> the APIC upstream. And that's what we're both getting our panties in a
> knot about, I think.

Are you actually reading what I write and caring to look at the code?

PCI-MSI does not have a compose message callback in the irq chip. The
message is composed by the underlying parent domain. Same for HPET.

The only dodgy part is the IO/APIC for hysterical raisins and because
I did not come around yet to sort that out.

> It really doesn't matter that much to the underlying generic irqdomain
> support for limited affinities. Except that you want to make the
> generic code support the concept of a child domain supporting *more*
> CPUs than its parent, which really doesn't make much sense if you think
> about it.

Right. So we really want to stick the restriction into a compat-MSI
domain to make stuff match reality and to avoid banging the head against
the wall sooner than later.

Thanks,

tglx



Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse



On 7 October 2020 17:02:59 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 15:23, David Woodhouse wrote:
>> On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
>>> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
>>> > On 7 October 2020 13:59:00 BST, Thomas Gleixner wrote:
>>> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>>> > > > To fix *that* case, we really do need the whole series giving us per-
>>> > > > domain restricted affinity, and to use it for those MSIs/IOAPICs that
>>> > > > the IRQ remapping doesn't cover.
>>> > > 
>>> > > Which do not exist today.
>>> > 
>>> > Sure. But at patch 10/13 into this particular patch series, it *does*
>>> > exist.
>>> 
>>> As I told you before: Your ordering is wrong. We do not introduce bugs
>>> first and then fix them later 
>>
>> I didn't introduce that bug; it's been there for years. Fixing it
>> properly requires per-irqdomain affinity limits.
>>
>> There's a cute little TODO at least in the Intel irq-remapping driver,
>> noting that we should probably check if there are any IOAPICs that
>> aren't in the scope of any DRHD at all. But that's all.
>
>So someone forgot to remove the cute little TODO when this was added:
>
>	if (parse_ioapics_under_ir()) {
>		pr_info("Not enabling interrupt remapping\n");
>		goto error;
>	}

And HPET, and PCI devices including those that might be hotplugged in future 
and not be covered by any extant IOMMU's scope?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse



On 7 October 2020 16:57:36 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 15:10, David Woodhouse wrote:
>> On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
>>> What is preventing you to change the function signature? But handing
>>> down irqdomain here is not cutting it. The right thing to do is to
>>> replace 'struct irq_affinity_desc *affinity' with something more
>>> flexible.
>>
>> Yeah, although I do think I want to ditch this part completely, and
>> treat the "possible" mask for the domain very much more like we do
>> cpu_online_mask. In that we can allow affinities to be *requested*
>> which are outside it, and they can just never be *effective* while
>> those CPUs aren't present and reachable.
>
>Huch?
>
>> That way a lot of the nastiness about preparing an "acceptable" mask in
>> advance can go away.
>
>There is not lots of nastiness.

OK, but I think we do have to cope with the fact that the limit is dynamic, and
a CPU might be added which widens the mask. I think that's fundamental and not
x86-specific.

>> It's *also* driven, as I noted, by the realisation that on x86, the
>> x86_non_ir_cpumask will only ever contain those CPUs which have been
>> brought online so far and meet the criteria... but won't that be true
>> on *any* system where CPU numbers are virtual and not 1:1 mapped with
>> whatever determines reachability by the IRQ domain? It isn't *such* an
>> x86ism, surely?
>
>Yes, but that's exactly the reason why I don't want to have whatever
>mask name you chose to be directly exposed and want it to be part of the
>irq domains because then you can express arbitrary per domain limits.

The x86_non_ir_cpumask isn't intended to be directly exposed to any generic IRQ
code.

It's set up by the x86 APIC code so that those x86 IRQ domains which are
affected by it can set it with irqdomain_set_max_affinity() for the generic
code to see, without each having to create the cpumask from scratch for
themselves.

> ... reading for once the kids are elsewhere...

Thanks.

>It's not rocket science to fix that as the irq remapping code already
>differentiates between the device types.

I don't actually like that very much. The IRQ remapping code could just compose 
the MSI messages that it desires without really having to care so much about 
the child device. The bit where IRQ remapping actually composes IOAPIC RTEs 
makes me particularly sad.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 15:23, David Woodhouse wrote:
> On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
>> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
>> > On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
>> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>> > > > To fix *that* case, we really do need the whole series giving us per-
>> > > > domain restricted affinity, and to use it for those MSIs/IOAPICs that
>> > > > the IRQ remapping doesn't cover.
>> > > 
>> > > Which do not exist today.
>> > 
>> > Sure. But at patch 10/13 into this particular patch series, it *does*
>> > exist.
>> 
>> As I told you before: Your ordering is wrong. We do not introduce bugs
>> first and then fix them later 
>
> I didn't introduce that bug; it's been there for years. Fixing it
> properly requires per-irqdomain affinity limits.
>
> There's a cute little TODO at least in the Intel irq-remapping driver,
> noting that we should probably check if there are any IOAPICs that
> aren't in the scope of any DRHD at all. But that's all.

So someone forgot to remove the cute little TODO when this was added:

	if (parse_ioapics_under_ir()) {
		pr_info("Not enabling interrupt remapping\n");
		goto error;
	}

Thanks,

tglx


Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 15:10, David Woodhouse wrote:
> On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
>> What is preventing you to change the function signature? But handing
>> down irqdomain here is not cutting it. The right thing to do is to
>> replace 'struct irq_affinity_desc *affinity' with something more
>> flexible.
>
> Yeah, although I do think I want to ditch this part completely, and
> treat the "possible" mask for the domain very much more like we do
> cpu_online_mask. In that we can allow affinities to be *requested*
> which are outside it, and they can just never be *effective* while
> those CPUs aren't present and reachable.

Huch?

> That way a lot of the nastiness about preparing an "acceptable" mask in
> advance can go away.

There is not lots of nastiness.

> It's *also* driven, as I noted, by the realisation that on x86, the
> x86_non_ir_cpumask will only ever contain those CPUs which have been
> brought online so far and meet the criteria... but won't that be true
> on *any* system where CPU numbers are virtual and not 1:1 mapped with
> whatever determines reachability by the IRQ domain? It isn't *such* an
> x86ism, surely?

Yes, but that's exactly the reason why I don't want to have whatever
mask name you chose to be directly exposed and want it to be part of the
irq domains because then you can express arbitrary per domain limits.

>> Fact is, that if there are CPUs which cannot be targeted by device
>> interrupts then the multiqueue affinity mechanism has to be fixed to
>> handle this. Right now it's just broken.
>
> I think part of the problem there is that I don't really understand how
> this part is *supposed* to work. I was focusing on getting the simple
> case working first, in the expectation that we'd come back to that part
> and you'd keep me honest. Is there some decent documentation on this
> that I'm missing?

TLDR & HTF;

Multiqueue devices want to have at max 1 queue per CPU or if the device
has less queues than CPUs they want the queues to have a fixed
association to a set of CPUs.

At setup time this is established considering possible CPUs to handle
'physical' hotplug correctly.

If a queue has no online CPUs it cannot be started. If it's active and
the last CPU goes down then it's quiesced and stopped and the core code
shuts down the interrupt and does not move it to a still online CPU.

So with your hackery, we end up in a situation where we have a large
possible mask, but not all CPUs in that mask can be reached, which means
in a 1 queue per CPU scenario all unreachable CPUs would have
dysfunctional queues.

So that spreading algorithm needs to know about this limitation.
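
A crude sketch of what it means for the spreading step to know about the
limit: the spread has to be computed over the set of CPUs the domain can
actually reach, not over cpu_possible_mask. This is not the real
irq_create_affinity_masks() logic, just an illustration:

/*
 * Illustration only: naive round-robin spread of the reachable CPUs
 * over @nqueues queue masks.
 */
static void spread_over_reachable_cpus(const struct cpumask *domain_possible,
                                       struct cpumask *queue_masks,
                                       unsigned int nqueues)
{
        unsigned int cpu, n = 0;
        cpumask_var_t usable;

        if (!zalloc_cpumask_var(&usable, GFP_KERNEL))
                return;

        /* Only CPUs which device interrupts can actually target. */
        cpumask_and(usable, cpu_possible_mask, domain_possible);

        for_each_cpu(cpu, usable)
                cpumask_set_cpu(cpu, &queue_masks[n++ % nqueues]);

        free_cpumask_var(usable);
}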

>> So if you look at X86 then you have either:
>> 
>>    [VECTOR] --- [IO/APIC]
>>        |------- [MSI]
>>        |------- [WHATEVER]
>> 
>> or
>> 
>>    [VECTOR] --- [REMAP] --- [IO/APIC]
>>                    |------- [MSI]
>>                    |------- [WHATEVER]
>
> Hierarchically, theoretically the IOAPIC and HPET really ought to be
> children of the MSI domain. It's the Compatibility MSI which has the
> restriction on destination ID, because of how many bits it interprets
> from the MSI address. HPET and IOAPIC are just generating MSIs that hit
> that upstream limit.

We kinda have that, but not nicely abstracted. But we surely can and
should fix that.

>> So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
>> then irqdomain_get_possible_affinity() will return the correct result
>> independent whether remapping is enabled or not.
>
> Sure. Although VECTOR doesn't have the restriction. As noted, it's the
> Compatibility MSI that does. So the diagram looks something like this:
>
>  [ VECTOR ] --- [ REMAP ] --- [ IR-MSI ] --- [ IR-HPET ]
>      |                            |--------- [ IR-PCI-MSI ]
>      |                            |--------- [ IR-IOAPIC ]
>      |
>      |--------- [ COMPAT-MSI ] -- [ HPET ]
>                       |---------- [ PCI-MSI ]
>                       |---------- [ IOAPIC ]
>
>
> In this diagram, it's COMPAT-MSI that has the affinity restriction,
> inherited by its children.
>
> Now, I understand that you're not keen on IOAPIC actually being a child
> of the MSI domain, and that's fine. In Linux right now, those generic
> 'IR-MSI' and 'COMPAT-MSI' domains don't exist. So all three of the
> compatibility HPET, PCI-MSI and IOAPIC domains would have to add that
> same 8-bit affinity limit for themselves, as their parent is the VECTOR
> domain.

No. We fix it proper and not by hacking around it.

> I suppose it *might* not hurt to pretend that VECTOR does indeed have
> the limit, and to *remove* it in the remapping domain. And then the
> affinity limit could be removed in one place by the REMAP domain
> because even now in Linux's imprecise hierarchy, the IR-HPET, IR-PCI-
> MSI and IR-IOAPIC domains *are* children of that.

It's not rocket science to fix that as the irq remapping code already
differentiates between the device types.

Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 17:25 +0200, Thomas Gleixner wrote:
> It's clearly how the hardware works. MSI has a message store of some
> sorts and if the entry is enabled then the MSI chip (in PCI or
> elsewhere) will send exactly the message which is in that message
> store. It knows absolutely nothing about what the message means and how
> it is composed. The only things which MSI knows about is whether the
> message address is 64bit wide or not and whether the entries are
> maskable or not and how many entries it can store.
> 
> Software allocates a message target at the underlying irq domain (vector
> or remap) and that underlying irq domain defines the properties.
> 
> If qemu emulates it differently then it's qemu's problem, but that does
> not make it in anyway something which influences the irq domain
> abstractions which are correctly modeled after how the hardware works.
> 
> > Not really the important part to deal with right now, either way.
> 
> Oh yes it is. We define that upfront and not after the fact.

The way the hardware works is that something handles physical address
cycles to addresses in the (on x86) 0xFEExxxxx range, and turns them
into actual interrupts on the appropriate CPU — where the APIC ID and
vector (etc.) are directly encoded in the bits of the address and the
data written. That compatibility x86 APIC MSI format is where the 8-bit 
(or 15-bit) limit comes from.
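
For reference, a sketch of that compatibility-format address encoding. The
0xFEE00000 base, the 8-bit destination field in address bits 19-12, the
redirection hint (bit 3) and destination mode (bit 2) are architectural;
packing APIC ID bits 14-8 into address bits 11-5 is the 15-bit extension
being discussed in this series:

/*
 * Compatibility-format MSI address: destination APIC ID in bits 19-12,
 * hence the 8-bit / 255-CPU limit.  The extended destination ID scheme
 * puts APIC ID bits 14-8 into address bits 11-5.
 */
static u32 compat_msi_addr_lo(u32 apic_id, bool redir_hint, bool logical)
{
        u32 addr = 0xfee00000;

        addr |= (apic_id & 0xff) << 12;         /* destination ID [7:0]     */
        addr |= ((apic_id >> 8) & 0x7f) << 5;   /* extended dest ID [14:8]  */
        if (redir_hint)
                addr |= BIT(3);                 /* redirection hint         */
        if (logical)
                addr |= BIT(2);                 /* logical destination mode */

        return addr;
}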

Then interrupt remapping comes along, and now those physical address
cycles are actually handled by the IOMMU — which can either handle the
compatibility format as before, or use a different format of
address/data bits and perform a lookup in its IRTE table.

The PCI MSI domain, HPET, and even the IOAPIC are just the things out
there on the bus which might perform those physical address cycles. And
yes, as you say they're just a message store sending exactly the
message that was composed for them. They know absolutely nothing about
what the message means and how it is composed.

It so happens that in Linux, we don't really architect the software
like that. So each of the PCI MSI domain, HPET, and IOAPIC have their
*own* message composer which has the same limits and composes basically
the same messages as if it was *their* format, not dictated to them by
the APIC upstream. And that's what we're both getting our panties in a
knot about, I think.

It really doesn't matter that much to the underlying generic irqdomain
support for limited affinities. Except that you want to make the
generic code support the concept of a child domain supporting *more*
CPUs than its parent, which really doesn't make much sense if you think
about it. But it isn't that hard to do, and if it means we don't have
to argue any more about the x86 hierarchy not matching the hardware
then it's a price worth paying. 





[Bug 209321] DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

2020-10-07 Thread Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=209321

Not much detail in the bugzilla yet, but apparently this started in
v5.8.0-rc1:

  DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

Currently assigned to Driver/PCI, but not clear to me yet whether PCI
is the culprit or the victim.


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 16:05, David Woodhouse wrote:
> On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
>> The top most MSI irq chip does not even have a compose function, neither
>> for the remap nor for the vector case. The composition is done by the
>> parent domain from the data which the parent domain constructed. Same
>> for the IO/APIC just less clearly separated.
>> 
>> The top most chip just takes what the underlying domain constructed and
>> writes it to the message store, because that's what the top most chip
>> controls. It does not control the content.
>
> Sure, whatever. The way we arrange the IRQ domain hierarchy in x86
> Linux doesn't really match my understanding of the real hardware, or
> how qemu emulates it either. But it is at least internally self-
> consistent, and in this model it probably does make some sense to claim
> the 8-bit limit on x86_vector_domain itself, and *remove* that limit in
> the remapping irqdomain.

It's clearly how the hardware works. MSI has a message store of some
sorts and if the entry is enabled then the MSI chip (in PCI or
elsewhere) will send exactly the message which is in that message
store. It knows absolutely nothing about what the message means and how
it is composed. The only things which MSI knows about is whether the
message address is 64bit wide or not and whether the entries are
maskable or not and how many entries it can store.

Software allocates a message target at the underlying irq domain (vector
or remap) and that underlying irq domain defines the properties.

If qemu emulates it differently then it's qemu's problem, but that does
not make it in anyway something which influences the irq domain
abstractions which are correctly modeled after how the hardware works.

> Not really the important part to deal with right now, either way.

Oh yes it is. We define that upfront and not after the fact.

Thanks,

tglx


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
> > > The information has to property of the relevant irq domains and the
> > > hierarchy allows you nicely to retrieve it from there instead of
> > > sprinkling this all over the place.
> > 
> > No. This is not a property of the parent domain per se. Especially if
> > you're thinking that we could inherit the affinity mask from the
> > parent, then twice no.
> > 
> > This is a property of the MSI domain itself, and how many bits of
> > destination ID the hardware at *this* level can interpret and pass on
> > to the parent domain.
> 
> Errm what?
> 
> The MSI domain does not know anything about what the underlying domain
> can handle and it shouldn't.
> 
> If MSI is on top of remapping then the remapping domain defines what the
> MSI domain can do and not the other way round. Ditto for the non
> remapped case in which the vector domain decides.
> 
> The top most MSI irq chip does not even have a compose function, neither
> for the remap nor for the vector case. The composition is done by the
> parent domain from the data which the parent domain constructed. Same
> for the IO/APIC just less clearly separated.
> 
> The top most chip just takes what the underlying domain constructed and
> writes it to the message store, because that's what the top most chip
> controls. It does not control the content.

Sure, whatever. The way we arrange the IRQ domain hierarchy in x86
Linux doesn't really match my understanding of the real hardware, or
how qemu emulates it either. But it is at least internally self-
consistent, and in this model it probably does make some sense to claim
the 8-bit limit on x86_vector_domain itself, and *remove* that limit in
the remapping irqdomain.

Not really the important part to deal with right now, either way.




Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 16:05 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
> > On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
> > > On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
> > > > To fix *that* case, we really do need the whole series giving us per-
> > > > domain restricted affinity, and to use it for those MSIs/IOAPICs that
> > > > the IRQ remapping doesn't cover.
> > > 
> > > Which do not exist today.
> > 
> > Sure. But at patch 10/13 into this particular patch series, it *does*
> > exist.
> 
> As I told you before: Your ordering is wrong. We do not introduce bugs
> first and then fix them later 

I didn't introduce that bug; it's been there for years. Fixing it
properly requires per-irqdomain affinity limits.

There's a cute little TODO at least in the Intel irq-remapping driver,
noting that we should probably check if there are any IOAPICs that
aren't in the scope of any DRHD at all. But that's all.

If it happens, I think we'll end up doing the right thing and
instantiating a non-IR IOAPIC domain for it, and if we *don't* have any
CPU with an APIC ID above... (checks notes)... 7 ... then it'll just
about work out. Over 7 and we're screwed (because logical mode; see
also https://git.infradead.org/users/dwmw2/linux.git/commit/429f0c33f
for a straw man but that's getting even further down the rabbit hole)
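
For context, the reason logical mode runs out at 8: in flat logical mode
the 8-bit destination is a bitmap with one bit per CPU rather than a
number. A sketch, purely for illustration:

/* Flat logical destination: one bit per CPU, so at most 8 CPUs. */
static u8 flat_logical_dest(unsigned int cpu)
{
        if (WARN_ON(cpu > 7))
                return 0;
        return BIT(cpu);
}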




Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
> On Wed, Oct 07 2020 at 08:19, David Woodhouse wrote:
> > On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
> > > On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > This is the maximum possible set of CPUs which can be used. Use it
> > > > to calculate the default affinity requested from __irq_alloc_descs()
> > > > by first attempting to find the intersection with irq_default_affinity,
> > > > or falling back to using just the max_affinity if the intersection
> > > > would be empty.
> > > 
> > > And why do we need that as yet another argument?
> > > 
> > > This is an optional property of the irq domain, really and no caller has
> > > any business with that. 
> > 
> > Because irq_domain_alloc_descs() doesn't actually *take* the domain as
> > an argument. It's more of an internal function, which is only non-
> > static because it's used from kernel/irq/ipi.c too for some reason. If
> > we convert the IPI code to just call __irq_alloc_descs() directly,
perhaps we can actually make irq_domain_alloc_descs() static.
> 
> What is preventing you to change the function signature? But handing
> down irqdomain here is not cutting it. The right thing to do is to
> replace 'struct irq_affinity_desc *affinity' with something more
> flexible.

Yeah, although I do think I want to ditch this part completely, and
treat the "possible" mask for the domain very much more like we do
cpu_online_mask. In that we can allow affinities to be *requested*
which are outside it, and they can just never be *effective* while
those CPUs aren't present and reachable.

That way a lot of the nastiness about preparing an "acceptable" mask in
advance can go away.

It's *also* driven, as I noted, by the realisation that on x86, the
x86_non_ir_cpumask will only ever contain those CPUs which have been
brought online so far and meet the criteria... but won't that be true
on *any* system where CPU numbers are virtual and not 1:1 mapped with
whatever determines reachability by the IRQ domain? It isn't *such* an
x86ism, surely?

> Fact is, that if there are CPUs which cannot be targeted by device
> interrupts then the multiqueue affinity mechanism has to be fixed to
> handle this. Right now it's just broken.

I think part of the problem there is that I don't really understand how
this part is *supposed* to work. I was focusing on getting the simple
case working first, in the expectation that we'd come back to that part
and you'd keep me honest. Is there some decent documentation on this
that I'm missing?

> It's pretty obvious that the irq domains are the right place to store
> that information:
> 
> const struct cpumask *irqdomain_get_possible_affinity(struct irq_domain *d)
> {
>         while (d) {
>                 if (d->get_possible_affinity)
>                         return d->get_possible_affinity(d);
>                 d = d->parent;
>         }
>         return cpu_possible_mask;
> }

Sure.

> So if you look at X86 then you have either:
> 
>    [VECTOR] --- [IO/APIC]
>        |------- [MSI]
>        |------- [WHATEVER]
> 
> or
> 
>    [VECTOR] --- [REMAP] --- [IO/APIC]
>                    |------- [MSI]
>                    |------- [WHATEVER]

Hierarchically, theoretically the IOAPIC and HPET really ought to be
children of the MSI domain. It's the Compatibility MSI which has the
restriction on destination ID, because of how many bits it interprets
from the MSI address. HPET and IOAPIC are just generating MSIs that hit
that upstream limit.

> So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
> then irqdomain_get_possible_affinity() will return the correct result
> independent whether remapping is enabled or not.

Sure. Although VECTOR doesn't have the restriction. As noted, it's the
Compatibility MSI that does. So the diagram looks something like this:

 [ VECTOR ] --- [ REMAP ] --- [ IR-MSI ] --- [ IR-HPET ]
     |                            |--------- [ IR-PCI-MSI ]
     |                            |--------- [ IR-IOAPIC ]
     |
     |--------- [ COMPAT-MSI ] -- [ HPET ]
                      |---------- [ PCI-MSI ]
                      |---------- [ IOAPIC ]


In this diagram, it's COMPAT-MSI that has the affinity restriction,
inherited by its children.

Now, I understand that you're not keen on IOAPIC actually being a child
of the MSI domain, and that's fine. In Linux right now, those generic
'IR-MSI' and 'COMPAT-MSI' domains don't exist. So all three of the
compatibility HPET, PCI-MSI and IOAPIC domains would have to add that
same 8-bit affinity limit for themselves, as their parent is the VECTOR
domain.

I suppose it *might* not hurt to pretend that VECTOR does indeed have
the limit, and to *remove* it in the remapping domain. And then the
affinity limit could be removed in one place by the REMAP domain
because even now in Linux's imprecise hierarchy, the IR-HPET, IR-PCI-MSI
and IR-IOAPIC domains *are* children of that.

Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 14:08, David Woodhouse wrote:
> On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
>>On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>>> To fix *that* case, we really do need the whole series giving us per-
>>> domain restricted affinity, and to use it for those MSIs/IOAPICs that
>>> the IRQ remapping doesn't cover.
>>
>>Which do not exist today.
>
> Sure. But at patch 10/13 into this particular patch series, it *does*
> exist.

As I told you before: Your ordering is wrong. We do not introduce bugs
first and then fix them later 

>>And all of this is completely wrong to begin with.
>>
>>The information has to be a property of the relevant irq domains and the
>>hierarchy allows you nicely to retrieve it from there instead of
>>sprinkling this all over the place.
>
> No. This is not a property of the parent domain per se. Especially if
> you're thinking that we could inherit the affinity mask from the
> parent, then twice no.
>
> This is a property of the MSI domain itself, and how many bits of
> destination ID the hardware at *this* level can interpret and pass on
> to the parent domain.

Errm what?

The MSI domain does not know anything about what the underlying domain
can handle and it shouldn't.

If MSI is on top of remapping then the remapping domain defines what the
MSI domain can do and not the other way round. Ditto for the non
remapped case in which the vector domain decides.

The top most MSI irq chip does not even have a compose function, neither
for the remap nor for the vector case. The composition is done by the
parent domain from the data which the parent domain constructed. Same
for the IO/APIC just less clearly separated.

The top most chip just takes what the underlying domain constructed and
writes it to the message store, because that's what the top most chip
controls. It does not control the content.

Thanks,

tglx


Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-07 Thread Marcin Wojtas
Hi Denis,

Thank you for your report.

On Tue, 6 Oct 2020 at 17:17, Denis Odintsov wrote:
>
> Hi,
>
> > Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :
> >
> > The series is meant to support SMMU for AP806 and a workaround
> > for accessing ARM SMMU 64bit registers is the gist of it.
> >
> > For the record, AP-806 can't access SMMU registers with 64bit width.
> > These patches split the readq/writeq into two 32bit accesses instead
> > and update DT bindings.
> >
> > The series was successfully tested on a vanilla v5.8-rc3 kernel and
> > Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.
> >
> > For reference, previous versions are listed below:
> > V1: https://lkml.org/lkml/2018/10/15/373
> > V2: https://lkml.org/lkml/2019/7/11/426
> > V3: https://lkml.org/lkml/2020/7/2/1114
> >
>
> 1) After enabling SMMU on Armada 8040, with 
> ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=y set by default in the kernel since 
> 954a03be033c7cef80ddc232e7cbdb17df735663,
> the internal eMMC is prevented from being initialised (as there is no iommus 
> property for ap_sdhci0).
> Disabling "Disable bypass by default" makes it work, but the patch strongly 
> suggests doing it properly.
> I wasn't able to find the correct path for ap_sdhci for iommus in any publicly 
> available documentation; it would be highly appreciated if this were addressed 
> properly, thank you!

According to my knowledge and the docs, AP I/O devices cannot be
virtualized, only ones connected via CP110/CP115. We'd need to check
what should be done in such a configuration and get back to you.


>
> 2) The second issue I got (btw I have a ClearFog GT 8k Armada 8040 based board) is 
> an mPCIe ath10k card.
> It is found, it is enumerated, it is visible in lspci, but it fails to be 
> initialised. Here is the log:
>
> [1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 ranges:
> [1.751116] armada8k-pcie f260.pcie:  MEM 0x00f600..0x00f6ef -> 0x00f600
> [1.964690] armada8k-pcie f260.pcie: Link up
> [1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
> [1.976026] pci_bus :00: root bus resource [bus 00-ff]
> [1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
> [1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
> [1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
> [2.000843] pci :00:00.0: supports D1 D2
> [2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
> [2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
> [2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
> [2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
> [2.032111] pci :01:00.0: supports D1 D2
> [2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
> [2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
> [2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f pref]
> [2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 64bit]
> [2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 pref]
> [2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
> [2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
> [2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f pref]
> [2.104539] pcieport :00:00.0: Adding to iommu group 4
> [2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
> [2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
> [8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
> [8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
> [8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
>
> up to that point the log is the same as without SMMU enabled, except "Adding 
> to iommu group N" lines, and IRQ being 37
>
> [8.221328] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.313362] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.409373] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.553433] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.641370] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.737979] ath10k_pci :01:00.0: failed to poke copy engine: -16
> [8.807356] ath10k_pci :01:00.0: Failed to get pcie state addr: -16
> [8.814032] ath10k_pci :01:00.0: failed to setup init config: -16
> [8.820605] ath10k_pci :01:00.0: could not power on hif bus (-16)
> [8.827111] ath10k_pci :01:00.0: could not probe fw (-16)
>
> Thank you!

The PCIe was validated when booting from edk2 + using the pci-host-generic
driver and a standard Intel NIC. Not sure if it makes any difference vs
the DesignWare driver ("marvell,armada8k-pcie"), but we need to
double-check that.

Best regards,
Marcin

>
> > v3 -> v4
> > - call 

Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 08:19, David Woodhouse wrote:
> On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
>> > From: David Woodhouse 
>> > 
>> > This is the maximum possible set of CPUs which can be used. Use it
>> > to calculate the default affinity requested from __irq_alloc_descs()
>> > by first attempting to find the intersection with irq_default_affinity,
>> > or falling back to using just the max_affinity if the intersection
>> > would be empty.
>> 
>> And why do we need that as yet another argument?
>> 
>> This is an optional property of the irq domain, really and no caller has
>> any business with that. 
>
> Because irq_domain_alloc_descs() doesn't actually *take* the domain as
> an argument. It's more of an internal function, which is only non-
> static because it's used from kernel/irq/ipi.c too for some reason. If
> we convert the IPI code to just call __irq_alloc_descs() directly,
> perhaps we can actually make irq_domain_alloc_descs() static.

What is preventing you to change the function signature? But handing
down irqdomain here is not cutting it. The right thing to do is to
replace 'struct irq_affinity_desc *affinity' with something more
flexible.

>> >  int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t hwirq,
>> > -			   int node, const struct irq_affinity_desc *affinity)
>> > +			   int node, const struct irq_affinity_desc *affinity,
>> > +			   const struct cpumask *max_affinity)
>> >  {
>> > +	cpumask_var_t default_affinity;
>> >  	unsigned int hint;
>> > +	int i;
>> > +
>> > +	/* Check requested per-IRQ affinities are in the possible range */
>> > +	if (affinity && max_affinity) {
>> > +		for (i = 0; i < cnt; i++)
>> > +			if (!cpumask_subset(&affinity[i].mask, max_affinity))
>> > +				return -EINVAL;
>> 
>> https://lore.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos
>> 
>> What is preventing the affinity spreading code from spreading the masks
>> out to unusable CPUs? The changelog is silent about that part.
>
> I'm coming to the conclusion that we should allow unusable CPUs to be
> specified at this point, just as we do offline CPUs. That's largely
> driven by the realisation that our x86_non_ir_cpumask is only going to
> contain online CPUs anyway, and hotplugged CPUs only get added to it as
> they are brought online.

Can you please stop looking at this from an x86-only perspective. It's
largely irrelevant what particular needs x86 or virt or whatever has.

Fact is, that if there are CPUs which cannot be targeted by device
interrupts then the multiqueue affinity mechanism has to be fixed to
handle this. Right now it's just broken.

Passing yet more cpumasks and random pointers around through device
drivers and whatever is just not going to happen. Neither are we going
to have

arch_can_be_used_for_device_interrupts_mask

or whatever you come up with and claim it to be 'generic'.

The whole affinity control mechanism needs to be refactored from ground
up and the information about CPUs which can be targeted has to be
retrievable through the irqdomain hierarchy.

Anything else is just tinkering and I have zero interest in mopping up
after you.

It's pretty obvious that the irq domains are the right place to store
that information:

const struct cpumask *irqdomain_get_possible_affinity(struct irq_domain *d)
{
        while (d) {
                if (d->get_possible_affinity)
                        return d->get_possible_affinity(d);
                d = d->parent;
        }
        return cpu_possible_mask;
}

So if you look at X86 then you have either:

   [VECTOR] --- [IO/APIC]
       |------- [MSI]
       |------- [WHATEVER]

or

   [VECTOR] --- [REMAP] --- [IO/APIC]
                   |------- [MSI]
                   |------- [WHATEVER]

So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
then irqdomain_get_possible_affinity() will return the correct result
independent whether remapping is enabled or not.
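
To make that concrete, a sketch of the two callbacks under such a scheme;
x86_non_ir_cpumask is the mask proposed in this series, not something in
mainline:

/* The vector domain exposes the non-remapped restriction... */
static const struct cpumask *x86_vector_possible_affinity(struct irq_domain *d)
{
        return &x86_non_ir_cpumask;
}

/*
 * ...while the remap domain answers before the walk above ever reaches
 * the vector domain, so remapped interrupts see no restriction.
 */
static const struct cpumask *ir_possible_affinity(struct irq_domain *d)
{
        return cpu_possible_mask;
}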

This allows to use that for other things like per node restrictions or
whatever people come up with, without sprinkling more insanities through
the tree.

Thanks,

tglx


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse



On 7 October 2020 13:59:00 BST, Thomas Gleixner  wrote:
>On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
>> On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote:
>>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
>> This is the case I called out in the cover letter:
>>
>> This patch series implements a per-domain "maximum affinity" set and
>> uses it for the non-remapped IOAPIC and MSI domains on x86. As well as
>> allowing more CPUs to be used without interrupt remapping, this also
>> fixes the case where some IOAPICs or PCI devices aren't actually in
>> scope of any active IOMMU and are operating without remapping.
>>
>> To fix *that* case, we really do need the whole series giving us per-
>> domain restricted affinity, and to use it for those MSIs/IOAPICs that
>> the IRQ remapping doesn't cover.
>
>Which do not exist today.

Sure. But at patch 10/13 into this particular patch series, it *does* exist.

(Ignoring, for the moment, that I'm about to rewrite half the preceding patches 
to take a different approach)


>>> >	ip->irqdomain->parent = parent;
>>> > +	if (parent == x86_vector_domain)
>>> > +		irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
>>> 
>>> OMG
>>
>> This IOAPIC function may or may not be behind remapping.
>
>>> >	d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
>>> > +	irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>>> 
>>> So here it's unconditional
>>
>> Yes, this code is only for the non-remapped case and there's a separate
>> arch_create_remap_msi_irq_domain() function a few lines further down
>> which creates the IR-PCI-MSI IRQ domain. So no need for a condition
>> here.
>>
>>> > +	if (parent == x86_vector_domain)
>>> > +		irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>>> 
>>> And here we need a condition again. Completely obvious and reviewable - NOT.
>>
>> And HPET may or may not be behind remapping so again needs the
>> condition. I had figured that part was fairly obvious but can note it
>> in the commit comment.
>
>And all of this is completely wrong to begin with.
>
>The information has to be a property of the relevant irq domains and the
>hierarchy allows you nicely to retrieve it from there instead of
>sprinkling this all over the place.

No. This is not a property of the parent domain per se. Especially if you're 
thinking that we could inherit the affinity mask from the parent, then twice no.

This is a property of the MSI domain itself, and how many bits of destination 
ID the hardware at *this* level can interpret and pass on to the parent domain.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread Thomas Gleixner
On Wed, Oct 07 2020 at 08:48, David Woodhouse wrote:
> On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote:
>> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> This is the case I called out in the cover letter:
>
> This patch series implements a per-domain "maximum affinity" set and
> uses it for the non-remapped IOAPIC and MSI domains on x86. As well as
> allowing more CPUs to be used without interrupt remapping, this also
> fixes the case where some IOAPICs or PCI devices aren't actually in
> scope of any active IOMMU and are operating without remapping.
>
> To fix *that* case, we really do need the whole series giving us per-
> domain restricted affinity, and to use it for those MSIs/IOAPICs that
> the IRQ remapping doesn't cover.

Which do not exist today.

>> >	ip->irqdomain->parent = parent;
>> > +	if (parent == x86_vector_domain)
>> > +		irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
>> 
>> OMG
>
> This IOAPIC function may or may not be behind remapping.

>> >	d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
>> > +	irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>> 
>> So here it's unconditional
>
> Yes, this code is only for the non-remapped case and there's a separate
> arch_create_remap_msi_irq_domain() function a few lines further down
> which creates the IR-PCI-MSI IRQ domain. So no need for a condition
> here.
>
>> > +	if (parent == x86_vector_domain)
>> > +		irq_domain_set_affinity(d, &x86_non_ir_cpumask);
>> 
>> And here we need a condition again. Completely obvious and reviewable - NOT.
>
> And HPET may or may not be behind remapping so again needs the
> condition. I had figured that part was fairly obvious but can note it
> in the commit comment.

And all of this is completely wrong to begin with.

The information has to be a property of the relevant irq domains and the
hierarchy allows you nicely to retrieve it from there instead of
sprinkling this all over the place.

Thanks,

tglx


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Christoph Hellwig
On Wed, Oct 07, 2020 at 02:21:43PM +0200, Tomasz Figa wrote:
> My initial feeling is that it should work, but we'll give you a
> definitive answer once we prototype it. :)
> 
> We might actually give it a try in the USB HCD subsystem as well, to
> implement usb_alloc_noncoherent(), as an optimization for drivers
> which have to perform multiple random accesses to the URB buffers. I
> think you might recall discussing this by the way of the pwc and
> uvcvideo camera drivers.

Yes.  I guess for usb the dma_alloc_noncoherent as-is in linux-next
might be better.   So if you have the cycles please prototype it
either way, although for that we'd also need at least a
mmap_noncoherent method, and probably a get_sgtable_noncoherent one.
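
For reference, a usage sketch of the dma_alloc_noncoherent() interface as
it looked in linux-next around this time; the exact signature may differ
in released kernels:

static int example_use_noncoherent(struct device *dev)
{
        dma_addr_t dma_handle;
        void *buf;

        buf = dma_alloc_noncoherent(dev, SZ_64K, &dma_handle,
                                    DMA_FROM_DEVICE, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        /* ... let the device DMA into the buffer ... */

        /* Non-coherent: explicit cache maintenance before the CPU reads. */
        dma_sync_single_for_cpu(dev, dma_handle, SZ_64K, DMA_FROM_DEVICE);
        /* ... CPU consumes the data ... */

        dma_free_noncoherent(dev, SZ_64K, buf, dma_handle, DMA_FROM_DEVICE);
        return 0;
}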


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 8:21 AM Christoph Hellwig  wrote:
>
> On Tue, Oct 06, 2020 at 10:56:04PM +0200, Tomasz Figa wrote:
> > > Yes.  And make sure the API isn't implemented when VIVT caches are
> > > used, but that isn't really different from the current interface.
> >
> > Okay, thanks. Let's see if we can make necessary changes to the videobuf2.
> >
> > +Sergey Senozhatsky for awareness too.
>
> I can defer the changes a bit to see if you'd really much prefer
> the former interface.  I think for now the most important thing is
> that it works properly for the potential users, and the prime one is
> videobuf2 for now.  drm also seems like a big potential users, but I
> had a really hard time getting the developers to engage in API
> development.

My initial feeling is that it should work, but we'll give you a
definitive answer once we prototype it. :)

We might actually give it a try in the USB HCD subsystem as well, to
implement usb_alloc_noncoherent(), as an optimization for drivers
which have to perform multiple random accesses to the URB buffers. I
think you might recall discussing this by the way of the pwc and
uvcvideo camera drivers.

Best regards,
Tomasz


Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 13:15 +0200, Paolo Bonzini wrote:
> On 07/10/20 10:59, David Woodhouse wrote:
> > Yeah, I was expecting the per-irqdomain affinity support to take a few
> > iterations. But this part, still sticking with the current behaviour of
> > only allowing CPUs to come online at all if they can be reached by all
> > interrupts, can probably go in first.
> > 
> > It's presumably (hopefully!) a blocker for the qemu patch which exposes
> > the same feature bit defined in this patch.
> 
> Yeah, though we could split it further and get the documentation part in
> first.  That would let the QEMU part go through.

Potentially. Although I've worked out that the first patch in my
series, adding x2apic_set_max_apicid(), is actually a bug fix because
it fixes the behaviour if you only *hotplug* CPUs with APIC IDs > 255
and there were none of them present at boot time.

So I'll post this set on its own to start with, and then focus on the
per-irqdomain affinity support after that.

David Woodhouse (5):
  x86/apic: Fix x2apic enablement without interrupt remapping
  x86/msi: Only use high bits of MSI address for DMAR unit
  x86/ioapic: Handle Extended Destination ID field in RTE
  x86/apic: Support 15 bits of APIC ID in IOAPIC/MSI where available
  x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID



Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread Paolo Bonzini
On 07/10/20 10:59, David Woodhouse wrote:
> Yeah, I was expecting the per-irqdomain affinity support to take a few
> iterations. But this part, still sticking with the current behaviour of
> only allowing CPUs to come online at all if they can be reached by all
> interrupts, can probably go in first.
> 
> It's presumably (hopefully!) a blocker for the qemu patch which exposes
> the same feature bit defined in this patch.

Yeah, though we could split it further and get the documentation part in
first.  That would let the QEMU part go through.

Paolo



Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2020-10-07 Thread Joerg Roedel
On Fri, Sep 25, 2020 at 09:52:31AM +0800, Lu Baolu wrote:
> 
> On 9/24/20 10:08 PM, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > Instead of bailing out completely, such a unit can still be used for
> > interrupt remapping.
> 
> Reviewed-by: Lu Baolu 

Applied, thanks.



Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread David Woodhouse
On Wed, 2020-10-07 at 10:14 +0200, Paolo Bonzini wrote:
> Looks like the rest of the series needs some more work, but anyway:
> 
> Acked-by: Paolo Bonzini 

Thanks.

Yeah, I was expecting the per-irqdomain affinity support to take a few
iterations. But this part, still sticking with the current behaviour of
only allowing CPUs to come online at all if they can be reached by all
interrupts, can probably go in first.

It's presumably (hopefully!) a blocker for the qemu patch which exposes
the same feature bit defined in this patch.




Re: [PATCH 13/13] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

2020-10-07 Thread Paolo Bonzini
On 05/10/20 17:28, David Woodhouse wrote:
> From: David Woodhouse 
> 
> This allows the host to indicate that IOAPIC and MSI emulation supports
> 15-bit destination IDs, allowing up to 32Ki CPUs without remapping.
> 
> Signed-off-by: David Woodhouse 
> ---
>  Documentation/virt/kvm/cpuid.rst     | 4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kernel/kvm.c                | 6 ++++++
>  3 files changed, 11 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index a7dff9186bed..1726b5925d2b 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT          14      guest checks this feature bit
>                                              async pf acknowledgment msr
>                                              0x4b564d07.
>  
> +KVM_FEATURE_MSI_EXT_DEST_ID           15      guest checks this feature bit
> +                                              before using extended destination
> +                                              ID bits in MSI address bits 11-5.
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT    24      host will warn if no guest-side
>                                                per-cpu warps are expected in
>                                                kvmclock
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 812e9b4c1114..950afebfba88 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -32,6 +32,7 @@
>  #define KVM_FEATURE_POLL_CONTROL 12
>  #define KVM_FEATURE_PV_SCHED_YIELD   13
>  #define KVM_FEATURE_ASYNC_PF_INT 14
> +#define KVM_FEATURE_MSI_EXT_DEST_ID  15
>  
>  #define KVM_HINTS_REALTIME  0
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 1b51b727b140..4986b4399aef 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -743,12 +743,18 @@ static void __init kvm_init_platform(void)
>   x86_platform.apic_post_init = kvm_apic_init;
>  }
>  
> +static bool __init kvm_msi_ext_dest_id(void)
> +{
> + return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
> +}
> +
>  const __initconst struct hypervisor_x86 x86_hyper_kvm = {
>   .name   = "KVM",
>   .detect = kvm_detect,
>   .type   = X86_HYPER_KVM,
>   .init.guest_late_init   = kvm_guest_init,
>   .init.x2apic_available  = kvm_para_available,
> + .init.msi_ext_dest_id   = kvm_msi_ext_dest_id,
>   .init.init_platform = kvm_init_platform,
>  };
>  
> 

Looks like the rest of the series needs some more work, but anyway:

Acked-by: Paolo Bonzini 

Paolo



Re: [PATCH 10/13] x86/irq: Limit IOAPIC and MSI domains' affinity without IR

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:54 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> 
> > From: David Woodhouse 
> > 
> > When interrupt remapping isn't enabled, only the first 255 CPUs can
> 
> No, only CPUs with an APICid < 255 

Ack.

> > receive external interrupts. Set the appropriate max affinity for
> > the IOAPIC and MSI IRQ domains accordingly.
> > 
> > This also fixes the case where interrupt remapping is enabled but some
> > devices are not within the scope of any active IOMMU.
> 
> What? If this fixes an pre-existing problem then
> 
>   1) Explain the problem proper
>   2) Have a patch at the beginning of the series which fixes it
>  independently of this pile
> 
> If it's fixing a problem in your pile, then you got the ordering wrong.

It's not that simple; there's not a single patch which fixes that and
which can go first. I can, and do, fix the "no IR" case in a simple
patch that goes first, simply by restricting the kernel to the APIC IDs
which can be targeted.

This is the case I called out in the cover letter:

This patch series implements a per-domain "maximum affinity" set and
uses it for the non-remapped IOAPIC and MSI domains on x86. As well as
allowing more CPUs to be used without interrupt remapping, this also
fixes the case where some IOAPICs or PCI devices aren't actually in
scope of any active IOMMU and are operating without remapping.

To fix *that* case, we really do need the whole series giving us per-
domain restricted affinity, and to use it for those MSIs/IOAPICs that
the IRQ remapping doesn't cover.
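The "no IR" half, by contrast, is almost a one-liner. As a sketch of the kind of
thing I mean — not the actual patch, and assuming the max-APICID limit used later
in the series feeds the APIC-ID validity check applied when CPUs are enumerated:

	/*
	 * Sketch: without interrupt remapping the IOAPIC/MSI message format
	 * can only encode 8-bit destination APIC IDs, so don't let CPUs
	 * whose APIC ID can't be targeted come online at all.
	 */
	if (remap_mode != IRQ_REMAP_X2APIC_MODE)
		x2apic_set_max_apicid(255);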

> You didn't start kernel programming as of yesterday, so you really know
> how that works.
> 
> > ip->irqdomain->parent = parent;
> > +   if (parent == x86_vector_domain)
> > +   irq_domain_set_affinity(ip->irqdomain, &x86_non_ir_cpumask);
> 
> OMG

The IOAPIC whose irqdomain is being created here may or may not be behind remapping.

> 
> > if (cfg->type == IOAPIC_DOMAIN_LEGACY ||
> > cfg->type == IOAPIC_DOMAIN_STRICT)
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index 4d891967bea4..af5ce5c4da02 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -259,6 +259,7 @@ struct irq_domain * __init native_create_pci_msi_domain(void)
> > pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
> > } else {
> > d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
> > +   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
> 
> So here it's unconditional

Yes, this code is only for the non-remapped case and there's a separate
arch_create_remap_msi_irq_domain() function a few lines further down
which creates the IR-PCI-MSI IRQ domain. So no need for a condition
here.

> > }
> > return d;
> >  }
> > @@ -479,6 +480,8 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id)
> > irq_domain_free_fwnode(fn);
> > kfree(domain_info);
> > }
> > +   if (parent == x86_vector_domain)
> > +   irq_domain_set_affinity(d, &x86_non_ir_cpumask);
> 
> And here we need a condition again. Completely obvious and reviewable - NOT.

And HPET may or may not be behind remapping, so it again needs the
condition. I had figured that part was fairly obvious but I can note it
in the commit comment.




Re: [PATCH 09/13] x86/irq: Add x86_non_ir_cpumask

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:42 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > This is the mask of CPUs to which IRQs can be delivered without
> > interrupt
> > remapping.
> >  
> > +/* Mask of CPUs which can be targeted by non-remapped interrupts.
> > */
> > +cpumask_t x86_non_ir_cpumask = { CPU_BITS_ALL };
> 
> What?

If we didn't hit any limits, all CPUs can be targeted by external
interrupts; that's the default today.

Or at least we pretend it is, modulo the bugs :)

> >  #ifdef CONFIG_X86_32
> >  
> >  /*
> > @@ -1838,6 +1841,7 @@ static __init void x2apic_enable(void)
> >  static __init void try_to_enable_x2apic(int remap_mode)
> >  {
> > u32 apic_limit = 0;
> > +   int i;
> >  
> > if (x2apic_state == X2APIC_DISABLED)
> > return;
> > @@ -1880,6 +1884,14 @@ static __init void try_to_enable_x2apic(int remap_mode)
> > if (apic_limit)
> > x2apic_set_max_apicid(apic_limit);
> >  
> > +   /* Build the affinity mask for interrupts that can't be remapped. */
> > +   cpumask_clear(&x86_non_ir_cpumask);
> > +   i = min_t(unsigned int, num_possible_cpus() - 1, apic_limit);
> > +   for ( ; i >= 0; i--) {
> > +   if (cpu_physical_id(i) <= apic_limit)
> > +   cpumask_set_cpu(i, &x86_non_ir_cpumask);
> > +   }
> 
> Blink. If the APIC id is not linear with the cpu numbers then this
> results in a reduced addressable set of CPUs. WHY?

Hm, good question. That loop was cargo-culted from hyperv-iommu.c;
perhaps it makes more sense there because Hyper-V really does promise
that linearity, or perhaps it was already buggy. Will fix.

In fact, in apic.c I could probably just use the cpuid_to_apicid array
which is right there in the file.
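Something along these lines, perhaps — untested sketch, assuming the unset
entries in cpuid_to_apicid are negative and can simply be skipped:

	cpumask_clear(&x86_non_ir_cpumask);
	for (i = 0; i < nr_cpu_ids; i++) {
		int apicid = cpuid_to_apicid[i];

		if (apicid >= 0 && apicid <= apic_limit)
			cpumask_set_cpu(i, &x86_non_ir_cpumask);
	}

That also drops the assumption that CPU numbers and APIC IDs are linear,
which was the complaint in the first place.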

> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index aa9a3b54a96c..4d0ef46fedb9 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -2098,6 +2098,8 @@ static int mp_alloc_timer_irq(int ioapic, int pin)
> > struct irq_alloc_info info;
> >  
> > ioapic_set_alloc_attr(&info, NUMA_NO_NODE, 0, 0);
> > +   if (domain->parent == x86_vector_domain)
> > +   info.mask = &x86_non_ir_cpumask;
> 
> We are not going to sprinkle such domain checks all over the
> place. Again, the mask is a property of the interrupt domain.

Yeah, that's a hangover from the first attempts which I forgot to
delete.



Re: [PATCH 08/13] genirq: Add irq_domain_set_affinity()

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:32 +0200, Thomas Gleixner wrote:
> What the heck? Why does this need a setter function which is exported?
> So that random driver writers can fiddle with it?
> 
> The affinity mask restriction of an irq domain is already known when the
> domain is created.

It's exported because __irq_domain_add is exported. I didn't want to
just add a rarely-used extra argument to __irq_domain_add and the other
public APIs which call into it, and figured a separate setter function
was the simplest option.

I can rework to add the argument to __irq_domain_add if you prefer.

Or we can declare that restricted affinity *isn't* something that IRQ
domains provided by modules can have, and just not export the setter
function?
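FWIW the setter itself is barely more than an assignment — something like
this, as a sketch (assuming the per-domain field it sets ends up being
called max_affinity):

	void irq_domain_set_affinity(struct irq_domain *domain,
				     const struct cpumask *affinity)
	{
		domain->max_affinity = affinity;
	}

so passing the mask to __irq_domain_add() instead wouldn't add much code
either way; it's purely a question of which shape of API you want.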




Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()

2020-10-07 Thread David Woodhouse
On Tue, 2020-10-06 at 23:26 +0200, Thomas Gleixner wrote:
> On Mon, Oct 05 2020 at 16:28, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > This is the maximum possible set of CPUs which can be used. Use it
> > to calculate the default affinity requested from __irq_alloc_descs()
> > by first attempting to find the intersection with irq_default_affinity,
> > or falling back to using just the max_affinity if the intersection
> > would be empty.
> 
> And why do we need that as yet another argument?
> 
> This is an optional property of the irq domain, really and no caller has
> any business with that. 

Because irq_domain_alloc_descs() doesn't actually *take* the domain as
an argument. It's more of an internal function, which is only non-
static because it's used from kernel/irq/ipi.c too for some reason. If
we convert the IPI code to just call __irq_alloc_descs() directly,
perhaps we can actually make irq_domain_alloc_descs() static.

> >  int irq_domain_alloc_descs(int virq, unsigned int cnt, irq_hw_number_t hwirq,
> > -  int node, const struct irq_affinity_desc *affinity)
> > +  int node, const struct irq_affinity_desc *affinity,
> > +  const struct cpumask *max_affinity)
> >  {
> > +   cpumask_var_t default_affinity;
> > unsigned int hint;
> > +   int i;
> > +
> > +   /* Check requested per-IRQ affinities are in the possible range */
> > +   if (affinity && max_affinity) {
> > +   for (i = 0; i < cnt; i++)
> > +   if (!cpumask_subset(&affinity[i].mask, max_affinity))
> > +   return -EINVAL;
> 
> https://lore.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos
> 
> What is preventing the affinity spreading code from spreading the masks
> out to unusable CPUs? The changelog is silent about that part.

I'm coming to the conclusion that we should allow unusable CPUs to be
specified at this point, just as we do offline CPUs. That's largely
driven by the realisation that our x86_non_ir_cpumask is only going to
contain online CPUs anyway, and hotplugged CPUs only get added to it as
they are brought online.

> > +   /*
> > +* Generate default affinity. Either the possible subset of
> > +* irq_default_affinity if such a subset is non-empty, or fall
> > +* back to the provided max_affinity if there is no intersection.
> 
> ..
> > +* And just a copy of irq_default_affinity in the
> > +* !CONFIG_CPUMASK_OFFSTACK case.
> 
> We know that already...
> 
> > +*/
> > +   memset(&default_affinity, 0, sizeof(default_affinity));
> 
> Right, memset() before allocating is useful.

The memset is because there's no cpumask_var_t initialiser that I can
see. So cpumask_available() on the uninitialised 'default_affinity'
variable might be true even in the OFFSTACK case.

> > +   if ((max_affinity &&
> > +!cpumask_subset(irq_default_affinity, max_affinity))) {
> > +   if (!alloc_cpumask_var(&default_affinity, GFP_KERNEL))
> > +   return -ENOMEM;
> > +   cpumask_and(default_affinity, max_affinity,
> > +   irq_default_affinity);
> > +   if (cpumask_empty(default_affinity))
> > +   cpumask_copy(default_affinity, max_affinity);
> > +   } else if (cpumask_available(default_affinity))
> > +   cpumask_copy(default_affinity, irq_default_affinity);
> 
> That's garbage and unreadable.

That's why there was a comment explaining it... at which point you
claimed to already know :)

Clearly the comment didn't do the job it was supposed to do. I'll
rework.
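Something along these lines, perhaps — untested sketch which just allocates
unconditionally and so avoids the cpumask_available() dance entirely:

	if (!zalloc_cpumask_var(&default_affinity, GFP_KERNEL))
		return -ENOMEM;

	cpumask_copy(default_affinity, irq_default_affinity);
	if (max_affinity) {
		if (cpumask_intersects(default_affinity, max_affinity))
			cpumask_and(default_affinity, default_affinity, max_affinity);
		else
			cpumask_copy(default_affinity, max_affinity);
	}

(plus the matching free_cpumask_var() on the way out, of course).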




Re: [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

2020-10-07 Thread Christoph Hellwig
On Tue, Oct 06, 2020 at 09:19:32AM -0400, Jonathan Marek wrote:
> One example why drm/msm can't use DMA API is multiple page table support
> (that is landing in 5.10), which is something that definitely couldn't work
> with DMA API.
> 
> Another one is being able to choose the address for mappings, which AFAIK
> DMA API can't do (somewhat related to this: qcom hardware often has ranges
> of allowed addresses, which the dma_mask mechanism fails to represent, what
> I see is drivers using dma_mask as a "maximum address", and since addresses
> are allocated from the top it generally works)

That sounds like a good enough reason to use the IOMMU API.  I just
wanted to make sure this really makes sense.
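(For reference, "being able to choose the address" is exactly what the raw
IOMMU API exposes and the DMA API intentionally hides — roughly:

	/* the driver picks the IOVA itself; dma_map_*() never lets you do that */
	ret = iommu_map(domain, iova, phys, size, IOMMU_READ | IOMMU_WRITE);

so for that use case it really is the right layer.)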


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Christoph Hellwig
On Tue, Oct 06, 2020 at 10:56:04PM +0200, Tomasz Figa wrote:
> > Yes.  And make sure the API isn't implemented when VIVT caches are
> > used, but that isn't really different from the current interface.
> 
> Okay, thanks. Let's see if we can make necessary changes to the videobuf2.
> 
> +Sergey Senozhatsky for awareness too.

I can defer the changes a bit to see if you'd really much prefer
the former interface.  I think for now the most important thing is
that it works properly for the potential users, and the prime one is
videobuf2 for now.  drm also seems like a big potential user, but I
had a really hard time getting the developers to engage in API
development.


Re: xen-swiotlb vs phys_to_dma

2020-10-07 Thread Christoph Hellwig
On Tue, Oct 06, 2020 at 01:46:12PM -0700, Stefano Stabellini wrote:
> OK, this makes a lot of sense, and I like the patch because it makes the
> swiotlb interface clearer.
> 
> Just one comment below.
> 

> > +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> > +   size_t mapping_size, size_t alloc_size,
> > +   enum dma_data_direction dir, unsigned long attrs)
> >  {
> > +   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(dev, io_tlb_start);
> 
> This is supposed to be hwdev, not dev

Yeah, the compiler would be rather unhappy otherwise.

I'll resend it after the dma-mapping and Xen trees are merged by Linus
to avoid a merge conflict.