from:"Benjamin Herrenschmidt"

Re: [PATCH 06/25] powerpc: Use bool function return values of true/false not 1/0

2015-03-30 Thread Benjamin Herrenschmidt

On Mon, 2015-03-30 at 16:46 -0700, Joe Perches wrote:
> Use the normal return values for bool functions

Acked-by: Benjamin Herrenschmidt 

Should we merge it or will you ?

Cheers,
Ben.

> Signed-off-by: Joe Perches 
> ---
>  arch/powerpc/include/asm/dcr-native.h| 2 +-
>  arch/powerpc/include/asm/dma-mapping.h   | 4 ++--
>  arch/powerpc/include/asm/kvm_book3s_64.h | 4 ++--
>  arch/powerpc/sysdev/dcr.c| 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dcr-native.h 
> b/arch/powerpc/include/asm/dcr-native.h
> index 7d2e623..4efc11d 100644
> --- a/arch/powerpc/include/asm/dcr-native.h
> +++ b/arch/powerpc/include/asm/dcr-native.h
> @@ -31,7 +31,7 @@ typedef struct {
>  
>  static inline bool dcr_map_ok_native(dcr_host_native_t host)
>  {
> - return 1;
> + return true;
>  }
>  
>  #define dcr_map_native(dev, dcr_n, dcr_c) \
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 894d538..9103687 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -191,11 +191,11 @@ static inline bool dma_capable(struct device *dev, 
> dma_addr_t addr, size_t size)
>   struct dev_archdata *sd = &dev->archdata;
>  
>   if (sd->max_direct_dma_addr && addr + size > sd->max_direct_dma_addr)
> - return 0;
> + return false;
>  #endif
>  
>   if (!dev->dma_mask)
> - return 0;
> + return false;
>  
>   return addr + size - 1 <= *dev->dma_mask;
>  }
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 2d81e20..2a244bf 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -335,7 +335,7 @@ static inline bool hpte_read_permission(unsigned long pp, 
> unsigned long key)
>  {
>   if (key)
>   return PP_RWRX <= pp && pp <= PP_RXRX;
> - return 1;
> + return true;
>  }
>  
>  static inline bool hpte_write_permission(unsigned long pp, unsigned long key)
> @@ -373,7 +373,7 @@ static inline bool slot_is_aligned(struct kvm_memory_slot 
> *memslot,
>   unsigned long mask = (pagesize >> PAGE_SHIFT) - 1;
>  
>   if (pagesize <= PAGE_SIZE)
> - return 1;
> + return true;
>   return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
>  }
>  
> diff --git a/arch/powerpc/sysdev/dcr.c b/arch/powerpc/sysdev/dcr.c
> index 2d8a101..121e26f 100644
> --- a/arch/powerpc/sysdev/dcr.c
> +++ b/arch/powerpc/sysdev/dcr.c
> @@ -54,7 +54,7 @@ bool dcr_map_ok_generic(dcr_host_t host)
>   else if (host.type == DCR_HOST_MMIO)
>   return dcr_map_ok_mmio(host.host.mmio);
>   else
> - return 0;
> + return false;
>  }
>  EXPORT_SYMBOL_GPL(dcr_map_ok_generic);
>  


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Benjamin Herrenschmidt

On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote:
> Having bravely said that..
> 
> the IB team informs me that they see a 10% degradation using 
> the spin_lock as opposed to the trylock.
> 
> one path going forward is to continue processing this patch-set 
> as is. I can investigate this further, and later revise the spin_lock
> to the trylock, after we are certain that it is good/necessary.

Have they tried using more pools instead ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: Reword the "returning from prom_init" message

2015-04-01 Thread Benjamin Herrenschmidt

On Wed, 2015-04-01 at 14:56 +0800, Jeremy Kerr wrote:
> Hi Michael,
> 
> > Let's try something different.
> > 
> > This prints:
> > 
> >   Quiescing Open Firmware ...
> >   Booting Linux via __start() ...
> 
> "returning from prom_init" has been an emblem of the powerpc kernel boot
> since old times. I feel like we should give the old message a Viking
> funeral and pick a date for a minute of silence.

Let's drink to that :-)

> (Wistfully-)Acked-by: Jeremy Kerr 
> 
> Cheers,
> 
> 
> Jeremy
> 
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Benjamin Herrenschmidt

On Thu, 2015-04-02 at 12:21 -0400, David Miller wrote:
> From: Sowmini Varadhan 
> Date: Thu, 2 Apr 2015 08:51:52 -0400
> 
> > do I need to resubmit this without the RFC tag? Perhaps I should
> > have dropped that some time ago.
> 
> I want to hear from the powerpc folks whether they can positively
> adopt the new generic code or not.

Oh yes, I'm fine with it, it's basically our code slightly refactored,
but I still have a few minor review comments on Swomini's patch I
started to put together yesterday but didn't finish. I'll send them
today.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Benjamin Herrenschmidt

On Fri, 2015-04-03 at 07:22 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-04-02 at 12:21 -0400, David Miller wrote:
> > From: Sowmini Varadhan 
> > Date: Thu, 2 Apr 2015 08:51:52 -0400
> > 
> > > do I need to resubmit this without the RFC tag? Perhaps I should
> > > have dropped that some time ago.
> > 
> > I want to hear from the powerpc folks whether they can positively
> > adopt the new generic code or not.
> 
> Oh yes, I'm fine with it, it's basically our code slightly refactored,
> but I still have a few minor review comments on Swomini's patch I
> started to put together yesterday but didn't finish. I'll send them
> today.

Actually I found some more serious issues that for some reason I didn't
spot the other day. Will send comments in a jiffy.

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Tue, 2015-03-31 at 10:40 -0400, Sowmini Varadhan wrote:

> + if (largealloc) {
> + pool = &(iommu->large_pool);
> + spin_lock_irqsave(&pool->lock, flags);
> + pool_nr = 0; /* to keep compiler happy */
> + } else {
> + /* pick out pool_nr */
> + pool_nr =  pool_hash & (npools - 1);
> + pool = &(iommu->pools[pool_nr]);
> + spin_lock_irqsave(&(pool->lock), flags);
> + }

Move spin_lock_irqsave outside of if/else

> + again:
> + if (pass == 0 && handle && *handle &&
> + (*handle >= pool->start) && (*handle < pool->end))
> + start = *handle;
> + else
> + start = pool->hint;

Now this means "handle" might be < pool->hint, in that case you also
need a lazy flush. Or rather only if the resulting alloc is. My
suggestion originally was to test that after the area alloc. See further
down.

> + limit = pool->end;
> +
> + /* The case below can happen if we have a small segment appended
> +  * to a large, or when the previous alloc was at the very end of
> +  * the available space. If so, go back to the beginning and flush.
> +  */
> + if (start >= limit) {
> + start = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + iommu->lazy_flush(iommu);

Add need_flush = false;

In fact, looking more closely, I see a another problems:

You never flush the large pool. You should either remove all
your large_pool checks since you treat it as a normal pool you
your code, and let it flush normally, or you should have an
unconditional flush somewhere in there for the large pool (or in
free()), either way. The current implementation you show me, afaik,
never calls lazy_flush on a large allocation.

Also, if you fix the problem I mentioned earlier about *handle < hint,
you also don't need the above flush, it will be handled further down
in the else case for the if (n == -1), see below

> + }

I only just noticed too, you completely dropped the code to honor
the dma mask. Why that ? Some devices rely on this.

> + if (dev)
> + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> +   1 << iommu->table_shift);
> + else
> + boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
> +
> + shift = iommu->table_map_base >> iommu->table_shift;
> + boundary_size = boundary_size >> iommu->table_shift;
> + /*
> +  * if the skip_span_boundary_check had been set during init, we set
> +  * things up so that iommu_is_span_boundary() merely checks if the
> +  * (index + npages) < num_tsb_entries
> +  */
> + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
> + shift = 0;
> + boundary_size = iommu->poolsize * iommu->nr_pools;
> + }
> + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
> +  boundary_size, 0);

You have completely dropped the alignment support. This will break
drivers. There are cases (especially with consistent allocations) where
the driver have alignment constraints on the address, those must be
preserved.

> + if (n == -1) {
> + if (likely(pass == 0)) {
> + /* First failure, rescan from the beginning.  */
> + pool->hint = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + need_flush = true;

Same question about the large pool check... I wouldn't bother with the
function pointer NULL check here either, only test it at the call site.

> + pass++;
> + goto again;
> + } else if (!largealloc && pass <= iommu->nr_pools) {
> + 

Here, you might have moved the hint of the pool (case above), and then
hit this code path, no ? In that case, need_flush might be set, which
means you must flush before you unlock.

>   spin_unlock(&(pool->lock));
> +
>   pool_nr = (pool_nr + 1) & (iommu->nr_pools - 1);
> + pool = &(iommu->pools[pool_nr]);
> + spin_lock(&(pool->lock));
> + pool->hint = pool->start;
> + if (!large_pool && iommu->lazy_flush != NULL)
> + need_flush = true;

I wouldn't bother with the test. This can't be large_alloc afaik and
the NULL check can be done at the call site.

> + pass++;
> + goto again;
> + } else {
> + /* give up */
> + n = DMA_ERROR_CODE;
> + goto bail;
> + }
> + }

Here, make this something like:

} else if (end < pool->hint)
need_flush = true;

To handle the case where you started your alloc below the hint without
having updated the hint yet, ie, a *handle < hint or a

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Thu, 2015-04-02 at 17:43 -0400, Sowmini Varadhan wrote:
> On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
> > > + limit = pool->end;
> > > +
> > > + /* The case below can happen if we have a small segment appended
> > > +  * to a large, or when the previous alloc was at the very end of
> > > +  * the available space. If so, go back to the beginning and flush.
> > > +  */
> > > + if (start >= limit) {
> > > + start = pool->start;
> > > + if (!large_pool && iommu->lazy_flush != NULL)
> > > + iommu->lazy_flush(iommu);
> > 
> > Add need_flush = false;
> 
> A few clarifications, while I parse the rest of your comments:
> 
> Not sure I follow- need_flush is initialized to true at the start of the 
> function?

No but you can loop back there via "goto again". However if you follow
my other comment and move the flush back to the end, then you don't need
that at all.

> > I only just noticed too, you completely dropped the code to honor
> > the dma mask. Why that ? Some devices rely on this.
> 
> so that's an interesting question: the existing iommu_range_alloc() in
> arch/sparc/kernel/iommu.c does not use the mask at all. I based most of
> the code  on this (except for the lock fragmentation part). 
> I dont know if this is arch specific.

Probably, not that many devices have limits on DMA mask but they do
exist. It becomes more important if we decide to create a very large
IOMMU window that spans beyond 4G in order to support devices with
32-bit DMA masks. Otherwise it's older devices mostly with <32-bit
masks.

In any case, for a generic piece of code, this should be supported.
Basically, assume that if we have something in the powerpc code, we need
it, if you remove it, we won't be able to use your code generically.

There are a few cases we can debate like our block allocation, but
things like the mask or the alignment constraints aren't in that group.

> > 
> > > + if (dev)
> > > + boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > > +   1 << iommu->table_shift);
> > > + else
> > > + boundary_size = ALIGN(1UL << 32, 1 << iommu->table_shift);
> > > +
> > > + shift = iommu->table_map_base >> iommu->table_shift;
> > > + boundary_size = boundary_size >> iommu->table_shift;
> > > + /*
> > > +  * if the skip_span_boundary_check had been set during init, we set
> > > +  * things up so that iommu_is_span_boundary() merely checks if the
> > > +  * (index + npages) < num_tsb_entries
> > > +  */
> > > + if ((iommu->flags & IOMMU_NO_SPAN_BOUND) != 0) {
> > > + shift = 0;
> > > + boundary_size = iommu->poolsize * iommu->nr_pools;
> > > + }
> > > + n = iommu_area_alloc(iommu->map, limit, start, npages, shift,
> > > +  boundary_size, 0);
> > 
> > You have completely dropped the alignment support. This will break
> > drivers. There are cases (especially with consistent allocations) where
> 
> Again, not sure I follow? are you referring to the IOMMU_NO_SPAN_BOUND case?

No, the last argument to iommu_area_alloc() which is passed from the
callers when doing consistent allocs. Basically, the DMA api mandates
that consistent allocs are naturally aligned (to their own size), we
implement that on powerpc by passing that alignment argument down.

> That's very specific to LDC (sparc ldoms virtualization infra). The default
> is to not have IOMMU_NO_SPAN_BOUND set. 
> For the rest of the drivers, the code that sets up boundary_size aligns things
> in the same way as the ppc code.
> 
> 
> > the driver have alignment constraints on the address, those must be
> > preserved.
> > 
> 
> --Sowmini


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:

> No, the last argument to iommu_area_alloc() which is passed from the
> callers when doing consistent allocs. Basically, the DMA api mandates
> that consistent allocs are naturally aligned (to their own size), we
> implement that on powerpc by passing that alignment argument down.

Talking of this ... I notice this is not documented in DMA-API.txt...
however many drivers make that assumption, and the DMA pool allocator
in mm/dmapool.c as well.

Maybe somebody should update DMA-API.txt...

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Fri, 2015-04-03 at 09:01 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2015-04-03 at 08:57 +1100, Benjamin Herrenschmidt wrote:
> 
> > No, the last argument to iommu_area_alloc() which is passed from the
> > callers when doing consistent allocs. Basically, the DMA api mandates
> > that consistent allocs are naturally aligned (to their own size), we
> > implement that on powerpc by passing that alignment argument down.
> 
> Talking of this ... I notice this is not documented in DMA-API.txt...
> however many drivers make that assumption, and the DMA pool allocator
> in mm/dmapool.c as well.
> 
> Maybe somebody should update DMA-API.txt...

Ah it *is* documented in DMA-API-HOWTO.txt :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Thu, 2015-04-02 at 17:54 -0400, Sowmini Varadhan wrote:
> the other question that comes to my mind is: the whole lazy_flush
> optimization probably works best when there is exactly one pool,
> and no large pools. In most other cases, we'd end up doing a lazy_flush
> when we wrap within our pool itself, losing the benefit of that 
> optimization. 

Not a big deal. The point of the optimization is to avoid flushing on
every map/unmap. As long as the pool wrapping is an order of magnitude
rarer than actual map/unmap operations, it's a win.

> Given that the lazy_flush is mostly there to avoid regressions for 
> the older sun4u architectures (which have other hardware bottlenecks 
> anyway), and this code is rapidly getting messy, does it make sense
> to constrain the lazy_flush check to only apply for the 1-pool, 
> no-large-pool case?

I was planning to use it to improve things on older G5s as well and I
definitely want pools on these. Leave it in, we just need to get it
right.

I think it's not that messy, if you refactor the way I suggested, it
basically boils down to conditionally flushing in two places, at
the point where the pool lock is dropped, if the hint for that pool
was moved backward.

Do the test once in the else case of the n == -1 for all the "normal"
cases (which includes the hint > limit, multi-pass wrap and *handle
below hint, they are all covered) and in the pool hop'ing case.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Benjamin Herrenschmidt

On Thu, 2015-04-02 at 18:15 -0400, Sowmini Varadhan wrote:
> On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
> > 
> > > > I only just noticed too, you completely dropped the code to honor
> > > > the dma mask. Why that ? Some devices rely on this.
> 
> /* Sowmini's comment about this coming from sparc origins.. */
> 
> > Probably, not that many devices have limits on DMA mask but they do
> > exist. It becomes more important if we decide to create a very large
> > IOMMU window that spans beyond 4G in order to support devices with
> > 32-bit DMA masks. Otherwise it's older devices mostly with <32-bit
> > masks.
> > 
> > In any case, for a generic piece of code, this should be supported.
> > Basically, assume that if we have something in the powerpc code, we need
> > it, if you remove it, we won't be able to use your code generically.
> 
> I see.
> 
> is the mask something that can be stored in the iommu_map_table as
> part of the init? 
> 
No, the mask is per device and has to be retrieved from the device.
Additionally the mask used for dma_map_* can be different from the
consistent mask used for alloc_coherent (we have a bug there on powerpc
which I'm trying to fix btw).

So it should be passed as an argument by the caller.

> I can see that the align_order has to be an additional arg to 
> iommu_tbl_range_alloc, not sure if mask falls in that category
> as well.

It does.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Benjamin Herrenschmidt

On Fri, 2015-04-03 at 14:28 -0400, Sowmini Varadhan wrote:
> 
> Just want to confirm:
> 
> > > + again:
> > > + if (pass == 0 && handle && *handle &&
> > > + (*handle >= pool->start) && (*handle < pool->end))
> > > + start = *handle;
> > > + else
> > > + start = pool->hint;
> > 
> > Now this means "handle" might be < pool->hint, in that case you also
> > need a lazy flush. Or rather only if the resulting alloc is. My
>  :
> 
> > > + } else {
> > > + /* give up */
> > > + n = DMA_ERROR_CODE;
> > > + goto bail;
> > > + }
> > > + }
> > 
> > Here, make this something like:
> > 
> > } else if (end < pool->hint)
> > need_flush = true;
> 
> you mean 
> 
>   } else if (start < pool->hint)
> 
> right?  (so I'm not missing some corner-case that you are thinking
> about here)

No, I meant "n < pool->hint", ie, the start of the newly allocated
block.

"end" hasn't been adjusted yet at that point but we don't want to
compare "end" anyway (which will be n + npages), we really want n, ie if
the *beginning* of the newly allocated chunk is before the end of the
previous one (after all they may even overlap if the previous one has
been freed).

Ben.

> --Sowmini
>  


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-04 Thread Benjamin Herrenschmidt

On Sat, 2015-04-04 at 07:27 -0400, Sowmini Varadhan wrote:
> One last question before I spin out v9.. the dma_mask code
> is a bit confusing to me, so I want to make sure... the code is 
> 
> > if (limit + tbl->it_offset > mask) {
> >   limit = mask - tbl->it_offset + 1;
> >   /* If we're constrained on address range, first try
> >* at the masked hint to avoid O(n) search complexity,
> >* but on second pass, start at 0 in pool 0.
> >*/
> >   if ((start & mask) >= limit || pass > 0) {
> >   spin_unlock(&(pool->lock));
> >   pool = &(tbl->pools[0]);
> >   spin_lock(&(pool->lock));
> >   start = pool->start;
> 
> So here I would need to mark need_flush to true, right?

Well, no but ...

So you aren't changing any pool hint, so you don't have to mark any pool
for flushing. The "n < hint" check will do the flush later on if needed
(the one I told you to add).

You only need to force a flush if you modify the hint.

*However*, you do an unlock, which means that if you have modified any
*other* pool's hint (ie, need_flush is true), you need to actually
perform a flush.

I think that business with flushing on unlock is becoming too ugly, I
would suggest you simplify things by moving need_flush to be part of the
pool structure (maybe a flag).

IE, when you modify a pool's hint, your mark it for flushing. 

When you allocate from a pool, you flush it at the end, just before you
return success, if either it was marked for flushing or n < hint (and
clear the mark of course).

That should cleanly factor all cases in a simpler & more readable code
path. The flush is done only when needed (an actual allocation happen on
a pool), you just "mark them dirty" when you change their hints, you
don't have to worry about the lock dropped case & pool hop'ing anymore.

Basically the logic looks like that:

.../...

n = iommu_area_alloc(...)
if (n == -1) {
if (pass == 0) {
change pool hint
pool->need_flush = true;
pass++;
goto again;
} else if (!largealloc &&  ) {
unlock/pool hop/lock
// no need to flush anything here
pool->hint = pool->start;
pool->need_flush = true;
etc...
} else {
n = DMA_ERROR_CODE;
goto fail;
}
}
end = n + pages;

if (n < pool->hint || pool->need_flush) {
pool->need_flush = false;
iommu->lazy_flush(...);
}
pool->hint = end;

.../...

Now this can be further simplified imho. For example that logic:

+   if (pass == 0 && handle && *handle &&
+   (*handle >= pool->start) && (*handle < pool->end))
+   start = *handle;
+   else
+   start = pool->hint;

Can move above the again: label. that way, you remove the pass test
and you no longer have to adjust pool->hint in the "if (likely(pass ==
0))" failure case. Just adjust "start" and try again, which means you
also no longer need to set need_flush.

Additionally, when pool hop'ing, now you only need to adjust "start" as
well. By not resetting it, you also remove the need for marking it "need
flush". I would add to that, why not set "start" to the new pool's hint
instead of the new pool's start ? That will save some flushes, no ?

Anton, do you remember why you reset the hint when changing pool ? I
don't see that gaining us anything and it does make lazy flushing more
complex.

At that point, you have removed all setters of need_flush, that logic
can be entirely removed, or am I just too tired (it's past midnight) and
missing something entirely here ?

We only need the if (n < pool->hint) check to do the lazy flush at the
end, that's it.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-05 Thread Benjamin Herrenschmidt

On Sun, 2015-04-05 at 07:49 -0400, Sowmini Varadhan wrote:
> +   if (likely(pass == 0)) {
> +   /* First failure, rescan from the beginning.
> */
> +   pool->hint = pool->start;
> +   set_flush(iommu);
> +   pass++;
> +   goto again;
> +   } else if (!largealloc && pass <= iommu->nr_pools) {
> +   spin_unlock(&(pool->lock));
> +   pool_nr = (pool_nr + 1) & (iommu->nr_pools -
> 1);
> +   pool = &(iommu->pools[pool_nr]);
> +   spin_lock(&(pool->lock));
> +   pool->hint = pool->start;
> +   set_flush(iommu);
> +   pass++;
> +   goto again;

So you decided to keep the logic here that updates the hint instead of
just getting rid of need_flush alltogether ?

Out of curiosity, what's the rationale ? Did you find a reason why
resetting the hint in those two cases (rather than just setting "start"
appropriately) is actually useful ?

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [Skiboot] [PATCH 4/4] hwmon: (ibmpowernv) pretty print labels

2015-04-07 Thread Benjamin Herrenschmidt

On Tue, 2015-04-07 at 20:03 +0200, Cedric Le Goater wrote:
> 
> This is a shortcut. The code is for the ibmpowernv platform and assumes 
> that we are running on a P8 (8 hardware threads). It would be better to 
> use a "maximum threads per core" variable but I am not sure this is 
> available, as it is a tunable. I will look into it.

Yes, please don't hard code this...

Cheers,
Ben.



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in of parsing

2015-04-07 Thread Benjamin Herrenschmidt

On Tue, 2015-04-07 at 17:24 -0700, Yinghai Lu wrote:
> For device resource PREF bit setting under bridge 64-bit pref resource,
> we need to make sure only set PREF for 64bit resource, so set IORESOUCE_MEM_64
> for 64bit resource during of device resource flags parsing.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241

These patches (from the above BZ) aren't right for one thing: You
shouldn't set the IORESOURCE_PREFETCH in the resource itself. This
*will* break stuff.

You can put the resource in a prefetchable region indeed, if you
are on a PCIe-only path (though we should have some arch flag
to check that the host bridge indeed doesn't do any prefetch,
on powerpc we are good there).

But you can't set the "prefetch" bit on the resource because that would
be losing the indication that this resource has side effects. This bit
is used in some cases to enable store gathering when mmap'ing via sysfs
and can be used for similar uses by drivers.

It's one thing to say "you can put non-prefetch BARs in prefetch windows
on that path", it's another one to completely make the BAR looks like
it's prefetchable when it's not.

Cheers,
Ben.

> Signed-off-by: Yinghai Lu 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Yijing Wang 
> Cc: Anton Blanchard 
> Cc: linuxppc-dev@lists.ozlabs.org
> 
> ---
>  arch/powerpc/kernel/pci_of_scan.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/arch/powerpc/kernel/pci_of_scan.c
> ===
> --- linux-2.6.orig/arch/powerpc/kernel/pci_of_scan.c
> +++ linux-2.6/arch/powerpc/kernel/pci_of_scan.c
> @@ -44,8 +44,10 @@ static unsigned int pci_parse_of_flags(u
>  
>   if (addr0 & 0x0200) {
>   flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
> - flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
>   flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
> + if (addr0 & 0x0100)
> + flags |= IORESOURCE_MEM_64
> +  | PCI_BASE_ADDRESS_MEM_TYPE_64;
>   if (addr0 & 0x4000)
>   flags |= IORESOURCE_PREFETCH
>| PCI_BASE_ADDRESS_MEM_PREFETCH;

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in of parsing

2015-04-08 Thread Benjamin Herrenschmidt

On Tue, 2015-04-07 at 22:39 -0700, Yinghai Lu wrote:
> On Tue, Apr 7, 2015 at 8:49 PM, Benjamin Herrenschmidt
>  wrote:
> > On Tue, 2015-04-07 at 17:24 -0700, Yinghai Lu wrote:
> >> For device resource PREF bit setting under bridge 64-bit pref resource,
> >> we need to make sure only set PREF for 64bit resource, so set 
> >> IORESOUCE_MEM_64
> >> for 64bit resource during of device resource flags parsing.
> >
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=96261
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=96241
> >
> > These patches (from the above BZ) aren't right for one thing: You
> > shouldn't set the IORESOURCE_PREFETCH in the resource itself. This
> > *will* break stuff.
> >
> > You can put the resource in a prefetchable region indeed, if you
> > are on a PCIe-only path (though we should have some arch flag
> > to check that the host bridge indeed doesn't do any prefetch,
> > on powerpc we are good there).
> 
> The patch is at:
> 
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus&id=1a3ec5e7b00dcd9cac24efe3d65bfccf82597ce5
> 
> and we limit to set pref bit for pcie end devices mmio 64bit resource.

This is still completely wrong. Please revert.

> >
> > But you can't set the "prefetch" bit on the resource because that would
> > be losing the indication that this resource has side effects. This bit
> > is used in some cases to enable store gathering when mmap'ing via sysfs
> > and can be used for similar uses by drivers.
> 
> Any pointer for that?
>
> >
> > It's one thing to say "you can put non-prefetch BARs in prefetch windows
> > on that path", it's another one to completely make the BAR looks like
> > it's prefetchable when it's not.
> 
> Too hard for me to tell the difference.

Fix it. Add a new bit if necessary "CAN_BE_IN_PREFETCH" or whatever but
you just cannot randomly fuck around with the flag like that.

You are basically dropping the information that the BAR has side effects
completely. This is WRONG.

The fact that you can put a non-prefetchable BAR in a prefetchable
window doesn't make the BAR itself prefetchable. Your patch makes it so.

That means that anything that rely on the BAR flag to establish
prefetchable or write-combining mappings will be fucked. Drivers might
test that, arch code will, powerpc sysfs mmap will (see our
implementation of __pci_mmap_set_pgprot), and quite possibly others
architectures or drivers doing the same thing.

This is utterly WRONG.

Use a different flag to indicate the BAR policy or temporarily tweak a
local copy of the resource flag during BAR assignment or whatever you
prefer to fix the original problem but do *not* change the Linux overall
view of that BAR to lie about prefetchability.

Bjorn, please revert this ASAP.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-08 Thread Benjamin Herrenschmidt

On Sun, 2015-04-05 at 07:49 -0400, Sowmini Varadhan wrote:
> Investigation of multithreaded iperf experiments on an ethernet
> interface show the iommu->lock as the hottest lock identified by
> lockstat, with something of the order of  21M contentions out of
> 27M acquisitions, and an average wait time of 26 us for the lock.
> This is not efficient. A more scalable design is to follow the ppc
> model, where the iommu_map_table has multiple pools, each stretching
> over a segment of the map, and with a separate lock for each pool.
> This model allows for better parallelization of the iommu map search.
> 
> This patch adds the iommu range alloc/free function infrastructure.

Sorry for the delay, I'm swamped with "stuff" at the moment, I wanted to
try actually porting powerpc over and testing but didn't get a chance.

I'm happy with your last version, feel free to add my

Acked-by: Benjamin Herrenschmidt 

I'll port powerpc over when I get a chance and if I find a need to tweak
something at that point, I'll send a separate patch on top of yours.

Cheers,
Ben.

> Signed-off-by: Sowmini Varadhan 
> ---
> v2 changes:
>   - incorporate David Miller editorial comments: sparc specific
> fields moved from iommu-common into sparc's iommu_64.h
>   - make the npools value an input parameter, for the case when
> the iommu map size is not very large
>   - cookie_to_index mapping, and optimizations for span-boundary
> check, for use case such as LDC.
> v3: eliminate iommu_sparc, rearrange the ->demap indirection to
> be invoked under the pool lock.
> 
> v4: David Miller review changes:
>   - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
>   - page_table_map_base and page_table_shift are unsigned long, not u32.
> 
> v5: Feedback from b...@kernel.crashing.org and a...@ozlabs.ru
>   - removed ->cookie_to_index and ->demap indirection: caller should
> invoke these as needed before calling into the generic allocator
> 
> v6: Benh/DaveM discussion eliminationg iommu_tbl_ops, but retaining flush_all
> optimization.
> 
> v7: one-time initialization of pool_hash from iommu_tbl_pool_init()
> 
> v8: Benh code review comments.
> 
> v9: More Benh code review comments, added dma_mask, align_order logic to
> iommu_tbl_range_alloc.
> 
>  include/linux/iommu-common.h |   51 
>  lib/Makefile |2 +-
>  lib/iommu-common.c   |  266 
> ++
>  3 files changed, 318 insertions(+), 1 deletions(-)
>  create mode 100644 include/linux/iommu-common.h
>  create mode 100644 lib/iommu-common.c
> 
> diff --git a/include/linux/iommu-common.h b/include/linux/iommu-common.h
> new file mode 100644
> index 000..bbced83
> --- /dev/null
> +++ b/include/linux/iommu-common.h
> @@ -0,0 +1,51 @@
> +#ifndef _LINUX_IOMMU_COMMON_H
> +#define _LINUX_IOMMU_COMMON_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define IOMMU_POOL_HASHBITS 4
> +#define IOMMU_NR_POOLS  (1 << IOMMU_POOL_HASHBITS)
> +
> +struct iommu_pool {
> + unsigned long   start;
> + unsigned long   end;
> + unsigned long   hint;
> + spinlock_t  lock;
> +};
> +
> +struct iommu_map_table {
> + unsigned long   table_map_base;
> + unsigned long   table_shift;
> + unsigned long   nr_pools;
> + void(*lazy_flush)(struct iommu_map_table *);
> + unsigned long   poolsize;
> + struct iommu_pool   pools[IOMMU_NR_POOLS];
> + u32 flags;
> +#define  IOMMU_HAS_LARGE_POOL0x0001
> +#define  IOMMU_NO_SPAN_BOUND 0x0002
> +#define  IOMMU_NEED_FLUSH0x0004
> + struct iommu_pool   large_pool;
> + unsigned long   *map;
> +};
> +
> +extern void iommu_tbl_pool_init(struct iommu_map_table *iommu,
> + unsigned long num_entries,
> + u32 table_shift,
> + void (*lazy_flush)(struct iommu_map_table *),
> + bool large_pool, u32 npools,
> + bool skip_span_boundary_check);
> +
> +extern unsigned long iommu_tbl_range_alloc(struct device *dev,
> +struct iommu_map_table *iommu,
> +unsigned long npages,
> +unsigned long *handle,
> +unsigned long mask,
> +unsigned int align_order);
> +
> +extern void iommu_tbl_range_free(struct iommu_map_table *iommu,
> +

Re: [PATCH 2/2] pci: Use Qemu created PCI device nodes

2015-04-25 Thread Benjamin Herrenschmidt

On Sat, 2015-04-25 at 17:31 +1000, Alexey Kardashevskiy wrote:
> We need BAR setup in 2 cases: when SLOF needs to boot from a PCI device 
> (and SLOF can do BAR setup) and when we do PCI hotplug - and BARs are set 
> by the guest, otherwise we hit races here (Michael Roth can tell more). So 
> as for today there is no reason for doing this in QEMU.

Doing BAR setup in the guest is a violation of PAPR though  we do it now
to workaround bugs I believe but normally the BARs are expected to be owned
and setup by FW on a PAPR system.

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-26 Thread Benjamin Herrenschmidt

On Thu, 2015-04-23 at 16:13 +0200, Jacek Anaszewski wrote:

> How the firmware is related to kernel? These bindings are for kernel,
> not for the firmware.

There should be no relation. DT bindings should be kernel agnostic and
represent the HW layout, not be designed based on what a given kernel
driver wants, but I'm digressing...

> DT bindings are compiled to *.dtb file which is concatenated with
> zImage. 

No. First, a "binding" isn't compiled to a dtb, a binding is a piece
documentation. A flat device tree *might* be compiled from a dts to a
dtb but that isn't necessarily the case.

For example, any machine with an actual implementation of Open Firmware
will essentially flatten OF internal tree into a dtb at boot time, and
that tree is itself generated by forth code.

In our case, the device tree is procedurally generated by two layers of
firmware, there is no dts "compilation" happening. HostBoot generates a
shell and OPAL extends it before flattening it and passing it to the
kernel.

The "concatenated with zImage" point you make is also a very ARM centric
one. ARM provides the *optional* ability to concatenate a dtb with a
zImage, but that's specific to ARM zImage wrapper. For example, powerpc
can do something similar (but not identical) using the "wrapper" script
we have in arch/powerpc/boot where we embed the dtb. However, this too
is optional, we have a longer history of having firwmares generating
device-trees. 

Note: We invented the whole FDT business :-)

> During system boot device drivers are matched with DT bindings
> through 'compatible' property. A driver should have single matching DT
> node, i.e. no other driver can probe with the same DT node.
> This implies that the node should contain only the properties required
> for configuring the related device.

I don't see how you goes from A to implying B here. Yes, a device
generally will have a single representing node but that doesn't mean
that the node only contains what the driver needs. The DT node can
contain all sort of auxilliary informations that the driver may or may
not be interested in that was deemed potentially relevant or useful by
the platform designer.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-26 Thread Benjamin Herrenschmidt

On Fri, 2015-04-24 at 12:15 +0200, Jacek Anaszewski wrote:
> No matter what format of device tree OPAL produces, I assume that
> it must compile it from some sources.

Nope. Well... .C code maybe qualifies as "sources" :-)

> dtb file is a compiled form of human readable dts file containing
> Flattened Device Tree - a data structure for describing the hardware
> in the system.
> 
> Please refer to: http://elinux.org/Device_Tree

We are quite familiar with this, guess what architecture created the
whole flat device-tree stuff in the first place ?

In any case, in our case, the fdt is procedurally generated by firmware.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-27 Thread Benjamin Herrenschmidt

On Mon, 2015-04-27 at 09:24 +0200, Jacek Anaszewski wrote:
> I was not aware that some other entity than the driver could be
> interested in the information provided by DT node. I will no longer
> object, provided that we will get an ack from DT maintainer.

My understanding is that we no longer need bindings to be accepted by DT
maintainers.

That being said, I would like to review this one before it is accepted,
as I haven't had a chance to do so yet :-) (I was mostly responding to
you, I haven't caught up with the full thread yet).

Please wait for an ack from myself or Michael Ellerman (the two powerpc
maintainers). Either that or provide your ack and we'll merge it via our
tree if we are happy with it.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/4] pci: Support 64-bit address translation

2015-04-27 Thread Benjamin Herrenschmidt

On Mon, 2015-04-27 at 11:32 +0200, Thomas Huth wrote:
> Anyway, I think we can fix that up as soon as somebody tries to run
> the
> js2x code again (I still have a YDL PowerStation at home somewhere,
> maybe I will have enough spare time to reactivate it one day).
> 
> And since your changes itself looks fine to me:

Another thing we could do with the new IPMI stack floating around would
be to do a js2x model in qemu :-) Shouldn't be too hard  I've
promised Alex Graf I would do a proper U4 model one day ... :-)

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/powernv: Add opal-prd channel

2015-04-29 Thread Benjamin Herrenschmidt

On Wed, 2015-04-01 at 14:07 +0800, Jeremy Kerr wrote:
> This change adds a char device to access the "PRD" (processor runtime
> diagnostics) channel to OPAL firmware.
> 
> Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta &
> Vishal Kulkarni.
> 
> Signed-off-by: Neelesh Gupta 
> Signed-off-by: Jeremy Kerr 
> 
> ---
>  arch/powerpc/include/asm/opal-api.h|   40 +
>  arch/powerpc/include/asm/opal.h|1 
>  arch/powerpc/include/uapi/asm/opal-prd.h   |   57 ++
>  arch/powerpc/platforms/powernv/Makefile|2 
>  arch/powerpc/platforms/powernv/opal-prd.c  |  440 +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |1 
>  6 files changed, 539 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 0321a90..b787b95 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -153,7 +153,8 @@
>  #define OPAL_FLASH_READ  110
>  #define OPAL_FLASH_WRITE 111
>  #define OPAL_FLASH_ERASE 112
> -#define OPAL_LAST112
> +#define OPAL_PRD_MSG 113
> +#define OPAL_LAST113
>  
>  /* Device tree flags */
>  
> @@ -352,6 +353,7 @@ enum opal_msg_type {
>   OPAL_MSG_SHUTDOWN,  /* params[0] = 1 reboot, 0 shutdown */
>   OPAL_MSG_HMI_EVT,
>   OPAL_MSG_DPO,
> + OPAL_MSG_PRD,
>   OPAL_MSG_TYPE_MAX,
>  };
>  
> @@ -674,6 +676,42 @@ typedef struct oppanel_line {
>   __be64 line_len;
>  } oppanel_line_t;
>  
> +enum opal_prd_msg_type {
> + OPAL_PRD_MSG_TYPE_INIT = 0, /* HBRT --> OPAL */
> + OPAL_PRD_MSG_TYPE_FINI, /* HBRT --> OPAL */
> + OPAL_PRD_MSG_TYPE_ATTN, /* HBRT <-- OPAL */
> + OPAL_PRD_MSG_TYPE_ATTN_ACK, /* HBRT --> OPAL */
> + OPAL_PRD_MSG_TYPE_OCC_ERROR,/* HBRT <-- OPAL */
> + OPAL_PRD_MSG_TYPE_OCC_RESET,/* HBRT <-- OPAL */
> +};
> +
> +struct opal_prd_msg {
> + uint8_t type;
> + uint8_t pad[3];
> + __be32  token;
> + union {
> + struct {
> + __be64  version;
> + __be64  ipoll;
> + } init;
> + struct {
> + __be64  proc;
> + __be64  ipoll_status;
> + __be64  ipoll_mask;
> + } attn;
> + struct {
> + __be64  proc;
> + __be64  ipoll_ack;
> + } attn_ack;
> + struct {
> + __be64  chip;
> + } occ_error;
> + struct {
> + __be64  chip;
> + } occ_reset;
> + };
> +};
> +
>  /*
>   * SG entries
>   *
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 7c6d7ea..4375cb4 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -193,6 +193,7 @@ int64_t opal_ipmi_recv(uint64_t interface, struct 
> opal_ipmi_msg *msg,
>   uint64_t *msg_len);
>  int64_t opal_i2c_request(uint64_t async_token, uint32_t bus_id,
>struct opal_i2c_request *oreq);
> +int64_t opal_prd_msg(struct opal_prd_msg *msg);
>  
>  int64_t opal_flash_read(uint64_t id, uint64_t offset, uint64_t buf,
>   uint64_t size, uint64_t token);
> diff --git a/arch/powerpc/include/uapi/asm/opal-prd.h 
> b/arch/powerpc/include/uapi/asm/opal-prd.h
> new file mode 100644
> index 000..938af8e
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/opal-prd.h
> @@ -0,0 +1,57 @@
> +/*
> + * OPAL Runtime Diagnostics interface driver
> + * Supported on POWERNV platform
> + *
> + * (C) Copyright IBM 2015
> + *
> + * Author: Vaidyanathan Srinivasan 
> + * Author: Jeremy Kerr 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_OPAL_PRD_H_
> +#define _UAPI_ASM_POWERPC_OPAL_PRD_H_
> +
> +#include 
> +
> +#define OPAL_PRD_VERSION 1
> +#define OPAL_PRD_RANGE_NAME_LEN  32
> +#define OPAL_PRD_MAX_RANGES  8
> +
> +#define OPAL_PRD_GET_INFO_IOR('o', 0x01, struct opal_prd_info)
> +#define OPAL_PRD_SCOM_READ   _IOR('o', 0x10, struct opal_prd_scom)
> +#define OPAL_PRD_SCOM_WRITE  _IOW('o', 0x11, struct opal_prd_scom)
> +
> +#ifndef __ASSEMBLY__
> +
> +struct opal_prd_ran

Re: [PATCH kernel v9 31/32] vfio: powerpc/spapr: Support multiple groups in one container if possible

2015-04-30 Thread Benjamin Herrenschmidt

On Thu, 2015-04-30 at 19:33 +1000, Alexey Kardashevskiy wrote:
> On 04/30/2015 05:22 PM, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:55PM +1000, Alexey Kardashevskiy wrote:
> >> At the moment only one group per container is supported.
> >> POWER8 CPUs have more flexible design and allows naving 2 TCE tables per
> >> IOMMU group so we can relax this limitation and support multiple groups
> >> per container.
> >
> > It's not obvious why allowing multiple TCE tables per PE has any
> > pearing on allowing multiple groups per container.
> 
> 
> This patchset is a global TCE tables rework (patches 1..30, roughly) with 2 
> outcomes:
> 1. reusing the same IOMMU table for multiple groups - patch 31;
> 2. allowing dynamic create/remove of IOMMU tables - patch 32.
> 
> I can remove this one from the patchset and post it separately later but 
> since 1..30 aim to support both 1) and 2), I'd think I better keep them all 
> together (might explain some of changes I do in 1..30).

I think you are talking past each other :-)

But yes, having 2 tables per group is orthogonal to the ability of
having multiple groups per container.

The latter is made possible on P8 in large part because each PE has its
own DMA address space (unlike P5IOC2 or P7IOC where a single address
space is segmented).

Also, on P8 you can actually make the TVT entries point to the same
table in memory, thus removing the need to duplicate the actual
tables (though you still have to duplicate the invalidations). I would
however recommend only sharing the table that way within a chip/node.

 .../..

> >>
> >> -1) Only one IOMMU group per container is supported as an IOMMU group
> >> -represents the minimal entity which isolation can be guaranteed for and
> >> -groups are allocated statically, one per a Partitionable Endpoint (PE)
> >> +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
> >> +container is supported as an IOMMU table is allocated at the boot time,
> >> +one table per a IOMMU group which is a Partitionable Endpoint (PE)
> >>   (PE is often a PCI domain but not always).

> > I thought the more fundamental problem was that different PEs tended
> > to use disjoint bus address ranges, so even by duplicating put_tce
> > across PEs you couldn't have a common address space.

Yes. This is the problem with P7IOC and earlier. It *could* be doable on
P7IOC by making them the same PE but let's not go there.

> Sorry, I am not following you here.
> 
> By duplicating put_tce, I can have multiple IOMMU groups on the same 
> virtual PHB in QEMU, "[PATCH qemu v7 04/14] spapr_pci_vfio: Enable multiple 
> groups per container" does this, the address ranges will the same.

But that is only possible on P8 because only there do we have separate
address spaces between PEs.

> What I cannot do on p5ioc2 is programming the same table to multiple 
> physical PHBs (or I could but it is very different than IODA2 and pretty 
> ugly and might not always be possible because I would have to allocate 
> these pages from some common pool and face problems like fragmentation).

And P7IOC has a similar issue. The DMA address top bits indexes the
window on P7IOC within a shared address space. It's possible to
configure a TVT to cover multiple devices but with very serious
limitations.

> >> +Newer systems (POWER8 with IODA2) have improved hardware design which 
> >> allows
> >> +to remove this limitation and have multiple IOMMU groups per a VFIO 
> >> container.
> >>
> >>   2) The hardware supports so called DMA windows - the PCI address range
> >>   within which DMA transfer is allowed, any attempt to access address space
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> >> b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> index a7d6729..970e3a2 100644
> >> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -82,6 +82,11 @@ static void decrement_locked_vm(long npages)
> >>* into DMA'ble space using the IOMMU
> >>*/
> >>
> >> +struct tce_iommu_group {
> >> +  struct list_head next;
> >> +  struct iommu_group *grp;
> >> +};
> >> +
> >>   /*
> >>* The container descriptor supports only a single group per container.
> >>* Required by the API as the container is not supplied with the IOMMU 
> >> group
> >> @@ -89,10 +94,11 @@ static void decrement_locked_vm(long npages)
> >>*/
> >>   struct tce_container {
> >>struct mutex lock;
> >> -  struct iommu_group *grp;
> >>bool enabled;
> >>unsigned long locked_pages;
> >>bool v2;
> >> +  struct iommu_table tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> >
> > Hrm,  so here we have more copies of the full iommu_table structures,
> > which again muddies the lifetime.  The table_group pointer is
> > presumably meaningless in these copies, which seems dangerously
> > confusing.
> 
> 
> Ouch. This is bad. No, table_group is not pointless here as it is used to 
> get to the PE number to invalidate TCE cache. I just realized although I 
> ne

Re: [PATCH] powerpc/powernv: Add opal-prd channel

2015-05-01 Thread Benjamin Herrenschmidt

On Fri, 2015-05-01 at 11:46 +0800, Jeremy Kerr wrote:
> Hi Ben,
> 
> >> +static LIST_HEAD(opal_prd_msg_queue);
> >> +static DEFINE_SPINLOCK(opal_prd_msg_queue_lock);
> >> +static DECLARE_WAIT_QUEUE_HEAD(opal_prd_msg_wait);
> >> +static atomic_t usage;
> > 
> > opal_prd_usage ... otherwise  it's a mess in the symbols map
> 
> OK, I'll change this.
> 
> > Also why limit the number of opens ? we might want to have tools using
> > the opal prd for xscom :-) (in absence of debugfs). .. as long as not
> > two people read() it should be ok. Or a tool to dump the regions etc...
> > 
> > I don't see any reason to block multiple open's.
> 
> Simplicity, really. We can do a "get exclusive", but there's no
> (current) use-case for multiple openers on a PRD interface.

Sure but if we want to add one we have to change the kernel which is
nasty ... I always try to think a bit ahead when it comes to kernel
interfaces.

> Pulling this thread a little, you've hit on a key decision point of the
> prd design - I see there being two directions we could take with this:
> 
>  1) This interface is specifically for PRD functions, or
> 
>  2) This interface is a generic userspace interface to OPAL,
> and PRD is a subset of that.
>
> I've been aiming for (1) with the current code; and the nature of the
> generic read() & write() operations being PRD-specific enforces that.
> 
> Allowing multiple openers will help with (2), but if we want to go in
> that direction, I think we'd be better off doing a couple of other
> changes too:
> 
>  * move the general functions (eg xscom, range mappings, OCC control)
>to a separate interface that isn't tied to PRD - say just /dev/opal

Well, there's debugfs but then we don't want to *rely* on that as API

>  * using this prd code for only the prd-event handling, possibly
>renamed to /dev/opal-prd-events. This would still need some
>method of enforcing exclusive access.
> 
> In this case, the actual PRD application would use both devices,
> dequeueing events (and updating the ipoll mask) from the latter, and
> using the former for helper functionality.
> 
> Other tools (eg generic xscom access) would just use the generic
> interface, and not the PRD one, which wouldn't enforce exclusive access.

Or make it all /dev/opal with an ioctl to receive the PRD messages which
only one open fd can do. Keeps things simpler. Ie, rename /dev/prd
to /dev/opal and add _IOC_PRD :-)

> Regardless of the choice here, we could also remove the single-open
> exclusion, and shift that responsibility to userspace (eg, flock() on
> the PRD device node?). The main reason for the exclusion is to prevent
> multiple prd daemons running, which may get messy when updating the
> ipoll mask.

Well, the exclusion on _IOC_PRD that enables reception of PRD messages
works. Unless we want a way to "sniff" PRD messages but that gets harder
if the kernel has to maintain multiple queues so let's not go there.

> > Should we rely exclusively on userspace setting the right permissions or
> > should we check CAP_SYSADMIN here ?
> 
> I'm okay with relying on userspace, is there any reason not to?

Not really I suppose. What does /dev/mem do ?

> 
> >> +  vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
> >> +   size, vma->vm_page_prot)
> >> +  | _PAGE_SPECIAL;
> >> +
> >> +  rc = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size,
> >> +  vma->vm_page_prot);
> > 
> > Do we still have the warnings of process exist about the map count or is
> > that fixed ?
> 
> No, not fixed at present. I'll need to chat to you about that.

Ok, I need to figure out/remember what we need to do to avoid it.

> >> +  case OPAL_PRD_SCOM_READ:
> >> +  rc = copy_from_user(&scom, (void __user *)param, sizeof(scom));
> >> +  if (rc)
> >> +  return -EFAULT;
> >> +
> >> +  rc = opal_xscom_read(scom.chip, scom.addr,
> >> +  (__be64 *)&scom.data);
> > 
> > Are we exporting these for modules ?
> 
> No, but opal-prd isn't configurable as a module at the moment.

Why ?

> > 
> >> +  scom.data = be64_to_cpu(scom.data);
> >> +  pr_debug("ioctl SCOM_READ: chip %llx addr %016llx "
> >> +  "data %016llx rc %d\n",
> >> +  scom.chip, scom.addr, scom.data, rc);
> > 
> > pr_devel ?
> 
> This removes the possibility of CONFIG_DYNAMIC_DEBUG, is that intentional?

Too noisy. Enabling DYNAMIC_DEBUG doesn't mean I want to have my log
flooded with the thousands of SCOMs that PRD is going to do.

> > 
> >> +  if (rc)
> >> +  return -EIO;
> > 
> > Should we consider returning more info about the SCOM error ? HBRT might
> > actually need that... Maybe opal_prd_scom needs a field for the OPAL rc
> > which is currently not very descriptive but that's fixable.
> 
> Sounds good, I'll add that in. On error, we'll return -EIO and have t

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-01 Thread Benjamin Herrenschmidt

On Fri, 2015-05-01 at 07:54 -0500, Rob Herring wrote:

> The difference seems to be whether you allocate space or just point to
> the FDT for various strings/data. Is that right?
> 
> >* of_fdt_add_subtree() is the introduced API to do the work.
> 
> Have you looked at overlays and if so why do they not work for your purposes?
> 
> Why do you need to do this with the flattened tree?

The basic idea I asked Gavin to implement is that since the FW needs to
provide a bunch of DT updates to Linux at runtime in the form of new
nodes below an existing one, rather than doing it via some new/custom
format, instead, have it send a bit of FDT blob to expand under an
existing node.

As for the details of Gavin implementation, I haven't looked at it in
details yet so there might be issues there, however I don't know what
you mean by "overlays", any pointer ?

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-01 Thread Benjamin Herrenschmidt

On Fri, 2015-05-01 at 13:46 -0500, Rob Herring wrote:
> On Fri, May 1, 2015 at 10:22 AM, Benjamin Herrenschmidt
>  wrote:
> > On Fri, 2015-05-01 at 07:54 -0500, Rob Herring wrote:
> >
> >> The difference seems to be whether you allocate space or just point to
> >> the FDT for various strings/data. Is that right?
> >>
> >> >* of_fdt_add_subtree() is the introduced API to do the work.
> >>
> >> Have you looked at overlays and if so why do they not work for your 
> >> purposes?
> >>
> >> Why do you need to do this with the flattened tree?
> >
> > The basic idea I asked Gavin to implement is that since the FW needs to
> > provide a bunch of DT updates to Linux at runtime in the form of new
> > nodes below an existing one, rather than doing it via some new/custom
> > format, instead, have it send a bit of FDT blob to expand under an
> > existing node.
> 
> Overlay = an FDT blob to graft into a live running system. Sounds like
> the same thing.
> 
> > As for the details of Gavin implementation, I haven't looked at it in
> > details yet so there might be issues there, however I don't know what
> > you mean by "overlays", any pointer ?
> 
> CONFIG_OF_OVERLAY
> 
> http://events.linuxfoundation.org/sites/events/files/slides/dynamic-dt-keynote-v3.pdf

Well, that looks horrendously complicated, poorly documented and totally
unused in-tree outside of the unittest stuff, yay ! It has all sort of
"features" that I don't really care about.

I still don't see what it buys me other than making my FW a lot more
complex having to generate all that additional fixup etc... crap that I
don't totally get yet.

What's wrong with just unflattening the nodes in place ? The DT comes
from the FW in the first place so all the phandles are already good in
the new added blob. Internally, the FW created new nodes in its internal
representation and flattened the subtree and sends that subtree to
Linux.

I don't plan to play "revert" either, if you unplug, I do need to remove
what's under the slot but that's true of boot time devices, not just
"new" ones, so the overlay stuff won't do the trick and I certainly
don't want to keep track...

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-01 Thread Benjamin Herrenschmidt

On Sat, 2015-05-02 at 08:57 +1000, Benjamin Herrenschmidt wrote:

> > Overlay = an FDT blob to graft into a live running system. Sounds like
> > the same thing.
> > 
> > > As for the details of Gavin implementation, I haven't looked at it in
> > > details yet so there might be issues there, however I don't know what
> > > you mean by "overlays", any pointer ?
> > 
> > CONFIG_OF_OVERLAY
> > 
> > http://events.linuxfoundation.org/sites/events/files/slides/dynamic-dt-keynote-v3.pdf
> 
> Well, that looks horrendously complicated, poorly documented and totally
> unused in-tree outside of the unittest stuff, yay ! It has all sort of
> "features" that I don't really care about.

Looking a bit more at it, I don't quite see how I can attach a subtree
using that stuff.

Instead, each node in the overlay seems to need extra nodes and
properties to refer to the original.

So the FW would essentially have to create something a lot more complex
than just reflattening a bit of its internal tree. For each internal
node, it will need to add all those __overlay__ nodes and properties.

That is not going to fly for me at all. It's order of magnitudes more
complex than the solution we are pursuing.

So I think for our use case, we should continue in the direction of
having a helper to unflatten a piece of FDT underneath an existing
node. I don't like the "HYBRID" stuff though, we should not refer to
the original FDT, we should just make them normal dynamic nodes.

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-01 Thread Benjamin Herrenschmidt

On Sat, 2015-05-02 at 09:29 +1000, Benjamin Herrenschmidt wrote:

> Looking a bit more at it, I don't quite see how I can attach a subtree
> using that stuff.
> 
> Instead, each node in the overlay seems to need extra nodes and
> properties to refer to the original.
> 
> So the FW would essentially have to create something a lot more complex
> than just reflattening a bit of its internal tree. For each internal
> node, it will need to add all those __overlay__ nodes and properties.
> 
> That is not going to fly for me at all. It's order of magnitudes more
> complex than the solution we are pursuing.
> 
> So I think for our use case, we should continue in the direction of
> having a helper to unflatten a piece of FDT underneath an existing
> node. I don't like the "HYBRID" stuff though, we should not refer to
> the original FDT, we should just make them normal dynamic nodes.

A bit more thought... if we were to use the overlay stuff, Gavin, what
we *could* do is add to OPAL FW internal representation a generation
count to every node and property.

That way we could essentially know whenever something's changed from
what we flattened originally for the kernel.

We can then create a generic (not PCI specific) call that generates
an overlay tree for every node and property that has a generation
count that is newer than what was flattened (or passed by the OS).

It's still a LOT more complex than what we need though...

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-03 Thread Benjamin Herrenschmidt

On Mon, 2015-05-04 at 11:30 +1000, Gavin Shan wrote:
> Thanks, Ben. If we really need utilize overlay to support our case,
> we need some one to parse the input (device-tree changes) from
> firmware
> and create "overlay" device node and "target" node as I mentioned
> above.
> It's not simpler than the way we had to support our case. I'm not sure
> if we really need utilize overlay for our case.

No, if we decide to go down that path, then the FW needs to create
the overlay.

This could be done by having some kind of versioning to all nodes and
properties using a global generation count.

Ie, if we "know" what we passed to Linux, we can generate an overlay
that contains everything that changed since then using the version
numbers.

However, we should probably encode the version in the tree itself and
have specific APIs to retrieve "from" a given version to properly deal
with kexec'ing a kernel since in that case, the new kernel will have
something that isn't version 0 but version N where N is the latest
applied overlay.

Also I don't know how removing nodes works with overlay. IE the overlay
system is designed around the idea of removing the overlay to retrieve
the original tree.

In our case, our overlays are meant to be fully committed, and I don't
know whether there's a way to keep track. On unplug, we will just remove
all the nodes below the slot.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-04 Thread Benjamin Herrenschmidt

On Mon, 2015-05-04 at 19:41 +0300, Pantelis Antoniou wrote:
> 
> You get all of the corner cases handled for free. Perhaps it works for
> your case too.
> 
> Perhaps you can educate me on what you need supported and we can make
> sure it’s included.

Which corner cases ?

IE, what I want is simply "update" the device-tree below a PCIe slot on
PCI hotplug.

The DT isn't "compiled" from a dts (it's amazing how many people seem to
believe this is the only way you get fdt's nowadays). It's dynamically
(ie programatically) generated by firmware at boot time and contains
whatever PCIe devices happen to be plugged during boot.

When doing PCIe hotplug operations, the kernel does various FW calls
(among others to control slot power), and during these, the FW re-probes
underneath the slot and refreshes its internal representation. So the
phandles remain fully consistent, there is no fixup needed.

We want the kernel to also update his copy as wee in order to avoid
keeping stale nodes that don't match what's there anymore. Also, when
plugging specific kind of IO drawers, the FW can provide additional node
and properties that will be used to control slots inside the drawers.

So what we need is:

  - On PCIe unplug, remove all old nodes below the slot
  - On PCIe plug, get all the new nodes from FW

Note that there is no need to do anything like platform device probing
etc... the PCI layer takes care of that, we will remove the old nodes
after the pci_dev are gone and create the new ones before Linux
re-probes the PCIe bus subtree.

So what we need is very simple: The removal can be handled without FW
help, and the plug case is a matter of just transferring all those new
nodes to Linux to re-expand.

Since the phandle etc... are all consistent with the original tree,
there is no fixups required.

So the "trivial" way to do it (and the way we have implemented the FW
side so far) is to have the FW simply "flatten" the subtree below the
slot and pass it to Linux, with the intent of expanding it back below
the slot node.

This is what Gavin proposed patches do.

The overlay mechanism adds all sorts of features that we don't seen to
need and would make the above more complex.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH] powerpc/mm: Return NULL for not present hugetlb page

2015-05-07 Thread Benjamin Herrenschmidt

On Thu, 2015-05-07 at 12:46 +0530, Aneesh Kumar K.V wrote:
> We need to check whether pte is present in follow_huge_addr and
> properly return NULL if mapping is not present. Also use READ_ONCE
> when dereferencing pte_t address.

Do that need to go to stable as well ?

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/hugetlbpage.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 0ce968b00b7c..f5688423bc69 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -689,27 +689,34 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
>  struct page *
>  follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
>  {
> - pte_t *ptep;
> - struct page *page;
> + pte_t *ptep, pte;
>   unsigned shift;
>   unsigned long mask, flags;
> + struct page *page = ERR_PTR(-EINVAL);
> +
> + local_irq_save(flags);
> + ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
> + if (!ptep)
> + goto no_page;
> + pte = READ_ONCE(*ptep);
>   /*
> +  * Verify it is a huge page else bail.
>* Transparent hugepages are handled by generic code. We can skip them
>* here.
>*/
> - local_irq_save(flags);
> - ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
> + if (!shift || pmd_trans_huge((pmd_t)pte))
> + goto no_page;
>  
> - /* Verify it is a huge page else bail. */
> - if (!ptep || !shift || pmd_trans_huge(*(pmd_t *)ptep)) {
> - local_irq_restore(flags);
> - return ERR_PTR(-EINVAL);
> + if (!pte_present(pte)) {
> + page = NULL;
> + goto no_page;
>   }
>   mask = (1UL << shift) - 1;
> - page = pte_page(*ptep);
> + page = pte_page(pte);
>   if (page)
>   page += (address & mask) / PAGE_SIZE;
>  
> +no_page:
>   local_irq_restore(flags);
>   return page;
>  }


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: G5 Xserve rackmeter broken?

2015-05-10 Thread Benjamin Herrenschmidt

On Sun, 2015-05-10 at 21:32 +0300, Aaro Koskinen wrote:
> Hi,
> 
> With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
> led patterns. So far I have seen at least the following:
> 
> a) With static load the leds seems to be sane and report CPU
> usage properly, but after few minutes they go completely OFF,
> even if the CPU load remains high.
> 
> b) On a completely idle system, leds alter between all OFF and all ON
> roughly once a second.
> 
> Unfortunately I cannot say which was the last kernel where this worked
> properly... These servers were away from normal use for a while due
> to PSU issues.

And mine is dead due to ... PSU issue :-(

It could be that what we use to get the "idle time" isn't correct
anymore...

Ben.



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] i2c: powermac: don't workaround for keywest

2015-05-10 Thread Benjamin Herrenschmidt

On Sun, 2015-05-10 at 20:34 +0200, Wolfram Sang wrote:
> Okay, so this patch is bogus. I understand now that onyx uses another
> codec than TAS, so this change will regress on other machines.
> However,
> it shows that this unconditional instantiation of the TAS breaks sound
> on Macintoshs which still need non-aoa sound support. I assume there
> will be noone in the near future to convert Keywest to AOA, so we'll
> need to find a hackish way around this instantiation problem.

Converting the old macs that use TAS shouldn't be *that* hard, main
problem is I don't have the hardware to test...

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] i2c: powermac: don't workaround for keywest

2015-05-11 Thread Benjamin Herrenschmidt

On Mon, 2015-05-11 at 09:34 +0200, w...@the-dreams.de wrote:
> On Mon, May 11, 2015 at 08:14:47AM +1000, Benjamin Herrenschmidt wrote:
> > On Sun, 2015-05-10 at 20:34 +0200, Wolfram Sang wrote:
> > > Okay, so this patch is bogus. I understand now that onyx uses another
> > > codec than TAS, so this change will regress on other machines.
> > > However,
> > > it shows that this unconditional instantiation of the TAS breaks sound
> > > on Macintoshs which still need non-aoa sound support. I assume there
> > > will be noone in the near future to convert Keywest to AOA, so we'll
> > > need to find a hackish way around this instantiation problem.
> > 
> > Converting the old macs that use TAS shouldn't be *that* hard, main
> > problem is I don't have the hardware to test...
> 
> Dan and Mark (on CC) have been very helpful with testing. Maybe they are
> also in to test "the proper solution"?

Possibly, depends how many machines we can cover. I also have extremely
little time, so I'm not that keen on volunteering :-) I'll *try* to have
a look some time this or next week.

The main problem is the difference in the way the layout of the codec
etc... is reported from the DT by the firmware. Otherwise the HW is the
same between pre-AOA and post-AOA really ...

I do have a bunch of DT snapshots lying around, so I can try to figure
out something.

Dan, Mark, what machine models specifically do you have ? (compatible
property in the DT pls, ie, /proc/device-tree/compatible).

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: G5 Xserve rackmeter broken?

2015-05-12 Thread Benjamin Herrenschmidt

On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> I'm running with HZ=100 so the values are still probably within
> jiffy resolution, so perhaps the calculation should first do
> idle = min(idle, total)?

Does it gives you a reasonable output if you do that ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: mpt2sas DMA mask

2015-05-12 Thread Benjamin Herrenschmidt

On Tue, 2015-05-12 at 17:07 -0500, Brian King wrote:
> The mpt2sas driver was changed late last year in that it now requests a 64 
> bit DMA
> mask, then requests a 32 bit coherent DMA mask, then later requests a 64 bit 
> coherent
> DMA mask. This was 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c. This breaks 64 
> bit DMA
> support for mpt2sas on Power and we always fall back to using 32 bit DMA. 
> Looking
> at the commit log, it looks like this was an intentional change.
> 
> Ben - you had a patch set you had posted to the list back in Feb of this 
> year, but it
> doesn't look like it got merged. 

Right, it was broken, we can't do that unfortunately. It's a hard
problem to fix.

> https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-February/125087.html
> 
> This would fix the issue I'm seeing on mpt2sas. Do you plan to dust that 
> patch set
> off and upstream it? Were there issues with it that still need to be resolved?

No that patch doesn't actually work.

What we need is essentially a new set of DMA ops that can route an
individual map request via the iommu or the bypass window depending on
what mask applies.

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] mpt2sas: Fall back to 64 bit coherent mask if 64 bit DMA / 32 bit coherent mask not supported

2015-05-13 Thread Benjamin Herrenschmidt

On Wed, 2015-05-13 at 22:14 +0530, Sreekanth Reddy wrote:
> Hi Brain,
> 
> We had a restriction from the firmware that the upper 32 bits of the
> RDPQ pool entries must be same. So to limit this restriction we
> initially set the consistent DMA mask to 32 and then after allocating
> the RDPQ pools we changed the consistent DMA mask to 64 before
> allocating remaining pools.

Brian, maybe we should just bite that bullet and implement that hybrid
DMA ops I mentioned. I'll see if I can spare some time today to look
at it. It would work as long as we never remove the legacy window using
the DDW APIs (removing the legacy window is broken anyway for other
reasons I think we discussed already).

As long as we have both the legacy window and the DDW available (or the
legacy and bypass on "nv"), we can then route individual DMAs according
to the corresponding applicable mask.

I'll try to come up with a patch but I'll need you to test it.

Cheers,
Ben.

> Regards,
> Sreekanth
> 
> On Wed, May 13, 2015 at 7:42 PM, Brian King  wrote:
> > On 05/13/2015 08:31 AM, Arnd Bergmann wrote:
> >> On Wednesday 13 May 2015 08:23:41 Brian King wrote:
> >>> On 05/13/2015 03:10 AM, Arnd Bergmann wrote:
>  On Tuesday 12 May 2015 18:24:43 Brian King wrote:
> >
> > Commit 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c broke 64 bit DMA for 
> > mpt2sas on Power.
> > That commit changed the sequence for setting up the DMA and coherent 
> > DMA masks so
> > that during initialization the driver requests a 64 bit DMA mask and a 
> > 32 bit consistent
> > DMA mask, then later requests a 64 bit consistent DMA mask. The Power 
> > architecture does
> > not currently support this, which results in always falling back to a 
> > 32 bit DMA window,
> > which has a negative impact on performance. Tweak this algorithm 
> > slightly so that
> > if requesting a 32 bit consistent mask fails after we've successfully 
> > set a 64 bit
> > DMA mask, just try to get a 64 bit consistent mask. This should 
> > preserve existing
> > behavior on platforms that support mixed mask setting and restore 
> > previous functionality
> > to those that do not.
> >
> > Signed-off-by: Brian King 
> 
>  I believe the way the API is designed, it should guarantee that after 
>  dma_set_mask()
>  succeeds for a device, dma_set_coherent_mask() with the same mask will 
>  also succeed.
> 
>  Could you just call dma_set_mask_and_coherent() here to avoid that 
>  complex logic?
> >>>
> >>> I don't think that would work. The mpt2sas driver wants to set the dma 
> >>> mask to 64bits
> >>> but set the coherent mask to 32 bits, then my change is to set the 
> >>> coherent mask to
> >>> 64bits if setting it to 32bit fails. I'm not seeing how using 
> >>> dma_set_mask_and_coherent
> >>> would simplify anything here.
> >>
> >> My question was more about why the driver would ask for a 32-bit coherent
> >> mask to start with. Is this a limitation in the mpt2sas device that can
> >> only have descriptors at low addresses, or is it trying to work around some
> >> bug in a particular host system?
> >
> > I'll need help from Sreekanth in answering that. All that is in the git 
> > commit log is:
> >
> > 2. Initially set the consistent DMA mask to 32 bit and then change it 
> > to 64 bit mask
> > after allocating RDPQ pools by calling the function 
> > _base_change_consistent_dma_mask.
> > This is to ensure that all the upper 32 bits of RDPQ entries's base 
> > address to be same.
> >
> > So, I'm not sure my patch won't break something with mpt2sas... This commit 
> > also changed
> > how the RDPO pools are allocated, so its possible this won't work. However, 
> > I did load
> > it on Power box and I'm not seeing any major problems so far. I am seeing 
> > the following message
> > get logged on a regular basis, not sure if its related to this change or 
> > not:
> >
> > mpt2sas3: log_info(0x31120303): originator(IOP), code(0x03), 
> > sub_code(0x3112)
> >
> > -Brian
> >
> >
> > --
> > Brian King
> > Power Linux I/O
> > IBM Linux Technology Center
> >
> >


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-13 Thread Benjamin Herrenschmidt

On Tue, 2015-05-05 at 07:14 +1000, Benjamin Herrenschmidt wrote:
> So the "trivial" way to do it (and the way we have implemented the FW
> side so far) is to have the FW simply "flatten" the subtree below the
> slot and pass it to Linux, with the intent of expanding it back below
> the slot node.
> 
> This is what Gavin proposed patches do.
> 
> The overlay mechanism adds all sorts of features that we don't seen to
> need and would make the above more complex.

Guys, I never got a final answer from you on this. Are we ok with adding
the way to just expand a subtree or are you insistent we need to use the
overlap mechanism ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-13 Thread Benjamin Herrenschmidt

On Wed, 2015-05-13 at 19:18 -0500, Rob Herring wrote:

> I haven't decided really.
> 
> The main thing with the current patch is I don't really like the added
> complexity to unflatten_dt_node. It is already a fairly complex
> function. Perhaps removing of "hybrid" as discussed will help?

I agree, we should be able to make that much simpler, I was planning on
sorting that out with Gavin.

> If there are things we can do to make overlays easier to use in your
> use case, I'd like to hear ideas. I don't really buy that being more
> complex than needed is an obstacle. That is very often the case to
> have common, scale-able solutions. I want to see a simple case be
> simple to support.

Well, it's a LOT more complex from the FW perspective for a bunch of
features we don't really need, in a way because the DT update in our
case is just purely informational to avoid keeping wrong/outdated DT
bits, it has little functional impact (it might have a bit for interrupt
routing through bridges though).

However, I am also pursuing an approach on FW side using a generation
count in our nodes and properties which we could use to generate
arbitrary overlays if we know what generation linux has.

There might actual be a usage scenario for a generic way for our
firwmare to convey DT updates to Linux for other reasons.

A few things that I don't find in the overlay code (but maybe I haven't
looked at it hard enough):

 - Can it remove nodes/properties ?

 - Can it "commit" a changeset so it's permanently part of the main DT ?
We will never have a concept of "revertable" changesets, if we need a
subsequent update, we will get a new overlay from FW that will remove
what needs to be removed and add what needs to be added.

IE, our current mechanism without overlay is fairly simple:

  - On PCI unplug, we remove all nodes below the slot (from linux),
the FW does the equivalent internally.

  - On PCI re-plug, the FW internally builds new nodes and sends a
new subtree as an FDT that we can expand/attach.

Now we could consider that subtree as a changeset that can be undone,
but that wouldn't work for boot time. And subsequent updates wouldn't
have that concept of "undoing" anyway.

IE. conceptually, what overlays do today is quite rooted around the idea
of having a fixed "base" DT and some pre-compiled DTB overlays that
get added/removed. The design completely ignore the idea of a FW that
maintains a "live" tree which we want to keep in sync, which is what we
want to do here, or what we could do with a "live" open firmware
implementation.

Now we might be able to reconcile them, but it feels to me that the
overlay/changeset stuff is too rooted in the first concept...

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-13 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 09:23 +0300, Pantelis Antoniou wrote:

> > A few things that I don't find in the overlay code (but maybe I haven't
> > looked at it hard enough):
> > 
> > - Can it remove nodes/properties ?
> > 
> 
> Yes.

Ok, I've missed that when looking at the overlay code then, I'll have to
give it a closer look.

> > - Can it "commit" a changeset so it's permanently part of the main DT ?
> > We will never have a concept of "revertable" changesets, if we need a
> > subsequent update, we will get a new overlay from FW that will remove
> > what needs to be removed and add what needs to be added.
> > 
> 
> The overlay when applied is a part of the kernel DT tree.
> It is trivial to add a mechanism that simply commits everything and
> tosses away the revert information.
> 
> Note that in that case you have to make provisions for the unflatten
> blob to not be freed or for the device tree nodes/properties to be
> dynamically allocated.

I think it makes sense to do the dynamic thing anyway...

> > IE, our current mechanism without overlay is fairly simple:
> > 
> >  - On PCI unplug, we remove all nodes below the slot (from linux),
> > the FW does the equivalent internally.
> > 
> 
> If you use an overlay, you just revert it and everything would
> be as it was before, without anything hanging below the slot node.

Except that doesn't work for the boot time content which we get
from the firmware as part of the initial FDT (and we can't change that
without breaking backward compatibility).

> Note that the ‘remove all nodes below the slot’ does not work for my case.
> 
> That is because there are devices being instantiated under the slot
> (i2c busses, i2c devices, FPGAs etc) that need to be removed from the
> system.

Right while in my case, there isn't, it's just the standard OF PCI
representation generated by FW, the main thing is that it might have
some enriched properties for some known cable cards of external drawers
that are good to have.

> >  - On PCI re-plug, the FW internally builds new nodes and sends a
> > new subtree as an FDT that we can expand/attach.
> > 
> 
> You can easily send a DT blob containing an overlay from firmware.

Sending one is easy. Building it is not :-)

> It can be even easy, since you might not have to recreate the full blob
> each time, but instead using flat device tree methods to populate the
> few properties that change each time.

No, we basically have our internal tree in the firmware in a format
similar to Linux, ie, a pointer based tree. We can "flatten" it of
course, but generating an overlay is trickier. We can, it's just more
work and we are running out of time (I basically have to cut that FW in
the next few days, then we'll be stuck with whatever interfaces we
created, I have a big of time to fix bugs after that but that's about
it).

> > Now we could consider that subtree as a changeset that can be undone,
> > but that wouldn't work for boot time. And subsequent updates wouldn't
> > have that concept of "undoing" anyway.
> > 
> 
> I have posted another patch that does boot-time DT quirk which are
> non-revertable.
> 
> https://lkml.org/lkml/2015/2/18/258

Not sure how that applies in my case ... I can't change the
representation of the PCI subtree, this is standard OFW representation,
I can't change the FW to make it an overlay-like thing at boot time,
that would break existing kernels.

> > IE. conceptually, what overlays do today is quite rooted around the idea
> > of having a fixed "base" DT and some pre-compiled DTB overlays that
> > get added/removed. The design completely ignore the idea of a FW that
> > maintains a "live" tree which we want to keep in sync, which is what we
> > want to do here, or what we could do with a "live" open firmware
> > implementation.
> > 
> > Now we might be able to reconcile them, but it feels to me that the
> > overlay/changeset stuff is too rooted in the first concept…
> > 
> 
> The first DT overlays use case (beaglebone capes) is what got the concept
> started.
> 
> Right now is a generic mechanism to apply modifications to the kernel
> live tree, with the possibility to revert them.

Yes but as I said it's not really thought in term of keeping the kernel
tree in sync with an external dynamically generated tree. Maybe we can
fix it, but it's more complex...

Ben.

> > Ben.
> > 
> > 
> 
> Regards
> 
> — Pantelis

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-14 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 10:04 +0300, Pantelis Antoniou wrote:

> Hmm, since you just want to transmit a whole subtree things are a bit
> simpler.
> 
> You don’t need any of the fixups, and your target node is known.
> 
> So your overlay is simply:
> 
> / {
>   fragment@0 {
>   target-path = “/foo”;
>   __overlay__ {
>   /* contents of the slot */
>   };
>   }; 
> };
>
> I think it’s possible to just bit-mangle a blob (in pseudo code).
> 
>   const u8 template_overlay_blob[] = {  };
> 
>   flatten_slot(slot_blob);
> 
>   overlay_blob = allocate_new_blob(template_overlay_blob, slot_blob);
> 
>   overlay_node = find_node(overlay_blob, “/fragment@0/__overlay__);
>   target_prop = find_prop(overlay_blob, “/fragment@0/target-path”);
> 
>   inject_slot_blob(overlay_blob, overlay_node, slot_blob);
>   modify_slot_target(overlay_blob, target_prop, slot_target);
>   
> I don’t think you need to re-flatten anything, shuffling bits around with
> memmove should work.

Fairly gross :-)

But yeah generating the overlay doesn't necessarily scare me, I can
generate a temp tree that is the overlay in which I "copy" the subtree
(or in my internal ptr-based representation I could have a concept of
alias which I follow while flattening).

That leaves me with these problems:

 - No support for removing of nodes, so that needs to be added back to
the format and to Linux unless I continue removing by hand in the PCI
hotplug code itself

 - No support for "committing" the overlay which needs to be added as
well.

> >>> Now we could consider that subtree as a changeset that can be undone,
> >>> but that wouldn't work for boot time. And subsequent updates wouldn't
> >>> have that concept of "undoing" anyway.
> >>> 
> >> 
> >> I have posted another patch that does boot-time DT quirk which are
> >> non-revertable.
> >> 
> >> https://lkml.org/lkml/2015/2/18/258
> > 
> > Not sure how that applies in my case ... I can't change the
> > representation of the PCI subtree, this is standard OFW representation,
> > I can't change the FW to make it an overlay-like thing at boot time,
> > that would break existing kernels.
> > 
> 
> The idea is to append the ‘quirk’ to the already booting device tree blob.

I know but that's not how things work for me. At boot time the FW passes
me one tree that contains all the PCI stuff it has probed.

> Another idea floating around was to simple concatenate the booting blob with
> any overlay blobs you want applied at boot time.

Sure but I don't get overlay blobs at boot time.

> >>> IE. conceptually, what overlays do today is quite rooted around the idea
> >>> of having a fixed "base" DT and some pre-compiled DTB overlays that
> >>> get added/removed. The design completely ignore the idea of a FW that
> >>> maintains a "live" tree which we want to keep in sync, which is what we
> >>> want to do here, or what we could do with a "live" open firmware
> >>> implementation.
> >>> 
> >>> Now we might be able to reconcile them, but it feels to me that the
> >>> overlay/changeset stuff is too rooted in the first concept…
> >>> 
> >> 
> >> The first DT overlays use case (beaglebone capes) is what got the concept
> >> started.
> >> 
> >> Right now is a generic mechanism to apply modifications to the kernel
> >> live tree, with the possibility to revert them.
> > 
> > Yes but as I said it's not really thought in term of keeping the kernel
> > tree in sync with an external dynamically generated tree. Maybe we can
> > fix it, but it's more complex…
> > 
> 
> Yes it is, unfortunately.

Right. Which makes the solution of just passing my bit of tree as a blob
which I expand in Linux where I want it rather than an overlay tempting
if we can make Gavin patch more palatable (removing the hybrid stuff
etc...).

Cheers,
Ben.

> > Ben.
> > 
> >>> Ben.
> >>> 
> >>> 
> >> 
> >> Regards
> >> 
> >> — Pantelis
> > 
> > 
> 
> Regards
> 
> — Pantelis


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-14 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 10:19 +0300, Pantelis Antoniou wrote:

> 
> You don’t want to know how sausages are made, but they are delicious :)

 ... most of the time :)

> > But yeah generating the overlay doesn't necessarily scare me, I can
> > generate a temp tree that is the overlay in which I "copy" the subtree
> > (or in my internal ptr-based representation I could have a concept of
> > alias which I follow while flattening).
> > 
> > That leaves me with these problems:
> > 
> > - No support for removing of nodes, so that needs to be added back to
> > the format and to Linux unless I continue removing by hand in the PCI
> > hotplug code itself
> > 
> 
> What kind of nodes/properties you need to remove at _application_ time?

Well, if we stick to removing by hand in Linux for the unplug case, then
none.

> What you describe is inserting a bunch of properties and nodes under
> a slot’s device node. Reverting the overlay removes them all just fine.

Except that still doesn't work for boot time :-)

So I would have to do a special case on unplug:

if (slot->dt_is_overlay) /* set to false at boot */
remove_subtree_myself();
else
undo_overlay(slot->overlay);

> > - No support for "committing" the overlay which needs to be added as
> > well.
> > 
> 
> That’s the easiest part.

Yeah, I will need to get my head around the code a bit more but it
doesn't seem too scary.

> I see. Well, how about this?
> 
> Who said you have to do the whole blob dance in the firmware?
> 
> You can just as easily pass the blob as it is to the linux kernel and
> the kernel there can convert it to an overlay and apply it.

That's not that pretty but we can do that too which solve the problem of
fixing the FW interface.

There is however an argument to be made in having the FW be able to
generate arbitrary overlays. If we ever want to pass more "property"
updates or node updates to Linux at runtime.

A few cases have crept up on the radar, like updating the pstate tables
or VPD informations ...

If we go down that path, then I would implement a concept of generation
count in the firmware, so I can generate an overlay that include all the
changes since the last "generation" given to Linux.

However that requires supporting removal of nodes/properties. So I'm
tempted to keep that feature on the back burner and go with an ad-hoc
interface for PCI for now.

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-14 Thread Benjamin Herrenschmidt


> So I would have to do a special case on unplug:
> 
>   if (slot->dt_is_overlay) /* set to false at boot */
>   remove_subtree_myself();
>   else
>   undo_overlay(slot->overlay);

Of course I just inverted the polarity of the if () in the example :-)

But you get the idea...


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC/PATCH] powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask

2015-05-14 Thread Benjamin Herrenschmidt

This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.

Signed-off-by: Benjamin Herrenschmidt 
---

diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 9f1371b..9ce2199 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -10,6 +10,7 @@ struct dma_map_ops;
 struct device_node;
 #ifdef CONFIG_PPC64
 struct pci_dn;
+struct iommu_table;
 #endif
 
 /*
@@ -23,13 +24,15 @@ struct dev_archdata {
struct dma_map_ops  *dma_ops;
 
/*
-* When an iommu is in use, dma_data is used as a ptr to the base of the
-* iommu_table.  Otherwise, it is a simple numerical offset.
+* These two used to be a union. However, with the hybrid ops we need
+* both so here we store both a DMA offset for direct mappings and
+* an iommu_table for remapped DMA.
 */
-   union {
-   dma_addr_t  dma_offset;
-   void*iommu_table_base;
-   } dma_data;
+   dma_addr_t  dma_offset;
+
+#ifdef CONFIG_PPC64
+   struct iommu_table  *iommu_table_base;
+#endif
 
 #ifdef CONFIG_IOMMU_API
void*iommu_domain;
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 894d538..2e12366 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -21,12 +21,12 @@
 #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 
 /* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *dma_direct_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
+extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t flag,
+struct dma_attrs *attrs);
+extern void __dma_direct_free_coherent(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
   struct dma_attrs *attrs);
-extern void dma_direct_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-struct dma_attrs *attrs);
 extern int dma_direct_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
@@ -106,7 +106,7 @@ static inline void set_dma_ops(struct device *dev, struct 
dma_map_ops *ops)
 static inline dma_addr_t get_dma_offset(struct device *dev)
 {
if (dev)
-   return dev->archdata.dma_data.dma_offset;
+   return dev->archdata.dma_offset;
 
return PCI_DRAM_OFFSET;
 }
@@ -114,7 +114,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
if (dev)
-   dev->archdata.dma_data.dma_offset = off;
+   dev->archdata.dma_offset = off;
 }
 
 /* this will be removed soon */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e2abbe8..157cfdd 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -2,17 +2,17 @@
  * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation
  * Rewrite, cleanup:
  * Copyright (C) 2004 Olof Johansson , IBM Corporation
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
@@ -93,14 +93,17 @@ int get_iommu_order(unsigned long size, struct iommu_table 
*tbl)
 
 struct scatterlist;
 
-static inline void set_iommu_table_base(struct device *dev, void *base)
+#ifdef CONFIG_PPC64
+
+static inline void set_iommu_table_base(struct device *dev,
+   struct iommu_table *base)
 {
-   dev->archdata.dma_data.iommu_table_base = base;
+   dev->archdata.iommu_table_base = base;
 }
 
 static inline void *get_iommu_ta

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-14 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 10:34 +0300, Pantelis Antoniou wrote:

> >> What you describe is inserting a bunch of properties and nodes under
> >> a slot’s device node. Reverting the overlay removes them all just fine.
> > 
> > Except that still doesn't work for boot time :-)
> > 
> > So I would have to do a special case on unplug:
> > 
> > if (slot->dt_is_overlay) /* set to false at boot */
> > remove_subtree_myself();
> > else
> > undo_overlay(slot->overlay);
> > 
> 
> OK, in that case you do require removal. But in any case it’s the ‘negative’
> of an already applied one, either at boot time or not.

Sort-of, unless we have a way in the overlay to simply specify node
removal statements so we don't have to explicitly remove all properties
(or even all children).

> Modifying the overlay code to apply a ‘negative’ property should do the trick.
> 
> Is that correct?

I would do negatives node and let Linux imply the properties (or even
children).

But yes, that would probably do.

 .../...

> I will probably need that generation count myself for my PCI use case.
> 
> > However that requires supporting removal of nodes/properties. So I'm
> > tempted to keep that feature on the back burner and go with an ad-hoc
> > interface for PCI for now.
> > 
> 
> I see. Bonne chance :)

Merci :)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: G5 Xserve rackmeter broken?

2015-05-14 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 13:06 +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > > I'm running with HZ=100 so the values are still probably within
> > > jiffy resolution, so perhaps the calculation should first do
> > > idle = min(idle, total)?
> > 
> > Does it gives you a reasonable output if you do that ?
> 
> The below change fixes the idle system blinking behaviour.
> 
> I'm also able to reproduce the leds going off during full CPU load case.
> It seems there is some DMA error. Normally, reading rm->dma_regs->status
> in the IRQ handler gives 0x8400. In the failure cases I've seen values
> 0x8880 and 0x8980 - the IRQ will stop after this and it will need
> paused -> started cycle before it gets going again (but sometimes fails
> again soon after).

That's a bit worrysome, is that new ? Smells like faulting HW ...

Ben.

> A.
> 
> diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
> index 048901a..3381fa59 100644
> --- a/drivers/macintosh/rack-meter.c
> +++ b/drivers/macintosh/rack-meter.c
> @@ -227,6 +227,7 @@ static void rackmeter_do_timer(struct work_struct *work)
>  
>   total_idle_ticks = get_cpu_idle_time(cpu);
>   idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
> + idle_ticks = min(idle_ticks, total_ticks);
>   rcpu->prev_idle = total_idle_ticks;
>  
>   /* We do a very dumb calculation to update the LEDs for now,


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC/PATCH v2] powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask

2015-05-14 Thread Benjamin Herrenschmidt

This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.

Signed-off-by: Benjamin Herrenschmidt 
---

v2: Implement a custom dma_set_coherent_mask() to work around the fact
that dma_supported() will fail otherwise.

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..cd46474 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -82,6 +82,9 @@ config GENERIC_HWEIGHT
bool
default y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+bool
+
 config PPC
bool
default y
@@ -152,6 +155,7 @@ config PPC
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select NO_BOOTMEM
select HAVE_GENERIC_RCU_GUP
+   select ARCH_HAS_DMA_SET_COHERENT_MASK
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 9f1371b..9ce2199 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -10,6 +10,7 @@ struct dma_map_ops;
 struct device_node;
 #ifdef CONFIG_PPC64
 struct pci_dn;
+struct iommu_table;
 #endif
 
 /*
@@ -23,13 +24,15 @@ struct dev_archdata {
struct dma_map_ops  *dma_ops;
 
/*
-* When an iommu is in use, dma_data is used as a ptr to the base of the
-* iommu_table.  Otherwise, it is a simple numerical offset.
+* These two used to be a union. However, with the hybrid ops we need
+* both so here we store both a DMA offset for direct mappings and
+* an iommu_table for remapped DMA.
 */
-   union {
-   dma_addr_t  dma_offset;
-   void*iommu_table_base;
-   } dma_data;
+   dma_addr_t  dma_offset;
+
+#ifdef CONFIG_PPC64
+   struct iommu_table  *iommu_table_base;
+#endif
 
 #ifdef CONFIG_IOMMU_API
void*iommu_domain;
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 894d538..2e12366 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -21,12 +21,12 @@
 #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 
 /* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *dma_direct_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
+extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t flag,
+struct dma_attrs *attrs);
+extern void __dma_direct_free_coherent(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
   struct dma_attrs *attrs);
-extern void dma_direct_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-struct dma_attrs *attrs);
 extern int dma_direct_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
@@ -106,7 +106,7 @@ static inline void set_dma_ops(struct device *dev, struct 
dma_map_ops *ops)
 static inline dma_addr_t get_dma_offset(struct device *dev)
 {
if (dev)
-   return dev->archdata.dma_data.dma_offset;
+   return dev->archdata.dma_offset;
 
return PCI_DRAM_OFFSET;
 }
@@ -114,7 +114,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
if (dev)
-   dev->archdata.dma_data.dma_offset = off;
+   dev->archdata.dma_offset = off;
 }
 
 /* this will be removed soon */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e2abbe8..e67e678 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -2,17 +2,17 @@
  * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation
  * Rewrite, cleanup:
  * Copyright (C) 2004 Olof Johansson , IBM Corporation
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You shoul

[RFC/PATCH v3] powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask

2015-05-14 Thread Benjamin Herrenschmidt

This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.

Signed-off-by: Benjamin Herrenschmidt 
---

v2: Implement a custom dma_set_coherent_mask() to work around the fact
that dma_supported() will fail otherwise.

v3: Add missing symbol export

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..cd46474 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -82,6 +82,9 @@ config GENERIC_HWEIGHT
bool
default y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+bool
+
 config PPC
bool
default y
@@ -152,6 +155,7 @@ config PPC
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select NO_BOOTMEM
select HAVE_GENERIC_RCU_GUP
+   select ARCH_HAS_DMA_SET_COHERENT_MASK
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 9f1371b..9ce2199 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -10,6 +10,7 @@ struct dma_map_ops;
 struct device_node;
 #ifdef CONFIG_PPC64
 struct pci_dn;
+struct iommu_table;
 #endif
 
 /*
@@ -23,13 +24,15 @@ struct dev_archdata {
struct dma_map_ops  *dma_ops;
 
/*
-* When an iommu is in use, dma_data is used as a ptr to the base of the
-* iommu_table.  Otherwise, it is a simple numerical offset.
+* These two used to be a union. However, with the hybrid ops we need
+* both so here we store both a DMA offset for direct mappings and
+* an iommu_table for remapped DMA.
 */
-   union {
-   dma_addr_t  dma_offset;
-   void*iommu_table_base;
-   } dma_data;
+   dma_addr_t  dma_offset;
+
+#ifdef CONFIG_PPC64
+   struct iommu_table  *iommu_table_base;
+#endif
 
 #ifdef CONFIG_IOMMU_API
void*iommu_domain;
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 894d538..2e12366 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -21,12 +21,12 @@
 #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 
 /* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *dma_direct_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
+extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t flag,
+struct dma_attrs *attrs);
+extern void __dma_direct_free_coherent(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
   struct dma_attrs *attrs);
-extern void dma_direct_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-struct dma_attrs *attrs);
 extern int dma_direct_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
@@ -106,7 +106,7 @@ static inline void set_dma_ops(struct device *dev, struct 
dma_map_ops *ops)
 static inline dma_addr_t get_dma_offset(struct device *dev)
 {
if (dev)
-   return dev->archdata.dma_data.dma_offset;
+   return dev->archdata.dma_offset;
 
return PCI_DRAM_OFFSET;
 }
@@ -114,7 +114,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
if (dev)
-   dev->archdata.dma_data.dma_offset = off;
+   dev->archdata.dma_offset = off;
 }
 
 /* this will be removed soon */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e2abbe8..e67e678 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -2,17 +2,17 @@
  * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation
  * Rewrite, cleanup:
  * Copyright (C) 2004 Olof Johansson , IBM Corporation
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for m

Re: [PATCH v4 19/21] drivers/of: Support adding sub-tree

2015-05-14 Thread Benjamin Herrenschmidt

On Thu, 2015-05-14 at 14:02 +0300, Pantelis Antoniou wrote:
> Hi Ben,
> 
> > On May 14, 2015, at 10:47 , Benjamin Herrenschmidt 
> >  wrote:
> > 
> 
> [snip]
> 
> So I spend some time thinking about your use case and I think it boils down
> to this:
> 
> I have a live tree in the firmware, I have made changes and I need to reflect
> those changes to the live tree in the kernel.
> 
> Sounds like ‘how do I generate a patch for getting those two in sync'. No?

More or less.

> I can see where this might be useful for others as all.
> 
> I think we really need to create a liblivedt like we have libfdt since
> we have a number of projects going about using/manipulating DT at runtime.
> 
> 1. The linux kernel, with it’s own live tree implementation.
> 2. The device tree compiler (it has a live tree) custom implemented.
> 3. Your weird and wonderful (or wacky) firmware.
> 4. u-boot does use DT now, but it does with libfdt. I believe this is 
> suboptimal.
> 5. barebox does DT as well.
> 
> Most of what we want to do with DT can be abstracted in a library I think that
> all of those projects can use.
> 
> What are your thoughts?

Well, we have at least two implementations, the kernel one and the one
in our OPAL firmware:

https://github.com/open-power/skiboot/blob/master/include/device.h
https://github.com/open-power/skiboot/blob/master/core/device.c

The latter uses some nice Rusty tricks (tm) for multiple argument
functions.

It would make sense to do a library somewhere yes. However, I need to
cut my firmware API pretty much today so I think for now I'll stick
to something Ad-Hoc for the PCI hotplug code that just passes the
bit of FDT with the new devices and leave the "grand project" of live
sync of the tree for later.

There are other implementations of live DT in various Open Firmware
variants out there, most are in Forth which I suggest you don't bother
with unless you enjoy pain, but I think at least one of these is
actually in C.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask

2015-05-17 Thread Benjamin Herrenschmidt

This patch adds the ability to the DMA direct ops to fallback to the IOMMU
ops for coherent alloc/free if the coherent mask of the device isn't
suitable for accessing the direct DMA space and the device also happens
to have an active IOMMU table.

Signed-off-by: Benjamin Herrenschmidt 
Tested-by: Brian King 
---

Identical to RFC v3

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..cd46474 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -82,6 +82,9 @@ config GENERIC_HWEIGHT
bool
default y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+bool
+
 config PPC
bool
default y
@@ -152,6 +155,7 @@ config PPC
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select NO_BOOTMEM
select HAVE_GENERIC_RCU_GUP
+   select ARCH_HAS_DMA_SET_COHERENT_MASK
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 9f1371b..9ce2199 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -10,6 +10,7 @@ struct dma_map_ops;
 struct device_node;
 #ifdef CONFIG_PPC64
 struct pci_dn;
+struct iommu_table;
 #endif
 
 /*
@@ -23,13 +24,15 @@ struct dev_archdata {
struct dma_map_ops  *dma_ops;
 
/*
-* When an iommu is in use, dma_data is used as a ptr to the base of the
-* iommu_table.  Otherwise, it is a simple numerical offset.
+* These two used to be a union. However, with the hybrid ops we need
+* both so here we store both a DMA offset for direct mappings and
+* an iommu_table for remapped DMA.
 */
-   union {
-   dma_addr_t  dma_offset;
-   void*iommu_table_base;
-   } dma_data;
+   dma_addr_t  dma_offset;
+
+#ifdef CONFIG_PPC64
+   struct iommu_table  *iommu_table_base;
+#endif
 
 #ifdef CONFIG_IOMMU_API
void*iommu_domain;
diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 894d538..2e12366 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -21,12 +21,12 @@
 #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 
 /* Some dma direct funcs must be visible for use in other dma_ops */
-extern void *dma_direct_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
+extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
+dma_addr_t *dma_handle, gfp_t flag,
+struct dma_attrs *attrs);
+extern void __dma_direct_free_coherent(struct device *dev, size_t size,
+  void *vaddr, dma_addr_t dma_handle,
   struct dma_attrs *attrs);
-extern void dma_direct_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-struct dma_attrs *attrs);
 extern int dma_direct_mmap_coherent(struct device *dev,
struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t handle,
@@ -106,7 +106,7 @@ static inline void set_dma_ops(struct device *dev, struct 
dma_map_ops *ops)
 static inline dma_addr_t get_dma_offset(struct device *dev)
 {
if (dev)
-   return dev->archdata.dma_data.dma_offset;
+   return dev->archdata.dma_offset;
 
return PCI_DRAM_OFFSET;
 }
@@ -114,7 +114,7 @@ static inline dma_addr_t get_dma_offset(struct device *dev)
 static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 {
if (dev)
-   dev->archdata.dma_data.dma_offset = off;
+   dev->archdata.dma_offset = off;
 }
 
 /* this will be removed soon */
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e2abbe8..e67e678 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -2,17 +2,17 @@
  * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation
  * Rewrite, cleanup:
  * Copyright (C) 2004 Olof Johansson , IBM Corporation
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along wit

Re: [PATCH] powerpc/perf: Fix book3s kernel to userspace backtraces

2015-05-26 Thread Benjamin Herrenschmidt

On Tue, 2015-05-26 at 15:10 +1000, Anton Blanchard wrote:
> When we take a PMU exception or a software event we call
> perf_read_regs(). This overloads regs->result with a boolean that
> describes if we should use the sampled instruction address register
> (SIAR) or the regs.
> 
> If the exception is in kernel, we start with the kernel regs and
> backtrace through the kernel stack. At this point we switch to the
> userspace regs and backtrace the user stack with perf_callchain_user().
> 
> Unfortunately these regs have not got the perf_read_regs() treatment,
> so regs->result could be anything. If it is non zero,
> perf_instruction_pointer() decides to use the SIAR, and we get issues
> like this:

CC stable ?

> 0.11%  qemu-system-ppc  [kernel.kallsyms][k] _raw_spin_lock_irqsave
>|
>---_raw_spin_lock_irqsave
>   |
>   |--52.35%-- 0
>   |  |
>   |  |--46.39%-- __hrtimer_start_range_ns
>   |  |  kvmppc_run_core
>   |  |  kvmppc_vcpu_run_hv
>   |  |  kvmppc_vcpu_run
>   |  |  kvm_arch_vcpu_ioctl_run
>   |  |  kvm_vcpu_ioctl
>   |  |  do_vfs_ioctl
>   |  |  sys_ioctl
>   |  |  system_call
>   |  |  |
>   |  |  |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
>   |  |  |  |
>   |  |  |   --100.00%-- 0x7e714
>   |  |  | 0x7e714
> 
> Notice the bogus _raw_spin_irqsave when we transition from kernel
> (system_call) to userspace (0x7e714). We inserted what was in the SIAR.
> 
> Add a check in regs_use_siar() to check that the regs in question
> are from a PMU exception. With this fix the backtrace makes sense:
> 
>  0.47%  qemu-system-ppc  [kernel.vmlinux] [k] 
> _raw_spin_lock_irqsave
> |
> ---_raw_spin_lock_irqsave
>|
>|--53.83%-- 0
>|  |
>|  |--44.73%-- hrtimer_try_to_cancel
>|  |  kvmppc_start_thread
>|  |  kvmppc_run_core
>|  |  kvmppc_vcpu_run_hv
>|  |  kvmppc_vcpu_run
>|  |  kvm_arch_vcpu_ioctl_run
>|  |  kvm_vcpu_ioctl
>|  |  do_vfs_ioctl
>|  |  sys_ioctl
>|  |  system_call
>|  |  __ioctl
>|  |  0x7e714
>|  |  0x7e714
> 
> Signed-off-by: Anton Blanchard 
> ---
>  arch/powerpc/perf/core-book3s.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 12b6384..42a3725 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -131,7 +131,10 @@ static void pmao_restore_workaround(bool ebb) { }
>  
>  static bool regs_use_siar(struct pt_regs *regs)
>  {
> - return !!regs->result;
> + if ((TRAP(regs) == 0xf00) && regs->result)
> + return true;
> +
> + return false;
>  }
>  
>  /*


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] PCI: pciehp: Convert pciehp to be builtin only, not modular

2015-05-28 Thread Benjamin Herrenschmidt

On Thu, 2015-05-28 at 15:08 -0700, Yinghai Lu wrote:
> On Wed, May 27, 2015 at 6:30 PM, Rafael J. Wysocki  wrote:
> > On Wednesday, May 27, 2015 02:31:49 PM Bjorn Helgaas wrote:
> > For debug you can always use pcie_ports=compat and that will disable
> > pciehp too.
> 
> That will disable AER at the same time, right?.

For ppc machines under either our hypervisor or running bare metal with
our EEH-enabled bridges, both generic AER and hotplug code are going to
interfere.

EEH "subsumes" AER, and we have firmware interfaces to do hotplug that
know more than what the generic code can know about (such as on-board
GPIOs that can control PERST, it's not always via the hotplug registers
or hotplug from top-level slots which isn't the same as hotplug from
switch slots on our platforms).

So basically, at this point, we must not have the PCIe port drivers at
all, just just interfere.

We should replace that with the appropriate "hooks" for the platform to
disable selected functions (hotplug and AER) from the generic port
drivers.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cpuidle: powernv/pseries: Decrease the snooze residency

2015-05-30 Thread Benjamin Herrenschmidt

On Sat, 2015-05-30 at 11:31 +0530, Vaidyanathan Srinivasan wrote:
> In shared lpar case, spinning in guest context may potentially take
> away cycles from other lpars waiting to run on the same physical cpu.
> 
> So the policy in shared lpar case is to let PowerVM hypervisor know
> immediately that the guest cpu is idle which will allow the hypervisor
> to use the cycles for other tasks/lpars.

But that will have negative side effects under KVM no ?

Suresh mentioned something with his new directed interrupts code that we
had many cases where the interrupts ended up arriving shortly after we
exited to host for NAP'ing ...

Snooze might fix it...

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields && data tearing

2014-09-03 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:

> Apologies for hijacking this thread but I need to extend this discussion
> somewhat regarding what a compiler might do with adjacent fields in a 
> structure.
> 
> The tty subsystem defines a large aggregate structure, struct tty_struct.
> Importantly, several different locks apply to different fields within that
> structure; ie., a specific spinlock will be claimed before updating or 
> accessing
> certain fields while a different spinlock will be claimed before updating or
> accessing certain _adjacent_ fields.
> 
> What is necessary and sufficient to prevent accidental false-sharing?
> The patch below was flagged as insufficient on ia64, and possibly ARM.

We expect native aligned scalar types to be accessed atomically (the
read/modify/write of a larger quantity that gcc does on some bitfield
cases has been flagged as a gcc bug, but shouldn't happen on normal
scalar types).

I am not 100% certain of "bool" here, I assume it's treated as a normal
scalar and thus atomic but if unsure, you can always use int.

Another option is to use the atomic bitops and make these bits in a
bitmask but that is probably unnecessary if you have locks already.

Cheers,
Ben.


> Regards,
> Peter Hurley
> 
> --- >% ---
> Subject: [PATCH 21/26] tty: Convert tty_struct bitfield to bools
> 
> The stopped, hw_stopped, flow_stopped and packet bits are smp-unsafe
> and interrupt-unsafe. For example,
> 
> CPU 0 | CPU 1
>   |
> tty->flow_stopped = 1 | tty->hw_stopped = 0
> 
> One of these updates will be corrupted, as the bitwise operation
> on the bitfield is non-atomic.
> 
> Ensure each flag has a separate memory location, so concurrent
> updates do not corrupt orthogonal states.
> 
> Signed-off-by: Peter Hurley 
> ---
>  include/linux/tty.h | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/tty.h b/include/linux/tty.h
> index 1c3316a..7cf61cb 100644
> --- a/include/linux/tty.h
> +++ b/include/linux/tty.h
> @@ -261,7 +261,10 @@ struct tty_struct {
>   unsigned long flags;
>   int count;
>   struct winsize winsize; /* winsize_mutex */
> - unsigned char stopped:1, hw_stopped:1, flow_stopped:1, packet:1;
> + bool stopped;
> + bool hw_stopped;
> + bool flow_stopped;
> + bool packet;
>   unsigned char ctrl_status;  /* ctrl_lock */
>   unsigned int receive_room;  /* Bytes free for queue */
>   int flow_change;


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

TTM placement & caching issue/questions

2014-09-03 Thread Benjamin Herrenschmidt

Hi folks !

I've been tracking down some problems with the recent DRI on powerpc and
stumbled upon something that doesn't look right, and not necessarily
only for us.

Now it's possible that I haven't fully understood the code here and I
also don't know to what extent some of that behaviour is necessary for
some platforms such as Intel GTT bits.

What I've observed with a simple/dumb (no DMA) driver like AST (but this
probably happens more generally) is that when evicting a BO from VRAM
into System memory, the TTM tries to preserve the existing caching
attributes of the VRAM object.

From what I can tell, we end up with going from VRAM to System memory
type, and we eventually call ttm_bo_select_caching() to select the
caching option for the target.

This will, from what I can tell, try to use the same caching mode as the
original object:

if ((cur_placement & caching) != 0)
result |= (cur_placement & caching);

And cur_placement comes from bo->mem.placement which as far as I can
tell is based on the placement array which the drivers set up.

Now they tend to uniformly setup the placement for System memory as
TTM_PL_MASK_CACHING which enables all caching modes.

So I end up with, for example, my System memory BOs having
TTM_PL_FLAG_CACHED not set (though they also don't have
TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.

We don't seem to use the man->default_caching (which will have
TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
proposed placement and the existing caching mode.

Now this is a problem for several reason that I can think of:

 - On a number of powerpc platforms, such as all our server 64-bit one
for example, it's actually illegal to map system memory non-cached. The
system is fully cache coherent for all possible DMA originators (that we
care about at least) and mapping memory non-cachable while it's mapped
cachable in the linear mapping can cause nasty cache paradox which, when
detected by HW, can checkstop the system.

 - A similar issue exists, afaik, on ARM >= v7, so anything mapped
non-cachable must be removed from the linear mapping explicitly since
otherwise it can be speculatively prefetched into the cache.

 - I don't know about x86, but even then, it looks quite sub-optimal to
map the memory backing of the BOs and access it using a WC rather than a
cachable mapping attribute.

Now, some folks on IRC mentioned that there might be reasons for the
current behaviour as to not change the caching attributes when going
in/out of the GTT on Intel, I don't know how that relates and how that
works, but maybe that should be enforced by having a different placement
mask specifically on those chipsets.

Dave, should we change the various PCI drivers for generally coherent
devices such that the System memory type doesn't allow placements
without CACHED attribute ? Or at least on coherent platforms ? How do
detect that ? Should we have a TTM helper to establish the default
memory placement attributes that "normal PCI" drivers call to set that
up so we can have all the necessary arch ifdefs in one single place, at
least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?

Non-PCI and "special" drivers like Intel can use a different set of
placement attributes to represent the requirements of those specific
platforms (mostly thinking of embedded ARM here which under some
circumstances might actually require non-cached mappings).
Or am I missing another part of the puzzle ?

As it-is, things are broken for me even for dumb drivers, and I suspect
to a large extent with radeon and nouveau too, though in some case we
might get away with it most of the time ... until the machine locks up
for some unexplainable reason... This might cause problems on existing
distros such as RHEL7 with our radeon adapters even.

Any suggestion of what's the best approach to fix it ? I'm happy to
produce the patches but I'm not that familiar with the TTM so I would
like to make sure I'm the right track first :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-03 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 21:55 -0400, Jerome Glisse wrote:
> So i think we need to get a platform flags and or set_pages_array_wc|uc
> needs to fail and this would fallback to cached mapping if the fallback
> code still works. So if your arch properly return and error for those
> cache changing function then you should be fine.
> 
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

Can't I just filter the mem_type definitions in the mem_type_manager
with something along that totally untested patch ?

Or do I *also* need to make those set_page_array_* things fail ?

--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1308,6 +1308,24 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned 
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_filter_mem_type(struct ttm_bo_device *bdev, unsigned type,
+  struct ttm_mem_type_manager *man)
+{
+   /*
+* On some architectures/patforms, we cannot allow non-cachable
+* mappings of system memory. This can be a problem with AGP on
+* old G5 systems vs. TTM_PL_TT but we don't really have a choice
+* at this point on ppc64 at least and the AGP on these never
+* worked reliably anyway.
+*/
+#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
+   if (type == TTM_PL_SYSTEM || type == TTM_PL_TT) {
+   man->available_caching &= TTM_PL_FLAG_CACHED;
+   man->default_caching &= man->available_caching;
+   }
+#endif
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
unsigned long p_size)
 {
@@ -1327,6 +1345,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned ty
return ret;
man->bdev = bdev;
 
+   ttm_bo_filter_mem_type(bdev, type, man);
+
ret = 0;
if (type != TTM_PL_SYSTEM) {
ret = (*man->func->init)(man, p_size);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-03 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:

> So in the meantime the attached patch should work, it just silently ignore
> the caching attribute request on non x86 instead of pretending that things
> are setup as expected and then latter the radeon ou nouveau hw unsetting
> the snoop bit.
> 
> It's not tested but i think it should work.

I'm still getting placements with !CACHED going from bo_memcpy in
ttm_io_prot() though ... I'm looking at filtering the placement
attributes instead.

Ben.

> > 
> > Cheers,
> > Jérôme
> > 
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > > ___
> > > dri-devel mailing list
> > > dri-de...@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/dri-devel


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-03 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 22:36 -0400, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > > 
> > > > So in the meantime the attached patch should work, it just silently 
> > > > ignore
> > > > the caching attribute request on non x86 instead of pretending that 
> > > > things
> > > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > > the snoop bit.
> > > > 
> > > > It's not tested but i think it should work.
> > > 
> > > I'm still getting placements with !CACHED going from bo_memcpy in
> > > ttm_io_prot() though ... I'm looking at filtering the placement
> > > attributes instead.
> > > 
> > > Ben.
> > 
> > Ok so this one should do the trick.
> 
> Ok final version ... famous last word.

Minus a couple of obvious typos that prevent if from building, it seems
to do the trick for me with the AST driver, no more bad mappings.

I'll still send a patch that catches the incorrect mapping attempts
inside ttm_io_prot() and warns to help future debugging and avoid
"random" behaviour. (I need to fix other things in the powerpc code
in there anyway).

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
> > +#else /* CONFIG_X86 */
> > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
> *placement)
> > +{
> > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> > + ttm->caching_state = tt_cached;
> > + *placement &= ~TTM_PL_MASK_CACHING;
> > + *placement |= TTM_PL_FLAG_CACHED;
> 
> NAK, this will break AGP on PowerMacs.

 ... which doesn't work reliably anyway with DRI2 :-)

The problem is ... with DRI1 I think we had tricks to take out the
AGP from the linear mapping but that want away, didn't we ?

In any case, we are playing with fire on these by allowing the
cache paradox. It just happens that those old CPUs aren't *that*
aggressive at speculative prefetch and we probably rarely hit the
lockups that they would cause...

Michel, what do you recommend we do then ? The patch I sent to
double check in ttm_io_prot() has a specific hack to avoid warning
on PowerMac for the above reason, but we need to fix Jerome if we
want to keep that broken-by-design Mac AGP functionality going :-)

Maybe we could add a similar ifdef in the above ?

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:

> > This will, from what I can tell, try to use the same caching mode as the
> > original object:
> >
> > if ((cur_placement & caching) != 0)
> > result |= (cur_placement & caching);
> >
> > And cur_placement comes from bo->mem.placement which as far as I can
> > tell is based on the placement array which the drivers set up.
> 
> This originates from the fact that when evicting GTT memory, on x86 it's
> unnecessary and undesirable to switch caching mode when going to system.

But that's what I don't quite understand. We have two different mappings
here. The VRAM and the memory object. We wouldn't be "switching"... we
are creating a temporary mapping for the memory object in order to do
the memcpy, but we seem to be doing it by using the caching attributes
of the VRAM object or am I missing something ? I don't see how that
makes sense so I suppose I'm missing something here :-)

> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.

That's very strange indeed. It's certainly an x86 specific artifact,
even if we were allowed by our hypervisor to map memory non-cachable
(the HW somewhat can), we tend to have a higher throughput by going
cachable, but that could be due to the way the PowerBus works (it's
basically very biased toward cachable transactions).

> I dislike the approach of rewriting placements. In some cases I think it
> won't even work, because placements are declared 'static const'
> 
> What I'd suggest is instead to intercept the driver response from
> init_mem_type() and filter out undesired caching modes from
> available_caching and default_caching, 

This was my original intent but Jerome seems to have different ideas
(see his proposed patches). I'm happy to revive mine as well and post it
as an alternative after I've tested it a bit more (tomorrow).

> perhaps also looking at whether
> the memory type is mappable or not. This should have the additional
> benefit of working everywhere, and if a caching mode is selected that's
> not available on the platform, you'll simply get an error. (I guess?)

You mean that if not mappable we don't bother filtering ?

The rule is really for me pretty simple:

   - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable

   - If it's PCIe memory space (VRAM, registers, ...) it MUST be
non-cachable.

Cheers,
Ben.

> /Thomas
> 
> 
> >
> > Cheers,
> > Ben.
> >
> >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 16:59 +0900, Michel Dänzer wrote:
> 
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm 
> not alone, at least with AGP 1x it seems to work quite well for most 
> people. So I don't see the justification for intentionally breaking it 
> completely for all of us.

Oh I wasn't arguing for breaking it, just jesting. We need to keep it
working. It's amazing how well broken stuff actually work though :-)

I mean, it's architecturally broken and if we get a collision between
the cache and the NCU, the chip will crash. We just get lucky I suppose.

Anyway, I'll try a different approach tomorrow see how it goes.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> > Last time I tested, (and it seems like Michel is on the same track),
> > writing with the CPU to write-combined memory was substantially faster
> > than writing to cached memory, with the additional side-effect that CPU
> > caches are left unpolluted.
> > 
> > Moreover (although only tested on Intel's embedded chipsets), texturing
> > from cpu-cache-coherent PCI memory was a real GPU performance hog
> > compared to texturing from non-snooped memory. Hence, whenever a buffer
> > could be classified as GPU-read-only (or almost at least), it should be
> > placed in write-combined memory.
> 
> Just a quick comment since this explicitly referes to intel chips: On
> desktop/laptop chips with the big shared l3/l4 caches it's the other way
> round. Cached uploads are substantially faster than wc and not using
> coherent access is a severe perf hit for texturing. I guess the hw guys
> worked really hard to hide the snooping costs so that the gpu can benefit
> from the massive bandwidth these caches can provide.

This is similar to modern POWER chips as well. We have pretty big L3's
(though not technically shared they are in a separate quadrant and we
have a shared L4 in the memory buffer) and our fabric is generally
optimized for cachable/coherent access performance. In fact, we only
have so many credits for NC accesses on the bus...

What that tells me is that when setting up the desired cachability
attributes for the mapping of a memory object, we need to consider these
things here:

  - The hard requirement of the HW (non-coherent GPUs require NC, AGP
does in some cases, etc...) which I think is basically already handled
using the placement attributes set by the GPU driver for the memory type

  - The optimal attributes (and platform hard requirements) for fast
memory accesses to an object by the processor.  From what I read here,
this can be NC+WC on older Intel, cachable on newer, etc...)

  - The optimal attributes for fast GPU DMA accesses to the object in
system memory. Here too, this is fairly platform/chipset dependent.

Do we have flags in the DRM that tell us whether an object in memory is
more likely to be used by the GPU via DMA vs by the CPU via MMIO ? On
powerpc (except in the old AGP case), I wouldn't care about require
cachable in both case, but I can see the low latency crowd wanting the
former to be non-cachable while the dumb GPUs like AST who don't do DMA
would benefit greatly from the latter...

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields && data tearing

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 08:43 +, David Laight wrote:
> From:  Benjamin Herrenschmidt
> > On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:
> > 
> > > Apologies for hijacking this thread but I need to extend this discussion
> > > somewhat regarding what a compiler might do with adjacent fields in a 
> > > structure.
> > >
> > > The tty subsystem defines a large aggregate structure, struct tty_struct.
> > > Importantly, several different locks apply to different fields within that
> > > structure; ie., a specific spinlock will be claimed before updating or 
> > > accessing
> > > certain fields while a different spinlock will be claimed before updating 
> > > or
> > > accessing certain _adjacent_ fields.
> > >
> > > What is necessary and sufficient to prevent accidental false-sharing?
> > > The patch below was flagged as insufficient on ia64, and possibly ARM.
> > 
> > We expect native aligned scalar types to be accessed atomically (the
> > read/modify/write of a larger quantity that gcc does on some bitfield
> > cases has been flagged as a gcc bug, but shouldn't happen on normal
> > scalar types).
> 
> That isn't true on all architectures for items smaller than a machine word.
> At least one has to do rmw for byte accesses.

Yeah correct, alpha and bytes right ? Is there any other ? That's why I
suggested int.

>   David
> 
> > I am not 100% certain of "bool" here, I assume it's treated as a normal
> > scalar and thus atomic but if unsure, you can always use int.
> > 
> > Another option is to use the atomic bitops and make these bits in a
> > bitmask but that is probably unnecessary if you have locks already.
> > 
> > Cheers,
> > Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] cpuidle/powernv: Enter fastsleep on checking if deep idle states are allowed

2014-09-15 Thread Benjamin Herrenschmidt

On Mon, 2014-09-15 at 14:22 +0530, Preeti U Murthy wrote:
> > We were talking about getting rid of powersave_nap altogether, but I
> think we
> > decided we couldn't, I forget.
> 
> Isn't this a helpful knob to disable cpuidle at runtime? Currently we
> check the value of powersave_nap before entering both nap and
> fastsleep.

It's useful when diagnosing some problems that can be caused by idle
states.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/4] powerpc/powernv: Add OPAL check token call

2014-09-15 Thread Benjamin Herrenschmidt

On Mon, 2014-09-15 at 13:10 +1000, Michael Ellerman wrote:
> > +enum OpalCheckTokenStatus {
> > +   OPAL_TOKEN_ABSENT = 0,
> > +   OPAL_TOKEN_PRESENT = 1
> > +};
> 
> I don't see this used anywhere?
> 
> And NoCamelCase !
> 
> Yes I know there's lots in that file, but I didn't merge that :)

The original OPAL APIs were like that and I chose to not "fix" them
because I've been aiming at reconciling the Linux and the FW versions
of the file, which I haven't had a chance to do yet (Cyril & Sam have
some WIP in that area afaik).

We could de-camelify them both but I'd rather do the reunification
first.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/3] powerpc/ftrace: simplify prepare_ftrace_return

2014-09-23 Thread Benjamin Herrenschmidt

On Wed, 2014-09-24 at 12:22 +1000, Anton Blanchard wrote:
> Hi Steve,
> 
> > > This could be broken from the earlier patches, I haven't run just
> > > this test. I probably should on them.
> > 
> > I went back and tested, and it breaks under the first patch.
> 
> Thanks for testing. It looks like some toolchains have issues
> other than the -fno-no-omit-frame-pointer one, and -mno-sched-epilog
> works around it.
> 
> I'll drop that patch and respin.

Or maybe do a toolchain check / or enable it in LE ?

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2] powerpc/powernv: Override dma_get_required_mask()

2014-09-25 Thread Benjamin Herrenschmidt

On Thu, 2014-09-25 at 17:24 +1000, Gavin Shan wrote:
> When using bypass window on IODA2, the incorrect DMA operations
> "dma_iommu_ops" is used by devices. The device driver calls
> dma_get_required_mask() to determine using 32-bits or 64-bits
> bypass DMA window. Unfortunately, the returned DMA mask always
> forces the driver to use 32-bits DMA window. The problem was
> reported on the device as follows:

I would write the above comment a bit differently:

The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of memory.
This is a bit of a strange semantic when we have to chose between iommu
translation or bypass, but essentially what it means is "what DMA mask
will give best performances".

Currently, our iommu backend always returns a 32-bit mask here, we don't
do anything special to it when we have bypass available. This causes
some drivers to chose a 32-bit mask, thus losing the ability to use the
bypass window, thinking this is more efficient.

This patch adds an override of that function in order to, instead,
return a 64-bit mask whenever a bypass window is available in order for
drivers to prefer this configuration.

 ... or something along those lines.

> 0004:03:00.0 0107: 1000:0087 (rev 05)
> 0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
>  Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
> 
> The patch fixes above issue by overriding dma_get_required_mask(),
> which returns mask corresponding to bypass window base. Otherwise,
> "dma_iommu_ops::get_required_mask" will be called to return mask
> corresponding to 32-bits DMA window.
> 
> Reported-by: Murali N. Iyer 
> Signed-off-by: Gavin Shan 
> ---
>  arch/powerpc/include/asm/dma-mapping.h|  1 +
>  arch/powerpc/kernel/dma.c | 14 ++
>  arch/powerpc/platforms/powernv/pci-ioda.c | 23 +++
>  arch/powerpc/platforms/powernv/pci.c  | 11 +++
>  arch/powerpc/platforms/powernv/pci.h  |  2 ++
>  arch/powerpc/platforms/powernv/powernv.h  |  6 ++
>  arch/powerpc/platforms/powernv/setup.c|  9 +
>  7 files changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 150866b..894d538 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -135,6 +135,7 @@ static inline int dma_supported(struct device *dev, u64 
> mask)
>  
>  extern int dma_set_mask(struct device *dev, u64 dma_mask);
>  extern int __dma_set_mask(struct device *dev, u64 dma_mask);
> +extern u64 __dma_get_required_mask(struct device *dev);
>  
>  #define dma_alloc_coherent(d,s,h,f)  dma_alloc_attrs(d,s,h,f,NULL)
>  
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index ee78f6e..210ff9d 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -202,6 +202,7 @@ int __dma_set_mask(struct device *dev, u64 dma_mask)
>   *dev->dma_mask = dma_mask;
>   return 0;
>  }
> +
>  int dma_set_mask(struct device *dev, u64 dma_mask)
>  {
>   if (ppc_md.dma_set_mask)
> @@ -210,13 +211,10 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
>  }
>  EXPORT_SYMBOL(dma_set_mask);
>  
> -u64 dma_get_required_mask(struct device *dev)
> +u64 __dma_get_required_mask(struct device *dev)
>  {
>   struct dma_map_ops *dma_ops = get_dma_ops(dev);
>  
> - if (ppc_md.dma_get_required_mask)
> - return ppc_md.dma_get_required_mask(dev);
> -
>   if (unlikely(dma_ops == NULL))
>   return 0;
>  
> @@ -225,6 +223,14 @@ u64 dma_get_required_mask(struct device *dev)
>  
>   return DMA_BIT_MASK(8 * sizeof(dma_addr_t));
>  }
> +
> +u64 dma_get_required_mask(struct device *dev)
> +{
> + if (ppc_md.dma_get_required_mask)
> + return ppc_md.dma_get_required_mask(dev);
> +
> + return __dma_get_required_mask(dev);
> +}
>  EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 36b1a7a..380ebc9 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -890,6 +890,28 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
>   return 0;
>  }
>  
> +static u64 pnv_pci_ioda_dma_get_required_mask(struct pnv_phb *phb,
> +   struct pci_dev *pdev)
> +{
> + struct pci_dn *pdn = pci_get_pdn(pdev);
> + struct pnv_ioda_pe *pe;
> + u64 end, mask;
> +
> + if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
> + return 0;
> +
> + pe = &phb->ioda.pe_array[pdn->pe_number];
> + if (!pe->tce_bypass_enabled)
> + return __dma_get_required_mask(&pdev->dev);
> +
> +
> + end = pe->tce_bypass_base + memblock_end_of_DRAM();
> + mask

[PATCH 1/4] pci/msi: Move "force_32bit_msi" flag from powerpc to generic pci_dev

2014-09-30 Thread Benjamin Herrenschmidt


Some devices have broken 64-bit MSI support which only support some
address bits (40 to 48 typically). This doesn't work on some platforms
such as POWER servers, so we need a quirk.

Currently we keep a flag in a powerpc specific data structure which we
have per PCI device. However this is impractical as we really want the
driver to set that flag appropriately (and the driver shouldn't touch
that arch specific data structure).

It's also not unlikely that this limitation will affect other architectures
in the long run so may as well be prepared for it.

So this moves the flag to struct pci_dev instead and adjusts the
corresponding arch/powerpc code to look for it there. At this point,
there is no attempt at making other architectures honor it just yet
though from what I can tell, x86 seems to always use 32-bit addresses
for MSIs.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

Note: CC'ing stable as I really want this to hit distros, without that
(and the three subsequent patches), we crash during boot with a number
of radeon cards on power machines.

Note2: Alex, we can wait for the response of the HW guys for which revisions
actually need the quirk in hda_intel but I don't see a big risk or issue in
just doing it for all AMD/ATI for now and fix that up later

 arch/powerpc/include/asm/pci-bridge.h | 2 --
 arch/powerpc/kernel/pci_64.c  | 5 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 +--
 arch/powerpc/platforms/powernv/pci.c  | 3 +--
 arch/powerpc/platforms/pseries/msi.c  | 2 +-
 include/linux/pci.h   | 1 +
 6 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..725247b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -159,8 +159,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   boolforce_32bit_msi;
-
struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 155013d..a6ce5fe 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -269,10 +269,7 @@ EXPORT_SYMBOL(pcibus_to_node);
 
 static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
-   struct pci_dn *pdn = pci_get_pdn(dev);
-
-   if (pdn)
-   pdn->force_32bit_msi = true;
+   dev->force_32bit_msi = true;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..9d98475 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1311,7 +1311,6 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
  unsigned int is_64, struct msi_msg *msg)
 {
struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
-   struct pci_dn *pdn = pci_get_pdn(dev);
struct irq_data *idata;
struct irq_chip *ichip;
unsigned int xive_num = hwirq - phb->msi_base;
@@ -1327,7 +1326,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
return -ENXIO;
 
/* Force 32-bit MSI on some broken devices */
-   if (pdn && pdn->force_32bit_msi)
+   if (dev->force_32bit_msi)
is_64 = 0;
 
/* Assign XIVE to PE */
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index b854b57..4e43a6f 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -50,9 +50,8 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int 
nvec, int type)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
-   struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
+   if (pdev->force_32bit_msi && !phb->msi32_support)
return -ENODEV;
 
return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 18ff462..b3f2c1a 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -429,7 +429,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
 */
 again:
if (type == PCI_CAP_ID_MSI) {
-   if (pdn->force_32bit_msi) {
+   if (pdev->force_32bit_msi) {
rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);

[PATCH 2/4] gpu/radeon: Generalize 64-bit MSI quirks

2014-09-30 Thread Benjamin Herrenschmidt


A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the radeon driver

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 arch/powerpc/kernel/pci_64.c|  1 -
 drivers/gpu/drm/radeon/radeon_irq_kms.c | 10 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index a6ce5fe..66e1cd0 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -271,5 +271,4 @@ static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
dev->force_32bit_msi = true;
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807af..7ea1ea9 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -202,6 +202,16 @@ static bool radeon_msi_ok(struct radeon_device *rdev)
if (rdev->flags & RADEON_IS_AGP)
return false;
 
+   /*
+* Older chips have a HW limitation, they can only generate 40 bits
+* of address for "64-bit" MSIs which breaks on some platforms, notably
+* IBM POWER servers, so we limit them
+*/
+   if (rdev->family < CHIP_BONAIRE) {
+   dev_info(rdev->dev, "radeon: MSI limited to 32-bit\n");
+   rdev->pdev->force_32bit_msi = true;
+   }
+
/* force MSI on */
if (radeon_msi == 1)
return true;



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/4] sounds/hda/radeon: Disable 64-bit DMA on radeon

2014-09-30 Thread Benjamin Herrenschmidt

The chipset has a limitation in the number of address bits it
can generate. The graphics portion uses a specific mask of
40 or 48 bits depending on the generation. For audio, it's a bit
less of an issue, so just mark them as no-64bit for now.

Without this, it crashes on POWER machines which can use high bits
in the DMA address to distinguish between DMA windows. 

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 sound/pci/hda/hda_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 3e6d22d..2b679d5 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -297,7 +297,7 @@ enum {
 /* quirks for ATI/AMD HDMI */
 #define AZX_DCAPS_PRESET_ATI_HDMI \
(AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB|\
-AZX_DCAPS_NO_MSI64)
+AZX_DCAPS_NO_MSI64 | AZX_DCAPS_NO_64BIT)
 
 /* quirks for Nvidia */
 #define AZX_DCAPS_PRESET_NVIDIA \


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/4] sound/hda/radeon: Generalize 64-bit MSI quirks

2014-09-30 Thread Benjamin Herrenschmidt

A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the HDA driver when
detecting the radeon audio interface.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 arch/powerpc/kernel/pci_64.c |  6 --
 sound/pci/hda/hda_intel.c| 10 --
 sound/pci/hda/hda_priv.h |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 66e1cd0..b15194e 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -266,9 +266,3 @@ int pcibus_to_node(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pcibus_to_node);
 #endif
-
-static void quirk_radeon_32bit_msi(struct pci_dev *dev)
-{
-   dev->force_32bit_msi = true;
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index aa302fb..3e6d22d 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -296,7 +296,8 @@ enum {
 
 /* quirks for ATI/AMD HDMI */
 #define AZX_DCAPS_PRESET_ATI_HDMI \
-   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB)
+   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB|\
+AZX_DCAPS_NO_MSI64)
 
 /* quirks for Nvidia */
 #define AZX_DCAPS_PRESET_NVIDIA \
@@ -1505,9 +1506,14 @@ static int azx_first_init(struct azx *chip)
return -ENXIO;
}
 
-   if (chip->msi)
+   if (chip->msi) {
+   if (chip->driver_caps & AZX_DCAPS_NO_MSI64) {
+   dev_dbg(card->dev, "Disabling 64bit MSI\n");
+   pci->force_32bit_msi = true;
+   }
if (pci_enable_msi(pci) < 0)
chip->msi = 0;
+   }
 
if (azx_acquire_irq(chip, 0) < 0)
return -EBUSY;
diff --git a/sound/pci/hda/hda_priv.h b/sound/pci/hda/hda_priv.h
index 949cd43..5016014 100644
--- a/sound/pci/hda/hda_priv.h
+++ b/sound/pci/hda/hda_priv.h
@@ -171,6 +171,7 @@ enum { SDI0, SDI1, SDI2, SDI3, SDO0, SDO1, SDO2, SDO3 };
 #define AZX_DCAPS_PM_RUNTIME   (1 << 26)   /* runtime PM support */
 #define AZX_DCAPS_I915_POWERWELL (1 << 27) /* HSW i915 powerwell support */
 #define AZX_DCAPS_CORBRP_SELF_CLEAR (1 << 28)  /* CORBRP clears itself after 
reset */
+#define AZX_DCAPS_NO_MSI64  (1 << 29)  /* Stick to 32-bit MSIs */
 
 /* HD Audio class code */
 #define PCI_CLASS_MULTIMEDIA_HD_AUDIO  0x0403



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 02/17] powerpc/cell: Move data segment faulting code out of cell platform

2014-09-30 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 16:47 +1000, Michael Ellerman wrote:
> 
> If we give it a name that says what it does, we get
> copro_get_ea_esid_and_vsid().
> Or something equally ugly.

copro_calc_full_va() ?

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] sounds/hda/radeon: Disable 64-bit DMA on radeon

2014-10-01 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 09:38 +0200, Takashi Iwai wrote:

> > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > index 3e6d22d..2b679d5 100644
> > --- a/sound/pci/hda/hda_intel.c
> > +++ b/sound/pci/hda/hda_intel.c
> > @@ -297,7 +297,7 @@ enum {
> >  /* quirks for ATI/AMD HDMI */
> >  #define AZX_DCAPS_PRESET_ATI_HDMI \
> > (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB|\
> > -AZX_DCAPS_NO_MSI64)
> > +AZX_DCAPS_NO_MSI64 | AZX_DCAPS_NO_64BIT)
> 
> The only concern is that this will disable 64bit DMA also on x86 where
> it has been working fine.  Can we add an ifdef CONFIG_PPC for this?

I don't like that approach because technically the chip doesn't do
64-bit DMA ... it does something like 40 or 48 (might actually depend on
the chip version) and for all I know it will break on future x86 with
more memory or other platforms with similar address encodings as
powerpc...

The right thing might be to get the exact number of bits and do the
appropriate dma_set_mask() like the graphics driver does, but that's a
bit tricky unless we add a DMA mask field in that big array of chips in
there...

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] sounds/hda/radeon: Disable 64-bit DMA on radeon

2014-10-01 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 17:41 +1000, Benjamin Herrenschmidt wrote:

> I don't like that approach because technically the chip doesn't do
> 64-bit DMA ... it does something like 40 or 48 (might actually depend on
> the chip version) and for all I know it will break on future x86 with
> more memory or other platforms with similar address encodings as
> powerpc...
> 
> The right thing might be to get the exact number of bits and do the
> appropriate dma_set_mask() like the graphics driver does, but that's a
> bit tricky unless we add a DMA mask field in that big array of chips in
> there...

Note that I made the assumption that in the grand scheme of things, HDMI
audio isn't a very big throughput device and thus the extra iommu
mappings (or swiotlb) shouldn't hurt that much. But I may be wrong...

So by all means, let me know what you want done here, but I don't think
CONFIG_PPC is the right thing to do here, especially since this is not
even something that affects all PPCs, but specifically the POWER server
PCI bridges for which we configure a 64-bit window high up in the
address space.

FYI: Basically we setup PCI DMA addresses 0...2G as a 32-bit IOMMU
remapped window and we also configure a "direct map" window higher up
that bypasses the IOMMU and provides a direct linear mapping of RAM for
64-bit capable devices. The problem is that the HW on those server chips
requires that second window to have address bit 59 set. Then based on
the DMA mask, the arch code provide the device with DMA ops that will
either go through the iommu remap window or through the bypass window
based on the DMA mask set by the driver.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] sounds/hda/radeon: Disable 64-bit DMA on radeon

2014-10-01 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 13:58 -0400, Alex Deucher wrote:
> Patch looks good.  Audio DMAs are limited to 40 bits, same as the GPU
> side.  I'm still waiting to hear back on the MSIs for audio, but they
> probably follow the GPU side, so I expect they should be fixed on Sea
> Islands as well.

In the audio driver we don't have the "family names", just a list of
PCI IDs, do you happen to know which ones are unaffected ?

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/4] pci/msi: Move "force_32bit_msi" flag from powerpc to generic pci_dev

2014-10-01 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 14:33 -0600, Bjorn Helgaas wrote:
> I like the idea of handling this more generically, e.g., with a bit like
> this in struct pci_dev (I'd probably name it something like "no_64bit_msi"
> along the lines of your driver #defines).
> 
> What I don't like is that we haven't done anything to help other
> architectures, because the only code that *looks* at this bit is in
> arch/powerpc.  The next arch that tries to use 64-bit MSI addresses for
> these devices will trip over the same problem.

I started looking. From my (limited) understanding of x86, it doesn't
even look at the "size" of MSIs which makes me think it's always 32-bit,
and I plan to look at the others next (though not for stable).

> Can we check in pci_enable_msi_range() and pci_enable_msix_range() whether
> the MSI addresses allocated by the arch are too big, and fail the call if
> they are?

Yes, good idea, I'll add something.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 4/4] sounds/hda/radeon: Disable 64-bit DMA on radeon

2014-10-01 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 08:08 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-10-01 at 13:58 -0400, Alex Deucher wrote:
> > Patch looks good.  Audio DMAs are limited to 40 bits, same as the GPU
> > side.  I'm still waiting to hear back on the MSIs for audio, but they
> > probably follow the GPU side, so I expect they should be fixed on Sea
> > Islands as well.
> 
> In the audio driver we don't have the "family names", just a list of
> PCI IDs, do you happen to know which ones are unaffected ?

So I tried to be more discriminate but in the end, I had to give up, the
most recent ID we have in the driver seems to be Cap Verde/Pitcairn,
which isn't fixed. So either we are missing IDs or the new cards use the
same ID.

I think the consequences of being over-zealous here are nil on what
matters, ie, x86. AFAIK, x86 always uses 32-bit addresses for MSIs
anyway, so it's unaffected and ARM ... well, I doubt it will be. In
general, I don't think we have a big issue here by just flagging them
all.

I'll send an updated series with just the flag name changed as suggested
by Alex and the DMA bits updated as suggested by Takashi.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/4] pci/msi: Move "no_64bit_msi" flag from powerpc to generic pci_dev

2014-10-01 Thread Benjamin Herrenschmidt

Some devices have broken 64-bit MSI support which only support some
address bits (40 to 48 typically). This doesn't work on some platforms
such as POWER servers, so we need a quirk.

Currently we keep a flag in a powerpc specific data structure which we
have per PCI device. However this is impractical as we really want the
driver to set that flag appropriately (and the driver shouldn't touch
that arch specific data structure).

It's also not unlikely that this limitation will affect other architectures
in the long run.

So this moves the flag to struct pci_dev instead and adjusts the
corresponding arch/powerpc code to look for it there. At this point,
there is no attempt at making other architectures honor it just yet,
though it appears that x86 doesn't care and always generates 32-bit
MSI addresses.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

v2: Rename flag to "no_64bit_msi" as suggested by Bjorn

 arch/powerpc/include/asm/pci-bridge.h | 2 --
 arch/powerpc/kernel/pci_64.c  | 5 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 +--
 arch/powerpc/platforms/powernv/pci.c  | 3 +--
 arch/powerpc/platforms/pseries/msi.c  | 2 +-
 include/linux/pci.h   | 1 +
 6 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..725247b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -159,8 +159,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   boolforce_32bit_msi;
-
struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 155013d..d41a831 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -269,10 +269,7 @@ EXPORT_SYMBOL(pcibus_to_node);
 
 static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
-   struct pci_dn *pdn = pci_get_pdn(dev);
-
-   if (pdn)
-   pdn->force_32bit_msi = true;
+   dev->no_64bit_msi = true;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..a188bb8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1311,7 +1311,6 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
  unsigned int is_64, struct msi_msg *msg)
 {
struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
-   struct pci_dn *pdn = pci_get_pdn(dev);
struct irq_data *idata;
struct irq_chip *ichip;
unsigned int xive_num = hwirq - phb->msi_base;
@@ -1327,7 +1326,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
return -ENXIO;
 
/* Force 32-bit MSI on some broken devices */
-   if (pdn && pdn->force_32bit_msi)
+   if (dev->no_64bit_msi)
is_64 = 0;
 
/* Assign XIVE to PE */
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index b854b57..89c6608 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -50,9 +50,8 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int 
nvec, int type)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
-   struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
+   if (pdev->no_64bit_msi && !phb->msi32_support)
return -ENODEV;
 
return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 18ff462..6fd96d8 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -429,7 +429,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
 */
 again:
if (type == PCI_CAP_ID_MSI) {
-   if (pdn->force_32bit_msi) {
+   if (pdev->no_64bit_msi) {
rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
if (rc < 0) {
/*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 96453f9..fc938003 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -331,6 +331,7 @@ struct pci_dev {
unsigned intis_added:1;
unsigned intis_busmaster:1; /* device is busmaster */
unsigned

[PATCH v2 2/4] gpu/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-01 Thread Benjamin Herrenschmidt

A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the radeon driver

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

v2: This is just adjusted to the new flag name

 arch/powerpc/kernel/pci_64.c|  1 -
 drivers/gpu/drm/radeon/radeon_irq_kms.c | 10 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index d41a831..5330f6d 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -271,5 +271,4 @@ static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
dev->no_64bit_msi = true;
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807af..e760671 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -202,6 +202,16 @@ static bool radeon_msi_ok(struct radeon_device *rdev)
if (rdev->flags & RADEON_IS_AGP)
return false;
 
+   /*
+* Older chips have a HW limitation, they can only generate 40 bits
+* of address for "64-bit" MSIs which breaks on some platforms, notably
+* IBM POWER servers, so we limit them
+*/
+   if (rdev->family < CHIP_BONAIRE) {
+   dev_info(rdev->dev, "radeon: MSI limited to 32-bit\n");
+   rdev->pdev->no_64bit_msi = true;
+   }
+
/* force MSI on */
if (radeon_msi == 1)
return true;


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/4] ALSA: hda - Limit 40bit DMA for AMD HDMI controllers

2014-10-01 Thread Benjamin Herrenschmidt


AMD/ATI HDMI controller chip models, we already have a filter to lower
to 32bit DMA, but the rest are supposed to be working with 64bit
although the hardware doesn't really work with 63bit but only with 40
or 48bit DMA.  In this patch, we take 40bit DMA for safety for the
AMD/ATI controllers as the graphics drivers does.

Signed-off-by: Takashi Iwai 
Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

Tested, works fine. This patch is actually independent of the rest
of the series

 sound/pci/hda/hda_intel.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f91ba7f..48d0f30 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1483,6 +1483,7 @@ static int azx_first_init(struct azx *chip)
struct snd_card *card = chip->card;
int err;
unsigned short gcap;
+   unsigned int dma_bits = 64;
 
 #if BITS_PER_LONG != 64
/* Fix up base address on ULI M5461 */
@@ -1524,9 +1525,14 @@ static int azx_first_init(struct azx *chip)
gcap = azx_readw(chip, GCAP);
dev_dbg(card->dev, "chipset global capabilities = 0x%x\n", gcap);
 
+   /* AMD devices support 40 or 48bit DMA, take the safe one */
+   if (chip->pci->vendor == PCI_VENDOR_ID_AMD)
+   dma_bits = 40;
+
/* disable SB600 64bit support for safety */
if (chip->pci->vendor == PCI_VENDOR_ID_ATI) {
struct pci_dev *p_smbus;
+   dma_bits = 40;
p_smbus = pci_get_device(PCI_VENDOR_ID_ATI,
 PCI_DEVICE_ID_ATI_SBX00_SMBUS,
 NULL);
@@ -1556,9 +1562,11 @@ static int azx_first_init(struct azx *chip)
}
 
/* allow 64bit DMA address if supported by H/W */
-   if ((gcap & AZX_GCAP_64OK) && !pci_set_dma_mask(pci, DMA_BIT_MASK(64)))
-   pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(64));
-   else {
+   if (!(gcap & AZX_GCAP_64OK))
+   dma_bits = 32;
+   if (!pci_set_dma_mask(pci, DMA_BIT_MASK(dma_bits))) {
+   pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(dma_bits));
+   } else {
pci_set_dma_mask(pci, DMA_BIT_MASK(32));
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(32));
}




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 3/4] sound/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-01 Thread Benjamin Herrenschmidt


A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the audio driver.

While recent ASICs have that problem fixed, they don't seem to
be listed in the PCI IDs of the current driver, so let's quirk all
the ATI HDMI for now. The consequences are nil on x86 anyway.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 arch/powerpc/kernel/pci_64.c |  6 --
 sound/pci/hda/hda_intel.c| 10 --
 sound/pci/hda/hda_priv.h |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 5330f6d..b15194e 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -266,9 +266,3 @@ int pcibus_to_node(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pcibus_to_node);
 #endif
-
-static void quirk_radeon_32bit_msi(struct pci_dev *dev)
-{
-   dev->no_64bit_msi = true;
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index aa302fb..f91ba7f 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -296,7 +296,8 @@ enum {
 
 /* quirks for ATI/AMD HDMI */
 #define AZX_DCAPS_PRESET_ATI_HDMI \
-   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB)
+   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | AZX_DCAPS_POSFIX_LPIB|\
+AZX_DCAPS_NO_MSI64)
 
 /* quirks for Nvidia */
 #define AZX_DCAPS_PRESET_NVIDIA \
@@ -1505,9 +1506,14 @@ static int azx_first_init(struct azx *chip)
return -ENXIO;
}
 
-   if (chip->msi)
+   if (chip->msi) {
+   if (chip->driver_caps & AZX_DCAPS_NO_MSI64) {
+   dev_dbg(card->dev, "Disabling 64bit MSI\n");
+   pci->no_64bit_msi = true;
+   }
if (pci_enable_msi(pci) < 0)
chip->msi = 0;
+   }
 
if (azx_acquire_irq(chip, 0) < 0)
return -EBUSY;
diff --git a/sound/pci/hda/hda_priv.h b/sound/pci/hda/hda_priv.h
index 949cd43..5016014 100644
--- a/sound/pci/hda/hda_priv.h
+++ b/sound/pci/hda/hda_priv.h
@@ -171,6 +171,7 @@ enum { SDI0, SDI1, SDI2, SDI3, SDO0, SDO1, SDO2, SDO3 };
 #define AZX_DCAPS_PM_RUNTIME   (1 << 26)   /* runtime PM support */
 #define AZX_DCAPS_I915_POWERWELL (1 << 27) /* HSW i915 powerwell support */
 #define AZX_DCAPS_CORBRP_SELF_CLEAR (1 << 28)  /* CORBRP clears itself after 
reset */
+#define AZX_DCAPS_NO_MSI64  (1 << 29)  /* Stick to 32-bit MSIs */
 
 /* HD Audio class code */
 #define PCI_CLASS_MULTIMEDIA_HD_AUDIO  0x0403




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 2/4] gpu/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-01 Thread Benjamin Herrenschmidt


A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the radeon driver

Signed-off-by: Benjamin Herrenschmidt 
---

v2: This is just adjusted to the new flag name

 arch/powerpc/kernel/pci_64.c|  1 -
 drivers/gpu/drm/radeon/radeon_irq_kms.c | 10 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index d41a831..5330f6d 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -271,5 +271,4 @@ static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
dev->no_64bit_msi = true;
 }
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807af..e760671 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -202,6 +202,16 @@ static bool radeon_msi_ok(struct radeon_device *rdev)
if (rdev->flags & RADEON_IS_AGP)
return false;
 
+   /*
+* Older chips have a HW limitation, they can only generate 40 bits
+* of address for "64-bit" MSIs which breaks on some platforms, notably
+* IBM POWER servers, so we limit them
+*/
+   if (rdev->family < CHIP_BONAIRE) {
+   dev_info(rdev->dev, "radeon: MSI limited to 32-bit\n");
+   rdev->pdev->no_64bit_msi = true;
+   }
+
/* force MSI on */
if (radeon_msi == 1)
return true;



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 1/4] pci/msi: Move "no_64bit_msi" flag from powerpc to generic pci_dev

2014-10-01 Thread Benjamin Herrenschmidt


Some devices have broken 64-bit MSI support which only support some
address bits (40 to 48 typically). This doesn't work on some platforms
such as POWER servers, so we need a quirk.

Currently we keep a flag in a powerpc specific data structure which we
have per PCI device. However this is impractical as we really want the
driver to set that flag appropriately (and the driver shouldn't touch
that arch specific data structure).

It's also not unlikely that this limitation will affect other architectures
in the long run.

So this moves the flag to struct pci_dev instead and adjusts the
corresponding arch/powerpc code to look for it there. At this point,
there is no attempt at making other architectures honor it just yet,
though it appears that x86 doesn't care and always generates 32-bit
MSI addresses.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

v2: Rename flag to "no_64bit_msi" as suggested by Bjorn

 arch/powerpc/include/asm/pci-bridge.h | 2 --
 arch/powerpc/kernel/pci_64.c  | 5 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 +--
 arch/powerpc/platforms/powernv/pci.c  | 3 +--
 arch/powerpc/platforms/pseries/msi.c  | 2 +-
 include/linux/pci.h   | 1 +
 6 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..725247b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -159,8 +159,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   boolforce_32bit_msi;
-
struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 155013d..d41a831 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -269,10 +269,7 @@ EXPORT_SYMBOL(pcibus_to_node);
 
 static void quirk_radeon_32bit_msi(struct pci_dev *dev)
 {
-   struct pci_dn *pdn = pci_get_pdn(dev);
-
-   if (pdn)
-   pdn->force_32bit_msi = true;
+   dev->no_64bit_msi = true;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index df241b1..a188bb8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1311,7 +1311,6 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
  unsigned int is_64, struct msi_msg *msg)
 {
struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
-   struct pci_dn *pdn = pci_get_pdn(dev);
struct irq_data *idata;
struct irq_chip *ichip;
unsigned int xive_num = hwirq - phb->msi_base;
@@ -1327,7 +1326,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, 
struct pci_dev *dev,
return -ENXIO;
 
/* Force 32-bit MSI on some broken devices */
-   if (pdn && pdn->force_32bit_msi)
+   if (dev->no_64bit_msi)
is_64 = 0;
 
/* Assign XIVE to PE */
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index b854b57..89c6608 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -50,9 +50,8 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int 
nvec, int type)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
-   struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
+   if (pdev->no_64bit_msi && !phb->msi32_support)
return -ENODEV;
 
return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 18ff462..6fd96d8 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -429,7 +429,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
 */
 again:
if (type == PCI_CAP_ID_MSI) {
-   if (pdn->force_32bit_msi) {
+   if (pdev->no_64bit_msi) {
rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
if (rc < 0) {
/*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 96453f9..fc938003 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -331,6 +331,7 @@ struct pci_dev {
unsigned intis_added:1;
unsigned intis_busmaster:1; /* device is busmaster */
unsigned

Re: [PATCH v2 3/4] sound/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 09:59 +0200, Takashi Iwai wrote:

> > The attached updated patch only flags the affected asics.
> 
> Alex's patch looks more comprehensive.  For either patch,
> 
> Reviewed-by: Takashi Iwai 

Absolutely. I'll give it a spin tomorrow to be 100% sure. From there,
it will need the previous patch moving the quirk so I'm thinking of
merging it via Bjorn or my tree unless objection...

Cheers,
Ben.

> 
> thanks,
> 
> Takashi
> 
> > 
> > Alex
> > 
> > > ---
> > >  arch/powerpc/kernel/pci_64.c |  6 --
> > >  sound/pci/hda/hda_intel.c| 10 --
> > >  sound/pci/hda/hda_priv.h |  1 +
> > >  3 files changed, 9 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> > > index 5330f6d..b15194e 100644
> > > --- a/arch/powerpc/kernel/pci_64.c
> > > +++ b/arch/powerpc/kernel/pci_64.c
> > > @@ -266,9 +266,3 @@ int pcibus_to_node(struct pci_bus *bus)
> > >  }
> > >  EXPORT_SYMBOL(pcibus_to_node);
> > >  #endif
> > > -
> > > -static void quirk_radeon_32bit_msi(struct pci_dev *dev)
> > > -{
> > > -   dev->no_64bit_msi = true;
> > > -}
> > > -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, 
> > > quirk_radeon_32bit_msi);
> > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > index aa302fb..f91ba7f 100644
> > > --- a/sound/pci/hda/hda_intel.c
> > > +++ b/sound/pci/hda/hda_intel.c
> > > @@ -296,7 +296,8 @@ enum {
> > >
> > >  /* quirks for ATI/AMD HDMI */
> > >  #define AZX_DCAPS_PRESET_ATI_HDMI \
> > > -   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | 
> > > AZX_DCAPS_POSFIX_LPIB)
> > > +   (AZX_DCAPS_NO_TCSEL | AZX_DCAPS_SYNC_WRITE | 
> > > AZX_DCAPS_POSFIX_LPIB|\
> > > +AZX_DCAPS_NO_MSI64)
> > >
> > >  /* quirks for Nvidia */
> > >  #define AZX_DCAPS_PRESET_NVIDIA \
> > > @@ -1505,9 +1506,14 @@ static int azx_first_init(struct azx *chip)
> > > return -ENXIO;
> > > }
> > >
> > > -   if (chip->msi)
> > > +   if (chip->msi) {
> > > +   if (chip->driver_caps & AZX_DCAPS_NO_MSI64) {
> > > +   dev_dbg(card->dev, "Disabling 64bit MSI\n");
> > > +   pci->no_64bit_msi = true;
> > > +   }
> > > if (pci_enable_msi(pci) < 0)
> > > chip->msi = 0;
> > > +   }
> > >
> > > if (azx_acquire_irq(chip, 0) < 0)
> > > return -EBUSY;
> > > diff --git a/sound/pci/hda/hda_priv.h b/sound/pci/hda/hda_priv.h
> > > index 949cd43..5016014 100644
> > > --- a/sound/pci/hda/hda_priv.h
> > > +++ b/sound/pci/hda/hda_priv.h
> > > @@ -171,6 +171,7 @@ enum { SDI0, SDI1, SDI2, SDI3, SDO0, SDO1, SDO2, SDO3 
> > > };
> > >  #define AZX_DCAPS_PM_RUNTIME   (1 << 26)   /* runtime PM support */
> > >  #define AZX_DCAPS_I915_POWERWELL (1 << 27) /* HSW i915 powerwell 
> > > support */
> > >  #define AZX_DCAPS_CORBRP_SELF_CLEAR (1 << 28)  /* CORBRP clears itself 
> > > after reset */
> > > +#define AZX_DCAPS_NO_MSI64  (1 << 29)  /* Stick to 32-bit MSIs */
> > >
> > >  /* HD Audio class code */
> > >  #define PCI_CLASS_MULTIMEDIA_HD_AUDIO  0x0403
> > >
> > >
> > >
> > >


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 15/17] cxl: Userspace header file.

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 20:28 +1000, Ian Munsie wrote:
> Hey Michael,
> 
> Excerpts from Michael Ellerman's message of 2014-10-02 16:02:37 +1000:
> > > +/* ioctls */
> > > +struct cxl_ioctl_start_work {
> > > +__u64 wed;
> > > +__u64 amr;
> > > +__u64 reserved1;
> > > +__u32 reserved2;
> > > +__s16 num_interrupts; /* -1 = use value from afu descriptor */
> > > +__u16 process_element; /* returned from kernel */
> > > +__u64 reserved3;
> > > +__u64 reserved4;
> > > +__u64 reserved5;
> > > +__u64 reserved6;
> > 
> > Why so many reserved fields?
> 
> The first two are reserved for the context save area (reserved1) and
> size (reserved2) of the "shared" (AKA time sliced) virtualisation model,
> which we don't yet support. That only leaves us with four reserved
> fields for anything that we haven't thought of or that the hardware team
> hasn't come up with yet ;-)
> 
> > What mechanism is there that will allow you to ever unreserve them?
> >
> > ie. how does a new userspace detect that the kernel it's running on supports
> > new fields?
> 
> The ioctl will return -EINVAL if any of them are set to non-zero values,
> so userspace can easily tell if it's running on an old kernel.

Not good enough in my experience. Throw in a flags field I'd say..

> > Or conversely how does a new kernel detect that userspace has passed it a
> > meaningful value in one of the previously reserved fields?
> 
> They would have to be non-zero (certainly true of the context save
> area's size), or one could turn into a flags field or api version.

If you go that way you need to negociate as well latest compatible
etc...

> > > +#define CXL_MAGIC 0xCA
> > > +#define CXL_IOCTL_START_WORK  _IOWR(CXL_MAGIC, 0x00, struct 
> > > cxl_ioctl_start_work)
> > 
> > What happened to 0x1 ?
> 
> That was used to dynamically program the FPGA with a new AFU image, but
> we don't have anything to test it on yet and I'm not convinced that the
> procedure won't change by the time we do, so we pulled the code.
> 
> We can repack the ioctl numbers easily enough... Will do :)
> 
> > > +enum cxl_event_type {
> > > +CXL_EVENT_READ_FAIL = -1,
> > 
> > I don't see this used?
> 
> That was used in the userspace library to mark it's buffer as bad if the
> read() call failed for whatever reason... but you're right - it isn't
> used by the kernel and doesn't belong in this header. Will remove.
> 
> > > +struct cxl_event_header {
> > > +__u32 type;
> > > +__u16 size;
> > > +__u16 process_element;
> > > +__u64 reserved1;
> > > +__u64 reserved2;
> > > +__u64 reserved3;
> > > +};
> > 
> > Again lots of reserved fields?
> 
> Figured it was better to have a bit more than we expect we might need
> just in case... We can reduce this if you feel it is excessive?
> 
> In an earlier version of the code the kernel would fill out the header
> and not clear an event if a buffer was passed in that was too small, so
> userspace could realloc a larger buffer and try again. This made the API
> a bit more complex and our internal users weren't too keen on it, so we
> decided to use a fixed-size buffer and make it larger than we strictly
> needed so we have plenty of room for further expansion.
> 
> > Rather than having the header included in every event, would it be clearer 
> > if
> > the cxl_event was:
> > 
> > struct cxl_event {
> > struct cxl_event_header header;
> > union {
> > struct cxl_event_afu_interrupt irq;
> > struct cxl_event_data_storage fault;
> > struct cxl_event_afu_error afu_err;
> > };
> > };
> 
> Sounds like a good idea to me :)
> 
> Cheers,
> -Ian


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 1/4] pci/msi: Move "no_64bit_msi" flag from powerpc to generic pci_dev

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 09:31 -0600, Bjorn Helgaas wrote:

> > So this moves the flag to struct pci_dev instead and adjusts the
> > corresponding arch/powerpc code to look for it there. At this point,
> > there is no attempt at making other architectures honor it just yet,
> > though it appears that x86 doesn't care and always generates 32-bit
> > MSI addresses.
> 
> I thought you were going to add something in drivers/pci to make this more
> generic?

I don't see how. The only thing I can add is the check which I still
plan to do. But the decision on the address is in the arch.

> Is it feasible to structure this series as follows?
> 
>   - add generic stuff (struct pci_dev bit, msi.c checks)
>   - add quirks to drivers to set the pci_dev bit
>   - change powerpc to check pci_dev bit instead of pdn bit
>   - remove powerpc-specific quirks since nobody looks as the pdn bit
> 
> I think it would be easier to follow if the powerpc-specific parts weren't
> intertwined with the rest.

Well, in that case the first patch just adds one bit to pci_dev and
maybe one check for address to msi.c. Unfortunately there isn't more
I can do generically. But ok, I'll do it that way.

> > Signed-off-by: Benjamin Herrenschmidt 
> > CC: 
> > ---
> > 
> > v2: Rename flag to "no_64bit_msi" as suggested by Bjorn
> > 
> >  arch/powerpc/include/asm/pci-bridge.h | 2 --
> >  arch/powerpc/kernel/pci_64.c  | 5 +
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 3 +--
> >  arch/powerpc/platforms/powernv/pci.c  | 3 +--
> >  arch/powerpc/platforms/pseries/msi.c  | 2 +-
> >  include/linux/pci.h   | 1 +
> >  6 files changed, 5 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> > b/arch/powerpc/include/asm/pci-bridge.h
> > index 4ca90a3..725247b 100644
> > --- a/arch/powerpc/include/asm/pci-bridge.h
> > +++ b/arch/powerpc/include/asm/pci-bridge.h
> > @@ -159,8 +159,6 @@ struct pci_dn {
> >  
> > int pci_ext_config_space;   /* for pci devices */
> >  
> > -   boolforce_32bit_msi;
> > -
> > struct  pci_dev *pcidev;/* back-pointer to the pci device */
> >  #ifdef CONFIG_EEH
> > struct eeh_dev *edev;   /* eeh device */
> > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> > index 155013d..d41a831 100644
> > --- a/arch/powerpc/kernel/pci_64.c
> > +++ b/arch/powerpc/kernel/pci_64.c
> > @@ -269,10 +269,7 @@ EXPORT_SYMBOL(pcibus_to_node);
> >  
> >  static void quirk_radeon_32bit_msi(struct pci_dev *dev)
> >  {
> > -   struct pci_dn *pdn = pci_get_pdn(dev);
> > -
> > -   if (pdn)
> > -   pdn->force_32bit_msi = true;
> > +   dev->no_64bit_msi = true;
> >  }
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index df241b1..a188bb8 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -1311,7 +1311,6 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb 
> > *phb, struct pci_dev *dev,
> >   unsigned int is_64, struct msi_msg *msg)
> >  {
> > struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
> > -   struct pci_dn *pdn = pci_get_pdn(dev);
> > struct irq_data *idata;
> > struct irq_chip *ichip;
> > unsigned int xive_num = hwirq - phb->msi_base;
> > @@ -1327,7 +1326,7 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb 
> > *phb, struct pci_dev *dev,
> > return -ENXIO;
> >  
> > /* Force 32-bit MSI on some broken devices */
> > -   if (pdn && pdn->force_32bit_msi)
> > +   if (dev->no_64bit_msi)
> > is_64 = 0;
> >  
> > /* Assign XIVE to PE */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c 
> > b/arch/powerpc/platforms/powernv/pci.c
> > index b854b57..89c6608 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -50,9 +50,8 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int 
> > nvec, int type)
> >  {
> > struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > struct pnv_phb *phb = hose->private_data;
> > -   struct pci_dn *pdn = pci_get_pdn(pdev);
> &g

Re: [PATCH v2 2/4] gpu/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 09:34 -0600, Bjorn Helgaas wrote:
> On Thu, Oct 02, 2014 at 10:34:22AM +1000, Benjamin Herrenschmidt wrote:
> > 
> > A number of radeon cards have a HW limitation causing them to be
> > unable to generate the full 64-bit of address bits for MSIs. This
> > breaks MSIs on some platforms such as POWER machines.
> > 
> > We used to have a powerpc specific quirk to address that on a
> > single card, but this doesn't scale very well, this is better
> > put under control of the drivers who know precisely what a given
> > HW revision can do.
> > 
> > This moves the setting of the quirk flag to the radeon driver
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> > 
> > v2: This is just adjusted to the new flag name
> 
> I'm sorta confused because I got two "v2 2/4" emails a minute or so apart.
> I assume they're the same.

Yes, evo blew up while sending the series the first time around :(
> > 
> >  arch/powerpc/kernel/pci_64.c|  1 -
> >  drivers/gpu/drm/radeon/radeon_irq_kms.c | 10 ++
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> > index d41a831..5330f6d 100644
> > --- a/arch/powerpc/kernel/pci_64.c
> > +++ b/arch/powerpc/kernel/pci_64.c
> > @@ -271,5 +271,4 @@ static void quirk_radeon_32bit_msi(struct pci_dev *dev)
> >  {
> > dev->no_64bit_msi = true;
> >  }
> > -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);
> 
> Why do we keep the 0xaa68 quirk?  Shouldn't that be made generic, too?
> 
> > diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
> > b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> > index 16807af..e760671 100644
> > --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
> > +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> > @@ -202,6 +202,16 @@ static bool radeon_msi_ok(struct radeon_device *rdev)
> > if (rdev->flags & RADEON_IS_AGP)
> > return false;
> >  
> > +   /*
> > +* Older chips have a HW limitation, they can only generate 40 bits
> > +* of address for "64-bit" MSIs which breaks on some platforms, notably
> > +* IBM POWER servers, so we limit them
> > +*/
> > +   if (rdev->family < CHIP_BONAIRE) {
> > +   dev_info(rdev->dev, "radeon: MSI limited to 32-bit\n");
> > +   rdev->pdev->no_64bit_msi = true;
> > +   }
> > +
> > /* force MSI on */
> > if (radeon_msi == 1)
> > return true;
> > 
> > 
> > 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
> Ben & Michael,
> 
> What's the status of these patches?

Waiting for somebody to review them ? :-)

Cheers,
Ben.

> Thanks,
> Nish
> 
> On 27.08.2014 [17:33:59 +0800], Li Zhong wrote:
> > With commit 2fabf084b6ad ("powerpc: reorder per-cpu NUMA information's
> > initialization"), during boottime, cpu_numa_callback() is called
> > earlier(before their online) for each cpu, and verify_cpu_node_mapping()
> > uses cpu_to_node() to check whether siblings are in the same node.
> > 
> > It skips the checking for siblings that are not online yet. So the only
> > check done here is for the bootcpu, which is online at that time. But
> > the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
> > will be set up in smp_prepare_cpus()).
> > 
> > So I saw something like following reported:
> > [0.00] CPU thread siblings 1/2/3 and 0 don't belong to the same
> > node!
> > 
> > As we don't actually do the checking during this early stage, so maybe
> > we could directly call numa_setup_cpu() in do_init_bootmem().
> > 
> > Cc: Nishanth Aravamudan 
> > Cc: Nathan Fontenot 
> > Signed-off-by: Li Zhong 
> > ---
> >  arch/powerpc/mm/numa.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index d7737a5..9918c02 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1128,8 +1128,7 @@ void __init do_init_bootmem(void)
> >  * early in boot, cf. smp_prepare_cpus().
> >  */
> > for_each_possible_cpu(cpu) {
> > -   cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
> > - (void *)(unsigned long)cpu);
> > +   numa_setup_cpu((unsigned long)cpu);
> > }
> >  }
> > 
> > -- 
> > 1.9.1
> > 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: fix sys_call_table declaration

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 15:41 +0100, Romeo Cane wrote:
> Declaring sys_call_table as a pointer causes the compiler to generate the 
> wrong lookup code in arch_syscall_addr

Care to elaborate ?

Ben.

> Signed-off-by: Romeo Cane 
> ---
>  arch/powerpc/include/asm/syscall.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/syscall.h 
> b/arch/powerpc/include/asm/syscall.h
> index b54b2ad..528ba9d 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -17,7 +17,7 @@
>  
>  /* ftrace syscalls requires exporting the sys_call_table */
>  #ifdef CONFIG_FTRACE_SYSCALLS
> -extern const unsigned long *sys_call_table;
> +extern const unsigned long sys_call_table[];
>  #endif /* CONFIG_FTRACE_SYSCALLS */
>  
>  static inline long syscall_get_nr(struct task_struct *task,


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 2/4] gpu/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 15:44 -0600, Bjorn Helgaas wrote:
> On Thu, Oct 2, 2014 at 3:02 PM, Benjamin Herrenschmidt
>  wrote:
> > On Thu, 2014-10-02 at 09:34 -0600, Bjorn Helgaas wrote:
> >> On Thu, Oct 02, 2014 at 10:34:22AM +1000, Benjamin Herrenschmidt wrote:
> >> >
> >> > A number of radeon cards have a HW limitation causing them to be
> >> > unable to generate the full 64-bit of address bits for MSIs. This
> >> > breaks MSIs on some platforms such as POWER machines.
> >> >
> >> > We used to have a powerpc specific quirk to address that on a
> >> > single card, but this doesn't scale very well, this is better
> >> > put under control of the drivers who know precisely what a given
> >> > HW revision can do.
> >> >
> >> > This moves the setting of the quirk flag to the radeon driver
> >> >
> >> > Signed-off-by: Benjamin Herrenschmidt 
> >> > ---
> >> >
> >> > v2: This is just adjusted to the new flag name
> >>
> >> I'm sorta confused because I got two "v2 2/4" emails a minute or so apart.
> >> I assume they're the same.
> >
> > Yes, evo blew up while sending the series the first time around :(
> 
> No problem.  My real question is below, but you probably missed it
> because of my email gripe :)

Heh ok :-)

> >> >  arch/powerpc/kernel/pci_64.c|  1 -
> >> >  drivers/gpu/drm/radeon/radeon_irq_kms.c | 10 ++
> >> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> >> > index d41a831..5330f6d 100644
> >> > --- a/arch/powerpc/kernel/pci_64.c
> >> > +++ b/arch/powerpc/kernel/pci_64.c
> >> > @@ -271,5 +271,4 @@ static void quirk_radeon_32bit_msi(struct pci_dev 
> >> > *dev)
> >> >  {
> >> > dev->no_64bit_msi = true;
> >> >  }
> >> > -DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, 
> >> > quirk_radeon_32bit_msi);
> >> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, 
> >> > quirk_radeon_32bit_msi);
> >>
> >> Why do we keep the 0xaa68 quirk?  Shouldn't that be made generic, too?

aa68 is the audio part, it's removed by the audio driver patch.

But I can break things down into smaller bits as you suggested. I'll try
to get that sorted later today.

Cheers,
Ben.

> >> > diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c 
> >> > b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> >> > index 16807af..e760671 100644
> >> > --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
> >> > +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> >> > @@ -202,6 +202,16 @@ static bool radeon_msi_ok(struct radeon_device 
> >> > *rdev)
> >> > if (rdev->flags & RADEON_IS_AGP)
> >> > return false;
> >> >
> >> > +   /*
> >> > +* Older chips have a HW limitation, they can only generate 40 bits
> >> > +* of address for "64-bit" MSIs which breaks on some platforms, 
> >> > notably
> >> > +* IBM POWER servers, so we limit them
> >> > +*/
> >> > +   if (rdev->family < CHIP_BONAIRE) {
> >> > +   dev_info(rdev->dev, "radeon: MSI limited to 32-bit\n");
> >> > +   rdev->pdev->no_64bit_msi = true;
> >> > +   }
> >> > +
> >> > /* force MSI on */
> >> > if (radeon_msi == 1)
> >> > return true;
> >> >
> >> >
> >> >
> >
> >


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH v3 1/3] powerpc: Fix warning reported by verify_cpu_node_mapping()

2014-10-02 Thread Benjamin Herrenschmidt

On Thu, 2014-10-02 at 14:53 -0700, Nishanth Aravamudan wrote:
> On 03.10.2014 [07:28:19 +1000], Benjamin Herrenschmidt wrote:
> > On Thu, 2014-10-02 at 14:13 -0700, Nishanth Aravamudan wrote:
> > > Ben & Michael,
> > > 
> > > What's the status of these patches?
> > 
> > Waiting for somebody to review them ? :-)
> 
> Heh, I acked them about a month ago, I can give a stronger Reviewed-by,
> if you prefer?

Nah, nothing other than having me or mpe pick them up. I've been letting
Michael handle this merge window, so maybe he should take it :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 3/4] sound/radeon: Move 64-bit MSI quirk from arch to driver

2014-10-02 Thread Benjamin Herrenschmidt

On Wed, 2014-10-01 at 22:13 -0400, Alex Deucher wrote:

> The attached updated patch only flags the affected asics.

I need formal (email :-) permission to add your s-o-b (and From: while
at it since you wrote most of it) for this patch. I'll then include
it in my new series.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/powernv: Fix endian bug in LPC bus debugfs accessors

2014-10-03 Thread Benjamin Herrenschmidt

When reading from the LPC, the OPAL FW calls return the value via pointer
to a uint32_t which is always returned big endian. Our internal inb/outb
implementation byteswaps that fine but our debugfs code is still broken.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---

diff --git a/arch/powerpc/platforms/powernv/opal-lpc.c 
b/arch/powerpc/platforms/powernv/opal-lpc.c
index ad4b31d..dd2c285 100644
--- a/arch/powerpc/platforms/powernv/opal-lpc.c
+++ b/arch/powerpc/platforms/powernv/opal-lpc.c
@@ -191,6 +191,7 @@ static ssize_t lpc_debug_read(struct file *filp, char 
__user *ubuf,
 {
struct lpc_debugfs_entry *lpc = filp->private_data;
u32 data, pos, len, todo;
+   __be32 bedata;
int rc;
 
if (!access_ok(VERIFY_WRITE, ubuf, count))
@@ -213,9 +214,10 @@ static ssize_t lpc_debug_read(struct file *filp, char 
__user *ubuf,
len = 2;
}
rc = opal_lpc_read(opal_lpc_chip_id, lpc->lpc_type, pos,
-  &data, len);
+  &bedata, len);
if (rc)
return -ENXIO;
+   data = be32_to_cpu(bedata);
switch(len) {
case 4:
rc = __put_user((u32)data, (u32 __user *)ubuf);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: powerpc/book3s: Fix flush_tlb cpu_spec hook to take a generic argument.

2014-10-03 Thread Benjamin Herrenschmidt

On Fri, 2014-10-03 at 17:36 +1000, Michael Ellerman wrote:
> It seems that the only detail we are abstracting here is that power7 has 128
> TLB sets and power8 has 512. Is that right?

Today ...

> It seems like a lot of code to deal with that one detail.
> 
> I guess the other thing it does is cur_cpu_spec->flush_tlb() is only
> implemented on power7 and power8, ie. not power6. But it seems that the two
> callers can only ever run on power7/power8 anway.
> 
> So couldn't we just do:
>  - add the number of tlb_sets to cpu_spec
>  - use that in flush_tlb()

Provided the form of tlbiel we use remains ... things are different from
embedded for example. I've asked Mahesh to add this hook with the idea
somewhat to migrate existing/older code to use it as well, and on older
chips or embedded the TLB flush is quite different.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 1/7] ALSA: hda - Limit 40bit DMA for AMD HDMI controllers

2014-10-06 Thread Benjamin Herrenschmidt

From: Takashi Iwai 

AMD/ATI HDMI controller chip models, we already have a filter to lower
to 32bit DMA, but the rest are supposed to be working with 64bit
although the hardware doesn't really work with 63bit but only with 40
or 48bit DMA.  In this patch, we take 40bit DMA for safety for the
AMD/ATI controllers as the graphics drivers does.

Signed-off-by: Takashi Iwai 
Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 sound/pci/hda/hda_intel.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index aa302fb..99b367b 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1482,6 +1482,7 @@ static int azx_first_init(struct azx *chip)
struct snd_card *card = chip->card;
int err;
unsigned short gcap;
+   unsigned int dma_bits = 64;
 
 #if BITS_PER_LONG != 64
/* Fix up base address on ULI M5461 */
@@ -1518,9 +1519,14 @@ static int azx_first_init(struct azx *chip)
gcap = azx_readw(chip, GCAP);
dev_dbg(card->dev, "chipset global capabilities = 0x%x\n", gcap);
 
+   /* AMD devices support 40 or 48bit DMA, take the safe one */
+   if (chip->pci->vendor == PCI_VENDOR_ID_AMD)
+   dma_bits = 40;
+
/* disable SB600 64bit support for safety */
if (chip->pci->vendor == PCI_VENDOR_ID_ATI) {
struct pci_dev *p_smbus;
+   dma_bits = 40;
p_smbus = pci_get_device(PCI_VENDOR_ID_ATI,
 PCI_DEVICE_ID_ATI_SBX00_SMBUS,
 NULL);
@@ -1550,9 +1556,11 @@ static int azx_first_init(struct azx *chip)
}
 
/* allow 64bit DMA address if supported by H/W */
-   if ((gcap & AZX_GCAP_64OK) && !pci_set_dma_mask(pci, DMA_BIT_MASK(64)))
-   pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(64));
-   else {
+   if (!(gcap & AZX_GCAP_64OK))
+   dma_bits = 32;
+   if (!pci_set_dma_mask(pci, DMA_BIT_MASK(dma_bits))) {
+   pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(dma_bits));
+   } else {
pci_set_dma_mask(pci, DMA_BIT_MASK(32));
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(32));
}



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 7/7] powerpc/pci: Remove unused force_32bit_msi quirk

2014-10-06 Thread Benjamin Herrenschmidt

This is now fully replaced with the generic "no_64bit_msi" one
that is set by the respective drivers directly.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/pci-bridge.h |  2 --
 arch/powerpc/kernel/pci_64.c  | 10 --
 2 files changed, 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..725247b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -159,8 +159,6 @@ struct pci_dn {
 
int pci_ext_config_space;   /* for pci devices */
 
-   boolforce_32bit_msi;
-
struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 155013d..b15194e 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -266,13 +266,3 @@ int pcibus_to_node(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pcibus_to_node);
 #endif
-
-static void quirk_radeon_32bit_msi(struct pci_dev *dev)
-{
-   struct pci_dn *pdn = pci_get_pdn(dev);
-
-   if (pdn)
-   pdn->force_32bit_msi = true;
-}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x68f2, quirk_radeon_32bit_msi);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0xaa68, quirk_radeon_32bit_msi);


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3 4/7] sound/radeon: Add quirk for broken 64-bit MSI

2014-10-06 Thread Benjamin Herrenschmidt

From: Signed-off-by: Alex Deucher 

A number of radeon cards have a HW limitation causing them to be
unable to generate the full 64-bit of address bits for MSIs. This
breaks MSIs on some platforms such as POWER machines.

We used to have a powerpc specific quirk to address that on a
single card, but this doesn't scale very well, this is better
put under control of the drivers who know precisely what a given
HW revision can do.

This moves the setting of the quirk flag to the audio driver.

While recent ASICs have that problem fixed, they don't seem to
be listed in the PCI IDs of the current driver, so let's quirk all
the ATI HDMI for now. The consequences are nil on x86 anyway.

Signed-off-by: Alex Deucher 
Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 sound/pci/hda/hda_intel.c | 96 +--
 sound/pci/hda/hda_priv.h  |  1 +
 2 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 99b367b..af38ed9 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1506,9 +1506,14 @@ static int azx_first_init(struct azx *chip)
return -ENXIO;
}
 
-   if (chip->msi)
+   if (chip->msi) {
+   if (chip->driver_caps & AZX_DCAPS_NO_MSI64) {
+   dev_dbg(card->dev, "Disabling 64bit MSI\n");
+   pci->no_64bit_msi = true;
+   }
if (pci_enable_msi(pci) < 0)
chip->msi = 0;
+   }
 
if (azx_acquire_irq(chip, 0) < 0)
return -EBUSY;
@@ -2070,58 +2075,93 @@ static const struct pci_device_id azx_ids[] = {
{ PCI_DEVICE(0x1022, 0x780d),
  .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB },
/* ATI HDMI */
-   { PCI_DEVICE(0x1002, 0x793b),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+   { PCI_DEVICE(0x1002, 0x1314),
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0x7919),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
+   { PCI_DEVICE(0x1002, 0x7969),
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
+   { PCI_DEVICE(0x1002, 0x793b),
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0x960f),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
+   { PCI_DEVICE(0x1002, 0x9646),
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0x970f),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa00),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa08),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa10),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa18),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa20),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa28),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa30),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },
{ PCI_DEVICE(0x1002, 0xaa38),
- .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI },
+ .driver_data = AZX_DRIVER_ATIHDMI | AZX_DCAPS_PRESET_ATI_HDMI |
+ AZX_DCAPS_NO_MSI64 },

[PATCH v3 6/7] powerpc/pseries: Honor the generic "no_64bit_msi" flag

2014-10-06 Thread Benjamin Herrenschmidt

Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.

Signed-off-by: Benjamin Herrenschmidt 
CC: 
---
 arch/powerpc/platforms/pseries/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 18ff462..6fd96d8 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -429,7 +429,7 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type)
 */
 again:
if (type == PCI_CAP_ID_MSI) {
-   if (pdn->force_32bit_msi) {
+   if (pdev->no_64bit_msi) {
rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSI_FN, nvec);
if (rc < 0) {
/*


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 7623 matches

Mail list logo