Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-10 Thread Martin Schwidefsky
On Fri, 10 May 2019 12:40:58 -0400
Steven Rostedt  wrote:

> On Fri, 10 May 2019 18:32:58 +0200
> Martin Schwidefsky  wrote:
> 
> > On Fri, 10 May 2019 12:24:01 -0400
> > Steven Rostedt  wrote:
> >   
> > > On Fri, 10 May 2019 10:42:13 +0200
> > > Petr Mladek  wrote:
> > > 
> > > >  static const char *check_pointer_msg(const void *ptr)
> > > >  {
> > > > -   char byte;
> > > > -
> > > > if (!ptr)
> > > > return "(null)";
> > > >  
> > > > -   if (probe_kernel_address(ptr, byte))
> > > > +   if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > > return "(efault)";
> > > >
> > > 
> > > 
> > >   < PAGE_SIZE ?
> > > 
> > > do you mean: < TASK_SIZE ?
> > 
> > The check with < TASK_SIZE would break on s390. The 'ptr' is
> > in the kernel address space, *not* in the user address space.
> > Remember s390 has two separate address spaces for kernel/user
> > the check < TASK_SIZE only makes sense with a __user pointer.
> >   
> 
> So we allow this to read user addresses? Can't that cause a fault?
> 
> If the condition is true, we return "(efault)".

On x86 this would allow a user space access as kernel and user live
in the same address space, on s390 it would not.
h
-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-10 Thread Martin Schwidefsky
On Fri, 10 May 2019 12:24:01 -0400
Steven Rostedt  wrote:

> On Fri, 10 May 2019 10:42:13 +0200
> Petr Mladek  wrote:
> 
> >  static const char *check_pointer_msg(const void *ptr)
> >  {
> > -   char byte;
> > -
> > if (!ptr)
> > return "(null)";
> >  
> > -   if (probe_kernel_address(ptr, byte))
> > +   if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > return "(efault)";
> >
> 
> 
>   < PAGE_SIZE ?
> 
> do you mean: < TASK_SIZE ?

The check with < TASK_SIZE would break on s390. The 'ptr' is
in the kernel address space, *not* in the user address space.
Remember s390 has two separate address spaces for kernel/user
the check < TASK_SIZE only makes sense with a __user pointer.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-05-02 Thread Martin Schwidefsky
On Thu, 2 May 2019 16:31:10 +0200
Greg KH  wrote:

> On Thu, May 02, 2019 at 04:17:58PM +0200, Martin Schwidefsky wrote:
> > On Thu, 2 May 2019 14:21:28 +0200
> > Greg KH  wrote:
> >   
> > > On Mon, Apr 15, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:  
> > > > On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig  
> > > > wrote:
> > > > >
> > > > > Can we please have the page refcount overflow fixes out on the list
> > > > > for review, even if it is after the fact?
> > > > 
> > > > They were actually on a list for review long before the fact, but it
> > > > was the security mailing list. The issue actually got discussed back
> > > > in January along with early versions of the patches, but then we
> > > > dropped the ball because it just wasn't on anybody's radar and it got
> > > > resurrected late March. Willy wrote a rather bigger patch-series, and
> > > > review of that is what then resulted in those commits. So they may
> > > > look recent, but that's just because the original patches got
> > > > seriously edited down and rewritten.
> > > > 
> > > > That said, powerpc and s390 should at least look at maybe adding a
> > > > check for the page ref in their gup paths too. Powerpc has the special
> > > > gup_hugepte() case, and s390 has its own version of gup entirely. I
> > > > was actually hoping the s390 guys would look at using the generic gup
> > > > code.
> > > > 
> > > > I ruthlessly also entirely ignored MIPS, SH and sparc, since they seem
> > > > largely irrelevant, partly since even theoretically this whole issue
> > > > needs a _lot_ of memory.
> > > > 
> > > > Michael, Martin, see commit 6b3a70773630 ("Merge branch 'page-refs'
> > > > (page ref overflow)"). You may or may not really care.
> > > 
> > > I've now queued these patches up for the next round of stable releases,
> > > as some people seem to care about these.
> > > 
> > > I didn't see any follow-on patches for s390 or ppc64 hit the tree for
> > > these changes, am I just missing them and should also queue up a few
> > > more to handle this issue on those platforms?  
> > 
> > I fixed that with a different approach. The following two patches are
> > queued for the next merge window:
> > 
> > d1874a0c2805 "s390/mm: make the pxd_offset functions more robust"
> > 1a42010cdc26 "s390/mm: convert to the generic get_user_pages_fast code"
> > 
> > With these two s390 now uses the generic gup code in mm/gup.c  
> 
> Nice!  Do you want me to queue those up for the stable backports once
> they hit a public -rc release?

Yes please!

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-05-02 Thread Martin Schwidefsky
On Thu, 2 May 2019 14:21:28 +0200
Greg KH  wrote:

> On Mon, Apr 15, 2019 at 09:17:10AM -0700, Linus Torvalds wrote:
> > On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig  
> > wrote:  
> > >
> > > Can we please have the page refcount overflow fixes out on the list
> > > for review, even if it is after the fact?  
> > 
> > They were actually on a list for review long before the fact, but it
> > was the security mailing list. The issue actually got discussed back
> > in January along with early versions of the patches, but then we
> > dropped the ball because it just wasn't on anybody's radar and it got
> > resurrected late March. Willy wrote a rather bigger patch-series, and
> > review of that is what then resulted in those commits. So they may
> > look recent, but that's just because the original patches got
> > seriously edited down and rewritten.
> > 
> > That said, powerpc and s390 should at least look at maybe adding a
> > check for the page ref in their gup paths too. Powerpc has the special
> > gup_hugepte() case, and s390 has its own version of gup entirely. I
> > was actually hoping the s390 guys would look at using the generic gup
> > code.
> > 
> > I ruthlessly also entirely ignored MIPS, SH and sparc, since they seem
> > largely irrelevant, partly since even theoretically this whole issue
> > needs a _lot_ of memory.
> > 
> > Michael, Martin, see commit 6b3a70773630 ("Merge branch 'page-refs'
> > (page ref overflow)"). You may or may not really care.  
> 
> I've now queued these patches up for the next round of stable releases,
> as some people seem to care about these.
> 
> I didn't see any follow-on patches for s390 or ppc64 hit the tree for
> these changes, am I just missing them and should also queue up a few
> more to handle this issue on those platforms?

I fixed that with a different approach. The following two patches are
queued for the next merge window:

d1874a0c2805 "s390/mm: make the pxd_offset functions more robust"
1a42010cdc26 "s390/mm: convert to the generic get_user_pages_fast code"

With these two s390 now uses the generic gup code in mm/gup.c

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-23 Thread Martin Schwidefsky
On Fri, 19 Apr 2019 10:27:17 -0700
Linus Torvalds  wrote:

> On Fri, Apr 19, 2019 at 6:33 AM Martin Schwidefsky
>  wrote:
> >
> > That problem got stuck in my head and I thought more about it. Why not
> > emulate the static folding sequence in the s390 page table code?  
> 
> So this model seems much closer to what x86 does in its folding, where
> the pattern is basically
> 
> > static inline pX-1d_t *pXd_offset(pXd_t *pXd, unsigned long address)
> > {
> > if (pXd_folded(pXd)
> > return (pX-1d_t *) pXd;
> > return (pX-1d_t *) pXd_deref(*pXd) + pXd_index(address);
> > }  
> 
> which is really how the code is designed to work (ie the folded entry
> doesn't actually do anything to the page directory pointer, it just
> says "ok, we'll use this exact page directory pointer for the next
> lower level instead".
> 
> And that's very much what allows the generic gup code to load the
> entry once, and use a temporary, and as you walk down the chain, if it
> is folded it just then uses that (previous) temporary value for the
> next level instead. IOW, the lower level page table is hidden inside
> the upper level one, and folding just means "don't do any offsets,
> don't change any values, just use the entry as-is for the next lower
> level".
> 
> So I think that's the right thing to do.

Ok, I added two patches for my s390/linux:features branch

Martin Schwidefsky (2):
  s390/mm: make the pxd_offset functions more robust
  s390/mm: convert to the generic get_user_pages_fast code

All code changes are inside arch/s390, I plan to include these patches with
the next merge window. That gives us a little bit of time to run our tests.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-19 Thread Martin Schwidefsky
On Thu, 18 Apr 2019 20:41:44 +0200
Martin Schwidefsky  wrote:

> On Thu, 18 Apr 2019 08:49:32 -0700
> Linus Torvalds  wrote:
> 
> > On Thu, Apr 18, 2019 at 1:02 AM Martin Schwidefsky
> >  wrote:  
> > >
> > > The problematic lines in the generic gup code are these three:
> > >
> > > 1845:   pmdp = pmd_offset(, addr);
> > > 1888:   pudp = pud_offset(, addr);
> > > 1916:   p4dp = p4d_offset(, addr);
> > >
> > > Passing the pointer of a *copy* of a page table entry to pxd_offset() does
> > > not work with the page table folding on s390.
> > 
> > Hmm. I wonder why. x86 too does the folding thing for the p4d and pud case.
> > 
> > The folding works with the local copy just the same way it works with
> > the orignal value.  
> 
> The difference is that with the static page table folding pgd_offset()
> does the index calculation of the actual hardware top-level table. With
> dynamic page table folding as s390 is doing it, if the task does not use
> a 5-level page table pgd_offset() will see a pgd_index() of 0, the indexing
> of the actual top-level table is done later with p4d_offset(), pud_offset()
> or pmd_offset(). 
> 
> As an example, with a three level page table we have three indexes x/y/z.
> The common code "thinks" 5 indexing steps, with static folding the index
> sequence is x 0 0 y z. With dynamic folding the sequence is 0 0 x y z.
> By moving the first indexing operation to pgd_offset the static sequence
> does not add an index to a non-dereferenced pointer to a stack variable,
> the dynamic sequence does.

That problem got stuck in my head and I thought more about it. Why not
emulate the static folding sequence in the s390 page table code?

As the table type is encoded in every entry for the region and segment
tables, pgd_offset() can look at the first entry to find the table type
and then do the correct index calculation for the given top-level table.
Like this:

static inline pgd_t *pgd_offset_raw(pgd_t *pgd, unsigned long address)
{
unsigned long rste;
unsigned int shift;

/* Get the first entry of the top level table */
rste = pgd_val(*pgd);
/* Pick up the shift from the table type of the first entry */
shift = ((rste & _REGION_ENTRY_TYPE_MASK) >> 2) * 11 + 20;
return pgd + ((address >> shift) & (PTRS_PER_PGD - 1));
}

#define pgd_offset(mm, address) pgd_offset_raw((mm)->pgd, address)
#define pgd_offset_k(address) pgd_offset(_mm, address)

static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
{
if ((pgd_val(*pgd) & _REGION_ENTRY_TYPE_MASK) != _REGION_ENTRY_TYPE_R1)
return (p4d_t *) pgd;
return (p4d_t *) pgd_deref(*pgd) + p4d_index(address);
}

static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
{
if ((p4d_val(*p4d) & _REGION_ENTRY_TYPE_MASK) != _REGION_ENTRY_TYPE_R2)
return (pud_t *) p4d;
return (pud_t *) p4d_deref(*p4d) + pud_index(address);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
if ((pud_val(*pud) & _REGION_ENTRY_TYPE_MASK) != _REGION_ENTRY_TYPE_R3)
return (pmd_t *) pud;
return (pmd_t *) pud_deref(*pud) + pmd_index(address);
}

This needs more thorough testing but in principle it does work. The kernel
boots and survives a kernel compile. The only things that is slightly off is
that pgd_offset() now has to look at the first table entry to do its job.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-18 Thread Martin Schwidefsky
On Thu, 18 Apr 2019 08:49:32 -0700
Linus Torvalds  wrote:

> On Thu, Apr 18, 2019 at 1:02 AM Martin Schwidefsky
>  wrote:
> >
> > The problematic lines in the generic gup code are these three:
> >
> > 1845:   pmdp = pmd_offset(, addr);
> > 1888:   pudp = pud_offset(, addr);
> > 1916:   p4dp = p4d_offset(, addr);
> >
> > Passing the pointer of a *copy* of a page table entry to pxd_offset() does
> > not work with the page table folding on s390.  
> 
> Hmm. I wonder why. x86 too does the folding thing for the p4d and pud case.
> 
> The folding works with the local copy just the same way it works with
> the orignal value.

The difference is that with the static page table folding pgd_offset()
does the index calculation of the actual hardware top-level table. With
dynamic page table folding as s390 is doing it, if the task does not use
a 5-level page table pgd_offset() will see a pgd_index() of 0, the indexing
of the actual top-level table is done later with p4d_offset(), pud_offset()
or pmd_offset(). 

As an example, with a three level page table we have three indexes x/y/z.
The common code "thinks" 5 indexing steps, with static folding the index
sequence is x 0 0 y z. With dynamic folding the sequence is 0 0 x y z.
By moving the first indexing operation to pgd_offset the static sequence
does not add an index to a non-dereferenced pointer to a stack variable,
the dynamic sequence does.

> But I see that s390 does some other kind of folding and does that
> addition of the p*d_index() unconditionally.
> 
> I guess that does mean that s390 will just have to have its own walker.
> 
> For the issue of the page refcount overflow it really isn't a huge
> deal. Adding the refcount checking is simple (see the example patch I
> gave for powerpc - you'll just have a couple of extra cases since you
> do it all, rather than just the special hugetlb cases).
> 
> Obviously in general it would have been nicer to share as much code as
> possible, but let's not make things unnecessarily complex if s390 is
> just fundamentally different..

It would have been nice to use the generic code (less bugs) but not at
the price of over-complicating things. And that page table folding thing
always makes my head hurt.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-18 Thread Martin Schwidefsky
On Wed, 17 Apr 2019 09:57:01 -0700
Linus Torvalds  wrote:

> On Wed, Apr 17, 2019 at 1:02 AM Martin Schwidefsky
>  wrote:
> >
> > Grumpf, that does *not* work. For gup the table entries may be read only
> > once. Now I remember why I open-coded p4d_offset, pud_offset and pmd_offset
> > in arch/s390/mm/gup.c, to avoid to read the table entries twice.
> > It will be hard to use the common gup code after all.  
> 
> Hmm. The common gup code generally should do the "read only once"
> thing too (since by definition the gup-fast case is done without
> locking), although it's probably the case that most architectures
> simply don't care.
> 
> What would it require for the generic code to work for s390?

The problematic lines in the generic gup code are these three:

1845:   pmdp = pmd_offset(, addr);
1888:   pudp = pud_offset(, addr);
1916:   p4dp = p4d_offset(, addr);

Passing the pointer of a *copy* of a page table entry to pxd_offset() does
not work with the page table folding on s390. The pxd_offset() function
on s390 have to make a choice, either return the dereferenced value behind
the passed pointer (that works) or return the original page table pointer
if the table level is folded (that does not work).

To fix this we would need three new helpers pmd_offset_orig, pud_offset_orig
and p4d_offset_orig, their generic definition would look like this:

#define p4d_offset_orig(pgdp, pgd, address)p4d_offset(, address)
#define pud_offset_orig(p4dp, p4d, address)pud_offset(, address)
#define pmd_offset_orig(pudp, pud, address)pmd_offset(, address)

For the s390 definition see the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git generic-gup

A quick test with this branch shows everything working normally.
Keeping my fingers crossed that I did not miss anything.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-17 Thread Martin Schwidefsky
On Wed, 17 Apr 2019 09:46:37 +0200
Martin Schwidefsky  wrote:

> On Tue, 16 Apr 2019 09:49:46 -0700
> Linus Torvalds  wrote:
> 
> > On Tue, Apr 16, 2019 at 9:16 AM Linus Torvalds
> >  wrote:  
> > >
> > > We actually already *have* this function.
> > >
> > > It's called "gup_fast_permitted()" and it's used by x86-64 to verify
> > > the proper address range. Exactly like s390 needs..
> > >
> > > Could you please use that instead?
> > 
> > IOW, something like the attached.
> > 
> > Obviously untested. And maybe 'current' isn't declared in
> > , in which case you'd need to modify it to instead make
> > the inline function be "s390_gup_fast_permitted()" that takes a
> > pointer to the mm, and do something like
> > 
> >   #define gup_fast_permitted(start, pages) \
> >  s390_gup_fast_permitted(current->mm, start, pages)
> > 
> > instead.
> > 
> > But I think you get the idea..  
> 
> Nice, I did not realize that gup_fast_permitted is a platform
> override-able function. So that part is doable in arch/s390. But I
> spoke to soon, I got my first crash and realized that the common gup code
> is not usable as it is. The reason is this e.g. this sequence:
> 
>   pgdp = pgd_offset(current->mm, addr);
>   pgd_t pgd = READ_ONCE(*pgdp);
>   /* some checking on pgd */
>   gup_p4d_range(pgd, addr, next, write, pages, nr);
> 
>   p4dp = p4d_offset(, addr);
>   p4d_t p4d = READ_ONCE(*p4dp);
>   /* some checking on p4d */
>   gup_pud_range(p4d, addr, next, write, pages, nr);
> 
>   pudp = pud_offset(, addr);
>   pud_t pud = READ_ONCE(*pudp);
>   /* some checking on pud */
>   gup_pmd_range(pud, addr, next, write, pages, nr;
> 
> Each step along the way will read the page table entry and pass the
> table entry to the next function. This clashes with the page table
> folding on s390. The s390 gup code looks more like this:
> 
>   pgdp = pgd_offset(current->mm, addr);
>   /* some checking on pgd */
>   pgd_t pgd = READ_ONCE(*pgdp);
>   gup_p4d_range(pgdp, pgd, addr, next, write, pages, );
> 
>   p4dp = p4d_offset(pgdp, addr);
>   p4d_t p4d = READ_ONCE(*p4dp);
>   /* some checking on p4d */
>   gup_pud_range(p4dp, p4d, addr, next, write, pages, nr);
> 
>   pudp = pud_offset(p4dp, addr);
>   pud_t pud = READ_ONCE(*pudp);
>   /* some checking on pud */
>   gup_pmd_range(pudp, pud, addr, next, write, pages, nr;
> 
> There are magic dereferences in the s390 versions of p4d_offset,
> pud_offset and pmd_offset functions. To make this work the pointer
> passed to these functions may not be the local copy of the already
> dereferenced table entry. I'll cook up a patch for the common code.

Grumpf, that does *not* work. For gup the table entries may be read only
once. Now I remember why I open-coded p4d_offset, pud_offset and pmd_offset
in arch/s390/mm/gup.c, to avoid to read the table entries twice.
It will be hard to use the common gup code after all.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-17 Thread Martin Schwidefsky
On Tue, 16 Apr 2019 09:49:46 -0700
Linus Torvalds  wrote:

> On Tue, Apr 16, 2019 at 9:16 AM Linus Torvalds
>  wrote:
> >
> > We actually already *have* this function.
> >
> > It's called "gup_fast_permitted()" and it's used by x86-64 to verify
> > the proper address range. Exactly like s390 needs..
> >
> > Could you please use that instead?  
> 
> IOW, something like the attached.
> 
> Obviously untested. And maybe 'current' isn't declared in
> , in which case you'd need to modify it to instead make
> the inline function be "s390_gup_fast_permitted()" that takes a
> pointer to the mm, and do something like
> 
>   #define gup_fast_permitted(start, pages) \
>  s390_gup_fast_permitted(current->mm, start, pages)
> 
> instead.
> 
> But I think you get the idea..

Nice, I did not realize that gup_fast_permitted is a platform
override-able function. So that part is doable in arch/s390. But I
spoke to soon, I got my first crash and realized that the common gup code
is not usable as it is. The reason is this e.g. this sequence:

pgdp = pgd_offset(current->mm, addr);
pgd_t pgd = READ_ONCE(*pgdp);
/* some checking on pgd */
gup_p4d_range(pgd, addr, next, write, pages, nr);

p4dp = p4d_offset(, addr);
p4d_t p4d = READ_ONCE(*p4dp);
/* some checking on p4d */
gup_pud_range(p4d, addr, next, write, pages, nr);

pudp = pud_offset(, addr);
pud_t pud = READ_ONCE(*pudp);
/* some checking on pud */
gup_pmd_range(pud, addr, next, write, pages, nr;

Each step along the way will read the page table entry and pass the
table entry to the next function. This clashes with the page table
folding on s390. The s390 gup code looks more like this:

pgdp = pgd_offset(current->mm, addr);
/* some checking on pgd */
pgd_t pgd = READ_ONCE(*pgdp);
gup_p4d_range(pgdp, pgd, addr, next, write, pages, );

p4dp = p4d_offset(pgdp, addr);
p4d_t p4d = READ_ONCE(*p4dp);
/* some checking on p4d */
gup_pud_range(p4dp, p4d, addr, next, write, pages, nr);

pudp = pud_offset(p4dp, addr);
pud_t pud = READ_ONCE(*pudp);
/* some checking on pud */
gup_pmd_range(pudp, pud, addr, next, write, pages, nr;

There are magic dereferences in the s390 versions of p4d_offset,
pud_offset and pmd_offset functions. To make this work the pointer
passed to these functions may not be the local copy of the already
dereferenced table entry. I'll cook up a patch for the common code.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-16 Thread Martin Schwidefsky
On Tue, 16 Apr 2019 11:09:06 +0200
Martin Schwidefsky  wrote:

> On Mon, 15 Apr 2019 09:17:10 -0700
> Linus Torvalds  wrote:
> 
> > On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig  
> > wrote:  
> > >
> > > Can we please have the page refcount overflow fixes out on the list
> > > for review, even if it is after the fact?
> > 
> > They were actually on a list for review long before the fact, but it
> > was the security mailing list. The issue actually got discussed back
> > in January along with early versions of the patches, but then we
> > dropped the ball because it just wasn't on anybody's radar and it got
> > resurrected late March. Willy wrote a rather bigger patch-series, and
> > review of that is what then resulted in those commits. So they may
> > look recent, but that's just because the original patches got
> > seriously edited down and rewritten.  
> 
> First time I hear about this, thanks for the heads up.
>  
> > That said, powerpc and s390 should at least look at maybe adding a
> > check for the page ref in their gup paths too. Powerpc has the special
> > gup_hugepte() case, and s390 has its own version of gup entirely. I
> > was actually hoping the s390 guys would look at using the generic gup
> > code.  
> 
> We did look at converting the s390 gup code to CONFIG_HAVE_GENERIC_GUP,
> there are some details that need careful consideration. The top one
> is access_ok(), for s390 we always return true. The generic gup code
> relies on the fact that a page table walk with a specific address is
> doable if access_ok() returned true, the s390 specific check is slightly
> different:
> 
> if ((end <= start) || (end > mm->context.asce_limit))
> return 0;
> 
> The obvious approach would be to modify access_ok() to check against
> the asce_limit. I will try and see if anything breaks, e.g. the automatic
> page table upgrade.

I tested the waters in regard to access_ok() and the generic gup code.
The good news is that mm/gup.c with CONFIG_HAVE_GENERIC_GUP=y seems to
work just fine if the access_ok() issue is taken care of. But..

Bloat-o-meter with a non-empty uaccess_ok() that checks against
current->mm->context.asce_limit:

add/remove: 8/2 grow/shrink: 611/11 up/down: 61352/-1914 (59438)

with CONFIG_HAVE_GENERIC_GUP on top of that

add/remove: 10/2 grow/shrink: 612/12 up/down: 63568/-3280 (60288)

This is not nice, would a patch like the following be acceptable?
--
Subject: [PATCH] mm: introduce mm_pgd_walk_ok

Add the architecture overrideable function mm_pgd_walk_ok() to check
if a block of memory is inside the limits of the page table hierarchy
of a given mm struct.

Signed-off-by: Martin Schwidefsky 
---
 include/asm-generic/pgtable.h | 4 
 mm/gup.c  | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index fa782fba51ee..7d2a8a58f1c1 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1186,4 +1186,8 @@ static inline bool arch_has_pfn_modify_check(void)
 #define mm_pmd_folded(mm)  __is_defined(__PAGETABLE_PMD_FOLDED)
 #endif
 
+#ifndef mm_pgd_walk_ok
+#define mm_pgd_walk_ok(mm, addr, size) access_ok(addr, size)
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/gup.c b/mm/gup.c
index 91819b8ad9cc..b3eb3f45d237 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1990,7 +1990,7 @@ int __get_user_pages_fast(unsigned long start, int 
nr_pages, int write,
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
 
-   if (unlikely(!access_ok((void __user *)start, len)))
+   if (unlikely(!mm_pgd_walk_ok(current->mm, (void __user *)start, len)))
return 0;
 
/*
@@ -2044,7 +2044,7 @@ int get_user_pages_fast(unsigned long start, int 
nr_pages, int write,
if (nr_pages <= 0)
return 0;
 
-   if (unlikely(!access_ok((void __user *)start, len)))
+   if (unlikely(!mm_pgd_walk_ok(current->mm, (void __user *)start, len)))
return -EFAULT;
 
if (gup_fast_permitted(start, nr_pages)) {
-- 
2.16.4

With an empty access_ok() but a "real" mm_pgd_walk_ok() the results are
much more reasonable:

add/remove: 2/0 grow/shrink: 2/1 up/down: 2186/-1382 (804)

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Linux 5.1-rc5

2019-04-16 Thread Martin Schwidefsky
On Mon, 15 Apr 2019 09:17:10 -0700
Linus Torvalds  wrote:

> On Sun, Apr 14, 2019 at 10:19 PM Christoph Hellwig  wrote:
> >
> > Can we please have the page refcount overflow fixes out on the list
> > for review, even if it is after the fact?  
> 
> They were actually on a list for review long before the fact, but it
> was the security mailing list. The issue actually got discussed back
> in January along with early versions of the patches, but then we
> dropped the ball because it just wasn't on anybody's radar and it got
> resurrected late March. Willy wrote a rather bigger patch-series, and
> review of that is what then resulted in those commits. So they may
> look recent, but that's just because the original patches got
> seriously edited down and rewritten.

First time I hear about this, thanks for the heads up.
 
> That said, powerpc and s390 should at least look at maybe adding a
> check for the page ref in their gup paths too. Powerpc has the special
> gup_hugepte() case, and s390 has its own version of gup entirely. I
> was actually hoping the s390 guys would look at using the generic gup
> code.

We did look at converting the s390 gup code to CONFIG_HAVE_GENERIC_GUP,
there are some details that need careful consideration. The top one
is access_ok(), for s390 we always return true. The generic gup code
relies on the fact that a page table walk with a specific address is
doable if access_ok() returned true, the s390 specific check is slightly
different:

if ((end <= start) || (end > mm->context.asce_limit))
return 0;

The obvious approach would be to modify access_ok() to check against
the asce_limit. I will try and see if anything breaks, e.g. the automatic
page table upgrade.

> I ruthlessly also entirely ignored MIPS, SH and sparc, since they seem
> largely irrelevant, partly since even theoretically this whole issue
> needs a _lot_ of memory.
> 
> Michael, Martin, see commit 6b3a70773630 ("Merge branch 'page-refs'
> (page ref overflow)"). You may or may not really care.

On s390 we can have up to 16TB of memory in a single LPAR. So yes, I do
care about it.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH 3/5] s390: Fix vDSO clock_getres()

2019-04-03 Thread Martin Schwidefsky
On Mon,  1 Apr 2019 12:51:50 +0100
Vincenzo Frascino  wrote:

> clock_getres in the vDSO library has to preserve the same behaviour
> of posix_get_hrtimer_res().
> 
> In particular, posix_get_hrtimer_res() does:
> sec = 0;
> ns = hrtimer_resolution;
> and hrtimer_resolution depends on the enablement of the high
> resolution timers that can happen either at compile or at run time.
> 
> Fix the s390 vdso implementation of clock_getres keeping a copy of
> hrtimer_resolution in vdso data and using that directly.
> 
> Cc: Martin Schwidefsky 
> Cc: Heiko Carstens 
> Signed-off-by: Vincenzo Frascino 
> ---
>  arch/s390/include/asm/vdso.h   |  1 +
>  arch/s390/kernel/asm-offsets.c |  2 +-
>  arch/s390/kernel/time.c|  1 +
>  arch/s390/kernel/vdso32/clock_getres.S | 17 -
>  arch/s390/kernel/vdso64/clock_getres.S | 15 ++-
>  5 files changed, 25 insertions(+), 11 deletions(-)

I tried this patch and in principle this works. In that regard
Acked-by: Martin Schwidefsky 

But I wonder if the loop to check the update counter is really
necessary. The hrtimer_resolution value can only changes once with
the first call to hrtimer_switch_to_hres(). With the TOD clock
as the only clock available on s390 we always have the ability
to do hrtimer. It then all depends on the highres=[on|off] kernel
parameter what value we get with clock_getres().

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: Kconfig label updates

2019-01-09 Thread Martin Schwidefsky
On Tue, 8 Jan 2019 16:30:24 -0600
Bjorn Helgaas  wrote:

> Hi,
> 
> I want to update the PCI Kconfig labels so they're more consistent and
> useful to users, something like the patch below.  IIUC, the items
> below are all IBM-related; please correct me if not.
> 
> I'd also like to expand (or remove) "RPA" because Google doesn't find
> anything about "IBM RPA", except Robotic Process Automation, which I
> think must be something else.
> 
> Is there some text expansion of RPA that we could use that would be
> meaningful to a user, i.e., something he/she might find on a nameplate
> or in a user manual?
> 
> Ideally the PCI Kconfig labels would match the terms used in
> arch/.../Kconfig, e.g.,
> 
>   config PPC_POWERNV
> bool "IBM PowerNV (Non-Virtualized) platform support"
> 
>   config PPC_PSERIES
> bool "IBM pSeries & new (POWER5-based) iSeries"
> 
>   config MARCH_Z900
> bool "IBM zSeries model z800 and z900"
> 
>   config MARCH_Z9_109
> bool "IBM System z9"
> 
> Bjorn
> 
> 
> diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> index e9f78eb390d2..1c1d145bfd84 100644
> --- a/drivers/pci/hotplug/Kconfig
> +++ b/drivers/pci/hotplug/Kconfig
> @@ -112,7 +112,7 @@ config HOTPLUG_PCI_SHPC
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_POWERNV
> - tristate "PowerPC PowerNV PCI Hotplug driver"
> + tristate "IBM PowerNV PCI Hotplug driver"
>   depends on PPC_POWERNV && EEH
>   select OF_DYNAMIC
>   help
> @@ -125,10 +125,11 @@ config HOTPLUG_PCI_POWERNV
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_RPA
> - tristate "RPA PCI Hotplug driver"
> + tristate "IBM Power Systems RPA PCI Hotplug driver"
>   depends on PPC_PSERIES && EEH
>   help
> Say Y here if you have a RPA system that supports PCI Hotplug.
> +   This includes the earlier pSeries and iSeries.
> 
> To compile this driver as a module, choose M here: the
> module will be called rpaphp.
> @@ -136,7 +137,7 @@ config HOTPLUG_PCI_RPA
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_RPA_DLPAR
> - tristate "RPA Dynamic Logical Partitioning for I/O slots"
> + tristate "IBM RPA Dynamic Logical Partitioning for I/O slots"
>   depends on HOTPLUG_PCI_RPA
>   help
> Say Y here if your system supports Dynamic Logical Partitioning
> @@ -157,7 +158,7 @@ config HOTPLUG_PCI_SGI
> When in doubt, say N.
> 
>  config HOTPLUG_PCI_S390
> - bool "System z PCI Hotplug Support"
> + bool "IBM System z PCI Hotplug Support"
>   depends on S390 && 64BIT
>   help
> Say Y here if you want to use the System z PCI Hotplug
> 

The rewording of the HOTPLUG_PCI_S390 entry is fine with me.
Acked-by: Martin Schwidefsky 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-15 Thread Martin Schwidefsky
On Mon, 15 Oct 2018 09:10:53 +0200
Christian Borntraeger  wrote:

> On 10/12/2018 03:37 AM, Joel Fernandes (Google) wrote:
> > Android needs to mremap large regions of memory during memory management
> > related operations. The mremap system call can be really slow if THP is
> > not enabled. The bottleneck is move_page_tables, which is copying each
> > pte at a time, and can be really slow across a large map. Turning on THP
> > may not be a viable option, and is not for us. This patch speeds up the
> > performance for non-THP system by copying at the PMD level when possible.
> > 
> > The speed up is three orders of magnitude. On a 1GB mremap, the mremap
> > completion times drops from 160-250 millesconds to 380-400 microseconds.
> > 
> > Before:
> > Total mremap time for 1GB data: 242321014 nanoseconds.
> > Total mremap time for 1GB data: 196842467 nanoseconds.
> > Total mremap time for 1GB data: 167051162 nanoseconds.
> > 
> > After:
> > Total mremap time for 1GB data: 385781 nanoseconds.
> > Total mremap time for 1GB data: 388959 nanoseconds.
> > Total mremap time for 1GB data: 402813 nanoseconds.
> > 
> > Incase THP is enabled, the optimization is skipped. I also flush the
> > tlb every time we do this optimization since I couldn't find a way to
> > determine if the low-level PTEs are dirty. It is seen that the cost of
> > doing so is not much compared the improvement, on both x86-64 and arm64.
> > 
> > Cc: minc...@kernel.org
> > Cc: pan...@google.com
> > Cc: hu...@google.com
> > Cc: lokeshgi...@google.com
> > Cc: dan...@google.com
> > Cc: mho...@kernel.org
> > Cc: kir...@shutemov.name
> > Cc: a...@linux-foundation.org
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> >  mm/mremap.c | 62 +
> >  1 file changed, 62 insertions(+)
> > 
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index 9e68a02a52b1..d82c485822ef 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, 
> > pmd_t *old_pmd,
> > drop_rmap_locks(vma);
> >  }
> >  
> > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long 
> > old_addr,
> > + unsigned long new_addr, unsigned long old_end,
> > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
> > +{
> > +   spinlock_t *old_ptl, *new_ptl;
> > +   struct mm_struct *mm = vma->vm_mm;
> > +
> > +   if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
> > +   || old_end - old_addr < PMD_SIZE)
> > +   return false;
> > +
> > +   /*
> > +* The destination pmd shouldn't be established, free_pgtables()
> > +* should have release it.
> > +*/
> > +   if (WARN_ON(!pmd_none(*new_pmd)))
> > +   return false;
> > +
> > +   /*
> > +* We don't have to worry about the ordering of src and dst
> > +* ptlocks because exclusive mmap_sem prevents deadlock.
> > +*/
> > +   old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> > +   if (old_ptl) {
> > +   pmd_t pmd;
> > +
> > +   new_ptl = pmd_lockptr(mm, new_pmd);
> > +   if (new_ptl != old_ptl)
> > +   spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> > +
> > +   /* Clear the pmd */
> > +   pmd = *old_pmd;
> > +   pmd_clear(old_pmd);  
> 
> Adding Martin Schwidefsky.
> Is this mapping maybe still in use on other CPUs? If yes, I think for
> s390 we need to flush here as well (in other word we might need to introduce
> pmd_clear_flush). On s390 you have to use instructions like CRDTE,IPTE or IDTE
> to modify page table entries that are still in use. Otherwise you can get a 
> delayed access exception which is - in contrast to page faults - not 
> recoverable.

Just clearing an active pmd would be broken for s390. We need the equivalent
of the ptep_get_and_clear() function for pmds. For s390 this function would
look like this:

static inline pte_t pmdp_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pmd_t *pmdp)
{
return pmdp_xchg_lazy(mm, addr, pmdp, __pmd(_SEGMENT_ENTRY_INVALID));
}

Just like pmdp_huge_get_and_clear() in fact.

> 
> 
> 
> > +
> > +   VM_BUG_ON(!pmd_none(*new_pmd));
> > +
> > +   /* Set the new pmd */
> > +   set_pmd_at(mm, new_addr, new_pmd, pmd);
> > +   if (new_ptl != old_ptl)
> > +   spin_unlock(new_p

Re: [PATCH 3/4] cputime/powerpc/s390: make scaled cputime arch specific

2016-11-02 Thread Martin Schwidefsky
On Wed, 2 Nov 2016 10:38:20 +0100
Stanislaw Gruszka  wrote:

> On Wed, Nov 02, 2016 at 10:11:22AM +0100, Christian Borntraeger wrote:
> > On 10/31/2016 01:36 PM, Stanislaw Gruszka wrote:
> > > Only s390 and powerpc have hardware facilities allowing to measure
> > > cputimes scaled by frequency. On all other architectures
> > > utimescaled/stimescaled are equal to utime/stime (however they are
> > > accounted separately).
> > > 
> > > Patch remove {u,s}timescaled accounting on all architectures except
> > > powerpc and s390, where those values are explicitly accounted on proper
> > > places.
> > 
> > If we remove it everywhere else (and assuming that there are no users then)
> > I aks myself if we should remove this as well from s390.
> 
> There is one user of scaled cputimes values, it is taskstats (to users
> space are exported ac_utimescaled, ac_stimescaled and
> cpu_scaled_run_real_total which is calculated based on scaled times).
> However on other than powerpc and s390 architectures scaled times are
> equal to normal times (this is also true for older powerpc's without
> SPURR/PURR registers).

The taskstats interface is the only user of the scaled cputime that
I am aware of. It is hard to say how valuable the information is.
Without the scaled number in the taskstats inferface you have no means
to detect if your workload has been affected by a another SMT thread
running on the same core.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH 11/14] s390/ptrace: run seccomp after ptrace

2016-06-10 Thread Martin Schwidefsky
On Thu,  9 Jun 2016 14:02:01 -0700
Kees Cook <keesc...@chromium.org> wrote:

> Close the hole where ptrace can change a syscall out from under seccomp.
> 
> Signed-off-by: Kees Cook <keesc...@chromium.org>
> Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
> Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
> Cc: linux-s...@vger.kernel.org
> ---
>  arch/s390/kernel/ptrace.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)

If the change in semantics in regard to the audit of skipped system calls
is acceptable, the modified s390 arch code is ok.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-24 Thread Martin Schwidefsky
On Tue, 23 Feb 2016 22:33:45 +0300
"Kirill A. Shutemov"  wrote:

> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
> > I'll check with Martin, maybe it is actually trivial, then we can
> > do a quick test it to rule that one out.
> 
> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
> _the_ bug.
> 
> pmdp_invalidate() is called for the wrong address :-/
> I guess that can be destructive on the architecture, right?
> 
> Could you check this?
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1c317b85ea7d..4246bc70e55a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2865,7 +2865,7 @@ static void __split_huge_pmd_locked(struct 
> vm_area_struct *vma, pmd_t *pmd,
>   pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>   pmd_populate(mm, &_pmd, pgtable);
> 
> - for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> + for (i = 0; i < HPAGE_PMD_NR; i++) {
>   pte_t entry, *pte;
>   /*
>* Note that NUMA hinting access restrictions are not
> @@ -2886,9 +2886,9 @@ static void __split_huge_pmd_locked(struct 
> vm_area_struct *vma, pmd_t *pmd,
>   }
>   if (dirty)
>   SetPageDirty(page + i);
> - pte = pte_offset_map(&_pmd, haddr);
> + pte = pte_offset_map(&_pmd, haddr + i * PAGE_SIZE);
>   BUG_ON(!pte_none(*pte));
> - set_pte_at(mm, haddr, pte, entry);
> + set_pte_at(mm, haddr + i * PAGE_SIZE, pte, entry);
>   atomic_inc([i]._mapcount);
>   pte_unmap(pte);
>   }
> @@ -2938,7 +2938,7 @@ static void __split_huge_pmd_locked(struct 
> vm_area_struct *vma, pmd_t *pmd,
>   pmd_populate(mm, pmd, pgtable);
> 
>   if (freeze) {
> - for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> + for (i = 0; i < HPAGE_PMD_NR; i++) {
>   page_remove_rmap(page + i, false);
>   put_page(page + i);
>   }

Test is running and it looks good so far. For the final assessment I defer
to Gerald and Sebastian.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-24 Thread Martin Schwidefsky
On Tue, 23 Feb 2016 19:19:07 +0100
Gerald Schaefer  wrote:

> On Tue, 23 Feb 2016 13:32:21 +0300
> "Kirill A. Shutemov"  wrote:
> 
> > On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> > > On Fri, 12 Feb 2016 16:57:27 +0100
> > > Christian Borntraeger  wrote:
> > > 
> > > > > I'm also confused by pmd_none() is equal to !pmd_present() on s390. 
> > > > > Hm?
> > > > 
> > > > Don't know, Gerald or Martin?
> > > 
> > > The implementation frequently changes depending on how many new bits 
> > > Martin
> > > needs to squeeze out :-)
> > > We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks 
> > > if the
> > > entry is not empty. pmd_none() of course does the opposite, it checks if 
> > > it is
> > > empty.
> > 
> > I still worry about pmd_present(). It looks wrong to me. I wounder if
> > patch below makes a difference.
> > 
> > The theory is that the splitting bit effetely masked bogus pmd_present():
> > we had pmd_trans_splitting() in all code path and that prevented mm from
> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceed with the
> > pmd where it shouldn't and here's a boom.
> 
> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> splitting, after all there is a page behind the the pmd. Also, if it was
> bogus, and it would need to be false, why should it be marked !pmd_present()
> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> is pmd_present() before that, on all architectures, and if there was any
> problem/race with that, setting it to !pmd_present() at this stage would
> only (marginally) reduce the race window.
> 
> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> i.e. they do not set pmd_present() == false, only mark it so that it would
> not generate a new TLB entry, just like on s390. After all, the function
> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> before that call is just a little ambiguous in its wording. When it says
> "mark the pmd notpresent" it probably means "mark it so that it will not
> generate a new TLB entry", which is also what the comment is really about:
> prevent huge and small entries in the TLB for the same page at the same
> time.

If I am not mistaken this is true for x86 as well. The generic implementation
for pmdp_invalidate sets a new pmd that has been modified with
pmd_mknotpresent. For x86 this function removes the _PAGE_PRESENT and
_PAGE_PROTNONE bits from the entry. The _PAGE_PSE bit stays set and that
makes pmd_present return true.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 06/32] s390: reuse asm-generic/barrier.h

2016-01-05 Thread Martin Schwidefsky
On Mon, 4 Jan 2016 22:42:44 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Mon, Jan 04, 2016 at 04:03:39PM +0100, Martin Schwidefsky wrote:
> > On Mon, 4 Jan 2016 14:20:42 +0100
> > Peter Zijlstra <pet...@infradead.org> wrote:
> > 
> > > On Thu, Dec 31, 2015 at 09:06:30PM +0200, Michael S. Tsirkin wrote:
> > > > On s390 read_barrier_depends, smp_read_barrier_depends
> > > > smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> > > > asm-generic variants exactly. Drop the local definitions and pull in
> > > > asm-generic/barrier.h instead.
> > > > 
> > > > This is in preparation to refactoring this code area.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > Acked-by: Arnd Bergmann <a...@arndb.de>
> > > > ---
> > > >  arch/s390/include/asm/barrier.h | 10 ++
> > > >  1 file changed, 2 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/arch/s390/include/asm/barrier.h 
> > > > b/arch/s390/include/asm/barrier.h
> > > > index 7ffd0b1..c358c31 100644
> > > > --- a/arch/s390/include/asm/barrier.h
> > > > +++ b/arch/s390/include/asm/barrier.h
> > > > @@ -30,14 +30,6 @@
> > > >  #define smp_rmb()  rmb()
> > > >  #define smp_wmb()  wmb()
> > > >  
> > > > -#define read_barrier_depends() do { } while (0)
> > > > -#define smp_read_barrier_depends() do { } while (0)
> > > > -
> > > > -#define smp_mb__before_atomic()smp_mb()
> > > > -#define smp_mb__after_atomic() smp_mb()
> > > 
> > > As per:
> > > 
> > >   lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
> > > 
> > > s390 should change this to barrier() instead of smp_mb() and hence
> > > should not use the generic versions.
> >  
> > Yes, we wanted to simplify this. Thanks for the reminder, I'll queue
> > a patch.
> 
> Could you base on my patchset maybe, to avoid conflicts,
> and I'll merge it?
> Or if it's just replacing these 2 with barrier() I can do this
> myself easily.

Probably the easiest solution if you do the patch yourself and
include it in your patch set. 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 22/32] s390: define __smp_xxx

2016-01-05 Thread Martin Schwidefsky
On Mon, 4 Jan 2016 22:18:58 +0200
"Michael S. Tsirkin"  wrote:

> On Mon, Jan 04, 2016 at 02:45:25PM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 31, 2015 at 09:08:38PM +0200, Michael S. Tsirkin wrote:
> > > This defines __smp_xxx barriers for s390,
> > > for use by virtualization.
> > > 
> > > Some smp_xxx barriers are removed as they are
> > > defined correctly by asm-generic/barriers.h
> > > 
> > > Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
> > > unconditionally on this architecture.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > Acked-by: Arnd Bergmann 
> > > ---
> > >  arch/s390/include/asm/barrier.h | 15 +--
> > >  1 file changed, 9 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/s390/include/asm/barrier.h 
> > > b/arch/s390/include/asm/barrier.h
> > > index c358c31..fbd25b2 100644
> > > --- a/arch/s390/include/asm/barrier.h
> > > +++ b/arch/s390/include/asm/barrier.h
> > > @@ -26,18 +26,21 @@
> > >  #define wmb()barrier()
> > >  #define dma_rmb()mb()
> > >  #define dma_wmb()mb()
> > > -#define smp_mb() mb()
> > > -#define smp_rmb()rmb()
> > > -#define smp_wmb()wmb()
> > > -
> > > -#define smp_store_release(p, v)  
> > > \
> > > +#define __smp_mb()   mb()
> > > +#define __smp_rmb()  rmb()
> > > +#define __smp_wmb()  wmb()
> > > +#define smp_mb() __smp_mb()
> > > +#define smp_rmb()__smp_rmb()
> > > +#define smp_wmb()__smp_wmb()
> > 
> > Why define the smp_*mb() primitives here? Would not the inclusion of
> > asm-generic/barrier.h do this?
> 
> No because the generic one is a nop on !SMP, this one isn't.
> 
> Pls note this patch is just reordering code without making
> functional changes.
> And at the moment, on s390 smp_xxx barriers are always non empty.

The s390 kernel is SMP to 99.99%, we just didn't bother with a
non-smp variant for the memory-barriers. If the generic header
is used we'd get the non-smp version for free. It will save a
small amount of text space for CONFIG_SMP=n. 
 
> Some of this could be sub-optimal, but
> since on s390 Linux always runs on a hypervisor,
> I am not sure it's safe to use the generic version -
> in other words, it just might be that for s390 smp_ and virt_
> barriers must be equivalent.

The definition of the memory barriers is independent from the fact
if the system is running on an hypervisor or not. Is there really
an architecture where you need special virt_xxx barriers?!? 

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 22/32] s390: define __smp_xxx

2016-01-05 Thread Martin Schwidefsky
On Tue, 5 Jan 2016 15:04:43 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Tue, Jan 05, 2016 at 01:08:52PM +0100, Martin Schwidefsky wrote:
> > On Tue, 5 Jan 2016 11:30:19 +0200
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > 
> > > On Tue, Jan 05, 2016 at 09:13:19AM +0100, Martin Schwidefsky wrote:
> > > > On Mon, 4 Jan 2016 22:18:58 +0200
> > > > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > > 
> > > > > On Mon, Jan 04, 2016 at 02:45:25PM +0100, Peter Zijlstra wrote:
> > > > > > On Thu, Dec 31, 2015 at 09:08:38PM +0200, Michael S. Tsirkin wrote:
> > > > > > > This defines __smp_xxx barriers for s390,
> > > > > > > for use by virtualization.
> > > > > > > 
> > > > > > > Some smp_xxx barriers are removed as they are
> > > > > > > defined correctly by asm-generic/barriers.h
> > > > > > > 
> > > > > > > Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
> > > > > > > unconditionally on this architecture.
> > > > > > > 
> > > > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > > > > Acked-by: Arnd Bergmann <a...@arndb.de>
> > > > > > > ---
> > > > > > >  arch/s390/include/asm/barrier.h | 15 +--
> > > > > > >  1 file changed, 9 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/arch/s390/include/asm/barrier.h 
> > > > > > > b/arch/s390/include/asm/barrier.h
> > > > > > > index c358c31..fbd25b2 100644
> > > > > > > --- a/arch/s390/include/asm/barrier.h
> > > > > > > +++ b/arch/s390/include/asm/barrier.h
> > > > > > > @@ -26,18 +26,21 @@
> > > > > > >  #define wmb()barrier()
> > > > > > >  #define dma_rmb()mb()
> > > > > > >  #define dma_wmb()mb()
> > > > > > > -#define smp_mb() mb()
> > > > > > > -#define smp_rmb()rmb()
> > > > > > > -#define smp_wmb()wmb()
> > > > > > > -
> > > > > > > -#define smp_store_release(p, v)  
> > > > > > > \
> > > > > > > +#define __smp_mb()   mb()
> > > > > > > +#define __smp_rmb()  rmb()
> > > > > > > +#define __smp_wmb()  wmb()
> > > > > > > +#define smp_mb() __smp_mb()
> > > > > > > +#define smp_rmb()__smp_rmb()
> > > > > > > +#define smp_wmb()__smp_wmb()
> > > > > > 
> > > > > > Why define the smp_*mb() primitives here? Would not the inclusion of
> > > > > > asm-generic/barrier.h do this?
> > > > > 
> > > > > No because the generic one is a nop on !SMP, this one isn't.
> > > > > 
> > > > > Pls note this patch is just reordering code without making
> > > > > functional changes.
> > > > > And at the moment, on s390 smp_xxx barriers are always non empty.
> > > > 
> > > > The s390 kernel is SMP to 99.99%, we just didn't bother with a
> > > > non-smp variant for the memory-barriers. If the generic header
> > > > is used we'd get the non-smp version for free. It will save a
> > > > small amount of text space for CONFIG_SMP=n. 
> > > 
> > > OK, so I'll queue a patch to do this then?
> > 
> > Yes please.
> 
> OK, I'll add a patch on top in v3.

Good, with this addition:

Acked-by: Martin Schwidefsky <schwidef...@de.ibm.com>

> > > Just to make sure: the question would be, are smp_xxx barriers ever used
> > > in s390 arch specific code to flush in/out memory accesses for
> > > synchronization with the hypervisor?
> > > 
> > > I went over s390 arch code and it seems to me the answer is no
> > > (except of course for virtio).
> > 
> > Correct. Guest to host communication either uses instructions which
> > imply a memory barrier or QDIO which uses atomics.
> 
> And atomics imply a barrier on s390, right?

Yes they do.

> > > But I also see a lot of weirdness on this architecture.
> > 
> > Mostly historical, s390 actually is one of the easiest architectures in
> > regard to memory barriers.
> > 
> > > I found these calls:
> > > 
> > > arch/s390/include/asm/bitops.h: smp_mb__before_atomic();
> > > arch/s390/include/asm/bitops.h: smp_mb();
> > > 
> > > Not used in arch specific code so this is likely OK.
> > 
> > This has been introduced with git commit 5402ea6af11dc5a9, the smp_mb
> > and smp_mb__before_atomic are used in clear_bit_unlock and
> > __clear_bit_unlock which are 1:1 copies from the code in
> > include/asm-generic/bitops/lock.h. Only test_and_set_bit_lock differs
> > from the generic implementation.
> 
> something to keep in mind, but
> I'd rather not touch bitops at the moment - this patchset is already too big.

With the conversion smp_mb__before_atomic to a barrier() it does the
correct thing. I don't think that any chance is necessary.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 22/32] s390: define __smp_xxx

2016-01-05 Thread Martin Schwidefsky
On Tue, 5 Jan 2016 11:30:19 +0200
"Michael S. Tsirkin" <m...@redhat.com> wrote:

> On Tue, Jan 05, 2016 at 09:13:19AM +0100, Martin Schwidefsky wrote:
> > On Mon, 4 Jan 2016 22:18:58 +0200
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > 
> > > On Mon, Jan 04, 2016 at 02:45:25PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Dec 31, 2015 at 09:08:38PM +0200, Michael S. Tsirkin wrote:
> > > > > This defines __smp_xxx barriers for s390,
> > > > > for use by virtualization.
> > > > > 
> > > > > Some smp_xxx barriers are removed as they are
> > > > > defined correctly by asm-generic/barriers.h
> > > > > 
> > > > > Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
> > > > > unconditionally on this architecture.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > > Acked-by: Arnd Bergmann <a...@arndb.de>
> > > > > ---
> > > > >  arch/s390/include/asm/barrier.h | 15 +--
> > > > >  1 file changed, 9 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/arch/s390/include/asm/barrier.h 
> > > > > b/arch/s390/include/asm/barrier.h
> > > > > index c358c31..fbd25b2 100644
> > > > > --- a/arch/s390/include/asm/barrier.h
> > > > > +++ b/arch/s390/include/asm/barrier.h
> > > > > @@ -26,18 +26,21 @@
> > > > >  #define wmb()barrier()
> > > > >  #define dma_rmb()mb()
> > > > >  #define dma_wmb()mb()
> > > > > -#define smp_mb() mb()
> > > > > -#define smp_rmb()rmb()
> > > > > -#define smp_wmb()wmb()
> > > > > -
> > > > > -#define smp_store_release(p, v)  
> > > > > \
> > > > > +#define __smp_mb()   mb()
> > > > > +#define __smp_rmb()  rmb()
> > > > > +#define __smp_wmb()  wmb()
> > > > > +#define smp_mb() __smp_mb()
> > > > > +#define smp_rmb()__smp_rmb()
> > > > > +#define smp_wmb()__smp_wmb()
> > > > 
> > > > Why define the smp_*mb() primitives here? Would not the inclusion of
> > > > asm-generic/barrier.h do this?
> > > 
> > > No because the generic one is a nop on !SMP, this one isn't.
> > > 
> > > Pls note this patch is just reordering code without making
> > > functional changes.
> > > And at the moment, on s390 smp_xxx barriers are always non empty.
> > 
> > The s390 kernel is SMP to 99.99%, we just didn't bother with a
> > non-smp variant for the memory-barriers. If the generic header
> > is used we'd get the non-smp version for free. It will save a
> > small amount of text space for CONFIG_SMP=n. 
> 
> OK, so I'll queue a patch to do this then?

Yes please.
 
> Just to make sure: the question would be, are smp_xxx barriers ever used
> in s390 arch specific code to flush in/out memory accesses for
> synchronization with the hypervisor?
> 
> I went over s390 arch code and it seems to me the answer is no
> (except of course for virtio).

Correct. Guest to host communication either uses instructions which
imply a memory barrier or QDIO which uses atomics.

> But I also see a lot of weirdness on this architecture.

Mostly historical, s390 actually is one of the easiest architectures in
regard to memory barriers.

> I found these calls:
> 
> arch/s390/include/asm/bitops.h: smp_mb__before_atomic();
> arch/s390/include/asm/bitops.h: smp_mb();
> 
> Not used in arch specific code so this is likely OK.

This has been introduced with git commit 5402ea6af11dc5a9, the smp_mb
and smp_mb__before_atomic are used in clear_bit_unlock and
__clear_bit_unlock which are 1:1 copies from the code in
include/asm-generic/bitops/lock.h. Only test_and_set_bit_lock differs
from the generic implementation.

> arch/s390/kernel/vdso.c:smp_mb();
> 
> Looking at
>   Author: Christian Borntraeger <borntrae...@de.ibm.com>
>   Date:   Fri Sep 11 16:23:06 2015 +0200
> 
>   s390/vdso: use correct memory barrier
> 
>   By definition smp_wmb only orders writes against writes. (Finish all
>   previous writes, and do not start any future write). To protect the

Re: [PATCH v2 06/32] s390: reuse asm-generic/barrier.h

2016-01-04 Thread Martin Schwidefsky
On Mon, 4 Jan 2016 14:20:42 +0100
Peter Zijlstra  wrote:

> On Thu, Dec 31, 2015 at 09:06:30PM +0200, Michael S. Tsirkin wrote:
> > On s390 read_barrier_depends, smp_read_barrier_depends
> > smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> > asm-generic variants exactly. Drop the local definitions and pull in
> > asm-generic/barrier.h instead.
> > 
> > This is in preparation to refactoring this code area.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > Acked-by: Arnd Bergmann 
> > ---
> >  arch/s390/include/asm/barrier.h | 10 ++
> >  1 file changed, 2 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/s390/include/asm/barrier.h 
> > b/arch/s390/include/asm/barrier.h
> > index 7ffd0b1..c358c31 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -30,14 +30,6 @@
> >  #define smp_rmb()  rmb()
> >  #define smp_wmb()  wmb()
> >  
> > -#define read_barrier_depends() do { } while (0)
> > -#define smp_read_barrier_depends() do { } while (0)
> > -
> > -#define smp_mb__before_atomic()smp_mb()
> > -#define smp_mb__after_atomic() smp_mb()
> 
> As per:
> 
>   lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
> 
> s390 should change this to barrier() instead of smp_mb() and hence
> should not use the generic versions.
 
Yes, we wanted to simplify this. Thanks for the reminder, I'll queue
a patch.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Ping Re: [PATCH 0/8] math-emu: Update kernel math-emu code from current glibc soft-fp

2015-08-27 Thread Martin Schwidefsky
On Thu, 27 Aug 2015 13:12:05 +1000
Michael Ellerman m...@ellerman.id.au wrote:

 On Wed, 2015-08-26 at 14:44 +, Joseph Myers wrote:
  On Thu, 20 Aug 2015, Michael Ellerman wrote:
  
   On Wed, 2015-08-19 at 14:39 +, Joseph Myers wrote:
I'd like to ping this patch series, not having seen any comments on it.

[PATCH 0/8] math-emu: Update kernel math-emu code from current glibc 
soft-fp
https://lkml.org/lkml/2015/7/2/394

[PATCH 1/8] math-emu: Move math-emu to math-emu-old
https://lkml.org/lkml/2015/7/2/395

[PATCH 2/8] math-emu: Import current glibc soft-fp as include/math-emu
https://lkml.org/lkml/2015/7/2/398

[PATCH 3/8] alpha/math-emu: Move alpha from math-emu-old to math-emu
https://lkml.org/lkml/2015/7/2/399

[PATCH 4/8] powerpc/math-emu: Move powerpc from math-emu-old to math-emu
https://lkml.org/lkml/2015/7/2/400

[PATCH 5/8] s390/math-emu: Move s390 from math-emu-old to math-emu
https://lkml.org/lkml/2015/7/2/401

[PATCH 6/8] sh/math-emu: Move sh from math-emu-old to math-emu
https://lkml.org/lkml/2015/7/2/402

[PATCH 7/8] sparc/math-emu: Move sparc from math-emu-old to math-emu
https://lkml.org/lkml/2015/7/2/404

[PATCH 8/8] math-emu: Remove math-emu-old
https://lkml.org/lkml/2015/7/2/405

As noted in patch 0, the dependencies are as follows: patch 2 depends 
on 
patch 1, patches 3-7 depend on patches 1 and 2 but are independent of 
each 
other, patch 8 depends on all the other patches.
   
   Hi Joseph,
   
   I have done some testing on powerpc, and it seems good.
   
   I'm reluctant to proceed with merging them though until we've heard at 
   least
   *something* from the other maintainers.
  
  Ping again for the other-architecture maintainers (alpha, s390, sh, 
  sparc)
 
 
 OK I've put this whole series in a branch with the acks we have so far.
 
 I'll ask Stephen to put it in linux-next once 4.3-rc1 is out, and I'll ask
 Linus to pull it for 4.4
 
 I haven't published the branch yet, so if the s390 and sparc maintainers want
 to send me acks I'll rebase and add them.
 

For the s390 parts:
Acked-by: Martin Schwidefsky schwidef...@de.ibm.com 


-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/2] Consolidate redundant register/stack access code

2015-06-16 Thread Martin Schwidefsky
On Mon, 15 Jun 2015 12:42:57 -0400
David Long dave.l...@linaro.org wrote:

 From: David A. Long dave.l...@linaro.org
 
 Move duplicate and functionally equivalent code for accessing registers
 and stack (CONFIG_HAVE_REGS_AND_STACK_ACCESS_API) from arch subdirs into
 common kernel files.
 
 Note: Help regression testing s390, hexagon, and sh would be appreciated.
   Powerpc builds but I have not verified the functionality.
 
 David A. Long (2):
   Move the pt_regs_offset struct definition from arch to common include
 file
   Consolidate redundant register/stack access code

Compiles  boots on a s390 box. The code for regs_query_register_offset
and regs_query_register_name is a bit more complex for s390 compared
to the current state of things. Should be equivalent though.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 06/10] s390: standardize mmap_rnd() usage

2015-03-09 Thread Martin Schwidefsky
On Wed,  4 Mar 2015 13:10:50 -0800
Kees Cook keesc...@chromium.org wrote:

 In preparation for splitting out ET_DYN ASLR, this refactors the use of
 mmap_rnd() to be used similarly to arm and x86, and extracts the checking
 of PF_RANDOMIZE.
 
 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
  arch/s390/mm/mmap.c | 34 +++---
  1 file changed, 23 insertions(+), 11 deletions(-)

Patch series including this patch works fine
Acked-by: Martin Schwidefsky schwidef...@de.ibm.com

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 08/10] s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE

2015-03-09 Thread Martin Schwidefsky
On Wed,  4 Mar 2015 13:10:52 -0800
Kees Cook keesc...@chromium.org wrote:

 In preparation for moving ET_DYN randomization into the ELF loader (which
 requires a static ELF_ET_DYN_BASE), this redefines s390's existing ET_DYN
 randomization in a call to arch_mmap_rnd(). This refactoring results in
 the same ET_DYN randomization on s390.
 
 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
  arch/s390/include/asm/elf.h |  8 +---
  arch/s390/mm/mmap.c | 11 ++-
  2 files changed, 7 insertions(+), 12 deletions(-)

Patch series including this patch works fine
Acked-by: Martin Schwidefsky schwidef...@de.ibm.com

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] srcu: Isolate srcu sections using CONFIG_SRCU

2014-12-29 Thread Martin Schwidefsky
On Sat, 27 Dec 2014 12:17:43 -0500
Pranith Kumar bobby.pr...@gmail.com wrote:

 @@ -65,10 +65,13 @@
  #include asm/kexec.h
  #include asm/mmu_context.h
  #include asm/code-patching.h
 -#include asm/kvm_ppc.h
  #include asm/hugetlb.h
  #include asm/epapr_hcalls.h
 
 +#if IS_ENABLED(CONFIG_KVM)
 +#include asm/kvm_ppc.h
 +#endif
 +
  #ifdef DEBUG
  #define DBG(fmt...) udbg_printf(fmt)
  #else

I always cringe when I see an include protected by an #ifdef.
Is this really necessary? All that is done in asm-offsets.c is
to calculate offsets, the code where the two offsets in question
are used (entry64.S) does have the #ifdef for CONFIG_KVM.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC 04/77] PCI/MSI/s390: Remove superfluous check of MSI type

2013-10-04 Thread Martin Schwidefsky
On Wed,  2 Oct 2013 12:48:20 +0200
Alexander Gordeev agord...@redhat.com wrote:

 arch_setup_msi_irqs() hook can only be called from the generic
 MSI code which ensures correct MSI type parameter.
 
 Signed-off-by: Alexander Gordeev agord...@redhat.com
 ---
  arch/s390/pci/pci.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)
 
 diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
 index c79c6e4..61a3c2c 100644
 --- a/arch/s390/pci/pci.c
 +++ b/arch/s390/pci/pci.c
 @@ -425,8 +425,6 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, 
 int type)
   int rc;
 
   pr_debug(%s: requesting %d MSI-X interrupts..., __func__, nvec);
 - if (type != PCI_CAP_ID_MSIX  type != PCI_CAP_ID_MSI)
 - return -EINVAL;
   if (type == PCI_CAP_ID_MSI  nvec  1)
   return 1;
   msi_vecs = min(nvec, ZPCI_MSI_VEC_MAX);

Acked-by: Martin Schwidefsky schwidef...@de.ibm.com

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFC 03/77] PCI/MSI/s390: Fix single MSI only check

2013-10-04 Thread Martin Schwidefsky
On Wed,  2 Oct 2013 12:48:19 +0200
Alexander Gordeev agord...@redhat.com wrote:

 Multiple MSIs have never been supported on s390 architecture,
 but the platform code fails to report single MSI only.
 
 Signed-off-by: Alexander Gordeev agord...@redhat.com
 ---
  arch/s390/pci/pci.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)
 
 diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
 index f17a834..c79c6e4 100644
 --- a/arch/s390/pci/pci.c
 +++ b/arch/s390/pci/pci.c
 @@ -427,6 +427,8 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, 
 int type)
   pr_debug(%s: requesting %d MSI-X interrupts..., __func__, nvec);
   if (type != PCI_CAP_ID_MSIX  type != PCI_CAP_ID_MSI)
   return -EINVAL;
 + if (type == PCI_CAP_ID_MSI  nvec  1)
 + return 1;
   msi_vecs = min(nvec, ZPCI_MSI_VEC_MAX);
   msi_vecs = min_t(unsigned int, msi_vecs, CONFIG_PCI_NR_MSI);
 

Acked-by: Martin Schwidefsky schwidef...@de.ibm.com

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG] rebuild_sched_domains considered dangerous

2011-03-09 Thread Martin Schwidefsky
On Wed, 09 Mar 2011 12:33:49 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Wed, 2011-03-09 at 11:19 +0100, Peter Zijlstra wrote:
   It appears that this corresponds to one CPU deciding to rebuild the
   sched domains. There's various reasons why that can happen, the typical
   one in our case is the new VPNH feature where the hypervisor informs us
   of a change in node affinity of our virtual processors. s390 has a
   similar feature and should be affected as well.
  
  Ahh, so that's triggering it :-), just curious, how often does the HV do
  that to you? 
 
 OK, so Ben told me on IRC this can happen quite frequently, to which I
 must ask WTF were you guys smoking? Flipping the CPU topology every time
 the HV scheduler does something funny is quite insane. And you did that
 without ever talking to the scheduler folks, not cool.
 
 That is of course aside from the fact that we have a real bug there that
 needs fixing, but really guys, WTF!

Just for info, on s390 the topology change events are rather infrequent.
They do happen e.g. after an LPAR has been activated and the LPAR
hypervisor needs to reshuffle the CPUs of the different nodes.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG] rebuild_sched_domains considered dangerous

2011-03-09 Thread Martin Schwidefsky
On Wed, 09 Mar 2011 14:19:29 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Wed, 2011-03-09 at 14:15 +0100, Martin Schwidefsky wrote:
  On Wed, 09 Mar 2011 12:33:49 +0100
  Peter Zijlstra pet...@infradead.org wrote:
  
   On Wed, 2011-03-09 at 11:19 +0100, Peter Zijlstra wrote:
 It appears that this corresponds to one CPU deciding to rebuild the
 sched domains. There's various reasons why that can happen, the 
 typical
 one in our case is the new VPNH feature where the hypervisor informs 
 us
 of a change in node affinity of our virtual processors. s390 has a
 similar feature and should be affected as well.

Ahh, so that's triggering it :-), just curious, how often does the HV do
that to you? 
   
   OK, so Ben told me on IRC this can happen quite frequently, to which I
   must ask WTF were you guys smoking? Flipping the CPU topology every time
   the HV scheduler does something funny is quite insane. And you did that
   without ever talking to the scheduler folks, not cool.
   
   That is of course aside from the fact that we have a real bug there that
   needs fixing, but really guys, WTF!
  
  Just for info, on s390 the topology change events are rather infrequent.
  They do happen e.g. after an LPAR has been activated and the LPAR
  hypervisor needs to reshuffle the CPUs of the different nodes.
 
 But if you don't also update the cpu-node memory mappings (which I
 think it near impossible) what good is it to change the scheduler
 topology?

The memory for the different LPARs is striped over all nodes (or books as we
call them). We heavily rely on the large shared cache between the books to hide
the different memory access latencies.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [BUG] rebuild_sched_domains considered dangerous

2011-03-09 Thread Martin Schwidefsky
On Wed, 09 Mar 2011 14:33:56 +0100
Peter Zijlstra pet...@infradead.org wrote:

 On Wed, 2011-03-09 at 14:31 +0100, Martin Schwidefsky wrote:
   But if you don't also update the cpu-node memory mappings (which I
   think it near impossible) what good is it to change the scheduler
   topology?
  
  The memory for the different LPARs is striped over all nodes (or books as we
  call them). We heavily rely on the large shared cache between the books to 
  hide
  the different memory access latencies. 
 
 Right, so effectively you don't have NUMA due to that striping. So why
 then change the CPU topology? Simply create a topology without NUMA and
 keep it static, that accurately reflects the memory topology.

Well the CPU topology can change due to different grouping of logical CPUs
dependent on which LPARs are activated. And we effectively do not have a
memory topology, only CPU. Its basically all about caches, we want to
reflect the distance between CPUs over the up to 4 cache levels.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Martin Schwidefsky
On Mon, 17 Jan 2011 12:07:13 +0100
Peter Zijlstra pet...@infradead.org wrote:

 I visited existing smp_send_reschedule() implementations and tried to
 add a call to scheduler_ipi() in their handler part, but esp. for MIPS
 I'm not quite sure I actually got all of them.
  
 diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
 index 94cf510..61789e8 100644
 --- a/arch/s390/kernel/smp.c
 +++ b/arch/s390/kernel/smp.c
 @@ -163,12 +163,12 @@ static void do_ext_call_interrupt(unsigned int 
 ext_int_code,
  
   /*
* handle bit signal external calls
 -  *
 -  * For the ec_schedule signal we have to do nothing. All the work
 -  * is done automatically when we return from the interrupt.
*/
   bits = xchg(S390_lowcore.ext_call_fast, 0);
  
 + if (test_bit(ec_schedule, bits))
 + scheduler_ipi();
 +
   if (test_bit(ec_call_function, bits))
   generic_smp_call_function_interrupt();
  

s390 bits are fine.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] math-emu: correct test for downshifting fraction in _FP_FROM_INT()

2010-07-20 Thread Martin Schwidefsky
On Tue, 20 Jul 2010 00:12:02 +0200
Mikael Pettersson mi...@it.uu.se wrote:

 Unfortunately it seems difficult to write a generic module
 which uses math-emu:
 - math-emu/soft-fp.h includes asm/sfp-machine.h,
   but only a handful of archs have it
 - asm/sfp-machine.h isn't always self-contained and may depend
   on various $arch-specific declarations being present
 
 The given test module works on sparc64 and ppc64, where it uses
 the kernel's sfp-machine.h, and on x86 where it uses a stub
 sfp-machine.h supplied by itself.  I tried to cross-compile it
 for alpha, but that failed due to its sfp-machine.h not being
 self-contained.  I didn't try sh or s390.

It would be challange to try this on s390. The math emulation code is
only used for really old 31 bit machines. Starting with the G5 the fpu
can do IEEE754, I would say the math emulation code is irrelevant for
s390 by now.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC/PATCH] mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

2009-07-20 Thread Martin Schwidefsky
On Mon, 20 Jul 2009 17:11:13 +1000
Benjamin Herrenschmidt b...@kernel.crashing.org wrote:

 On Wed, 2009-07-15 at 15:56 +0200, Nick Piggin wrote:
   I would like to merge the new support that depends on this in 2.6.32,
   so unless there's major objections, I'd like this to go in early during
   the merge window. We can sort out separately how to carry the patch
   around in -next until then since the powerpc tree will have a dependency
   on it.
  
  Can't see any problem with that.
 
 CC'ing Linus here. How do you want to proceed with that merge ? (IE. so
 far nobody objected to the patch itself)
 
 IE. The patch affects all archs, though it's a trivial change every
 time, but I'll have stuff in powerpc-next that depends on it, and so I'm
 not sure what the right approach is here. Should I put it in the powerpc
 tree ?
 
 I also didn't have any formal Ack from anybody, neither mm folks nor
 arch maintainers :-)

Well the change is trivial, it just adds another unused argument to the
macros. For the records: it still compiles on s390.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] Restore deterministic CPU accounting on powerpc

2007-11-02 Thread Martin Schwidefsky
On Fri, 2007-11-02 at 15:48 +1100, Paul Mackerras wrote:
 This also lets us simplify the s390 code a bit; it means that the s390
 timer interrupt can now call update_process_times even when
 CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
 suitable account_process_tick().

Just tested on s390 with CONFIG_VIRT_CPU_ACCOUNTING=y. Works fine and it
is a nice cleanup of the s390 timer code. Thanks Paul.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]

-- 
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc2-mm2 build error on MIPS and ARM

2007-08-11 Thread Martin Schwidefsky
On Fri, 2007-08-10 at 22:58 -0400, Mathieu Desnoyers wrote:
 I got the following errors when building 2.6.23-rc2-mm2 on both mips and
 arm. Both errors are very much alike.

That attached patch should fix it for arm and mips. I'll try a few more
architectures until I'm fed up with compiling cross-compilers..

-- 
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.
---
Subject: [PATCH] move mm_struct and vm_area_struct, compile fix.

From: Martin Schwidefsky [EMAIL PROTECTED]

Include asm/page.h in linux/mm_types.h to make mips and arm
compile again after mm_struct and vm_area_struct have been moved.

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
---

 include/linux/mm_types.h |1 +
 1 file changed, 1 insertion(+)

diff -urpN linux-2.6/include/linux/mm_types.h
linux-2.6-patched/include/linux/mm_types.h
--- linux-2.6/include/linux/mm_types.h  2007-08-11 11:30:10.0
+0200
+++ linux-2.6-patched/include/linux/mm_types.h  2007-08-11
11:32:42.0 +0200
@@ -11,6 +11,7 @@
 #include linux/rwsem.h
 #include linux/completion.h
 #include asm/mmu.h
+#include asm/page.h
 
 struct address_space;
 


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc2-mm2 build error on MIPS and ARM

2007-08-11 Thread Martin Schwidefsky
On Sat, 2007-08-11 at 11:56 +0200, Sam Ravnborg wrote:
  That attached patch should fix it for arm and mips. I'll try a few more
  architectures until I'm fed up with compiling cross-compilers..
 http://userweb.kernel.org/~akpm/cross-compilers/

Thanks for the hint but I prefer to compile them on my own (naming
scheme, installation target, 4.1.2, ..). The first one is the hardest,
for the following ones it is just a matter how fast you machine is.
I'll do one cross-compiler every 15 minutes right now (without glibc).

-- 
blue skies,
  Martin.

Reality continues to ruin my life. - Calvin.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev