Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-21 Thread Al Viro
On Mon, Mar 19, 2018 at 11:23:42PM +, Al Viro wrote:
> Benefits:
>   * all SyS... wrappers (i.e. the thing that really ought to
> go into syscall tables) have the same type.
>   * we could have SYSCALL_DEFINE produce a trivial compat
> wrapper, have explicit COMPAT_SYSCALL_DEFINE discard that thing
> and populate the compat syscall table *entirely* with compat_SyS_...,
> letting the linker sort it out.  That way we don't need to keep
> track of what can use native and what needs compat in each compat
> table on biarch.
>   * s390 compat wrappers would disappear with that approach.
>   * we could even stop generating sys_... aliases - if
> syscall table is generated by slapping SyS_... or compat_SyS_...
> on the name given there, we don't need to _have_ those sys_...
> things at all.  All SyS_... would have the same type, so the pile
> in syscalls.h would not be needed - we could generate the externs
> at the same time we generate the syscall table.
> 
> And yes, it's a high-squick approach.  I know and I'm not saying
> it's a good idea.  OTOH, to quote the motto of philosophers and
> shell game operators, "there's something in it"...

FWIW, I have something that is almost reasonable on preprocessor side;
however, that has uncovered the following fun:
void f(unsigned long long);
void g(unsigned a, unsigned b)
{
funsigned long long)b)<<32)|a);
}

which does compile to "jump to f" on i386, ends up with the following
joy on arm:
mov r3, r1
mov r2, #0
push{r4, lr}
orr r2, r2, r0
mov r0, r2
mov r1, r3
bl  f
pop {r4, lr}
bx  lr
with gcc6; gcc7 is saner - there we have just
mov r2, #0
orr r0, r2, r0
b   f

The former is
r3 = r1
r2 = 0
r2 |= r0
r0 = r2
r1 = r3
The latter -
r2 = 0
r0 |= r2
which is better, but still bloody odd

And I'm afraid to check what e.g. 4.4 will do with that testcase...


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-20 Thread Ingo Molnar

* Al Viro  wrote:

> > For example this attempt at creating a new system call:
> > 
> >   SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)
> > 
> > ... would translate into something like:
> > 
> > .name = "moron", .pattern = "WWW", .type = "int",.size = 4,
> > .name = NULL,  .type = "loff_t", .size = 8,
> > .name = NULL,  .type = "size_t", .size = 4,
> > .name = NULL,  .type = NULL, .size = 0, /* 
> > end of parameter list */
> > 
> > i.e. "WDW". The build-time constraint checker could then warn about:
> > 
> >   # error: System call "moron" uses invalid 'WWW' argument mapping for a 
> > 'WDW' sequence
> >   #please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' 
> > instead
> 
> ... if you do 32bit build.

Yeah - but the checking tool could do a 32-bit sizing of the types and thus the 
checks would work on all arches and on all bitness settings.

I don't think doing part of this in CPP is a good idea:

 - It won't be able to do the full range of checks

 - Wrappers should IMHO be trivial and open coded as much as possible - not 
hidden
   inside several layers of macros.

 - There should be a penalty for newly introduced, badly designed system call
   ABIs, while most CPP variants I can think of will just make bad but solvable 
   decisions palatable, AFAICS.

I.e. I think the way out of this would be two steps:

 1) for new system calls: hard-enforce the highest quality at the development
stage and hard-reject crap. No new 6-parameter system calls or badly ordered
arguments. The tool would also check new extensions to existing system 
calls, 
i.e. no more "add a crappy 4th argument to an existing system call that 
works 
on x86 but hurts MIPS".

 2) for old legacies: cleanly open code all our existing legacies and weird
wrappers. No new muck will be added to it so the line count does not matter.

... is there anything I'm missing?

Thanks,

Ingo


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-20 Thread Dominik Brodowski
On Mon, Mar 19, 2018 at 11:23:42PM +, Al Viro wrote:
> static inline long C_S_moron(int, loff_t, size_t);
> long compat_SyS_moron(long a0, long a1, long a2, long a3, long a4, long a5, 
> long a6)
> {
>   return C_S_moron((__force int)a0,
> (__force loff_t)(((u64)a2 << 32)|a1),
> (__force size_t)a3);
> }
> static inline long C_S_moron(int fd, loff_t offset, size_t count)
> {
>   whatever body you had for it
> }
> 
> That - from
> COMPAT_SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)
> {
>   whatever body you had for it
> }
> 
> We can use similar machinery for SYSCALL_DEFINE itself, so that
> SyS_moron() would be defined with (long, long, long, long, long, long)
> as arguments and not (long, long long, long) as we have now.

That would be great, as it would allow to use a struct pt_regs * based
syscall calling convention on i386 as well, and not only on x86-64, right?

> It's not impossible to do.  It won't be pretty, but that use of local
> enums allows to avoid unbearably long expansions.
> 
> Benefits:
>   * all SyS... wrappers (i.e. the thing that really ought to
> go into syscall tables) have the same type.
>   * we could have SYSCALL_DEFINE produce a trivial compat
> wrapper, have explicit COMPAT_SYSCALL_DEFINE discard that thing
> and populate the compat syscall table *entirely* with compat_SyS_...,
> letting the linker sort it out.  That way we don't need to keep
> track of what can use native and what needs compat in each compat
> table on biarch.
>   * s390 compat wrappers would disappear with that approach.
>   * we could even stop generating sys_... aliases - if
> syscall table is generated by slapping SyS_... or compat_SyS_...
> on the name given there, we don't need to _have_ those sys_...
> things at all.  All SyS_... would have the same type, so the pile
> in syscalls.h would not be needed - we could generate the externs
> at the same time we generate the syscall table.
> 
> And yes, it's a high-squick approach.  I know and I'm not saying
> it's a good idea.  OTOH, to quote the motto of philosophers and
> shell game operators, "there's something in it"...

... and getting rid of all in-kernel calls to sys_*() is needed as
groundwork for that. So I'll continue to do that "mindless" conversion
first. On top of that, three things (which are mostly orthogonal to each
other) can be done:

1) ptregs system call conversion for x86-64

   Original implementation by Linus exists; needs a bit of tweaking
   but should be doable soon. Need to double-check it does the right
   thing for IA32_EMULATION, though.

2) re-work initramfs etc. code to not use in-kernel equivalents of
   syscalls, but operate on the VFS level instead.

3) re-work SYSCALL_DEFINEx() / COMPAT_SYSCALL_DEFINEx() based on
   your suggestions.

Does that sound sensible?

Thanks,
Dominik


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-19 Thread Al Viro
On Mon, Mar 19, 2018 at 10:29:20AM +0100, Ingo Molnar wrote:
> 
> * Al Viro  wrote:
> 
> > On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote:
> > 
> > > I'd done some digging in that area, will find the notes and post.
> > 
> > OK, found:
> 
> Very nice writeup - IMHO this should go into Documentation/!

If you want to turn that into something printable - more power to you...
FWIW, I think we need to require per-architecture descriptions of ABI
for all architectures.  Something along the lines of

alpha:
C ABI: 64bit, location sequence is ($16, $17, $18, $19, $20, $21, stack)
No arg padding (as for all 64bit).  Stack pointer in $30, return value
in $0.
Syscall ABI: syscall number in $0, arg slots filled from $16, $17, $18, $19,
$20, $21.  Return value in $0, error is reported as 1 in $19.  Saved
syscall number is used as a flag for __force_successful_syscall_return()
purposes - sticking 0 there inhibits the effect of negative return value.

arm:
C ABI: 32bit, location sequence is (r0, r1, r2, r3, stack).  Arg padding
for 64bit args: to even slot.  Stack pointer in sp, return value in r0
Syscall ABI, EABI variant: syscall number in r7.
Syscall ABI, OABI variant: syscall number encoded into insn.
Syscall ABI (both variants): arg slots filled from r0, r1, r2, r3, r4, r5.
Return value in r0.  It's not quite a biarch (support of e.g. ioctl
handling is absent, etc.; basic syscalls are handled, but that's it).

etc.  Ideally the information about callee-saved registers, syscall restart
logics, etc. should also go there.  I'm sick and tired of digging though
the asm glue of unfamiliar architectures ;-/

Another relevant piece of information (especially for biarch) is how
should sub-word arguments be normalized.  E.g. on amd64 both int and long
are passed in 64bit words and function that expects an int does *not*
care about the upper 32 bits.  If you have long f(int a) {return a;},
it will sign-extend the argument.  On ppc, OTOH, it won't - the caller
is responsible for having the bits 31..63 all equal.

That used to be a source of considerable PITA - e.g. kill(2) used to
require a compat wrapper on ppc.  These days SYSCALL_DEFINE and
COMPAT_SYSCALL_DEFINE glue takes care of normalizations.  However
it doesn't apply for the stuff that does *not* use ...DEFINE and
for use of native syscalls on biarch we need a bit more.  Consider
e.g. 32bit syscall on sparc64 wanting to use the native counterpart.
Arguments that are <= 32bit in both ABIs are fine - normalizations
will take care of them.  Anything that is 64bit in both ABIs means
that we will need compat anyway - the argument needs to be recombined
from two registers into one.  The headache comes from
* signed long
* unsigned long
* pointers
Those are word-sized and we need to normalize.  Solution before
SYSCALL_DEFINE glue: have upper halves forcibly zeroed on entry (which
normalizes unsigned long and pointers) and then sign-extend every
signed int and signed long in per-syscall glue (that zeroing is
guaranteed to denormalize int arguments).  Once SYSCALL_DEFINE started
to do normalization we disposed on the need to do separate wrappers
for int arguments; that still leaves us with signed long ones, but
* they are very rare
* most of the syscalls passing them need compat for more
serious reasons anyway.
There are only two exceptions - bdflush(2) and pciconfig_iobase(2).
The latter doesn't exist on sparc, the former ignores its signed long
argument completely.  So we are left with "zero upper halves of all
argument-bearing registers upon the entry and have per-syscall glue
take care of the rest".

For s390 the situation is nastier - normalization for signed and
unsigned long is the same as usual, but pointers might have junk
in bit 31.  IOW, for anything with pointer in arguments we can't
just use the native syscall.  As the result, s390 doesn't bother
with zeroing upper halves in syscall dispatcher and does private
mini-wrappers for native syscalls with pointer/long/ulong arguments.

That kind of crap really needs to be documented - RTFS becomes
somewhat painful when it involves tens of assemblers *and*
missing ABI documents (try to locate one for something embedded -
great motivation for expanding vocabulary, that) ;-/

FWIW, SYSCALL_DEFINE and its ilk (COMPAT_SYSCALL_DEFINE,
s390 COMPAT_SYSCALL_WRAP, etc.) are all about stepping over the
ABI gap - we've got some values from userland caller and we
need to turn that into a valid C function call that would
satisfy C ABI constraints.  Some amount of normalization might've
been done by syscall dispatcher; this stuff does the rest on
per-function basis.

> One way to implement this would be to put the argument chain types (string) 
> and 
> sizes (int) into a special debug section which isn't included in the final 
> kernel 
> image but which can be checked at link time.

Umm...  Possible, but I actually believe that we can do that without
debug 

Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-19 Thread Ingo Molnar

* Al Viro  wrote:

> On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote:
> 
> > I'd done some digging in that area, will find the notes and post.
> 
> OK, found:

Very nice writeup - IMHO this should go into Documentation/!

> OTOH, consider arm.  There we have
>   * r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
> of objects used to pass arguments
>   * 32bit and less - pick the next available slot
>   * 64bit - skip a slot if we'd already taken an odd number, then use
> the next two slots for lower and upper 32 bits of the argument.
> 
> So our classes take
> simple n-argument:0 to 6 slots
> WD4 slots
> DWW   4 slots
> WDW   5 slots
> WWDD  6 slots
> WDWW  5 slots
> WWWD  6 slots
> WWDWW 6 slots
> WDDW  7 slots (!)  Also , , !@#!@#!@#!# and other nice
> and well-deserved comments from arch maintainers, some of them even printable:
> /* It would be nice if people remember that not all the world's an i386
>when they introduce new system calls */
> SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
>  loff_t, offset, loff_t, nbytes)

Such idiosyncratic platform quirks that have an impact on generic code should 
be 
as self-maintaining as possible: i.e. there should be a build time warning even 
on 
x86 if someone introduces a new, suboptimally packed system call.

Otherwise we'll have such incidents again and again as new system calls get 
added.

> [snip the preprocessor horrors - the sketches I've got there are downright 
> obscene]

I still think we should consider creating a generic facility and a tool: which 
would immediately and automatically add new system calls to *every* 
architecture - 
or which would initially at least check these syscall ABI constraints.

I.e. this would start with a new generic kernel facility that warns about 
suboptimal new system call argument layouts on every architecture, not just on 
the 
affected ones.

That's a significant undertaking but should be possible to do.

Once such a facility is in place all the existing old mess is still a PITA, but 
should be manageable eventually - as no new mess is added to it.

IMHO that's the only thing that could break the somewhat deadly current dynamic 
of 
system call mappings mess. Complaining about people not knowing about quirks 
won't 
help.

One way to implement this would be to put the argument chain types (string) and 
sizes (int) into a special debug section which isn't included in the final 
kernel 
image but which can be checked at link time.

For example this attempt at creating a new system call:

  SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)

... would translate into something like:

.name = "moron", .pattern = "WWW", .type = "int",.size = 4,
.name = NULL,  .type = "loff_t", .size = 8,
.name = NULL,  .type = "size_t", .size = 4,
.name = NULL,  .type = NULL, .size = 0, /* 
end of parameter list */

i.e. "WDW". The build-time constraint checker could then warn about:

  # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' 
sequence
  #please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' 
instead

Each architecture can provide its own syscall parameter checking logic. Both 
'stack boundary' and parameter packing rules would be straightforward to 
express 
if we had such a data structure.

Also note that this tool could also check for optimum packing, i.e. if the new 
system call is defined as:

  SYSCALL_DEFINE3_WDW(moron, int, fd, loff_t, offset, size_t, count)

... would translate to something like:

.name = "moron", .pattern = "WDW", .type = "int",.size = 4,
.name = NULL,  .type = "loff_t", .size = 8,
.name = NULL,  .type = "size_t", .size = 4,
.name = NULL,  .type = NULL, .size = 0, /* 
end of parameter list */

where the tool would print out this error:

  # error: System call "moron" uses suboptimal 'WDW' argument mapping instead 
of 'WWD'

there would be a whitelist of existing system calls that are already using an 
suboptimal argument order - but the warnings/errors would trigger for all new 
system calls.

But adding non-straight-mapped system calls would be the exception in any case.

Such tooling could also do other things, such as limit the C types used for 
system 
call defines to a well-chosen set of ABI-safe types, such as:

  3  key_t
  3  uint32_t
  4  aio_context_t
  4  mqd_t
  4  timer_t
 10  clockid_t
 10  gid_t
 10  loff_t
 10  long
 10  old_gid_t
 10  old_uid_t
 10  umode_t
 11  uid_t
 31  pid_t
 34  size_t
 69  unsigned int

Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote:

> I'd done some digging in that area, will find the notes and post.

OK, found:

We have two ABIs in the game - syscall and normal C.  The latter
(for all targets we support) can be described in the following
terms:
* there is a series of word-sized objects used to pass
arguments, some in registers, some on stack.  Arguments are
mapped on that sequence.  Anything up to word size simply takes
the next available word, so on 64bit it's all there is - nth
argument goes into the nth object, types simply do not matter.
On 32bit it's not that simple - there 64bit arguments occupy
two objects.  They are still picked from the same sequence;
on little-endian the lower half goes first, on big-endian -
the higher one.  For some architectures that's all there is to it.
However, on quite a few there's an extra complication - not every
pair can be used for 64bit value.  Rules for those are arch-dependent.
One variety is "pairs should be aligned", i.e. "if we'd consumed
an odd number of slots, add a dummy one before eating a pair".
Another is "don't let a pair span the registers/stack boundary";
surprisingly enough, that's not universal.

The syscall ABI can considerably differ from C one.  First of all,
we really do *not* want to pass anything on stack - it's a major
headache at syscall entry.  See mips/o32 for the scale of that headache.
Not fun.  OTOH, the registers that can be used are a limited resource.
i386 can't pass more than 6 words and that has served as an upper
limit.  If we encode the argument sizes (W - word, D - 64bit) we
have the following variants among the syscalls:
* no arguments at all (SYSCALL_DEFINE0)
* W (SYSCALL_DEFINE1)
* WW (SYSCALL_DEFINE2)
* WWW (SYSCALL_DEFINE3)
*  (SYSCALL_DEFINE4)
* W (SYSCALL_DEFINE5)
* WW (SYSCALL_DEFINE6)
* WD (SYSCALL_DEFINE2, truncate64, ftruncate64)
* DWW (SYSCALL_DEFINE3, lookup_dcookie)
* WDW (SYSCALL_DEFINE3, readahead)
* WWWD (SYSCALL_DEFINE4, pread64, pwrite64)
* WWDD (SYSCALL_DEFINE4, fallocate, sync_file_range2)
* WDDW (SYSCALL_DEFINE4, sync_file_range, fadvise64_64)
* WDWW (SYSCALL_DEFINE4, fadvise64)
* WWDWW (SYSCALL_DEFINE5, fanotify_mark)

The list for each long long-passing variant might be incomplete, but
they really are rare.  And a source of headache; not just for biarch
architectures - e.g. c6x and metag are not biarch and these syscalls
are exactly what stinks them up.

One thing we *really* don't want is syscall-dependent mapping from
syscall ABI registers to C ABI sequence.  Inside a syscall (or in
per-syscall glue) - sure, we can do it (if not happily).  In the
syscall dispatcher - fuck no, too much PITA.  mips/o32 used to be
a horrible example of why not, then they went for "copy from userland
stack whether we need it or not"...

For simple syscalls (first 7 classes in the above, each argument fits
into word) we simply map registers to the first 6 slots of the sequence
and we are done.  It's bloody tempting to use the same mapping for
the rest - use the same code that calls simple syscalls and have it
call sys_readahead() instead of sys_mkdir(), etc.  For something like
x86 or sparc that works perfectly - these guys have no padding for 64bit
arguments, so we are good (provided that userland sets those registers
sanely, that is).

OTOH, consider arm.  There we have
* r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
of objects used to pass arguments
* 32bit and less - pick the next available slot
* 64bit - skip a slot if we'd already taken an odd number, then use
the next two slots for lower and upper 32 bits of the argument.

So our classes take
simple n-argument:  0 to 6 slots
WD  4 slots
DWW 4 slots
WDW 5 slots
WWDD6 slots
WDWW5 slots
WWWD6 slots
WWDWW   6 slots
WDDW7 slots (!)  Also , , !@#!@#!@#!# and other nice
and well-deserved comments from arch maintainers, some of them even printable:
/* It would be nice if people remember that not all the world's an i386
   when they introduce new system calls */
SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
 loff_t, offset, loff_t, nbytes)

No idea why that hadn't been done to fadvise64_64() - that got
/*
 * Since loff_t is a 64 bit type we avoid a lot of ABI hassle
 * with a different argument ordering.
 */
asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
 loff_t offset, loff_t len)
{
return sys_fadvise64_64(fd, offset, len, advice);
}
long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low)
{
return sys_fadvise64(fd, 

Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 11:06:42AM -0700, Linus Torvalds wrote:

> and then we can do
> 
>   COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
> COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
>   {
>   return do_readahead(fd, off_lo + ((u64)off_hi << 64), count);
>   }
> 
> which at least looks reasonably legible, and has *zero* ifdef's anywhere.

It's a bit more complicated, but...

> I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS
> things and crazy #ifdef's in code.

Absolutely.  Those piles of ifdefs are unreadable garbage.

> So either let the architectures do their own trivial wrappers
> entirely, or do something clean like the above. Do *not* do
> #ifdef'fery at the system call declaration time.
> 
> Also note that the "ODD" arguments may not be the ones that need
> padding. I could easily see a system call argument numbering scheme
> like
> 
>r0 - system call number
>r1 - first argument
>r2 - second argument
>...
> 
> and then it's the *EVEN* 64-bit arguments that would need the padding
> (because they are actually odd in the register numbers). The above
> COMPAT_ARG_64BIT[_ODD]() model allows for that too.
> 
> Of course, if some architecture then has some other arbitrary rules (I
> could see register pairing rules that aren't the usual "even register"
> ones), then such an architecture would really have to have its own
> wrapper, but the above at least would handle the simple cases, and
> doesn't look disgusting to use.

I'd done some digging in that area, will find the notes and post.
Basically, we can even avoid the odd/even annotations and have
COMPAT_SYSCALL_DEFINE... sort it out.  It's a bit more hairy than
I would like at this stage in the cycle, though.  I'll see if it can
be done without too much PITA.

However, there still are genuinely speci^Wfucked in head cases - see
e.g. this sad story:
commit ab8a261ba5e2dd9206da640de5870cc31d568a7c
Author: Helge Deller 
Date:   Thu Jul 10 18:07:17 2014 +0200

parisc: fix fanotify_mark() syscall on 32bit compat kernel

Those certainly ought to stay in arch/*


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 05:10:54PM +0100, Dominik Brodowski wrote:

> +#ifdef __ARCH_WANT_COMPAT_SYS_READAHEAD
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
> +unsigned int, off_lo, unsigned int, off_hi,
> +size_t, count)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
> +unsigned int, off_hi, unsigned int, off_lo,
> +size_t, count)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
> +unsigned int, off_lo, unsigned int, off_hi,
> +size_t, count)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
> +unsigned int, off_hi, unsigned int, off_lo,
> +size_t, count)
> +#endif
> +{
> + return do_readahead(fd, ((u64) off_hi << 32) | off_lo, count);
>  }

*UGH*

static inline compat_to_u64(u32 w0, u32 w1)
{
#ifdef __BIG_ENDIAN
return ((u64)w0 << 32) | w1;
#else
return ((u64)w1 << 32) | w0;
#endif
}

in compat.h, then this turns into

#ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING
COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
   u32, off0, u32 off1,
   compat_size_t, count)
#else
COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
   u32, off0, u32 off1,
   compat_size_t, count)
#endif
{
return do_readahead(fd, compat_to_u64(off0, off1), count);
}


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Linus Torvalds
On Sun, Mar 18, 2018 at 10:40 AM, Al Viro  wrote:
>
> *UGH*
>
> static inline compat_to_u64(u32 w0, u32 w1)
> {
> #ifdef __BIG_ENDIAN
> return ((u64)w0 << 32) | w1;
> #else
> return ((u64)w1 << 32) | w0;
> #endif
> }
>
> in compat.h, then this turns into
>
> #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING
> COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
>u32, off0, u32 off1,
>compat_size_t, count)
> #else
> COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
>u32, off0, u32 off1,
>compat_size_t, count)
> #endif

No. This is still too ugly to live.

What *may* be acceptable is if architectures defined something like this:

x86:

/* Little endian registers - low bits first, no padding for odd
register numbers necessary */
#define COMPAT_ARG_64BIT(x) unsigned int x##_lo, unsigned int x##_hi
#define COMPAT_ARG_64BIT_ODD(x) COMPAT_ARG_64BIT(x)

ppc BE:

/* Big-endian registers - high bits first, odd argument pairs
padded up to the next even register */
#define COMPAT_ARG_64BIT(x) unsigned int x##_hi, unsigned int x##_lo
#define COMPAT_ARG_64BIT_ODD(x) unsigned int  x##_padding,
COMPAT_ARG_64BIT(x)

and then we can do

  COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
  {
  return do_readahead(fd, off_lo + ((u64)off_hi << 64), count);
  }

which at least looks reasonably legible, and has *zero* ifdef's anywhere.

I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS
things and crazy #ifdef's in code.

So either let the architectures do their own trivial wrappers
entirely, or do something clean like the above. Do *not* do
#ifdef'fery at the system call declaration time.

Also note that the "ODD" arguments may not be the ones that need
padding. I could easily see a system call argument numbering scheme
like

   r0 - system call number
   r1 - first argument
   r2 - second argument
   ...

and then it's the *EVEN* 64-bit arguments that would need the padding
(because they are actually odd in the register numbers). The above
COMPAT_ARG_64BIT[_ODD]() model allows for that too.

Of course, if some architecture then has some other arbitrary rules (I
could see register pairing rules that aren't the usual "even register"
ones), then such an architecture would really have to have its own
wrapper, but the above at least would handle the simple cases, and
doesn't look disgusting to use.

   Linus

PS. It is possible that we should then add a

   #define COMPAT_ARG_64BIT_VAL(x) (x_##lo + ((u64)x_##hi << 32))

and then do

  COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
  {
  return do_readahead(fd, COMPAT_ARG_64BIT_VAL(off), count);
  }

because then we could perhaps generate the *non*compat system calls
this way too, just using

#define COMPAT_ARG_64BIT(x) unsigned long x
#define COMPAT_ARG_64BIT_VAL(x) (x)

instead (there might also be a "compat" mode that actually has access
to 64-bit registers, like x32 does, but I suspect it would have other
issues).


[RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Dominik Brodowski
The compat_sys_readahead() implementations in mips, powerpc, s390, sparc
and x86 only differed based on whether the u64 parameter needed padding
and on their endianness.

Oh, and some defined the parameters as u64 or "unsigned long" which
expanded to u64, though it only expected u32 in these parameters.

This patch is part of a series which tries to remove in-kernel calls to
syscalls. On this basis, the syscall entry path can be streamlined.

Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: x...@kernel.org
Cc: Al Viro 
Signed-off-by: Dominik Brodowski 
---
 arch/mips/include/asm/unistd.h |  1 +
 arch/mips/kernel/linux32.c |  6 ---
 arch/mips/kernel/scall64-o32.S |  2 +-
 arch/powerpc/include/asm/unistd.h  |  1 +
 arch/powerpc/kernel/sys_ppc32.c|  5 ---
 arch/s390/include/asm/unistd.h |  1 +
 arch/s390/kernel/compat_linux.c|  5 ---
 arch/s390/kernel/compat_linux.h|  1 -
 arch/s390/kernel/syscalls/syscall.tbl  |  2 +-
 arch/sparc/include/asm/unistd.h|  1 +
 arch/sparc/kernel/sys_sparc32.c|  8 
 arch/sparc/kernel/systbls.h|  4 --
 arch/x86/entry/syscalls/syscall_32.tbl |  2 +-
 arch/x86/ia32/sys_ia32.c   |  6 ---
 arch/x86/include/asm/sys_ia32.h|  2 -
 arch/x86/include/asm/unistd.h  |  1 +
 include/linux/compat.h | 10 +
 mm/readahead.c | 81 --
 18 files changed, 76 insertions(+), 63 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index 3ddc271ad77b..f8f9046164ae 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -49,6 +49,7 @@
 #  define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #  define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #  define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
+#  define __ARCH_WANT_COMPAT_SYS_READAHEAD
 #  ifdef __MIPSEL__
 #define __ARCH_WANT_LE_COMPAT_SYS
 #  endif
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 871cda53a915..c40ce08be17d 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -106,12 +106,6 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality)
return ret;
 }
 
-asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3,
-  size_t count)
-{
-   return sys_readahead(fd, merge_64(a2, a3), count);
-}
-
 asmlinkage long sys32_sync_file_range(int fd, int __pad,
unsigned long a2, unsigned long a3,
unsigned long a4, unsigned long a5,
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index fbc463b234a1..eb4e66ba025a 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -439,7 +439,7 @@ EXPORT(sys32_call_table)
PTR compat_sys_fcntl64  /* 4220 */
PTR sys_ni_syscall
PTR sys_gettid
-   PTR sys32_readahead
+   PTR compat_sys_readahead
PTR sys_setxattr
PTR sys_lsetxattr   /* 4225 */
PTR sys_fsetxattr
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 704f2413ac30..870317a35763 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -53,6 +53,7 @@
 #define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
+#define __ARCH_WANT_COMPAT_SYS_READAHEAD
 #endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index ec896c8df968..289ae55bb4b5 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -74,11 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t 
len,
  * The 32 bit ABI passes long longs in an odd even register pair.
  */
 
-compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 
count)
-{
-   return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count);
-}
-
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index 71e6f7d65762..685ad7944850 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@