Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Mon, Mar 19, 2018 at 11:23:42PM +, Al Viro wrote: > Benefits: > * all SyS... wrappers (i.e. the thing that really ought to > go into syscall tables) have the same type. > * we could have SYSCALL_DEFINE produce a trivial compat > wrapper, have explicit COMPAT_SYSCALL_DEFINE discard that thing > and populate the compat syscall table *entirely* with compat_SyS_..., > letting the linker sort it out. That way we don't need to keep > track of what can use native and what needs compat in each compat > table on biarch. > * s390 compat wrappers would disappear with that approach. > * we could even stop generating sys_... aliases - if > syscall table is generated by slapping SyS_... or compat_SyS_... > on the name given there, we don't need to _have_ those sys_... > things at all. All SyS_... would have the same type, so the pile > in syscalls.h would not be needed - we could generate the externs > at the same time we generate the syscall table. > > And yes, it's a high-squick approach. I know and I'm not saying > it's a good idea. OTOH, to quote the motto of philosophers and > shell game operators, "there's something in it"... FWIW, I have something that is almost reasonable on preprocessor side; however, that has uncovered the following fun: void f(unsigned long long); void g(unsigned a, unsigned b) { funsigned long long)b)<<32)|a); } which does compile to "jump to f" on i386, ends up with the following joy on arm: mov r3, r1 mov r2, #0 push{r4, lr} orr r2, r2, r0 mov r0, r2 mov r1, r3 bl f pop {r4, lr} bx lr with gcc6; gcc7 is saner - there we have just mov r2, #0 orr r0, r2, r0 b f The former is r3 = r1 r2 = 0 r2 |= r0 r0 = r2 r1 = r3 The latter - r2 = 0 r0 |= r2 which is better, but still bloody odd And I'm afraid to check what e.g. 4.4 will do with that testcase...
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
* Al Virowrote: > > For example this attempt at creating a new system call: > > > > SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count) > > > > ... would translate into something like: > > > > .name = "moron", .pattern = "WWW", .type = "int",.size = 4, > > .name = NULL, .type = "loff_t", .size = 8, > > .name = NULL, .type = "size_t", .size = 4, > > .name = NULL, .type = NULL, .size = 0, /* > > end of parameter list */ > > > > i.e. "WDW". The build-time constraint checker could then warn about: > > > > # error: System call "moron" uses invalid 'WWW' argument mapping for a > > 'WDW' sequence > > #please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' > > instead > > ... if you do 32bit build. Yeah - but the checking tool could do a 32-bit sizing of the types and thus the checks would work on all arches and on all bitness settings. I don't think doing part of this in CPP is a good idea: - It won't be able to do the full range of checks - Wrappers should IMHO be trivial and open coded as much as possible - not hidden inside several layers of macros. - There should be a penalty for newly introduced, badly designed system call ABIs, while most CPP variants I can think of will just make bad but solvable decisions palatable, AFAICS. I.e. I think the way out of this would be two steps: 1) for new system calls: hard-enforce the highest quality at the development stage and hard-reject crap. No new 6-parameter system calls or badly ordered arguments. The tool would also check new extensions to existing system calls, i.e. no more "add a crappy 4th argument to an existing system call that works on x86 but hurts MIPS". 2) for old legacies: cleanly open code all our existing legacies and weird wrappers. No new muck will be added to it so the line count does not matter. ... is there anything I'm missing? Thanks, Ingo
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Mon, Mar 19, 2018 at 11:23:42PM +, Al Viro wrote: > static inline long C_S_moron(int, loff_t, size_t); > long compat_SyS_moron(long a0, long a1, long a2, long a3, long a4, long a5, > long a6) > { > return C_S_moron((__force int)a0, > (__force loff_t)(((u64)a2 << 32)|a1), > (__force size_t)a3); > } > static inline long C_S_moron(int fd, loff_t offset, size_t count) > { > whatever body you had for it > } > > That - from > COMPAT_SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count) > { > whatever body you had for it > } > > We can use similar machinery for SYSCALL_DEFINE itself, so that > SyS_moron() would be defined with (long, long, long, long, long, long) > as arguments and not (long, long long, long) as we have now. That would be great, as it would allow to use a struct pt_regs * based syscall calling convention on i386 as well, and not only on x86-64, right? > It's not impossible to do. It won't be pretty, but that use of local > enums allows to avoid unbearably long expansions. > > Benefits: > * all SyS... wrappers (i.e. the thing that really ought to > go into syscall tables) have the same type. > * we could have SYSCALL_DEFINE produce a trivial compat > wrapper, have explicit COMPAT_SYSCALL_DEFINE discard that thing > and populate the compat syscall table *entirely* with compat_SyS_..., > letting the linker sort it out. That way we don't need to keep > track of what can use native and what needs compat in each compat > table on biarch. > * s390 compat wrappers would disappear with that approach. > * we could even stop generating sys_... aliases - if > syscall table is generated by slapping SyS_... or compat_SyS_... > on the name given there, we don't need to _have_ those sys_... > things at all. All SyS_... would have the same type, so the pile > in syscalls.h would not be needed - we could generate the externs > at the same time we generate the syscall table. > > And yes, it's a high-squick approach. I know and I'm not saying > it's a good idea. OTOH, to quote the motto of philosophers and > shell game operators, "there's something in it"... ... and getting rid of all in-kernel calls to sys_*() is needed as groundwork for that. So I'll continue to do that "mindless" conversion first. On top of that, three things (which are mostly orthogonal to each other) can be done: 1) ptregs system call conversion for x86-64 Original implementation by Linus exists; needs a bit of tweaking but should be doable soon. Need to double-check it does the right thing for IA32_EMULATION, though. 2) re-work initramfs etc. code to not use in-kernel equivalents of syscalls, but operate on the VFS level instead. 3) re-work SYSCALL_DEFINEx() / COMPAT_SYSCALL_DEFINEx() based on your suggestions. Does that sound sensible? Thanks, Dominik
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Mon, Mar 19, 2018 at 10:29:20AM +0100, Ingo Molnar wrote: > > * Al Virowrote: > > > On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote: > > > > > I'd done some digging in that area, will find the notes and post. > > > > OK, found: > > Very nice writeup - IMHO this should go into Documentation/! If you want to turn that into something printable - more power to you... FWIW, I think we need to require per-architecture descriptions of ABI for all architectures. Something along the lines of alpha: C ABI: 64bit, location sequence is ($16, $17, $18, $19, $20, $21, stack) No arg padding (as for all 64bit). Stack pointer in $30, return value in $0. Syscall ABI: syscall number in $0, arg slots filled from $16, $17, $18, $19, $20, $21. Return value in $0, error is reported as 1 in $19. Saved syscall number is used as a flag for __force_successful_syscall_return() purposes - sticking 0 there inhibits the effect of negative return value. arm: C ABI: 32bit, location sequence is (r0, r1, r2, r3, stack). Arg padding for 64bit args: to even slot. Stack pointer in sp, return value in r0 Syscall ABI, EABI variant: syscall number in r7. Syscall ABI, OABI variant: syscall number encoded into insn. Syscall ABI (both variants): arg slots filled from r0, r1, r2, r3, r4, r5. Return value in r0. It's not quite a biarch (support of e.g. ioctl handling is absent, etc.; basic syscalls are handled, but that's it). etc. Ideally the information about callee-saved registers, syscall restart logics, etc. should also go there. I'm sick and tired of digging though the asm glue of unfamiliar architectures ;-/ Another relevant piece of information (especially for biarch) is how should sub-word arguments be normalized. E.g. on amd64 both int and long are passed in 64bit words and function that expects an int does *not* care about the upper 32 bits. If you have long f(int a) {return a;}, it will sign-extend the argument. On ppc, OTOH, it won't - the caller is responsible for having the bits 31..63 all equal. That used to be a source of considerable PITA - e.g. kill(2) used to require a compat wrapper on ppc. These days SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE glue takes care of normalizations. However it doesn't apply for the stuff that does *not* use ...DEFINE and for use of native syscalls on biarch we need a bit more. Consider e.g. 32bit syscall on sparc64 wanting to use the native counterpart. Arguments that are <= 32bit in both ABIs are fine - normalizations will take care of them. Anything that is 64bit in both ABIs means that we will need compat anyway - the argument needs to be recombined from two registers into one. The headache comes from * signed long * unsigned long * pointers Those are word-sized and we need to normalize. Solution before SYSCALL_DEFINE glue: have upper halves forcibly zeroed on entry (which normalizes unsigned long and pointers) and then sign-extend every signed int and signed long in per-syscall glue (that zeroing is guaranteed to denormalize int arguments). Once SYSCALL_DEFINE started to do normalization we disposed on the need to do separate wrappers for int arguments; that still leaves us with signed long ones, but * they are very rare * most of the syscalls passing them need compat for more serious reasons anyway. There are only two exceptions - bdflush(2) and pciconfig_iobase(2). The latter doesn't exist on sparc, the former ignores its signed long argument completely. So we are left with "zero upper halves of all argument-bearing registers upon the entry and have per-syscall glue take care of the rest". For s390 the situation is nastier - normalization for signed and unsigned long is the same as usual, but pointers might have junk in bit 31. IOW, for anything with pointer in arguments we can't just use the native syscall. As the result, s390 doesn't bother with zeroing upper halves in syscall dispatcher and does private mini-wrappers for native syscalls with pointer/long/ulong arguments. That kind of crap really needs to be documented - RTFS becomes somewhat painful when it involves tens of assemblers *and* missing ABI documents (try to locate one for something embedded - great motivation for expanding vocabulary, that) ;-/ FWIW, SYSCALL_DEFINE and its ilk (COMPAT_SYSCALL_DEFINE, s390 COMPAT_SYSCALL_WRAP, etc.) are all about stepping over the ABI gap - we've got some values from userland caller and we need to turn that into a valid C function call that would satisfy C ABI constraints. Some amount of normalization might've been done by syscall dispatcher; this stuff does the rest on per-function basis. > One way to implement this would be to put the argument chain types (string) > and > sizes (int) into a special debug section which isn't included in the final > kernel > image but which can be checked at link time. Umm... Possible, but I actually believe that we can do that without debug
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
* Al Virowrote: > On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote: > > > I'd done some digging in that area, will find the notes and post. > > OK, found: Very nice writeup - IMHO this should go into Documentation/! > OTOH, consider arm. There we have > * r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence > of objects used to pass arguments > * 32bit and less - pick the next available slot > * 64bit - skip a slot if we'd already taken an odd number, then use > the next two slots for lower and upper 32 bits of the argument. > > So our classes take > simple n-argument:0 to 6 slots > WD4 slots > DWW 4 slots > WDW 5 slots > WWDD 6 slots > WDWW 5 slots > WWWD 6 slots > WWDWW 6 slots > WDDW 7 slots (!) Also , , !@#!@#!@#!# and other nice > and well-deserved comments from arch maintainers, some of them even printable: > /* It would be nice if people remember that not all the world's an i386 >when they introduce new system calls */ > SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags, > loff_t, offset, loff_t, nbytes) Such idiosyncratic platform quirks that have an impact on generic code should be as self-maintaining as possible: i.e. there should be a build time warning even on x86 if someone introduces a new, suboptimally packed system call. Otherwise we'll have such incidents again and again as new system calls get added. > [snip the preprocessor horrors - the sketches I've got there are downright > obscene] I still think we should consider creating a generic facility and a tool: which would immediately and automatically add new system calls to *every* architecture - or which would initially at least check these syscall ABI constraints. I.e. this would start with a new generic kernel facility that warns about suboptimal new system call argument layouts on every architecture, not just on the affected ones. That's a significant undertaking but should be possible to do. Once such a facility is in place all the existing old mess is still a PITA, but should be manageable eventually - as no new mess is added to it. IMHO that's the only thing that could break the somewhat deadly current dynamic of system call mappings mess. Complaining about people not knowing about quirks won't help. One way to implement this would be to put the argument chain types (string) and sizes (int) into a special debug section which isn't included in the final kernel image but which can be checked at link time. For example this attempt at creating a new system call: SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count) ... would translate into something like: .name = "moron", .pattern = "WWW", .type = "int",.size = 4, .name = NULL, .type = "loff_t", .size = 8, .name = NULL, .type = "size_t", .size = 4, .name = NULL, .type = NULL, .size = 0, /* end of parameter list */ i.e. "WDW". The build-time constraint checker could then warn about: # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' sequence #please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' instead Each architecture can provide its own syscall parameter checking logic. Both 'stack boundary' and parameter packing rules would be straightforward to express if we had such a data structure. Also note that this tool could also check for optimum packing, i.e. if the new system call is defined as: SYSCALL_DEFINE3_WDW(moron, int, fd, loff_t, offset, size_t, count) ... would translate to something like: .name = "moron", .pattern = "WDW", .type = "int",.size = 4, .name = NULL, .type = "loff_t", .size = 8, .name = NULL, .type = "size_t", .size = 4, .name = NULL, .type = NULL, .size = 0, /* end of parameter list */ where the tool would print out this error: # error: System call "moron" uses suboptimal 'WDW' argument mapping instead of 'WWD' there would be a whitelist of existing system calls that are already using an suboptimal argument order - but the warnings/errors would trigger for all new system calls. But adding non-straight-mapped system calls would be the exception in any case. Such tooling could also do other things, such as limit the C types used for system call defines to a well-chosen set of ABI-safe types, such as: 3 key_t 3 uint32_t 4 aio_context_t 4 mqd_t 4 timer_t 10 clockid_t 10 gid_t 10 loff_t 10 long 10 old_gid_t 10 old_uid_t 10 umode_t 11 uid_t 31 pid_t 34 size_t 69 unsigned int
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote: > I'd done some digging in that area, will find the notes and post. OK, found: We have two ABIs in the game - syscall and normal C. The latter (for all targets we support) can be described in the following terms: * there is a series of word-sized objects used to pass arguments, some in registers, some on stack. Arguments are mapped on that sequence. Anything up to word size simply takes the next available word, so on 64bit it's all there is - nth argument goes into the nth object, types simply do not matter. On 32bit it's not that simple - there 64bit arguments occupy two objects. They are still picked from the same sequence; on little-endian the lower half goes first, on big-endian - the higher one. For some architectures that's all there is to it. However, on quite a few there's an extra complication - not every pair can be used for 64bit value. Rules for those are arch-dependent. One variety is "pairs should be aligned", i.e. "if we'd consumed an odd number of slots, add a dummy one before eating a pair". Another is "don't let a pair span the registers/stack boundary"; surprisingly enough, that's not universal. The syscall ABI can considerably differ from C one. First of all, we really do *not* want to pass anything on stack - it's a major headache at syscall entry. See mips/o32 for the scale of that headache. Not fun. OTOH, the registers that can be used are a limited resource. i386 can't pass more than 6 words and that has served as an upper limit. If we encode the argument sizes (W - word, D - 64bit) we have the following variants among the syscalls: * no arguments at all (SYSCALL_DEFINE0) * W (SYSCALL_DEFINE1) * WW (SYSCALL_DEFINE2) * WWW (SYSCALL_DEFINE3) * (SYSCALL_DEFINE4) * W (SYSCALL_DEFINE5) * WW (SYSCALL_DEFINE6) * WD (SYSCALL_DEFINE2, truncate64, ftruncate64) * DWW (SYSCALL_DEFINE3, lookup_dcookie) * WDW (SYSCALL_DEFINE3, readahead) * WWWD (SYSCALL_DEFINE4, pread64, pwrite64) * WWDD (SYSCALL_DEFINE4, fallocate, sync_file_range2) * WDDW (SYSCALL_DEFINE4, sync_file_range, fadvise64_64) * WDWW (SYSCALL_DEFINE4, fadvise64) * WWDWW (SYSCALL_DEFINE5, fanotify_mark) The list for each long long-passing variant might be incomplete, but they really are rare. And a source of headache; not just for biarch architectures - e.g. c6x and metag are not biarch and these syscalls are exactly what stinks them up. One thing we *really* don't want is syscall-dependent mapping from syscall ABI registers to C ABI sequence. Inside a syscall (or in per-syscall glue) - sure, we can do it (if not happily). In the syscall dispatcher - fuck no, too much PITA. mips/o32 used to be a horrible example of why not, then they went for "copy from userland stack whether we need it or not"... For simple syscalls (first 7 classes in the above, each argument fits into word) we simply map registers to the first 6 slots of the sequence and we are done. It's bloody tempting to use the same mapping for the rest - use the same code that calls simple syscalls and have it call sys_readahead() instead of sys_mkdir(), etc. For something like x86 or sparc that works perfectly - these guys have no padding for 64bit arguments, so we are good (provided that userland sets those registers sanely, that is). OTOH, consider arm. There we have * r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence of objects used to pass arguments * 32bit and less - pick the next available slot * 64bit - skip a slot if we'd already taken an odd number, then use the next two slots for lower and upper 32 bits of the argument. So our classes take simple n-argument: 0 to 6 slots WD 4 slots DWW 4 slots WDW 5 slots WWDD6 slots WDWW5 slots WWWD6 slots WWDWW 6 slots WDDW7 slots (!) Also , , !@#!@#!@#!# and other nice and well-deserved comments from arch maintainers, some of them even printable: /* It would be nice if people remember that not all the world's an i386 when they introduce new system calls */ SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags, loff_t, offset, loff_t, nbytes) No idea why that hadn't been done to fadvise64_64() - that got /* * Since loff_t is a 64 bit type we avoid a lot of ABI hassle * with a different argument ordering. */ asmlinkage long sys_arm_fadvise64_64(int fd, int advice, loff_t offset, loff_t len) { return sys_fadvise64_64(fd, offset, len, advice); } long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low) { return sys_fadvise64(fd,
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 11:06:42AM -0700, Linus Torvalds wrote: > and then we can do > > COMPAT_SYSCALL_DEFINE5(readahead, int, fd, > COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) > { > return do_readahead(fd, off_lo + ((u64)off_hi << 64), count); > } > > which at least looks reasonably legible, and has *zero* ifdef's anywhere. It's a bit more complicated, but... > I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS > things and crazy #ifdef's in code. Absolutely. Those piles of ifdefs are unreadable garbage. > So either let the architectures do their own trivial wrappers > entirely, or do something clean like the above. Do *not* do > #ifdef'fery at the system call declaration time. > > Also note that the "ODD" arguments may not be the ones that need > padding. I could easily see a system call argument numbering scheme > like > >r0 - system call number >r1 - first argument >r2 - second argument >... > > and then it's the *EVEN* 64-bit arguments that would need the padding > (because they are actually odd in the register numbers). The above > COMPAT_ARG_64BIT[_ODD]() model allows for that too. > > Of course, if some architecture then has some other arbitrary rules (I > could see register pairing rules that aren't the usual "even register" > ones), then such an architecture would really have to have its own > wrapper, but the above at least would handle the simple cases, and > doesn't look disgusting to use. I'd done some digging in that area, will find the notes and post. Basically, we can even avoid the odd/even annotations and have COMPAT_SYSCALL_DEFINE... sort it out. It's a bit more hairy than I would like at this stage in the cycle, though. I'll see if it can be done without too much PITA. However, there still are genuinely speci^Wfucked in head cases - see e.g. this sad story: commit ab8a261ba5e2dd9206da640de5870cc31d568a7c Author: Helge DellerDate: Thu Jul 10 18:07:17 2014 +0200 parisc: fix fanotify_mark() syscall on 32bit compat kernel Those certainly ought to stay in arch/*
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 05:10:54PM +0100, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_READAHEAD > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, > +unsigned int, off_lo, unsigned int, off_hi, > +size_t, count) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, > +unsigned int, off_hi, unsigned int, off_lo, > +size_t, count) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(readahead, int, fd, > +unsigned int, off_lo, unsigned int, off_hi, > +size_t, count) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE4(readahead, int, fd, > +unsigned int, off_hi, unsigned int, off_lo, > +size_t, count) > +#endif > +{ > + return do_readahead(fd, ((u64) off_hi << 32) | off_lo, count); > } *UGH* static inline compat_to_u64(u32 w0, u32 w1) { #ifdef __BIG_ENDIAN return ((u64)w0 << 32) | w1; #else return ((u64)w1 << 32) | w0; #endif } in compat.h, then this turns into #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, u32, off0, u32 off1, compat_size_t, count) #else COMPAT_SYSCALL_DEFINE4(readahead, int, fd, u32, off0, u32 off1, compat_size_t, count) #endif { return do_readahead(fd, compat_to_u64(off0, off1), count); }
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 10:40 AM, Al Virowrote: > > *UGH* > > static inline compat_to_u64(u32 w0, u32 w1) > { > #ifdef __BIG_ENDIAN > return ((u64)w0 << 32) | w1; > #else > return ((u64)w1 << 32) | w0; > #endif > } > > in compat.h, then this turns into > > #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING > COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, >u32, off0, u32 off1, >compat_size_t, count) > #else > COMPAT_SYSCALL_DEFINE4(readahead, int, fd, >u32, off0, u32 off1, >compat_size_t, count) > #endif No. This is still too ugly to live. What *may* be acceptable is if architectures defined something like this: x86: /* Little endian registers - low bits first, no padding for odd register numbers necessary */ #define COMPAT_ARG_64BIT(x) unsigned int x##_lo, unsigned int x##_hi #define COMPAT_ARG_64BIT_ODD(x) COMPAT_ARG_64BIT(x) ppc BE: /* Big-endian registers - high bits first, odd argument pairs padded up to the next even register */ #define COMPAT_ARG_64BIT(x) unsigned int x##_hi, unsigned int x##_lo #define COMPAT_ARG_64BIT_ODD(x) unsigned int x##_padding, COMPAT_ARG_64BIT(x) and then we can do COMPAT_SYSCALL_DEFINE5(readahead, int, fd, COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) { return do_readahead(fd, off_lo + ((u64)off_hi << 64), count); } which at least looks reasonably legible, and has *zero* ifdef's anywhere. I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS things and crazy #ifdef's in code. So either let the architectures do their own trivial wrappers entirely, or do something clean like the above. Do *not* do #ifdef'fery at the system call declaration time. Also note that the "ODD" arguments may not be the ones that need padding. I could easily see a system call argument numbering scheme like r0 - system call number r1 - first argument r2 - second argument ... and then it's the *EVEN* 64-bit arguments that would need the padding (because they are actually odd in the register numbers). The above COMPAT_ARG_64BIT[_ODD]() model allows for that too. Of course, if some architecture then has some other arbitrary rules (I could see register pairing rules that aren't the usual "even register" ones), then such an architecture would really have to have its own wrapper, but the above at least would handle the simple cases, and doesn't look disgusting to use. Linus PS. It is possible that we should then add a #define COMPAT_ARG_64BIT_VAL(x) (x_##lo + ((u64)x_##hi << 32)) and then do COMPAT_SYSCALL_DEFINE5(readahead, int, fd, COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) { return do_readahead(fd, COMPAT_ARG_64BIT_VAL(off), count); } because then we could perhaps generate the *non*compat system calls this way too, just using #define COMPAT_ARG_64BIT(x) unsigned long x #define COMPAT_ARG_64BIT_VAL(x) (x) instead (there might also be a "compat" mode that actually has access to 64-bit registers, like x32 does, but I suspect it would have other issues).
[RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
The compat_sys_readahead() implementations in mips, powerpc, s390, sparc and x86 only differed based on whether the u64 parameter needed padding and on their endianness. Oh, and some defined the parameters as u64 or "unsigned long" which expanded to u64, though it only expected u32 in these parameters. This patch is part of a series which tries to remove in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. Cc: Ralf BaechleCc: James Hogan Cc: linux-m...@linux-mips.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: David S. Miller Cc: sparcli...@vger.kernel.org Cc: Ingo Molnar Cc: Jiri Slaby Cc: x...@kernel.org Cc: Al Viro Signed-off-by: Dominik Brodowski --- arch/mips/include/asm/unistd.h | 1 + arch/mips/kernel/linux32.c | 6 --- arch/mips/kernel/scall64-o32.S | 2 +- arch/powerpc/include/asm/unistd.h | 1 + arch/powerpc/kernel/sys_ppc32.c| 5 --- arch/s390/include/asm/unistd.h | 1 + arch/s390/kernel/compat_linux.c| 5 --- arch/s390/kernel/compat_linux.h| 1 - arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/include/asm/unistd.h| 1 + arch/sparc/kernel/sys_sparc32.c| 8 arch/sparc/kernel/systbls.h| 4 -- arch/x86/entry/syscalls/syscall_32.tbl | 2 +- arch/x86/ia32/sys_ia32.c | 6 --- arch/x86/include/asm/sys_ia32.h| 2 - arch/x86/include/asm/unistd.h | 1 + include/linux/compat.h | 10 + mm/readahead.c | 81 -- 18 files changed, 76 insertions(+), 63 deletions(-) diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h index 3ddc271ad77b..f8f9046164ae 100644 --- a/arch/mips/include/asm/unistd.h +++ b/arch/mips/include/asm/unistd.h @@ -49,6 +49,7 @@ # define __ARCH_WANT_COMPAT_SYS_FALLOCATE # define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 # define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 +# define __ARCH_WANT_COMPAT_SYS_READAHEAD # ifdef __MIPSEL__ #define __ARCH_WANT_LE_COMPAT_SYS # endif diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c index 871cda53a915..c40ce08be17d 100644 --- a/arch/mips/kernel/linux32.c +++ b/arch/mips/kernel/linux32.c @@ -106,12 +106,6 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality) return ret; } -asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3, - size_t count) -{ - return sys_readahead(fd, merge_64(a2, a3), count); -} - asmlinkage long sys32_sync_file_range(int fd, int __pad, unsigned long a2, unsigned long a3, unsigned long a4, unsigned long a5, diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S index fbc463b234a1..eb4e66ba025a 100644 --- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -439,7 +439,7 @@ EXPORT(sys32_call_table) PTR compat_sys_fcntl64 /* 4220 */ PTR sys_ni_syscall PTR sys_gettid - PTR sys32_readahead + PTR compat_sys_readahead PTR sys_setxattr PTR sys_lsetxattr /* 4225 */ PTR sys_fsetxattr diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 704f2413ac30..870317a35763 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -53,6 +53,7 @@ #define __ARCH_WANT_COMPAT_SYS_FALLOCATE #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 #define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 +#define __ARCH_WANT_COMPAT_SYS_READAHEAD #endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index ec896c8df968..289ae55bb4b5 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -74,11 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len, * The 32 bit ABI passes long longs in an odd even register pair. */ -compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 count) -{ - return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count); -} - asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high, unsigned long low) { diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h index 71e6f7d65762..685ad7944850 100644 --- a/arch/s390/include/asm/unistd.h +++ b/arch/s390/include/asm/unistd.h @@