[PATCH RFC rebase 0/9] powerpc barrier_nospec
Yes, it is a good idea to add some commit messages.

I also rebased the patches on top of v3 of the series "Setup RFI flush after
PowerVM LPM migration".

Thanks

Michal

Michal Suchanek (9):
  powerpc: Add barrier_nospec
  powerpc: Use barrier_nospec in copy_from_user
  powerpc/64: Use barrier_nospec in syscall entry
  powerpc/64s: Use barrier_nospec in RFI_FLUSH_SLOT
  powerpc/64s: Add support for ori barrier_nospec patching
  powerpc/64: Patch barrier_nospec in modules
  powerpc/64: barrier_nospec: Add debugfs trigger
  powerpc/64s: barrier_nospec: Add hcall trigger
  powerpc/64: barrier_nospec: Add commandline trigger

 arch/powerpc/include/asm/barrier.h        |  9
 arch/powerpc/include/asm/exception-64s.h  |  2 +-
 arch/powerpc/include/asm/feature-fixups.h |  9
 arch/powerpc/include/asm/setup.h          | 11
 arch/powerpc/include/asm/uaccess.h        | 11 +++-
 arch/powerpc/kernel/entry_64.S            |  3 ++
 arch/powerpc/kernel/module.c              |  6 +++
 arch/powerpc/kernel/setup_64.c            | 87 +++
 arch/powerpc/kernel/vmlinux.lds.S         |  7 +++
 arch/powerpc/lib/feature-fixups.c         | 47 ++---
 arch/powerpc/platforms/pseries/mobility.c |  2 +-
 arch/powerpc/platforms/pseries/pseries.h  |  2 +-
 arch/powerpc/platforms/pseries/setup.c    | 37 +
 13 files changed, 213 insertions(+), 20 deletions(-)

--
2.13.6
[PATCH RFC rebase 3/9] powerpc/64: Use barrier_nospec in syscall entry
On powerpc syscall entry is done in assembly so patch in an explicit
barrier_nospec.

Signed-off-by: Michal Suchanek <msucha...@suse.de>
---
 arch/powerpc/kernel/entry_64.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2cb5109a7ea3..7bfc4cf48af2 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 #include
 #ifdef CONFIG_PPC_BOOK3S
 #include
@@ -159,6 +160,7 @@ system_call:			/* label this so stack traces look sane */
 	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
 	bne	.Lsyscall_dotrace		/* does not return */
 	cmpldi	0,r0,NR_syscalls
+	barrier_nospec
 	bge-	.Lsyscall_enosys

 .Lsyscall:
@@ -319,6 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r10,TI_FLAGS(r10)

 	cmpldi	r0,NR_syscalls
+	barrier_nospec
 	blt+	.Lsyscall

 	/* Return code is already in r3 thanks to do_syscall_trace_enter() */
--
2.13.6
[PATCH RFC rebase 7/9] powerpc/64: barrier_nospec: Add debugfs trigger
Copypasta from rfi implementation

Signed-off-by: Michal Suchanek <msucha...@suse.de>
---
 arch/powerpc/kernel/setup_64.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f60e0e3b5ad2..f6678a7b6114 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -963,6 +963,41 @@ static __init int rfi_flush_debugfs_init(void)
 	return 0;
 }
 device_initcall(rfi_flush_debugfs_init);
+
+static int barrier_nospec_set(void *data, u64 val)
+{
+	switch (val) {
+	case 0:
+	case 1:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!!val == !!barrier_nospec_enabled)
+		return 0;
+
+	barrier_nospec_enable(!!val);
+
+	return 0;
+}
+
+static int barrier_nospec_get(void *data, u64 *val)
+{
+	*val = barrier_nospec_enabled ? 1 : 0;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec,
+			barrier_nospec_get, barrier_nospec_set, "%llu\n");
+
+static __init int barrier_nospec_debugfs_init(void)
+{
+	debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL,
+			    &fops_barrier_nospec);
+	return 0;
+}
+device_initcall(barrier_nospec_debugfs_init);
 #endif

 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
--
2.13.6
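
For anyone who wants to poke at the trigger added by this patch, here is a
minimal shell sketch. It assumes debugfs is mounted at /sys/kernel/debug and
that powerpc_debugfs_root shows up there as the "powerpc" directory; the patch
itself states neither. The helper name and its optional FILE argument are
hypothetical and exist only so the function can be tried against a scratch
file. The 0/1 check mirrors what barrier_nospec_set() does before returning
-EINVAL.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch series.
# Assumed location: debugfs at /sys/kernel/debug, powerpc_debugfs_root = "powerpc".
BARRIER_NOSPEC_FILE=/sys/kernel/debug/powerpc/barrier_nospec

# set_barrier_nospec VALUE [FILE]
# Writes VALUE to the trigger file. Only 0 and 1 are accepted, matching the
# switch statement in barrier_nospec_set().
set_barrier_nospec() {
    case "$1" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    echo "$1" > "${2:-$BARRIER_NOSPEC_FILE}"
}

# Typical use on a patched kernel, as root:
#   set_barrier_nospec 1
#   cat "$BARRIER_NOSPEC_FILE"
```

Rejecting bad values in the helper is only a convenience; the kernel side
performs the same check.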
Re: doing lots of disk writes causes oom killer to kill processes
Hello,

On 19 September 2013 12:13, Jan Kara wrote:
> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>> On 17 September 2013 23:13, Jan Kara wrote:
>> > Hello,
>>
>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
> Ah, that's not upstream default. Upstream has 20/10. In SLES we use 40/10
> to better accommodate some workloads but 60/40 on 8 GB machines with
> SATA drive really seems too much. That is going to give memory management a
> headache.
>
> The problem is that a good SATA drive can do ~100 MB/s if we are
> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
> it takes 50s at best to write it, with more random IO to image file it can
> well take several minutes to write. That may cause some increased latency
> when memory reclaim waits for writeback to clean some pages.
>
>> these to 5/2 gives about the same result as running the script that
>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>> intermittent lockup before every chunk is written.
>>
>> It is quite possible to set kernel parameters that kill the kernel but
>>
>> 1) this is the default
> Not upstream one so you should raise this with Debian I guess. 60/40
> looks way out of reasonable range for today's machines.
>
>> 2) the parameter is set in units that do not prevent the issue in
>> general (% RAM vs #blocks)
> You can set the number of bytes instead of percentage -
> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
> sizing depends on amount of memory, storage HW, workload. So it's more an
> administrative task to set this tunable properly.
>
>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>> traversing a structure holding 800M data in the background. Something
>> is seriously rotten somewhere.
> Likely processes are waiting in direct reclaim for IO to finish. But that
> is just guessing. Try running attached script (forgot to attach it to
> previous email). You will need systemtap and kernel debuginfo installed.
> The script doesn't work with all versions of systemtap (as it is sadly a
> moving target) so if it fails, tell me your version of systemtap and I'll
> update the script accordingly.

This was fixed for me by the patch posted earlier by Hillf Danton so I
guess this answers what the system was (not) doing:

--- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
+++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
@@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
 		 * implies that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (nr_unqueued_dirty == nr_taken || nr_immediate)
+		if (nr_unqueued_dirty == nr_taken || nr_immediate) {
+			if (current_is_kswapd())
+				wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
+		}
 	}

 	/*

Also 75485363 is hopefully addressing this issue in mainline.

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
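
The sizing argument quoted above (5 GB of dirty data at ~100 MB/s takes about
50 s at best) can be reproduced with a few lines of shell arithmetic. This is
a rough sketch: the function names are made up for illustration, and the
kernel applies the ratios to available memory rather than to total RAM, so
real limits will be somewhat lower than these figures.

```shell
#!/bin/sh
# dirty_limit_mb RAM_MB RATIO_PERCENT
# Rough upper bound on dirty data, in MB, for a given dirty_ratio.
# Simplification: uses total RAM, not the kernel's "available memory".
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

# flush_seconds DIRTY_MB THROUGHPUT_MB_PER_S
# Best-case time to write the dirty data out with purely sequential IO.
flush_seconds() {
    echo $(( $1 / $2 ))
}

dirty_limit_mb 8192 60    # 60% on an 8 GB machine -> 4915
dirty_limit_mb 8192 20    # upstream 20%           -> 1638
flush_seconds 5000 100    # 5 GB at ~100 MB/s      -> 50
```

Random IO to an image file would be well below 100 MB/s, which is where the
"several minutes" in the quoted estimate comes from.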
Re: doing lots of disk writes causes oom killer to kill processes
On 5 September 2013 12:12, Michal Suchanek wrote:
> Hello
>
> On 26 August 2013 15:51, Michal Suchanek wrote:
>> On 12 March 2013 03:15, Hillf Danton wrote:
>>>>On 11 March 2013 13:15, Michal Suchanek wrote:
>>>>>On 8 February 2013 17:31, Michal Suchanek wrote:
>>>>> Hello,
>>>>>
>>>>> I am dealing with VM disk images and performing something like wiping
>>>>> free space to prepare image for compressing and storing on server or
>>>>> copying it to external USB disk causes
>>>>>
>>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>>> are 100% used by system and the machine is basically unusable
>>>>>
>>>>> 2) oom killer killing processes
>>>>>
>>>>> This all on system with 8G ram so there should be plenty space to work
>>>>> with.
>>>>>
>>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>>
>>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>>> problem even with less ram.
>>>>>
>>>>> I have vm.swappiness = 0 set for a long time already.
>>>>>
>>>>>
>>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>>kernel still causes all cores to loop somewhere in system when writing
>>>>lots of data to disk.
>>>>
>>>>With swappiness as much as 90 processes still get killed on large disk
>>>>writes.
>>>>
>>>>Given that the max is 100 the interval in which mm works at all is
>>>>going to be very narrow, less than 10% of the parameter range. This is
>>>>a severe regression as is the cpu time consumed by the kernel.
>>>>
>>>>The io scheduler is the default cfq.
>>>>
>>>>If you have any idea what to try other than downgrading to an earlier
>>>>unaffected kernel I would like to hear.
>>>>
>>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>>> deadlock caused by too_many_isolated())?
>>>
>>> Or try 3.8 and/or 3.9, additionally?
>>>
>>
>> Hello,
>>
>> with deadline IO scheduler I experience this issue less often but it
>> still happens.
>>
>> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>>
>> Do you have some idea what to log so that useful information about the
>> lockup is gathered?
>>
>
> This appears to be fixed in vanilla 3.11 kernel.
>
> I still get short intermittent lockups and cpu usage spikes up to 20%
> on a core but nowhere near the minute+ long lockups with all cores
> 100% on earlier kernels.
>

So I did more testing on the 3.11 kernel and while it works OK with
tar you can get severe lockups with mc or kvm. The difference is
probably the fact that sane tools do fsync() on files they close
forcing the file to write out and the kernel returning possible write
errors before they move on to next file.

With kvm writing to a file used as virtual disk the system would stall
indefinitely until the disk driver in the emulated system would time
out, return disk IO error, and the emulated system would stop writing.
In top I see all CPU cores 90%+ in wait. System is unusable. With mc
the lockups would be indefinite, probably because there is no timeout
on writing a file in mc.

I tried tuning swappiness and elevators but the basic problem is
solved by neither: the dirty buffers fill up memory and system stalls
trying to resolve the situation.

Obviously the kernel puts off writing any dirty buffers until the
memory pressure is overwhelming and the vmm flops.

At least the OOM killer does not get invoked anymore since there is
lots of memory - just Linux does not know how to use it.

The solution to this problem is quite simple - use the ancient
userspace bdflushd or what it was called. I emulate it with

{ while true ; do sleep 5; sync ; done } &

The system performance suddenly increases - to the awesome Debian
stable levels.

Thanks

Michal
Re: doing lots of disk writes causes oom killer to kill processes
On 17 September 2013 23:13, Jan Kara wrote:
> Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek wrote:
>> > On 26 August 2013 15:51, Michal Suchanek wrote:
>> >> On 12 March 2013 03:15, Hillf Danton wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basically unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work
>> >>>>> with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have vm.swappiness = 0 set for a long time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk
>> >>>>writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the parameter range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
> Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?

It's an LVM volume on a SATA drive. I sometimes use USB disks as well
but most of the time it's SATA or eSATA.

>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and elevators but the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
> This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

I can try that but I don't really mind if the kernel uses 2G ram for
buffers. The problem is it cannot manage those buffers. Does some
kernel structure grow out of proportion when the buffers reach this
size or something?

Thanks

Michal
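
The 5%/2% experiment suggested in the quoted text comes down to writing two
files under /proc/sys/vm. A minimal sketch follows; the function name and the
optional third argument (a directory to write into instead of /proc/sys/vm)
are hypothetical and only there so the helper can be tried without root.

```shell
#!/bin/sh
# set_dirty_ratios DIRTY_RATIO BACKGROUND_RATIO [VM_DIR]
# Writes the two writeback thresholds (percent of memory). VM_DIR defaults to
# /proc/sys/vm; passing another directory is a test-only convenience.
set_dirty_ratios() {
    vm_dir=${3:-/proc/sys/vm}
    # The background threshold should stay below the hard limit.
    if [ "$2" -ge "$1" ]; then
        echo "background ratio must be lower than dirty ratio" >&2
        return 1
    fi
    echo "$2" > "$vm_dir/dirty_background_ratio" &&
    echo "$1" > "$vm_dir/dirty_ratio"
}

# As root, for the settings suggested in the thread:
#   set_dirty_ratios 5 2
```

The change takes effect immediately and does not survive a reboot, which
makes it convenient for a quick before/after comparison.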
Re: doing lots of disk writes causes oom killer to kill processes
On 17 September 2013 23:13, Jan Kara wrote:
> Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek wrote:
>> > On 26 August 2013 15:51, Michal Suchanek wrote:
>> >> On 12 March 2013 03:15, Hillf Danton wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basically unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work
>> >>>>> with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have vm.swappiness = 0 set for a long time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk
>> >>>>writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the parameter range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
> Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?
>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and elevators but the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
> This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
these to 5/2 gives about the same result as running the script that
syncs every 5s. Setting to 30/10 gives larger data chunks and
intermittent lockup before every chunk is written.

It is quite possible to set kernel parameters that kill the kernel but

1) this is the default

2) the parameter is set in units that do not prevent the issue in
general (% RAM vs #blocks)

3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
traversing a structure holding 800M data in the background. Something
is seriously rotten somewhere.

Thanks

Michal
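
The "script that syncs every 5s" mentioned in this message could look roughly
like the function below. It is a sketch, not the exact script used in the
thread: the function name and the optional COUNT argument are hypothetical,
with COUNT added so the loop can be bounded for a trial run.

```shell
#!/bin/sh
# periodic_sync INTERVAL [COUNT]
# Calls sync every INTERVAL seconds. COUNT=0 (the default) loops forever;
# a positive COUNT stops after that many rounds and prints how many ran.
periodic_sync() {
    interval=$1
    count=${2:-0}
    rounds=0
    while [ "$count" -eq 0 ] || [ "$rounds" -lt "$count" ]; do
        sleep "$interval"
        sync
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Background use, flushing every 5 seconds until killed:
#   periodic_sync 5 &
```

Forcing writeback this often keeps the amount of dirty data small, which is
presumably why it behaves much like the 5/2 ratio setting described above.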
Re: doing lots of disk writes causes oom killer to kill processes
On 9 October 2013 16:19, Michal Suchanek wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara wrote:
>>> > Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>> Ah, that's not upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads but 60/40 on 8 GB machines with
>> SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>> Not upstream one so you should raise this with Debian I guess. 60/40
>> looks way out of reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>> You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>> Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>  		 * implies that pages are cycling through the LRU faster than
>  		 * they are written so also forcibly stall.
>  		 */
> -		if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +		if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> +			if (current_is_kswapd())
> +				wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>  			congestion_wait(BLK_RW_ASYNC, HZ/10);
> +		}
>  	}
>
>  	/*
>
> Also 75485363 is hopefully addressing this issue in mainline.
>

Actually, this was in 3.11 already and it did make the behaviour a bit
better but was not enough.

So is something like the vmscan.c patch going to make it into the
mainline kernel?

Thanks

Michal
Re: doing lots of disk writes causes oom killer to kill processes
Hello,

On 19 September 2013 10:07, Hillf Danton wrote:
> Hello Michal
>
> Take it easy please, the kernel is made by human hands.
>
> Can you please try the diff(and sorry if mail agent reformats it)?
>
> Best Regards
> Hillf
>
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>  		 * implies that pages are cycling through the LRU faster than
>  		 * they are written so also forcibly stall.
>  		 */
> -		if (nr_unqueued_dirty == nr_taken || nr_immediate)
> +		if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> +			if (current_is_kswapd())
> +				wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>  			congestion_wait(BLK_RW_ASYNC, HZ/10);
> +		}
>  	}
>
>  	/*
> --

I applied the patch and raised the dirty block ratios to 30/10 and the
default 60/40 while imaging a VM and did not observe any problems so I
guess this solves it.

Thanks

Michal
Re: doing lots of disk writes causes oom killer to kill processes
Hello

On 26 August 2013 15:51, Michal Suchanek wrote:
> On 12 March 2013 03:15, Hillf Danton wrote:
>>>On 11 March 2013 13:15, Michal Suchanek wrote:
>>>>On 8 February 2013 17:31, Michal Suchanek wrote:
>>>> Hello,
>>>>
>>>> I am dealing with VM disk images and performing something like wiping
>>>> free space to prepare image for compressing and storing on server or
>>>> copying it to external USB disk causes
>>>>
>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>> are 100% used by system and the machine is basically unusable
>>>>
>>>> 2) oom killer killing processes
>>>>
>>>> This all on system with 8G ram so there should be plenty space to work
>>>> with.
>>>>
>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>
>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>> problem even with less ram.
>>>>
>>>> I have vm.swappiness = 0 set for a long time already.
>>>>
>>>>
>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>kernel still causes all cores to loop somewhere in system when writing
>>>lots of data to disk.
>>>
>>>With swappiness as much as 90 processes still get killed on large disk
>>>writes.
>>>
>>>Given that the max is 100 the interval in which mm works at all is
>>>going to be very narrow, less than 10% of the parameter range. This is
>>>a severe regression as is the cpu time consumed by the kernel.
>>>
>>>The io scheduler is the default cfq.
>>>
>>>If you have any idea what to try other than downgrading to an earlier
>>>unaffected kernel I would like to hear.
>>>
>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> deadlock caused by too_many_isolated())?
>>
>> Or try 3.8 and/or 3.9, additionally?
>>
>
> Hello,
>
> with deadline IO scheduler I experience this issue less often but it
> still happens.
>
> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>
> Do you have some idea what to log so that useful information about the
> lockup is gathered?
>

This appears to be fixed in vanilla 3.11 kernel.

I still get short intermittent lockups and cpu usage spikes up to 20%
on a core but nowhere near the minute+ long lockups with all cores
100% on earlier kernels.

Thanks

Michal
Re: doing lots of disk writes causes oom killer to kill processes
On 12 March 2013 03:15, Hillf Danton wrote:
>>On 11 March 2013 13:15, Michal Suchanek wrote:
>>>On 8 February 2013 17:31, Michal Suchanek wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>> are 100% used by system and the machine is basically unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This all on system with 8G ram so there should be plenty space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have vm.swappiness = 0 set for a long time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100 the interval in which mm works at all is
>>going to be very narrow, less than 10% of the parameter range. This is
>>a severe regression as is the cpu time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?

Hello,

in the meantime I tried setting io scheduler to deadline because I
remember using that one in my self-built kernels due to cfq breaking
some obscure block driver.

With the deadline io scheduler I can set swappiness back to 0 and the
system works normally even for moderate amount of IO - restoring disk
images from network. This would cause lockups and oom killer running
loose with the cfq scheduler.

So I guess I found what breaks the system and it is not so much the
kernel version. It's using pre-built kernels with the default
scheduler.

Thanks

Michal
Re: [linux-sunxi] Re: [RFC PATCH 0/9] mtd: nand: add sunxi NAND Flash Controller support
On 13 January 2014 10:02, boris brezillon wrote:
> Hi Henrik,
>
>
> On 11/01/2014 22:11, Henrik Nordström wrote:
>>
>> thanks for pointing out your documents
>> I'm trying to get the NAND driver with HW ECC (and HW RND)
>> without using DMA at all
>>
>> I tried many things but did not quite get the ECC reading command to
>> return meaningful results. But should work somehow.
>>
>> do you have any other information I could use to do this ?
>>
>> Not really. There is no known code to look at using the nand controller
>> without DMA. All allwinner code uses DMA even the boot ROM (BROM).
>>
>> For example, I wonder why there are 2 RAM sectors (the
>> driver I found only make use of RAM0)
>>
>> I think it's used during DMA to fetch next sector while the previous one
>> is transferred by DMA. But not sure.
>
>
> Some feedback on my tests:
>
> - I managed to get HW ECC working without any DMA transfer (using CMD = 01):
>   * I only tested the sequential ECC => ECC are stored between 2 data blocks
>     (1024 byte)
>   * Non sequential ECC should work if I store ECC bytes in the OOB area too
>     (I'll just have to send RANDOM_OUT commands to move to the OOB area
>     before sending the ECC cmd and another RANDOM_OUT to go back to the
>     DATA area)
>
> - The HW RND (randomizer) works too, I'll just have to figure out how this
>   could be mainlined:
>   * using a simple dt property to tell the controller it should enable the
>     randomizer
>   * provide an interface (like the nand_ecc_ctrl struct ) for other to add
>     their own randomizer implementation (this was requested:
>     https://lkml.org/lkml/2013/12/13/154)
>
>
> The most complicated part is the boot0 partition.
>
> Tell me if I'm wrong, but here's what I understood from your work (and yuq's
> work too):
>
> boot 0 part properties:
> - uses sequential ECC
> - uses 1024 bytes ECC blocks
> - boot0 code is stored only on the first ECC block of each page (1024 bytes
>   + ecc bytes)
> - boot0 code is stored on the first 64 pages of the first block
> - boot0 uses HW randomizer with a specific rnd seed (0x4a80)
>
> It's not that complicated to read/write from/to boot0, but it's a bit more
> to mainline this implementation:
> - the nand chip must use the same ECC algorithm and ECC layout on the whole
>   flash (no partition specific config available)
> - you cannot mark some part of pages as unused => the nand driver will write
>   the whole page, not just the first ECC block (1024 bytes)
>
> I thought about manually creating an mtd device that fulfills these needs
> (in case we encounter the "allwinner,nandn-boot" property on a nand@X node),
> but I'm not sure this is the right approach.
>
> Any ideas ?

Maybe if varying parameters on one MTD device is not acceptable you
could export parts of the flash as different MTD devices each with its
own parameters. Since the boot0 part is fixed size this should not
really be an issue. Existing MTD drivers that share hardware with
other devices exist - eg. the MTD driver which exports part of RAM as
MTD device.

I wonder if it would be good idea to make it possible to use the NAND
only for storage without a boot0 area. If this is selected by a DT
parameter as suggested changing the parameter will probably make the
NAND unreadable.

Thanks

Michal

>
>
> Best Regards,
>
> Boris
>
>>
>> Regards
>> Henrik
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "linux-sunxi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to linux-sunxi+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
Re: [linux-sunxi] Re: [RFC PATCH 0/9] mtd: nand: add sunxi NAND Flash Controller support
On 29 January 2014 16:43, boris brezillon dev wrote: > Hello Michal, > > > On 29/01/2014 16:11, Michal Suchanek wrote: >> >> On 13 January 2014 10:02, boris brezillon wrote: >>> >>> >>> boot 0 part properties: >>> - uses sequential ECC >>> - uses 1024 bytes ECC blocks >>> - boot0 code is stored only on the first ECC block of each page (1024 >>> bytes >>> + ecc bytes) >>> - boot0 code is stored on the first 64 pages of the first block >>> - boot0 uses HW randomizer with a specific rnd seed (0x4a80) >>> >>> It's not that complicated to read/write from/to boot0, but it's a bit >>> more >>> to mainline this >>> implementation: >>> - the nand chip must use the same ECC algorithm and ECC layout on the >>> whole >>> flash >>> (no partition specific config available) >>> - you cannot mark some part of pages as unused => the nand driver will >>> write >>> the >>>whole page, not just the first ECC block (1024 bytes) >>> >>> I thought about manually creating an mtd device that fullfils these needs >>> (in case we >>> encounter the "allwinner,nandn-boot" property on a nand@X node), but I'm >>> not >>> sure >>> this is the right approach. >>> >>> Any ideas ? >> >> Maybe if varying parameters on one MTD device is not acceptable you >> could export parts of the flash as different MTD devices each with its >> own parameters. Since the boot0 part is fixed size this should not >> really be an issue. Existing MTD drivers that share hardware with >> other devices exist - eg. the MTD driver which exports part of RAM as >> MDT device. > > > I considered this option (exposing 2 mtd devices which use the > same nand chip: one for the boot partition and the other one > for the remaining space). > I might give it a try. > > For the moment I'm trying to use standard partitions and then > attach one of these partitions as a sunxi-nand-boot-interface. > Something similar to what UBI is doing when attaching to an MTD > device. 
> > This way we can use the NAND as a standard MTD dev and when one > partition is attached as a sunxi-nand-boot-interface you can access > the boot0 partition using a char dev (/dev/snbi0 ?). > The sunxi-nand-boot-interface will provide the appropriate abstraction > to hide the specific boot0 layout... > > What do you think ? If it works with MTD, sure. The problem the two devices avoid is that with uniform parameters across MTD device the boot0 partition is invalid. > > >> >> I wonder if it would be good idea to make it possible to use the NAND >> only for storage without a boot0 area. If this is selected by a DT >> parameter as suggested changing the parameter will probably make the >> NAND unreadable. > > Actually the NAND controller supports up to 8 chips. I guess only the > first one can be used as a boot device. > Reserving space for the boot partition on all of these chips is kind of > useless. This actually depends on the BROM. I did not read the BROM code so I don't know what it does. > Moreover, we can't tell if the user wants to boot from the NAND or > from another storage (MMC for example), in this case we don't need > to expose the boot0 partition. It's possible to use the NAND only for storage, sure. However, a NAND on which the boo0 area is reserved would be unreadable without reserving boot0 area in the driver, right? The best we can tell is if user specified to reserve the area in the DT. It might be possible to verify the boot0 area the same way BROM does when booting from it. This might be nice option when you don't know what you have on the chip and want to read it but most of the time you will want to enforce bootable or non-bootable format when writing the NAND. Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
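The boot0 layout discussed in this thread (code only in the first 1024-byte ECC block of each page, across the first 64 pages of the first block) can be sketched as a simple offset mapping. This is an illustration only, not driver code: the identifiers boot0_capacity, boot0_locate, BOOT0_PAGES and BOOT0_CHUNK are made up here, and ECC bytes and the randomizer are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the thread; the identifiers are hypothetical. */
#define BOOT0_PAGES 64u    /* boot0 occupies the first 64 pages of block 0 */
#define BOOT0_CHUNK 1024u  /* only the first 1024-byte ECC block per page is used */

struct boot0_pos {
	unsigned int page;    /* page index inside the first erase block */
	unsigned int offset;  /* byte offset inside that page's first ECC block */
};

/* Total payload boot0 can hold under this layout. */
static size_t boot0_capacity(void)
{
	return (size_t)BOOT0_PAGES * BOOT0_CHUNK;
}

/*
 * Map a linear offset in the boot0 image to a page and an in-page offset.
 * Returns 0 on success, -1 if the offset lies beyond the boot0 area.
 */
static int boot0_locate(size_t img_off, struct boot0_pos *pos)
{
	assert(pos != NULL);
	if (img_off >= boot0_capacity())
		return -1;
	pos->page = (unsigned int)(img_off / BOOT0_CHUNK);
	pos->offset = (unsigned int)(img_off % BOOT0_CHUNK);
	return 0;
}
```

Under these assumptions boot0 holds at most 64 KiB, and the rest of each page is unused, which is why a uniform whole-page MTD layout does not fit it.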
Re: [linux-sunxi] [PATCH 00/10] net: stmmac: Add sun7i GMAC glue layer
On 6 December 2013 18:29, Chen-Yu Tsai wrote:
> Hi,
>
> This patch series adds Allwinner sun7i support to stmmac.
> The Allwinner sun7i SoC A20 integrates an early version of
> dwmac IP from Synopsys. On top of that is a hardware glue
> layer. This layer needs to be configured before the dwmac
> can be used.
...
> Comments?
>
> Thanks,
>
> wens
>
>
> Chen-Yu Tsai (10):
>   net: stmmac: Enable stmmac main clock when probing hardware
>   net: stmmac: Honor DT parameter to force DMA store and forward mode
>   net: stmmac: Use platform data tied with compatible strings
>   net: stmmac: sunxi platfrom extensions for GMAC in Allwinner A20 SoC's
>   ARM: dts: sun7i: Add GMAC controller node to sun7i DTSI
>   ARM: dts: sun7i: Add pin muxing options for the GMAC
>   ARM: dts: sun7i: cubietruck: Enable the GMAC
>   ARM: dts: sun7i: cubieboard2: Enable GMAC instead of EMAC
>   ARM: dts: sun7i: olinuxino-micro: Enable GMAC instead of EMAC
>   ARM: dts: sun7i: Add ethernet alias for GMAC

Tested-By: Michal Suchanek

Works for me with RGMII and MII phy on top of 3.13rc3.

Thanks

Michal
Re: [linux-sunxi] Re: [PATCH 3/3] ARM: sunxi: dts: Add ahci support to a few A10 and A20 boards
On 7 December 2013 12:47, Olliver Schinagl wrote: > Hey maxime, > > On 06-12-13 19:33, Maxime Ripard wrote: >> >> Hi Oliver, >> >> On Wed, Dec 04, 2013 at 01:10:55PM +0100, oli...@schinagl.nl wrote: >>> >>> From: Oliver Schinagl >>> >>> This patch adds sunxi sata support to A10 and A20 boards that have such >>> a connector. Some boards also feature a regulator via a GPIO and support >>> for this is also added. >>> >>> Signed-off-by: Olliver Schinagl >> >> >> Your git setup seems to be pretty uncertain about how your first name is >> spelled :) > > I should have formally mention it to confuse less people, > > This is how officially my name is spelled (I left out any 'middle' letters. > I never really used it as such, as it confuses people and they always write > it wrong anyway. After years I decided that at least on these patches, I > should write it down properly (googleability etc in the future). So formally > it's Olliver 'oliver' M. Schinagl. > > And no, I won't share my middle name :p > > There! :) > >> >>> --- >>> arch/arm/boot/dts/sun4i-a10-cubieboard.dts | 26 >>> + >>> arch/arm/boot/dts/sun4i-a10.dtsi| 9 + >>> arch/arm/boot/dts/sun7i-a20-cubieboard2.dts | 26 >>> + >>> arch/arm/boot/dts/sun7i-a20-cubietruck.dts | 26 >>> + >>> arch/arm/boot/dts/sun7i-a20-olinuxino-micro.dts | 26 >>> + >>> arch/arm/boot/dts/sun7i-a20.dtsi| 9 + >>> 6 files changed, 122 insertions(+) >> >> >> Could you split this into several patches please? > > Yes, appologies, will take care of this! Sorry, > > Oliver > >> >> At least one per SoC. 
>> >>> diff --git a/arch/arm/boot/dts/sun4i-a10-cubieboard.dts >>> b/arch/arm/boot/dts/sun4i-a10-cubieboard.dts >>> index 425a7db..b620084 100644 >>> --- a/arch/arm/boot/dts/sun4i-a10-cubieboard.dts >>> +++ b/arch/arm/boot/dts/sun4i-a10-cubieboard.dts >>> @@ -42,7 +42,18 @@ >>> }; >>> }; >>> >>> + sata: ahci@01c18000 { >>> + pwr-supply = <_ahci_5v>; >>> + status = "okay"; >>> + }; >>> + >>> pinctrl@01c20800 { >>> + ahci_pwr_pin: ahci_pwr_pin@0 { >> >> >> Please prefix it with name of the board. >> >>> + allwinner,pins = "PB8"; >>> + allwinner,function = "gpio_out"; >>> + allwinner,driver = <0>; >>> + allwinner,pull = <0>; >>> + }; >> >> >> Please add a newline here. >> >>> led_pins_cubieboard: led_pins@0 { >>> allwinner,pins = "PH20", "PH21"; >>> allwinner,function = "gpio_out"; >>> @@ -86,4 +97,19 @@ >>> linux,default-trigger = "heartbeat"; >>> }; >>> }; >>> + >>> + regulators { >>> + compatible = "simple-bus"; >>> + pinctrl-names = "default"; >>> + >>> + reg_ahci_5v: ahci-5v { >>> + compatible = "regulator-fixed"; >>> + regulator-name = "ahci-5v"; >>> + regulator-min-microvolt = <500>; >>> + regulator-max-microvolt = <500>; >>> + pinctrl-0 = <_pwr_pin>; >>> + gpio = < 1 8 0>; >>> + enable-active-high; >>> + }; >>> + }; >>> }; >>> diff --git a/arch/arm/boot/dts/sun4i-a10.dtsi >>> b/arch/arm/boot/dts/sun4i-a10.dtsi >>> index 4dccdb0..53c6cdb 100644 >>> --- a/arch/arm/boot/dts/sun4i-a10.dtsi >>> +++ b/arch/arm/boot/dts/sun4i-a10.dtsi >>> @@ -306,6 +306,15 @@ >>> #size-cells = <0>; >>> }; >>> >>> + sata: ahci@01c18000 { >>> + compatible = "allwinner,sun4i-a10-ahci"; >> >> >> Please use sun4i-ahci for consistency. >> >>> + reg = <0x01c18000 0x1000>; >>> + interrupts = <0 56 1>; >> >> >> The interrupt here doesn't seem right. Is it actually working at all? 
>> >>> + clocks = <_gates 25>, < 0>; >>> + clock-names = "ahb_sata", "pll6_sata"; >>> + status = "disabled"; >>> + }; >>> + >>> intc: interrupt-controller@01c20400 { >>> compatible = "allwinner,sun4i-ic"; >>> reg = <0x01c20400 0x400>; >>> diff --git a/arch/arm/boot/dts/sun7i-a20-cubieboard2.dts >>> b/arch/arm/boot/dts/sun7i-a20-cubieboard2.dts >>> index 5c51cb8..99c5e78 100644 >>> --- a/arch/arm/boot/dts/sun7i-a20-cubieboard2.dts >>> +++ b/arch/arm/boot/dts/sun7i-a20-cubieboard2.dts >>> @@ -34,7 +34,18 @@ >>> }; >>> }; >>>
Re: doing lots of disk writes causes oom killer to kill processes
On 9 October 2013 16:19, Michal Suchanek wrote: > Hello, > > On 19 September 2013 12:13, Jan Kara wrote: >> On Wed 18-09-13 16:56:08, Michal Suchanek wrote: >>> On 17 September 2013 23:13, Jan Kara wrote: >>> > Hello, >>> >>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting >> Ah, that's not upstream default. Upstream has 20/10. In SLES we use 40/10 >> to better accomodate some workloads but 60/40 on 8 GB machines with >> SATA drive really seems too much. That is going to give memory management a >> headache. >> >> The problem is that a good SATA drive can do ~100 MB/s if we are >> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write, >> it takes 50s at best to write it, with more random IO to image file it can >> well take several minutes to write. That may cause some increased latency >> when memory reclaim waits for writeback to clean some pages. >> >>> these to 5/2 gives about the same result as running the script that >>> syncs every 5s. Setting to 30/10 gives larger data chunks and >>> intermittent lockup before every chunk is written. >>> >>> It is quite possible to set kernel parameters that kill the kernel but >>> >>> 1) this is the default >> Not upstream one so you should raise this with Debian I guess. 60/40 >> looks way out of reasonable range for todays machines. >> >>> 2) the parameter is set in units that do not prevent the issue in >>> general (% RAM vs #blocks) >> You can set the number of bytes instead of percentage - >> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper >> sizing depends on amount of memory, storage HW, workload. So it's more an >> administrative task to set this tunable properly. >> >>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle >>> traversing a structure holding 800M data in the background. Something >>> is seriously rotten somewhere. >> Likely processes are waiting in direct reclaim for IO to finish. But that >> is just guessing. 
Try running attached script (forgot to attach it to >> previous email). You will need systemtap and kernel debuginfo installed. >> The script doesn't work with all versions of systemtap (as it is sadly a >> moving target) so if it fails, tell me your version of systemtap and I'll >> update the script accordingly. > > This was fixed for me by the patch posted earlier by Hillf Danton so I > guess this answers what the system was (not) doing: > > --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013 > +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013 > @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to > * implies that pages are cycling through the LRU faster than > * they are written so also forcibly stall. > */ > - if (nr_unqueued_dirty == nr_taken || nr_immediate) > + if (nr_unqueued_dirty == nr_taken || nr_immediate) { > + if (current_is_kswapd()) > + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES); > congestion_wait(BLK_RW_ASYNC, HZ/10); > + } > } > > /* > Hello, Is this being addressed somehow? It seems the 3.15 kernel still has this issue .. unless it happens to lock up for some other reason in similar situations. Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
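The writeback estimate quoted above (about 5 GB of dirty data at roughly 100 MB/s taking 50 s at best) is easy to reproduce. The sketch below is only a back-of-the-envelope model: it treats the dirty ratio as a percentage of total RAM, which is a simplification of what the kernel really measures against, and the function names are invented for this example.

```c
#include <assert.h>

/*
 * Upper bound on dirty page cache in MB, treating dirty_pct as a
 * percentage of total RAM (a simplification).
 */
static unsigned long dirty_limit_mb(unsigned long ram_mb, unsigned int dirty_pct)
{
	assert(dirty_pct <= 100);
	return ram_mb * dirty_pct / 100;
}

/*
 * Best-case time to flush dirty_mb megabytes, assuming purely
 * sequential writeback at a fixed throughput in MB/s.
 */
static unsigned long flush_seconds(unsigned long dirty_mb, unsigned long mb_per_s)
{
	assert(mb_per_s > 0);
	/* round up: a partial second of I/O still has to complete */
	return (dirty_mb + mb_per_s - 1) / mb_per_s;
}
```

With 8 GB of RAM (8192 MB), a 60% limit gives about 4915 MB and 50 s of sequential writeback at 100 MB/s; the upstream 20% would give about 1638 MB and 17 s. Random I/O to an image file would be considerably slower, as noted in the thread.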
[PATCH RFC 8/8] powerpc/64: barrier_nospec: Add commandline trigger
Copypasta from rfi implementation

Signed-off-by: Michal Suchanek
---
 arch/powerpc/kernel/setup_64.c | 8
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 4b67b7b877d9..257f0e6be107 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -840,6 +840,14 @@ static int __init handle_no_pti(char *p)
 }
 early_param("nopti", handle_no_pti);
 
+static int __init handle_no_nospec(char *p)
+{
+	pr_info("barrier_nospec: disabled on command line.");
+	no_nospec = true;
+	return 0;
+}
+early_param("no_nospec", handle_no_nospec);
+
 static void do_nothing(void *unused)
 {
 	/*
--
2.13.6
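For readers unfamiliar with early_param(), the effect of the handler above can be modelled in userspace. This is a rough sketch under stated assumptions: parse_cmdline() is a made-up, much simplified stand-in for the kernel's early parameter dispatch (it only splits on spaces and matches the bare token), and the pr_info() call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the kernel's no_nospec flag. */
static bool no_nospec;

/* Mirrors handle_no_nospec() from the patch: record the request, report success. */
static int handle_no_nospec(void)
{
	no_nospec = true;
	return 0;
}

/*
 * Hypothetical replacement for early_param() dispatch: split the command
 * line on spaces and call the handler on an exact "no_nospec" token.
 */
static void parse_cmdline(const char *cmdline)
{
	const char *p = cmdline;

	assert(cmdline != NULL);
	while (*p) {
		size_t len = strcspn(p, " ");

		if (len == strlen("no_nospec") && strncmp(p, "no_nospec", len) == 0)
			handle_no_nospec();
		p += len;
		p += strspn(p, " ");  /* skip the separating spaces */
	}
}
```

In the series, the flag set here is later consulted by setup_barrier_nospec(), which skips enabling the barrier when no_nospec is true.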
[PATCH RFC 7/8] powerpc/64s: barrier_nospec: Add hcall triggerr
Copypasta from rfi implementation Signed-off-by: Michal Suchanek --- arch/powerpc/platforms/pseries/setup.c | 38 ++ 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 1a527625acf7..b779ddb8e250 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -459,38 +459,50 @@ static void __init find_and_init_phbs(void) of_pci_check_probe_only(); } -static void pseries_setup_rfi_flush(void) +static void pseries_setup_rfi_nospec(void) { struct h_cpu_char_result result; - enum l1d_flush_type types; - bool enable; + enum l1d_flush_type flush_types; + enum spec_barrier_type barrier_type; + bool flush_enable; + bool barrier_enable; long rc; /* Enable by default */ - enable = true; + flush_enable = true; + barrier_enable = true; + /* no fallback if the firmware does not tell us */ + barrier_type = SPEC_BARRIER_NONE; rc = plpar_get_cpu_characteristics(); if (rc == H_SUCCESS) { - types = L1D_FLUSH_NONE; + flush_types = L1D_FLUSH_NONE; if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) - types |= L1D_FLUSH_MTTRIG; + flush_types |= L1D_FLUSH_MTTRIG; if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) - types |= L1D_FLUSH_ORI; + flush_types |= L1D_FLUSH_ORI; + if (result.character & H_CPU_CHAR_SPEC_BAR_ORI31) + barrier_type |= SPEC_BARRIER_ORI; /* Use fallback if nothing set in hcall */ - if (types == L1D_FLUSH_NONE) - types = L1D_FLUSH_FALLBACK; + if (flush_types == L1D_FLUSH_NONE) + flush_types = L1D_FLUSH_FALLBACK; if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) || (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) - enable = false; + flush_enable = false; + + if ((!(result.behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) || + (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) + barrier_enable = false; } else { /* Default to fallback if case hcall is not available */ - types = L1D_FLUSH_FALLBACK; + flush_types = L1D_FLUSH_FALLBACK; } - 
setup_rfi_flush(types, enable); + setup_barrier_nospec(barrier_type, barrier_enable); + setup_rfi_flush(flush_types, flush_enable); } #ifdef CONFIG_PCI_IOV @@ -666,7 +678,7 @@ static void __init pSeries_setup_arch(void) fwnmi_init(); - pseries_setup_rfi_flush(); + pseries_setup_rfi_nospec(); /* By default, only probe PCI (can be overridden by rtas_pci) */ pci_add_flags(PCI_PROBE_ONLY); -- 2.13.6
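The barrier-related decisions in pseries_setup_rfi_nospec() can be restated as a small pure function, which makes the cases easier to see. This is a sketch, not the kernel code: the bit positions below are placeholders (the real H_CPU_CHAR_* and H_CPU_BEHAV_* values live in the kernel headers and are not shown in this thread), and choose_barrier() and struct barrier_choice are names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the real hypervisor flag values. */
#define CHAR_SPEC_BAR_ORI31      (1ull << 0)
#define BEHAV_FAVOUR_SECURITY    (1ull << 0)
#define BEHAV_BNDS_CHK_SPEC_BAR  (1ull << 1)

/* Same flag values as in the series. */
enum spec_barrier_type {
	SPEC_BARRIER_NONE = 0x1,
	SPEC_BARRIER_ORI  = 0x2,
};

struct barrier_choice {
	unsigned int type;  /* bitmask of spec_barrier_type flags */
	bool enable;
};

/*
 * hcall_ok says whether the CPU characteristics query succeeded.
 * Without it there is no fallback barrier: the type stays NONE and the
 * default "enabled" setting is kept, as in the patch.
 */
static struct barrier_choice choose_barrier(bool hcall_ok, uint64_t character,
					    uint64_t behaviour)
{
	struct barrier_choice c = { SPEC_BARRIER_NONE, true };

	if (!hcall_ok)
		return c;
	if (character & CHAR_SPEC_BAR_ORI31)
		c.type |= SPEC_BARRIER_ORI;
	/* Both behaviour bits must be set for the barrier to stay enabled. */
	if (!(behaviour & BEHAV_BNDS_CHK_SPEC_BAR) ||
	    !(behaviour & BEHAV_FAVOUR_SECURITY))
		c.enable = false;
	return c;
}
```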
[PATCH RFC 3/8] powerpc/64: Use barrier_nospec in syscall entry
Signed-off-by: Michal Suchanek
---
 arch/powerpc/kernel/entry_64.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2cb5109a7ea3..7bfc4cf48af2 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 #include
 #ifdef CONFIG_PPC_BOOK3S
 #include
@@ -159,6 +160,7 @@ system_call:			/* label this so stack traces look sane */
 	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
 	bne	.Lsyscall_dotrace		/* does not return */
 	cmpldi	0,r0,NR_syscalls
+	barrier_nospec
 	bge-	.Lsyscall_enosys
 
 .Lsyscall:
@@ -319,6 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r10,TI_FLAGS(r10)
 
 	cmpldi	r0,NR_syscalls
+	barrier_nospec
 	blt+	.Lsyscall
 
 	/* Return code is already in r3 thanks to do_syscall_trace_enter() */
--
2.13.6
[PATCH RFC 6/8] powerpc/64: barrier_nospec: Add debugfs trigger
Copypasta from rfi implementation

Signed-off-by: Michal Suchanek
---
 arch/powerpc/kernel/setup_64.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d1d9f047161e..4b67b7b877d9 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -955,6 +955,41 @@ static __init int rfi_flush_debugfs_init(void)
 	return 0;
 }
 device_initcall(rfi_flush_debugfs_init);
+
+static int barrier_nospec_set(void *data, u64 val)
+{
+	switch (val) {
+	case 0:
+	case 1:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!!val == !!barrier_nospec_enabled)
+		return 0;
+
+	barrier_nospec_enable(!!val);
+
+	return 0;
+}
+
+static int barrier_nospec_get(void *data, u64 *val)
+{
+	*val = barrier_nospec_enabled ? 1 : 0;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec,
+			barrier_nospec_get, barrier_nospec_set, "%llu\n");
+
+static __init int barrier_nospec_debugfs_init(void)
+{
+	debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL,
+			    &fops_barrier_nospec);
+	return 0;
+}
+device_initcall(barrier_nospec_debugfs_init);
 #endif
 
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
--
2.13.6
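The setter in the patch above accepts only 0 or 1 and avoids re-patching when the requested state is already active. A userspace model of that behaviour might look as follows; it is a sketch in which barrier_nospec_enable() is replaced by a stub that merely counts invocations (patch_runs is a name invented here), and the debugfs plumbing is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static bool barrier_nospec_enabled;
static unsigned int patch_runs;  /* how often (re)patching was triggered */

/* Stub for the kernel function: record the state and count the call. */
static void barrier_nospec_enable(bool enable)
{
	barrier_nospec_enabled = enable;
	patch_runs++;
}

/* Same decision flow as the debugfs setter in the patch. */
static int barrier_nospec_set(uint64_t val)
{
	if (val > 1)
		return -EINVAL;  /* only 0 and 1 are valid */
	if (!!val == barrier_nospec_enabled)
		return 0;        /* already in the requested state */
	barrier_nospec_enable(!!val);
	return 0;
}
```

Skipping the no-op case matters in the real code because enabling rewrites instructions across the kernel text.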
[PATCH RFC 5/8] powerpc/64: Patch barrier_nospec in modules
Copypasta from lwsync patching. Note that unlike RFI which is patched only in kernel the nospec state reflects settings at the time the module was loaded. Iterating all modules and re-patching every time the settings change is not implemented. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/setup.h | 5 - arch/powerpc/kernel/module.c | 6 ++ arch/powerpc/kernel/setup_64.c| 4 ++-- arch/powerpc/lib/feature-fixups.c | 17 ++--- 4 files changed, 26 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 486d02e4a310..7e3a41248810 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -58,7 +58,10 @@ enum spec_barrier_type { void __init setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); void __init setup_barrier_nospec(enum spec_barrier_type, bool enable); -void do_barrier_nospec_fixups(enum spec_barrier_type type); +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type); +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *start, void *end); +extern enum spec_barrier_type powerpc_barrier_nospec; #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index 3f7ba0f5bf29..7b6d0ec06a21 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -72,6 +72,12 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); + + sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); + if (sect != NULL) + do_barrier_nospec_fixups(powerpc_barrier_nospec, + (void *)sect->sh_addr, + (void *)sect->sh_addr + sect->sh_size); #endif sect = find_section(hdr, sechdrs, "__lwsync_fixup"); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 09f21a954bfc..d1d9f047161e 100644 --- a/arch/powerpc/kernel/setup_64.c +++ 
b/arch/powerpc/kernel/setup_64.c @@ -909,11 +909,11 @@ void barrier_nospec_enable(bool enable) if (enable) { powerpc_barrier_nospec = barrier_nospec_type; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); on_each_cpu(do_nothing, NULL, 1); } else { powerpc_barrier_nospec = SPEC_BARRIER_NONE; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); } } diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 000e153184ad..b59ebc2215e8 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -156,14 +156,15 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i); } -void do_barrier_nospec_fixups(enum spec_barrier_type type) +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *fixup_start, void *fixup_end) { unsigned int instr, *dest; long *start, *end; int i; - start = PTRRELOC(&__start___spec_barrier_fixup), - end = PTRRELOC(&__stop___spec_barrier_fixup); + start = fixup_start; + end = fixup_end; instr = 0x6000; /* nop */ @@ -182,6 +183,16 @@ void do_barrier_nospec_fixups(enum spec_barrier_type type) printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); } +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type) +{ + void *start, *end; + + start = PTRRELOC(&__start___spec_barrier_fixup), + end = PTRRELOC(&__stop___spec_barrier_fixup); + + do_barrier_nospec_fixups(type, start, end); +} + #endif /* CONFIG_PPC_BOOK3S_64 */ void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) -- 2.13.6
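The range-based do_barrier_nospec_fixups() introduced here walks a table of relative offsets and overwrites each referenced instruction slot. The following userspace sketch shows that idea on an ordinary array. Several things are assumptions: patch_barriers() is a made-up name, a plain store replaces the kernel's instruction patching helper, and the two instruction encodings are filled in from the Power ISA rather than taken from the flattened patch text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_NOP    0x60000000u  /* ori 0,0,0 */
#define INSN_ORI_31 0x63ff0000u  /* ori 31,31,0, used as the barrier */

enum spec_barrier_type {
	SPEC_BARRIER_NONE = 0x1,
	SPEC_BARRIER_ORI  = 0x2,
};

/*
 * Each table entry holds the distance in bytes from the entry itself to
 * the instruction slot it describes. Write the barrier (or a nop) into
 * every slot and return the number of patched locations.
 */
static int patch_barriers(unsigned int type, long *start, long *end)
{
	uint32_t instr = (type & SPEC_BARRIER_ORI) ? INSN_ORI_31 : INSN_NOP;
	int i = 0;

	for (; start < end; start++, i++) {
		uint32_t *dest = (uint32_t *)((char *)start + *start);

		*dest = instr;  /* the kernel uses a patching helper here */
	}
	return i;
}
```

Because the table is passed as a start/end pair, the same routine can serve the kernel image and a module's __spec_barrier_fixup section, which is the point of the refactoring in this patch.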
[PATCH RFC 1/8] powerpc: Add barrier_nospec
Copypasta from original gmb() and rfi implementation

Signed-off-by: Michal Suchanek
---
 arch/powerpc/include/asm/barrier.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 10daa1d56e0a..8e47b3abe405 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -75,6 +75,15 @@ do {	\
 	___p1;	\
 })
 
+/* TODO: add patching so this can be disabled */
+/* Prevent speculative execution past this barrier. */
+#define barrier_nospec_asm ori 31,31,0
+#ifdef __ASSEMBLY__
+#define barrier_nospec barrier_nospec_asm
+#else
+#define barrier_nospec() __asm__ __volatile__ (stringify_in_c(barrier_nospec_asm) : : :)
+#endif
+
 #include
 
 #endif /* _ASM_POWERPC_BARRIER_H */
--
2.13.6
[PATCH RFC 4/8] powerpc/64s: Add support for ori barrier_nospec
Copypasta from rfi implementation Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/barrier.h| 4 ++-- arch/powerpc/include/asm/feature-fixups.h | 9 + arch/powerpc/include/asm/setup.h | 8 arch/powerpc/kernel/setup_64.c| 29 + arch/powerpc/kernel/vmlinux.lds.S | 7 +++ arch/powerpc/lib/feature-fixups.c | 27 +++ 6 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 8e47b3abe405..4079a95e84c2 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -75,9 +75,9 @@ do { \ ___p1; \ }) -/* TODO: add patching so this can be disabled */ /* Prevent speculative execution past this barrier. */ -#define barrier_nospec_asm ori 31,31,0 +#define barrier_nospec_asm SPEC_BARRIER_FIXUP_SECTION; \ + nop #ifdef __ASSEMBLY__ #define barrier_nospec barrier_nospec_asm #else diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index 1e82eb3caabd..9d3382618ffd 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -195,11 +195,20 @@ label##3: \ FTR_ENTRY_OFFSET 951b-952b; \ .popsection; +#define SPEC_BARRIER_FIXUP_SECTION \ +953: \ + .pushsection __spec_barrier_fixup,"a"; \ + .align 2; \ +954: \ + FTR_ENTRY_OFFSET 953b-954b; \ + .popsection; + #ifndef __ASSEMBLY__ #include extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; +extern long __start___spec_barrier_fixup, __stop___spec_barrier_fixup; void apply_feature_fixups(void); void setup_feature_keys(void); diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 469b7fdc9be4..486d02e4a310 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -49,8 +49,16 @@ enum l1d_flush_type { L1D_FLUSH_MTTRIG= 0x8, }; +/* These are bit flags */ +enum spec_barrier_type { + SPEC_BARRIER_NONE = 0x1, + SPEC_BARRIER_ORI= 0x2, +}; + void __init 
setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void __init setup_barrier_nospec(enum spec_barrier_type, bool enable); +void do_barrier_nospec_fixups(enum spec_barrier_type type); #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index c388cc3357fa..09f21a954bfc 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -815,6 +815,10 @@ static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; static bool no_rfi_flush; bool rfi_flush; +enum spec_barrier_type powerpc_barrier_nospec; +static enum spec_barrier_type barrier_nospec_type; +static bool no_nospec; +bool barrier_nospec_enabled; static int __init handle_no_rfi_flush(char *p) { @@ -899,6 +903,31 @@ void __init setup_rfi_flush(enum l1d_flush_type types, bool enable) rfi_flush_enable(enable); } +void barrier_nospec_enable(bool enable) +{ + barrier_nospec_enabled = enable; + + if (enable) { + powerpc_barrier_nospec = barrier_nospec_type; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + on_each_cpu(do_nothing, NULL, 1); + } else { + powerpc_barrier_nospec = SPEC_BARRIER_NONE; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + } +} + +void __init setup_barrier_nospec(enum spec_barrier_type type, bool enable) +{ + if (type & SPEC_BARRIER_ORI) + pr_info("barrier_nospec: Using ori type flush\n"); + + barrier_nospec_type = type; + + if (!no_nospec) + barrier_nospec_enable(enable); +} + #ifdef CONFIG_DEBUG_FS static int rfi_flush_set(void *data, u64 val) { diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index c8af90ff49f0..744b58ff77f1 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -139,6 +139,13 @@ SECTIONS *(__rfi_flush_fixup) __stop___rfi_flush_fixup = .; } + + . 
= ALIGN(8); + __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) { + __start___spec_barrier_fixup = .; + *(__spec_barrier_fixup) + __s
[PATCH RFC 0/8] powerpc barrier_nospec
Hello,

this patchset adds barrier_nospec on powerpc. It is based on the
out-of-tree gmb() patch and the existing rfi patches.

I do not have tests for the Spectre/Meltdown issues available, so this
is untested.

Feedback on the general approach as well as on its actual effectiveness
is welcome.

Thanks

Michal

Michal Suchanek (8):
  powerpc: Add barrier_nospec
  powerpc: Use barrier_nospec in copy_from_user
  powerpc/64: Use barrier_nospec in syscall entry
  powerpc/64s: Add support for ori barrier_nospec
  powerpc/64: Patch barrier_nospec in modules
  powerpc/64: barrier_nospec: Add debugfs trigger
  powerpc/64s: barrier_nospec: Add hcall triggerr
  powerpc/64: barrier_nospec: Add commandline trigger

 arch/powerpc/include/asm/barrier.h        |  9
 arch/powerpc/include/asm/feature-fixups.h |  9
 arch/powerpc/include/asm/setup.h          | 11 +
 arch/powerpc/include/asm/uaccess.h        | 11 -
 arch/powerpc/kernel/entry_64.S            |  3 ++
 arch/powerpc/kernel/module.c              |  6 +++
 arch/powerpc/kernel/setup_64.c            | 72 +++
 arch/powerpc/kernel/vmlinux.lds.S         |  7 +++
 arch/powerpc/lib/feature-fixups.c         | 38
 arch/powerpc/platforms/pseries/setup.c    | 38 ++--
 10 files changed, 190 insertions(+), 14 deletions(-)

--
2.13.6
[PATCH RFC 2/8] powerpc: Use barrier_nospec in copy_from_user
Copypasta from x86.

Signed-off-by: Michal Suchanek
---
 arch/powerpc/include/asm/uaccess.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 51bfeb8777f0..af9b0e731f46 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -248,6 +248,7 @@ do {	\
 	__chk_user_ptr(ptr);	\
 	if (!is_kernel_addr((unsigned long)__gu_addr))	\
 		might_fault();	\
+	barrier_nospec();	\
 	__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
 	(x) = (__typeof__(*(ptr)))__gu_val;	\
 	__gu_err;	\
@@ -258,8 +259,10 @@ do {	\
 	long __gu_err = -EFAULT;	\
 	unsigned long __gu_val = 0;	\
 	const __typeof__(*(ptr)) __user *__gu_addr = (ptr);	\
+	int can_access = access_ok(VERIFY_READ, __gu_addr, (size));	\
 	might_fault();	\
-	if (access_ok(VERIFY_READ, __gu_addr, (size)))	\
+	barrier_nospec();	\
+	if (can_access)	\
 		__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;	\
 	__gu_err;	\
@@ -271,6 +274,7 @@ do {	\
 	unsigned long __gu_val;	\
 	const __typeof__(*(ptr)) __user *__gu_addr = (ptr);	\
 	__chk_user_ptr(ptr);	\
+	barrier_nospec();	\
 	__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;	\
 	__gu_err;	\
@@ -298,15 +302,19 @@ static inline unsigned long raw_copy_from_user(void *to,
 
 		switch (n) {
 		case 1:
+			barrier_nospec();
 			__get_user_size(*(u8 *)to, from, 1, ret);
 			break;
 		case 2:
+			barrier_nospec();
 			__get_user_size(*(u16 *)to, from, 2, ret);
 			break;
 		case 4:
+			barrier_nospec();
 			__get_user_size(*(u32 *)to, from, 4, ret);
 			break;
 		case 8:
+			barrier_nospec();
 			__get_user_size(*(u64 *)to, from, 8, ret);
 			break;
 		}
@@ -314,6 +322,7 @@ static inline unsigned long raw_copy_from_user(void *to,
 			return 0;
 	}
 
+	barrier_nospec();
 	return __copy_tofrom_user((__force void __user *)to, from, n);
 }
 
--
2.13.6
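The ordering used in the __get_user_check hunk (compute the access check, issue the barrier, then branch on the result) can be shown in a userspace sketch. Everything here is illustrative: fake_user_mem, range_ok() and copy_from_fake_user() are invented names, and the barrier_nospec() macro below is only a compiler barrier (GCC/Clang syntax) standing in for the real instruction, so it demonstrates the placement and not any actual protection against speculation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder: compiler barrier only, NOT a real speculation barrier. */
#define barrier_nospec() __asm__ __volatile__("" : : : "memory")

#define USER_LIMIT 64u
static unsigned char fake_user_mem[USER_LIMIT];

/* Overflow-safe check that [off, off + len) lies inside the fake region. */
static int range_ok(size_t off, size_t len)
{
	return off <= USER_LIMIT && len <= USER_LIMIT - off;
}

/* Returns the number of bytes NOT copied, in the style of copy_from_user(). */
static size_t copy_from_fake_user(void *to, size_t off, size_t len)
{
	int can_access = range_ok(off, len);  /* check first ... */

	barrier_nospec();                     /* ... then the barrier ... */
	if (!can_access)                      /* ... then act on the result */
		return len;
	memcpy(to, fake_user_mem + off, len);
	return 0;
}
```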
[PATCH 1/2] mmc: bcm2835: reset host on timeout
The bcm2835 mmc host tends to lock up for an unknown reason, so reset it
on timeout. The upper mmc block layer tries retransmitting with single
blocks, which tends to work out after a long wait.

This is better than giving up and leaving the machine broken for no
obvious reason.

Signed-off-by: Michal Suchanek
---
 drivers/mmc/host/bcm2835.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/mmc/host/bcm2835.c b/drivers/mmc/host/bcm2835.c
index 229dc18f0581..ce05fe72f865 100644
--- a/drivers/mmc/host/bcm2835.c
+++ b/drivers/mmc/host/bcm2835.c
@@ -286,6 +286,7 @@ static void bcm2835_reset(struct mmc_host *mmc)
 
 	if (host->dma_chan)
 		dmaengine_terminate_sync(host->dma_chan);
+	host->dma_chan = NULL;
 	bcm2835_reset_internal(host);
 }
 
@@ -837,6 +838,8 @@ static void bcm2835_timeout(struct work_struct *work)
 		dev_err(dev, "timeout waiting for hardware interrupt.\n");
 		bcm2835_dumpregs(host);
 
+		bcm2835_reset(host->mmc);
+
 		if (host->data) {
 			host->data->error = -ETIMEDOUT;
 			bcm2835_finish_data(host);
--
2.13.6
[PATCH 2/2] mmc: bcm2835: print some informational messages during reset
The previous patch resets the host on a hardware error, so make the reset
progress more visible.

Signed-off-by: Michal Suchanek
---
 drivers/mmc/host/bcm2835.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/bcm2835.c b/drivers/mmc/host/bcm2835.c
index ce05fe72f865..4dde8b2b62a9 100644
--- a/drivers/mmc/host/bcm2835.c
+++ b/drivers/mmc/host/bcm2835.c
@@ -283,10 +283,14 @@ static void bcm2835_reset_internal(struct bcm2835_host *host)
 static void bcm2835_reset(struct mmc_host *mmc)
 {
 	struct bcm2835_host *host = mmc_priv(mmc);
+	struct device *dev = &host->pdev->dev;
 
-	if (host->dma_chan)
+	if (host->dma_chan) {
+		dev_info(dev, "tearing down dma");
 		dmaengine_terminate_sync(host->dma_chan);
+	}
 	host->dma_chan = NULL;
+	dev_info(dev, "resetting");
 	bcm2835_reset_internal(host);
 }
 
--
2.13.6
[PATCH RFC rebase 7/9] powerpc/64: barrier_nospec: Add debugfs trigger
Copypasta from the RFI flush implementation. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/setup_64.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index f60e0e3b5ad2..f6678a7b6114 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -963,6 +963,41 @@ static __init int rfi_flush_debugfs_init(void) return 0; } device_initcall(rfi_flush_debugfs_init); + +static int barrier_nospec_set(void *data, u64 val) +{ + switch (val) { + case 0: + case 1: + break; + default: + return -EINVAL; + } + + if (!!val == !!barrier_nospec_enabled) + return 0; + + barrier_nospec_enable(!!val); + + return 0; +} + +static int barrier_nospec_get(void *data, u64 *val) +{ + *val = barrier_nospec_enabled ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec, + barrier_nospec_get, barrier_nospec_set, "%llu\n"); + +static __init int barrier_nospec_debugfs_init(void) +{ + debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL, + &fops_barrier_nospec); + return 0; +} +device_initcall(barrier_nospec_debugfs_init); #endif ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) -- 2.13.6
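The set handler above accepts only 0 or 1 and skips the re-patching when the requested state matches the current one. A minimal user-space sketch of that logic follows; `nospec_set`, `nospec_get` and the `patch_calls` counter are stand-ins invented for illustration and are not kernel symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static bool barrier_enabled;   /* stands in for barrier_nospec_enabled */
static int patch_calls;        /* counts simulated re-patching passes */

/* Mirrors the validation in barrier_nospec_set(): only 0 and 1 are valid. */
static int nospec_set(uint64_t val)
{
	if (val > 1)
		return -EINVAL;

	/* Nothing to do when the state does not change. */
	if ((val != 0) == barrier_enabled)
		return 0;

	barrier_enabled = (val != 0);
	patch_calls++;          /* the kernel would call barrier_nospec_enable() */
	return 0;
}

static int nospec_get(uint64_t *val)
{
	*val = barrier_enabled ? 1 : 0;
	return 0;
}
```

On a kernel carrying this patch the file would presumably show up as `barrier_nospec` in the powerpc debugfs directory, next to the existing `rfi_flush` file; the exact path depends on where debugfs is mounted.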
[PATCH RFC rebase 8/9] powerpc/64s: barrier_nospec: Add hcall trigger
Adapted from the RFI implementation Signed-off-by: Michal Suchanek --- arch/powerpc/platforms/pseries/mobility.c | 2 +- arch/powerpc/platforms/pseries/pseries.h | 2 +- arch/powerpc/platforms/pseries/setup.c| 37 ++- 3 files changed, 29 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 8a8033a249c7..9d506be1580e 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -349,7 +349,7 @@ void post_mobility_fixup(void) "failed: %d\n", rc); /* Possibly switch to a new RFI flush type */ - pseries_setup_rfi_flush(); + pseries_setup_rfi_nospec(); return; } diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index 27cdcb69fd18..d49670c67686 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -100,6 +100,6 @@ static inline unsigned long cmo_get_page_size(void) int dlpar_workqueue_init(void); -void pseries_setup_rfi_flush(void); +void pseries_setup_rfi_nospec(void); #endif /* _PSERIES_PSERIES_H */ diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 9877c3dfcdc8..4b899a4db6dd 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -459,30 +459,47 @@ static void __init find_and_init_phbs(void) of_pci_check_probe_only(); } -void pseries_setup_rfi_flush(void) +void pseries_setup_rfi_nospec(void) { struct h_cpu_char_result result; - enum l1d_flush_type types; - bool enable; + enum l1d_flush_type flush_types; + enum spec_barrier_type barrier_type; + bool flush_enable; + bool barrier_enable; long rc; /* Enable by default */ - enable = true; - types = L1D_FLUSH_FALLBACK; + flush_enable = true; + flush_types = L1D_FLUSH_FALLBACK; + barrier_enable = true; + /* no fallback available if the firmware does not tell us */ + barrier_type = SPEC_BARRIER_NONE; rc = 
plpar_get_cpu_characteristics(&result); if (rc == H_SUCCESS) { if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) - types |= L1D_FLUSH_MTTRIG; + flush_types |= L1D_FLUSH_MTTRIG; if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) - types |= L1D_FLUSH_ORI; + flush_types |= L1D_FLUSH_ORI; + if (result.character & H_CPU_CHAR_SPEC_BAR_ORI31) + barrier_type |= SPEC_BARRIER_ORI; if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) || (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) - enable = false; + flush_enable = false; + /* +* Do not check H_CPU_BEHAV_BNDS_CHK_SPEC_BAR - the ORI does +* nothing anyway when not supported. +*/ + if ((!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) + barrier_enable = false; + } else { + /* Default to fallback in case hcall is not available */ + flush_types = L1D_FLUSH_FALLBACK; } - setup_rfi_flush(types, enable); + setup_barrier_nospec(barrier_type, barrier_enable); + setup_rfi_flush(flush_types, flush_enable); } #ifdef CONFIG_PCI_IOV @@ -658,7 +675,7 @@ static void __init pSeries_setup_arch(void) fwnmi_init(); - pseries_setup_rfi_flush(); + pseries_setup_rfi_nospec(); /* By default, only probe PCI (can be overridden by rtas_pci) */ pci_add_flags(PCI_PROBE_ONLY); -- 2.13.6
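The barrier-related part of the decision in pseries_setup_rfi_nospec() can be read in isolation: the barrier defaults to enabled, the ORI type is recorded only when firmware advertises it, and the barrier is disabled only when firmware does not favour security. The sketch below models just that part in user space. The flag macros use placeholder bit positions, not the real H_CPU_CHAR_ and H_CPU_BEHAV_ values, and `choose_nospec` and `struct nospec_choice` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the real hcall flag values. */
#define CHAR_SPEC_BAR_ORI31    (1ull << 0)
#define BEHAV_FAVOUR_SECURITY  (1ull << 1)

struct nospec_choice {
	bool have_ori;   /* firmware advertises the ori 31,31,0 barrier */
	bool enable;     /* barrier should be patched in */
};

/*
 * hcall_ok models rc == H_SUCCESS. Without firmware information the
 * barrier stays enabled and no barrier type is known.
 */
static struct nospec_choice choose_nospec(bool hcall_ok, uint64_t character,
					  uint64_t behaviour)
{
	struct nospec_choice c = { .have_ori = false, .enable = true };

	if (!hcall_ok)
		return c;

	if (character & CHAR_SPEC_BAR_ORI31)
		c.have_ori = true;
	if (!(behaviour & BEHAV_FAVOUR_SECURITY))
		c.enable = false;
	return c;
}
```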
[PATCH RFC rebase 6/9] powerpc/64: Patch barrier_nospec in modules
Note that unlike RFI which is patched only in kernel the nospec state reflects settings at the time the module was loaded. Iterating all modules and re-patching every time the settings change is not implemented. Based on lwsync patching. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/setup.h | 5 - arch/powerpc/kernel/module.c | 6 ++ arch/powerpc/kernel/setup_64.c| 4 ++-- arch/powerpc/lib/feature-fixups.c | 17 ++--- 4 files changed, 26 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index c7e9e66c2a38..92520d2483b8 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -58,7 +58,10 @@ enum spec_barrier_type { void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); void setup_barrier_nospec(enum spec_barrier_type, bool enable); -void do_barrier_nospec_fixups(enum spec_barrier_type type); +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type); +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *start, void *end); +extern enum spec_barrier_type powerpc_barrier_nospec; #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index 3f7ba0f5bf29..7b6d0ec06a21 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -72,6 +72,12 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); + + sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); + if (sect != NULL) + do_barrier_nospec_fixups(powerpc_barrier_nospec, + (void *)sect->sh_addr, + (void *)sect->sh_addr + sect->sh_size); #endif sect = find_section(hdr, sechdrs, "__lwsync_fixup"); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 767240074cad..f60e0e3b5ad2 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c 
@@ -910,11 +910,11 @@ void barrier_nospec_enable(bool enable) if (enable) { powerpc_barrier_nospec = barrier_nospec_type; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); on_each_cpu(do_nothing, NULL, 1); } else { powerpc_barrier_nospec = SPEC_BARRIER_NONE; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); } } diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index dfeb7feeccef..a529ac6b2a5d 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -160,14 +160,15 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) : "unknown"); } -void do_barrier_nospec_fixups(enum spec_barrier_type type) +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *fixup_start, void *fixup_end) { unsigned int instr, *dest; long *start, *end; int i; - start = PTRRELOC(&__start___spec_barrier_fixup), - end = PTRRELOC(&__stop___spec_barrier_fixup); + start = fixup_start; + end = fixup_end; instr = 0x60000000; /* nop */ @@ -186,6 +187,16 @@ void do_barrier_nospec_fixups(enum spec_barrier_type type) printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); } +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type) +{ + void *start, *end; + + start = PTRRELOC(&__start___spec_barrier_fixup), + end = PTRRELOC(&__stop___spec_barrier_fixup); + + do_barrier_nospec_fixups(type, start, end); +} + #endif /* CONFIG_PPC_BOOK3S_64 */ void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) -- 2.13.6
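Passing the table bounds as arguments is what lets the same walk serve both the kernel image and a module's `__spec_barrier_fixup` section. The following user-space sketch illustrates the idea under some assumptions: each table entry holds the distance from the entry itself to the instruction it describes (as the FTR_ENTRY_OFFSET usage suggests), the enabled state writes `ori 31,31,0`, and a plain store replaces the kernel's patch_instruction(). The names `record_fixup` and `apply_barrier_fixups` are invented here, and `intptr_t` entries are used where the kernel uses `long`.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_NOP       0x60000000u  /* ori 0,0,0 */
#define INSN_SPEC_BAR  0x63ff0000u  /* ori 31,31,0 */

/* Store in *entry the distance from the entry to the instruction. */
static void record_fixup(intptr_t *entry, uint32_t *insn)
{
	*entry = (intptr_t)insn - (intptr_t)entry;
}

/*
 * Walk the table [start, end) and rewrite every referenced instruction.
 * Returns the number of patched locations.
 */
static int apply_barrier_fixups(int enable, intptr_t *start, intptr_t *end)
{
	uint32_t instr = enable ? INSN_SPEC_BAR : INSN_NOP;
	int i;

	assert(start <= end);
	for (i = 0; start < end; start++, i++) {
		uint32_t *dest = (uint32_t *)((intptr_t)start + *start);

		*dest = instr;  /* the kernel goes through patch_instruction() */
	}
	return i;
}
```

Because the entries are self-relative, a caller only needs the start and end of whichever table it wants processed, which is the property the module loader hook relies on.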
[PATCH RFC rebase 5/9] powerpc/64s: Add support for ori barrier_nospec patching
Based on the RFI patching. This is required to be able to disable the speculation barrier. Only one barrier type is supported and it does nothing when the firmware does not enable it. Also, re-patching modules is not supported. So the only meaningful thing that can be done is patching out the speculation barrier at boot when the user says it is not wanted. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/barrier.h| 4 ++-- arch/powerpc/include/asm/feature-fixups.h | 9 + arch/powerpc/include/asm/setup.h | 8 arch/powerpc/kernel/setup_64.c| 30 ++ arch/powerpc/kernel/vmlinux.lds.S | 7 +++ arch/powerpc/lib/feature-fixups.c | 27 +++ 6 files changed, 83 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 8e47b3abe405..4079a95e84c2 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -75,9 +75,9 @@ do { \ ___p1; \ }) -/* TODO: add patching so this can be disabled */ /* Prevent speculative execution past this barrier.
*/ -#define barrier_nospec_asm ori 31,31,0 +#define barrier_nospec_asm SPEC_BARRIER_FIXUP_SECTION; \ + nop #ifdef __ASSEMBLY__ #define barrier_nospec barrier_nospec_asm #else diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index 1e82eb3caabd..9d3382618ffd 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -195,11 +195,20 @@ label##3: \ FTR_ENTRY_OFFSET 951b-952b; \ .popsection; +#define SPEC_BARRIER_FIXUP_SECTION \ +953: \ + .pushsection __spec_barrier_fixup,"a"; \ + .align 2; \ +954: \ + FTR_ENTRY_OFFSET 953b-954b; \ + .popsection; + #ifndef __ASSEMBLY__ #include extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; +extern long __start___spec_barrier_fixup, __stop___spec_barrier_fixup; void apply_feature_fixups(void); void setup_feature_keys(void); diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index bbcdf929be54..c7e9e66c2a38 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -49,8 +49,16 @@ enum l1d_flush_type { L1D_FLUSH_MTTRIG= 0x8, }; +/* These are bit flags */ +enum spec_barrier_type { + SPEC_BARRIER_NONE = 0x1, + SPEC_BARRIER_ORI= 0x2, +}; + void setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void setup_barrier_nospec(enum spec_barrier_type, bool enable); +void do_barrier_nospec_fixups(enum spec_barrier_type type); #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 4ec4a27b36a9..767240074cad 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -815,6 +815,10 @@ static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; static bool no_rfi_flush; bool rfi_flush; +enum spec_barrier_type powerpc_barrier_nospec; +static enum spec_barrier_type barrier_nospec_type; +static bool no_nospec; +bool 
barrier_nospec_enabled; static int __init handle_no_rfi_flush(char *p) { @@ -900,6 +904,32 @@ void setup_rfi_flush(enum l1d_flush_type types, bool enable) rfi_flush_enable(enable); } +void barrier_nospec_enable(bool enable) +{ + barrier_nospec_enabled = enable; + + if (enable) { + powerpc_barrier_nospec = barrier_nospec_type; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + on_each_cpu(do_nothing, NULL, 1); + } else { + powerpc_barrier_nospec = SPEC_BARRIER_NONE; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + } +} + +void setup_barrier_nospec(enum spec_barrier_type type, bool enable) +{ + /* +* Only one barrier type is supported and it does nothing when the +* firmware does not enable it. So the only meaningful thing to do +* here is check the user preference. +*/ + barrier_nospec_type = SPEC_BARRIER_ORI; + + barrier_nospec_enable(!no_nospec && enable); +} + #ifdef CONFIG_DEBUG_FS static int rfi_flush_set(void *data, u64 val) { diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index c8af90ff49f0..744b58ff77f1 100644 --- a/arch/powerpc/kernel/v
[PATCH RFC rebase 9/9] powerpc/64: barrier_nospec: Add commandline trigger
Add commandline options spectre_v2 and nospectre_v2. These are named the same as the similar x86 options regardless of actual effect, so that no platform-specific configuration is required. Supported options: nospectre_v2 or spectre_v2=off - speculation barrier not used; spectre_v2=on or spectre_v2=auto - speculation barrier used. Changing the settings after boot is not supported and VM migration may change requirements, so auto is the same as on. Based on the s390 implementation. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/setup_64.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index f6678a7b6114..c74e656265df 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -840,6 +840,28 @@ static int __init handle_no_pti(char *p) } early_param("nopti", handle_no_pti); +static int __init nospectre_v2_setup_early(char *str) +{ + no_nospec = true; + return 0; +} +early_param("nospectre_v2", nospectre_v2_setup_early); + +static int __init spectre_v2_setup_early(char *str) +{ + if (str && !strncmp(str, "on", 2)) + no_nospec = false; + + if (str && !strncmp(str, "off", 3)) + no_nospec = true; + + if (str && !strncmp(str, "auto", 4)) + no_nospec = false; + + return 0; +} +early_param("spectre_v2", spectre_v2_setup_early); + static void do_nothing(void *unused) { /* -- 2.13.6
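The option handling described above boils down to a small mapping from the argument string to the `no_nospec` flag. A user-space sketch of that mapping is given below; `parse_spectre_v2` is a hypothetical helper, not a function from the patch. Like the patch it compares prefixes with strncmp(), so a value such as "onward" would be treated as "on".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the new value of the "barrier disabled" flag for a given
 * spectre_v2= argument. "current" is kept when the argument is missing
 * or not recognised, matching the patch, which leaves no_nospec alone
 * in that case.
 */
static bool parse_spectre_v2(const char *str, bool current)
{
	if (!str)
		return current;
	if (!strncmp(str, "off", 3))
		return true;            /* barrier not used */
	if (!strncmp(str, "on", 2) || !strncmp(str, "auto", 4))
		return false;           /* barrier used */
	return current;
}
```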
[PATCH RFC rebase 1/9] powerpc: Add barrier_nospec
When the firmware supports it an otherwise useless combination of ORI instruction arguments is interpreted as speculation barrier. Implement barrier_nospec using this instruction. Based on the out-of-tree gmb() implementation. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/barrier.h | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 10daa1d56e0a..8e47b3abe405 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -75,6 +75,15 @@ do { \ ___p1; \ }) +/* TODO: add patching so this can be disabled */ +/* Prevent speculative execution past this barrier. */ +#define barrier_nospec_asm ori 31,31,0 +#ifdef __ASSEMBLY__ +#define barrier_nospec barrier_nospec_asm +#else +#define barrier_nospec() __asm__ __volatile__ (stringify_in_c(barrier_nospec_asm) : : :) +#endif + #include #endif /* _ASM_POWERPC_BARRIER_H */ -- 2.13.6
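For reference, the instruction word behind `ori 31,31,0` follows from the D-form layout of `ori`: primary opcode 24, then the RS and RA register fields and a 16-bit immediate. The helper below is a user-space sketch for checking such encodings; the name `encode_ori` is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper: build the 32-bit word for "ori RA,RS,UI"
 * (D-form, primary opcode 24).
 */
static uint32_t encode_ori(unsigned int ra, unsigned int rs, uint16_t ui)
{
	assert(ra < 32 && rs < 32);
	return (24u << 26) | ((uint32_t)rs << 21) | ((uint32_t)ra << 16) | ui;
}
```

With this layout `encode_ori(0, 0, 0)` is 0x60000000, the conventional nop, and `encode_ori(31, 31, 0)` is 0x63ff0000, the otherwise useless form that the commit message says is interpreted as a speculation barrier.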
[PATCH RFC rebase 4/9] powerpc/64s: Use barrier_nospec in RFI_FLUSH_SLOT
The RFI flush support patches the speculation barrier into RFI_FLUSH_SLOT as part of the RFI flush. Use separate barrier_nospec instead. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/exception-64s.h | 2 +- arch/powerpc/lib/feature-fixups.c| 9 +++-- 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 471b2274fbeb..bb5a3052b29b 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -81,9 +81,9 @@ * L1-D cache when returning to userspace or a guest. */ #define RFI_FLUSH_SLOT \ + barrier_nospec_asm; \ RFI_FLUSH_FIXUP_SECTION;\ nop;\ - nop;\ nop #define RFI_TO_KERNEL \ diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 35f80ab7cbd8..4cc2f0c5c863 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -119,7 +119,7 @@ void do_feature_fixups(unsigned long value, void *fixup_start, void *fixup_end) #ifdef CONFIG_PPC_BOOK3S_64 void do_rfi_flush_fixups(enum l1d_flush_type types) { - unsigned int instrs[3], *dest; + unsigned int instrs[2], *dest; long *start, *end; int i; @@ -128,15 +128,13 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) instrs[0] = 0x60000000; /* nop */ instrs[1] = 0x60000000; /* nop */ - instrs[2] = 0x60000000; /* nop */ if (types & L1D_FLUSH_FALLBACK) - /* b .+16 to fallback flush */ - instrs[0] = 0x48000010; + /* b .+12 to fallback flush */ + instrs[0] = 0x4800000c; i = 0; if (types & L1D_FLUSH_ORI) { - instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */ instrs[i++] = 0x63de0000; /* ori 30,30,0 L1d flush*/ } @@ -150,7 +148,6 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) patch_instruction(dest, instrs[0]); patch_instruction(dest + 1, instrs[1]); - patch_instruction(dest + 2, instrs[2]); } printk(KERN_DEBUG "rfi-flush: patched %d locations (%s flush)\n", i, -- 2.13.6
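Shrinking the patched slot from three instructions to two is why the fallback branch changes from `b .+16` to `b .+12`: the target stays put while the branch sits one instruction (4 bytes) closer to it. The relative branch word can be checked with the small sketch below, which assumes the I-form layout (primary opcode 18, 24-bit word offset, AA and LK clear). The name `encode_branch_rel` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper: build the word for "b .+offset", a relative
 * branch without link. The offset is in bytes, must be a multiple of 4
 * and must fit the signed 26-bit range.
 */
static uint32_t encode_branch_rel(int32_t offset)
{
	assert((offset & 3) == 0);
	assert(offset >= -0x2000000 && offset < 0x2000000);
	return (18u << 26) | ((uint32_t)offset & 0x03fffffcu);
}
```

`encode_branch_rel(16)` yields 0x48000010 and `encode_branch_rel(12)` yields 0x4800000c, the two values that appear on the removed and added lines of the hunk.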
[PATCH RFC rebase 2/9] powerpc: Use barrier_nospec in copy_from_user
This is based on x86 patch doing the same. Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/uaccess.h | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 51bfeb8777f0..af9b0e731f46 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -248,6 +248,7 @@ do { \ __chk_user_ptr(ptr);\ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -258,8 +259,10 @@ do { \ long __gu_err = -EFAULT;\ unsigned long __gu_val = 0;\ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ + int can_access = access_ok(VERIFY_READ, __gu_addr, (size)); \ might_fault(); \ - if (access_ok(VERIFY_READ, __gu_addr, (size))) \ + barrier_nospec(); \ + if (can_access) \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -271,6 +274,7 @@ do { \ unsigned long __gu_val; \ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr);\ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -298,15 +302,19 @@ static inline unsigned long raw_copy_from_user(void *to, switch (n) { case 1: + barrier_nospec(); __get_user_size(*(u8 *)to, from, 1, ret); break; case 2: + barrier_nospec(); __get_user_size(*(u16 *)to, from, 2, ret); break; case 4: + barrier_nospec(); __get_user_size(*(u32 *)to, from, 4, ret); break; case 8: + barrier_nospec(); __get_user_size(*(u64 *)to, from, 8, ret); break; } @@ -314,6 +322,7 @@ static inline unsigned long raw_copy_from_user(void *to, return 0; } + barrier_nospec(); return __copy_tofrom_user((__force void __user *)to, from, n); } -- 2.13.6
[PATCH] powerpc/xmon: really enable xmon when a breakpoint is set
When single-stepping kernel code from xmon without a debug hook enabled the kernel crashes. This can happen when kernel starts with xmon on crash disabled but xmon is entered using sysrq. Commit e1368d0c9edb ("powerpc/xmon: Setup debugger hooks when first break-point is set") adds force_enable_xmon function that prints "xmon: Enabling debugger hooks" but does not enable them. Add the call to xmon_init to install the debugger hooks in force_enable_xmon and also call force_enable_xmon when single-stepping in xmon. Fixes: e1368d0c9edb ("powerpc/xmon: Setup debugger hooks when first break-point is set") Signed-off-by: Michal Suchanek --- arch/powerpc/xmon/xmon.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a0842f1ff72c..504bd1c3d8b0 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -179,6 +179,9 @@ static const char *getvecname(unsigned long vec); static int do_spu_cmd(void); +static void xmon_init(int enable); +static inline void force_enable_xmon(void); + #ifdef CONFIG_44x static void dump_tlb_44x(void); #endif @@ -1094,6 +1097,7 @@ static int do_step(struct pt_regs *regs) unsigned int instr; int stepped; + force_enable_xmon(); /* check we are in 64-bit kernel mode, translation enabled */ if ((regs->msr & (MSR_64BIT|MSR_PR|MSR_IR)) == (MSR_64BIT|MSR_IR)) { if (mread(regs->nip, &instr, 4) == 4) { @@ -1275,6 +1279,7 @@ static inline void force_enable_xmon(void) if (!xmon_on) { printf("xmon: Enabling debugger hooks\n"); xmon_on = 1; + xmon_init(1); } } -- 2.13.6
[PATCH v7 3/4] lib/cmdline.c Remove quotes symmetrically.
Remove quotes from the argument value only if there is a quote on both sides. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 6 ++ lib/cmdline.c| 7 ++- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index a1614d9b8a21..d7da4ce9f7ae 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -489,10 +489,8 @@ static void __init fadump_update_params(struct param_info *param_info, *tgt++ = ' '; /* next_arg removes one leading and one trailing '"' */ - if (*tgt == '"') - shortening += 1; - if (*(tgt + vallen + shortening) == '"') - shortening += 1; + if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"')) + shortening += 2; /* remove one leading and one trailing quote if both are present */ if ((val[0] == '"') && (val[vallen - 1] == '"')) { diff --git a/lib/cmdline.c b/lib/cmdline.c index 4c0888c4a68d..01e701b2afe8 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -227,14 +227,11 @@ char *next_arg(char *args, char **param, char **val) *val = args + equals + 1; /* Don't include quotes in value. */ - if (**val == '"') { + if ((**val == '"') && (args[i-1] == '"')) { (*val)++; - if (args[i-1] == '"') - args[i-1] = '\0'; + args[i-1] = '\0'; } } - if (quoted && args[i-1] == '"') - args[i-1] = '\0'; if (args[i]) { args[i] = '\0'; -- 2.10.2
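The rule this patch introduces, stripping quotes only when both the leading and the trailing one are present, can be shown on a standalone string. The sketch below is a user-space simplification, not the next_arg() code itself: `strip_quote_pair` is a hypothetical helper, and it additionally requires at least two characters so that a lone `"` is never treated as its own pair.

```c
#include <stddef.h>
#include <string.h>

/*
 * Strip one leading and one trailing '"' from val, but only when both
 * are present. Modifies val in place and returns the start of the
 * resulting string.
 */
static char *strip_quote_pair(char *val)
{
	size_t len = strlen(val);

	if (len >= 2 && val[0] == '"' && val[len - 1] == '"') {
		val[len - 1] = '\0';
		return val + 1;
	}
	return val;
}
```

Keeping the operation symmetric means the stripped value is always either zero or exactly two characters shorter than the original, which is the property the fadump code depends on when it computes `shortening`.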
[PATCH v7 1/4] powerpc/fadump: reduce memory consumption for capture kernel
From: Hari Bathini With fadump (dump capture) kernel booting like a regular kernel, it needs almost the same amount of memory to boot as the production kernel, which is unwarranted for a dump capture kernel. But with no option to disable some of the unnecessary subsystems in fadump kernel, that much memory is wasted on fadump, depriving the production kernel of that memory. Introduce kernel parameter 'fadump_extra_args=' that would take regular parameters as a space separated quoted string, to be enforced when fadump is active. This 'fadump_extra_args=' parameter can be leveraged to pass parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable unwarranted resources/subsystems. Also, ensure the log "Firmware-assisted dump is active" is printed early in the boot process to put the subsequent fadump messages in context. Suggested-by: Michael Ellerman Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- Changes from v6: Correct and simplify quote handling. Ideally I would like to extend parse_args to give the length of the original quoted value to callback. However, parse_args removes at most one double-quote from the start and one from the end so that is easy to detect. Otherwise all other users will have to be updated to trash the new argument.
--- arch/powerpc/include/asm/fadump.h | 2 + arch/powerpc/kernel/fadump.c | 109 -- arch/powerpc/kernel/prom.c| 7 +++ 3 files changed, 115 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index ce88bbe1d809..98ae00943fb3 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -208,11 +208,13 @@ extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data); extern int fadump_reserve_mem(void); extern int setup_fadump(void); +extern void enforce_fadump_extra_args(char *cmdline); extern int is_fadump_active(void); extern void crash_fadump(struct pt_regs *, const char *); extern void fadump_cleanup(void); #else /* CONFIG_FA_DUMP */ +static inline void enforce_fadump_extra_args(char *cmdline) { } static inline int is_fadump_active(void) { return 0; } static inline void crash_fadump(struct pt_regs *regs, const char *str) { } #endif diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index dc0c49cfd90a..a1614d9b8a21 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, * dump data waiting for us. */ fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); - if (fdm_active) + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); fw_dump.dump_active = 1; + } /* Get the sizes required to store dump data for the firmware provided * dump sections. 
@@ -332,8 +334,11 @@ int __init fadump_reserve_mem(void) { unsigned long base, size, memory_boundary; - if (!fw_dump.fadump_enabled) + if (!fw_dump.fadump_enabled) { + if (fw_dump.dump_active) + pr_warn("Firmware-assisted dump was active but kernel booted with fadump disabled!\n"); return 0; + } if (!fw_dump.fadump_supported) { printk(KERN_INFO "Firmware-assisted dump is not supported on" @@ -373,7 +378,6 @@ int __init fadump_reserve_mem(void) memory_boundary = memblock_end_of_DRAM(); if (fw_dump.dump_active) { - printk(KERN_INFO "Firmware-assisted dump is active.\n"); /* * If last boot has crashed then reserve all the memory * above boot_memory_size so that we don't touch it until @@ -460,6 +464,105 @@ static int __init early_fadump_reserve_mem(char *p) } early_param("fadump_reserve_mem", early_fadump_reserve_mem); +#define FADUMP_EXTRA_ARGS_PARAM"fadump_extra_args=" +#define FADUMP_EXTRA_ARGS_LEN (strlen(FADUMP_EXTRA_ARGS_PARAM) - 1) + +struct param_info { + char*cmdline; + char*tmp_cmdline; + int shortening; +}; + +static void __init fadump_update_params(struct param_info *param_info, + char *param, char *val) +{ + ptrdiff_t param_offset = param - param_info->tmp_cmdline; + size_t vallen = val ? strlen(val) : 0; + char *tgt = param_info->cmdline + param_offset + + FADUMP_EXTRA_ARGS_LEN - param_info->shortening; + int shortening = 0; + + if (!val) + return; + + /* remove '=' */ + *tgt++ = ' '; + + /* next_arg removes one leading and one trailing '"' */ +
[PATCH v7 4/4] boot/param: add pointer to next argument to unknown parameter callback
The fadump parameter processing re-does the logic of next_arg quote stripping to determine where the argument ends. Pass pointer to the next argument instead to make this more robust. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 13 + arch/powerpc/mm/hugetlbpage.c | 4 ++-- include/linux/moduleparam.h | 2 +- init/main.c | 12 ++-- kernel/module.c | 4 ++-- kernel/params.c | 19 +++ lib/dynamic_debug.c | 2 +- 7 files changed, 28 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index d7da4ce9f7ae..6ef96711ee9a 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -474,13 +474,14 @@ struct param_info { }; static void __init fadump_update_params(struct param_info *param_info, - char *param, char *val) + char *param, char *val, char *next) { ptrdiff_t param_offset = param - param_info->tmp_cmdline; size_t vallen = val ? strlen(val) : 0; char *tgt = param_info->cmdline + param_offset + FADUMP_EXTRA_ARGS_LEN - param_info->shortening; - int shortening = 0; + int shortening = ((next - 1) - (param)) + - (FADUMP_EXTRA_ARGS_LEN + 1 + vallen); if (!val) return; @@ -488,10 +489,6 @@ static void __init fadump_update_params(struct param_info *param_info, /* remove '=' */ *tgt++ = ' '; - /* next_arg removes one leading and one trailing '"' */ - if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"')) - shortening += 2; - /* remove one leading and one trailing quote if both are present */ if ((val[0] == '"') && (val[vallen - 1] == '"')) { shortening += 2; @@ -517,7 +514,7 @@ static void __init fadump_update_params(struct param_info *param_info, * to enforce the parameters passed through it */ static int __init fadump_rework_cmdline_params(char *param, char *val, - const char *unused, void *arg) + char *next, const char *unused, void *arg) { struct param_info *param_info = (struct param_info *)arg; @@ -525,7 +522,7 @@ static int __init fadump_rework_cmdline_params(char *param, char 
*val, strlen(FADUMP_EXTRA_ARGS_PARAM) - 1)) return 0; - fadump_update_params(param_info, param, val); + fadump_update_params(param_info, param, val, next); return 0; } diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index e1bf5ca397fe..3a4cce552906 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -268,8 +268,8 @@ int alloc_bootmem_huge_page(struct hstate *hstate) unsigned long gpage_npages[MMU_PAGE_COUNT]; -static int __init do_gpage_early_setup(char *param, char *val, - const char *unused, void *arg) +static int __init do_gpage_early_setup(char *param, char *val, char *unused1, + const char *unused2, void *arg) { static phys_addr_t size; unsigned long npages; diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h index 1ee7b30dafec..fec05a186c08 100644 --- a/include/linux/moduleparam.h +++ b/include/linux/moduleparam.h @@ -326,7 +326,7 @@ extern char *parse_args(const char *name, s16 level_min, s16 level_max, void *arg, - int (*unknown)(char *param, char *val, + int (*unknown)(char *param, char *val, char *next, const char *doing, void *arg)); /* Called by module remove. */ diff --git a/init/main.c b/init/main.c index 052481fbe363..920c3564b2f0 100644 --- a/init/main.c +++ b/init/main.c @@ -239,7 +239,7 @@ static int __init loglevel(char *str) early_param("loglevel", loglevel); /* Change NUL term back to "=", to make "param" the whole string. */ -static int __init repair_env_string(char *param, char *val, +static int __init repair_env_string(char *param, char *val, char *unused2, const char *unused, void *arg) { if (val) { @@ -257,7 +257,7 @@ static int __init repair_env_string(char *param, char *val, } /* Anything after -- gets handed straight to init. 
*/ -static int __init set_init_arg(char *param, char *val, +static int __init set_init_arg(char *param, char *val, char *unused2, const char *unused, void *arg) { unsigned int i; @@ -265,7 +265,7 @@ static int __init set_init_arg(char *param, char *val, if (panic_later) return 0; -
[PATCH v7 2/4] powerpc/fadump: update documentation about 'fadump_extra_args=' parameter
From: Hari Bathini With the introduction of 'fadump_extra_args=' parameter to pass additional parameters to fadump (capture) kernel, update documentation about it. Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- Documentation/powerpc/firmware-assisted-dump.txt | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index bdd344aa18d9..2df88524d2c7 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7 +162,19 @@ How to enable firmware-assisted dump (fadump): 1. Set config option CONFIG_FA_DUMP=y and build kernel. 2. Boot into linux kernel with 'fadump=on' kernel cmdline option. -3. Optionally, user can also set 'crashkernel=' kernel cmdline +3. A user can pass additional command line parameters as a space + separated quoted list through 'fadump_extra_args=' parameter, + to be enforced when fadump is active. For example, parameter + 'fadump_extra_args="nr_cpus=1 numa=off udev.children-max=2"' + will be changed to 'fadump_extra_args nr_cpus=1 numa=off + udev.children-max=2' in-place when fadump is active. This + parameter has no affect when fadump is not active. Multiple + instances of 'fadump_extra_args=' can be passed. This provision + can be used to reduce memory consumption during dump capture by + disabling unwarranted resources/subsystems like CPUs, NUMA + and such. Value with spaces can be passed as + 'fadump_extra_args=""parameter="value with spaces"""' +4. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. @@ -172,6 +184,12 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead 2. If firmware-assisted dump fails to reserve memory then it will fallback to existing kdump mechanism if 'crashkernel=' option is set at kernel cmdline. + 3. 
Special parameters like '--' passed inside fadump_extra_args are also + just left in-place. So, the user is advised to consider this while + specifying such parameters. It may be required to quote the argument + to fadump_extra_args when the bootloader uses double-quotes as + argument delimiter as well. eg +append = " fadump_extra_args=\"nr_cpus=1 numa=off udev.children-max=2\"" Sysfs/debugfs files: -- 2.10.2
[PATCH] bootwrapper: mpsc.c: fix pointer-to-int-cast warnings
I get these warnings:

../arch/powerpc/boot/mpsc.c: In function 'mpsc_get_virtreg_of_phandle':
../arch/powerpc/boot/mpsc.c:113:35: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
../arch/powerpc/boot/mpsc.c: In function 'mpsc_console_init':
../arch/powerpc/boot/mpsc.c:147:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Presumably the patch below fixes these, and presumably the DT defines that pointers and integers have the same size in the DT, so this is fine regardless of 32bit/64bit target. I have not found a DT definition for PowerPC, however. So any bugs in the property sizing and resulting failures to read the properties are left as before.

Signed-off-by: Michal Suchanek
---
 arch/powerpc/boot/mpsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/mpsc.c b/arch/powerpc/boot/mpsc.c
index 425ad88cce8d..ea740493277a 100644
--- a/arch/powerpc/boot/mpsc.c
+++ b/arch/powerpc/boot/mpsc.c
@@ -110,7 +110,7 @@ static volatile char *mpsc_get_virtreg_of_phandle(void *devp, char *prop)
 	if (n != sizeof(v))
 		goto err_out;
-	devp = find_node_by_linuxphandle((u32)v);
+	devp = find_node_by_linuxphandle((intptr_t)v);
 	if (devp == NULL)
 		goto err_out;
@@ -144,7 +144,7 @@ int mpsc_console_init(void *devp, struct serial_console_data *scdp)
 	n = getprop(devp, "cell-index", &v, sizeof(v));
 	if (n != sizeof(v))
 		goto err_out;
-	reg_set = (int)v;
+	reg_set = (intptr_t)v;
 	mpscintr_base += (reg_set == 0) ? 0x4 : 0xc;
-- 
2.10.2
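For readers unfamiliar with the warning, here is a minimal userspace sketch of the cast pattern the patch switches to. It assumes, as the diff suggests, that the property value sits in a pointer-sized variable; cell_index_from_prop() is a hypothetical helper name and not part of the bootwrapper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched code path: a property value has
 * been read into a pointer-sized variable and is then used as an integer.
 */
static int cell_index_from_prop(void *v)
{
	/*
	 * A direct (int)v triggers -Wpointer-to-int-cast whenever pointers
	 * are wider than int. intptr_t is wide enough to hold a pointer, so
	 * the pointer-to-integer conversion is size-matched; the narrowing
	 * to int then happens between two integer types.
	 */
	return (int)(intptr_t)v;
}
```

This only silences the size mismatch in the cast. Whether the property was read with the right size in the first place is a separate question, which the commit message explicitly leaves open.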
[PATCH 0/6] Fix cdrom autoclose
Hello,

there is a cdrom autoclose feature that is supposed to close the tray, wait for the disc to become ready, and then open the device. This used to work in ancient times. Then in old times there was a hack in util-linux which worked around the breakage which probably resulted from switching to scsi emulation. Currently the util-linux maintainer refuses to merge another hack on the basis that the kernel still has the feature so it should be fixed there.

Indeed, to implement this feature effectively from userspace one would need to know when the CD-ROM is in the "drive becoming ready" state, which is knowledge that never leaves the hardware-specific driver and is passed neither to userspace nor to the generic cdrom driver. So this patchset fixes the kernel autoclose implementation in cdrom.c and to do so reports the "drive becoming ready" state from the hardware-specific drivers.

Michal Suchanek (6):
  delay: add poll_event_interruptible
  cdrom: factor out common open_for_* code
  cdrom: wait for tray to close
  cdrom: introduce CDS_DRIVE_ERROR
  Documentation: cdrom: introduce CDS_DRIVE_ERROR
  cdrom: wait for drive to become ready

 Documentation/cdrom/cdrom-standard.tex |   8 ++-
 Documentation/cdrom/ide-cd             |   6 ++
 Documentation/ioctl/cdrom.txt          |   1 +
 drivers/block/paride/pcd.c             |   2 +-
 drivers/cdrom/cdrom.c                  | 124 -
 drivers/cdrom/gdrom.c                  |   2 +-
 drivers/ide/ide-cd_ioctl.c             |  12 ++--
 drivers/scsi/sr_ioctl.c                |   2 +-
 include/linux/delay.h                  |  12
 include/uapi/linux/cdrom.h             |   1 +
 10 files changed, 99 insertions(+), 71 deletions(-)

-- 
2.13.6
[PATCH 3/6] cdrom: wait for tray to close
The scsi command to close the tray only starts the motor and does not wait for the tray to close. Wait until the state changes from TRAY_OPEN so users do not race with the tray closing.

This looks like an infinite wait, but unless the drive is broken it either closes the tray within a few seconds or reports an error when it detects the tray is blocked. At worst the wait can be interrupted by the user.

Signed-off-by: Michal Suchanek
---
 drivers/cdrom/cdrom.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e976d3d0180d..040d3d466cd7 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -281,7 +281,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 #include
 /* used to tell the module to turn on full debugging messages */
@@ -1030,6 +1032,18 @@ static void cdrom_count_tracks(struct cdrom_device_info *cdi, tracktype *tracks)
 		tracks->cdi, tracks->xa);
 }
+static int tray_close(struct cdrom_device_info *cdi)
+{
+	int ret;
+
+	ret = cdi->ops->tray_move(cdi, 0);
+	if (ret)
+		return ret;
+
+	return poll_event_interruptible(CDS_TRAY_OPEN !=
+			cdi->ops->drive_status(cdi, CDSL_CURRENT), 500);
+}
+
 static int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks)
 {
@@ -1048,7 +1062,9 @@ int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks)
 			if (CDROM_CAN(CDC_CLOSE_TRAY) && cdi->options & CDO_AUTO_CLOSE) {
 				cd_dbg(CD_OPEN, "trying to close the tray\n");
-				ret = cdo->tray_move(cdi, 0);
+				ret = tray_close(cdi);
+				if (ret == -ERESTARTSYS)
+					return ret;
 				if (ret) {
 					cd_dbg(CD_OPEN, "bummer. tried to close the tray but failed.\n");
 					/* Ignore the error from the low
@@ -2312,7 +2328,8 @@ static int cdrom_ioctl_closetray(struct cdrom_device_info *cdi)
 	if (!CDROM_CAN(CDC_CLOSE_TRAY))
 		return -ENOSYS;
-	return cdi->ops->tray_move(cdi, 0);
+
+	return tray_close(cdi);
 }
 static int cdrom_ioctl_eject_sw(struct cdrom_device_info *cdi,
-- 
2.13.6
[PATCH 4/6] cdrom: introduce CDS_DRIVE_ERROR
CDS_DRIVE_NOT_READY is used for the state in which CDROM is 'becoming ready' (typically analyzing the disc) but also as the fallback when nothing else applies. Introduce CDS_DRIVE_ERROR for the fallback case. Signed-off-by: Michal Suchanek --- drivers/block/paride/pcd.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/ide/ide-cd_ioctl.c | 12 drivers/scsi/sr_ioctl.c| 2 +- include/uapi/linux/cdrom.h | 1 + 5 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c index 7b8c6368beb7..6e00093ff34e 100644 --- a/drivers/block/paride/pcd.c +++ b/drivers/block/paride/pcd.c @@ -605,7 +605,7 @@ static int pcd_drive_status(struct cdrom_device_info *cdi, int slot_nr) struct pcd_unit *cd = cdi->handle; if (pcd_ready_wait(cd, PCD_READY_TMO)) - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; if (pcd_atapi(cd, rc_cmd, 8, pcd_scratch, DBMSG("check media"))) return CDS_NO_DISC; return CDS_DISC_OK; diff --git a/drivers/cdrom/gdrom.c b/drivers/cdrom/gdrom.c index 6495b03f576c..702f255bbe42 100644 --- a/drivers/cdrom/gdrom.c +++ b/drivers/cdrom/gdrom.c @@ -390,7 +390,7 @@ static int gdrom_drivestatus(struct cdrom_device_info *cd_info, int ignore) if (sense == 0) return CDS_DISC_OK; if (sense == 0x20) - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; /* default */ return CDS_NO_INFO; } diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c index 2acca12b9c94..9a26f50a2092 100644 --- a/drivers/ide/ide-cd_ioctl.c +++ b/drivers/ide/ide-cd_ioctl.c @@ -62,9 +62,13 @@ int ide_cdrom_drive_status(struct cdrom_device_info *cdi, int slot_nr) return CDS_NO_DISC; } - if (sense.sense_key == NOT_READY && sense.asc == 0x04 - && sense.ascq == 0x04) - return CDS_DISC_OK; + if (sense.sense_key == NOT_READY && sense.asc == 0x04) + switch (sense.ascq) { + case 0x01: + return CDS_DRIVE_NOT_READY; + case 0x04: + return CDS_DISC_OK; + } /* * If not using Mt Fuji extended media tray reports, @@ -77,7 +81,7 @@ int 
ide_cdrom_drive_status(struct cdrom_device_info *cdi, int slot_nr) else return CDS_TRAY_OPEN; } - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; } /* diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c index 2a21f2d48592..7c93f12a9cb8 100644 --- a/drivers/scsi/sr_ioctl.c +++ b/drivers/scsi/sr_ioctl.c @@ -333,7 +333,7 @@ int sr_drive_status(struct cdrom_device_info *cdi, int slot) else return CDS_TRAY_OPEN; - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; } int sr_disk_status(struct cdrom_device_info *cdi) diff --git a/include/uapi/linux/cdrom.h b/include/uapi/linux/cdrom.h index 2817230148fd..339b1435f44e 100644 --- a/include/uapi/linux/cdrom.h +++ b/include/uapi/linux/cdrom.h @@ -398,6 +398,7 @@ struct cdrom_generic_command #define CDS_TRAY_OPEN 2 #define CDS_DRIVE_NOT_READY3 #define CDS_DISC_OK4 +#define CDS_DRIVE_ERROR5 /* return values for the CDROM_DISC_STATUS ioctl */ /* can also return CDS_NO_[INFO|DISC], from above */ -- 2.13.6
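The status split this patch introduces can be illustrated with a small standalone sketch. It covers only the sense-data branch touched in ide-cd_ioctl.c and leaves out the tray-open and no-disc checks. struct sense_sketch and status_from_sense() are hypothetical names; the status values are the ones shown in the cdrom.h hunk.

```c
#include <assert.h>

/* Status values as listed in the patch to include/uapi/linux/cdrom.h. */
enum {
	CDS_TRAY_OPEN = 2,
	CDS_DRIVE_NOT_READY = 3,
	CDS_DISC_OK = 4,
	CDS_DRIVE_ERROR = 5,
};

#define NOT_READY 0x02	/* SCSI sense key "NOT READY" */

/* Hypothetical, trimmed-down sense data; the kernel struct has more fields. */
struct sense_sketch {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

static int status_from_sense(const struct sense_sketch *s)
{
	if (s->sense_key == NOT_READY && s->asc == 0x04) {
		switch (s->ascq) {
		case 0x01:	/* drive is in the process of becoming ready */
			return CDS_DRIVE_NOT_READY;
		case 0x04:	/* already treated as "disc OK" before the patch */
			return CDS_DISC_OK;
		}
	}
	/* Fallback that used to be reported as CDS_DRIVE_NOT_READY. */
	return CDS_DRIVE_ERROR;
}
```

With this split a caller can poll while the status is CDS_DRIVE_NOT_READY and give up immediately on CDS_DRIVE_ERROR, which is what the last patch in the series relies on.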
[PATCH 6/6] cdrom: wait for drive to become ready
When the drive closes it can take tens of seconds until the disc is analyzed. Wait for the drive to become ready or report an error. Signed-off-by: Michal Suchanek --- drivers/cdrom/cdrom.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 040d3d466cd7..a483f34b7648 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -1087,6 +1087,15 @@ int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) } cd_dbg(CD_OPEN, "the tray is now closed\n"); } + /* the door should be closed now, check for the disc */ + if (ret == CDS_DRIVE_NOT_READY) { + int poll_res = poll_event_interruptible( + CDS_DRIVE_NOT_READY != + (ret = cdo->drive_status(cdi, CDSL_CURRENT)), + 500); + if (poll_res == -ERESTARTSYS) + return poll_res; + } if (ret != CDS_DISC_OK) return -ENOMEDIUM; } -- 2.13.6
[PATCH 5/6] Documentation: cdrom: introduce CDS_DRIVE_ERROR
CDS_DRIVE_NOT_READY is used for the state in which CDROM is 'becoming ready' (typically analyzing the disc) but also as the fallback when nothing else applies. Introduce CDS_DRIVE_ERROR for the fallback case. Signed-off-by: Michal Suchanek --- Documentation/cdrom/cdrom-standard.tex | 8 +++- Documentation/cdrom/ide-cd | 6 ++ Documentation/ioctl/cdrom.txt | 1 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/cdrom/cdrom-standard.tex b/Documentation/cdrom/cdrom-standard.tex index 8f85b0e41046..018284ba696a 100644 --- a/Documentation/cdrom/cdrom-standard.tex +++ b/Documentation/cdrom/cdrom-standard.tex @@ -371,11 +371,17 @@ $$ CDS_NO_INFO& no information available\cr CDS_NO_DISC& no disc is inserted, tray is closed\cr CDS_TRAY_OPEN& tray is opened\cr -CDS_DRIVE_NOT_READY& something is wrong, tray is moving?\cr +CDS_DRIVE_NOT_READY& tray just closed?\cr CDS_DISC_OK& a disc is loaded and everything is fine\cr +CDS_DRIVE_ERROR& something is wrong\cr } $$ +Note: The IDE and SCSI cdroms have a status code 'drive becoming ready' which +is typically returned when the drive has just closed and is analyzing the disc. +For other cdrom types this state is not reported by the hardware or not +implemented by the driver. 
+ \subsection{$Int\ media_changed(struct\ cdrom_device_info * cdi, int\ disc_nr)$} This function is very similar to the original function in $struct\ diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd index a5f2a7f1ff46..9324a8fd9a39 100644 --- a/Documentation/cdrom/ide-cd +++ b/Documentation/cdrom/ide-cd @@ -455,6 +455,9 @@ main (int argc, char **argv) case CDS_DRIVE_NOT_READY: printf ("Drive Not Ready.\n"); break; + case CDS_DRIVE_ERROR: + printf ("Drive problem.\n"); + break; default: printf ("This Should not happen!\n"); break; @@ -481,6 +484,9 @@ main (int argc, char **argv) case CDS_NO_INFO: printf ("No Information available."); break; + case CDS_DRIVE_ERROR: + printf ("Drive problem.\n"); + break; default: printf ("This Should not happen!\n"); break; diff --git a/Documentation/ioctl/cdrom.txt b/Documentation/ioctl/cdrom.txt index a4d62a9d6771..7720d11807c3 100644 --- a/Documentation/ioctl/cdrom.txt +++ b/Documentation/ioctl/cdrom.txt @@ -700,6 +700,7 @@ CDROM_DRIVE_STATUS Get tray position, etc. CDS_TRAY_OPEN CDS_DRIVE_NOT_READY CDS_DISC_OK + CDS_DRIVE_ERROR -1 error error returns: -- 2.13.6
[PATCH 1/6] delay: add poll_event_interruptible
Add convenience macro for polling an event that does not have a waitqueue.

Signed-off-by: Michal Suchanek
---
 include/linux/delay.h | 12
 1 file changed, 12 insertions(+)

diff --git a/include/linux/delay.h b/include/linux/delay.h
index b78bab4395d8..3ae9fa395628 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -64,4 +64,16 @@ static inline void ssleep(unsigned int seconds)
 	msleep(seconds * 1000);
 }
+#define poll_event_interruptible(event, interval) ({ \
+	int ret = 0; \
+	while (!(event)) { \
+		if (signal_pending(current)) { \
+			ret = -ERESTARTSYS; \
+			break; \
+		} \
+		msleep_interruptible(interval); \
+	} \
+	ret; \
+})
+
 #endif /* defined(_LINUX_DELAY_H) */
-- 
2.13.6
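A userspace sketch of the same polling shape may help when reviewing the macro. It is an analogue under stated assumptions, not the kernel code: sleep_ms() stands in for msleep_interruptible(), the stop_requested argument stands in for signal_pending(current), and fake_status() is a made-up device. It needs GNU C statement expressions, like the original.

One difference is deliberate. The macro's result variable is called poll_ret_ instead of ret. In C an inner declaration shadows an outer one of the same name, so an event expression that assigns to the caller's ret (as the event expression in patch 6/6 does) would otherwise write to the macro's local variable instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Sleep for ms milliseconds; userspace stand-in for msleep_interruptible(). */
static void sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Analogue of poll_event_interruptible(): re-evaluate event until it is
 * true, sleeping interval_ms between attempts, and bail out with -1 when
 * stop_requested is true. poll_ret_ is named so that it cannot capture a
 * variable called "ret" used inside the event expression.
 */
#define poll_event_sketch(event, interval_ms, stop_requested) ({	\
	int poll_ret_ = 0;						\
	while (!(event)) {						\
		if (stop_requested) {					\
			poll_ret_ = -1;					\
			break;						\
		}							\
		sleep_ms(interval_ms);					\
	}								\
	poll_ret_;							\
})

/* Hypothetical device: reports "not ready" (3) twice, then "ok" (4). */
static int fake_status(void)
{
	static int calls;

	return ++calls < 3 ? 3 : 4;
}

/* Poll until the status is no longer 3 and hand back the final status. */
static int wait_until_ready(void)
{
	int ret = 3;
	int res = poll_event_sketch(3 != (ret = fake_status()), 1, 0);

	return res ? res : ret;
}
```

If the local were named ret, wait_until_ready() would see its own ret unchanged after the loop, because the assignment inside the event expression would bind to the macro's variable.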
[PATCH 2/6] cdrom: factor out common open_for_* code
The open_for_audio and open_for_data copies are bitrotten in different ways already and will need to update the autoclose logic in both. Signed-off-by: Michal Suchanek --- drivers/cdrom/cdrom.c | 100 ++ 1 file changed, 36 insertions(+), 64 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index e36d160c458f..e976d3d0180d 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -1031,12 +1031,12 @@ static void cdrom_count_tracks(struct cdrom_device_info *cdi, tracktype *tracks) } static -int open_for_data(struct cdrom_device_info *cdi) +int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) { int ret; const struct cdrom_device_ops *cdo = cdi->ops; - tracktype tracks; - cd_dbg(CD_OPEN, "entering open_for_data\n"); + + cd_dbg(CD_OPEN, "entering " __func__ "\n"); /* Check if the driver can report drive status. If it can, we can do clever things. If it can't, well, we at least tried! */ if (cdo->drive_status != NULL) { @@ -1048,7 +1048,7 @@ int open_for_data(struct cdrom_device_info *cdi) if (CDROM_CAN(CDC_CLOSE_TRAY) && cdi->options & CDO_AUTO_CLOSE) { cd_dbg(CD_OPEN, "trying to close the tray\n"); - ret=cdo->tray_move(cdi,0); + ret = cdo->tray_move(cdi, 0); if (ret) { cd_dbg(CD_OPEN, "bummer. tried to close the tray but failed.\n"); /* Ignore the error from the low @@ -1056,37 +1056,45 @@ int open_for_data(struct cdrom_device_info *cdi) couldn't close the tray. We only care that there is no disc in the drive, since that is the _REAL_ problem here.*/ - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } } else { cd_dbg(CD_OPEN, "bummer. this drive can't close the tray.\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } /* Ok, the door should be closed now.. Check again */ ret = cdo->drive_status(cdi, CDSL_CURRENT); - if ((ret == CDS_NO_DISC) || (ret==CDS_TRAY_OPEN)) { + if ((ret == CDS_NO_DISC) || (ret == CDS_TRAY_OPEN)) { cd_dbg(CD_OPEN, "bummer. 
the tray is still not closed.\n"); cd_dbg(CD_OPEN, "tray might not contain a medium\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } cd_dbg(CD_OPEN, "the tray is now closed\n"); } - /* the door should be closed now, check for the disc */ - ret = cdo->drive_status(cdi, CDSL_CURRENT); - if (ret!=CDS_DISC_OK) { - ret = -ENOMEDIUM; - goto clean_up_and_return; - } + if (ret != CDS_DISC_OK) + return -ENOMEDIUM; } - cdrom_count_tracks(cdi, ); - if (tracks.error == CDS_NO_DISC) { + cdrom_count_tracks(cdi, tracks); + if (tracks->error == CDS_NO_DISC) { cd_dbg(CD_OPEN, "bummer. no disc.\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } + + return 0; +} + +static +int open_for_data(struct cdrom_device_info *cdi) +{ + int ret; + const struct cdrom_device_ops *cdo = cdi->ops; + tracktype tracks; + + cd_dbg(CD_OPEN, "entering " __func__ "\n"); + ret = open_for_common(cdi, ); + if (ret) + goto clean_up_and_return; + /* CD-Players which don't use O_NONBLOCK, workman * for example, need bit CDO_CHECK_TYPE cleared! */ if (tracks.data==0) { @@ -1196,53 +1204,17 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev, /* This code is similar to that in open_for_data. The routine is called whenever an audio play operation is requested. */ -static int check_for_audio_disc(struct cdrom_device_info *cdi, - const struct cdrom_de
[PATCH] init/main.c: simplify repair_env_string
Quoting characters are now removed from the parameter so value always follows directly after the NUL terminating parameter name. Signed-off-by: Michal Suchanek --- init/main.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) Since the previous "[PATCH v9 3/8] lib/cmdline.c: add backslash support to kernel commandline parsing" adds the memmove in lib/cmdline.c it is now superfluous in init/main.c diff --git a/init/main.c b/init/main.c index 1f5fdedbb293..1e5b1dc940d9 100644 --- a/init/main.c +++ b/init/main.c @@ -244,15 +244,10 @@ static int __init repair_env_string(char *param, char *val, const char *unused, void *arg) { if (val) { - /* param=val or param="val"? */ - if (val == param+strlen(param)+1) - val[-1] = '='; - else if (val == param+strlen(param)+2) { - val[-2] = '='; - memmove(val-1, val, strlen(val)+1); - val--; - } else - BUG(); + int parm_len = strlen(param); + + param[parm_len] = '='; + BUG_ON(val != param + parm_len + 1); } return 0; } -- 2.13.6
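The invariant the simplified function relies on can be shown in a few lines of userspace C. This is a sketch only: split_param() is a hypothetical stand-in for what the parser leaves behind once quotes are stripped, and assert() replaces BUG_ON().

```c
#include <assert.h>
#include <string.h>

/*
 * Split "name=value" in place: the '=' is overwritten with NUL and the
 * value starts right after it. Returns the value, or NULL without '='.
 */
static char *split_param(char *arg)
{
	char *eq = strchr(arg, '=');

	if (!eq)
		return NULL;
	*eq = '\0';
	return eq + 1;
}

/* Undo split_param(): put the '=' back so param is the whole string again. */
static void repair_param(char *param, char *val)
{
	if (val) {
		size_t len = strlen(param);

		/* The value must follow directly after the NUL terminator. */
		assert(val == param + len + 1);
		param[len] = '=';
	}
}
```

Because no quote character can sit between the name and the value any more, a single byte store is enough and the memmove() case from the old code disappears.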
[PATCH] Fix parse_args cycle limit check.
Actually, args is supposed to be renamed to next, so that args holds the previous argument and both can be passed to the callback. This additional patch should fix up the rename.
---
 kernel/params.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/params.c b/kernel/params.c
index 69ff58e69887..efb4dfaa6bc5 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -182,17 +182,18 @@ char *parse_args(const char *doing,
 	if (*args)
 		pr_debug("doing %s, parsing ARGS: '%s'\n", doing, args);
+	else
+		return err;
-	next = next_arg(args, &param, &val);
-	while (*next) {
+	do {
 		int ret;
 		int irq_was_disabled;
-		args = next;
 		next = next_arg(args, &param, &val);
+
 		/* Stop at -- */
 		if (!val && strcmp(param, "--") == 0)
-			return err ?: args;
+			return err ?: next;
 		irq_was_disabled = irqs_disabled();
 		ret = parse_one(param, val, args, next, doing, params, num, min_level, max_level, arg, unknown);
@@ -215,9 +216,10 @@ char *parse_args(const char *doing,
 			doing, val ?: "", param);
 			break;
 		}
-
 		err = ERR_PTR(ret);
-	}
+
+		args = next;
+	} while (*args);
 	return err;
 }
-- 
2.13.6
[PATCH] Optimize final quote removal.
This is an additional patch that avoids the memmove when processing the quote at the end of the parameter.
---
 lib/cmdline.c | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/cmdline.c b/lib/cmdline.c
index c5335a79a177..b1d8a0dc60fc 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -191,7 +191,13 @@ bool parse_option_str(const char *str, const char *option)
 	return false;
 }
+#define break_arg_end(i) { \
+	if (isspace(args[i]) && !in_quote && !backslash && !in_single) \
+		break; \
+	}
+
 #define squash_char { \
+	break_arg_end(i + 1); \
 	memmove(args + 1, args, i); \
 	args++; \
 	i--; \
@@ -209,8 +215,7 @@ char *next_arg(char *args, char **param, char **val)
 	char *next;
 	for (i = 0; args[i]; i++) {
-		if (isspace(args[i]) && !in_quote && !backslash && !in_single)
-			break;
+		break_arg_end(i);
 		if ((equals == 0) && (args[i] == '='))
 			equals = i;
-- 
2.13.6
[PATCH v2] Do not disable driver and bus shutdown hook when class shutdown hook is set.
As seen from the implementation of the single class shutdown hook this is not very sound design. Rename the class shutdown hook to shutdown_pre to make it clear it runs before the driver shutdown hook. Signed-off-by: Michal Suchanek --- v2: rename class shutdown member to shutdown_pre --- drivers/base/core.c | 9 + drivers/char/tpm/tpm-chip.c | 11 ++- include/linux/device.h | 4 ++-- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 755451f684bc..13e7c41fd417 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2664,11 +2664,12 @@ void device_shutdown(void) pm_runtime_get_noresume(dev); pm_runtime_barrier(dev); - if (dev->class && dev->class->shutdown) { + if (dev->class && dev->class->shutdown_pre) { if (initcall_debug) - dev_info(dev, "shutdown\n"); - dev->class->shutdown(dev); - } else if (dev->bus && dev->bus->shutdown) { + dev_info(dev, "shutdown_pre\n"); + dev->class->shutdown_pre(dev); + } + if (dev->bus && dev->bus->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->bus->shutdown(dev); diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 67ec9d3d04f5..0eca20c5a80c 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -164,14 +164,7 @@ static int tpm_class_shutdown(struct device *dev) chip->ops = NULL; up_write(>ops_sem); } - /* Allow bus- and device-specific code to run. Note: since chip->ops -* is NULL, more-specific shutdown code will not be able to issue TPM -* commands. 
-*/ - if (dev->bus && dev->bus->shutdown) - dev->bus->shutdown(dev); - else if (dev->driver && dev->driver->shutdown) - dev->driver->shutdown(dev); + return 0; } @@ -214,7 +207,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev, device_initialize(>devs); chip->dev.class = tpm_class; - chip->dev.class->shutdown = tpm_class_shutdown; + chip->dev.class->shutdown_pre = tpm_class_shutdown; chip->dev.release = tpm_dev_release; chip->dev.parent = pdev; chip->dev.groups = chip->groups; diff --git a/include/linux/device.h b/include/linux/device.h index beabdbc08420..649b1b72c76a 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -375,7 +375,7 @@ int subsys_virtual_register(struct bus_type *subsys, * @suspend: Used to put the device to sleep mode, usually to a low power * state. * @resume:Used to bring the device from the sleep mode. - * @shutdown: Called at shut-down time to quiesce the device. + * @shutdown_pre: Called at shut-down time before driver shutdown. * @ns_type: Callbacks so sysfs can detemine namespaces. * @namespace: Namespace of the device belongs to this class. * @pm:The default device power management operations of this class. @@ -404,7 +404,7 @@ struct class { int (*suspend)(struct device *dev, pm_message_t state); int (*resume)(struct device *dev); - int (*shutdown)(struct device *dev); + int (*shutdown_pre)(struct device *dev); const struct kobj_ns_type_operations *ns_type; const void *(*namespace)(struct device *dev); -- 2.10.2
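The resulting hook order can be sketched with reduced stand-ins for the kernel structures. Everything below is hypothetical scaffolding (struct names, the log buffer, the three hook functions). The fallback from the bus hook to the driver hook is written as I recall it from device_shutdown() and is not visible in the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced versions of the kernel structures. */
struct dev_sketch;
struct hooks { void (*shutdown)(struct dev_sketch *dev); };
struct class_sketch { void (*shutdown_pre)(struct dev_sketch *dev); };

struct dev_sketch {
	const struct class_sketch *class;
	const struct hooks *bus;
	const struct hooks *driver;
	char log[8];	/* records which hooks ran, for illustration only */
};

static void class_pre(struct dev_sketch *d) { strcat(d->log, "C"); }
static void bus_shutdown(struct dev_sketch *d) { strcat(d->log, "B"); }
static void drv_shutdown(struct dev_sketch *d) { strcat(d->log, "D"); }

static void shutdown_one(struct dev_sketch *dev)
{
	/* The class hook runs first and no longer suppresses the others. */
	if (dev->class && dev->class->shutdown_pre)
		dev->class->shutdown_pre(dev);

	/* Then the bus hook if there is one, otherwise the driver hook. */
	if (dev->bus && dev->bus->shutdown)
		dev->bus->shutdown(dev);
	else if (dev->driver && dev->driver->shutdown)
		dev->driver->shutdown(dev);
}
```

A class that sets shutdown_pre therefore no longer has to call the bus or driver hook itself, which is the code the patch deletes from tpm_class_shutdown().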
[PATCH] ibmvnic: Fix unused variable warning
Fixes: a248878d7a1d ("ibmvnic: Check for transport event on driver resume") Signed-off-by: Michal Suchanek --- drivers/net/ethernet/ibm/ibmvnic.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 99576ba4187f..09c20d3b1b79 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3948,7 +3948,6 @@ static int ibmvnic_resume(struct device *dev) { struct net_device *netdev = dev_get_drvdata(dev); struct ibmvnic_adapter *adapter = netdev_priv(netdev); - int i; if (adapter->state != VNIC_OPEN) return 0; -- 2.10.2
[PATCH] Do not disable driver and bus shutdown hook when class shutdown hook is set.
Disabling the driver hook by setting class hook is totally sound design not prone to error as evidenced by the single implementation of the class hook. Fixes: d1bd4a792d39 ("tpm: Issue a TPM2_Shutdown for TPM2 devices.") Fixes: f77af1516584 ("Add "shutdown" to "struct class".") Signed-off-by: Michal Suchanek --- drivers/base/core.c | 3 ++- drivers/char/tpm/tpm-chip.c | 9 + 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 755451f684bc..2cf752dc1421 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2668,7 +2668,8 @@ void device_shutdown(void) if (initcall_debug) dev_info(dev, "shutdown\n"); dev->class->shutdown(dev); - } else if (dev->bus && dev->bus->shutdown) { + } + if (dev->bus && dev->bus->shutdown) { if (initcall_debug) dev_info(dev, "shutdown\n"); dev->bus->shutdown(dev); diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 67ec9d3d04f5..edf8fa553f5f 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -164,14 +164,7 @@ static int tpm_class_shutdown(struct device *dev) chip->ops = NULL; up_write(>ops_sem); } - /* Allow bus- and device-specific code to run. Note: since chip->ops -* is NULL, more-specific shutdown code will not be able to issue TPM -* commands. -*/ - if (dev->bus && dev->bus->shutdown) - dev->bus->shutdown(dev); - else if (dev->driver && dev->driver->shutdown) - dev->driver->shutdown(dev); + return 0; } -- 2.10.2
[PATCH 5/6] lib/cmdline.c: Implement single quotes in commandline argument parsing
This brings the kernel parser about on par with bourne shell, grub, and other tools that chew the arguments before kernel does. This should make it easier to deal with multiple levels of nesting/quoting. With same quoting grammar on each level there is less room for confusion. Signed-off-by: Michal Suchanek --- lib/cmdline.c | 29 - 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/lib/cmdline.c b/lib/cmdline.c index d98bdc017545..c5335a79a177 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -191,34 +191,45 @@ bool parse_option_str(const char *str, const char *option) return false; } +#define squash_char { \ + memmove(args + 1, args, i); \ + args++; \ + i--; \ +} + /* * Parse a string to get a param value pair. - * You can use " around spaces, and you can escape with \ + * You can use " or ' around spaces, and you can escape with \ * Hyphens and underscores equivalent in parameter names. */ char *next_arg(char *args, char **param, char **val) { unsigned int i, equals = 0; - int in_quote = 0, backslash = 0; + int in_quote = 0, backslash = 0, in_single = 0; char *next; for (i = 0; args[i]; i++) { - if (isspace(args[i]) && !in_quote && !backslash) + if (isspace(args[i]) && !in_quote && !backslash && !in_single) break; if ((equals == 0) && (args[i] == '=')) equals = i; - if (!backslash) { - if ((args[i] == '"') || (args[i] == '\\')) { + if (in_single) { + if (args[i] == '\'') { + in_single = 0; + squash_char; + } + } else if (!backslash) { + if ((args[i] == '"') || (args[i] == '\\') || + (args[i] == '\'')) { if (args[i] == '"') in_quote = !in_quote; if (args[i] == '\\') backslash = 1; - - memmove(args + 1, args, i); - args++; - i--; + if (args[i] == '\'') + in_single = 1; + squash_char; } } else { backslash = 0; -- 2.10.2
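To make the quoting rules concrete, here is a userspace sketch that strips quoting from one argument at a time. It follows the rules as they read from the diff: inside single quotes everything is literal, elsewhere a backslash makes the next character literal, double quotes toggle, and unprotected whitespace ends the argument. It is not the kernel implementation. unquote_arg() is a hypothetical name, it uses separate read and write positions instead of the memmove() trick, it does not split on '=', and it skips only one separator character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Strip quoting from the first argument in args, in place, and return a
 * pointer to the rest of the string (after the separating whitespace).
 * The unquoted argument is left NUL-terminated at the start of args.
 */
static char *unquote_arg(char *args)
{
	int in_double = 0, in_single = 0, backslash = 0;
	size_t r, w = 0;	/* read and write positions */

	for (r = 0; args[r]; r++) {
		char c = args[r];

		if (in_single) {
			if (c == '\'')
				in_single = 0;	/* closing quote is dropped */
			else
				args[w++] = c;	/* everything else is literal */
		} else if (backslash) {
			args[w++] = c;		/* escaped character is literal */
			backslash = 0;
		} else if (c == '\\') {
			backslash = 1;
		} else if (c == '"') {
			in_double = !in_double;
		} else if (c == '\'') {
			in_single = 1;
		} else if (isspace((unsigned char)c) && !in_double) {
			r++;			/* skip the separator */
			break;
		} else {
			args[w++] = c;
		}
	}
	args[w] = '\0';
	return args + r;
}
```

The write position never overtakes the read position, so terminating the argument cannot clobber the part of the string that has not been parsed yet.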
[PATCH 3/6] powerpc/fadump: stop removing quotes in argument parsing.
Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 1678d99ea835..275ea42a27d5 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -494,13 +494,6 @@ static void __init fadump_update_params(struct param_info *param_info, if (!val) return; - /* remove one leading and one trailing quote if both are present */ - if ((val[0] == '"') && (val[vallen - 1] == '"')) { - shortening += 2; - vallen -= 2; - val++; - } - strncpy(tgt, FADUMP_EXTRA_ARGS_PARAM, FADUMP_EXTRA_ARGS_LEN); tgt += FADUMP_EXTRA_ARGS_LEN; *tgt++ = ' '; -- 2.10.2
[PATCH 1/6] lib/cmdline.c: Add backslash support to kernel commandline parsing.
This allows passing quotes in kernel arguments. It is useful for passing fadump nested arguments in fadump_extra_args and might be useful if somebody wanted to pass a double quote directly as part of an argument. It is also useful to have quoting grammar more similar to shells and bootloaders.

Signed-off-by: Michal Suchanek
---
 lib/cmdline.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/lib/cmdline.c b/lib/cmdline.c
index 6d398a8b63fc..d98bdc017545 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -193,30 +193,36 @@ bool parse_option_str(const char *str, const char *option)
 /*
  * Parse a string to get a param value pair.
- * You can use " around spaces, but can't escape ".
+ * You can use " around spaces, and you can escape with \
  * Hyphens and underscores equivalent in parameter names.
  */
 char *next_arg(char *args, char **param, char **val)
 {
 	unsigned int i, equals = 0;
-	int in_quote = 0, quoted = 0;
+	int in_quote = 0, backslash = 0;
 	char *next;
-	if (*args == '"') {
-		args++;
-		in_quote = 1;
-		quoted = 1;
-	}
-
 	for (i = 0; args[i]; i++) {
-		if (isspace(args[i]) && !in_quote)
+		if (isspace(args[i]) && !in_quote && !backslash)
 			break;
-		if (equals == 0) {
-			if (args[i] == '=')
-				equals = i;
+
+		if ((equals == 0) && (args[i] == '='))
+			equals = i;
+
+		if (!backslash) {
+			if ((args[i] == '"') || (args[i] == '\\')) {
+				if (args[i] == '"')
+					in_quote = !in_quote;
+				if (args[i] == '\\')
+					backslash = 1;
+
+				memmove(args + 1, args, i);
+				args++;
+				i--;
+			}
+		} else {
+			backslash = 0;
 		}
-		if (args[i] == '"')
-			in_quote = !in_quote;
 	}
 	*param = args;
@@ -225,13 +231,6 @@ char *next_arg(char *args, char **param, char **val)
 	else {
 		args[equals] = '\0';
 		*val = args + equals + 1;
-
-		/* Don't include quotes in value. */
-		if ((args[i-1] == '"') && ((quoted) || (**val == '"'))) {
-			args[i-1] = '\0';
-			if (!quoted)
-				(*val)++;
-		}
 	}
 	if (args[i]) {
-- 
2.10.2
[PATCH 6/6] Documentation/admin-guide: single quotes in kernel arguments.
Signed-off-by: Michal Suchanek --- Documentation/admin-guide/kernel-parameters.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 722d3f771924..1f9837266417 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -35,9 +35,10 @@ can also be entered as:: log-buf-len=1M print_fatal_signals=1 -Double-quotes and backslashes can be used to protect spaces in values, e.g.:: +Double-quotes single-quaotes and backslashes can be used to protect spaces +in values, e.g.:: - param="spaces in here" param2=spaces\ in\ here + param="spaces in here" param2=spaces\ in\ here param3='@%# !\' cpu lists: -- -- 2.10.2
[PATCH 2/6] Documentation/admin-guide: backslash support in commandline.
Signed-off-by: Michal Suchanek --- Documentation/admin-guide/kernel-parameters.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index b2598cc9834c..722d3f771924 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -35,9 +35,9 @@ can also be entered as:: log-buf-len=1M print_fatal_signals=1 -Double-quotes can be used to protect spaces in values, e.g.:: +Double-quotes and backslashes can be used to protect spaces in values, e.g.:: - param="spaces in here" + param="spaces in here" param2=spaces\ in\ here cpu lists: -- -- 2.10.2
[PATCH 4/6] powerpc/fadump: Update fadump documentation on quoting arguments.
Signed-off-by: Michal Suchanek --- Documentation/powerpc/firmware-assisted-dump.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 2df88524d2c7..5705f55ffae4 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -173,7 +173,7 @@ How to enable firmware-assisted dump (fadump): can be used to reduce memory consumption during dump capture by disabling unwarranted resources/subsystems like CPUs, NUMA and such. Value with spaces can be passed as - 'fadump_extra_args=""parameter="value with spaces"""' + 'fadump_extra_args="parameter=\"value with spaces\""' 4. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. -- 2.10.2
[PATCH] powerpc/pseries: include linux/types.h in asm/hvcall.h
Commit 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush settings") uses u64 in asm/hvcall.h without including linux/types.h. This breaks hvcall.h users that do not include the header themselves. Fixes: 6e032b350cd1 ("powerpc/powernv: Check device-tree for RFI flush settings") Signed-off-by: Michal Suchanek --- arch/powerpc/include/asm/hvcall.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index f0461618bf7b..eca3f9c68907 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -353,6 +353,7 @@ #define PROC_TABLE_GTSE 0x01 #ifndef __ASSEMBLY__ +#include <linux/types.h> /** * plpar_hcall_norets: - Make a pseries hypervisor call with no return arguments -- 2.13.6
[PATCH 1/2] powerpc/fadump: return 0 on re-registration
When fadump is already registered return success. Currently EEXIST is returned which is difficult to handle race-free in userspace when shell scripts are used. If multiple writers are trying to write '1' there is no difference in whichever succeeds so just return 0 to all. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 436aedf195ab..5a7355381dac 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1214,7 +1214,6 @@ static ssize_t fadump_register_store(struct kobject *kobj, break; case '1': if (fw_dump.dump_registered == 1) { - ret = -EEXIST; goto unlock_out; } /* Register Firmware-assisted dump */ -- 2.10.2
[PATCH 2/2] powerpc/fadump: use kstrtoint to handle sysfs store
Currently the sysfs store handlers in fadump test buf[0] == 'char'. This means input "100foo" is interpreted as '1' and "01" as '0'. Change to kstrtoint so that leading zeroes and the like are handled in the expected way. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 5a7355381dac..241eff0b5f76 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1161,10 +1161,15 @@ static ssize_t fadump_release_memory_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { + int input = -1; + if (!fw_dump.dump_active) return -EPERM; - if (buf[0] == '1') { + if (kstrtoint(buf, 0, &input)) + return -EINVAL; + + if (input == 1) { /* * Take away the '/proc/vmcore'. We are releasing the dump * memory, hence it will not be valid anymore. @@ -1198,21 +1203,25 @@ static ssize_t fadump_register_store(struct kobject *kobj, const char *buf, size_t count) { int ret = 0; + int input = -1; if (!fw_dump.fadump_enabled || fdm_active) return -EPERM; + if (kstrtoint(buf, 0, &input)) + return -EINVAL; + mutex_lock(&fadump_mutex); - switch (buf[0]) { - case '0': + switch (input) { + case 0: if (fw_dump.dump_registered == 0) { goto unlock_out; } /* Un-register Firmware-assisted dump */ fadump_unregister_dump(); break; - case '1': + case 1: if (fw_dump.dump_registered == 1) { goto unlock_out; } -- 2.10.2
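The difference between checking buf[0] and parsing the whole buffer can be tried in userspace with a strict wrapper around strtol. This is a rough sketch only: parse_int_strict is a hypothetical stand-in and not the kernel's kstrtoint() (for one thing, strtol skips leading whitespace, which the sketch does not reject).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for kstrtoint(): accept an optional
 * sign, a number in the given base (0 = auto-detect) and at most one
 * trailing newline. Anything else is rejected.
 */
static int parse_int_strict(const char *s, int base, int *res)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* sysfs writes usually end in a newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage such as "100foo" */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;
	*res = (int)v;
	return 0;
}
```

With this, "1\n" and "01" both parse to 1, while "100foo" is rejected instead of being treated as '1'.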
[PATCH] powerpc/mm/hash: Remove stale comment.
In commit e6f81a92015b ("powerpc/mm/hash: Support 68 bit VA") the masking is folded into ASM_VSID_SCRAMBLE but the comment about masking is removed only from the first use of ASM_VSID_SCRAMBLE. Signed-off-by: Michal Suchanek --- arch/powerpc/mm/slb_low.S | 4 1 file changed, 4 deletions(-) diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S index bde378559d01..8e95e01b9e8e 100644 --- a/arch/powerpc/mm/slb_low.S +++ b/arch/powerpc/mm/slb_low.S @@ -296,10 +296,6 @@ slb_compare_rr_to_size: srdi r10,r10,(SID_SHIFT_1T - SID_SHIFT) /* get 1T ESID */ rldimi r10,r9,ESID_BITS_1T,0 ASM_VSID_SCRAMBLE(r10,r9,r11,1T) - /* -* bits above VSID_BITS_1T need to be ignored from r10 -* also combine VSID and flags -*/ li r10,MMU_SEGSIZE_1T rldimi r11,r10,SLB_VSID_SSIZE_SHIFT,0 /* insert segment size */ -- 2.10.2
Re: [PATCH 2/3] mtd: spi-nor: core code for the Altera Quadspi Flash Controller v2
On 4 July 2017 at 02:00, Cyrille Pitchen wrote: > Hi Matthew, > > > Le 26/06/2017 à 18:13, matthew.gerl...@linux.intel.com a écrit : >> From: Matthew Gerlach >> +static int altera_quadspi_setup_banks(struct device *dev, >> + u32 bank, struct device_node *np) >> +{ >> + struct altera_quadspi *q = dev_get_drvdata(dev); >> + struct altera_quadspi_flash *flash; >> + struct spi_nor *nor; >> + int ret = 0; >> + char modalias[40] = {0}; >> + struct spi_nor_hwcaps hwcaps = { >> + .mask = SNOR_HWCAPS_READ | >> + SNOR_HWCAPS_READ_FAST | >> + SNOR_HWCAPS_READ_1_1_2 | >> + SNOR_HWCAPS_READ_1_1_4 | >> + SNOR_HWCAPS_PP, >> + }; > > since aletera_quadspi_{read|erase} just don't care about > nor->read_opcode, nor->program_opcode and so on and anyway override all > settings chosen by spi-nor.c, it means they will use Dual or Quad SPI > controllers as they want, whether SNOR_HWCAPS_READ_1_1_{2|4} are set or not. > Then I think it's risky to declare the READ_1_1_2 and READ_1_1_4 hwcaps > because it may trigger additionnal calls of nor->read_reg() / > nor->write_reg() from spi_nor_scan() with op codes not supported by > altera_quadspi_{read|write}_reg(). 
> >> + >> + if (bank > q->num_flashes - 1) >> + return -EINVAL; >> + >> + altera_quadspi_chip_select(q, bank); >> + >> + flash = devm_kzalloc(q->dev, sizeof(*flash), GFP_KERNEL); >> + if (!flash) >> + return -ENOMEM; >> + >> + q->flash[bank] = flash; >> + nor = >nor; >> + nor->dev = dev; >> + nor->priv = flash; >> + nor->mtd.priv = nor; >> + flash->q = q; >> + flash->bank = bank; >> + spi_nor_set_flash_node(nor, np); >> + >> + /* spi nor framework*/ >> + nor->read_reg = altera_quadspi_read_reg; >> + nor->write_reg = altera_quadspi_write_reg; >> + nor->read = altera_quadspi_read; >> + nor->write = altera_quadspi_write; >> + nor->erase = altera_quadspi_erase; >> + nor->flash_lock = altera_quadspi_lock; >> + nor->flash_unlock = altera_quadspi_unlock; > > nor->flash_lock and nor->flash_unlock are described as "FLASH SPECIFIC" > in include/linux/mtd/spi-nor.h as opposed to "DRIVER SPECIFIC" functions > like nor->read, nor->read_reg, ... > > It means the actual implementations should be provided by the spi-nor > sub-system but not by each SPI controller driver. > > > > For me, it really sounds like a bad idea that this driver tries so much > to mystify the spi-nor sub-system. > > I can understand that you have to cope with the hardware design and its > limitations but clearly it looks the spi-nor API is not suited to this > hardware. This driver ignores and by-passes any settings selected by > spi_nor_scan(). > Duplicating code is generally a bad idea but in this case, I don't know > if trying to reuse spi_nor_read() / spi_nor_write() and spi_nor_erase() > from spi-nor.c is that helpful. > > Why not directly plug your driver into the above mtd layer implementing > you own version of mtd->_read(), mtd->_write() and mtd->_erase() then > registering the mtd device? It may be not the way to go but at least we > should study this alternative. AFAICT fsl-quadspi does just that preventing the use of the SPI controller for non-flash devices. 
There is at least one accelerated driver that is passed the opcodes to program into the controller for read acceleration in spi_flash_read, so reusing that should be viable. If the opcodes can be programmed or match what is hardcoded in the controller, use the acceleration, and fall back to plain SPI transfers if there is a mismatch between what m25p80_read requests and what the controller can do. If this works and you can still use plain SPI transfers, the controller will be much more useful than fsl-quadspi. Thanks Michal
[PATCH] s390/decompressor: add fortify_panic as x86 has.
Fix following error: LD arch/s390/boot/compressed/vmlinux drivers/s390/char/sclp_early_core.o: In function `memcpy': ../include/linux/string.h:340: undefined reference to `fortify_panic' make[4]: *** [../arch/s390/boot/compressed/Makefile:29: arch/s390/boot/compressed/vmlinux] Error 1 Fixes: 79962038dffa ("s390: add support for FORTIFY_SOURCE") Signed-off-by: Michal Suchanek --- arch/s390/boot/compressed/misc.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/s390/boot/compressed/misc.c b/arch/s390/boot/compressed/misc.c index cecf38b9ec82..e79c4499c548 100644 --- a/arch/s390/boot/compressed/misc.c +++ b/arch/s390/boot/compressed/misc.c @@ -174,3 +174,7 @@ unsigned long decompress_kernel(void) return (unsigned long) output; } +void fortify_panic(const char *name) +{ + error("detected buffer overflow"); +} -- 2.13.6
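To illustrate why the link fails without the symbol: a fortified memcpy() is an inline wrapper that calls an out-of-line handler when the copy would overflow the destination, so every binary that uses it must provide that handler. A simplified userspace model, assuming hypothetical names (checked_memcpy, fortify_panic_stub) and a stub that records the caller instead of stopping the machine:

```c
#include <stddef.h>
#include <string.h>

static const char *last_overflow;	/* set by the stub below, for illustration */

/* Hypothetical stand-in for the handler the decompressor has to provide. */
static void fortify_panic_stub(const char *name)
{
	last_overflow = name;
}

/*
 * Simplified model of a fortified memcpy(): when the destination size is
 * known and too small, call the out-of-line handler instead of copying.
 */
static void *checked_memcpy(void *dst, size_t dst_size,
			    const void *src, size_t n)
{
	if (n > dst_size) {
		fortify_panic_stub(__func__);
		return dst;
	}
	return memcpy(dst, src, n);
}
```

In the kernel the handler does not return; the stub here returns only so that the behaviour can be observed in a test.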
[PATCH v8 6/6] powerpc/fadump: use the new parse_args callback arguments
Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 47 1 file changed, 13 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 8778e1cc0380..1678d99ea835 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -481,33 +481,19 @@ struct param_info { }; static void __init fadump_update_params(struct param_info *param_info, - char *param, char *val) + char *param, char *val, + char *currant, char *next) { - ptrdiff_t param_offset = param - param_info->tmp_cmdline; + ptrdiff_t param_offset = currant - param_info->tmp_cmdline; size_t vallen = val ? strlen(val) : 0; char *tgt = param_info->cmdline + param_offset - param_info->shortening; - int shortening = 0; - int quoted = 0; + int shortening = ((next - 1) - (currant)) + - (FADUMP_EXTRA_ARGS_LEN + 1 + vallen); if (!val) return; - /* leading '"' removed from parameter */ - if ((param > param_info->tmp_cmdline) && *(param - 1) == '"') { - quoted = 1; - shortening += 1; - tgt--; - } - - /* next_arg removes one leading and one trailing '"' */ - if ((*(tgt + FADUMP_EXTRA_ARGS_LEN + 1 + vallen + shortening) == '"') && - (quoted || (*(tgt + FADUMP_EXTRA_ARGS_LEN + 1) == '"'))) { - shortening += 1; - if (!quoted) - shortening += 1; - } - /* remove one leading and one trailing quote if both are present */ if ((val[0] == '"') && (val[vallen - 1] == '"')) { shortening += 2; @@ -515,22 +501,15 @@ static void __init fadump_update_params(struct param_info *param_info, val++; } - /* some characters were removed - move the trailing part of cmdline */ - if (shortening) { - char *src; + strncpy(tgt, FADUMP_EXTRA_ARGS_PARAM, FADUMP_EXTRA_ARGS_LEN); + tgt += FADUMP_EXTRA_ARGS_LEN; + *tgt++ = ' '; + strncpy(tgt, val, vallen); + tgt += vallen; - strncpy(tgt, FADUMP_EXTRA_ARGS_PARAM, FADUMP_EXTRA_ARGS_LEN); - tgt += FADUMP_EXTRA_ARGS_LEN; - *tgt++ = ' '; - - strncpy(tgt, val, vallen); - tgt += vallen; - - src = tgt + shortening; + if (shortening) { 
+ char *src = tgt + shortening; memmove(tgt, src, strlen(src) + 1); - } else { - /* remove the '=' */ - *(tgt + FADUMP_EXTRA_ARGS_LEN) = ' '; } param_info->shortening += shortening; @@ -550,7 +529,7 @@ static int __init fadump_rework_cmdline_params(char *param, char *val, strlen(FADUMP_EXTRA_ARGS_PARAM) - 1)) return 0; - fadump_update_params(param_info, param, val); + fadump_update_params(param_info, param, val, currant, next); return 0; } -- 2.10.2
[PATCH v8 1/6] powerpc/fadump: reduce memory consumption for capture kernel
With fadump (dump capture) kernel booting like a regular kernel, it needs almost the same amount of memory to boot as the production kernel, which is unwarranted for a dump capture kernel. But with no option to disable some of the unnecessary subsystems in fadump kernel, that much memory is wasted on fadump, depriving the production kernel of that memory. Introduce kernel parameter 'fadump_extra_args=' that would take regular parameters as a space separated quoted string, to be enforced when fadump is active. This 'fadump_extra_args=' parameter can be leveraged to pass parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable unwarranted resources/subsystems. Also, ensure the log "Firmware-assisted dump is active" is printed early in the boot process to put the subsequent fadump messages in context. Suggested-by: Michael Ellerman Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- Changes from v6: Correct and simplify quote handling. Ideally I would like to extend parse_args to give the length of the original quoted value to callback. However, parse_args removes at most one double-quote from the start and one from the end so that is easy to detect. Otherwise all other users will have to be updated to trash the new argument. Changes from v7: Handle leading quote in parameter name. 
--- arch/powerpc/include/asm/fadump.h | 2 + arch/powerpc/kernel/fadump.c | 122 +- arch/powerpc/kernel/prom.c| 7 +++ 3 files changed, 128 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index 5a23010af600..41b50b317a67 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -208,12 +208,14 @@ extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data); extern int fadump_reserve_mem(void); extern int setup_fadump(void); +extern void enforce_fadump_extra_args(char *cmdline); extern int is_fadump_active(void); extern int should_fadump_crash(void); extern void crash_fadump(struct pt_regs *, const char *); extern void fadump_cleanup(void); #else /* CONFIG_FA_DUMP */ +static inline void enforce_fadump_extra_args(char *cmdline) { } static inline int is_fadump_active(void) { return 0; } static inline int should_fadump_crash(void) { return 0; } static inline void crash_fadump(struct pt_regs *regs, const char *str) { } diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index e1431800bfb9..0e08f1a80af2 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, * dump data waiting for us. */ fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); - if (fdm_active) + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); fw_dump.dump_active = 1; + } /* Get the sizes required to store dump data for the firmware provided * dump sections. 
@@ -339,8 +341,11 @@ int __init fadump_reserve_mem(void) { unsigned long base, size, memory_boundary; - if (!fw_dump.fadump_enabled) + if (!fw_dump.fadump_enabled) { + if (fw_dump.dump_active) + pr_warn("Firmware-assisted dump was active but kernel booted with fadump disabled!\n"); return 0; + } if (!fw_dump.fadump_supported) { printk(KERN_INFO "Firmware-assisted dump is not supported on" @@ -380,7 +385,6 @@ int __init fadump_reserve_mem(void) memory_boundary = memblock_end_of_DRAM(); if (fw_dump.dump_active) { - printk(KERN_INFO "Firmware-assisted dump is active.\n"); /* * If last boot has crashed then reserve all the memory * above boot_memory_size so that we don't touch it until @@ -467,6 +471,118 @@ static int __init early_fadump_reserve_mem(char *p) } early_param("fadump_reserve_mem", early_fadump_reserve_mem); +#define FADUMP_EXTRA_ARGS_PARAM"fadump_extra_args=" +#define FADUMP_EXTRA_ARGS_LEN (strlen(FADUMP_EXTRA_ARGS_PARAM) - 1) + +struct param_info { + char*cmdline; + char*tmp_cmdline; + int shortening; +}; + +static void __init fadump_update_params(struct param_info *param_info, + char *param, char *val) +{ + ptrdiff_t param_offset = param - param_info->tmp_cmdline; + size_t vallen = val ? strlen(val) : 0; + char *tgt = param_info->cmdline + param_offset + - param_info->shortening; + int shortening = 0; + int quoted = 0; + + if (!val) + r
[PATCH v8 4/6] powerpc/fadump: update the dequoting logic to match lib/cmdline.c
Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 0e08f1a80af2..b214c1e333dd 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -501,10 +501,12 @@ static void __init fadump_update_params(struct param_info *param_info, } /* next_arg removes one leading and one trailing '"' */ - if (*(tgt + FADUMP_EXTRA_ARGS_LEN + 1) == '"') - shortening += 1; - if (*(tgt + FADUMP_EXTRA_ARGS_LEN + 1 + vallen + shortening) == '"') + if ((*(tgt + FADUMP_EXTRA_ARGS_LEN + 1 + vallen + shortening) == '"') && + (quoted || (*(tgt + FADUMP_EXTRA_ARGS_LEN + 1) == '"'))) { shortening += 1; + if (!quoted) + shortening += 1; + } /* remove one leading and one trailing quote if both are present */ if ((val[0] == '"') && (val[vallen - 1] == '"')) { -- 2.10.2
[PATCH v8 5/6] boot/param: add pointer to current and next argument to unknown parameter callback
The fadump parameter processing re-does the logic of next_arg quote stripping to determine where the argument ends. Pass pointers to the current and next argument instead to make this more robust. Signed-off-by: Michal Suchanek --- rebase on master split off changes to fadump.c add pointer to current argument to detect shortening of the parameter name --- arch/powerpc/kernel/fadump.c | 1 + include/linux/moduleparam.h | 1 + init/main.c | 8 ++-- kernel/module.c | 5 +++-- kernel/params.c | 20 +--- lib/dynamic_debug.c | 1 + 6 files changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index b214c1e333dd..8778e1cc0380 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -541,6 +541,7 @@ static void __init fadump_update_params(struct param_info *param_info, * to enforce the parameters passed through it */ static int __init fadump_rework_cmdline_params(char *param, char *val, + char *currant, char *next, const char *unused, void *arg) { struct param_info *param_info = (struct param_info *)arg; diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h index 1ee7b30dafec..e86f3f830a7f 100644 --- a/include/linux/moduleparam.h +++ b/include/linux/moduleparam.h @@ -327,6 +327,7 @@ extern char *parse_args(const char *name, s16 level_max, void *arg, int (*unknown)(char *param, char *val, +char *currant, char *next, const char *doing, void *arg)); /* Called by module remove. */ diff --git a/init/main.c b/init/main.c index 0ee9c6866ada..9381aa24bca7 100644 --- a/init/main.c +++ b/init/main.c @@ -240,6 +240,7 @@ early_param("loglevel", loglevel); /* Change NUL term back to "=", to make "param" the whole string. */ static int __init repair_env_string(char *param, char *val, + char *unused3, char *unused2, const char *unused, void *arg) { if (val) { @@ -258,6 +259,7 @@ static int __init repair_env_string(char *param, char *val, /* Anything after -- gets handed straight to init. 
*/ static int __init set_init_arg(char *param, char *val, + char *unused3, char *unused2, const char *unused, void *arg) { unsigned int i; @@ -265,7 +267,7 @@ static int __init set_init_arg(char *param, char *val, if (panic_later) return 0; - repair_env_string(param, val, unused, NULL); + repair_env_string(param, val, unused3, unused2, unused, NULL); for (i = 0; argv_init[i]; i++) { if (i == MAX_INIT_ARGS) { @@ -283,9 +285,10 @@ static int __init set_init_arg(char *param, char *val, * unused parameters (modprobe will find them in /proc/cmdline). */ static int __init unknown_bootoption(char *param, char *val, +char *unused3, char *unused2, const char *unused, void *arg) { - repair_env_string(param, val, unused, NULL); + repair_env_string(param, val, unused3, unused2, unused, NULL); /* Handle obsolete-style parameters */ if (obsolete_checksetup(param)) @@ -437,6 +440,7 @@ static noinline void __ref rest_init(void) /* Check for early params. */ static int __init do_early_param(char *param, char *val, +char *unused3, char *unused2, const char *unused, void *arg) { const struct obs_kernel_param *p; diff --git a/kernel/module.c b/kernel/module.c index 40f983cbea81..0f74718f8934 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3609,8 +3609,9 @@ static int prepare_coming_module(struct module *mod) return 0; } -static int unknown_module_param_cb(char *param, char *val, const char *modname, - void *arg) +static int unknown_module_param_cb(char *param, char *val, + char *unused, char *unused2, + const char *modname, void *arg) { struct module *mod = arg; int ret; diff --git a/kernel/params.c b/kernel/params.c index 60b2d8101355..c0e0c65f460b 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -119,6 +119,8 @@ static void param_check_unsafe(const struct kernel_param *kp) static int parse_one(char *param, char *val, +char *currant, +char *next, const char *doing, const struct kernel_param *params, unsigned num_params, @@ -126,7 +128,8 @@ st
[PATCH v8 3/6] lib/cmdline.c: Remove quotes symmetrically.
Remove quotes from argument value only if there is a quote on both sides. Signed-off-by: Michal Suchanek --- lib/cmdline.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/lib/cmdline.c b/lib/cmdline.c index 171c19b6888e..6d398a8b63fc 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -227,14 +227,12 @@ char *next_arg(char *args, char **param, char **val) *val = args + equals + 1; /* Don't include quotes in value. */ - if (**val == '"') { - (*val)++; - if (args[i-1] == '"') - args[i-1] = '\0'; + if ((args[i-1] == '"') && ((quoted) || (**val == '"'))) { + args[i-1] = '\0'; + if (!quoted) + (*val)++; } } - if (quoted && args[i-1] == '"') - args[i-1] = '\0'; if (args[i]) { args[i] = '\0'; -- 2.10.2
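The rule "strip only when both quotes are present" is easy to state as a standalone function. The sketch below covers only the value case (a quote right after '=' and one at the end); it does not model the 'quoted' flag for an argument that starts with a quote, and strip_quote_pair is a hypothetical name, not something in lib/cmdline.c.

```c
#include <string.h>

/*
 * Hypothetical helper: strip one leading and one trailing double quote
 * from `val`, but only when both are present. Returns the new start of
 * the value; the buffer is modified in place.
 */
static char *strip_quote_pair(char *val)
{
	size_t len = strlen(val);

	if (len >= 2 && val[0] == '"' && val[len - 1] == '"') {
		val[len - 1] = '\0';	/* drop the trailing quote */
		return val + 1;		/* skip the leading quote */
	}
	return val;			/* unbalanced or unquoted: untouched */
}
```

A value with a quote on one side only is returned unchanged, which is the asymmetry the patch removes from next_arg.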
[PATCH v8 2/6] powerpc/fadump: update documentation about 'fadump_extra_args=' parameter
With the introduction of 'fadump_extra_args=' parameter to pass additional parameters to fadump (capture) kernel, update documentation about it. Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- Documentation/powerpc/firmware-assisted-dump.txt | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index bdd344aa18d9..2df88524d2c7 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7 +162,19 @@ How to enable firmware-assisted dump (fadump): 1. Set config option CONFIG_FA_DUMP=y and build kernel. 2. Boot into linux kernel with 'fadump=on' kernel cmdline option. -3. Optionally, user can also set 'crashkernel=' kernel cmdline +3. A user can pass additional command line parameters as a space + separated quoted list through 'fadump_extra_args=' parameter, + to be enforced when fadump is active. For example, parameter + 'fadump_extra_args="nr_cpus=1 numa=off udev.children-max=2"' + will be changed to 'fadump_extra_args nr_cpus=1 numa=off + udev.children-max=2' in-place when fadump is active. This + parameter has no affect when fadump is not active. Multiple + instances of 'fadump_extra_args=' can be passed. This provision + can be used to reduce memory consumption during dump capture by + disabling unwarranted resources/subsystems like CPUs, NUMA + and such. Value with spaces can be passed as + 'fadump_extra_args=""parameter="value with spaces"""' +4. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. @@ -172,6 +184,12 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead 2. If firmware-assisted dump fails to reserve memory then it will fallback to existing kdump mechanism if 'crashkernel=' option is set at kernel cmdline. + 3. 
Special parameters like '--' passed inside fadump_extra_args are also + just left in-place. So, the user is advised to consider this while + specifying such parameters. It may be required to quote the argument + to fadump_extra_args when the bootloader uses double-quotes as + argument delimiter as well. eg +append = " fadump_extra_args=\"nr_cpus=1 numa=off udev.children-max=2\"" Sysfs/debugfs files: -- 2.10.2
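The in-place change described in the documentation above ('fadump_extra_args="..."' becoming 'fadump_extra_args ...') can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the code in fadump.c: expand_extra_args is a hypothetical name, it handles only the first occurrence, and it does not deal with nested or escaped quotes.

```c
#include <string.h>

#define EXTRA_ARGS "fadump_extra_args"

/*
 * Hypothetical sketch: rewrite the first  fadump_extra_args="a b"  in
 * `cmdline` to  fadump_extra_args a b  without growing the string.
 * Returns 1 if a rewrite happened, 0 otherwise.
 */
static int expand_extra_args(char *cmdline)
{
	char *p = strstr(cmdline, EXTRA_ARGS "=\"");
	char *val, *close;

	if (!p)
		return 0;
	val = p + strlen(EXTRA_ARGS) + 2;	/* first char after =" */
	close = strchr(val, '"');
	if (!close)
		return 0;			/* unbalanced quote: leave as is */

	p[strlen(EXTRA_ARGS)] = ' ';		/* '=' becomes a space */
	/* drop the closing quote first, then the opening one */
	memmove(close, close + 1, strlen(close + 1) + 1);
	memmove(val - 1, val, strlen(val) + 1);
	return 1;
}
```

Because the result is two characters shorter than the input, the rewrite never needs more room than the original command line, which is what makes doing it in place possible.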
[PATCH 2/2] dt-bindings: arm: sunxi: Fix Orange Pi Zero bindings
There are two models of Orange Pi Zero which are confusingly marketed under the same name. The old model comes without flash memory and the current model has flash memory. Add bindings for each model. Signed-off-by: Michal Suchanek --- Documentation/devicetree/bindings/arm/sunxi.yaml | 7 +++ 1 file changed, 7 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/sunxi.yaml b/Documentation/devicetree/bindings/arm/sunxi.yaml index efc9118233b4..7e76ea544bf7 100644 --- a/Documentation/devicetree/bindings/arm/sunxi.yaml +++ b/Documentation/devicetree/bindings/arm/sunxi.yaml @@ -864,8 +864,15 @@ properties: - const: xunlong,orangepi-win - const: allwinner,sun50i-a64 + - description: Xunlong OrangePi Zero (old model without flash memory) +items: + - const: xunlong,orangepi-zero-no-flash + - const: xunlong,orangepi-zero + - const: allwinner,sun8i-h2-plus + - description: Xunlong OrangePi Zero items: + - const: xunlong,orangepi-zero-with-flash - const: xunlong,orangepi-zero - const: allwinner,sun8i-h2-plus -- 2.28.0
[PATCH 1/2] ARM: dts: sun8i: h2+: Fix Orange Pi Zero device description.
There are two models of Orange Pi zero which are confusingly marketed under the same name. Old model comes without a flash memory and current model does have a flash memory. Build device tree for each model. Signed-off-by: Michal Suchanek --- arch/arm/boot/dts/Makefile| 1 + .../sun8i-h2-plus-orangepi-zero-no-flash.dts | 210 ++ .../boot/dts/sun8i-h2-plus-orangepi-zero.dts | 201 + 3 files changed, 215 insertions(+), 197 deletions(-) create mode 100644 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero-no-flash.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 4572db3fa5ae..f2853cea0c9c 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1168,6 +1168,7 @@ dtb-$(CONFIG_MACH_SUN8I) += \ sun8i-h2-plus-libretech-all-h3-cc.dtb \ sun8i-h2-plus-orangepi-r1.dtb \ sun8i-h2-plus-orangepi-zero.dtb \ + sun8i-h2-plus-orangepi-zero-no-flash.dtb \ sun8i-h3-bananapi-m2-plus.dtb \ sun8i-h3-bananapi-m2-plus-v1.2.dtb \ sun8i-h3-beelink-x2.dtb \ diff --git a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero-no-flash.dts b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero-no-flash.dts new file mode 100644 index ..3859b663e3f0 --- /dev/null +++ b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero-no-flash.dts @@ -0,0 +1,210 @@ +/* + * Copyright (C) 2016 Icenowy Zheng + * + * Based on sun8i-h3-orangepi-one.dts, which is: + * Copyright (C) 2016 Hans de Goede + * + * This file is dual-licensed: you can use it either under the terms + * of the GPL or the X11 license, at your option. Note that this dual + * licensing only applies to this file, and not this project as a + * whole. + * + * a) This file is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License, or (at your option) any later version. 
+ * + * This file is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Or, alternatively, + * + * b) Permission is hereby granted, free of charge, to any person + * obtaining a copy of this software and associated documentation + * files (the "Software"), to deal in the Software without + * restriction, including without limitation the rights to use, + * copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following + * conditions: + * + * The above copyright notice and this permission notice shall be + * included in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES + * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT + * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, + * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ + +/dts-v1/; +#include "sun8i-h3.dtsi" +#include "sunxi-common-regulators.dtsi" + +#include +#include + +/ { + model = "Xunlong Orange Pi Zero (old model without flash memory)"; + compatible = "xunlong,orangepi-zero-no-flash", + "xunlong,orangepi-zero", "allwinner,sun8i-h2-plus"; + + aliases { + serial0 = + /* ethernet0 is the H3 emac, defined in sun8i-h3.dtsi */ + ethernet0 = + ethernet1 = + }; + + chosen { + stdout-path = "serial0:115200n8"; + }; + + leds { + compatible = "gpio-leds"; + + pwr_led { + label = "orangepi:green:pwr"; + gpios = <_pio 0 10 GPIO_ACTIVE_HIGH>; + default-state = "on"; + }; + + status_led { + label = "orangepi:red:status"; + gpios = < 0 17 GPIO_ACTIVE_HIGH>; + }; + }; + + reg_vcc_wifi: reg_vcc_wifi { + compatible = "regulator-fixed"; + regulator-min-microvolt = <330>; + regulator-max-microvolt = <330>; + regulator-name = "vcc-wifi"; + enable-active-high; + gpio
[PATCH] char: virtio: Select VIRTIO from VIRTIO_CONSOLE.
Make it possible to have virtio console built-in when other virtio drivers are modular. Signed-off-by: Michal Suchanek --- drivers/char/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index 3a144c000a38..9bd9917ca9af 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -93,8 +93,9 @@ config PPDEV config VIRTIO_CONSOLE tristate "Virtio console" - depends on VIRTIO && TTY + depends on TTY select HVC_DRIVER + select VIRTIO help Virtio console for use with hypervisors. -- 2.28.0
[PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"
This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d. This commit causes the kernel to oops and reboot when injecting a SLB multihit which causes a MCE. Before this commit a SLB multihit was corrected by the kernel and the system continued to operate normally. cc: sta...@vger.kernel.org Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/mce.c | 7 --- arch/powerpc/kernel/traps.c | 18 +++--- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index ada59f6c4298..2e13528dcc92 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -591,14 +591,10 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info); long notrace machine_check_early(struct pt_regs *regs) { long handled = 0; - bool nested = in_nmi(); u8 ftrace_enabled = this_cpu_get_ftrace_enabled(); this_cpu_set_ftrace_enabled(0); - if (!nested) - nmi_enter(); - hv_nmi_check_nonrecoverable(regs); /* @@ -607,9 +603,6 @@ long notrace machine_check_early(struct pt_regs *regs) if (ppc_md.machine_check_early) handled = ppc_md.machine_check_early(regs); - if (!nested) - nmi_exit(); - this_cpu_set_ftrace_enabled(ftrace_enabled); return handled; diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index d1ebe152f210..7853b770918d 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -827,19 +827,7 @@ void machine_check_exception(struct pt_regs *regs) { int recover = 0; - /* -* BOOK3S_64 does not call this handler as a non-maskable interrupt -* (it uses its own early real-mode handler to handle the MCE proper -* and then raises irq_work to call this handler when interrupts are -* enabled). -* -* This is silly. The BOOK3S_64 should just call a different function -* rather than expecting semantics to magically change. Something -* like 'non_nmi_machine_check_exception()', perhaps? 
-*/ - const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64); - - if (nmi) nmi_enter(); + nmi_enter(); __this_cpu_inc(irq_stat.mce_exceptions); @@ -865,7 +853,7 @@ void machine_check_exception(struct pt_regs *regs) if (check_io_access(regs)) goto bail; - if (nmi) nmi_exit(); + nmi_exit(); die("Machine check", regs, SIGBUS); @@ -876,7 +864,7 @@ void machine_check_exception(struct pt_regs *regs) return; bail: - if (nmi) nmi_exit(); + nmi_exit(); } void SMIException(struct pt_regs *regs) -- 2.28.0
[PATCH] ibmveth: Fix use of ibmveth in a bridge.
From: Thomas Bogendoerfer The check for src mac address in ibmveth_is_packet_unsupported is wrong. Commit 6f2275433a2f wanted to shut down messages for loopback packets, but now suppresses bridged frames, which are accepted by the hypervisor otherwise bridging won't work at all. Fixes: 6f2275433a2f ("ibmveth: Detect unsupported packets before sending to the hypervisor") Signed-off-by: Michal Suchanek --- ms: added commit message --- drivers/net/ethernet/ibm/ibmveth.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 7ef3369953b6..c3ec9ceed833 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1031,12 +1031,6 @@ static int ibmveth_is_packet_unsupported(struct sk_buff *skb, ret = -EOPNOTSUPP; } - if (!ether_addr_equal(ether_header->h_source, netdev->dev_addr)) { - netdev_dbg(netdev, "source packet MAC address does not match veth device's, dropping packet.\n"); - netdev->stats.tx_dropped++; - ret = -EOPNOTSUPP; - } - return ret; } -- 2.28.0
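To illustrate why the removed check breaks bridging, here is a minimal userspace sketch. It assumes a bridged frame keeps the MAC address of the original sender as its source address, which therefore differs from the veth device's own address. The names `MAC_LEN` and `removed_check_rejects` are hypothetical and exist only for this illustration; the driver itself uses `ether_addr_equal()`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6	/* length of an Ethernet MAC address in bytes */

/*
 * Model of the check removed by the patch: a frame is rejected whenever
 * its source MAC differs from the address of the sending device.
 */
bool removed_check_rejects(const uint8_t src[MAC_LEN],
			   const uint8_t dev_addr[MAC_LEN])
{
	return memcmp(src, dev_addr, MAC_LEN) != 0;
}
```

With this check in place, any frame forwarded on behalf of another host is dropped, while frames originated by the device itself pass.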
[PATCH] powerpc: Stop exporting __clear_user which is now inlined.
Stable commit 452e2a83ea23 ("powerpc: Fix __clear_user() with KUAP enabled") redefines __clear_user as inline function but does not remove the export. Fixes: 452e2a83ea23 ("powerpc: Fix __clear_user() with KUAP enabled") Signed-off-by: Michal Suchanek --- arch/powerpc/lib/ppc_ksyms.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/lib/ppc_ksyms.c b/arch/powerpc/lib/ppc_ksyms.c index c7f8e9586316..4b81fd96aa3e 100644 --- a/arch/powerpc/lib/ppc_ksyms.c +++ b/arch/powerpc/lib/ppc_ksyms.c @@ -24,7 +24,6 @@ EXPORT_SYMBOL(csum_tcpudp_magic); #endif EXPORT_SYMBOL(__copy_tofrom_user); -EXPORT_SYMBOL(__clear_user); EXPORT_SYMBOL(copy_page); #ifdef CONFIG_PPC64 -- 2.26.2
[PATCH 0/2] Tristate mount option compatibility fixup
Hello, after the tristate dax option change some applications fail to detect pmem devices because the dax option no longer shows in mtab when a device is mounted with -o dax. At first it might seem stupid to detect pmem by looking at the mount options. However, if the application actually wants a mount point properly configured for dax rather than just backed by pmem I do not see any other easy way. Also this happens during early installation steps when the mounted filesystem is typically empty and you want to perform non-destructive detection. If there are better ways to detect dax-enabled mount points I want to hear all about it. In the meantime we have legacy applications to support. It also makes sense that when you mount a device with -o dax it actually shows dax in the mount options. Not doing so is confusing for humans as well. Thanks Michal Michal Suchanek (2): xfs: show the dax option in mount options. ext4: show the dax option in mount options References: bsc#1178366 fs/ext4/super.c| 2 +- fs/xfs/xfs_super.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- 2.26.2
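The detection problem described above can be sketched in userspace. This is a minimal example, assuming a legacy application looks for `dax` as a whole token in the comma-separated option string it reads from mtab. The helper name `has_exact_mount_opt` is hypothetical and not taken from any real application.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the comma-separated option string `opts` contains
 * `name` as a complete token, so "dax" does not match "dax=always".
 */
bool has_exact_mount_opt(const char *opts, const char *name)
{
	size_t name_len = strlen(name);
	const char *p = opts;

	while (*p) {
		/* Length of the current token, up to the next comma or the end. */
		size_t tok_len = strcspn(p, ",");

		if (tok_len == name_len && strncmp(p, name, name_len) == 0)
			return true;
		p += tok_len;
		if (*p == ',')
			p++;
	}
	return false;
}
```

Under that assumption, an option string such as `rw,relatime,dax=always` is not recognized, while `rw,relatime,dax,dax=always` (the form these patches emit) is.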
[PATCH 2/2] ext4: show the dax option in mount options
ext4 accepts both dax and dax_always option but shows only dax_always. Show both options. Fixes: 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state") Signed-off-by: Michal Suchanek --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ef4734b40e2a..7656c519cbe6 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2647,7 +2647,7 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb, if (IS_EXT2_SB(sb)) SEQ_OPTS_PUTS("dax"); else - SEQ_OPTS_PUTS("dax=always"); + SEQ_OPTS_PUTS("dax,dax=always"); } else if (test_opt2(sb, DAX_NEVER)) { SEQ_OPTS_PUTS("dax=never"); } else if (test_opt2(sb, DAX_INODE)) { -- 2.26.2
[PATCH 1/2] xfs: show the dax option in mount options.
xfs accepts both dax and dax_enum but shows only dax_enum. Show both options. Fixes: 8d6c3446ec23 ("fs/xfs: Make DAX mount option a tri-state") Signed-off-by: Michal Suchanek --- fs/xfs/xfs_super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e3e229e52512..a3b3840d 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -163,7 +163,7 @@ xfs_fs_show_options( { XFS_MOUNT_GRPID, ",grpid" }, { XFS_MOUNT_DISCARD,",discard" }, { XFS_MOUNT_LARGEIO,",largeio" }, - { XFS_MOUNT_DAX_ALWAYS, ",dax=always" }, + { XFS_MOUNT_DAX_ALWAYS, ",dax,dax=always" }, { XFS_MOUNT_DAX_NEVER, ",dax=never" }, { 0, NULL } }; -- 2.26.2
[PATCH] powerpc/fadump: when fadump is supported register the fadump sysfs files.
Currently it is not possible to distinguish the case when fadump is supported by firmware and disabled in kernel and completely unsupported using the kernel sysfs interface. User can investigate the devicetree but it is more reasonable to provide sysfs files in case we get some fadumpv2 in the future. With this patch sysfs files are available whenever fadump is supported by firmware. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/fadump.c | 32 ++-- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 4eab97292cc2..f35ab2433a9b 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1671,13 +1671,9 @@ static void fadump_init_files(void) */ int __init setup_fadump(void) { - if (!fw_dump.fadump_enabled) - return 0; - - if (!fw_dump.fadump_supported) { + if (!fw_dump.fadump_supported && fw_dump.fadump_enabled) { printk(KERN_ERR "Firmware-assisted dump is not supported on" " this hardware\n"); - return 0; } fadump_show_config(); @@ -1685,18 +1681,26 @@ int __init setup_fadump(void) * If dump data is available then see if it is valid and prepare for * saving it to the disk. */ - if (fw_dump.dump_active) { + if (fw_dump.fadump_enabled) { + if (fw_dump.dump_active) { + /* +* if dump process fails then invalidate the +* registration and release memory before proceeding +* for re-registration. +*/ + if (process_fadump(fdm_active) < 0) + fadump_invalidate_release_mem(); + } /* -* if dump process fails then invalidate the registration -* and release memory before proceeding for re-registration. +* Initialize the kernel dump memory structure for FAD +* registration. */ - if (process_fadump(fdm_active) < 0) - fadump_invalidate_release_mem(); + else if (fw_dump.reserve_dump_area_size) + init_fadump_mem_struct(, + fw_dump.reserve_dump_area_start); } - /* Initialize the kernel dump memory structure for FAD registration. 
*/ - else if (fw_dump.reserve_dump_area_size) - init_fadump_mem_struct(, fw_dump.reserve_dump_area_start); - fadump_init_files(); + if (fw_dump.fadump_supported) + fadump_init_files(); return 1; } -- 2.22.0
[PATCH rebased] powerpc/fadump: when fadump is supported register the fadump sysfs files.
Currently it is not possible to distinguish the case when fadump is supported by firmware and disabled in kernel and completely unsupported using the kernel sysfs interface. User can investigate the devicetree but it is more reasonable to provide sysfs files in case we get some fadumpv2 in the future. With this patch sysfs files are available whenever fadump is supported by firmware. Signed-off-by: Michal Suchanek --- Rebase on top of http://patchwork.ozlabs.org/patch/1150160/ [v5,31/31] powernv/fadump: support holes in kernel boot memory area --- arch/powerpc/kernel/fadump.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 4b1bb3c55cf9..7ad424729e9c 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1319,13 +1319,9 @@ static void fadump_init_files(void) */ int __init setup_fadump(void) { - if (!fw_dump.fadump_enabled) - return 0; - - if (!fw_dump.fadump_supported) { + if (!fw_dump.fadump_supported && fw_dump.fadump_enabled) { printk(KERN_ERR "Firmware-assisted dump is not supported on" " this hardware\n"); - return 0; } fadump_show_config(); @@ -1333,19 +1329,26 @@ int __init setup_fadump(void) * If dump data is available then see if it is valid and prepare for * saving it to the disk. */ - if (fw_dump.dump_active) { + if (fw_dump.fadump_enabled) { + if (fw_dump.dump_active) { + /* +* if dump process fails then invalidate the +* registration and release memory before proceeding +* for re-registration. +*/ + if (fw_dump.ops->fadump_process(_dump) < 0) + fadump_invalidate_release_mem(); + } /* -* if dump process fails then invalidate the registration -* and release memory before proceeding for re-registration. +* Initialize the kernel dump memory structure for FAD +* registration. */ - if (fw_dump.ops->fadump_process(_dump) < 0) - fadump_invalidate_release_mem(); - } - /* Initialize the kernel dump memory structure for FAD registration. 
*/ - else if (fw_dump.reserve_dump_area_size) - fw_dump.ops->fadump_init_mem_struct(_dump); + else if (fw_dump.reserve_dump_area_size) + fw_dump.ops->fadump_init_mem_struct(_dump); - fadump_init_files(); + } + if (fw_dump.fadump_supported) + fadump_init_files(); return 1; } -- 2.22.0
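The effect of the change on the two flags can be summarised as a small truth-table model. This is only a sketch derived from the control flow visible in the diff; `struct fadump_outcome` and both function names are hypothetical, and the model ignores `dump_active` and `reserve_dump_area_size`.

```c
#include <stdbool.h>

/* Hypothetical summary of what setup_fadump() does for one flag combination. */
struct fadump_outcome {
	bool warn_unsupported;	/* the "not supported on this hardware" message is printed */
	bool prepare_dump;	/* the dump_active / reserve_dump_area_size branch is reached */
	bool register_sysfs;	/* fadump_init_files() is called */
};

/* Before the patch: either early return skips everything that follows. */
struct fadump_outcome fadump_outcome_before(bool supported, bool enabled)
{
	struct fadump_outcome out = { false, false, false };

	if (!enabled)
		return out;
	if (!supported) {
		out.warn_unsupported = true;
		return out;
	}
	out.prepare_dump = true;
	out.register_sysfs = true;
	return out;
}

/* After the patch: sysfs registration depends on firmware support only. */
struct fadump_outcome fadump_outcome_after(bool supported, bool enabled)
{
	struct fadump_outcome out;

	out.warn_unsupported = !supported && enabled;
	out.prepare_dump = enabled;
	out.register_sysfs = supported;
	return out;
}
```

The case the commit message cares about is `supported` true and `enabled` false: the sysfs files were absent before the patch and are present afterwards, which lets userspace tell "disabled" apart from "unsupported".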
[PATCH] ARM: dts: sun8i: h2+: Enable optional SPI flash on Orange Pi Zero board
The flash is present on all new boards and users went out of their way to add it on the old ones. Enabling it makes a more reasonable default. Signed-off-by: Michal Suchanek --- arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts index f19ed981da9d..061d295bbba7 100644 --- a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts +++ b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts @@ -163,8 +163,8 @@ { }; { - /* Disable SPI NOR by default: it optional on Orange Pi Zero boards */ - status = "disabled"; + /* Enable optional SPI NOR by default */ + status = "okay"; flash@0 { #address-cells = <1>; -- 2.28.0
[PATCH] net/ibmvnic: Fix missing { in __ibmvnic_reset
Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue") adds a } without corresponding { causing build break. Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue") Signed-off-by: Michal Suchanek --- drivers/net/ethernet/ibm/ibmvnic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 6644cabc8e75..5cb55ea671e3 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1984,7 +1984,7 @@ static void __ibmvnic_reset(struct work_struct *work) rwi = get_next_rwi(adapter); while (rwi) { if (adapter->state == VNIC_REMOVING || - adapter->state == VNIC_REMOVED) + adapter->state == VNIC_REMOVED) { kfree(rwi); rc = EBUSY; break; -- 2.22.0
[PATCH resend 0/6] Fix cdrom autoclose
Hello, there is a cdrom autoclose feature that is supposed to close the tray, wait for the disc to become ready, and then open the device. This used to work in ancient times. Then in old times there was a hack in util-linux which worked around the breakage which probably resulted from switching to scsi emulation. Currently the util-linux maintainer refuses to merge another hack on the basis that the kernel still has the feature so it should be fixed there. Indeed, to implement this feature effectively from userspace one would need to know when the CD-ROM is in the "drive becoming ready" state, which is knowledge that never leaves the hardware-specific driver and is passed neither to userspace nor the generic cdrom driver. So this patchset fixes the kernel autoclose implementation in cdrom.c and to do so reports the "drive becoming ready" state from the hardware-specific drivers. The first time I did not get any feedback on the patches. I found a defect in tray_close - it used the status function without checking that it exists. So resending with the defect corrected. Michal Suchanek (6): delay: add poll_event_interruptible cdrom: factor out common open_for_* code cdrom: wait for tray to close cdrom: introduce CDS_DRIVE_ERROR Documentation: cdrom: introduce CDS_DRIVE_ERROR cdrom: wait for drive to become ready Documentation/cdrom/cdrom-standard.tex | 8 ++- Documentation/cdrom/ide-cd | 6 ++ Documentation/ioctl/cdrom.txt | 1 + drivers/block/paride/pcd.c | 2 +- drivers/cdrom/cdrom.c | 124 - drivers/cdrom/gdrom.c | 2 +- drivers/ide/ide-cd_ioctl.c | 12 ++-- drivers/scsi/sr_ioctl.c| 2 +- include/linux/delay.h | 12 include/uapi/linux/cdrom.h | 1 + 10 files changed, 99 insertions(+), 71 deletions(-) -- 2.13.6
[PATCH resend 2/6] cdrom: factor out common open_for_* code
The open_for_audio and open_for_data copies have already bitrotted in different ways, and the autoclose logic would otherwise need to be updated in both. Signed-off-by: Michal Suchanek --- drivers/cdrom/cdrom.c | 100 ++ 1 file changed, 36 insertions(+), 64 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index e36d160c458f..89746b3d193f 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -1031,12 +1031,12 @@ static void cdrom_count_tracks(struct cdrom_device_info *cdi, tracktype *tracks) } static -int open_for_data(struct cdrom_device_info *cdi) +int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) { int ret; const struct cdrom_device_ops *cdo = cdi->ops; - tracktype tracks; - cd_dbg(CD_OPEN, "entering open_for_data\n"); + + cd_dbg(CD_OPEN, "entering open_for_common\n"); /* Check if the driver can report drive status. If it can, we can do clever things. If it can't, well, we at least tried! */ if (cdo->drive_status != NULL) { @@ -1048,7 +1048,7 @@ int open_for_data(struct cdrom_device_info *cdi) if (CDROM_CAN(CDC_CLOSE_TRAY) && cdi->options & CDO_AUTO_CLOSE) { cd_dbg(CD_OPEN, "trying to close the tray\n"); - ret=cdo->tray_move(cdi,0); + ret = cdo->tray_move(cdi, 0); if (ret) { cd_dbg(CD_OPEN, "bummer. tried to close the tray but failed.\n"); /* Ignore the error from the low @@ -1056,37 +1056,45 @@ int open_for_data(struct cdrom_device_info *cdi) couldn't close the tray. We only care that there is no disc in the drive, since that is the _REAL_ problem here.*/ - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } } else { cd_dbg(CD_OPEN, "bummer. this drive can't close the tray.\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } /* Ok, the door should be closed now.. Check again */ ret = cdo->drive_status(cdi, CDSL_CURRENT); - if ((ret == CDS_NO_DISC) || (ret==CDS_TRAY_OPEN)) { + if ((ret == CDS_NO_DISC) || (ret == CDS_TRAY_OPEN)) { cd_dbg(CD_OPEN, "bummer.
the tray is still not closed.\n"); cd_dbg(CD_OPEN, "tray might not contain a medium\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } cd_dbg(CD_OPEN, "the tray is now closed\n"); } - /* the door should be closed now, check for the disc */ - ret = cdo->drive_status(cdi, CDSL_CURRENT); - if (ret!=CDS_DISC_OK) { - ret = -ENOMEDIUM; - goto clean_up_and_return; - } + if (ret != CDS_DISC_OK) + return -ENOMEDIUM; } - cdrom_count_tracks(cdi, ); - if (tracks.error == CDS_NO_DISC) { + cdrom_count_tracks(cdi, tracks); + if (tracks->error == CDS_NO_DISC) { cd_dbg(CD_OPEN, "bummer. no disc.\n"); - ret=-ENOMEDIUM; - goto clean_up_and_return; + return -ENOMEDIUM; } + + return 0; +} + +static +int open_for_data(struct cdrom_device_info *cdi) +{ + int ret; + const struct cdrom_device_ops *cdo = cdi->ops; + tracktype tracks; + + cd_dbg(CD_OPEN, "entering open_for_data\n"); + ret = open_for_common(cdi, ); + if (ret) + goto clean_up_and_return; + /* CD-Players which don't use O_NONBLOCK, workman * for example, need bit CDO_CHECK_TYPE cleared! */ if (tracks.data==0) { @@ -1196,53 +1204,17 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev, /* This code is similar to that in open_for_data. The routine is called whenever an audio play operation is requested. */ -static int check_for_audio_disc(struct cdrom_device_info *cdi, - const struct cdrom_device_ops
[PATCH resend 6/6] cdrom: wait for drive to become ready
When the drive closes it can take tens of seconds until the disc is analyzed. Wait for the drive to become ready or report an error. Signed-off-by: Michal Suchanek --- drivers/cdrom/cdrom.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 69e85c902373..9994441f5041 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -1087,6 +1087,15 @@ int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) } cd_dbg(CD_OPEN, "the tray is now closed\n"); } + /* the door should be closed now, check for the disc */ + if (ret == CDS_DRIVE_NOT_READY) { + int poll_res = poll_event_interruptible( + CDS_DRIVE_NOT_READY != + (ret = cdo->drive_status(cdi, CDSL_CURRENT)), + 500); + if (poll_res == -ERESTARTSYS) + return poll_res; + } if (ret != CDS_DISC_OK) return -ENOMEDIUM; } -- 2.13.6
[PATCH resend 4/6] cdrom: introduce CDS_DRIVE_ERROR
CDS_DRIVE_NOT_READY is used for the state in which CDROM is 'becoming ready' (typically analyzing the disc) but also as the fallback when nothing else applies. Introduce CDS_DRIVE_ERROR for the fallback case. Signed-off-by: Michal Suchanek --- drivers/block/paride/pcd.c | 2 +- drivers/cdrom/gdrom.c | 2 +- drivers/ide/ide-cd_ioctl.c | 12 drivers/scsi/sr_ioctl.c| 2 +- include/uapi/linux/cdrom.h | 1 + 5 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c index 7b8c6368beb7..6e00093ff34e 100644 --- a/drivers/block/paride/pcd.c +++ b/drivers/block/paride/pcd.c @@ -605,7 +605,7 @@ static int pcd_drive_status(struct cdrom_device_info *cdi, int slot_nr) struct pcd_unit *cd = cdi->handle; if (pcd_ready_wait(cd, PCD_READY_TMO)) - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; if (pcd_atapi(cd, rc_cmd, 8, pcd_scratch, DBMSG("check media"))) return CDS_NO_DISC; return CDS_DISC_OK; diff --git a/drivers/cdrom/gdrom.c b/drivers/cdrom/gdrom.c index 6495b03f576c..702f255bbe42 100644 --- a/drivers/cdrom/gdrom.c +++ b/drivers/cdrom/gdrom.c @@ -390,7 +390,7 @@ static int gdrom_drivestatus(struct cdrom_device_info *cd_info, int ignore) if (sense == 0) return CDS_DISC_OK; if (sense == 0x20) - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; /* default */ return CDS_NO_INFO; } diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c index 2acca12b9c94..9a26f50a2092 100644 --- a/drivers/ide/ide-cd_ioctl.c +++ b/drivers/ide/ide-cd_ioctl.c @@ -62,9 +62,13 @@ int ide_cdrom_drive_status(struct cdrom_device_info *cdi, int slot_nr) return CDS_NO_DISC; } - if (sense.sense_key == NOT_READY && sense.asc == 0x04 - && sense.ascq == 0x04) - return CDS_DISC_OK; + if (sense.sense_key == NOT_READY && sense.asc == 0x04) + switch (sense.ascq) { + case 0x01: + return CDS_DRIVE_NOT_READY; + case 0x04: + return CDS_DISC_OK; + } /* * If not using Mt Fuji extended media tray reports, @@ -77,7 +81,7 @@ int 
ide_cdrom_drive_status(struct cdrom_device_info *cdi, int slot_nr) else return CDS_TRAY_OPEN; } - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; } /* diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c index 2a21f2d48592..7c93f12a9cb8 100644 --- a/drivers/scsi/sr_ioctl.c +++ b/drivers/scsi/sr_ioctl.c @@ -333,7 +333,7 @@ int sr_drive_status(struct cdrom_device_info *cdi, int slot) else return CDS_TRAY_OPEN; - return CDS_DRIVE_NOT_READY; + return CDS_DRIVE_ERROR; } int sr_disk_status(struct cdrom_device_info *cdi) diff --git a/include/uapi/linux/cdrom.h b/include/uapi/linux/cdrom.h index 2817230148fd..339b1435f44e 100644 --- a/include/uapi/linux/cdrom.h +++ b/include/uapi/linux/cdrom.h @@ -398,6 +398,7 @@ struct cdrom_generic_command #define CDS_TRAY_OPEN 2 #define CDS_DRIVE_NOT_READY3 #define CDS_DISC_OK4 +#define CDS_DRIVE_ERROR5 /* return values for the CDROM_DISC_STATUS ioctl */ /* can also return CDS_NO_[INFO|DISC], from above */ -- 2.13.6
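A short sketch of why the split matters to a caller: once the fallback case has its own code, `CDS_DRIVE_NOT_READY` can safely be treated as "keep waiting". The status values mirror the patched `include/uapi/linux/cdrom.h` and are redefined locally only so the example is self-contained; `enum ready_action` and `classify_drive_status` are hypothetical names, and the three-way split is modelled on how the final patch of this series uses the status.

```c
/* Drive status values as listed in the patched include/uapi/linux/cdrom.h. */
enum {
	CDS_NO_INFO = 0,
	CDS_NO_DISC = 1,
	CDS_TRAY_OPEN = 2,
	CDS_DRIVE_NOT_READY = 3,
	CDS_DISC_OK = 4,
	CDS_DRIVE_ERROR = 5,	/* introduced by this patch */
};

/* Hypothetical: what a caller should do once the tray is closed. */
enum ready_action {
	READY_PROCEED,	/* a disc is loaded, go ahead and open */
	READY_WAIT,	/* the drive is still analyzing the disc, poll again */
	READY_FAIL,	/* waiting will not help, give up */
};

enum ready_action classify_drive_status(int status)
{
	switch (status) {
	case CDS_DISC_OK:
		return READY_PROCEED;
	case CDS_DRIVE_NOT_READY:
		return READY_WAIT;
	default:
		/* CDS_DRIVE_ERROR, CDS_NO_DISC, CDS_TRAY_OPEN, CDS_NO_INFO */
		return READY_FAIL;
	}
}
```

Without the separate error code, a driver falling back to `CDS_DRIVE_NOT_READY` for an unknown condition would land in the waiting branch and could keep the caller polling indefinitely.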
[PATCH resend 3/6] cdrom: wait for tray to close
The scsi command to close the tray only starts the motor and does not wait for the tray to close. Wait until the state changes from TRAY_OPEN so users do not race with the tray closing. This looks like an infinite wait but unless the drive is broken it either closes the tray within a few seconds or reports an error when it detects the tray is blocked. At worst the wait can be interrupted by the user. Signed-off-by: Michal Suchanek --- v2: - check drive_status exists before using it - rename tray_close -> cdrom_tray_close --- drivers/cdrom/cdrom.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 89746b3d193f..69e85c902373 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -281,7 +281,9 @@ #include #include #include +#include #include +#include #include /* used to tell the module to turn on full debugging messages */ @@ -1030,6 +1032,18 @@ static void cdrom_count_tracks(struct cdrom_device_info *cdi, tracktype *tracks) tracks->cdi, tracks->xa); } +static int cdrom_tray_close(struct cdrom_device_info *cdi) +{ + int ret; + + ret = cdi->ops->tray_move(cdi, 0); + if (ret || !cdi->ops->drive_status) + return ret; + + return poll_event_interruptible(CDS_TRAY_OPEN != + cdi->ops->drive_status(cdi, CDSL_CURRENT), 500); +} + static int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) { @@ -1048,7 +1062,9 @@ int open_for_common(struct cdrom_device_info *cdi, tracktype *tracks) if (CDROM_CAN(CDC_CLOSE_TRAY) && cdi->options & CDO_AUTO_CLOSE) { cd_dbg(CD_OPEN, "trying to close the tray\n"); - ret = cdo->tray_move(cdi, 0); + ret = cdrom_tray_close(cdi); + if (ret == -ERESTARTSYS) + return ret; if (ret) { cd_dbg(CD_OPEN, "bummer.
tried to close the tray but failed.\n"); /* Ignore the error from the low @@ -2312,7 +2328,8 @@ static int cdrom_ioctl_closetray(struct cdrom_device_info *cdi) if (!CDROM_CAN(CDC_CLOSE_TRAY)) return -ENOSYS; - return cdi->ops->tray_move(cdi, 0); + + return cdrom_tray_close(cdi); } static int cdrom_ioctl_eject_sw(struct cdrom_device_info *cdi, -- 2.13.6
[PATCH resend 1/6] delay: add poll_event_interruptible
Add convenience macro for polling an event that does not have a waitqueue. Signed-off-by: Michal Suchanek --- include/linux/delay.h | 12 1 file changed, 12 insertions(+) diff --git a/include/linux/delay.h b/include/linux/delay.h index b78bab4395d8..3ae9fa395628 100644 --- a/include/linux/delay.h +++ b/include/linux/delay.h @@ -64,4 +64,16 @@ static inline void ssleep(unsigned int seconds) msleep(seconds * 1000); } +#define poll_event_interruptible(event, interval) ({ \ + int ret = 0; \ + while (!(event)) { \ + if (signal_pending(current)) { \ + ret = -ERESTARTSYS; \ + break; \ + } \ + msleep_interruptible(interval); \ + } \ + ret; \ +}) + #endif /* defined(_LINUX_DELAY_H) */ -- 2.13.6
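For readers less familiar with the kernel primitives, the same polling pattern can be sketched in userspace. This is an analogue under stated assumptions, not the kernel macro: `poll_event`, `poll_interrupted` and `countdown_done` are hypothetical names, a flag set from a signal handler stands in for `signal_pending(current)`, `nanosleep()` stands in for `msleep_interruptible()`, and `-EINTR` stands in for `-ERESTARTSYS`.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Set from a signal handler to request that polling stop (hypothetical). */
volatile sig_atomic_t poll_interrupted;

/*
 * Call check(arg) every interval_ms milliseconds until it returns true.
 * Returns 0 once the event is seen, or -EINTR if poll_interrupted was set
 * first. As in the kernel macro, the event is tested before the
 * interruption flag on every iteration.
 */
int poll_event(bool (*check)(void *arg), void *arg, unsigned int interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (long)(interval_ms % 1000) * 1000000L,
	};

	while (!check(arg)) {
		if (poll_interrupted)
			return -EINTR;
		/* A signal may cut the sleep short; the loop simply re-checks. */
		nanosleep(&ts, NULL);
	}
	return 0;
}

/* Example predicate: reports the event once the counter reaches zero. */
bool countdown_done(void *arg)
{
	int *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

A function with a callback is used here instead of a macro because standard C has no statement expressions; the kernel version relies on that GNU extension so the condition can be written inline at the call site.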
[PATCH resend 5/6] Documentation: cdrom: introduce CDS_DRIVE_ERROR
CDS_DRIVE_NOT_READY is used for the state in which CDROM is 'becoming ready' (typically analyzing the disc) but also as the fallback when nothing else applies. Introduce CDS_DRIVE_ERROR for the fallback case. Signed-off-by: Michal Suchanek --- Documentation/cdrom/cdrom-standard.tex | 8 +++- Documentation/cdrom/ide-cd | 6 ++ Documentation/ioctl/cdrom.txt | 1 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/cdrom/cdrom-standard.tex b/Documentation/cdrom/cdrom-standard.tex index 8f85b0e41046..018284ba696a 100644 --- a/Documentation/cdrom/cdrom-standard.tex +++ b/Documentation/cdrom/cdrom-standard.tex @@ -371,11 +371,17 @@ $$ CDS_NO_INFO& no information available\cr CDS_NO_DISC& no disc is inserted, tray is closed\cr CDS_TRAY_OPEN& tray is opened\cr -CDS_DRIVE_NOT_READY& something is wrong, tray is moving?\cr +CDS_DRIVE_NOT_READY& tray just closed?\cr CDS_DISC_OK& a disc is loaded and everything is fine\cr +CDS_DRIVE_ERROR& something is wrong\cr } $$ +Note: The IDE and SCSI cdroms have a status code 'drive becoming ready' which +is typically returned when the drive has just closed and is analyzing the disc. +For other cdrom types this state is not reported by the hardware or not +implemented by the driver. 
+ \subsection{$Int\ media_changed(struct\ cdrom_device_info * cdi, int\ disc_nr)$} This function is very similar to the original function in $struct\ diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd index a5f2a7f1ff46..9324a8fd9a39 100644 --- a/Documentation/cdrom/ide-cd +++ b/Documentation/cdrom/ide-cd @@ -455,6 +455,9 @@ main (int argc, char **argv) case CDS_DRIVE_NOT_READY: printf ("Drive Not Ready.\n"); break; + case CDS_DRIVE_ERROR: + printf ("Drive problem.\n"); + break; default: printf ("This Should not happen!\n"); break; @@ -481,6 +484,9 @@ main (int argc, char **argv) case CDS_NO_INFO: printf ("No Information available."); break; + case CDS_DRIVE_ERROR: + printf ("Drive problem.\n"); + break; default: printf ("This Should not happen!\n"); break; diff --git a/Documentation/ioctl/cdrom.txt b/Documentation/ioctl/cdrom.txt index a4d62a9d6771..7720d11807c3 100644 --- a/Documentation/ioctl/cdrom.txt +++ b/Documentation/ioctl/cdrom.txt @@ -700,6 +700,7 @@ CDROM_DRIVE_STATUS Get tray position, etc. CDS_TRAY_OPEN CDS_DRIVE_NOT_READY CDS_DISC_OK + CDS_DRIVE_ERROR -1 error error returns: -- 2.13.6
[PATCH v5 4/5] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
On bigendian ppc64 it is common to have 32bit legacy binaries but much less so on littleendian. Signed-off-by: Michal Suchanek --- v3: make configurable --- arch/powerpc/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 5bab0bb6b833..b0339e892329 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -264,8 +264,9 @@ config PANIC_TIMEOUT default 180 config COMPAT - bool - default y if PPC64 + bool "Enable support for 32bit binaries" + depends on PPC64 + default y if !CPU_LITTLE_ENDIAN select COMPAT_BINFMT_ELF select ARCH_WANT_OLD_COMPAT_IPC select COMPAT_OLD_SIGACTION -- 2.22.0
[PATCH v5 5/5] powerpc/perf: split callchain.c by bitness
Building callchain.c with !COMPAT proved quite ugly with all the defines. Splitting out the 32bit and 64bit parts looks better. Also rewrite current_is_64bit as common function. No other code change intended. Signed-off-by: Michal Suchanek --- arch/powerpc/perf/Makefile | 4 + arch/powerpc/perf/callchain.c| 388 +-- arch/powerpc/perf/callchain.h| 11 + arch/powerpc/perf/callchain_32.c | 218 + arch/powerpc/perf/callchain_64.c | 185 +++ 5 files changed, 422 insertions(+), 384 deletions(-) create mode 100644 arch/powerpc/perf/callchain.h create mode 100644 arch/powerpc/perf/callchain_32.c create mode 100644 arch/powerpc/perf/callchain_64.c diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile index c155dcbb8691..e9f3202251d0 100644 --- a/arch/powerpc/perf/Makefile +++ b/arch/powerpc/perf/Makefile @@ -1,6 +1,10 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o +ifdef CONFIG_PERF_EVENTS +obj-y += callchain_$(BITS).o +obj-$(CONFIG_COMPAT) += callchain_32.o +endif obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o obj64-$(CONFIG_PPC_PERF_CTRS) += ppc970-pmu.o power5-pmu.o \ diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c index 881be5c4e9bb..981005625c05 100644 --- a/arch/powerpc/perf/callchain.c +++ b/arch/powerpc/perf/callchain.c @@ -15,11 +15,9 @@ #include #include #include -#ifdef CONFIG_COMPAT -#include "../kernel/ppc32.h" -#endif #include +#include "callchain.h" /* * Is sp valid as the address of the next kernel stack frame after prev_sp? @@ -102,188 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re } } -#ifdef CONFIG_PPC64 -/* - * On 64-bit we don't want to invoke hash_page on user addresses from - * interrupt context, so if the access faults, we read the page tables - * to find which page (if any) is mapped and access it directly. 
- */ -static int read_user_stack_slow(void __user *ptr, void *buf, int nb) -{ - int ret = -EFAULT; - pgd_t *pgdir; - pte_t *ptep, pte; - unsigned shift; - unsigned long addr = (unsigned long) ptr; - unsigned long offset; - unsigned long pfn, flags; - void *kaddr; - - pgdir = current->mm->pgd; - if (!pgdir) - return -EFAULT; - - local_irq_save(flags); - ptep = find_current_mm_pte(pgdir, addr, NULL, ); - if (!ptep) - goto err_out; - if (!shift) - shift = PAGE_SHIFT; - - /* align address to page boundary */ - offset = addr & ((1UL << shift) - 1); - - pte = READ_ONCE(*ptep); - if (!pte_present(pte) || !pte_user(pte)) - goto err_out; - pfn = pte_pfn(pte); - if (!page_is_ram(pfn)) - goto err_out; - - /* no highmem to worry about here */ - kaddr = pfn_to_kaddr(pfn); - memcpy(buf, kaddr + offset, nb); - ret = 0; -err_out: - local_irq_restore(flags); - return ret; -} - -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret) -{ - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) || - ((unsigned long)ptr & 7)) - return -EFAULT; - - pagefault_disable(); - if (!__get_user_inatomic(*ret, ptr)) { - pagefault_enable(); - return 0; - } - pagefault_enable(); - - return read_user_stack_slow(ptr, ret, 8); -} - -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret) -{ - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) || - ((unsigned long)ptr & 3)) - return -EFAULT; - - pagefault_disable(); - if (!__get_user_inatomic(*ret, ptr)) { - pagefault_enable(); - return 0; - } - pagefault_enable(); - - return read_user_stack_slow(ptr, ret, 4); -} - -static inline int valid_user_sp(unsigned long sp, int is_64) -{ - if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x1UL) - 32) - return 0; - return 1; -} - -/* - * 64-bit user processes use the same stack frame for RT and non-RT signals. 
- */ -struct signal_frame_64 { - chardummy[__SIGNAL_FRAMESIZE]; - struct ucontext uc; - unsigned long unused[2]; - unsigned inttramp[6]; - struct siginfo *pinfo; - void*puc; - struct siginfo info; - charabigap[288]; -}; - -static int is_sigreturn_64_address(unsigned long nip, unsigned long fp) -{ - if (nip == fp + offsetof(struct signal_frame_64, tramp)) - return 1; - if (vdso64_rt_sigtramp && current->mm->context.vdso_base && - nip == current->mm-&
[PATCH v5 2/5] powerpc: move common register copy functions from signal_32.c to signal.c
These functions are required for 64bit as well. Signed-off-by: Michal Suchanek --- arch/powerpc/kernel/signal.c| 141 arch/powerpc/kernel/signal_32.c | 140 --- 2 files changed, 141 insertions(+), 140 deletions(-) diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index e6c30cee6abf..60436432399f 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -18,12 +18,153 @@ #include #include #include +#include #include #include #include #include "signal.h" +#ifdef CONFIG_VSX +unsigned long copy_fpr_to_user(void __user *to, + struct task_struct *task) +{ + u64 buf[ELF_NFPREG]; + int i; + + /* save FPR copy to local buffer then write to the thread_struct */ + for (i = 0; i < (ELF_NFPREG - 1) ; i++) + buf[i] = task->thread.TS_FPR(i); + buf[i] = task->thread.fp_state.fpscr; + return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double)); +} + +unsigned long copy_fpr_from_user(struct task_struct *task, +void __user *from) +{ + u64 buf[ELF_NFPREG]; + int i; + + if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double))) + return 1; + for (i = 0; i < (ELF_NFPREG - 1) ; i++) + task->thread.TS_FPR(i) = buf[i]; + task->thread.fp_state.fpscr = buf[i]; + + return 0; +} + +unsigned long copy_vsx_to_user(void __user *to, + struct task_struct *task) +{ + u64 buf[ELF_NVSRHALFREG]; + int i; + + /* save FPR copy to local buffer then write to the thread_struct */ + for (i = 0; i < ELF_NVSRHALFREG; i++) + buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET]; + return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double)); +} + +unsigned long copy_vsx_from_user(struct task_struct *task, +void __user *from) +{ + u64 buf[ELF_NVSRHALFREG]; + int i; + + if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double))) + return 1; + for (i = 0; i < ELF_NVSRHALFREG ; i++) + task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; + return 0; +} + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +unsigned long copy_ckfpr_to_user(void __user *to, + struct 
task_struct *task) +{ + u64 buf[ELF_NFPREG]; + int i; + + /* save FPR copy to local buffer then write to the thread_struct */ + for (i = 0; i < (ELF_NFPREG - 1) ; i++) + buf[i] = task->thread.TS_CKFPR(i); + buf[i] = task->thread.ckfp_state.fpscr; + return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double)); +} + +unsigned long copy_ckfpr_from_user(struct task_struct *task, + void __user *from) +{ + u64 buf[ELF_NFPREG]; + int i; + + if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double))) + return 1; + for (i = 0; i < (ELF_NFPREG - 1) ; i++) + task->thread.TS_CKFPR(i) = buf[i]; + task->thread.ckfp_state.fpscr = buf[i]; + + return 0; +} + +unsigned long copy_ckvsx_to_user(void __user *to, + struct task_struct *task) +{ + u64 buf[ELF_NVSRHALFREG]; + int i; + + /* save FPR copy to local buffer then write to the thread_struct */ + for (i = 0; i < ELF_NVSRHALFREG; i++) + buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET]; + return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double)); +} + +unsigned long copy_ckvsx_from_user(struct task_struct *task, + void __user *from) +{ + u64 buf[ELF_NVSRHALFREG]; + int i; + + if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double))) + return 1; + for (i = 0; i < ELF_NVSRHALFREG ; i++) + task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; + return 0; +} +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ +#else +inline unsigned long copy_fpr_to_user(void __user *to, + struct task_struct *task) +{ + return __copy_to_user(to, task->thread.fp_state.fpr, + ELF_NFPREG * sizeof(double)); +} + +inline unsigned long copy_fpr_from_user(struct task_struct *task, + void __user *from) +{ + return __copy_from_user(task->thread.fp_state.fpr, from, + ELF_NFPREG * sizeof(double)); +} + +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +inline unsigned long copy_ckfpr_to_user(void __user *to, +struct task_struct *task) +{ + return __copy_to_user(to, task->thread.ckfp_state.fpr, +
[PATCH v5 3/5] powerpc/64: make buildable without CONFIG_COMPAT
There are numerous references to 32bit functions in generic and 64bit code so ifdef them out. Signed-off-by: Michal Suchanek --- v2: - fix 32bit ifdef condition in signal.c - simplify the compat ifdef condition in vdso.c - 64bit is redundant - simplify the compat ifdef condition in callchain.c - 64bit is redundant v3: - use IS_ENABLED and maybe_unused where possible - do not ifdef declarations - clean up Makefile v4: - further makefile cleanup - simplify is_32bit_task conditions - avoid ifdef in condition by using return v5: - avoid unreachable code on 32bit - make is_current_64bit constant on !COMPAT - add stub perf_callchain_user_32 to avoid some ifdefs --- arch/powerpc/include/asm/thread_info.h | 4 ++-- arch/powerpc/kernel/Makefile | 7 +++ arch/powerpc/kernel/entry_64.S | 2 ++ arch/powerpc/kernel/signal.c | 3 +-- arch/powerpc/kernel/syscall_64.c | 6 ++ arch/powerpc/kernel/vdso.c | 5 ++--- arch/powerpc/perf/callchain.c | 13 +++-- 7 files changed, 23 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 8e1d0195ac36..c128d8a48ea3 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int flags) return (ti->local_flags & flags) != 0; } -#ifdef CONFIG_PPC64 +#ifdef CONFIG_COMPAT #define is_32bit_task()(test_thread_flag(TIF_32BIT)) #else -#define is_32bit_task()(1) +#define is_32bit_task()(IS_ENABLED(CONFIG_PPC32)) #endif #if defined(CONFIG_PPC64) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 1d646a94d96c..9d8772e863b9 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING endif obj-y := cputable.o ptrace.o syscalls.o \ - irq.o align.o signal_32.o pmc.o vdso.o \ + irq.o align.o signal_$(BITS).o pmc.o vdso.o \ process.o systbl.o idle.o \ signal.o sysfs.o 
cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ of_platform.o prom_parse.o -obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ - signal_64.o ptrace32.o \ - paca.o nvram_64.o firmware.o \ +obj-$(CONFIG_PPC64) += setup_64.o paca.o nvram_64.o firmware.o \ syscall_64.o +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 2ec825a85f5b..a2dbf216f607 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -51,8 +51,10 @@ SYS_CALL_TABLE: .tc sys_call_table[TC],sys_call_table +#ifdef CONFIG_COMPAT COMPAT_SYS_CALL_TABLE: .tc compat_sys_call_table[TC],compat_sys_call_table +#endif /* This value is used to mark exception frames on the stack. */ exception_marker: diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index 60436432399f..61678cb0e6a1 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk) sigset_t *oldset = sigmask_to_save(); struct ksignal ksig = { .sig = 0 }; int ret; - int is32 = is_32bit_task(); BUG_ON(tsk != current); @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk) rseq_signal_deliver(&ksig, tsk->thread.regs); - if (is32) { + if (is_32bit_task()) { if (ksig.ka.sa.sa_flags & SA_SIGINFO) ret = handle_rt_signal32(&ksig, oldset, tsk); else diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 98ed970796d5..0d5cbbe54cf1 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, long); long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs) { - unsigned long ti_flags; syscall_fn f; 
BUG_ON(!(regs->msr & MSR_PR)); @@ -83,8 +82,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, */ regs->softe = IRQS_ENABLED; - ti_flags = current_thread_info()->flags; -
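The is_32bit_task() change in this patch replaces a hard-coded value with IS_ENABLED(), so the compiler sees a constant and can drop the dead branch without an #ifdef at each call site. The following is a minimal userspace sketch of that idea, not the kernel's implementation: the macro chain is modeled on include/linux/kconfig.h but simplified (it only handles options defined to 1 and ignores modules), and DEMO_IS_ENABLED, CONFIG_DEMO_PPC64, CONFIG_DEMO_PPC32, CONFIG_DEMO_COMPAT, demo_is_32bit_task and the tif_32bit parameter are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED().
 * If the option is defined to 1, DEMO_ARG_PLACEHOLDER_1 expands to "0,",
 * which shifts the literal 1 into the second argument position.
 * Otherwise the pasted token is junk and the second argument is 0.
 * The trailing extra 0 only keeps the variadic part non-empty.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x) demo_is_defined_(x)
#define demo_is_defined_(val) demo_is_defined__(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined__(arg1_or_junk) demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define DEMO_IS_ENABLED(option) demo_is_defined(option)

/*
 * Hypothetical configuration: a 64-bit build without compat support.
 * CONFIG_DEMO_COMPAT and CONFIG_DEMO_PPC32 are deliberately left undefined.
 */
#define CONFIG_DEMO_PPC64 1

/* Mirrors the shape of the patched is_32bit_task(); tif_32bit stands in
 * for test_thread_flag(TIF_32BIT). */
static inline bool demo_is_32bit_task(bool tif_32bit)
{
#ifdef CONFIG_DEMO_COMPAT
	return tif_32bit;
#else
	(void)tif_32bit;	/* the flag is irrelevant without compat */
	return DEMO_IS_ENABLED(CONFIG_DEMO_PPC32);
#endif
}

/* Sanity checks for the macro under the configuration above. */
static inline void demo_self_check(void)
{
	assert(DEMO_IS_ENABLED(CONFIG_DEMO_PPC64) == 1);
	assert(DEMO_IS_ENABLED(CONFIG_DEMO_PPC32) == 0);
}
```

With this configuration demo_is_32bit_task() is a compile-time constant false, so a caller written as `if (demo_is_32bit_task(flag)) { ... }` keeps its 32-bit branch type-checked while the optimizer is free to discard it.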
[PATCH v5 1/5] powerpc: make llseek 32bit-only.
The llseek syscall is not built in fs/read_write.c when 64bit && !COMPAT. With the syscall marked as common in syscall.tbl, the build fails in this case. The llseek interface does not make sense on 64bit and it is explicitly described as a 32bit interface. Use on 64bit is not well-defined, so just drop it for 64bit. Fixes: caf6f9c8a326 ("asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro") Signed-off-by: Michal Suchanek --- v5: update commit message. --- arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 010b9f445586..53e427606f6c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -188,7 +188,7 @@ 137 common afs_syscall sys_ni_syscall 138 common setfsuid sys_setfsuid 139 common setfsgid sys_setfsgid -140 common _llseek sys_llseek +140 32 _llseek sys_llseek 141 common getdents sys_getdents compat_sys_getdents 142 common _newselect sys_select compat_sys_select 143 common flock sys_flock -- 2.22.0
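As background for why _llseek is described as a 32bit interface: a 32-bit caller cannot pass a 64-bit file offset in one register, so the offset travels as two 32-bit halves that are recombined on the kernel side. The sketch below only illustrates that arithmetic in userspace; demo_llseek_offset and demo_split_offset are hypothetical helpers, not kernel or libc functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild a 64-bit file offset from the two
 * 32-bit halves a 32-bit caller would hand to _llseek.
 */
static inline int64_t demo_llseek_offset(uint32_t offset_high, uint32_t offset_low)
{
	return (int64_t)(((uint64_t)offset_high << 32) | offset_low);
}

/*
 * Hypothetical helper: the inverse step a 32-bit caller performs before
 * the call, splitting a 64-bit offset into high and low words.
 */
static inline void demo_split_offset(int64_t offset, uint32_t *high, uint32_t *low)
{
	assert(high && low);
	*high = (uint32_t)((uint64_t)offset >> 32);
	*low = (uint32_t)((uint64_t)offset & 0xffffffffu);
}
```

On a 64-bit ABI the whole offset fits in a single argument, so plain lseek covers the same range and the split-and-recombine step has no purpose.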
[PATCH v5 0/5] Disable compat cruft on ppc64le v5
Less code means fewer bugs, so add a knob to skip the compat stuff. This is tested on ppc64le on top of https://patchwork.ozlabs.org/cover/1153556/ Changes in v2: saner CONFIG_COMPAT ifdefs Changes in v3: - change llseek to 32bit instead of building it unconditionally in fs - clean up the makefile conditionals - remove some ifdefs or convert to IS_ENABLED where possible Changes in v4: - clean up is_32bit_task and current_is_64bit - more makefile cleanup Changes in v5: - more current_is_64bit cleanup - split off callchain.c 32bit and 64bit parts Michal Suchanek (5): powerpc: make llseek 32bit-only. powerpc: move common register copy functions from signal_32.c to signal.c powerpc/64: make buildable without CONFIG_COMPAT powerpc/64: Make COMPAT user-selectable disabled on littleendian by default. powerpc/perf: split callchain.c by bitness arch/powerpc/Kconfig | 5 +- arch/powerpc/include/asm/thread_info.h | 4 +- arch/powerpc/kernel/Makefile | 7 +- arch/powerpc/kernel/entry_64.S | 2 + arch/powerpc/kernel/signal.c | 144 - arch/powerpc/kernel/signal_32.c | 140 - arch/powerpc/kernel/syscall_64.c | 6 +- arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- arch/powerpc/kernel/vdso.c | 5 +- arch/powerpc/perf/Makefile | 4 + arch/powerpc/perf/callchain.c | 379 +-- arch/powerpc/perf/callchain.h | 11 + arch/powerpc/perf/callchain_32.c | 218 + arch/powerpc/perf/callchain_64.c | 185 +++ 14 files changed, 579 insertions(+), 533 deletions(-) create mode 100644 arch/powerpc/perf/callchain.h create mode 100644 arch/powerpc/perf/callchain_32.c create mode 100644 arch/powerpc/perf/callchain_64.c -- 2.22.0