[PATCH v5.2 2/2] locking/rwsem: Add missing ACQUIRE to read_slowpath sleep loop

2019-08-26 Thread Sasha Levin
From: Peter Zijlstra 

[ Upstream commit 99143f82a255e7f054bead8443462fae76dd829e ]

While reviewing another read_slowpath patch, both Will and I noticed
another missing ACQUIRE, namely:

  X = 0;

  CPU0                  CPU1

  rwsem_down_read()
    for (;;) {
      set_current_state(TASK_UNINTERRUPTIBLE);

                        X = 1;
                        rwsem_up_write();
                          rwsem_mark_wake()
                            atomic_long_add(adjustment, &sem->count);
                            smp_store_release(&waiter->task, NULL);

      if (!waiter.task)
        break;

      ...
    }

  r = X;

Allows 'r == 0'.

Reported-by: Peter Zijlstra (Intel) 
Reported-by: Will Deacon 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Will Deacon 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---

This is a backport for the v5.2 stable tree. There were multiple reports  
of this issue being hit.

Given that there were a few changes to the code around this, I'd
appreciate an ack before pulling it in.

 kernel/locking/rwsem-xadd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 397dedc58432d..385ebcfc31a6d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -485,8 +485,10 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 	/* wait to be given the lock */
 	while (true) {
 		set_current_state(state);
-		if (!waiter.task)
+		if (!smp_load_acquire(&waiter.task)) {
+			/* Orders against rwsem_mark_wake()'s smp_store_release() */
 			break;
+		}
 		if (signal_pending_state(state, current)) {
 			raw_spin_lock_irq(&sem->wait_lock);
 			if (waiter.task)
-- 
2.20.1
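
As a rough userspace illustration of the pairing this patch adds, the
handoff can be sketched with C11 atomics. This is only an analogue and not
the kernel code: struct toy_waiter, toy_wake() and toy_try_acquire() are
made-up names, and memory_order_release/memory_order_acquire are assumed to
stand in for smp_store_release()/smp_load_acquire().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rwsem waiter; not the kernel type. */
struct toy_waiter {
	_Atomic(void *) task;	/* non-NULL while the waiter is still queued */
};

static int X;			/* plain data published by the waker */

/* Waker side: write the data, then clear ->task with RELEASE semantics. */
static void toy_wake(struct toy_waiter *w)
{
	X = 1;
	atomic_store_explicit(&w->task, NULL, memory_order_release);
}

/*
 * Waiter side: the ACQUIRE load pairs with the RELEASE store above, so
 * once ->task is observed as NULL the read of X must see the waker's
 * write. With a plain or relaxed load that guarantee would be missing.
 */
static bool toy_try_acquire(struct toy_waiter *w, int *r)
{
	if (atomic_load_explicit(&w->task, memory_order_acquire) != NULL)
		return false;	/* not woken yet; a real waiter would sleep */
	*r = X;
	return true;
}
```

In a threaded program toy_wake() and toy_try_acquire() would run on
different CPUs; the acquire load is what rules out the 'r == 0' outcome.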



[PATCH v5.2 1/2] locking/rwsem: Add missing ACQUIRE to read_slowpath exit when queue is empty

2019-08-26 Thread Sasha Levin
From: Jan Stancek 

[ Upstream commit e1b98fa316648420d0434d9ff5b92ad6609ba6c3 ]

LTP mtest06 has been observed to occasionally hit "still mapped when
deleted" and following BUG_ON on arm64.

The extra mapcount originated from pagefault handler, which handled
pagefault for vma that has already been detached. vma is detached
under mmap_sem write lock by detach_vmas_to_be_unmapped(), which
also invalidates vmacache.

When the pagefault handler (under mmap_sem read lock) calls
find_vma(), vmacache_valid() wrongly reports vmacache as valid.

After rwsem down_read() returns via 'queue empty' path (as of v5.2),
it does so without an ACQUIRE on sem->count:

  down_read()
    __down_read()
      rwsem_down_read_failed()
        __rwsem_down_read_failed_common()
          raw_spin_lock_irq(&sem->wait_lock);
          if (list_empty(&sem->wait_list)) {
            if (atomic_long_read(&sem->count) >= 0) {
              raw_spin_unlock_irq(&sem->wait_lock);
              return sem;

The problem can be reproduced by running LTP mtest06 in a loop and
building the kernel (-j $NCPUS) in parallel. It has reproduced since
v4.20 on an arm64 HPE Apollo 70 (224 CPUs, 256GB RAM, 2 nodes). It
triggers reliably in about an hour.

The patched kernel ran fine for 10+ hours.

Signed-off-by: Jan Stancek 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Will Deacon 
Acked-by: Waiman Long 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dbu...@suse.de
Fixes: 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
Link: https://lkml.kernel.org/r/50b8914e20d1d62bb2dee42d342836c2c16ebee7.1563438048.git.jstan...@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---

This is a backport for the v5.2 stable tree. There were multiple reports
of this issue being hit.

Given that there were a few changes to the code around this, I'd
appreciate an ack before pulling it in.

 kernel/locking/rwsem-xadd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 0b1f779572402..397dedc58432d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -454,6 +454,8 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 		 * been set in the count.
 		 */
 		if (atomic_long_read(&sem->count) >= 0) {
+			/* Provide lock ACQUIRE */
+			smp_acquire__after_ctrl_dep();
 			raw_spin_unlock_irq(&sem->wait_lock);
 			rwsem_set_reader_owned(sem);
 			lockevent_inc(rwsem_rlock_fast);
-- 
2.20.1
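
The shape of this fix (a plain load, a branch on its value, then an acquire
barrier on the success path) can be sketched in userspace C11. This is an
assumption-laden analogue: toy_read_fast_exit() is a hypothetical name, and
atomic_thread_fence(memory_order_acquire) is used here in the role that
smp_acquire__after_ctrl_dep() plays in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy analogue of the 'queue empty' exit: a relaxed load of the count,
 * a branch on it, then an acquire fence on the success path.
 */
static bool toy_read_fast_exit(atomic_long *count)
{
	if (atomic_load_explicit(count, memory_order_relaxed) >= 0) {
		/*
		 * Provide lock ACQUIRE: accesses after this fence are
		 * ordered after the load of the count above.
		 */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	return false;	/* a writer is present; the caller would queue */
}
```

Without the fence, the branch alone only orders later stores after the load;
later loads (such as the vmacache check described above) could still be
satisfied early.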



Re: [PATCH v1] [semaphore] Removed redundant code from semaphore's down family of function

2019-08-26 Thread Peter Zijlstra
On Mon, Aug 26, 2019 at 04:14:36PM +0200, Peter Zijlstra wrote:
> (XXX, we should probably move the schedule_timeout() thing into its own
> patch)

A better version here...

---
Subject: sched,time: Allow better constprop/DCE for schedule_timeout()

If timeout is a compile-time constant equal to MAX_SCHEDULE_TIMEOUT, it
would be nice to let the compiler optimize away all the timeout handling.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/sched.h | 13 -
 kernel/time/timer.c   | 52 ---
 2 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f0edee94834a..6003e96bce52 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -214,7 +214,18 @@ extern void scheduler_tick(void);
 
 #define	MAX_SCHEDULE_TIMEOUT		LONG_MAX
 
-extern long schedule_timeout(long timeout);
+extern long __schedule_timeout(long timeout);
+
+static inline long schedule_timeout(long timeout)
+{
+	if (__builtin_constant_p(timeout) && timeout == MAX_SCHEDULE_TIMEOUT) {
+		schedule();
+		return timeout;
+	}
+
+	return __schedule_timeout(timeout);
+}
+
 extern long schedule_timeout_interruptible(long timeout);
 extern long schedule_timeout_killable(long timeout);
 extern long schedule_timeout_uninterruptible(long timeout);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0e315a2e77ae..912ae56b96b8 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1851,38 +1851,34 @@ static void process_timeout(struct timer_list *t)
  * jiffies will be returned.  In all cases the return value is guaranteed
  * to be non-negative.
  */
-signed long __sched schedule_timeout(signed long timeout)
+signed long __sched __schedule_timeout(signed long timeout)
 {
 	struct process_timer timer;
 	unsigned long expire;
 
-	switch (timeout)
-	{
-	case MAX_SCHEDULE_TIMEOUT:
-		/*
-		 * These two special cases are useful to be comfortable
-		 * in the caller. Nothing more. We could take
-		 * MAX_SCHEDULE_TIMEOUT from one of the negative value
-		 * but I' d like to return a valid offset (>=0) to allow
-		 * the caller to do everything it want with the retval.
-		 */
+	/*
+	 * We could take MAX_SCHEDULE_TIMEOUT from one of the negative values,
+	 * but I'd like to return a valid offset (>= 0) to allow the caller to
+	 * do everything it wants with the retval.
+	 */
+	if (timeout == MAX_SCHEDULE_TIMEOUT) {
 		schedule();
-		goto out;
-	default:
-		/*
-		 * Another bit of PARANOID. Note that the retval will be
-		 * 0 since no piece of kernel is supposed to do a check
-		 * for a negative retval of schedule_timeout() (since it
-		 * should never happens anyway). You just have the printk()
-		 * that will tell you if something is gone wrong and where.
-		 */
-		if (timeout < 0) {
-			printk(KERN_ERR "schedule_timeout: wrong timeout "
+		return timeout;
+	}
+
+	/*
+	 * Another bit of PARANOID. Note that the retval will be 0 since no
+	 * piece of kernel is supposed to do a check for a negative retval of
+	 * schedule_timeout() (since it should never happens anyway). You just
+	 * have the printk() that will tell you if something is gone wrong and
+	 * where.
+	 */
+	if (timeout < 0) {
+		printk(KERN_ERR "schedule_timeout: wrong timeout "
 				"value %lx\n", timeout);
-			dump_stack();
-			current->state = TASK_RUNNING;
-			goto out;
-		}
+		dump_stack();
+		current->state = TASK_RUNNING;
+		goto out;
 	}
 
 	expire = timeout + jiffies;
@@ -1898,10 +1894,10 @@ signed long __sched schedule_timeout(signed long timeout)
 
 	timeout = expire - jiffies;
 
- out:
+out:
 	return timeout < 0 ? 0 : timeout;
 }
-EXPORT_SYMBOL(schedule_timeout);
+EXPORT_SYMBOL(__schedule_timeout);
 
 /*
  * We can use __set_current_state() here because schedule_timeout() calls
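
The inline-wrapper trick in this patch can be tried outside the kernel. A
minimal sketch, assuming GCC or Clang (for __builtin_constant_p) and using
made-up names (TOY_MAX_TIMEOUT, toy_wait(), toy_wait_slow()); the "sleep" is
modelled as simply returning the remaining time:

```c
#include <limits.h>

#define TOY_MAX_TIMEOUT	LONG_MAX

/* Out-of-line slow path; it handles every timeout value on its own. */
static long toy_wait_slow(long timeout)
{
	if (timeout == TOY_MAX_TIMEOUT)
		return timeout;	/* "sleep forever": nothing to count down */
	return 0;		/* pretend the whole timeout elapsed */
}

static inline long toy_wait(long timeout)
{
	/*
	 * __builtin_constant_p() is true only when the compiler can prove
	 * the argument is a compile-time constant at this call site. In
	 * that case the call to the slow path, and all the timer code
	 * behind it, is dead and can be eliminated.
	 */
	if (__builtin_constant_p(timeout) && timeout == TOY_MAX_TIMEOUT)
		return timeout;

	return toy_wait_slow(timeout);
}
```

Because the slow path still handles TOY_MAX_TIMEOUT itself, the result is
the same whether or not the compiler takes the shortcut (for example at -O0,
where the builtin typically evaluates to 0 for a function parameter).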


Re: [PATCH RESEND] erofs: fix compile warnings when moving out include/trace/events/erofs.h

2019-08-26 Thread Gao Xiang
Hi Chao,

On Mon, Aug 26, 2019 at 09:51:35PM +0800, Chao Yu wrote:
> On 2019-8-26 21:26, Gao Xiang wrote:

[]

> >  TRACE_EVENT(erofs_lookup,
> >  ^~~
> > include/trace/events/erofs.h:28:2: note: in expansion of macro 'TP_PROTO'
> >   TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags),
> >   ^~~~
> > 
> > That makes me very confused since most original EROFS tracepoint code
> > was taken from f2fs, and finally I found
> > 
> > commit 43c78d88036e ("kbuild: compile-test kernel headers to ensure they are self-contained")
> > 
> > It seems these warnings are generated from KERNEL_HEADER_TEST feature and
> > ext4/f2fs tracepoint files were in blacklist.
> 
> For f2fs.h, it will be only used by f2fs module, I guess it's okay to let it
> stay in blacklist...


Yes, that is up to you f2fs folks...
Anyway, this file is new, so it would be better not to add it to the
blacklist...


> 
> > 
> > Anyway, let's fix these issues for KERNEL_HEADER_TEST feature instead
> > of adding to blacklist...
> > 
> > [1] https://lore.kernel.org/lkml/20190826162432.11100...@canb.auug.org.au/
> > Reported-by: Stephen Rothwell 
> > Signed-off-by: Gao Xiang 
> 
> Reviewed-by: Chao Yu 


Thanks for reviewing :)


Thanks,
Gao Xiang

> 
> Thanks,



Re: [PATCH 00/14] per memcg lru_lock

2019-08-26 Thread Alex Shi



On 2019/8/22 11:20 PM, Daniel Jordan wrote:
>>
>>>    
>>> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice>
>>> It's also synthetic but it stresses lru_lock more than just anon 
>>> alloc/free.  It hits the page activate path, which is where we see this 
>>> lock in our database, and if enough memory is configured lru_lock also gets 
>>> stressed during reclaim, similar to [1].
>>
>> Thanks for sharing. This patchset cannot help the [1] case, since for now it 
>> only relieves per-container lock contention.
> 
> I should've been clearer.  [1] is meant as an example of someone suffering 
> from lru_lock during reclaim.  Wouldn't your series help per-memcg reclaim?

Yes, I get your point. Since aim9 doesn't show much improvement, I am trying 
this case in containers.

Thanks
Alex


Re: BoF on LPC 2019 : Linux Perf advancements for compute intensive and server systems

2019-08-26 Thread Alexey Budankov


On 26.08.2019 16:55, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 26, 2019 at 02:36:48PM +0300, Alexey Budankov escreveu:
>>
>> Hi,
>>
>> There is a BoF session scheduled at the Linux Plumbers Conference 2019 event.
>> If you plan to attend the event, feel free to join and discuss the BoF 
>> topic and beyond:
>>
>> Linux Perf advancements for compute intensive and server systems:



> 
> All those are already merged, after long reviewing phases and lots of
> testing, right?

Right. These changes now constitute parts of the Linux kernel source tree.

~Alexey

> 
> I think the next step for people working in this area, in preparation
> for this BoF, is to list their current efforts, like Ian et al.
> did in:
> 
>   https://linuxplumbersconf.org/event/4/contributions/291/
> 
> - Arnaldo
>  
>> Best regards,
>> Alexey
>>
>> [1] https://marc.info/?l=linux-kernel&m=154149439404555&w=2
>> [2] https://marc.info/?l=linux-kernel&m=154817912621465&w=2
>> [3] https://marc.info/?l=linux-kernel&m=155293062518459&w=2
>> [4] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> 


Re: [PATCH v2 1/6] ASoC: atmel: enable SOC_SSC_PDC and SOC_SSC_DMA in Kconfig

2019-08-26 Thread Alexandre Belloni
On 24/08/2019 22:26:52+0200, Michał Mirosław wrote:
> Allow SSC to be used on platforms described using audio-graph-card
> in Device Tree.
> 
> Signed-off-by: Michał Mirosław 
Acked-by: Alexandre Belloni 

> 
> ---
>  v2: extended to PDC mode
>  reworked and fixed Kconfig option dependencies
> 
> ---
>  sound/soc/atmel/Kconfig | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/sound/soc/atmel/Kconfig b/sound/soc/atmel/Kconfig
> index 06c1d5ce642c..f118c229ed82 100644
> --- a/sound/soc/atmel/Kconfig
> +++ b/sound/soc/atmel/Kconfig
> @@ -12,25 +12,31 @@ if SND_ATMEL_SOC
>  config SND_ATMEL_SOC_PDC
>   tristate
>   depends on HAS_DMA
> - default m if SND_ATMEL_SOC_SSC_PDC=m && SND_ATMEL_SOC_SSC=m
> - default y if SND_ATMEL_SOC_SSC_PDC=y || (SND_ATMEL_SOC_SSC_PDC=m && SND_ATMEL_SOC_SSC=y)
> -
> -config SND_ATMEL_SOC_SSC_PDC
> - tristate
>  
>  config SND_ATMEL_SOC_DMA
>   tristate
>   select SND_SOC_GENERIC_DMAENGINE_PCM
> - default m if SND_ATMEL_SOC_SSC_DMA=m && SND_ATMEL_SOC_SSC=m
> - default y if SND_ATMEL_SOC_SSC_DMA=y || (SND_ATMEL_SOC_SSC_DMA=m && SND_ATMEL_SOC_SSC=y)
> -
> -config SND_ATMEL_SOC_SSC_DMA
> - tristate
>  
>  config SND_ATMEL_SOC_SSC
>   tristate
> - default y if SND_ATMEL_SOC_SSC_DMA=y || SND_ATMEL_SOC_SSC_PDC=y
> - default m if SND_ATMEL_SOC_SSC_DMA=m || SND_ATMEL_SOC_SSC_PDC=m
> +
> +config SND_ATMEL_SOC_SSC_PDC
> + tristate "SoC PCM DAI support for AT91 SSC controller using PDC"
> + depends on ATMEL_SSC
> + select SND_ATMEL_SOC_PDC
> + select SND_ATMEL_SOC_SSC
> + help
> +   Say Y or M if you want to add support for Atmel SSC interface
> +   in PDC mode configured using audio-graph-card in device-tree.
> +
> +config SND_ATMEL_SOC_SSC_DMA
> + tristate "SoC PCM DAI support for AT91 SSC controller using DMA"
> + depends on ATMEL_SSC
> + select SND_ATMEL_SOC_DMA
> + select SND_ATMEL_SOC_SSC
> + help
> +   Say Y or M if you want to add support for Atmel SSC interface
> +   in DMA mode configured using audio-graph-card in device-tree.
>  
>  config SND_AT91_SOC_SAM9G20_WM8731
>   tristate "SoC Audio support for WM8731-based At91sam9g20 evaluation board"
> -- 
> 2.20.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [PATCH 00/14] per memcg lru_lock

2019-08-26 Thread Alex Shi



On 2019/8/26 4:39 PM, Konstantin Khlebnikov wrote:
>>>
> because they mix pages from different cgroups.
> 
> pagevec_lru_move_fn and friends need a better implementation:
> either sorting pages or splitting vectors on a per-lruvec basis.

Right, this should be the next step to improve. Maybe we could try the 
per-lruvec pagevec?

Thanks
Alex


Re: [PATCH v2] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Matthew Wilcox
On Mon, Aug 26, 2019 at 09:43:24AM -0400, Waiman Long wrote:
> On 8/26/19 9:25 AM, Matthew Wilcox wrote:
> > On Mon, Aug 26, 2019 at 08:43:36AM -0400, Waiman Long wrote:
> >> It was found that on a dual-socket x86-64 system with nvdimm, reading
> >> /proc/kpagecount may cause the system to panic:
> >>
> >> ===
> >> [   79.917682] BUG: unable to handle page fault for address: fffe
> >> [   79.924558] #PF: supervisor read access in kernel mode
> >> [   79.929696] #PF: error_code(0x) - not-present page
> >> [   79.934834] PGD 87b60d067 P4D 87b60d067 PUD 87b60f067 PMD 0
> >> [   79.940494] Oops:  [#1] SMP NOPTI
> >> [   79.944157] CPU: 89 PID: 3455 Comm: cp Not tainted 5.3.0-rc5-test+ #14
> >> [   79.950682] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS 2.2.11 06/13/2019
> >> [   79.958246] RIP: 0010:kpagecount_read+0xdb/0x1a0
> >> [   79.962859] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 f7 48 c1 e7 06 48 03 3d fe da de 00 74 1d 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 02 75 06 83 7f 30 80 7d 62 31 c0 4c 89 f9 e8 5d c9
> >> [   79.981603] RSP: 0018:b0d9c950fe70 EFLAGS: 00010202
> >> [   79.986830] RAX: fffe RBX: 8beebe5383c0 RCX: b0d9c950ff00
> >> [   79.993963] RDX: 0001 RSI: 7fd85b29e000 RDI: e77a2200
> >> [   80.001095] RBP: 0002 R08: 0001 R09: 
> >> [   80.008226] R10:  R11: 0001 R12: 7fd85b29e000
> >> [   80.015358] R13: 893f0480 R14: 0088 R15: 7fd85b29e000
> >> [   80.022491] FS:  7fd85b312800() GS:8c359fb0() knlGS:
> >> [   80.030576] CS:  0010 DS:  ES:  CR0: 80050033
> >> [   80.036321] CR2: fffe CR3: 004f54a38001 CR4: 007606e0
> >> [   80.043455] DR0:  DR1:  DR2: 
> >> [   80.050586] DR3:  DR6: fffe0ff0 DR7: 0400
> >> [   80.060428] Call Trace:
> >> [   80.062877]  proc_reg_read+0x39/0x60
> >> [   80.066459]  vfs_read+0x91/0x140
> >> [   80.069686]  ksys_read+0x59/0xd0
> >> [   80.072922]  do_syscall_64+0x59/0x1e0
> >> [   80.076588]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [   80.081637] RIP: 0033:0x7fd85a7f5d75
> >> ===
> >>
> >> It turns out the panic was caused by the kpagecount_read() function
> >> hitting an uninitialized page structure at PFN 0x88 where all its
> >> fields were set to -1. The compound_head value of -1 will mislead the
> >> kernel to treat -2 as a pointer to the head page of the compound page
> >> leading to the crash.
> >>
> >> The system has 12 GB of nvdimm ranging from PFN 0x88-0xb7.
> >> However, only PFN 0x88c200-0xb7 are released by the nvdimm
> >> driver to the kernel and initialized. IOW, PFN 0x88-0x88c1ff
> >> remain uninitialized. Perhaps these 196 MB of nvdimm are reserved for
> >> internal use.
> >>
> >> To fix the panic, we need to find out if a page structure has been
> >> initialized. This is done now by checking if the PFN is in the range
> >> of a memory zone assuming that pages in a zone are either correctly
> >> marked as not present in the mem_section structure or have their page
> >> structures initialized.
> >>
> >> Signed-off-by: Waiman Long 
> >> ---
> >>  fs/proc/page.c | 68 +++---
> >>  1 file changed, 65 insertions(+), 3 deletions(-)
> > Would this not work equally well?
> >
> > +++ b/fs/proc/page.c
> > @@ -46,7 +46,8 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
> > ppage = pfn_to_page(pfn);
> > else
> > ppage = NULL;
> > -   if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> > +   if (!ppage || PageSlab(ppage) || page_has_type(ppage) ||
> > +   PagePoisoned(ppage))
> > pcount = 0;
> > else
> > pcount = page_mapcount(ppage);
> >
> That was my initial thought too. However, I couldn't find where the
> memory of the uninitialized page structures would have been initialized.
> The only thing I found is that the page structures are indeed poisoned
> when vm_debug is on. Without that, it is probably just whatever content
> the memory happens to have when the kernel boots.
> 
> It just happens that on the test system I used, the memory of those page
> structures turned out to be -1. It may be different on other systems,
> which can still crash the kernel without being detected by the
> PagePoisoned() check. That is why I settled on the current scheme, which
> is more general and doesn't rely on the memory being initialized in a
> certain way.

page_init_poison().

Since you have a system 
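
The failure mode discussed in this thread (an all-ones page structure whose
compound_head is read as a tail pointer to -2) can be reproduced with a toy
model. This is only a sketch: struct toy_page and the toy_* helpers are
hypothetical and keep just the two fields the discussion touches; the bit-0
convention and the all-ones poison pattern are taken from the description
above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy page descriptor; not the kernel's struct page. */
struct toy_page {
	uintptr_t flags;
	uintptr_t compound_head;	/* bit 0 set => tail page, rest is the head pointer */
};

/* Same shape as the compound-head lookup: strip bit 0 if it is set. */
static uintptr_t toy_compound_head(const struct toy_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;	/* -1 becomes -2, a wild "head page" address */
	return (uintptr_t)page;
}

/* Idea behind a poison check: all-ones flags mean "never initialized". */
static bool toy_page_poisoned(const struct toy_page *page)
{
	return page->flags == UINTPTR_MAX;
}
```

Checking toy_page_poisoned() before calling toy_compound_head() avoids
following the bogus pointer, but only if the memory really holds the
all-ones pattern, which is the concern raised in the quoted reply.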

RE: [PATCH v3 0/8] AMD64 EDAC fixes

2019-08-26 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, August 23, 2019 10:38 AM
> To: Ghannam, Yazen 
> Cc: Adam Borowski ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/8] AMD64 EDAC fixes
> 
> On Fri, Aug 23, 2019 at 03:28:59PM +, Ghannam, Yazen wrote:
> > Boris, Do you think it'd be appropriate to change the return values
> > for some cases?
> >
> > For example, ECC disabled is a hardware configuration. This doesn't
> > mean that the module failed any operations in this case.
> >
> > In other words, the module checks for a feature. If the feature is not
> > present, then return without failure (and maybe give a message).
> 
> That makes sense but AFAICT if probe_one_instance() sees that ECC is not
> enabled, it returns 0.
> 
> The "if (!edac_has_mcs())" check later is to verify that at least one
> instance was loaded successfully and, if not, then return an error.
> 
> So where does it return failure?
> 

I was tracking down the failure with ECC disabled, and that seems to be it.

So I think we should return 0 "if (!edac_has_mcs())", because we'd only get
there if ECC is disabled on all nodes and there wasn't some other initialization
error.

I'll send a patch for this soon.

Adam, would you mind testing this patch?

Thanks,
Yazen
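
The return-value change proposed above can be modelled in a few lines. This
is a toy sketch, not the amd64_edac code: toy_edac_init() and its parameters
are hypothetical, other initialization errors are left out, and the use of
-ENODEV for the current behaviour is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Toy module init: count the nodes that have ECC enabled and decide what
 * to return when no instance was registered at all.
 */
static int toy_edac_init(const bool *ecc_enabled, int nodes, bool old_behaviour)
{
	int registered = 0;

	for (int i = 0; i < nodes; i++)
		if (ecc_enabled[i])
			registered++;	/* a real driver would register an instance */

	if (!registered) {
		/*
		 * ECC is disabled on every node. That is a hardware
		 * configuration, not a failed operation, so the proposed
		 * behaviour returns 0 instead of an error.
		 */
		return old_behaviour ? -ENODEV : 0;
	}
	return 0;
}
```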


Re: [PATCH 03/14] lru/memcg: using per lruvec lock in un/lock_page_lru

2019-08-26 Thread Alex Shi



On 2019/8/26 4:30 PM, Konstantin Khlebnikov wrote:
>>
>>   
> 
> What protects lruvec from freeing at this point?
> After the lruvec is resolved, the page could be moved and the cgroup deleted.
> 
> In these old patches I used RCU for that: 
> https://lkml.org/lkml/2012/2/20/276
> The pointer to lruvec should be resolved with irqs disabled.
> Not sure this works these days.

Thanks for the reminder! I will reconsider this point and come up with changes.

Thanks
Alex


Re: [PATCH v1] [semaphore] Removed redundant code from semaphore's down family of function

2019-08-26 Thread Peter Zijlstra
On Sat, Aug 24, 2019 at 09:20:59AM +0530, Satendra Singh Thakur wrote:
> On Thu, 22 Aug 2019 17:51:12 +0200, Peter Zijlstra wrote:
> > On Mon, Aug 12, 2019 at 07:18:59PM +0530, Satendra Singh Thakur wrote:
> > > -The semaphore code has four funcs
> > > down,
> > > down_interruptible,
> > > down_killable,
> > > down_timeout
> > > -These four funcs have almost similar code except that
> > > they all call lower level function __down_xyz.
> > > -This lower level func in-turn call inline func
> > > __down_common with appropriate arguments.
> > > -This patch creates a common macro for above family of funcs
> > > so that duplicate code is eliminated.
> > > -Also, __down_common has been made noinline so that code is
> > > functionally similar to previous one
> > > -For example, earlier down_killable would call __down_killable
> > > , which in-turn would call inline func __down_common
> > > Now, down_killable calls noinline __down_common directly
> > > through a macro
> > > -The funcs __down_interruptible, __down_killable etc have been
> > > removed as they were just wrapper to __down_common
> >
> > The above is unreadable and seems to lack a reason for this change.
> Hi Mr Peter,
> Thanks for the comments.
> I will try to explain it further:
> 
> The semaphore has four functions named down*.
> The call flow of the functions is
> 
> down* > __down* > inline __down_common
> 
> The code of down* and __down* is redundant/duplicate except that
> the __down_common is called with different arguments from __down*
> functions.
> 
> This patch defines a macro down_common which contain this common
> code of all down* functions.
> 
> new call flow is
> 
> down* > noinline __down_common (through a macro down_common).
> 
> > AFAICT from the actual patch, you're destroying the explicit
> > instantiation of the __down*() functions
> > through constant propagation into __down_common().
> Instead of instantiation of __down* functions, we are instantiating
> __down_common; is that a problem?

Then you lose the constant propagation into __down_common and you'll
not DCE the signal and/or timeout code. The function even has a comment
explaining that. That said; the timeout DCE seems suboptimal.

Also; I'm thinking you can do it without that ugly CPP.

---
Subject: locking/semaphore: Simplify code flow

Ever since we got rid of the per arch assembly the structure of the
semaphore code is more complicated than it needs to be.

Specifically, the slow path calling convention is no more. With that,
Satendra noted that all functions calling __down_common() (aka the slow
path) were basically the same.

Fold the lot.

(XXX, we should probably move the schedule_timeout() thing into its own
patch)

Suggested-by: Satendra Singh Thakur 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/sched.h  |  13 +-
 kernel/locking/semaphore.c | 105 +
 kernel/time/timer.c|  44 +++
 3 files changed, 58 insertions(+), 104 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f0edee94834a..8f09e8cebe5a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -214,7 +214,18 @@ extern void scheduler_tick(void);
 
 #define	MAX_SCHEDULE_TIMEOUT		LONG_MAX
 
-extern long schedule_timeout(long timeout);
+extern long __schedule_timeout(long timeout);
+
+static inline long schedule_timeout(long timeout)
+{
+	if (timeout == MAX_SCHEDULE_TIMEOUT) {
+		schedule();
+		return timeout;
+	}
+
+	return __schedule_timeout(timeout);
+}
+
 extern long schedule_timeout_interruptible(long timeout);
 extern long schedule_timeout_killable(long timeout);
 extern long schedule_timeout_uninterruptible(long timeout);
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index d9dd94defc0a..4585fa3a0984 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -33,11 +33,24 @@
 #include 
 #include 
 
-static noinline void __down(struct semaphore *sem);
-static noinline int __down_interruptible(struct semaphore *sem);
-static noinline int __down_killable(struct semaphore *sem);
-static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static inline int
+__sched __down_slow(struct semaphore *sem, long state, long timeout);
+static void __up(struct semaphore *sem);
+
+static inline int __down(struct semaphore *sem, long state, long timeout)
+{
+	unsigned long flags;
+	int result = 0;
+
+	raw_spin_lock_irqsave(&sem->lock, flags);
+	if (likely(sem->count > 0))
+		sem->count--;
+	else
+		result = __down_slow(sem, state, timeout);
+	raw_spin_unlock_irqrestore(&sem->lock, flags);
+
+	return result;
+}
 
 /**
  * down - acquire the semaphore
@@ -52,14 +65,7 @@ static noinline void __up(struct semaphore *sem);
  */
 void down(struct semaphore *sem)
 {
- 
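
The point about constant propagation and DCE can be seen in a small
standalone model. A sketch only, assuming GCC or Clang attributes; struct
toy_sem and the toy_down*() functions are made up, and blocking is replaced
by returning -EAGAIN so the example stays single-threaded.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

struct toy_sem { int count; };

/*
 * Common path, forced inline so that every caller gets its own copy with
 * 'check_signals' and 'timeout' known at compile time.
 */
static inline __attribute__((always_inline)) int
toy_down_common(struct toy_sem *sem, bool check_signals, bool pending_signal,
		long timeout)
{
	if (sem->count > 0) {
		sem->count--;
		return 0;
	}
	if (check_signals && pending_signal)	/* dead when check_signals is false */
		return -EINTR;
	if (timeout <= 0)			/* dead when timeout is LONG_MAX */
		return -ETIME;
	return -EAGAIN;	/* the real code would sleep and retry here */
}

/* Explicit instantiations: the constants propagate into the inlined body. */
static __attribute__((noinline)) int toy_down(struct toy_sem *sem)
{
	return toy_down_common(sem, false, false, LONG_MAX);
}

static __attribute__((noinline)) int
toy_down_interruptible(struct toy_sem *sem, bool pending_signal)
{
	return toy_down_common(sem, true, pending_signal, LONG_MAX);
}
```

If toy_down_common() were made noinline and called with run-time arguments,
both the signal check and the timeout check would have to stay in the single
shared copy.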

Re: [PATCH v2 3/6] ASoC: atmel_ssc_dai: implement left-justified data mode

2019-08-26 Thread Codrin.Ciubotariu
> Enable support for left-justified data mode for SSC-codec link.
> 
> Signed-off-by: Michał Mirosław 
> 
> ---
>   v2: rebased

I noticed you also added a description and you removed two comments from 
v1. Please include all the changes in the changelog.
I already added my 'Reviewed-by' in v1, but since there are some more 
changes from the previous version:

Reviewed-by: Codrin Ciubotariu 

Thanks and best regards,
Codrin

> 
> ---
>   sound/soc/atmel/atmel_ssc_dai.c | 9 +
>   1 file changed, 9 insertions(+)
> 
> diff --git a/sound/soc/atmel/atmel_ssc_dai.c b/sound/soc/atmel/atmel_ssc_dai.c
> index 7dc6ec9b8c7a..48e9eef34c0f 100644
> --- a/sound/soc/atmel/atmel_ssc_dai.c
> +++ b/sound/soc/atmel/atmel_ssc_dai.c
> @@ -564,6 +564,15 @@ static int atmel_ssc_hw_params(struct snd_pcm_substream *substream,
>   
>   switch (ssc_p->daifmt & SND_SOC_DAIFMT_FORMAT_MASK) {
>   
> + case SND_SOC_DAIFMT_LEFT_J:
> + fs_osync = SSC_FSOS_POSITIVE;
> + fs_edge = SSC_START_RISING_RF;
> +
> + rcmr =SSC_BF(RCMR_STTDLY, 0);
> + tcmr =SSC_BF(TCMR_STTDLY, 0);
> +
> + break;
> +
>   case SND_SOC_DAIFMT_I2S:
>   fs_osync = SSC_FSOS_NEGATIVE;
>   fs_edge = SSC_START_FALLING_RF;
> 



Re: numlist_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation

2019-08-26 Thread Petr Mladek
On Mon 2019-08-26 10:34:36, Andrea Parri wrote:
> > > + /*
> > > +  * bA:
> > > +  *
> > > +  * Setup the node to be a list terminator: next_id == id.
> > > +  */
> > > + WRITE_ONCE(n->next_id, id);
> > 
> > Do we need WRITE_ONCE() here?
> > Both "n" and "id" are given as parameters and do not change.
> > The assigment must be done before "id" is set as nl->head_id.
> > The ordering is enforced by cmpxchg_release().
> 
> (Disclaimer: this is still a very much debated issue...)
> 
> According to the LKMM, this question boils down to the question:
> 
>   Is there "ordering"/synchronization between the above access and
>   the "matching accesses" bF and aA' to the same location?
> 
> Again according to the LKMM's analysis, such synchronization is provided
> by the RELEASE -> "reads-from" -> ADDR relation.  (Encoding address dep.
> in litmus tests is kind of tricky but possible, e.g., for the pattern in
> question, we could write/model as follows:
> 
> C S+ponarelease+addroncena
> 
> {
>   int *y = &a;
> }
> 
> P0(int *x, int **y, int *a)
> {
>   int *r0;
> 
>   *x = 2;
>   r0 = cmpxchg_release(y, a, x);
> }
> 
> P1(int *x, int **y)
> {
>   int *r0;
>
>   r0 = READ_ONCE(*y);
>   *r0 = 1;
> }
> 
> exists (1:r0=x /\ x=2)

Which r0 does the above exists rule refer to, please?
Do both P0 and P1 define r0 on purpose?

> Then
> 
>   $ herd7 -conf linux-kernel.cfg S+ponarelease+addroncena
>   Test S+ponarelease+addroncena Allowed
>   States 2
>   1:r0=a; x=2;
>   1:r0=x; x=1;
>   No
>   Witnesses
>   Positive: 0 Negative: 2
>   Condition exists (1:r0=x /\ x=2)
>   Observation S+ponarelease+addroncena Never 0 2
>   Time S+ponarelease+addroncena 0.01
>   Hash=7eaf7b5e95419a3c352d7fd50b9cd0d5
> 
> that is, the test is not racy and the "exists" clause is not satisfiable
> in the LKMM.  Notice that _if_ the READ_ONCE(*y) in P1 were replaced by a
> plain read, then we would obtain:
> 
>   Test S+ponarelease+addrnana Allowed
>   States 2
>   1:r0=x; x=1;
>   1:r0=x; x=2;

Do you have any explanation of how r0=x; x=2; could happen, please?

Does the omitted READ_ONCE allow r0 = (*y) to be done twice,
before and after *r0 = 1?
Or can the two operations in P1 be executed in any order?

I am sorry if it is obvious. Feel free to ask me to re-read Paul's
articles on LWN a few more times or point me to other resources.



>   Ok
>   Witnesses
>   Positive: 1 Negative: 1
>   Flag data-race  [ <-- the LKMM warns about a data-race ]
>   Condition exists (1:r0=x /\ x=2)
>   Observation S+ponarelease+addrnana Sometimes 1 1
>   Time S+ponarelease+addrnana 0.00
>   Hash=a61acf2e8e51c2129d33ddf5e4c76a49
> 
> N.B. This analysis generally depends on the assumption that every marked
> access (e.g., the cmpxchg_release() called out above and the READ_ONCE()
> heading the address dependencies) is _single-copy atomic_, an assumption
> which has been recently shown to _not_ be valid in such generality:
> 
>   https://lkml.kernel.org/r/20190821103200.kpufwtviqhpbuv2n@willie-the-truck

So, it might be even worse. Do I get it correctly?

Best Regards,
Petr
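
For readers who prefer C to litmus syntax, the two processes of the quoted
test can be written with C11 atomics. This is a sketch under assumptions,
not a replacement for the herd7 run: toy_p0() and toy_p1() are made-up
names, a release compare-exchange stands in for cmpxchg_release(), and an
acquire load is used where the litmus test relies on READ_ONCE() plus an
address dependency.

```c
#include <stdatomic.h>

/* P0: plain store to *x, then publish x through a RELEASE cmpxchg on *y. */
static int *toy_p0(int *x, _Atomic(int *) *y, int *a)
{
	int *expected = a;

	*x = 2;
	atomic_compare_exchange_strong_explicit(y, &expected, x,
						memory_order_release,
						memory_order_relaxed);
	return expected;	/* old value of *y, as cmpxchg_release() returns */
}

/* P1: load the pointer, then store through it. */
static int *toy_p1(_Atomic(int *) *y)
{
	int *r0 = atomic_load_explicit(y, memory_order_acquire);

	*r0 = 1;
	return r0;
}
```

Each function has its own r0, just as each litmus process has its own
registers; the "1:" prefix in the exists clause selects the one in P1.
Running toy_p0() and then toy_p1() sequentially gives the "1:r0=x; x=1"
state from the first listing. A sequential run cannot show which outcomes
the memory model forbids.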



Re: [PATCH] perf script: Fix memory leaks in list_scripts()

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Sun, Aug 25, 2019 at 11:06:25PM -0500, Gustavo A. R. Silva escreveu:
> Hi all,
> 
> Friendly ping (second one after 4 months):
> 
> Who can take this?

Sorry for the delay, finally applied, thanks!

- Arnaldo
 
> Thanks
> --
> Gustavo
> 
> On 4/22/19 10:14 AM, Gustavo A. R. Silva wrote:
> > Hi all,
> > 
> > Friendly ping:
> > 
> > Who can take this?
> > 
> > Thanks
> > 
> > On 4/8/19 11:27 AM, Gustavo A. R. Silva wrote:
> >> In case memory resources for *buf* and *paths* were allocated,
> >> jump to *out* and release them before return.
> >>
> >> Addresses-Coverity-ID: 1444328 ("Resource leak")
> >> Fixes: 6f3da20e151f ("perf report: Support builtin perf script in scripts menu")
> >> Signed-off-by: Gustavo A. R. Silva 
> >> ---
> >>  tools/perf/ui/browsers/scripts.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/perf/ui/browsers/scripts.c b/tools/perf/ui/browsers/scripts.c
> >> index 27cf3ab88d13..f4edb18f67ec 100644
> >> --- a/tools/perf/ui/browsers/scripts.c
> >> +++ b/tools/perf/ui/browsers/scripts.c
> >> @@ -131,8 +131,10 @@ static int list_scripts(char *script_name, bool *custom,
> >>int key = ui_browser__input_window("perf script command",
> >>"Enter perf script command line (without perf script prefix)",
> >>script_args, "", 0);
> >> -  if (key != K_ENTER)
> >> -  return -1;
> >> +  if (key != K_ENTER) {
> >> +  ret = -1;
> >> +  goto out;
> >> +  }
> >>sprintf(script_name, "%s script %s", perf, script_args);
> >>} else if (choice < num + max_std) {
> >>strcpy(script_name, paths[choice]);
> >>

-- 

- Arnaldo
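
For readers following the archive, the cleanup pattern behind this fix can be sketched in standalone C. This is only an illustration under stated assumptions: pick_script(), tracked_alloc(), tracked_free() and live_allocs are hypothetical names, not code from tools/perf. The point is that once buffers are allocated, every exit goes through a single "out" label that releases them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so a caller can observe leaks; not part of perf. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/*
 * Illustrative stand-in for list_scripts(): two buffers are allocated up
 * front, and every exit after that point goes through "out" so that both
 * are released. Returns 0 on success, -1 on failure or cancel.
 */
static int pick_script(char *dst, size_t dst_len, int user_cancelled)
{
	char *buf, *paths;
	int ret = 0;

	buf = tracked_alloc(64);
	paths = tracked_alloc(64);
	if (!buf || !paths) {
		ret = -1;
		goto out;
	}

	if (user_cancelled) {
		/* A bare "return -1" here would leak buf and paths. */
		ret = -1;
		goto out;
	}

	strcpy(paths, "perf script");
	snprintf(dst, dst_len, "%s", paths);
out:
	tracked_free(paths);
	tracked_free(buf);
	return ret;
}
```

Calling pick_script() with user_cancelled set returns -1 and leaves live_allocs at zero, which is the behaviour the patch restores for the K_ENTER check.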


Re: [PATCH] PCI: Add missing link delays required by the PCIe spec

2019-08-26 Thread Bjorn Helgaas
On Mon, Aug 26, 2019 at 01:17:26PM +0300, Mika Westerberg wrote:
> On Fri, Aug 23, 2019 at 09:12:54PM -0500, Bjorn Helgaas wrote:
> > Hi Mika,
> 
> Hi,
> 
> > I'm trying to figure out specifically why we need this and where it
> > should go.  Questions below.
> 
> Thanks for looking at this.
> 
> > On Wed, Aug 21, 2019 at 03:45:19PM +0300, Mika Westerberg wrote:
> > > Currently Linux does not follow PCIe spec regarding the required delays
> > > after reset. A concrete example is a Thunderbolt add-in-card that
> > > consists of a PCIe switch and two PCIe endpoints:
> > > 
> > >   +-1b.0-[01-6b]00.0-[02-6b]--+-00.0-[03]00.0 TBT controller
> > >   +-01.0-[04-36]-- DS hotplug port
> > >   +-02.0-[37]00.0 xHCI controller
> > >   \-04.0-[38-6b]-- DS hotplug port
> > > 
> > > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > > gen3 so they support 8GT/s link speeds.
> > > 
> > > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > > 
> > >   pcieport :00:1b.0: power state changed by ACPI to D3cold
> > > 
> > > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > > PCIe switch is put to reset and its power is re-applied. This means that
> > > we must follow the rules in PCIe 4.0 section 6.6.1.
> > > 
> > > For the PCIe gen3 ports we are dealing with here, the following applies:
> > > 
> > >   With a Downstream Port that supports Link speeds greater than 5.0
> > >   GT/s, software must wait a minimum of 100 ms after Link training
> > >   completes before sending a Configuration Request to the device
> > >   immediately below that Port. Software can determine when Link training
> > >   completes by polling the Data Link Layer Link Active bit or by setting
> > >   up an associated interrupt (see Section 6.7.3.3).
> > > 
> > > Translating this into the above topology we would need to do this (DLLLA
> > > stands for Data Link Layer Link Active):
> > > 
> > >   pcieport :00:1b.0: wait for 100ms after DLLLA is set before access 
> > > to :01:00.0
> > >   pcieport :02:00.0: wait for 100ms after DLLLA is set before access 
> > > to :03:00.0
> > >   pcieport :02:02.0: wait for 100ms after DLLLA is set before access 
> > > to :37:00.0
> > > 
> > > I've instrumented the kernel with additional logging so we can see the
> > > actual delays the kernel performs:
> > > ...

> > >   xhci_hcd :37:00.0: restoring config space at offset 0x10 (was 0x0, 
> > > writing 0x73f0)
> > >   ...
> > >   thunderbolt :03:00.0: restoring config space at offset 0x14 (was 
> > > 0x0, writing 0x8a04)
> > > 
> > > This is even worse. None of the mandatory delays are performed. If this
> > > would be S3 instead of s2idle then according to PCI FW spec 3.2 section
> > > 4.6.8.  there is a specific _DSM that allows the OS to skip the delays
> > > but this platform does not provide the _DSM and does not go to S3 anyway
> > > so no firmware is involved that could already handle these delays.
> > > 
> > > In this particular Intel Coffee Lake platform these delays are not
> > > actually needed because there is an additional delay as part of the ACPI
> > > power resource that is used to turn on power to the hierarchy but since
> > > that additional delay is not required by any of the standards (PCIe, ACPI)
> > 
> > So it sounds like this Coffee Lake accidentally works because of
> > unrelated firmware delay that's not actually required, or at least not
> > related to the delay required by PCIe?
> 
> Correct. The root port ACPI Power Resource includes quite a long delay
> which allows the links to be trained before the _ON method is finished.
> 
> > I did notice that we don't implement all of _DSM function 9 and the
> > parts we're missing look like they could be relevant.
> 
> AFAICT that _DSM is only for lowering the delays, not increasing them.
> If the platform does not provide it the OS must adhere to all the timing
> requirements of the PCIe spec (PCI FW 3.2 section 4.6.9). Here we are
> actually trying to do just that :)

Yep, true.

> > > it is not present in the Intel Ice Lake, for example where missing the
> > > mandatory delays causes pciehp to start tearing down the stack too early
> > > (links are not yet trained).
> > 
> > I'm guessing the Coffee Lake/Ice Lake distinction is not really
> > relevant and the important thing is that something about Ice Lake is
> > faster and reduces the accidental delay to the point that things stop
> > working.
> > 
> > So probably the same thing could happen on Coffee Lake or any other
> > system if it had different firmware.
> 
> Indeed.

Thanks for confirming that.  I would probably abstract this somehow in
the commit log so people don't have the impression that we're fixing
something specific to Ice Lake.
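
As a rough illustration of the rule quoted from PCIe 4.0 section 6.6.1 (poll DLLLA, then wait 100 ms before the first Configuration Request), here is a small simulation in plain C. It is a sketch only: struct port_model, the 1 ms polling step and the timeout parameter are assumptions made for the example, time is simulated rather than slept, and none of this is the kernel implementation under discussion.

```c
#include <stdbool.h>

/* Minimum wait after link training completes, from the quoted spec text. */
#define PCIE_POST_TRAINING_DELAY_MS 100

/* Hypothetical model of a downstream port; not a kernel structure. */
struct port_model {
	unsigned int link_active_at_ms;	/* time at which DLLLA becomes set */
};

static bool dllla_set(const struct port_model *p, unsigned int now_ms)
{
	return now_ms >= p->link_active_at_ms;
}

/*
 * Poll DLLLA in 1 ms steps starting at start_ms. Once it is set, add the
 * mandatory delay. Returns the earliest time (in ms) at which a
 * Configuration Request may be sent to the device below the port, or -1
 * if the link did not come up within timeout_ms (an assumed parameter).
 */
static int first_config_access_ms(const struct port_model *p,
				  unsigned int start_ms,
				  unsigned int timeout_ms)
{
	unsigned int now = start_ms;

	while (!dllla_set(p, now)) {
		if (now - start_ms >= timeout_ms)
			return -1;
		now++;
	}
	return (int)(now + PCIE_POST_TRAINING_DELAY_MS);
}
```

In this model a link that trains 30 ms after resume allows the first config access at 130 ms, regardless of any unrelated firmware delay that may or may not have happened earlier.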

> > > There is also one reported case (see the bugzilla link below) where the
> > > missing delay causes 

Re: [PATCH] KVM: x86: Don't update RIP or do single-step on faulting emulation

2019-08-26 Thread Sean Christopherson
On Fri, Aug 23, 2019 at 03:46:20PM -0700, Andy Lutomirski wrote:
> On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
>  wrote:
> >
> > Don't advance RIP or inject a single-step #DB if emulation signals a
> > fault.  This logic applies to all state updates that are conditional on
> > clean retirement of the emulation instruction, e.g. updating RFLAGS was
> > previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> > EFLAGS on faulting emulation").
> >
> > Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> > ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> > fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> > invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> 
> EFLAGS.TF=1

It's always some mundane detail...


Re: [PATCH v2 2/6] ASoC: atmel_ssc_dai: rework DAI format configuration

2019-08-26 Thread Codrin.Ciubotariu
On 24.08.2019 23:26, Michał Mirosław wrote:
> Rework DAI format calculation in preparation for adding more formats
> later. As a side-effect this enables all CBM/CBS x CFM/CFS combinations
> for supported formats. (Note: the additional modes are not tested.)

The only mode added (and not tested) is for SND_SOC_DAIFMT_DSP_A |
SND_SOC_DAIFMT_CBM_CFS. I would mention it explicitly, instead of 
something generic. Also, SND_SOC_DAIFMT_CBM_CFS is not supported for any 
format, please see atmel_ssc_hw_rule_rate().

> 
> Note: this changes FSEDGE to POSITIVE for I2S CBM_CFS mode as the TXSYN
> interrupt is not used anyway.
> 
> Signed-off-by: Michał Mirosław 
> 
> ---
>   v2: added note about extended modes' status
>   incorporated common FS (LRCLK) configuration
> 
> ---
>   sound/soc/atmel/atmel_ssc_dai.c | 286 +---
>   1 file changed, 80 insertions(+), 206 deletions(-)
> 
> diff --git a/sound/soc/atmel/atmel_ssc_dai.c b/sound/soc/atmel/atmel_ssc_dai.c
> index 6f89483ac88c..7dc6ec9b8c7a 100644
> --- a/sound/soc/atmel/atmel_ssc_dai.c
> +++ b/sound/soc/atmel/atmel_ssc_dai.c
> @@ -471,7 +471,7 @@ static int atmel_ssc_hw_params(struct snd_pcm_substream 
> *substream,
>   int dir, channels, bits;
>   u32 tfmr, rfmr, tcmr, rcmr;
>   int ret;
> - int fslen, fslen_ext;
> + int fslen, fslen_ext, fs_osync, fs_edge;
>   u32 cmr_div;
>   u32 tcmr_period;
>   u32 rcmr_period;
> @@ -558,226 +558,36 @@ static int atmel_ssc_hw_params(struct 
> snd_pcm_substream *substream,
>   /*
>* Compute SSC register settings.
>*/
> - switch (ssc_p->daifmt
> - & (SND_SOC_DAIFMT_FORMAT_MASK | SND_SOC_DAIFMT_MASTER_MASK)) {
>   
> - case SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_CBS_CFS:
> - /*
> -  * I2S format, SSC provides BCLK and LRC clocks.
> -  *
> -  * The SSC transmit and receive clocks are generated
> -  * from the MCK divider, and the BCLK signal
> -  * is output on the SSC TK line.
> -  */
> -
> - if (bits > 16 && !ssc->pdata->has_fslen_ext) {
> - dev_err(dai->dev,
> - "sample size %d is too large for SSC device\n",
> - bits);
> - return -EINVAL;
> - }
> -
> - fslen_ext = (bits - 1) / 16;
> - fslen = (bits - 1) % 16;
> -
> - rcmr =SSC_BF(RCMR_PERIOD, rcmr_period)
> - | SSC_BF(RCMR_STTDLY, START_DELAY)
> - | SSC_BF(RCMR_START, SSC_START_FALLING_RF)
> - | SSC_BF(RCMR_CKI, SSC_CKI_RISING)
> - | SSC_BF(RCMR_CKO, SSC_CKO_NONE)
> - | SSC_BF(RCMR_CKS, SSC_CKS_DIV);
> -
> - rfmr =SSC_BF(RFMR_FSLEN_EXT, fslen_ext)
> - | SSC_BF(RFMR_FSEDGE, SSC_FSEDGE_POSITIVE)
> - | SSC_BF(RFMR_FSOS, SSC_FSOS_NEGATIVE)
> - | SSC_BF(RFMR_FSLEN, fslen)
> - | SSC_BF(RFMR_DATNB, (channels - 1))
> - | SSC_BIT(RFMR_MSBF)
> - | SSC_BF(RFMR_LOOP, 0)
> - | SSC_BF(RFMR_DATLEN, (bits - 1));
> -
> - tcmr =SSC_BF(TCMR_PERIOD, tcmr_period)
> - | SSC_BF(TCMR_STTDLY, START_DELAY)
> - | SSC_BF(TCMR_START, SSC_START_FALLING_RF)
> - | SSC_BF(TCMR_CKI, SSC_CKI_FALLING)
> - | SSC_BF(TCMR_CKO, SSC_CKO_CONTINUOUS)
> - | SSC_BF(TCMR_CKS, SSC_CKS_DIV);
> -
> - tfmr =SSC_BF(TFMR_FSLEN_EXT, fslen_ext)
> - | SSC_BF(TFMR_FSEDGE, SSC_FSEDGE_POSITIVE)
> - | SSC_BF(TFMR_FSDEN, 0)
> - | SSC_BF(TFMR_FSOS, SSC_FSOS_NEGATIVE)
> - | SSC_BF(TFMR_FSLEN, fslen)
> - | SSC_BF(TFMR_DATNB, (channels - 1))
> - | SSC_BIT(TFMR_MSBF)
> - | SSC_BF(TFMR_DATDEF, 0)
> - | SSC_BF(TFMR_DATLEN, (bits - 1));
> - break;
> -
> - case SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_CBM_CFM:
> - /* I2S format, CODEC supplies BCLK and LRC clocks. */
> - rcmr =SSC_BF(RCMR_PERIOD, 0)
> - | SSC_BF(RCMR_STTDLY, START_DELAY)
> - | SSC_BF(RCMR_START, SSC_START_FALLING_RF)
> - | SSC_BF(RCMR_CKI, SSC_CKI_RISING)
> - | SSC_BF(RCMR_CKO, SSC_CKO_NONE)
> - | SSC_BF(RCMR_CKS, ssc->clk_from_rk_pin ?
> -SSC_CKS_PIN : SSC_CKS_CLOCK);
> -
> - rfmr =SSC_BF(RFMR_FSEDGE, SSC_FSEDGE_POSITIVE)
> - | SSC_BF(RFMR_FSOS, SSC_FSOS_NONE)
> - | SSC_BF(RFMR_FSLEN, 0)
> - | SSC_BF(RFMR_DATNB, (channels - 1))
> -

Re: [RESEND PATCH v3 14/20] mtd: spi_nor: Add a ->setup() method

2019-08-26 Thread Boris Brezillon
On Mon, 26 Aug 2019 13:38:48 +
Schrempf Frieder  wrote:

> On 26.08.19 14:40, Boris Brezillon wrote:
> > On Mon, 26 Aug 2019 12:08:58 +
> >  wrote:
> >   
> >> From: Tudor Ambarus 
> >>
> >> nor->params.setup() configures the SPI NOR memory. Useful for SPI NOR
> >> flashes that have peculiarities to the SPI NOR standard, e.g.
> >> different opcodes, specific address calculation, page size, etc.
> >> Right now the only user will be the S3AN chips, but other
> >> manufacturers can implement it if needed.
> >>
> >> Move spi_nor_setup() related code in order to avoid a forward
> >> declaration to spi_nor_default_setup().
> >>
> >> Reviewed-by: Boris Brezillon   
> > 
> > Nitpick: R-bs should normally be placed after your SoB.  
> 
> Just a question unrelated to the patch content:
> 
> I learned to add R-b tags after my SoB when submitting MTD patches, but 
> recently I submitted a patch to the serial subsystem and was told to put 
> my SoB last. Is there an "official" rule for this? And if so where to 
> find it?

Should match the order of addition: if you picked an existing patch that
had already received R-b/A-b tags and applied it to your tree you
should add your SoB at the end. But if you are the author, your SoB
should come first. At least that's the rule I follow :-).


[tip: x86/urgent] uprobes/x86: Fix detection of 32-bit user mode

2019-08-26 Thread tip-bot2 for Sebastian Mayr
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     9212ec7d8357ea630031e89d0d399c761421c83b
Gitweb:        https://git.kernel.org/tip/9212ec7d8357ea630031e89d0d399c761421c83b
Author:        Sebastian Mayr
AuthorDate:    Sun, 28 Jul 2019 17:26:17 +02:00
Committer:     Thomas Gleixner
CommitterDate: Mon, 26 Aug 2019 15:55:09 +02:00

uprobes/x86: Fix detection of 32-bit user mode

32-bit processes running on a 64-bit kernel are not always detected
correctly, causing the process to crash when uretprobes are installed.

The reason for the crash is that in_ia32_syscall() is used to determine the
process's mode, which only works correctly when called from a syscall.

In the case of uretprobes, however, the function is called from an exception
and always returns 'false' on a 64-bit kernel. In consequence this leads to
corruption of the process's return address.

Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
is correct in any situation.
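
The difference between the two checks can be sketched with a simplified user-space model. The types and field names below (regs_model, task_model, cs_is_64bit, in_compat_syscall) are hypothetical stand-ins, not kernel definitions; they only mirror the idea that in_ia32_syscall() reflects how a syscall was entered, while user_64bit_mode() looks at the saved register state.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; the real kernel types differ. */
struct regs_model {
	bool cs_is_64bit;	/* roughly what user_64bit_mode() derives from regs */
};

struct task_model {
	bool in_compat_syscall;	/* only meaningful while inside a syscall */
};

/* Old logic: "was a 32-bit syscall entered?" This is false in an exception. */
static int sizeof_long_old(const struct task_model *t)
{
	return t->in_compat_syscall ? 4 : 8;
}

/* New logic: "in which mode was user space running when it trapped?" */
static int sizeof_long_new(const struct regs_model *r)
{
	return r->cs_is_64bit ? 8 : 4;
}
```

For a 32-bit process that hits a uretprobe from an exception, the task is not inside a compat syscall, so the old model yields 8 while the register-based model yields 4.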

[ tglx: Add a comment and the following historical info ]

This should have been detected by the rename which happened in commit

  abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")

which states in the changelog:

The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.
.

and then it went and blindly renamed every call site.

Sadly enough this was already mentioned here:

   8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup 
adjust_ret_addr() and
arch_uretprobe_hijack_return_addr()")

where the changelog says:

TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().

and goes all the way back to:

0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")

Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
process on a 64bit kernel

Fixes: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")
Signed-off-by: Sebastian Mayr 
Signed-off-by: Thomas Gleixner 
Cc: Masami Hiramatsu 
Cc: Dmitry Safonov 
Cc: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20190728152617.7308-1...@sam.st
---
 arch/x86/kernel/uprobes.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index d8359eb..8cd745e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -508,9 +508,12 @@ struct uprobe_xol_ops {
void(*abort)(struct arch_uprobe *, struct pt_regs *);
 };
 
-static inline int sizeof_long(void)
+static inline int sizeof_long(struct pt_regs *regs)
 {
-   return in_ia32_syscall() ? 4 : 8;
+   /*
+* Check registers for mode as in_xxx_syscall() does not apply here.
+*/
+   return user_64bit_mode(regs) ? 8 : 4;
 }
 
 static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs 
*regs)
@@ -521,9 +524,9 @@ static int default_pre_xol_op(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 
 static int emulate_push_stack(struct pt_regs *regs, unsigned long val)
 {
-   unsigned long new_sp = regs->sp - sizeof_long();
+   unsigned long new_sp = regs->sp - sizeof_long(regs);
 
-   if (copy_to_user((void __user *)new_sp, &val, sizeof_long()))
+   if (copy_to_user((void __user *)new_sp, &val, sizeof_long(regs)))
return -EFAULT;
 
regs->sp = new_sp;
@@ -556,7 +559,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, 
struct pt_regs *regs
long correction = utask->vaddr - utask->xol_vaddr;
regs->ip += correction;
} else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) {
-   regs->sp += sizeof_long(); /* Pop incorrect return address */
+   regs->sp += sizeof_long(regs); /* Pop incorrect return address 
*/
if (emulate_push_stack(regs, utask->vaddr + 
auprobe->defparam.ilen))
return -ERESTART;
}
@@ -675,7 +678,7 @@ static int branch_post_xol_op(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * "call" insn was executed out-of-line. Just restore ->sp and restart.
 * We could also restore ->ip and try to call branch_emulate_op() again.
 */
-   regs->sp += sizeof_long();
+   regs->sp += sizeof_long(regs);
return -ERESTART;
 }
 
@@ -1056,7 +1059,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 unsigned long
 arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct 
pt_regs *regs)
 {
-   int rasize 

Re: Linux 5.2.10

2019-08-26 Thread Jörg-Volker Peetz
For me the command

$ gpg --receive-keys DEA66FF797772CDC

did the key import and I was able to verify the signature via
https://www.kernel.org/category/signatures.html
and then the git tag v5.2.10.

Thank you both very much.

Best regards,
Jörg.



Re: BoF on LPC 2019 : Linux Perf advancements for compute intensive and server systems

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Mon, Aug 26, 2019 at 02:36:48PM +0300, Alexey Budankov escreveu:
> 
> Hi,
> 
> There is a BoF session scheduled at the Linux Plumbers Conference 2019 event.
> If you plan to attend the event, feel free to join and discuss the BoF
> topic and beyond:
> 
> Linux Perf advancements for compute intensive and server systems:
> 
> "Modern server and compute intensive systems are naturally built around 
>  several top performance CPUs with large amount of cores and equipped 
>  by shared memory that spans a number of NUMA domains. Compute intensive 
>  workloads usually implement highly parallel CPU bound cyclic codes 
>  performing mathematics calculations that reference data located in 
>  the shared memory. Performance observability and profiling of these 
>  workloads on such systems have unique characteristics and impose specific 
>  requirements on software performance tools. The requirements include 
>  tools CPU scalability, coping with high rate and volume of collected 
>  performance data as well as NUMA awareness. In order to fulfill those
>  requirements, a number of extensions have been implemented in Linux Perf
>  tool that are currently a part of the Linux kernel source tree 
>  [1], [2], [3], [4]"

All those are already merged, after long reviewing phases and lots of
testing, right?

I think the next step for people working in this area, in preparation
for this BoF, is to list their current efforts, like Ian et al.
did in:

  https://linuxplumbersconf.org/event/4/contributions/291/

- Arnaldo
 
> Best regards,
> Alexey
> 
> [1] https://marc.info/?l=linux-kernel=154149439404555=2
> [2] https://marc.info/?l=linux-kernel=154817912621465=2
> [3] https://marc.info/?l=linux-kernel=155293062518459=2
> [4] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html

-- 

- Arnaldo


Re: [PATCH v2 1/5] ACPI: Enable driver and firmware hints to control power at probe time

2019-08-26 Thread Sakari Ailus
On Mon, Aug 26, 2019 at 03:34:39PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Aug 26, 2019 at 01:32:00PM +0300, Sakari Ailus wrote:
> > Hi Greg,
> > 
> > On Mon, Aug 26, 2019 at 10:43:43AM +0200, Greg Kroah-Hartman wrote:
> > 
> > ...
> > 
> > > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > > index 6717adee33f01..4bc0ea4a3201a 100644
> > > > --- a/include/linux/device.h
> > > > +++ b/include/linux/device.h
> > > > @@ -248,6 +248,12 @@ enum probe_type {
> > > >   * @owner: The module owner.
> > > >   * @mod_name:  Used for built-in modules.
> > > >   * @suppress_bind_attrs: Disables bind/unbind via sysfs.
> > > > + * @probe_low_power: The driver supports its probe function being 
> > > > called while
> > > > + *  the device is in a low power state, independently 
> > > > of the
> > > > + *  expected behaviour on combination of a given bus 
> > > > and
> > > > + *  firmware interface etc. The driver is responsible 
> > > > for
> > > > + *  powering the device on using runtime PM in such 
> > > > case.
> > > > + *  This configuration has no effect if CONFIG_PM is 
> > > > disabled.
> > > >   * @probe_type:Type of the probe (synchronous or asynchronous) 
> > > > to use.
> > > >   * @of_match_table: The open firmware table.
> > > >   * @acpi_match_table: The ACPI match table.
> > > > @@ -285,6 +291,7 @@ struct device_driver {
> > > > const char  *mod_name;  /* used for built-in 
> > > > modules */
> > > >  
> > > > bool suppress_bind_attrs;   /* disables bind/unbind via 
> > > > sysfs */
> > > > +   bool probe_low_power;
> > > 
> > > Ick, no, this should be a bus-specific thing to handle such messed up
> > > hardware.  Why pollute this in the driver core?
> > 
> > The alternative could be to make it I²C specific indeed; the vast majority
> > of camera sensors are I²C devices these days.
> 
> Why is this even needed to be a bus/device attribute at all?  You are
> checking the firmware property in the probe function, just do the logic
> there as you are, what needs to be saved to the bus's logic?

By the time the driver gets hold of the device, or gets to control its
power state, the I²C framework has already called dev_pm_domain_attach() to
power on the device. The I²C drivers on ACPI based systems expect the
device to be powered on for probe.

-- 
Sakari Ailus
sakari.ai...@linux.intel.com


Re: Linux-next-20190823: x86_64/i386: prot_hsymlinks.c:325: Failed to run cmd: useradd hsym

2019-08-26 Thread Naresh Kamboju
Hi Jan and Cyril,

On Mon, 26 Aug 2019 at 16:35, Jan Stancek  wrote:
>
>
>
> - Original Message -
> > Hi!
> > > Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
> > > x86_64 and i386 devices?
> > >
> > > test output log,
> > > useradd: failure while writing changes to /etc/passwd
> > > useradd: /home/hsym was created, but could not be removed
> >
> > This looks like an unrelated problem, failure to write to /etc/passwd
> > > probably means that the filesystem is full or some problem happened and
> > > it is now remounted RO.
>
> In Naresh' example, root is on NFS:
>   root=/dev/nfs rw 
> nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm,tcp,hard,intr

Right !
root is mounted on NFS.

>
> 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm 
> on / type nfs 
> (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,mountvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> devtmpfs on /dev type devtmpfs 
> (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
>
> Following message repeats couple times in logs:
>   NFS: Server wrote zero bytes, expected XXX
>
> Naresh, can you check if there are any errors on NFS server side?

I have re-tested the failed tests on next-20190822 and they all pass,
using the same NFS server [1] [2].

> Maybe run NFS cthon against that server with client running next-20190822 and 
> next-20190823.

Thanks for the pointers.
I will set up and run NFS cthon on next-20190822 and next-20190823.

>
> >
> > I do not see the kernel messages from this job anywhere at the job
> > pages, is it stored somewhere?
>
> It appears to be mixed in same log file:
>   
> https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190823/testrun/886412/log

For the record, the following tests failed on linux-next-20190823 on x86_64
and i386. The root filesystem is mounted on NFS and the tests use a
locally mounted hard drive (with -d /scratch).

The Juno-r2 device also has its filesystem mounted on NFS, but it did not
see these errors and the tests pass on next-20190823.

These failures are reproducible every time on the next-20190823 kernel on
x86_64 and i386 devices with root mounted on NFS [3] [4] [5] [6].

I will git bisect to find the bad commit.

prot_hsymlinks: [3]
--
useradd: failure while writing changes to /etc/passwd
useradd: /home/hsym was created, but could not be removed
userdel: user 'hsym' does not exist
prot_hsymlinks1  TBROK  :  prot_hsymlinks.c:325: Failed to run
cmd: useradd hsym
prot_hsymlinks2  TBROK  :  prot_hsymlinks.c:325: Remaining cases broken
prot_hsymlinks3  TBROK  :  prot_hsymlinks.c:325: Failed to run
cmd: userdel -r hsym
prot_hsymlinks4  TBROK  :  tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).

logrotate01: [4]
-
compressing log with: /bin/gzip
error: error creating temp state file /var/lib/logrotate.status.tmp:
Input/output error
logrotate011  TFAIL  :  ltpapicmd.c:154: Test #1: logrotate
command exited with 1 return code. Output:

sem_unlink_2-2: [5]
--
make[3]: Entering directory
'/opt/ltp/testcases/open_posix_testsuite/conformance/interfaces/sem_unlink'
cat: write error: Input/output error
conformance/interfaces/sem_unlink/sem_unlink_2-2: execution: FAILED

syslog{01 ...10} [6]
---
cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog011  TBROK  :  ltpapicmd.c:188: failed to backup /etc/syslog.conf

cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog021  TBROK  :  ltpapicmd.c:188: failed to backup /etc/syslog.conf

...
cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog101  TBROK  :  ltpapicmd.c:188: failed to backup /etc/syslog.conf

ref:
PASS on 20190822:
[1] https://lkft.validation.linaro.org/scheduler/job/890446#L1232
[2] https://lkft.validation.linaro.org/scheduler/job/890454

FAILED on 20190823:
[3] https://lkft.validation.linaro.org/scheduler/job/890404#L1245
[4] https://lkft.validation.linaro.org/scheduler/job/886408#L2544
[5] https://lkft.validation.linaro.org/scheduler/job/886409#L3088
[6] https://lkft.validation.linaro.org/scheduler/job/890400#L1234

 - Naresh


Re: [PATCH v2] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Waiman Long
On 8/26/19 9:43 AM, Waiman Long wrote:
> On 8/26/19 9:25 AM, Matthew Wilcox wrote:
>>
>> Would this not work equally well?
>>
>> +++ b/fs/proc/page.c
>> @@ -46,7 +46,8 @@ static ssize_t kpagecount_read(struct file *file, char 
>> __user *buf,
>> ppage = pfn_to_page(pfn);
>> else
>> ppage = NULL;
>> -   if (!ppage || PageSlab(ppage) || page_has_type(ppage))
>> +   if (!ppage || PageSlab(ppage) || page_has_type(ppage) ||
>> +   PagePoisoned(ppage))
>> pcount = 0;
>> else
>> pcount = page_mapcount(ppage);
>>
> That was my initial thought too. However, I couldn't find out where the
> memory of the uninitialized page structures may have been initialized.
> The only thing I found is that the page structures are indeed poisoned
> when vm_debug is on. Without that, it is probably just whatever content
> the memory has when booting up the kernel.
>
> It just happens that on the test system I used, the memory of those page
> structures turned out to be -1. It may be different on other systems,
> which can still crash the kernel but would not be detected by the
> PagePoisoned() check. That is why I settled on the current scheme, which
> is more general and doesn't rely on the memory being initialized in a
> certain way.

Actually, I have also thought about always poisoning the page
structures. However, that will introduce additional delay in the boot-up
process, which can be problematic especially if the system has a large
amount of persistent memory.

Cheers,
Longman
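
To make the limitation discussed above concrete, here is a minimal user-space sketch of a poison check. struct page_model and page_poisoned() are hypothetical names for illustration, and POISON_PATTERN is assumed to be the all-ones value mentioned in the thread (-1); this is not the kernel's struct page.

```c
#include <stdbool.h>

/* Assumed to match the all-ones (-1) pattern mentioned in the thread. */
#define POISON_PATTERN (~0UL)

/* Hypothetical stand-in for the flags word of a page structure. */
struct page_model {
	unsigned long flags;
};

/* Detects only page structures that were explicitly poisoned. */
static bool page_poisoned(const struct page_model *p)
{
	return p->flags == POISON_PATTERN;
}
```

A page structure whose memory happens to hold -1 is caught, but one holding any other leftover value is not, which is why a check of this kind cannot be relied on unless the structures are always poisoned.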



Re: [PATCH v3 4/5] writeback, memcg: Implement cgroup_writeback_by_id()

2019-08-26 Thread Jan Kara
On Wed 21-08-19 14:02:10, Tejun Heo wrote:
> Implement cgroup_writeback_by_id() which initiates cgroup writeback
> from bdi and memcg IDs.  This will be used by memcg foreign inode
> flushing.
> 
> v2: Use wb_get_lookup() instead of wb_get_create() to avoid creating
> spurious wbs.
> 
> v3: Interpret 0 @nr as 1.25 * nr_dirty to implement best-effort
> flushing while avoiding possible livelocks.
> 
> Signed-off-by: Tejun Heo 

The patch looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/fs-writeback.c |   83 
> ++
>  include/linux/writeback.h |2 +
>  2 files changed, 85 insertions(+)
> 
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -892,6 +892,89 @@ restart:
>  }
>  
>  /**
> + * cgroup_writeback_by_id - initiate cgroup writeback from bdi and memcg IDs
> + * @bdi_id: target bdi id
> + * @memcg_id: target memcg css id
> + * @nr_pages: number of pages to write, 0 for best-effort dirty flushing
> + * @reason: reason why some writeback work initiated
> + * @done: target wb_completion
> + *
> + * Initiate flush of the bdi_writeback identified by @bdi_id and @memcg_id
> + * with the specified parameters.
> + */
> +int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr,
> +enum wb_reason reason, struct wb_completion *done)
> +{
> + struct backing_dev_info *bdi;
> + struct cgroup_subsys_state *memcg_css;
> + struct bdi_writeback *wb;
> + struct wb_writeback_work *work;
> + int ret;
> +
> + /* lookup bdi and memcg */
> + bdi = bdi_get_by_id(bdi_id);
> + if (!bdi)
> + return -ENOENT;
> +
> + rcu_read_lock();
> + memcg_css = css_from_id(memcg_id, &memory_cgrp_subsys);
> + if (memcg_css && !css_tryget(memcg_css))
> + memcg_css = NULL;
> + rcu_read_unlock();
> + if (!memcg_css) {
> + ret = -ENOENT;
> + goto out_bdi_put;
> + }
> +
> + /*
> +  * And find the associated wb.  If the wb isn't there already
> +  * there's nothing to flush, don't create one.
> +  */
> + wb = wb_get_lookup(bdi, memcg_css);
> + if (!wb) {
> + ret = -ENOENT;
> + goto out_css_put;
> + }
> +
> + /*
> +  * If @nr is zero, the caller is attempting to write out most of
> +  * the currently dirty pages.  Let's take the current dirty page
> +  * count and inflate it by 25% which should be large enough to
> +  * flush out most dirty pages while avoiding getting livelocked by
> +  * concurrent dirtiers.
> +  */
> + if (!nr) {
> + unsigned long filepages, headroom, dirty, writeback;
> +
> + mem_cgroup_wb_stats(wb, &filepages, &headroom, &dirty,
> +   &writeback);
> + nr = dirty * 10 / 8;
> + }
> +
> + /* issue the writeback work */
> + work = kzalloc(sizeof(*work), GFP_NOWAIT | __GFP_NOWARN);
> + if (work) {
> + work->nr_pages = nr;
> + work->sync_mode = WB_SYNC_NONE;
> + work->range_cyclic = 1;
> + work->reason = reason;
> + work->done = done;
> + work->auto_free = 1;
> + wb_queue_work(wb, work);
> + ret = 0;
> + } else {
> + ret = -ENOMEM;
> + }
> +
> + wb_put(wb);
> +out_css_put:
> + css_put(memcg_css);
> +out_bdi_put:
> + bdi_put(bdi);
> + return ret;
> +}
> +
> +/**
>   * cgroup_writeback_umount - flush inode wb switches for umount
>   *
>   * This function is called when a super_block is about to be destroyed and
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -217,6 +217,8 @@ void wbc_attach_and_unlock_inode(struct
>  void wbc_detach_inode(struct writeback_control *wbc);
>  void wbc_account_cgroup_owner(struct writeback_control *wbc, struct page 
> *page,
> size_t bytes);
> +int cgroup_writeback_by_id(u64 bdi_id, int memcg_id, unsigned long nr_pages,
> +enum wb_reason reason, struct wb_completion *done);
>  void cgroup_writeback_umount(void);
>  
>  /**
-- 
Jan Kara 
SUSE Labs, CR
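
The "0 @nr means 1.25 * nr_dirty" rule from the v3 changelog comes down to the integer expression dirty * 10 / 8 in the quoted patch. A tiny worked sketch follows; best_effort_nr() is a hypothetical helper name used only for this example, not a function in the patch.

```c
/*
 * Hypothetical helper mirroring the "nr = dirty * 10 / 8" line in the
 * patch: a zero request is replaced by the dirty page count inflated by
 * 25% (integer division, so the result is rounded down).
 */
static unsigned long best_effort_nr(unsigned long nr, unsigned long dirty)
{
	if (!nr)
		nr = dirty * 10 / 8;
	return nr;
}
```

With 800 dirty pages a zero request becomes 1000 pages, while an explicit non-zero request is passed through unchanged.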


Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-26 Thread Thomas Gleixner
On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> On Fri, 23 Aug 2019, Andy Lutomirski wrote:
> > > On Aug 23, 2019, at 5:03 PM, Thomas Gleixner  wrote:
> > > 
> > >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> > >> On Fri, 23 Aug 2019, Andy Lutomirski wrote:
> >  On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  
> >  wrote:
> >  
> > >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> > >> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
> > >> 
> > >> -static inline int sizeof_long(void)
> > >> +static inline int sizeof_long(struct pt_regs *regs)
> > >> {
> > >> -return in_ia32_syscall() ? 4 : 8;
> > > 
> > > This wants a comment.
> > > 
> > >> +return user_64bit_mode(regs) ? 8 : 4;
> >  
> >  The more simpler one liner is to check
> >  
> >    test_thread_flag(TIF_IA32)
> > >>> 
> > >>> I still want to finish killing TIF_IA32 some day.  Let’s please not add 
> > >>> new users.
> > >> 
> > >> Well, yes and no. This needs to be backported 
> > > 
> > > And TBH the magic in user_64bit_mode() is not pretty either.
> > > 
> > It’s only magic on Xen. I should probably stick a
> > cpu_feature_enabled(X86_FEATURE_XENPV) in there instead.
> 
> For backporting sake I really prefer the TIF version. One usage site more
> is not the end of the world. We can add the user_64bit_mode() variant from
> Sebastian on top as a cleanup right away so mainline is clean.

Bah, scratch it. I take the proper one right away.


Re: [PATCH 2/2] x86/microcode/intel: Issue the revision updated message only on the BSP

2019-08-26 Thread Borislav Petkov
On Mon, Aug 26, 2019 at 09:34:01AM -0400, Boris Ostrovsky wrote:
> AMD too?

We're not doing this output shortening on AMD at all. As before,
non-ugly patches are welcome. :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2 1/6] ASoC: atmel: enable SOC_SSC_PDC and SOC_SSC_DMA in Kconfig

2019-08-26 Thread Codrin.Ciubotariu
On 24.08.2019 23:26, Michał Mirosław wrote:
> Allow SSC to be used on platforms described using audio-graph-card
> in Device Tree.
> 
> Signed-off-by: Michał Mirosław 

Reviewed-by: Codrin Ciubotariu 

Thanks!

Best regards,
Codrin

> 
> ---
>   v2: extended to PDC mode
>   reworked and fixed Kconfig option dependencies
> 
> ---
>   sound/soc/atmel/Kconfig | 30 ++
>   1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/sound/soc/atmel/Kconfig b/sound/soc/atmel/Kconfig
> index 06c1d5ce642c..f118c229ed82 100644
> --- a/sound/soc/atmel/Kconfig
> +++ b/sound/soc/atmel/Kconfig
> @@ -12,25 +12,31 @@ if SND_ATMEL_SOC
>   config SND_ATMEL_SOC_PDC
>   tristate
>   depends on HAS_DMA
> - default m if SND_ATMEL_SOC_SSC_PDC=m && SND_ATMEL_SOC_SSC=m
> - default y if SND_ATMEL_SOC_SSC_PDC=y || (SND_ATMEL_SOC_SSC_PDC=m && 
> SND_ATMEL_SOC_SSC=y)
> -
> -config SND_ATMEL_SOC_SSC_PDC
> - tristate
>   
>   config SND_ATMEL_SOC_DMA
>   tristate
>   select SND_SOC_GENERIC_DMAENGINE_PCM
> - default m if SND_ATMEL_SOC_SSC_DMA=m && SND_ATMEL_SOC_SSC=m
> - default y if SND_ATMEL_SOC_SSC_DMA=y || (SND_ATMEL_SOC_SSC_DMA=m && 
> SND_ATMEL_SOC_SSC=y)
> -
> -config SND_ATMEL_SOC_SSC_DMA
> - tristate
>   
>   config SND_ATMEL_SOC_SSC
>   tristate
> - default y if SND_ATMEL_SOC_SSC_DMA=y || SND_ATMEL_SOC_SSC_PDC=y
> - default m if SND_ATMEL_SOC_SSC_DMA=m || SND_ATMEL_SOC_SSC_PDC=m
> +
> +config SND_ATMEL_SOC_SSC_PDC
> + tristate "SoC PCM DAI support for AT91 SSC controller using PDC"
> + depends on ATMEL_SSC
> + select SND_ATMEL_SOC_PDC
> + select SND_ATMEL_SOC_SSC
> + help
> +   Say Y or M if you want to add support for Atmel SSC interface
> +   in PDC mode configured using audio-graph-card in device-tree.
> +
> +config SND_ATMEL_SOC_SSC_DMA
> + tristate "SoC PCM DAI support for AT91 SSC controller using DMA"
> + depends on ATMEL_SSC
> + select SND_ATMEL_SOC_DMA
> + select SND_ATMEL_SOC_SSC
> + help
> +   Say Y or M if you want to add support for Atmel SSC interface
> +   in DMA mode configured using audio-graph-card in device-tree.
>   
>   config SND_AT91_SOC_SAM9G20_WM8731
>   tristate "SoC Audio support for WM8731-based At91sam9g20 evaluation 
> board"
> 



linux-next: Tree for Aug 26

2019-08-26 Thread Stephen Rothwell
Hi all,

Changes since 20190823:

The mips tree gained a conflict against the kbuild tree.

The net-next tree gained a conflict against the net tree.

The drm tree gained conflicts against Linus' and the drm-misc-fixes trees.

The clockevents tree gained a conflict against the tip tree.

The gpio-brgl tree gained a conflict against the gpio tree.

The pinctrl tree gained a conflict against the gpio tree.

Non-merge commits (relative to Linus' tree): 8397
 8546 files changed, 418536 insertions(+), 239036 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 309 trees (counting Linus' and 77 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (a55aa89aab90 Linux 5.3-rc6)
Merging fixes/master (609488bc979f Linux 5.3-rc2)
Merging kbuild-current/fixes (451577f3e3a9 Merge tag 'kbuild-fixes-v5.3-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging arc-current/for-curr (e86d94fdda8e ARC: unwind: Mark expected switch 
fall-throughs)
Merging arm-current/fixes (693898371711 ARM: 8897/1: check stmfd instruction 
using right shift)
Merging arm-soc-fixes/arm/fixes (305cd70ec311 Merge tag 'amlogic-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into 
arm/fixes)
Merging arm64-fixes/for-next/fixes (b6143d10d23e arm64: ftrace: Ensure module 
ftrace trampoline is coherent with I-side)
Merging m68k-current/for-linus (f28a1f16135c m68k: Don't select 
ARCH_HAS_DMA_PREP_COHERENT for nommu or coldfire)
Merging powerpc-fixes/fixes (b9ee5e04fd77 powerpc/64e: Drop stale call to 
smp_processor_id() which hangs SMP startup)
Merging s390-fixes/fixes (d45331b00ddb Linux 5.3-rc4)
Merging sparc/master (038029c03e21 sparc: remove unneeded uapi/asm/statfs.h)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (803f3e22ae10 ipv4: mpls: fix mpls_xmit for iptunnel)
Merging bpf/master (2c238177bd7f bpf: allow narrow loads of some 
sk_reuseport_md fields with offset > 0)
Merging ipsec/master (769a807d0b41 xfrm: policy: avoid warning splat when 
merging nodes)
Merging netfilter/master (211c46245215 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging ipvs/master (58e8b37069ff Merge branch 'net-phy-dp83867-add-some-fixes')
Merging wireless-drivers/master (5a8c31aa6357 iwlwifi: pcie: fix recognition of 
QuZ devices)
Merging mac80211/master (0d31d4dbf384 Revert "cfg80211: fix processing world 
regdomain when non modular")
Merging rdma-fixes/for-rc (c536277e0db1 RDMA/siw: Fix 64/32bit pointer 
inconsistency)
Merging sound-current/for-linus (75545304eba6 ALSA: seq: Fix potential 
concurrent access to the deleted pool)
Merging sound-asoc-fixes/for-linus (f1ae2996ca31 Merge branch 'asoc-5.3' into 
asoc-linus)
Merging regmap-fixes/for-linus (0161b8716465 Merge branch 'regmap-5.3' into 
regmap-linus)
Merging regulator-fixes/for-linus (df932f9af7b2 Merge branch 'regulator-5.3' 
into regulator-linus)
Merging spi-fixes/for-linus (0fb2637a974f Merge branch 'spi-5.3' into spi-linus)
Merging pci-current/for-linus (7bafda88de20 Documentation PCI: Fix 
pciebus-howto.rst filename typo)
Merging driver-core.current/driver-core-linus (d45331b00ddb Linux 5.3-rc4)
Merging tty.current/tty-linus (d45331b00ddb Linux 5.3-rc4)
Merging usb.current/usb-linus (08d676d1685c 

Re: [RFC PATCH 2/2] livepatch: Clear relocation targets on a module removal

2019-08-26 Thread Nicolai Stange
Josh Poimboeuf  writes:

> On Wed, Aug 14, 2019 at 01:06:09PM +0200, Miroslav Benes wrote:
>> > Really, we should be going in the opposite direction, by creating module
>> > dependencies, like all other kernel modules do, ensuring that a module
>> > is loaded *before* we patch it.  That would also eliminate this bug.
>> 
>> Yes, but it is not ideal either with cumulative one-fixes-all patch 
>> modules. It would load also modules which are not necessary for a 
>> customer and I know that at least some customers care about this. They 
>> want to deploy only things which are crucial for their systems.

Security concerns set aside, some of the patched modules might get
distributed separately from the main kernel through some sort of
kernel-*-extra packages and thus, not be found on some target system
at all. Or they might have been blacklisted.


> If you frame the question as "do you want to destabilize the live
> patching infrastructure" then the answer might be different.
>
> We should look at whether it makes sense to destabilize live patching
> for everybody, for a small minority of people who care about a small
> minority of edge cases.
>
> Or maybe there's some other solution we haven't thought about, which
> fits more in the framework of how kernel modules already work.
>
>> We could split patch modules as you proposed in the past, but that has 
>> issues as well.
>
> Right, I'm not really crazy about that solution either.
>
> Here's another idea: per-object patch modules.  Patches to vmlinux are
> in a vmlinux patch module.  Patches to kvm.ko are in a kvm patch module.
> That would require:
>
> - Careful management of dependencies between object-specific patches.
>   Maybe that just means that exported function ABIs shouldn't change.
>
> - Some kind of hooking into modprobe to ensure the patch module gets
>   loaded with the real one.
>
> - Changing 'atomic replace' to allow patch modules to be per-object.
>

Perhaps I'm misunderstanding, but supporting only per-object livepatch
modules would make livepatch creation for something like commit
15fab63e1e57 ("fs: prevent page refcount overflow in pipe_buf_get"),
CVE-2019-11487 really cumbersome (see the fuse part)?

I think I've seen similar interdependencies between e.g. kvm.ko <->
kvm_intel.ko, but can't find an example right now.


Thanks,

Nicolai

--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 247165, AG München), GF: Felix Imendörffer


Re: [PATCH v3] /dev/mem: Bail out upon SIGKILL.

2019-08-26 Thread Tetsuo Handa
On 2019/08/26 22:29, Greg Kroah-Hartman wrote:
> On Mon, Aug 26, 2019 at 10:13:25PM +0900, Tetsuo Handa wrote:
>> syzbot found that a thread can stall for minutes inside read_mem() or
>> write_mem() after that thread was killed by SIGKILL [1]. Reading from
>> iomem areas of /dev/mem can be slow, depending on the hardware.
>> While reading 2GB at one read() is legal, delaying termination of killed
>> thread for minutes is bad. Thus, allow reading/writing /dev/mem and
>> /dev/kmem to be preemptible and killable.
>>
>>   [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
>>   [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
>>   [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
>>   [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
>>   [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248
>>
>> Theoretically, reading/writing /dev/mem and /dev/kmem can become
>> "interruptible". But this patch chose "killable". Future patch will make
>> them "interruptible" so that we can revert to "killable" if some program
>> regressed.
>>
>> [1] 
>> https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
>>
>> Signed-off-by: Tetsuo Handa 
>> Reported-by: syzbot 
>> ---
>>  drivers/char/mem.c | 21 +
>>  1 file changed, 21 insertions(+)
> 
> What changed from previous versions?
> 
> That goes below the --- line at the very least.

(1) Moved fatal_signal_pending() test to end of iteration.
(2) Added need_resched() test before cond_resched().
(3) Removed -EINTR assignment because end of iteration means
that at least one byte was processed (sz > 0).
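To illustrate point (3), here is a minimal userspace sketch, not the actual drivers/char/mem.c code. The names model_task, model_fatal_signal_pending() and model_read() are hypothetical stand-ins. Because the signal check sits at the end of each iteration, at least one chunk is always processed before bailing out, so the return value is a positive byte count and no -EINTR is needed:

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CHUNK 4096u

/* Hypothetical task model: a fatal signal becomes pending once
 * kill_after_chunks chunks have been processed. */
struct model_task {
	size_t processed_chunks;
	size_t kill_after_chunks;
};

/* Stand-in for fatal_signal_pending(current). */
static bool model_fatal_signal_pending(const struct model_task *t)
{
	return t->processed_chunks >= t->kill_after_chunks;
}

/* Process count bytes in chunks; check for a fatal signal only at the
 * end of each iteration, so progress is made before giving up. */
static size_t model_read(struct model_task *t, size_t count)
{
	size_t done = 0;

	while (count > 0) {
		size_t sz = count < MODEL_CHUNK ? count : MODEL_CHUNK;

		/* The real code would copy sz bytes here. */
		done += sz;
		count -= sz;
		t->processed_chunks++;

		if (model_fatal_signal_pending(t))
			break;
	}
	return done;
}
```

Even a task that is already killed on entry gets one chunk through, which is why the short-read return value is sufficient in this model.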

> 
> thanks,
> 
> greg k-h
> 



Re: [PATCH v4 1/2] kernel.h: Update comment about simple_strto() functions

2019-08-26 Thread Andy Shevchenko
On Thu, Aug 01, 2019 at 09:50:54PM +0200, Miguel Ojeda wrote:
> On Thu, Aug 1, 2019 at 9:29 PM Andy Shevchenko
>  wrote:
> >
> > There were discussions in the past about use cases for
> > simple_strto() functions and, in some rare cases,
> > they have a benefit over kstrto() ones.
> >
> > Update a comment to reduce confusion about special use cases.
> >
> > Suggested-by: Miguel Ojeda 
> 
> I don't recall suggesting this, but I have a bad memory :-)
> 
> Andrew, should I pick both patches myself or do you want to pick this one?

It seems you can take both.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v2] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Waiman Long
On 8/26/19 9:25 AM, Matthew Wilcox wrote:
> On Mon, Aug 26, 2019 at 08:43:36AM -0400, Waiman Long wrote:
>> It was found that on a dual-socket x86-64 system with nvdimm, reading
>> /proc/kpagecount may cause the system to panic:
>>
>> ===
>> [   79.917682] BUG: unable to handle page fault for address: fffe
>> [   79.924558] #PF: supervisor read access in kernel mode
>> [   79.929696] #PF: error_code(0x) - not-present page
>> [   79.934834] PGD 87b60d067 P4D 87b60d067 PUD 87b60f067 PMD 0
>> [   79.940494] Oops:  [#1] SMP NOPTI
>> [   79.944157] CPU: 89 PID: 3455 Comm: cp Not tainted 5.3.0-rc5-test+ #14
>> [   79.950682] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS 2.2.11 
>> 06/13/2019
>> [   79.958246] RIP: 0010:kpagecount_read+0xdb/0x1a0
>> [   79.962859] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 f7 48 c1 e7 06 
>> 48 03 3d fe da de 00 74 1d 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 
>> 8b 00 f6 c4 02 75 06 83 7f 30 80 7d 62 31 c0 4c 89 f9 e8 5d c9
>> [   79.981603] RSP: 0018:b0d9c950fe70 EFLAGS: 00010202
>> [   79.986830] RAX: fffe RBX: 8beebe5383c0 RCX: 
>> b0d9c950ff00
>> [   79.993963] RDX: 0001 RSI: 7fd85b29e000 RDI: 
>> e77a2200
>> [   80.001095] RBP: 0002 R08: 0001 R09: 
>> 
>> [   80.008226] R10:  R11: 0001 R12: 
>> 7fd85b29e000
>> [   80.015358] R13: 893f0480 R14: 0088 R15: 
>> 7fd85b29e000
>> [   80.022491] FS:  7fd85b312800() GS:8c359fb0() 
>> knlGS:
>> [   80.030576] CS:  0010 DS:  ES:  CR0: 80050033
>> [   80.036321] CR2: fffe CR3: 004f54a38001 CR4: 
>> 007606e0
>> [   80.043455] DR0:  DR1:  DR2: 
>> 
>> [   80.050586] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [   80.057718] PKRU: 5554
>> [   80.060428] Call Trace:
>> [   80.062877]  proc_reg_read+0x39/0x60
>> [   80.066459]  vfs_read+0x91/0x140
>> [   80.069686]  ksys_read+0x59/0xd0
>> [   80.072922]  do_syscall_64+0x59/0x1e0
>> [   80.076588]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [   80.081637] RIP: 0033:0x7fd85a7f5d75
>> ===
>>
>> It turns out the panic was caused by the kpagecount_read() function
>> hitting an uninitialized page structure at PFN 0x88 where all its
>> fields were set to -1. The compound_head value of -1 will mislead the
>> kernel to treat -2 as a pointer to the head page of the compound page
>> leading to the crash.
>>
>> The system has 12 GB of nvdimm ranging from PFN 0x88-0xb7.
>> However, only PFN 0x88c200-0xb7 are released by the nvdimm
>> driver to the kernel and initialized. IOW, PFN 0x88-0x88c1ff
>> remain uninitialized. Perhaps these 196 MB of nvdimm are reserved for
>> internal use.
>>
>> To fix the panic, we need to find out if a page structure has been
>> initialized. This is done now by checking if the PFN is in the range
>> of a memory zone assuming that pages in a zone are either correctly
>> marked as not present in the mem_section structure or have their page
>> structures initialized.
>>
>> Signed-off-by: Waiman Long 
>> ---
>>  fs/proc/page.c | 68 +++---
>>  1 file changed, 65 insertions(+), 3 deletions(-)
> Would this not work equally well?
>
> +++ b/fs/proc/page.c
> @@ -46,7 +46,8 @@ static ssize_t kpagecount_read(struct file *file, char 
> __user *buf,
> ppage = pfn_to_page(pfn);
> else
> ppage = NULL;
> -   if (!ppage || PageSlab(ppage) || page_has_type(ppage))
> +   if (!ppage || PageSlab(ppage) || page_has_type(ppage) ||
> +   PagePoisoned(ppage))
> pcount = 0;
> else
> pcount = page_mapcount(ppage);
>
That was my initial thought too. However, I couldn't find where the
memory of the uninitialized page structures would have been initialized.
The only thing I found is that the page structures are indeed poisoned
when vm_debug is on. Without that, the content is probably just whatever
the memory held when the kernel booted.

On the test system I used, the memory of those page structures just
happened to be -1. On other systems it may hold different values that
can still crash the kernel but are not detected by the PagePoisoned()
check. That is why I settled on the current scheme, which is more general
and doesn't rely on the memory being initialized in a certain way.
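For reference, a simplified userspace model of why a compound_head value of -1 turns into a -2 pointer. The struct fake_page type and model_compound_head() are hypothetical stand-ins that only mirror the "bit 0 set means tail page" convention; this is a sketch, not the kernel's implementation:

```c
#include <stdint.h>

/* Hypothetical stand-in for struct page; only the relevant field. */
struct fake_page {
	uintptr_t compound_head;
};

/* Simplified model of compound_head(): if bit 0 is set, the page is
 * treated as a tail page and the head pointer is the value minus 1. */
static uintptr_t model_compound_head(const struct fake_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;	/* all-ones input yields -2 here */
	return (uintptr_t)page;
}
```

With every field set to -1, bit 0 is set, so the model returns -2 as the "head page", which is the bogus pointer dereferenced in the oops above. Any other garbage value with bit 0 set would be misread the same way without matching the poison pattern.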

Cheers,
Longman



Re: [PATCH net v3] ixgbe: fix double clean of tx descriptors with xdp

2019-08-26 Thread Maciej Fijalkowski
On Thu, 22 Aug 2019 20:12:37 +0300
Ilya Maximets  wrote:

> Tx code doesn't clear the descriptors' status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the completion queue ring.
> 
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
> 
> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
> 'next_to_clean' and 'next_to_use' indexes.
> 
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets 
> ---
> 
> Version 3:
>   * Reverted some refactoring made for v2.
>   * Eliminated 'budget' for tx clean.
>   * prefetch returned.
> 
> Version 2:
>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()'.
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 29 
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 6b609553329f..a3b6d8c89127 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -633,19 +633,17 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring 
> *tx_ring,
>  bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>   struct ixgbe_ring *tx_ring, int napi_budget)

While you're at it, can you please as well remove the 'napi_budget' argument?
It wasn't used at all even before your patch.

I'm jumping in late, but I have been wondering about this since v1 of the
patch and hesitated to take part in the discussion - can you elaborate on
why simply clearing the DD bit wasn't sufficient?

Maciej

>  {
> + u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
>   unsigned int total_packets = 0, total_bytes = 0;
> - u32 i = tx_ring->next_to_clean, xsk_frames = 0;
> - unsigned int budget = q_vector->tx.work_limit;
>   struct xdp_umem *umem = tx_ring->xsk_umem;
>   union ixgbe_adv_tx_desc *tx_desc;
>   struct ixgbe_tx_buffer *tx_bi;
> - bool xmit_done;
> + u32 xsk_frames = 0;
>  
> - tx_bi = &tx_ring->tx_buffer_info[i];
> - tx_desc = IXGBE_TX_DESC(tx_ring, i);
> - i -= tx_ring->count;
> + tx_bi = &tx_ring->tx_buffer_info[ntc];
> + tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
>  
> - do {
> + while (ntc != ntu) {
>   if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>   break;
>  
> @@ -661,22 +659,18 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector 
> *q_vector,
>  
>   tx_bi++;
>   tx_desc++;
> - i++;
> - if (unlikely(!i)) {
> - i -= tx_ring->count;
> + ntc++;
> + if (unlikely(ntc == tx_ring->count)) {
> + ntc = 0;
>   tx_bi = tx_ring->tx_buffer_info;
>   tx_desc = IXGBE_TX_DESC(tx_ring, 0);
>   }
>  
>   /* issue prefetch for next Tx descriptor */
>   prefetch(tx_desc);
> + }
>  
> - /* update budget accounting */
> - budget--;
> - } while (likely(budget));
> -
> - i += tx_ring->count;
> - tx_ring->next_to_clean = i;
> + tx_ring->next_to_clean = ntc;
>  
>   u64_stats_update_begin(&tx_ring->syncp);
>   tx_ring->stats.bytes += total_bytes;
> @@ -688,8 +682,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector 
> *q_vector,
>   if (xsk_frames)
>   xsk_umem_complete_tx(umem, xsk_frames);
>  
> - xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
> - return budget > 0 && xmit_done;
> + return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
>  }
>  
>  int ixgbe_xsk_async_xmit(struct net_device *dev, u32 qid)
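The bounded cleaning loop in the patch above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: model_ring, MODEL_RING_COUNT and model_clean() are hypothetical names, and the DD status is reduced to a bool per slot that, as in the driver, is never cleared after cleaning:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RING_COUNT 8u

/* Hypothetical ring model: dd[] mimics the descriptor-done status bit,
 * which stays set after a descriptor has been cleaned. */
struct model_ring {
	bool dd[MODEL_RING_COUNT];
	uint16_t next_to_clean;
	uint16_t next_to_use;
};

/* Clean completed descriptors, but never walk past next_to_use. */
static unsigned int model_clean(struct model_ring *r)
{
	uint16_t ntc = r->next_to_clean, ntu = r->next_to_use;
	unsigned int cleaned = 0;

	while (ntc != ntu) {
		if (!r->dd[ntc])
			break;

		cleaned++;
		ntc++;
		if (ntc == MODEL_RING_COUNT)
			ntc = 0;	/* wrap around the ring */
	}

	r->next_to_clean = ntc;
	return cleaned;
}
```

In this model a second call with no new descriptors queued returns 0 even though the stale DD flags are still set, because the loop stops at next_to_use instead of relying on a budget.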



Re: [PATCH] powerpc/time: use feature fixup in __USE_RTC() instead of cpu feature.

2019-08-26 Thread Christophe Leroy




On 26/08/2019 at 15:25, Benjamin Herrenschmidt wrote:

On Mon, 2019-08-26 at 21:41 +1000, Michael Ellerman wrote:

Christophe Leroy  writes:

sched_clock(), used by printk(), calls __USE_RTC() to know
whether to use realtime clock or timebase.

__USE_RTC() uses cpu_has_feature() which is initialised by
machine_init(). Before machine_init(), __USE_RTC() returns true,
leading to a program check exception on CPUs not having realtime
clock.

In order to be able to use printk() earlier, use feature fixup.
Feature fixups are applied in early_init(), enabling the use of
printk() earlier.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/time.h | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)


The other option would be just to make this a compile time decision, eg.
add CONFIG_PPC_601 and use that to gate whether we use RTC.

Given how many 601 users there are, maybe 1?, I think that would be a
simpler option and avoids complicating the code / binary for everyone
else.


Didn't we ditch 601 support years ago anyway ? We had workaround we
threw out I think...


There are still workarounds for 601 in the kernel, for example (in
arch/powerpc/include/asm/ppc_asm.h):


/* various errata or part fixups */
#ifdef CONFIG_PPC601_SYNC_FIX
#define SYNC\
BEGIN_FTR_SECTION   \
sync;   \
isync;  \
END_FTR_SECTION_IFSET(CPU_FTR_601)
#define SYNC_601\
BEGIN_FTR_SECTION   \
sync;   \
END_FTR_SECTION_IFSET(CPU_FTR_601)
#define ISYNC_601   \
BEGIN_FTR_SECTION   \
isync;  \
END_FTR_SECTION_IFSET(CPU_FTR_601)
#else
#define SYNC
#define SYNC_601
#define ISYNC_601
#endif

But if you think we can get rid of 601 completely, I'm happy with that.

Christophe




Cheers,
Ben.


cheers


diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 54f4ec1f9fab..3455cb54c333 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -42,7 +42,14 @@ struct div_result {
  /* Accessor functions for the timebase (RTC on 601) registers. */
  /* If one day CONFIG_POWER is added just define __USE_RTC as 1 */
  #ifdef CONFIG_PPC_BOOK3S_32
-#define __USE_RTC()(cpu_has_feature(CPU_FTR_USE_RTC))
+static inline bool __USE_RTC(void)
+{
+   asm_volatile_goto(ASM_FTR_IFCLR("nop;", "b %1;", %0) ::
+ "i" (CPU_FTR_USE_RTC) :: l_use_rtc);
+   return false;
+l_use_rtc:
+   return true;
+}
  #else
  #define __USE_RTC()   0
  #endif
--
2.13.3


Re: [RESEND PATCH v3 14/20] mtd: spi_nor: Add a ->setup() method

2019-08-26 Thread Schrempf Frieder
On 26.08.19 14:40, Boris Brezillon wrote:
> On Mon, 26 Aug 2019 12:08:58 +
>  wrote:
> 
>> From: Tudor Ambarus 
>>
>> nor->params.setup() configures the SPI NOR memory. Useful for SPI NOR
>> flashes that have peculiarities to the SPI NOR standard, e.g.
>> different opcodes, specific address calculation, page size, etc.
>> Right now the only user will be the S3AN chips, but other
>> manufacturers can implement it if needed.
>>
>> Move spi_nor_setup() related code in order to avoid a forward
>> declaration to spi_nor_default_setup().
>>
>> Reviewed-by: Boris Brezillon 
> 
> Nitpick: R-bs should normally be placed after your SoB.

Just a question unrelated to the patch content:

I learned to add R-b tags after my SoB when submitting MTD patches, but 
recently I submitted a patch to the serial subsystem and was told to put 
my SoB last. Is there an "official" rule for this? And if so where to 
find it?

> 
>> Signed-off-by: Tudor Ambarus 
>> ---
> 
> __
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
> 

Re: [PATCH 2/2] x86/microcode/intel: Issue the revision updated message only on the BSP

2019-08-26 Thread Boris Ostrovsky
On 8/24/19 4:53 AM, Borislav Petkov wrote:
> From: Borislav Petkov 
> Date: Sat, 24 Aug 2019 10:01:53 +0200
>
> ... in order to not pollute dmesg with a line for each updated microcode
> engine.
>
> Signed-off-by: Borislav Petkov 
> Cc: Ashok Raj 
> Cc: Boris Ostrovsky 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: Jon Grimm 
> Cc: kanth.ghatr...@oracle.com
> Cc: konrad.w...@oracle.com
> Cc: Mihai Carabas 
> Cc: patrick.c...@oracle.com
> Cc: Thomas Gleixner 
> Cc: Tom Lendacky 
> Cc: x86-ml 
> ---
>  arch/x86/kernel/cpu/microcode/intel.c | 5 +++--


AMD too?


-boris


Re: [PATCH] ARM: davinci: dm646x: Fix a typo in the comment

2019-08-26 Thread Sekhar Nori
On 23/07/19 3:06 AM, Christophe JAILLET wrote:
> The driver is dedicated to DM646x. So update the description in the top
> most comment accordingly.
> 
> It must have been derived from dm644x.c, but looks DM646 specific now.
> 
> Signed-off-by: Christophe JAILLET 

Applied to my v5.4/soc branch.

Thanks,
Sekhar


Re: [PATCH v2 1/5] ACPI: Enable driver and firmware hints to control power at probe time

2019-08-26 Thread Greg Kroah-Hartman
On Mon, Aug 26, 2019 at 01:32:00PM +0300, Sakari Ailus wrote:
> Hi Greg,
> 
> On Mon, Aug 26, 2019 at 10:43:43AM +0200, Greg Kroah-Hartman wrote:
> 
> ...
> 
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 6717adee33f01..4bc0ea4a3201a 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -248,6 +248,12 @@ enum probe_type {
> > >   * @owner:   The module owner.
> > >   * @mod_name:Used for built-in modules.
> > >   * @suppress_bind_attrs: Disables bind/unbind via sysfs.
> > > + * @probe_low_power: The driver supports its probe function being called 
> > > while
> > > + *the device is in a low power state, independently 
> > > of the
> > > + *expected behaviour on combination of a given bus 
> > > and
> > > + *firmware interface etc. The driver is responsible 
> > > for
> > > + *powering the device on using runtime PM in such 
> > > case.
> > > + *This configuration has no effect if CONFIG_PM is 
> > > disabled.
> > >   * @probe_type:  Type of the probe (synchronous or asynchronous) to use.
> > >   * @of_match_table: The open firmware table.
> > >   * @acpi_match_table: The ACPI match table.
> > > @@ -285,6 +291,7 @@ struct device_driver {
> > >   const char  *mod_name;  /* used for built-in modules */
> > >  
> > >   bool suppress_bind_attrs;   /* disables bind/unbind via sysfs */
> > > + bool probe_low_power;
> > 
> > Ick, no, this should be a bus-specific thing to handle such messed up
> > > hardware.  Why pollute this in the driver core?
> 
> The alternative could be to make it I²C specific indeed; the vast majority
> of camera sensors are I²C devices these days.

Why is this even needed to be a bus/device attribute at all?  You are
checking the firmware property in the probe function, just do the logic
there as you are, what needs to be saved to the bus's logic?

thanks,

greg k-h


Re: [tip:x86/urgent] x86/boot/compressed/64: Fix boot on machines with broken E820 table

2019-08-26 Thread Kirill A. Shutemov
On Mon, Aug 26, 2019 at 09:15:39AM +0200, Borislav Petkov wrote:
> On Sun, Aug 25, 2019 at 10:33:15PM -0500, Gustavo A. R. Silva wrote:
> > Hi all,
> > 
> > On 8/19/19 9:16 AM, tip-bot for Kirill A. Shutemov wrote:
> > [..]
> > > 
> > > diff --git a/arch/x86/boot/compressed/pgtable_64.c 
> > > b/arch/x86/boot/compressed/pgtable_64.c
> > > index 5f2d03067ae5..2faddeb0398a 100644
> > > --- a/arch/x86/boot/compressed/pgtable_64.c
> > > +++ b/arch/x86/boot/compressed/pgtable_64.c
> > > @@ -72,6 +72,8 @@ static unsigned long find_trampoline_placement(void)
> > >  
> > >   /* Find the first usable memory region under bios_start. */
> > >   for (i = boot_params->e820_entries - 1; i >= 0; i--) {
> > > + unsigned long new;
> > > +
> > >   entry = &boot_params->e820_table[i];
> > >  
> > >   /* Skip all entries above bios_start. */
> > > @@ -84,15 +86,20 @@ static unsigned long find_trampoline_placement(void)
> > >  
> > >   /* Adjust bios_start to the end of the entry if needed. */
> > >   if (bios_start > entry->addr + entry->size)
> > 
> > Notice that if this condition happens to be false, we end up with an
> > uninitialized variable *new*.
> 
> Yap, good catch.

:facepalm:

> > What would be the right value to assign to *new* at declaration under
> > this condition?
> 
> Looking at the changed flow of the loop, how we use new instead of
> bios_start and how we assign new back to bios_start, I think we should
> do:
> 
>   unsigned long new = bios_start;
> 
> at the beginning...

Right.

What about this:

From b613c675e6690ef5608a5abf71d90e15ced31b2b Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" 
Date: Mon, 26 Aug 2019 16:26:01 +0300
Subject: [PATCH] x86/boot/compressed/64: Fix missing initialization in
 find_trampoline_placement()

Gustavo noticed that 'new' can be left uninitialized if 'bios_start'
happens to be less than or equal to 'entry->addr + entry->size'.

Initialize the variable at the start of the iteration to current value
of 'bios_start'.

Signed-off-by: Kirill A. Shutemov 
Reported-by: "Gustavo A. R. Silva" 
Fixes: 0a46fff2f910 ("x86/boot/compressed/64: Fix boot on machines with broken 
E820 table")
---
 arch/x86/boot/compressed/pgtable_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/pgtable_64.c 
b/arch/x86/boot/compressed/pgtable_64.c
index 2faddeb0398a..c8862696a47b 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -72,7 +72,7 @@ static unsigned long find_trampoline_placement(void)
 
/* Find the first usable memory region under bios_start. */
for (i = boot_params->e820_entries - 1; i >= 0; i--) {
-   unsigned long new;
+   unsigned long new = bios_start;
 
entry = &boot_params->e820_table[i];
 
-- 
 Kirill A. Shutemov
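A simplified userspace model of the loop may help show why the initializer matters. Everything here is an assumption for illustration: model_entry, model_find_start() and the MODEL_* constants are hypothetical, and the size and alignment checks are only a rough approximation of the real function:

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE		4096UL
#define MODEL_TRAMPOLINE_SIZE	(2 * MODEL_PAGE_SIZE)	/* assumed size */

/* Hypothetical, reduced E820-style entry. */
struct model_entry {
	unsigned long addr;
	unsigned long size;
	int is_ram;
};

static unsigned long model_find_start(const struct model_entry *tbl, int n,
				      unsigned long bios_start)
{
	for (int i = n - 1; i >= 0; i--) {
		/* Start from bios_start so 'new' is always defined. */
		unsigned long new = bios_start;
		const struct model_entry *entry = &tbl[i];

		/* Skip entries above bios_start and non-RAM entries. */
		if (bios_start <= entry->addr || !entry->is_ram)
			continue;

		/* Only lower 'new' if the entry ends below bios_start. */
		if (bios_start > entry->addr + entry->size)
			new = entry->addr + entry->size;

		/* Keep the result page-aligned. */
		new &= ~(MODEL_PAGE_SIZE - 1);

		/* Skip the entry if it is too small. */
		if (new - MODEL_TRAMPOLINE_SIZE < entry->addr)
			continue;

		/* Protect against underflow. */
		if (new - MODEL_TRAMPOLINE_SIZE > bios_start)
			break;

		bios_start = new;
		break;
	}
	return bios_start;
}
```

When the entry extends past bios_start, the adjustment branch is not taken and 'new' simply keeps the current bios_start, which is the case that was previously left uninitialized.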


Re: [PATCH] ARM: dts: da850-evm: Use generic jedec, spi-nor for flash

2019-08-26 Thread Sekhar Nori
On 23/07/19 5:40 PM, Adam Ford wrote:
> Logic PD re-spun the L138 and AM1808 SOM's with larger flash.
> The m25p80 driver has a generic 'jedec,spi-nor' compatible option
> which is requested to be used whenever possible since it will read the
> JEDEC READ ID opcode.
> 
> Signed-off-by: Adam Ford 

Applied to v5.4/dt

Thanks,
Sekhar


Re: [RESEND PATCH 00/10] ARM: davinci: use the new clocksource driver

2019-08-26 Thread Sekhar Nori
On 08/08/19 1:11 PM, Bartosz Golaszewski wrote:
> śr., 7 sie 2019 o 21:28 Sekhar Nori  napisał(a):
>>
>> On 05/08/19 1:59 PM, Bartosz Golaszewski wrote:
>>> pon., 22 lip 2019 o 15:17 Bartosz Golaszewski  napisał(a):

>>>> From: Bartosz Golaszewski
>>>>
>>>> Sekhar,
>>>>
>>>> the following patches switch DaVinci to using the new clocksource driver
>>>> which is now upstream. They are rebased on top of v5.3-rc1. Additionally
>>>> the following two patches were reverted locally due to a regression in
>>>> v5.3-rc1 about which the relevant maintainers have been already notified:
>>>>
>>>>   2eef1399a866 modules: fix BUG when load module with rodata=n
>>>>   93651f80dcb6 modules: fix compile error if don't have strict module rwx
>>>>
>>>> Bartosz Golaszewski (10):
>>>>   ARM: davinci: enable the clocksource driver for DT mode
>>>>   ARM: davinci: WARN_ON() if clk_get() fails
>>>>   ARM: davinci: da850: switch to using the clocksource driver
>>>>   ARM: davinci: da830: switch to using the clocksource driver
>>>>   ARM: davinci: move timer definitions to davinci.h
>>>>   ARM: davinci: dm355: switch to using the clocksource driver
>>>>   ARM: davinci: dm365: switch to using the clocksource driver
>>>>   ARM: davinci: dm644x: switch to using the clocksource driver
>>>>   ARM: davinci: dm646x: switch to using the clocksource driver
>>>>   ARM: davinci: remove legacy timer support
>>>>
>>>>  arch/arm/Kconfig                            |   1 +
>>>>  arch/arm/mach-davinci/Makefile              |   3 +-
>>>>  arch/arm/mach-davinci/da830.c               |  45 +--
>>>>  arch/arm/mach-davinci/da850.c               |  50 +--
>>>>  arch/arm/mach-davinci/davinci.h             |   3 +
>>>>  arch/arm/mach-davinci/devices-da8xx.c       |   1 -
>>>>  arch/arm/mach-davinci/devices.c             |  19 -
>>>>  arch/arm/mach-davinci/dm355.c               |  28 +-
>>>>  arch/arm/mach-davinci/dm365.c               |  26 +-
>>>>  arch/arm/mach-davinci/dm644x.c              |  28 +-
>>>>  arch/arm/mach-davinci/dm646x.c              |  28 +-
>>>>  arch/arm/mach-davinci/include/mach/common.h |  17 -
>>>>  arch/arm/mach-davinci/include/mach/time.h   |  35 --
>>>>  arch/arm/mach-davinci/time.c                | 414
>>>>  14 files changed, 110 insertions(+), 588 deletions(-)
>>>>  delete mode 100644 arch/arm/mach-davinci/include/mach/time.h
>>>>  delete mode 100644 arch/arm/mach-davinci/time.c
>>>>
>>>> --
>>>> 2.21.0

>>>
>>> Hi Sekhar,
>>>
>>> a gentle ping. Is this series good to go in for v5.4?
>>
>> Hi Bartosz, a quick test shows that DM365 fails to boot after this. Can
>> you please see if there is anything obviously wrong for that SoC. Rest
>> seems to be okay.
>>
>> Thanks,
>> Sekhar
> 
> Hi Sekhar,
> 
> just verified on Kevin's dm365-evm rebased on top of v5.3-rc3 and it
> boots fine. I know that davinci failed to boot at v5.3-rc1.
> 
> Let me know if I can help with debugging.

Added series except 7/10 and 10/10 to v5.4/soc. Debug of DM365 issue is
still going on offline. DM365 migration is postponed pending conclusion
of that debug.

Thanks,
Sekhar


[PATCH] ARM: OMAP2+: Delete an unnecessary kfree() call in omap_hsmmc_pdata_init()

2019-08-26 Thread Markus Elfring
From: Markus Elfring 
Date: Mon, 26 Aug 2019 15:05:31 +0200

A null pointer would be passed to a call of the function "kfree" directly
after a call of the function "kzalloc" failed at one place.
Remove this superfluous function call.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 arch/arm/mach-omap2/hsmmc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/mach-omap2/hsmmc.c b/arch/arm/mach-omap2/hsmmc.c
index 14b9c13c1fa0..63423ea6a240 100644
--- a/arch/arm/mach-omap2/hsmmc.c
+++ b/arch/arm/mach-omap2/hsmmc.c
@@ -32,10 +32,8 @@ static int __init omap_hsmmc_pdata_init(struct 
omap2_hsmmc_info *c,
char *hc_name;

hc_name = kzalloc(HSMMC_NAME_LEN + 1, GFP_KERNEL);
-   if (!hc_name) {
-   kfree(hc_name);
+   if (!hc_name)
return -ENOMEM;
-   }

snprintf(hc_name, (HSMMC_NAME_LEN + 1), "mmc%islot%i", c->mmc, 1);
mmc->name = hc_name;
--
2.23.0
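
The reasoning behind this cleanup can be shown in userspace. What follows
is only a minimal sketch, assuming that free(NULL) in standard C behaves
like kfree(NULL) in the kernel (both are no-ops). make_slot_name() and
NAME_LEN are hypothetical names for illustration and are not part of the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define NAME_LEN 15	/* hypothetical size, stands in for HSMMC_NAME_LEN */

/*
 * Allocate and format a slot name. When the allocation fails the pointer
 * is NULL, so there is nothing to release: calling free(NULL) there would
 * do nothing, which is why the extra call can simply be dropped.
 */
int make_slot_name(char **out, int mmc)
{
	char *name;

	assert(out);
	name = calloc(1, NAME_LEN + 1);
	if (!name)
		return -ENOMEM;

	snprintf(name, NAME_LEN + 1, "mmc%islot%i", mmc, 1);
	*out = name;
	return 0;
}
```

The caller owns the returned buffer and releases it with free() once the
name is no longer needed.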



Re: [PATCH v3 03/11] KVM: SVM: Add KVM_SEV_SEND_FINISH command

2019-08-26 Thread Borislav Petkov
On Wed, Jul 10, 2019 at 08:13:03PM +, Singh, Brijesh wrote:
> The command is used to finalize the encryption context created with
> KVM_SEV_SEND_START command.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Joerg Roedel 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> ---
>  .../virtual/kvm/amd-memory-encryption.rst |  8 +++
>  arch/x86/kvm/svm.c| 23 +++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/amd-memory-encryption.rst 
> b/Documentation/virtual/kvm/amd-memory-encryption.rst
> index 060ac2316d69..9864f9215c43 100644
> --- a/Documentation/virtual/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virtual/kvm/amd-memory-encryption.rst
> @@ -289,6 +289,14 @@ Returns: 0 on success, -negative on error
>  __u32 trans_len;
>  };
>  
> +12. KVM_SEV_SEND_FINISH
> +
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can 
> be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error
> +
>  References
>  ==
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8e815a53c420..be73a87a8c4f 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7168,6 +7168,26 @@ static int sev_send_update_data(struct kvm *kvm, 
> struct kvm_sev_cmd *argp)
>   return ret;
>  }
>  
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_send_finish *data;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;

Almost all sev_ command functions do that check, except
sev_guest_init(). You could pull up that check, into svm_mem_enc_op()
and save yourself the repeated pattern:

---
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 273ad624b23d..950282c8c4f7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7225,6 +7225,11 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user 
*argp)
if (copy_from_user(&sev_cmd, argp, sizeof(struct kvm_sev_cmd)))
return -EFAULT;
 
+   if (sev_cmd.id != KVM_SEV_INIT) {
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+   }
+
mutex_lock(&kvm->lock);
 
switch (sev_cmd.id) {
---
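
The same idea, reduced to a userspace sketch: one precondition check in
the dispatcher replaces an identical check in every handler. All names
here (struct vm, the CMD_* values, dispatch()) are made up for
illustration and only loosely mirror svm_mem_enc_op().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command IDs, standing in for the KVM_SEV_* values. */
enum cmd_id { CMD_INIT, CMD_SEND_START, CMD_SEND_FINISH };

struct vm {
	bool sev_active;	/* stand-in for what sev_guest() reports */
};

static int do_init(struct vm *vm)        { vm->sev_active = true; return 0; }
static int do_send_start(struct vm *vm)  { (void)vm; return 0; }
static int do_send_finish(struct vm *vm) { (void)vm; return 0; }

int dispatch(struct vm *vm, enum cmd_id id)
{
	assert(vm);

	/* Every command except INIT needs an active guest: check it once. */
	if (id != CMD_INIT && !vm->sev_active)
		return -ENOTTY;

	switch (id) {
	case CMD_INIT:
		return do_init(vm);
	case CMD_SEND_START:
		return do_send_start(vm);
	case CMD_SEND_FINISH:
		return do_send_finish(vm);
	}
	return -EINVAL;
}
```

With the check hoisted, a new handler cannot forget it, at the cost of
the dispatcher knowing which command is the exception.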

> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);

Btw, since

  1ec696470c86 ("kvm: svm: Add memcg accounting to KVM allocations")

gfp flags should be GFP_KERNEL_ACCOUNT now.

-- 
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 247165, AG 
München


Re: [PATCH v3] /dev/mem: Bail out upon SIGKILL.

2019-08-26 Thread Greg Kroah-Hartman
On Mon, Aug 26, 2019 at 10:13:25PM +0900, Tetsuo Handa wrote:
> syzbot found that a thread can stall for minutes inside read_mem() or
> write_mem() after that thread was killed by SIGKILL [1]. Reading from
> iomem areas of /dev/mem can be slow, depending on the hardware.
> While reading 2GB at one read() is legal, delaying termination of killed
> thread for minutes is bad. Thus, allow reading/writing /dev/mem and
> /dev/kmem to be preemptible and killable.
> 
>   [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
>   [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
>   [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
>   [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
>   [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248
> 
> Theoretically, reading/writing /dev/mem and /dev/kmem can become
> "interruptible". But this patch chose "killable". Future patch will make
> them "interruptible" so that we can revert to "killable" if some program
> regressed.
> 
> [1] 
> https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
> 
> Signed-off-by: Tetsuo Handa 
> Reported-by: syzbot 
> ---
>  drivers/char/mem.c | 21 +
>  1 file changed, 21 insertions(+)

What changed from previous versions?

That goes below the --- line at the very least.

thanks,

greg k-h


Re: [PATCH v1 1/2] vsprintf: introduce %dE for error constants

2019-08-26 Thread Enrico Weigelt, metux IT consult

On 25.08.19 01:37, Uwe Kleine-König wrote:

Hi,

> +static noinline_for_stack
> +char *errstr(char *buf, char *end, unsigned long long num,
> +	     struct printf_spec spec)
> +{

#1: why not put that into a separate strerror() lib function?
This is something I've been looking for for quite some time (actually
already hacked it up somewhere, sometime, but forgot about it ...)

#2: why not just have a big case statement and leave the actual lookup
logic to the compiler? IMHO, it could be written in a very compact way
with some macro magic


> +	for (i = 0; i < ARRAY_SIZE(errorcodes); ++i) {
> +		if (num == errorcodes[i].err || num == -errorcodes[i].err) {

why not take the absolute value only once, instead of doing a duplicate
comparison on each iteration?
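
Both suggestions can be sketched together in userspace C. This is only
an illustration under assumptions: err_name() and ERR_LIST are
hypothetical names, the list is deliberately short, and the constants
come from the C library's <errno.h> rather than from the kernel table
in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, deliberately short list; a real table would be longer. */
#define ERR_LIST(X) \
	X(EPERM)    \
	X(ENOENT)   \
	X(EIO)      \
	X(ENOMEM)   \
	X(EINVAL)

/* The sign normalization below relies on errno constants being positive. */
static_assert(EPERM > 0, "errno constants are expected to be positive");

/* Return the symbolic name for err or -err, or NULL if it is not listed. */
const char *err_name(long long err)
{
	/* Take the magnitude once, in unsigned arithmetic to avoid overflow. */
	unsigned long long mag = err < 0 ? 0ULL - (unsigned long long)err
					 : (unsigned long long)err;

	switch (mag) {
	/* Each list entry expands to: case ENAME: return "ENAME"; */
#define X(name) case name: return #name;
	ERR_LIST(X)
#undef X
	default:
		return NULL;
	}
}
```

The lookup strategy (jump table, binary search, chain of compares) is
then left to the compiler, and adding a code is a one-line change to the
list.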


--mtx

--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287


Re: [PATCH v2] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Matthew Wilcox
On Mon, Aug 26, 2019 at 08:43:36AM -0400, Waiman Long wrote:
> It was found that on a dual-socket x86-64 system with nvdimm, reading
> /proc/kpagecount may cause the system to panic:
> 
> ===
> [   79.917682] BUG: unable to handle page fault for address: fffe
> [   79.924558] #PF: supervisor read access in kernel mode
> [   79.929696] #PF: error_code(0x) - not-present page
> [   79.934834] PGD 87b60d067 P4D 87b60d067 PUD 87b60f067 PMD 0
> [   79.940494] Oops:  [#1] SMP NOPTI
> [   79.944157] CPU: 89 PID: 3455 Comm: cp Not tainted 5.3.0-rc5-test+ #14
> [   79.950682] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS 2.2.11 
> 06/13/2019
> [   79.958246] RIP: 0010:kpagecount_read+0xdb/0x1a0
> [   79.962859] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 f7 48 c1 e7 06 48 
> 03 3d fe da de 00 74 1d 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 
> 00 f6 c4 02 75 06 83 7f 30 80 7d 62 31 c0 4c 89 f9 e8 5d c9
> [   79.981603] RSP: 0018:b0d9c950fe70 EFLAGS: 00010202
> [   79.986830] RAX: fffe RBX: 8beebe5383c0 RCX: 
> b0d9c950ff00
> [   79.993963] RDX: 0001 RSI: 7fd85b29e000 RDI: 
> e77a2200
> [   80.001095] RBP: 0002 R08: 0001 R09: 
> 
> [   80.008226] R10:  R11: 0001 R12: 
> 7fd85b29e000
> [   80.015358] R13: 893f0480 R14: 0088 R15: 
> 7fd85b29e000
> [   80.022491] FS:  7fd85b312800() GS:8c359fb0() 
> knlGS:
> [   80.030576] CS:  0010 DS:  ES:  CR0: 80050033
> [   80.036321] CR2: fffe CR3: 004f54a38001 CR4: 
> 007606e0
> [   80.043455] DR0:  DR1:  DR2: 
> 
> [   80.050586] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [   80.057718] PKRU: 5554
> [   80.060428] Call Trace:
> [   80.062877]  proc_reg_read+0x39/0x60
> [   80.066459]  vfs_read+0x91/0x140
> [   80.069686]  ksys_read+0x59/0xd0
> [   80.072922]  do_syscall_64+0x59/0x1e0
> [   80.076588]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   80.081637] RIP: 0033:0x7fd85a7f5d75
> ===
> 
> It turns out the panic was caused by the kpagecount_read() function
> hitting an uninitialized page structure at PFN 0x88 where all its
> fields were set to -1. The compound_head value of -1 will mislead the
> kernel to treat -2 as a pointer to the head page of the compound page
> leading to the crash.
> 
> The system has 12 GB of nvdimm ranging from PFN 0x88-0xb7.
> However, only PFN 0x88c200-0xb7 are released by the nvdimm
> driver to the kernel and initialized. IOW, PFN 0x88-0x88c1ff
> remain uninitialized. Perhaps these 196 MB of nvdimm are reserved for
> internal use.
> 
> To fix the panic, we need to find out if a page structure has been
> initialized. This is done now by checking if the PFN is in the range
> of a memory zone assuming that pages in a zone are either correctly
> marked as not present in the mem_section structure or have their page
> structures initialized.
> 
> Signed-off-by: Waiman Long 
> ---
>  fs/proc/page.c | 68 +++---
>  1 file changed, 65 insertions(+), 3 deletions(-)

Would this not work equally well?

+++ b/fs/proc/page.c
@@ -46,7 +46,8 @@ static ssize_t kpagecount_read(struct file *file, char __user 
*buf,
ppage = pfn_to_page(pfn);
else
ppage = NULL;
-   if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+   if (!ppage || PageSlab(ppage) || page_has_type(ppage) ||
+   PagePoisoned(ppage))
pcount = 0;
else
pcount = page_mapcount(ppage);
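
The failure mode described in the quoted commit message (a compound_head
of -1 being decoded as a head pointer of -2) can be reproduced with a
small userspace model. This is a sketch only: struct fake_page is a
simplified stand-in for struct page, and the all-ones POISON value is an
assumption about how the poison check compares the flags word.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct page; only the two fields used here. */
struct fake_page {
	uintptr_t flags;
	uintptr_t compound_head;	/* bit 0 set: tail page, rest is head pointer */
};

/* All bits set, as in a page structure that was never initialized. */
#define POISON ((uintptr_t)-1)

/* Decoding rule: a set bit 0 means "pointer to the head page, plus 1". */
uintptr_t head_of(const struct fake_page *p)
{
	assert(p);
	if (p->compound_head & 1)
		return p->compound_head - 1;
	return (uintptr_t)p;
}

/* Analogue of the PagePoisoned() test in the diff above. */
int is_poisoned(const struct fake_page *p)
{
	assert(p);
	return p->flags == POISON;
}
```

For an all-ones structure head_of() yields (uintptr_t)-2, so any caller
has to run is_poisoned() first and must not follow the decoded pointer.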



Re: [PATCH] powerpc/time: use feature fixup in __USE_RTC() instead of cpu feature.

2019-08-26 Thread Benjamin Herrenschmidt
On Mon, 2019-08-26 at 21:41 +1000, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > sched_clock(), used by printk(), calls __USE_RTC() to know
> > whether to use realtime clock or timebase.
> > 
> > __USE_RTC() uses cpu_has_feature() which is initialised by
> > machine_init(). Before machine_init(), __USE_RTC() returns true,
> > leading to a program check exception on CPUs not having realtime
> > clock.
> > 
> > In order to be able to use printk() earlier, use feature fixup.
> > Feature fixups are applied in early_init(), enabling the use of
> > printk() earlier.
> > 
> > Signed-off-by: Christophe Leroy 
> > ---
> >  arch/powerpc/include/asm/time.h | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> The other option would be just to make this a compile time decision, eg.
> add CONFIG_PPC_601 and use that to gate whether we use RTC.
> 
> Given how many 601 users there are, maybe 1?, I think that would be a
> simpler option and avoids complicating the code / binary for everyone
> else.

Didn't we ditch 601 support years ago anyway? We had workarounds that we
threw out, I think...

Cheers,
Ben.

> cheers
> 
> > diff --git a/arch/powerpc/include/asm/time.h 
> > b/arch/powerpc/include/asm/time.h
> > index 54f4ec1f9fab..3455cb54c333 100644
> > --- a/arch/powerpc/include/asm/time.h
> > +++ b/arch/powerpc/include/asm/time.h
> > @@ -42,7 +42,14 @@ struct div_result {
> >  /* Accessor functions for the timebase (RTC on 601) registers. */
> >  /* If one day CONFIG_POWER is added just define __USE_RTC as 1 */
> >  #ifdef CONFIG_PPC_BOOK3S_32
> > -#define __USE_RTC()(cpu_has_feature(CPU_FTR_USE_RTC))
> > +static inline bool __USE_RTC(void)
> > +{
> > +   asm_volatile_goto(ASM_FTR_IFCLR("nop;", "b %1;", %0) ::
> > + "i" (CPU_FTR_USE_RTC) :: l_use_rtc);
> > +   return false;
> > +l_use_rtc:
> > +   return true;
> > +}
> >  #else
> >  #define __USE_RTC()0
> >  #endif
> > -- 
> > 2.13.3



Re: [PATCH -next] rtc: pcf2127: Fix build error without CONFIG_WATCHDOG_CORE

2019-08-26 Thread Guenter Roeck

On 8/26/19 1:12 AM, Yuehaibing wrote:
> On 2019/8/23 22:05, Alexandre Belloni wrote:
>> On 23/08/2019 20:45:53+0800, YueHaibing wrote:
>>> If WATCHDOG_CORE is not set, build fails:
>>>
>>> drivers/rtc/rtc-pcf2127.o: In function `pcf2127_probe.isra.6':
>>> drivers/rtc/rtc-pcf2127.c:478: undefined reference to `devm_watchdog_register_device'
>>>
>>> Add WATCHDOG_CORE Kconfig dependency to fix this.
>>>
>>> Reported-by: Hulk Robot
>>> Fixes: bbc597561ce1 ("rtc: pcf2127: add watchdog feature support")
>>> Signed-off-by: YueHaibing
>>> ---
>>>   drivers/rtc/Kconfig | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
>>> index 25af63d..9dce7dc 100644
>>> --- a/drivers/rtc/Kconfig
>>> +++ b/drivers/rtc/Kconfig
>>> @@ -886,6 +886,8 @@ config RTC_DRV_DS3232_HWMON
>>>   config RTC_DRV_PCF2127
>>> tristate "NXP PCF2127"
>>> depends on RTC_I2C_AND_SPI
>>> +   depends on WATCHDOG
>>
>> Definitively not, I fixed it that way:
>> +   select WATCHDOG_CORE if WATCHDOG
>
> No, this still fails while WATCHDOG is not set

Correct, there are no dummy functions for watchdog device registration.
There would have to be conditional code in the driver if the watchdog
is supposed to be optional.

Guenter


Re: [PATCH] mm: replace is_zero_pfn with is_huge_zero_pmd for thp

2019-08-26 Thread Matthew Wilcox


Why did you not cc Gerald who wrote the patch?  You can't just
run get_maintainers.pl and call it good.

On Sun, Aug 25, 2019 at 02:06:21PM -0600, Yu Zhao wrote:
> For hugely mapped thp, we use is_huge_zero_pmd() to check if it's
> zero page or not.
> 
> We do fill ptes with my_zero_pfn() when we split zero thp pmd, but
>  this is not what we have in vm_normal_page_pmd().
> pmd_trans_huge_lock() makes sure of it.
> 
> This is a trivial fix for /proc/pid/numa_maps, and AFAIK nobody
> complains about it.
> 
> Signed-off-by: Yu Zhao 
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..ea3c74855b23 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -654,7 +654,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct 
> *vma, unsigned long addr,
>  
>   if (pmd_devmap(pmd))
>   return NULL;
> - if (is_zero_pfn(pfn))
> + if (is_huge_zero_pmd(pmd))
>   return NULL;
>   if (unlikely(pfn > highest_memmap_pfn))
>   return NULL;
> -- 
> 2.23.0.187.g17f5b7556c-goog
> 


Re: [v2 PATCH -mm] mm: account deferred split THPs into MemAvailable

2019-08-26 Thread Kirill A. Shutemov
On Mon, Aug 26, 2019 at 09:40:35AM +0200, Michal Hocko wrote:
> On Thu 22-08-19 18:29:34, Kirill A. Shutemov wrote:
> > On Thu, Aug 22, 2019 at 02:56:56PM +0200, Vlastimil Babka wrote:
> > > On 8/22/19 10:04 AM, Michal Hocko wrote:
> > > > On Thu 22-08-19 01:55:25, Yang Shi wrote:
> > > >> Available memory is one of the most important metrics for memory
> > > >> pressure.
> > > > 
> > > > I would disagree with this statement. It is a rough estimate that tells
> > > > how much memory you can allocate before going into a more expensive
> > > > reclaim (mostly swapping). Allocating that amount still might result in
> > > > direct reclaim induced stalls. I do realize that this is simple metric
> > > > that is attractive to use and works in many cases though.
> > > > 
> > > >> Currently, the deferred split THPs are not accounted into
> > > >> available memory, but they are reclaimable actually, like reclaimable
> > > >> slabs.
> > > >> 
> > > >> And, they seems very common with the common workloads when THP is
> > > >> enabled.  A simple run with MariaDB test of mmtest with THP enabled as
> > > >> always shows it could generate over fifteen thousand deferred split 
> > > >> THPs
> > > >> (accumulated around 30G in one hour run, 75% of 40G memory for my VM).
> > > >> It looks worth accounting in MemAvailable.
> > > > 
> > > > OK, this makes sense. But your above numbers are really worrying.
> > > > Accumulating such a large amount of pages that are likely not going to
> > > > be used is really bad. They are essentially blocking any higher order
> > > > allocations and also push the system towards more memory pressure.
> > > > 
> > > > IIUC deferred splitting is mostly a workaround for nasty locking issues
> > > > during splitting, right? This is not really an optimization to cache
> > > > THPs for reuse or something like that. What is the reason this is not
> > > > done from a worker context? At least THPs which would be freed
> > > > completely sound like a good candidate for kworker tear down, no?
> > > 
> > > Agreed that it's a good question. For Kirill :) Maybe with kworker 
> > > approach we
> > > also wouldn't need the cgroup awareness?
> > 
> > I don't remember a particular locking issue, but I cannot say there's
> > none :P
> > 
> > It's artifact from decoupling PMD split from compound page split: the same
> > page can be mapped multiple times with combination of PMDs and PTEs. Split
> > of one PMD doesn't need to trigger split of all PMDs and underlying
> > compound page.
> > 
> > Other consideration is the fact that page split can fail and we need to
> > have fallback for this case.
> > 
> > Also in most cases THP split would be just waste of time if we would do
> > them at the spot. If you don't have memory pressure it's better to wait
> > until process termination: less pages on LRU is still beneficial.
> 
> This might be true but the reality shows that a lot of THPs might be
> waiting for the memory pressure that is essentially freeable on the
> spot. So I am not really convinced that "less pages on LRUs" is really a
> plausible justification. Can we free at least those THPs which are
> unmapped completely without any pte mappings?

Completely unmapped pages will be freed with the current code. Deferred split
only applies to partly mapped THPs: at least one 4k page of the THP is still
mapped somewhere.

> > Main source of partly mapped THPs comes from exit path. When PMD mapping
> > of THP got split across multiple VMAs (for instance due to mprotect()),
> > in exit path we unmap PTEs belonging to one VMA just before unmapping the
> > rest of the page. It would be total waste of time to split the page in
> > this scenario.
> > 
> > The whole deferred split thing still looks as a reasonable compromise
> > to me.
> 
> Even when it leads to all other problems mentioned in this and memcg
> deferred reclaim series?

Yes.

You would still need deferred split even if you *try* to split the page on
the spot. split_huge_page() can fail (due to pin on the page) and you will
need to have a way to try again later.

You'll not win anything in complexity by trying split_huge_page()
immediately. I would argue you'll create much more complexity.

> > We may have some kind of watermark and try to keep the number of deferred
> > split THP under it. But it comes with own set of problems: what if all
> > these pages are pinned for really long time and effectively not available
> > for split.
> 
> Again, why cannot we simply push the freeing where there are no other
> mappings? This should be pretty common case, right?

Partly mapped THP is not common case at all.

To get to this point you will need to create a mapping, fault in THP and
then unmap part of it. It requires very active memory management on
application side. This kind of applications usually knows if THP is a fit
for them.

> I am still not sure that waiting for the memory reclaim is a general
> win.

It wins CPU cycles by not doing the work that is likely unneeded.

[PATCH v3] /dev/mem: Bail out upon SIGKILL.

2019-08-26 Thread Tetsuo Handa
syzbot found that a thread can stall for minutes inside read_mem() or
write_mem() after that thread was killed by SIGKILL [1]. Reading from
iomem areas of /dev/mem can be slow, depending on the hardware.
While reading 2GB at one read() is legal, delaying termination of killed
thread for minutes is bad. Thus, allow reading/writing /dev/mem and
/dev/kmem to be preemptible and killable.

  [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
  [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
  [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
  [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
  [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248

Theoretically, reading/writing /dev/mem and /dev/kmem can become
"interruptible". But this patch chose "killable". Future patch will make
them "interruptible" so that we can revert to "killable" if some program
regressed.

[1] 
https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

Signed-off-by: Tetsuo Handa 
Reported-by: syzbot 
---
 drivers/char/mem.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index b08dc50..9eb564c 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -97,6 +97,13 @@ void __weak unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 }
 #endif
 
+static inline bool should_stop_iteration(void)
+{
+   if (need_resched())
+   cond_resched();
+   return fatal_signal_pending(current);
+}
+
 /*
  * This funcion reads the *physical* memory. The f_pos points directly to the
  * memory location.
@@ -175,6 +182,8 @@ static ssize_t read_mem(struct file *file, char __user *buf,
p += sz;
count -= sz;
read += sz;
+   if (should_stop_iteration())
+   break;
}
kfree(bounce);
 
@@ -251,6 +260,8 @@ static ssize_t write_mem(struct file *file, const char 
__user *buf,
p += sz;
count -= sz;
written += sz;
+   if (should_stop_iteration())
+   break;
}
 
*ppos += written;
@@ -468,6 +479,10 @@ static ssize_t read_kmem(struct file *file, char __user 
*buf,
read += sz;
low_count -= sz;
count -= sz;
+   if (should_stop_iteration()) {
+   count = 0;
+   break;
+   }
}
}
 
@@ -492,6 +507,8 @@ static ssize_t read_kmem(struct file *file, char __user 
*buf,
buf += sz;
read += sz;
p += sz;
+   if (should_stop_iteration())
+   break;
}
free_page((unsigned long)kbuf);
}
@@ -544,6 +561,8 @@ static ssize_t do_write_kmem(unsigned long p, const char 
__user *buf,
p += sz;
count -= sz;
written += sz;
+   if (should_stop_iteration())
+   break;
}
 
*ppos += written;
@@ -595,6 +614,8 @@ static ssize_t write_kmem(struct file *file, const char 
__user *buf,
buf += sz;
virtr += sz;
p += sz;
+   if (should_stop_iteration())
+   break;
}
free_page((unsigned long)kbuf);
}
-- 
1.8.3.1
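
The pattern the patch applies to each loop can be sketched in userspace:
process the transfer in page-sized chunks and check a stop condition
after every chunk, so that the work done so far is still reported. This
is an analogy under assumptions, not the driver code: copy_chunked() and
stop_requested are hypothetical names, and the flag stands in for
fatal_signal_pending(current).

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4096

/*
 * In a real program this would be set from a signal handler. Here it is
 * a hypothetical stand-in for fatal_signal_pending(current).
 */
static volatile sig_atomic_t stop_requested;

/* Copy count bytes in CHUNK-sized pieces; return how many were copied. */
size_t copy_chunked(char *dst, const char *src, size_t count)
{
	size_t done = 0;

	assert(dst && src);
	while (count > 0) {
		size_t sz = count < CHUNK ? count : CHUNK;

		memcpy(dst + done, src + done, sz);
		done += sz;
		count -= sz;
		/* Checked after each chunk, so partial progress is kept. */
		if (stop_requested)
			break;
	}
	return done;
}
```

Because the check sits at the end of the loop body, at least one chunk
is always transferred and the caller sees a short count, not an error.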



Re: [PATCH v2 5/5] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode

2019-08-26 Thread Vladimir Oltean
Hi Mark,

On Fri, 23 Aug 2019 at 00:15, Vladimir Oltean  wrote:
>
> Connected to the LS1021A DSPI is the SJA1105 DSA switch. This
> constitutes 4 of the 6 Ethernet ports on this board.
>
> As the SJA1105 is a PTP switch, constant disciplining of its PTP clock
> is necessary, and that translates into a lot of SPI I/O even when
> otherwise idle.
>
> Switching to using the DSPI in poll mode has several distinct
> benefits:
>
> - With interrupts, the DSPI driver in TCFQ mode raises an IRQ after each
>   transmitted byte. There is more time wasted for the "waitq" event than
>   for actual I/O. And the DSPI IRQ count is by far the largest in
>   /proc/interrupts on this board (larger than Ethernet). I should
>   mention that due to various LS1021A errata, other operating modes than
>   TCFQ are not available.
>
> - The SPI I/O time is both lower, and more consistently so. For a TSN
>   switch it is important that all SPI transfers take a deterministic
>   time to complete.
>   Reading the PTP clock is an important example.
>   Egressing through the switch requires some setup in advance (an SPI
>   write command). Without this patch, that operation required a
>   --tx_timestamp_timeout 50 (ms), now it can be done with
>   --tx_timestamp_timeout 10.
>   Yet another example is reconstructing timestamps, which has a hard
>   deadline because the PTP timestamping counter wraps around in 0.135
>   seconds. Combined with other I/O needed for that to happen, there is
>   a real risk that the deadline is not always met.
>
> See drivers/net/dsa/sja1105/ for more info about the above.
>
> Cc: Rob Herring 
> Cc: Shawn Guo 
> Signed-off-by: Vladimir Oltean 
> ---
>  arch/arm/boot/dts/ls1021a-tsn.dts | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/boot/dts/ls1021a-tsn.dts 
> b/arch/arm/boot/dts/ls1021a-tsn.dts
> index 5b7689094b70..1c09cfc766af 100644
> --- a/arch/arm/boot/dts/ls1021a-tsn.dts
> +++ b/arch/arm/boot/dts/ls1021a-tsn.dts
> @@ -33,6 +33,7 @@
>  };
>
>   {
> +   /delete-property/ interrupts;
> bus-num = <0>;
> status = "okay";
>
> --
> 2.17.1
>

I noticed you skipped applying this patch, and I'm not sure that Shawn
will review it/take it.
Do you have a better suggestion how I can achieve putting the DSPI
driver in poll mode for this board? A Kconfig option maybe?

Regards,
-Vladimir


Re: [Virtio-fs] [PATCH 04/19] virtio: Implement get_shm_region for PCI transport

2019-08-26 Thread Vivek Goyal
On Mon, Aug 26, 2019 at 09:43:08AM +0800, piaojun wrote:

[..]
> > +static bool vp_get_shm_region(struct virtio_device *vdev,
> > + struct virtio_shm_region *region, u8 id)
> > +{
> > +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > +   struct pci_dev *pci_dev = vp_dev->pci_dev;
> > +   u8 bar;
> > +   u64 offset, len;
> > +   phys_addr_t phys_addr;
> > +   size_t bar_len;
> > +   char *bar_name;
> 
> 'char *bar_name' should be cleaned up to avoid compiling warning. And I
> wonder if you mix tab and blankspace for code indent? Or it's just my
> email display problem?

Will get rid of now unused bar_name. 

Generally git flags if there are tab/space issues. I did not see any. So
if you see something, point it out and I will fix it.

Vivek


Re: [PATCH v2] mtd: rawnand: Fix a memory leak bug

2019-08-26 Thread Boris Brezillon
On Sun, 18 Aug 2019 21:46:04 -0500
Wenwen Wang  wrote:

> In nand_scan_bbt(), a temporary buffer 'buf' is allocated through
> vmalloc(). However, if check_create() fails, 'buf' is not deallocated,
> leading to a memory leak bug. To fix this issue, free 'buf' before
> returning the error.
> 
> Signed-off-by: Wenwen Wang 
> ---
>  drivers/mtd/nand/raw/nand_bbt.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> index 2ef15ef..96045d6 100644
> --- a/drivers/mtd/nand/raw/nand_bbt.c
> +++ b/drivers/mtd/nand/raw/nand_bbt.c
> @@ -1232,7 +1232,7 @@ static int nand_scan_bbt(struct nand_chip *this, struct 
> nand_bbt_descr *bd)
>   if (!td) {
>   if ((res = nand_memory_bbt(this, bd))) {
>   pr_err("nand_bbt: can't scan flash and build the 
> RAM-based BBT\n");
> - goto err;
> + goto err_free_bbt;
>   }
>   return 0;
>   }
> @@ -1245,7 +1245,7 @@ static int nand_scan_bbt(struct nand_chip *this, struct 
> nand_bbt_descr *bd)
>   buf = vmalloc(len);
>   if (!buf) {
>   res = -ENOMEM;
> - goto err;
> + goto err_free_bbt;
>   }
>  
>   /* Is the bbt at a given page? */
> @@ -1258,7 +1258,7 @@ static int nand_scan_bbt(struct nand_chip *this, struct 
> nand_bbt_descr *bd)
>  
>   res = check_create(this, buf, bd);

I know it's too late, but calling

vfree(buf);

here

>   if (res)
> - goto err;
> + goto err_free_buf;
>  
>   /* Prevent the bbt regions from erasing / writing */
>   mark_bbt_region(this, td);
> @@ -1268,7 +1268,9 @@ static int nand_scan_bbt(struct nand_chip *this, struct 
> nand_bbt_descr *bd)
>   vfree(buf);

instead of here would have fixed the leak without the need for an extra
err label.

>   return 0;
>  
> -err:
> +err_free_buf:
> + vfree(buf);
> +err_free_bbt:
>   kfree(this->bbt);
>   this->bbt = NULL;
>   return res;
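
The alternative described in the inline comments above, reduced to a
userspace sketch: release the temporary buffer right after the last call
that uses it and only then look at the result. scan() and check_step()
are hypothetical stand-ins for nand_scan_bbt() and check_create(), and
malloc()/free() stand in for vmalloc()/vfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for check_create(); fails when fail is nonzero. */
static int check_step(char *buf, int fail)
{
	buf[0] = 0;	/* pretend to use the scratch buffer */
	return fail ? -EIO : 0;
}

int scan(size_t len, int fail)
{
	char *buf;
	int res;

	assert(len > 0);
	buf = malloc(len);
	if (!buf)
		return -ENOMEM;

	res = check_step(buf, fail);
	/*
	 * buf is not needed past this point: free it before testing res,
	 * so success and failure share one release and no extra label.
	 */
	free(buf);
	if (res)
		return res;

	/* ... work that no longer needs buf would go here ... */
	return 0;
}
```

This only works when nothing after the checked call touches the buffer;
if later steps still need it, the label-based unwind is the safer shape.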



Re: [PATCH 0/3] Rewrite x86/ftrace to use text_poke()

2019-08-26 Thread Peter Zijlstra
On Mon, Aug 26, 2019 at 02:51:38PM +0200, Peter Zijlstra wrote:
> Ftrace was one of the last W^X violators; these patches move it over to the
> generic text_poke() interface and thereby get rid of this oddity.
> 
> Very lightly tested...

I'm thinking there's more cleanup to be had; set_kernel_text_*() can go
away now, and I'm thinking the whole batching thing can be further
simplified/shared between this and jump_label.



Re: [PATCH 1/2] x86/microcode: Update late microcode in parallel

2019-08-26 Thread Borislav Petkov
On Mon, Aug 26, 2019 at 08:53:05AM -0400, Boris Ostrovsky wrote:
> What is the advantage of having those other threads go through
> find_patch() and (in Intel case) intel_get_microcode_revision() (which
> involves two MSR accesses) vs. having the master sibling update slaves'
> microcode revisions? There are only two things that need to be updated,
> uci->cpu_sig.rev and c->microcode.

Less code churn and simplicity.

I accept non-ugly patches, of course. :-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] mtd: onenand_base: Fix a memory leak bug

2019-08-26 Thread Miquel Raynal
On Sun, 2019-08-18 at 15:52:49 UTC, Wenwen Wang wrote:
> In onenand_scan(), if CONFIG_MTD_ONENAND_VERIFY_WRITE is defined,
> 'this->verify_buf' is allocated through kzalloc(). However, it is not
> deallocated in the following execution, if the allocation for
> 'this->oob_buf' fails, leading to a memory leak bug. To fix this issue,
> free 'this->verify_buf' before returning the error.
> 
> Signed-off-by: Wenwen Wang 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH v2] mtd: rawnand: Fix a memory leak bug

2019-08-26 Thread Miquel Raynal
On Mon, 2019-08-19 at 02:46:04 UTC, Wenwen Wang wrote:
> In nand_scan_bbt(), a temporary buffer 'buf' is allocated through
> vmalloc(). However, if check_create() fails, 'buf' is not deallocated,
> leading to a memory leak bug. To fix this issue, free 'buf' before
> returning the error.
> 
> Signed-off-by: Wenwen Wang 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH v7 1/2] mtd: rawnand: Add Macronix raw NAND controller driver

2019-08-26 Thread Miquel Raynal
On Mon, 2019-08-19 at 07:19:08 UTC, Mason Yang wrote:
> Add a driver for Macronix raw NAND controller.
> 
> Signed-off-by: Mason Yang 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH trivial] mtd: nand: fix typo, s/erasablocks/eraseblocks

2019-08-26 Thread Miquel Raynal
On Fri, 2019-08-23 at 15:39:37 UTC,  wrote:
> From: Tudor Ambarus 
> 
> Signed-off-by: Tudor Ambarus 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH v7 2/2] dt-bindings: mtd: Document Macronix raw NAND controller bindings

2019-08-26 Thread Miquel Raynal
On Mon, 2019-08-19 at 07:19:09 UTC, Mason Yang wrote:
> Document the bindings used by the Macronix raw NAND controller.
> 
> Signed-off-by: Mason Yang 
> Reviewed-by: Rob Herring 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH 12/16] mtd: rawnand: remove w90x900 driver

2019-08-26 Thread Miquel Raynal
On Fri, 2019-08-09 at 20:27:40 UTC, Arnd Bergmann wrote:
> The ARM w90x900 platform is getting removed, so this driver is obsolete.
> 
> Signed-off-by: Arnd Bergmann 
> Reviewed-by: Boris Brezillon 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


Re: [PATCH] mtd: rawnand: sharpsl: add include guard to linux/mtd/sharpsl.h

2019-08-26 Thread Miquel Raynal
On Tue, 2019-08-20 at 02:50:57 UTC, Masahiro Yamada wrote:
> Add a header include guard just in case.
> 
> Signed-off-by: Masahiro Yamada 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
nand/next, thanks.

Miquel


[PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions

2019-08-26 Thread Peter Zijlstra
In preparation for static_call and variable size jump_label support,
teach text_poke_bp() to emulate instructions, namely:

  JMP32, JMP8, CALL, NOP2, NOP_ATOMIC5

The current text_poke_bp() takes a @handler argument which is used as
a jump target when the temporary INT3 is hit by a different CPU.

When patching CALL instructions, this doesn't work because we'd miss
the PUSH of the return address. Instead, teach poke_int3_handler() to
emulate an instruction, typically the instruction we're patching in.

This fits almost all text_poke_bp() users, except
arch_unoptimize_kprobe() which restores random text, and for that site
we have to build an explicit emulate instruction.
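
As a rough userspace illustration of the emulation described above (not the kernel code itself), the handler's decision can be sketched as follows. The opcode values come from the patch; `struct fake_regs`, `struct fake_poke_loc` and `emulate()` are hypothetical stand-ins invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as defined in the patch. */
#define CALL_INSN_OPCODE   0xE8
#define JMP32_INSN_OPCODE  0xE9
#define JMP8_INSN_OPCODE   0xEB

/* Hypothetical stand-in for struct pt_regs: ip plus a tiny stack. */
struct fake_regs {
	uint64_t ip;
	int sp;                 /* index of the next free stack slot */
	uint64_t stack[8];
};

/* Hypothetical stand-in for the fields the handler reads. */
struct fake_poke_loc {
	uint64_t addr;          /* address of the patched instruction */
	int len;                /* length of the instruction being written */
	int32_t rel32;          /* relative displacement, if any */
	uint8_t opcode;
};

/* Emulate the instruction at tp->addr as if the CPU had executed it. */
static void emulate(struct fake_regs *regs, const struct fake_poke_loc *tp)
{
	/* Relative branches are computed from the end of the instruction. */
	uint64_t next = tp->addr + (uint64_t)tp->len;

	switch (tp->opcode) {
	case CALL_INSN_OPCODE:
		assert(regs->sp < 8);
		/* The PUSH of the return address that a plain detour would miss. */
		regs->stack[regs->sp++] = next;
		regs->ip = next + (int64_t)tp->rel32;
		break;
	case JMP32_INSN_OPCODE:
	case JMP8_INSN_OPCODE:
		regs->ip = next + (int64_t)tp->rel32;
		break;
	default:                /* NOP: continue with the next instruction */
		regs->ip = next;
		break;
	}
}
```

For a 5-byte CALL at 0x1000 with rel32 = 0x100, this sketch pushes 0x1005 and continues at 0x1105.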

Cc: Masami Hiramatsu 
Cc: Steven Rostedt 
Cc: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/text-patching.h |   24 ++--
 arch/x86/kernel/alternative.c|   98 ++-
 arch/x86/kernel/jump_label.c |9 +--
 arch/x86/kernel/kprobes/opt.c|   11 ++-
 4 files changed, 103 insertions(+), 39 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -26,10 +26,11 @@ static inline void apply_paravirt(struct
 #define POKE_MAX_OPCODE_SIZE   5
 
 struct text_poke_loc {
-   void *detour;
void *addr;
-   size_t len;
-   const char opcode[POKE_MAX_OPCODE_SIZE];
+   int len;
+   s32 rel32;
+   u8 opcode;
+   const char text[POKE_MAX_OPCODE_SIZE];
 };
 
 extern void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -51,8 +52,10 @@ extern void text_poke_early(void *addr,
 extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
-extern void text_poke_bp(void *addr, const void *opcode, size_t len, void 
*handler);
+extern void text_poke_bp(void *addr, const void *opcode, size_t len, const 
void *emulate);
 extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int 
nr_entries);
+extern void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
+  const void *opcode, size_t len, const void 
*emulate);
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
@@ -63,8 +66,17 @@ static inline void int3_emulate_jmp(stru
regs->ip = ip;
 }
 
-#define INT3_INSN_SIZE 1
-#define CALL_INSN_SIZE 5
+#define INT3_INSN_SIZE 1
+#define INT3_INSN_OPCODE   0xCC
+
+#define CALL_INSN_SIZE 5
+#define CALL_INSN_OPCODE   0xE8
+
+#define JMP32_INSN_SIZE    5
+#define JMP32_INSN_OPCODE  0xE9
+
+#define JMP8_INSN_SIZE 2
+#define JMP8_INSN_OPCODE   0xEB
 
 static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
 {
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -956,16 +956,15 @@ NOKPROBE_SYMBOL(patch_cmp);
 int poke_int3_handler(struct pt_regs *regs)
 {
struct text_poke_loc *tp;
-   unsigned char int3 = 0xcc;
void *ip;
 
/*
 * Having observed our INT3 instruction, we now must observe
 * bp_patching.nr_entries.
 *
-*  nr_entries != 0 INT3
-*  WMB RMB
-*  write INT3  if (nr_entries)
+*  nr_entries != 0 INT3
+*  WMB RMB
+*  write INT3  if (nr_entries)
 *
 * Idem for other elements in bp_patching.
 */
@@ -978,9 +977,9 @@ int poke_int3_handler(struct pt_regs *re
return 0;
 
/*
-* Discount the sizeof(int3). See text_poke_bp_batch().
+* Discount the INT3. See text_poke_bp_batch().
 */
-   ip = (void *) regs->ip - sizeof(int3);
+   ip = (void *) regs->ip - INT3_INSN_SIZE;
 
/*
 * Skip the binary search if there is a single member in the vector.
@@ -997,8 +996,22 @@ int poke_int3_handler(struct pt_regs *re
return 0;
}
 
-   /* set up the specified breakpoint detour */
-   regs->ip = (unsigned long) tp->detour;
+   ip += tp->len;
+
+   switch (tp->opcode) {
+   case CALL_INSN_OPCODE:
+   int3_emulate_call(regs, (long)ip + tp->rel32);
+   break;
+
+   case JMP32_INSN_OPCODE:
+   case JMP8_INSN_OPCODE:
+   int3_emulate_jmp(regs, (long)ip + tp->rel32);
+   break;
+
+   default: /* nop */
+   int3_emulate_jmp(regs, (long)ip);
+   break;
+   }
 
return 1;
 }
@@ -1027,8 +1040,8 @@ NOKPROBE_SYMBOL(poke_int3_handler);
  */
 void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries)
 {
+   unsigned char int3 = INT3_INSN_OPCODE;
int 

[PATCH 2/3] x86/alternatives: Move tp_vec

2019-08-26 Thread Peter Zijlstra
In order to allow other users to make use of the tp_vec; move it near
the text_poke_bp_batch() code, instead of keeping it in the jump_label
code.
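
The queue-then-apply pattern that tp_vec serves can be sketched in plain C as below. This is only an illustration: `VEC_MAX`, `struct fake_loc`, `queue_site()` and `apply_queue()` are hypothetical names, the batch step is a stub, and the locking is left out entirely (the kernel code holds a mutex around both operations).

```c
#include <stdbool.h>

/* Hypothetical capacity; the patch uses PAGE_SIZE / sizeof(struct text_poke_loc). */
#define VEC_MAX 4

struct fake_loc { unsigned long addr; };

static struct fake_loc vec[VEC_MAX];
static int vec_nr;
static int applied;     /* counts entries handed to the stubbed batch step */

/* Queue one site; false tells the caller to apply the queue first. */
static bool queue_site(unsigned long addr)
{
	if (vec_nr == VEC_MAX)
		return false;
	vec[vec_nr++].addr = addr;
	return true;
}

/* Flush the queue; stands in for text_poke_bp_batch(tp_vec, tp_vec_nr). */
static void apply_queue(void)
{
	if (vec_nr)
		applied += vec_nr;
	vec_nr = 0;
}
```

A caller keeps queueing until `queue_site()` returns false, calls `apply_queue()`, and then retries the site that did not fit.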

Cc: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/text-patching.h |4 
 arch/x86/kernel/alternative.c|3 +++
 arch/x86/kernel/jump_label.c |   25 -
 3 files changed, 19 insertions(+), 13 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -60,6 +60,10 @@ extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
 
+#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
+extern struct text_poke_loc tp_vec[TP_VEC_MAX];
+extern int tp_vec_nr;
+
 #ifndef CONFIG_UML_X86
 static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
 {
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1017,6 +1017,9 @@ int poke_int3_handler(struct pt_regs *re
 }
 NOKPROBE_SYMBOL(poke_int3_handler);
 
+struct text_poke_loc tp_vec[TP_VEC_MAX];
+int tp_vec_nr;
+
 /**
  * text_poke_bp_batch() -- update instructions on live kernel on SMP
  * @tp:vector of instructions to patch
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -100,15 +100,12 @@ void arch_jump_label_transform(struct ju
mutex_unlock(&text_mutex);
 }
 
-#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
-static struct text_poke_loc tp_vec[TP_VEC_MAX];
-static int tp_vec_nr;
-
 bool arch_jump_label_transform_queue(struct jump_entry *entry,
 enum jump_label_type type)
 {
struct text_poke_loc *tp;
void *entry_code;
+   bool ret = true;
 
if (system_state == SYSTEM_BOOTING) {
/*
@@ -118,12 +115,15 @@ bool arch_jump_label_transform_queue(str
return true;
}
 
+   mutex_lock(&text_mutex);
/*
 * No more space in the vector, tell upper layer to apply
 * the queue before continuing.
 */
-   if (tp_vec_nr == TP_VEC_MAX)
-   return false;
+   if (tp_vec_nr == TP_VEC_MAX) {
+   ret = false;
+   goto unlock;
+   }
 
tp = &tp_vec[tp_vec_nr];
 
@@ -151,20 +151,19 @@ bool arch_jump_label_transform_queue(str
text_poke_loc_init(tp, entry_code, NULL, JUMP_LABEL_NOP_SIZE, NULL);
 
tp_vec_nr++;
+unlock:
+   mutex_unlock(&text_mutex);
 
-   return true;
+   return ret;
 }
 
 void arch_jump_label_transform_apply(void)
 {
-   if (!tp_vec_nr)
-   return;
-
mutex_lock(&text_mutex);
-   text_poke_bp_batch(tp_vec, tp_vec_nr);
-   mutex_unlock(&text_mutex);
-
+   if (tp_vec_nr)
+   text_poke_bp_batch(tp_vec, tp_vec_nr);
tp_vec_nr = 0;
+   mutex_unlock(&text_mutex);
 }
 
 static enum {




[RFC][PATCH 3/3] x86/ftrace: Use text_poke()

2019-08-26 Thread Peter Zijlstra
Move ftrace over to using the generic x86 text_poke functions; this
avoids having a second/different copy of that code around.
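
For readers unfamiliar with the instruction being patched, here is a minimal userspace sketch of how a 5-byte x86 CALL is encoded, mirroring the offset arithmetic in ftrace_text_replace(). `encode_call()` and `INSN_SIZE` are names made up for this sketch; it is not the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

#define INSN_SIZE        5      /* length of "call rel32", MCOUNT_INSN_SIZE on x86 */
#define CALL_INSN_OPCODE 0xE8

/* Write "call addr", as it would be encoded at address ip, into buf. */
static void encode_call(uint8_t buf[INSN_SIZE], uint64_t ip, uint64_t addr)
{
	/* rel32 is relative to the end of the 5-byte instruction. */
	int32_t rel32 = (int32_t)(addr - (ip + INSN_SIZE));

	buf[0] = CALL_INSN_OPCODE;
	/* Native byte order, which is little-endian on x86. */
	memcpy(&buf[1], &rel32, sizeof(rel32));
}
```

With ip = 0x1000 and addr = 0x2000 the displacement is 0x2000 - 0x1005 = 0xffb; a backward call simply yields a negative rel32.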

Cc: Daniel Bristot de Oliveira 
Cc: Steven Rostedt 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/ftrace.h |2 
 arch/x86/kernel/ftrace.c  |  571 +++---
 arch/x86/kernel/traps.c   |9 
 3 files changed, 51 insertions(+), 531 deletions(-)

--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -35,8 +35,6 @@ struct dyn_arch_ftrace {
/* No extra data needed for x86 */
 };
 
-int ftrace_int3_handler(struct pt_regs *regs);
-
 #define FTRACE_GRAPH_TRAMP_ADDR FTRACE_GRAPH_ADDR
 
 #endif /*  CONFIG_DYNAMIC_FTRACE */
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -43,16 +43,17 @@ int ftrace_arch_code_modify_prepare(void
 * ftrace has it set to "read/write".
 */
mutex_lock(&text_mutex);
-   set_kernel_text_rw();
-   set_all_modules_text_rw();
+   WARN_ON_ONCE(tp_vec_nr);
return 0;
 }
 
 int ftrace_arch_code_modify_post_process(void)
 __releases(&text_mutex)
 {
-   set_all_modules_text_ro();
-   set_kernel_text_ro();
+   if (tp_vec_nr) {
+   text_poke_bp_batch(tp_vec, tp_vec_nr);
+   tp_vec_nr = 0;
+   }
mutex_unlock(&text_mutex);
return 0;
 }
@@ -60,67 +61,36 @@ int ftrace_arch_code_modify_post_process
 union ftrace_code_union {
char code[MCOUNT_INSN_SIZE];
struct {
-   unsigned char op;
+   char op;
int offset;
} __attribute__((packed));
 };
 
-static int ftrace_calc_offset(long ip, long addr)
-{
-   return (int)(addr - ip);
-}
-
-static unsigned char *
-ftrace_text_replace(unsigned char op, unsigned long ip, unsigned long addr)
+static char *ftrace_text_replace(char op, unsigned long ip, unsigned long addr)
 {
static union ftrace_code_union calc;
 
-   calc.op = op;
-   calc.offset = ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+   calc.op = op;
+   calc.offset = (int)(addr - (ip + MCOUNT_INSN_SIZE));
 
return calc.code;
 }
 
-static unsigned char *
-ftrace_call_replace(unsigned long ip, unsigned long addr)
-{
-   return ftrace_text_replace(0xe8, ip, addr);
-}
-
-static inline int
-within(unsigned long addr, unsigned long start, unsigned long end)
+static char *ftrace_nop_replace(void)
 {
-   return addr >= start && addr < end;
-}
-
-static unsigned long text_ip_addr(unsigned long ip)
-{
-   /*
-* On x86_64, kernel text mappings are mapped read-only, so we use
-* the kernel identity mapping instead of the kernel text mapping
-* to modify the kernel text.
-*
-* For 32bit kernels, these mappings are same and we can use
-* kernel identity mapping to modify code.
-*/
-   if (within(ip, (unsigned long)_text, (unsigned long)_etext))
-   ip = (unsigned long)__va(__pa_symbol(ip));
-
-   return ip;
+   return ideal_nops[NOP_ATOMIC5];
 }
 
-static const unsigned char *ftrace_nop_replace(void)
+static char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 {
-   return ideal_nops[NOP_ATOMIC5];
+   return ftrace_text_replace(CALL_INSN_OPCODE, ip, addr);
 }
 
-static int
-ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
-  unsigned const char *new_code)
+static int
+ftrace_modify_code_direct(unsigned long ip, const char *old_code,
+ const char *new_code)
 {
-   unsigned char replaced[MCOUNT_INSN_SIZE];
-
-   ftrace_expected = old_code;
+   char cur_code[MCOUNT_INSN_SIZE];
 
/*
 * Note:
@@ -129,31 +99,23 @@ ftrace_modify_code_direct(unsigned long
 * Carefully read and modify the code with probe_kernel_*(), and make
 * sure what we read is what we expected it to be before modifying it.
 */
-
/* read the text we want to modify */
-   if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+   if (probe_kernel_read(cur_code, (void *)ip, MCOUNT_INSN_SIZE))
return -EFAULT;
 
/* Make sure it is what we expect it to be */
-   if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
+   if (memcmp(cur_code, old_code, MCOUNT_INSN_SIZE) != 0)
return -EINVAL;
 
-   ip = text_ip_addr(ip);
-
/* replace the text with the new text */
-   if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
-   return -EPERM;
-
-   sync_core();
-
+   text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
return 0;
 }
 
-int ftrace_make_nop(struct module *mod,
-   struct dyn_ftrace *rec, unsigned long addr)
+int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long 
addr)
 {
-   unsigned const char *new, *old;
unsigned long 

[PATCH 0/3] Rewrite x86/ftrace to use text_poke()

2019-08-26 Thread Peter Zijlstra
Ftrace was one of the last W^X violators; these patches move it over to the
generic text_poke() interface and thereby get rid of this oddity.

Very lightly tested...



Re: [PATCH 1/2] x86/microcode: Update late microcode in parallel

2019-08-26 Thread Boris Ostrovsky
On 8/24/19 4:53 AM, Borislav Petkov wrote:
>  
> +wait_for_siblings:
> + if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
> + panic("Timeout during microcode update!\n");
> +
>   /*
> -  * Increase the wait timeout to a safe value here since we're
> -  * serializing the microcode update and that could take a while on a
> -  * large number of CPUs. And that is fine as the *actual* timeout will
> -  * be determined by the last CPU finished updating and thus cut short.
> +  * At least one thread has completed update on each core.
> +  * For others, simply call the update to make sure the
> +  * per-cpu cpuinfo can be updated with right microcode
> +  * revision.


What is the advantage of having those other threads go through
find_patch() and (in Intel case) intel_get_microcode_revision() (which
involves two MSR accesses) vs. having the master sibling update slaves'
microcode revisions? There are only two things that need to be updated,
uci->cpu_sig.rev and c->microcode.

-boris


>*/
> - if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC * num_online_cpus()))
> - panic("Timeout during microcode update!\n");
> + if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu)
> + apply_microcode_local();
>  
>   return ret;
>  }



Re: [GIT pull] x86/urgent for 5.3-rc5

2019-08-26 Thread Borislav Petkov
On Sun, Aug 25, 2019 at 10:17:23PM +0200, Borislav Petkov wrote:
> > I think WARN_ONCE() is good. It's big enough that it will show up in
> > dmesg if anybody looks, and if nobody looks I think distros still have
> > logging for things like that, don't they?
> 
> Probably. Lemme research that.

So there's a whole bunch of daemons doing desktop notifications along with a
desktop notifications spec, yadda yadda:

https://wiki.archlinux.org/index.php/Desktop_notifications

but installing openSUSE Leap 15.1 in a guest, as a default desktop
installation, doesn't give me any notifications. I installed
notification-daemon and whatnot but nada.

Which means, that we cannot guarantee that every user would see it.
There might be installations which miss it.

So the only thing I can think of right now is to make that single line
pr_emerg() so that it at least spews into the terminals when suspending:

linux-6qfo:~ # echo "suspend" > /sys/power/disk
linux-6qfo:~ # echo "mem" > /sys/power/state

<--- resume guest.

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.416145] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.423091] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.427426] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.434699] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.442587] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.449047] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.456328] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.462198] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.470120] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.477302] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.484605] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.490115] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.497508] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.505350] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

Message from syslogd@linux-6qfo at Aug 26 14:48:05 ...
 kernel:[   40.513336] RDRAND gives funky smelling output, might consider not 
using it by booting with "nordrand"

---

Assuming the user has at least a single terminal open. But someone might
have a better idea.
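
For illustration only, the idea of the check (count how often consecutive reads differ, and complain if they never do) can be sketched in userspace like this. `rng_fn`, `rng_output_changes()` and the two toy generators are hypothetical; the loop count of 8 is assumed here, only the name SANITY_CHECK_LOOPS appears in the diff.

```c
#include <stdbool.h>

#define SANITY_CHECK_LOOPS 8    /* value assumed for this sketch */

/* Hypothetical stand-in for rdrand_long(): returns false on failure. */
typedef bool (*rng_fn)(unsigned long *out);

/* Return true if at least two consecutive successful reads differed. */
static bool rng_output_changes(rng_fn rng)
{
	unsigned long tmp = 0, prev;
	unsigned int changed = 0;
	int i;

	/* An initial failure is treated as "no change seen". */
	if (!rng(&tmp))
		return false;

	prev = tmp;
	for (i = 0; i < SANITY_CHECK_LOOPS; i++) {
		if (rng(&tmp)) {
			if (prev != tmp)
				changed++;
			prev = tmp;
		}
	}
	return changed != 0;
}

/* Toy generator that always returns the same value. */
static bool stuck_rng(unsigned long *out)
{
	*out = ~0UL;
	return true;
}

/* Toy generator that returns a different value on every call. */
static bool counter_rng(unsigned long *out)
{
	static unsigned long n;

	*out = n++;
	return true;
}
```

A generator stuck on one value fails the check, while anything that varies passes, so this only catches the grossest breakage and says nothing about actual randomness.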

Current diff:

---
commit d46b23c4be1b4acae7d21c97be189131e200f6d0 (HEAD -> 
refs/heads/rc5+1-rdrand)
Author: Borislav Petkov 
Date:   Sun Aug 25 22:50:18 2019 +0200

WIP

Suggested-by: Linus Torvalds 
Signed-off-by: Borislav Petkov 

diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
index 5c900f9527ff..b02d1ce91081 100644
--- a/arch/x86/kernel/cpu/rdrand.c
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -29,7 +29,8 @@ __setup("nordrand", x86_rdrand_setup);
 #ifdef CONFIG_ARCH_RANDOM
 void x86_init_rdrand(struct cpuinfo_x86 *c)
 {
-   unsigned long tmp;
+   unsigned int changed = 0;
+   unsigned long tmp, prev;
int i;
 
if (!cpu_has(c, X86_FEATURE_RDRAND))
@@ -42,5 +43,27 @@ void x86_init_rdrand(struct cpuinfo_x86 *c)
return;
}
}
+
+   /*
+* Stupid sanity-check whether RDRAND does *actually* generate
+* some at least random-looking data.
+*/
+   prev = tmp;
+   for (i = 0; i < SANITY_CHECK_LOOPS; i++) {
+   if (rdrand_long(&tmp)) {
+   if (prev != tmp)
+   changed++;
+
+   prev = tmp;
+   }
+   }
+
+   if (!changed) {
+   pr_emerg(
+"RDRAND gives 

Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space

2019-08-26 Thread Leo Yan
Hi Adrian,

On Fri, Aug 16, 2019 at 04:00:02PM +0300, Adrian Hunter wrote:
> On 16/08/19 4:45 AM, Leo Yan wrote:
> > Hi Adrian,
> > 
> > On Thu, Aug 15, 2019 at 02:45:57PM +0300, Adrian Hunter wrote:
> > 
> > [...]
> > 
>  How come you cannot use kallsyms to get the information?
> >>>
> >>> Thanks for pointing out this.  Sorry I skipped your comment "I don't
> >>> know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
> >>> the patch v3, I should use that chance to elaborate the detailed idea
> >>> and so can get more feedback/guidance before procceed.
> >>>
> >>> Actually, I have considered to use kallsyms when worked on the previous
> >>> patch set.
> >>>
> >>> As mentioned in patch set v4's cover letter, I tried to implement
> >>> machine__create_extra_kernel_maps() for arm/arm64, the purpose is to
> >>> parse kallsyms so can find more kernel maps and thus also can fixup
> >>> the kernel start address.  But I found the 'perf script' tool directly
> >>> calls machine__get_kernel_start() instead of running into the flow for
> >>> machine__create_extra_kernel_maps();
> >>
> >> Doesn't it just need to loop through each kernel map to find the lowest
> >> start address?
> > 
> > Based on your suggestion, I worked out below change and verified it
> > can work well on arm64 for fixing up start address; please let me know
> > if the change works for you?
> 
> How does that work if take a perf.data file to a machine with a different
> architecture?

Sorry for taking so long to respond to your question; I wasn't confident
that I could give a reasonable answer, which is the main reason for the
delay.

To take a perf.data file to a machine with a different architecture, we
can first use the command 'perf buildid-list' to print the buildid for
kallsyms; based on the dumped buildid we can find the location of the
saved kallsyms file. Then we can use the option '--kallsyms' to specify
the offline kallsyms file and use it to fix up the kernel start address.
The detailed commands are listed below:

root@debian:~# perf buildid-list
7b36dfca8317ef74974ebd7ee5ec0a8b35c97640 [kernel.kallsyms]
56b84aa88a1bcfe222a97a53698b92723a3977ca /usr/lib/systemd/systemd
0956b952e9cd673d48ff2cfeb1a9dbd0c853e686 /usr/lib/aarch64-linux-gnu/libm-2.28.so
[...]

root@debian:~# perf script --kallsyms 
~/.debug/\[kernel.kallsyms\]/7b36dfca8317ef74974ebd7ee5ec0a8b35c97640/kallsyms

The amended patch is below; please review. Suggestions and comments are
always welcome!

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5734460fc89e..593f05cc453f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2672,9 +2672,26 @@ int machine__nr_cpus_avail(struct machine *machine)
return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
 }
 
+static int machine__fixup_kernel_start(void *arg,
+  const char *name __maybe_unused,
+  char type,
+  u64 start)
+{
+   struct machine *machine = arg;
+
+   type = toupper(type);
+
+   /* Fixup for text, weak, data and bss sections. */
+   if (type == 'T' || type == 'W' || type == 'D' || type == 'B')
+   machine->kernel_start = min(machine->kernel_start, start);
+
+   return 0;
+}
+
 int machine__get_kernel_start(struct machine *machine)
 {
struct map *map = machine__kernel_map(machine);
+   char filename[PATH_MAX];
int err = 0;
 
/*
@@ -2696,6 +2713,22 @@ int machine__get_kernel_start(struct machine *machine)
if (!err && !machine__is(machine, "x86_64"))
machine->kernel_start = map->start;
}
+
+   if (symbol_conf.kallsyms_name != NULL) {
+   strncpy(filename, symbol_conf.kallsyms_name, PATH_MAX);
+   } else {
+   machine__get_kallsyms_filename(machine, filename, PATH_MAX);
+
+   if (symbol__restricted_filename(filename, "/proc/kallsyms"))
+   goto out;
+   }
+
+   if (kallsyms__parse(filename, machine, machine__fixup_kernel_start))
+   pr_warning("Fail to fixup kernel start address. skipping...\n");
+
+out:
return err;
 }
 

Thanks,
Leo Yan


Re: [RESEND RFC PATCH v3 20/20] mtd: spi-nor: Rework the disabling of block write protection

2019-08-26 Thread Boris Brezillon
On Mon, 26 Aug 2019 12:09:09 +
 wrote:

> From: Tudor Ambarus 
> 
> spi_nor_unlock() unlocks blocks of memory or the entire flash memory
> array, if requested. clear_sr_bp() unlocks the entire flash memory
> array at boot time. This calls for some unification, clear_sr_bp() is
> just an optimization for the case when the unlock request covers the
> entire flash size.
> 
> Merge the clear_sr_bp() and stm_lock/unlock logic and introduce
> spi_nor_unlock_all(), which makes an unlock request that covers the
> entire flash size.
> 
> Get rid of the MFR handling and implement specific manufacturer
> default_init() fixup hooks.
> 
> Move write_sr_cr() to avoid adding a forward declaration. Prefix
> new function with 'spi_nor_'.
> 
> Note that this changes a bit the logic for the SNOR_MFR_ATMEL and
> SNOR_MFR_INTEL cases. Before this patch, the Atmel and Intel chips
> did not set the locking ops, but unlocked the entire flash at boot
> time, while now they are setting the locking ops to stm_locking_ops.
> This should work, since disabling the block protection at boot time
> used the same Status Register bits to unlock the flash, as
> in the stm_locking_ops case.
> 
> In future, we should probably add new hooks to
> 'struct spi_nor_flash_parameter' to describe how to interact with the
> Status and Configuration Registers in the form of:
>   nor->params.ops->read_sr
>   nor->params.ops->write_sr
>   nor->params.ops->read_cr
>   nor->params.ops->write_cr
> We can retrieve this info starting with JESD216 revB, by checking the
> 15th DWORD of Basic Flash Parameter Table, or with later revisions of
> the standard, by parsing the "Status, Control and Configuration Register
> Map for SPI Memory Devices".
> 
> Suggested-by: Boris Brezillon 
> Signed-off-by: Tudor Ambarus 

Reviewed-by: Boris Brezillon 

Though I'd recommend waiting a bit before applying that one. As
discussed privately, we might have problems when ->quad_enable is set
to spansion_read_cr_quad_enable or spansion_no_read_cr_quad_enable.


Re: linux-next: build warning after merge of the devfreq tree

2019-08-26 Thread MyungJoo Ham
Thank you for pointing this out!

I've added a fix to the tree:
https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git/log/?h=for-next
(and shared the patch in a previous reply)

Rafael, could you please pull the fix from the git repo above?


Cheers,
MyungJoo

On Mon, Aug 26, 2019 at 8:51 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the devfreq tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
>
> drivers/devfreq/governor_passive.c: In function 
> 'devfreq_passive_event_handler':
> drivers/devfreq/governor_passive.c:152:17: warning: unused variable 'dev' 
> [-Wunused-variable]
>   struct device *dev = devfreq->dev.parent;
>  ^~~
>
> Introduced by commit
>
>   0ef7c7cce43f ("PM / devfreq: passive: Use non-devm notifiers")
>
> --
> Cheers,
> Stephen Rothwell



-- 
MyungJoo Ham, Ph.D.
S/W Center, Samsung Electronics


Re: [PATCH] x86/mm: Do not split_large_page() for set_kernel_text_rw()

2019-08-26 Thread Peter Zijlstra
On Mon, Aug 26, 2019 at 07:33:08AM -0400, Steven Rostedt wrote:
> Anyway, I believe Nadav has some patches that converts ftrace to use
> the shadow page modification trick somewhere.
> 
> Or we also need the text_poke batch processing (did that get upstream?).

It did. And I just did that patch; I'll send out in a bit.

It seems to work, but this is the very first time I've looked at this
code.


Re: [PATCH] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Waiman Long
On 8/26/19 12:08 AM, kbuild test robot wrote:
> Hi Waiman,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc6 next-20190823]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Waiman-Long/fs-proc-page-Skip-uninitialized-page-when-iterating-page-structures/20190826-105836
> config: x86_64-lkp (attached as .config)
> compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
>
> All errors (new ones prefixed by >>):
>
>fs/proc/page.c: In function 'find_next_zone_range':
>>> fs/proc/page.c:58:2: error: expected ';' before 'for'
>  for (pgdat = first_online_pgdat(); pgdat;
>  ^~~
>fs/proc/page.c:52:6: warning: unused variable 'i' [-Wunused-variable]
>  int i;
>  ^
>fs/proc/page.c:51:15: warning: unused variable 'zone' [-Wunused-variable]
>  struct zone *zone;
>   ^~~~
>fs/proc/page.c:50:13: warning: unused variable 'pgdat' [-Wunused-variable]
>  pg_data_t *pgdat;
> ^
>fs/proc/page.c: In function 'pfn_in_zone':
>>> fs/proc/page.c:79:23: error: 'struct zone_range' has no member named 
>>> 'start'; did you mean 'pfn_start'?
>  return pfn >= range->start && pfn < range->end;
>   ^
>   pfn_start
>>> fs/proc/page.c:79:43: error: 'struct zone_range' has no member named 'end'
>  return pfn >= range->start && pfn < range->end;
>   ^~
>fs/proc/page.c:80:1: warning: control reaches end of non-void function 
> [-Wreturn-type]
> }
> ^
>
> vim +58 fs/proc/page.c
>
> 46
> 47static void find_next_zone_range(struct zone_range *range)
> 48{
> 49unsigned long start, end;
>   > 50pg_data_t *pgdat;
>   > 51struct zone *zone;
> 52int i;
> 53
> 54/*
> 55 * Scan all the zone structures to find the next 
> closest one.
> 56 */
> 57start = end = -1UL
>   > 58for (pgdat = first_online_pgdat(); pgdat;
> 59 pgdat = next_online_pgdat(pgdat)) {
> 60for (zone = pgdat->node_zones, i = 0; i < 
> MAX_NR_ZONES;
> 61 zone++, i++) {
> 62if (!zone->spanned_pages)
> 63continue;
> 64if ((zone->zone_start_pfn >= 
> range->pfn_end) &&
> 65(zone->zone_start_pfn < start)) {
> 66start = zone->zone_start_pfn;
> 67end   = start + 
> zone->spanned_pages;
> 68}
> 69}
> 70}
> 71range->pfn_start = start;
> 72range->pfn_end   = end;
> 73}
> 74
> 75static inline bool pfn_in_zone(unsigned long pfn, struct 
> zone_range *range)
> 76{
> 77if (pfn >= range->pfn_end)
> 78find_next_zone_range(range);
>   > 79return pfn >= range->start && pfn < range->end;
> 80}
> 81
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Sorry, I accidentally sent out the old version with the syntax error.
I have posted the right one as v2.

Cheers,
Longman



[PATCH] PM / devfreq: passive: fix compiler warning

2019-08-26 Thread MyungJoo Ham
The recent commit
"PM / devfreq: passive: Use non-devm notifiers"
introduced a compiler warning: "unused variable 'dev'".

Reported-by: Stephen Rothwell 
Signed-off-by: MyungJoo Ham 
---
 drivers/devfreq/governor_passive.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/devfreq/governor_passive.c
b/drivers/devfreq/governor_passive.c
index da48547..be6eeab 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -149,7 +149,6 @@ static int devfreq_passive_notifier_call(struct
notifier_block *nb,
 static int devfreq_passive_event_handler(struct devfreq *devfreq,
 unsigned int event, void *data)
 {
-struct device *dev = devfreq->dev.parent;
 struct devfreq_passive_data *p_data
 = (struct devfreq_passive_data *)devfreq->data;
 struct devfreq *parent = (struct devfreq *)p_data->parent;
-- 
2.7.4


[PATCH v2] fs/proc/page: Skip uninitialized page when iterating page structures

2019-08-26 Thread Waiman Long
It was found that on a dual-socket x86-64 system with nvdimm, reading
/proc/kpagecount may cause the system to panic:

===
[   79.917682] BUG: unable to handle page fault for address: fffe
[   79.924558] #PF: supervisor read access in kernel mode
[   79.929696] #PF: error_code(0x) - not-present page
[   79.934834] PGD 87b60d067 P4D 87b60d067 PUD 87b60f067 PMD 0
[   79.940494] Oops:  [#1] SMP NOPTI
[   79.944157] CPU: 89 PID: 3455 Comm: cp Not tainted 5.3.0-rc5-test+ #14
[   79.950682] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS 2.2.11 
06/13/2019
[   79.958246] RIP: 0010:kpagecount_read+0xdb/0x1a0
[   79.962859] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 f7 48 c1 e7 06 48 
03 3d fe da de 00 74 1d 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 
f6 c4 02 75 06 83 7f 30 80 7d 62 31 c0 4c 89 f9 e8 5d c9
[   79.981603] RSP: 0018:b0d9c950fe70 EFLAGS: 00010202
[   79.986830] RAX: fffe RBX: 8beebe5383c0 RCX: b0d9c950ff00
[   79.993963] RDX: 0001 RSI: 7fd85b29e000 RDI: e77a2200
[   80.001095] RBP: 0002 R08: 0001 R09: 
[   80.008226] R10:  R11: 0001 R12: 7fd85b29e000
[   80.015358] R13: 893f0480 R14: 0088 R15: 7fd85b29e000
[   80.022491] FS:  7fd85b312800() GS:8c359fb0() 
knlGS:
[   80.030576] CS:  0010 DS:  ES:  CR0: 80050033
[   80.036321] CR2: fffe CR3: 004f54a38001 CR4: 007606e0
[   80.043455] DR0:  DR1:  DR2: 
[   80.050586] DR3:  DR6: fffe0ff0 DR7: 0400
[   80.057718] PKRU: 5554
[   80.060428] Call Trace:
[   80.062877]  proc_reg_read+0x39/0x60
[   80.066459]  vfs_read+0x91/0x140
[   80.069686]  ksys_read+0x59/0xd0
[   80.072922]  do_syscall_64+0x59/0x1e0
[   80.076588]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   80.081637] RIP: 0033:0x7fd85a7f5d75
===

It turns out the panic was caused by the kpagecount_read() function
hitting an uninitialized page structure at PFN 0x88 where all its
fields were set to -1. The compound_head value of -1 will mislead the
kernel to treat -2 as a pointer to the head page of the compound page
leading to the crash.
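
The arithmetic behind that -2 can be checked in user space. The sketch
below is a simplified model, not the kernel's code: struct fake_page and
fake_compound_head() are hypothetical stand-ins that mirror only the
convention that a set bit 0 in compound_head marks a tail page whose
remaining bits point to the head page.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct page that matters here. */
struct fake_page {
	uintptr_t compound_head;	/* bit 0 set => tail page, rest is the head pointer */
};

/*
 * Simplified model of the head-page lookup: if bit 0 is set, strip it and
 * treat the remainder as the address of the head page; otherwise the page
 * is its own head.
 */
static uintptr_t fake_compound_head(const struct fake_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return head - 1;
	return (uintptr_t)page;
}
```

With every field set to -1, compound_head is all ones, bit 0 is set, and
the lookup returns all ones minus one, i.e. -2, which is then dereferenced.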

The system has 12 GB of nvdimm ranging from PFN 0x88-0xb7.
However, only PFN 0x88c200-0xb7 are released by the nvdimm
driver to the kernel and initialized. IOW, PFN 0x88-0x88c1ff
remain uninitialized. Perhaps these 196 MB of nvdimm are reserved for
internal use.

To fix the panic, we need to find out if a page structure has been
initialized. This is now done by checking if the PFN is in the range
of a memory zone, assuming that pages in a zone are either correctly
marked as not present in the mem_section structure or have their page
structures initialized.
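
As a rough illustration of the range test, here is a minimal sketch in
plain C. The zone_span struct and pfn_in_zone_span() helper are
hypothetical names used only for this example; the patch itself walks the
kernel's zone structures. The interval is treated as half-open, i.e. the
start PFN is included and start + spanned pages is excluded.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a zone: first PFN and number of PFNs spanned. */
struct zone_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* True if pfn lies in the half-open range [start_pfn, start_pfn + spanned_pages). */
static bool pfn_in_zone_span(const struct zone_span *z, unsigned long pfn)
{
	if (!z->spanned_pages)
		return false;	/* an empty zone spans nothing */
	/* Written as a subtraction so start_pfn + spanned_pages cannot overflow. */
	return pfn >= z->start_pfn && pfn - z->start_pfn < z->spanned_pages;
}
```

A PFN that fails this test for every zone would be skipped rather than
having its page structure read.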

Signed-off-by: Waiman Long 
---
 fs/proc/page.c | 68 +++---
 1 file changed, 65 insertions(+), 3 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 544d1ee15aee..fee55ad95893 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -21,6 +21,64 @@
 #define KPMMASK (KPMSIZE - 1)
 #define KPMBITS (KPMSIZE * BITS_PER_BYTE)
 
+/*
+ * It is possible a page structure is contained in a mem_section that is
+ * regarded as valid but the page structure itself is not properly
+ * initialized. For example, portion of the device memory may be used
+ * internally by device driver or firmware without being managed by the
+ * kernel and hence their page structures may not be initialized.
+ *
+ * An uninitialized page structure may cause the PFN iteration code
+ * in this file to panic the system. To safe-guard against this
+ * possibility, an additional check of the PFN is done to make sure
+ * that it is in a valid range in one of the memory zones:
+ *
+ * [zone_start_pfn, zone_start_pfn + spanned_pages)
+ *
+ * It is possible that some of the PFNs within a zone is not present.
+ * In this case, it will have to rely on the current mem_section check
+ * as well as the affected page structures are still properly initialized.
+ */
+struct zone_range {
+   unsigned long pfn_start;
+   unsigned long pfn_end;
+};
+
+static void find_next_zone_range(struct zone_range *range)
+{
+   unsigned long start, end;
+   pg_data_t *pgdat;
+   struct zone *zone;
+   int i;
+
+   /*
+* Scan all the zone structures to find the next closest one.
+*/
+   start = end = -1UL;
+   for (pgdat = first_online_pgdat(); pgdat;
+pgdat = next_online_pgdat(pgdat)) {
+   for (zone = pgdat->node_zones, i = 0; i < MAX_NR_ZONES;
+zone++, i++) {
+   if (!zone->spanned_pages)
+   continue;
+   if ((zone->zone_start_pfn >= range->pfn_end) &&
+  

[PATCH] tpm: tpm_crb: Add an AMD fTPM support feature

2019-08-26 Thread Safford, David (GE Global Research, US)
> From: linux-integrity-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Jarkko Sakkinen
> Sent: Monday, August 26, 2019 1:59 AM
> To: Seunghun Han 
> Cc: Peter Huewe ; Thomas Gleixner
> ; linux-kernel@vger.kernel.org; linux-
> integr...@vger.kernel.org
> Subject: EXT: Re: [PATCH] tpm: tpm_crb: Add an AMD fTPM support feature
> 
> On Mon, Aug 26, 2019 at 02:40:19AM +0900, Seunghun Han wrote:
> > I'm Seunghun Han and work at the Affiliated Institute of ETRI. I got
> > an AMD system which had a Ryzen Threadripper 1950X and MSI mainboard,
> > and I had a problem with AMD's fTPM. My machine showed an error
> > message below, and the fTPM didn't work because of it.
> >
> > [5.732084] tpm_crb MSFT0101:00: can't request region for resource
> >[mem 0x79b4f000-0x79b4]
> > [5.732089] tpm_crb: probe of MSFT0101:00 failed with error -16
> >
> > When I looked at the iomem areas, I found that two TPM CRB regions were in the
> > ACPI NVS area.  The iomem regions are below.
> >
> > 79a39000-79b6afff : ACPI Non-volatile Storage
> >   79b4b000-79b4bfff : MSFT0101:00
> >   79b4f000-79b4 : MSFT0101:00
> >
> > After analyzing this issue, I found out that a busy bit was set to the
> > ACPI NVS area, and the current Linux kernel allowed nothing to be
> > assigned in it. I also found that the kernel couldn't calculate the
> > sizes of command and response buffers correctly when the TPM regions
> > were two or more.
> >
> > To support AMD's fTPM, I removed the busy bit from the ACPI NVS area
> > so that AMD's fTPM regions could be assigned in it. I also fixed the
> > bug that did not calculate the sizes of command and response buffer
> > correctly.

The problem is that the acpi tables are _wrong_ in this and other cases.
They not only incorrectly report the area as reserved, but also report
the command and response buffer sizes incorrectly. If you look at
the addresses for the buffers listed in the crb control area, the sizes
are correct (4Kbytes).  My patch uses the control area values, and
everything works. The remaining problem is that if acpi reports the
area as NVS, then the linux nvs driver will try to use the space.
I'm looking at how to fix that. I'm not sure if simply removing
the busy bit is sufficient.
Dave

> >
> > Signed-off-by: Seunghun Han 
> 
> You need to split this into multiple patches e.g. if you think you've fixed a
> bug, please write a patch with just the bug fix and nothing else.
> 
> For further information, read the section three of
> 
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html
> 
> I'd also recommend to check out the earlier discussion on ACPI NVS:
> 
> https://lore.kernel.org/linux-
> integrity/BCA04D5D9A3B764C9B7405BBA4D4A3C035EF7BC7@ALPMBAPA12.e
> 2k.ad.ge.com/
> 
> /Jarkko


Re: [RESEND PATCH v3 14/20] mtd: spi_nor: Add a ->setup() method

2019-08-26 Thread Boris Brezillon
On Mon, 26 Aug 2019 12:08:58 +
 wrote:

> From: Tudor Ambarus 
> 
> nor->params.setup() configures the SPI NOR memory. Useful for SPI NOR
> flashes that have peculiarities to the SPI NOR standard, e.g.
> different opcodes, specific address calculation, page size, etc.
> Right now the only user will be the S3AN chips, but other
> manufacturers can implement it if needed.
> 
> Move spi_nor_setup() related code in order to avoid a forward
> declaration to spi_nor_default_setup().
> 
> Reviewed-by: Boris Brezillon 

Nitpick: R-bs should normally be placed after your SoB.

> Signed-off-by: Tudor Ambarus 
> ---


Re: [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature

2019-08-26 Thread Andrew Lunn
On Mon, Aug 26, 2019 at 10:11:12AM +0200, Horatiu Vultur wrote:
> When a network port is added to a bridge, the port is put in
> promisc mode. Some HW that has bridge capabilities (can learn, forward,
> flood etc. the frames) disables promisc mode in the network
> driver when the port is added to the SW bridge.
> 
> This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
> that have this feature will not be set in promisc mode when they are
> added to a SW bridge.
> 
> In this way the HW that has bridge capabilities don't need to send all the
> traffic to the CPU and can also implement the promisc mode and toggle it
> using the command 'ip link set dev swp promisc on'

Hi Horatiu

I'm still not convinced this is needed. The model is, the hardware is
there to accelerate what Linux can do in software. Any peculiarities
of the accelerator should be hidden in the driver.  If the accelerator
can do its job without needing promisc mode, do that in the driver.

So you are trying to differentiate between promisc mode because the
interface is a member of a bridge, and promisc mode because some
application, like pcap, has asked for promisc mode.

dev->promiscuity is a counter. So what you can do is look at its
value, and how the interface is being used. If the interface is not a
member of a bridge, and the count > 0, enable promisc mode in the
accelerator. If the interface is a member of a bridge, and the count >
1, enable promisc mode in the accelerator.
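
A minimal sketch of that rule, in plain C rather than driver code.
hw_needs_promisc() and its parameters are made-up names for illustration;
a real driver would read dev->promiscuity and work out bridge membership
itself.

```c
#include <stdbool.h>

/*
 * Decide whether the accelerator itself should be put in promiscuous mode.
 * Assumption: a bridge port carries one promiscuity reference taken on
 * behalf of the bridge, so only references beyond that one indicate an
 * explicit request (for example from a capture tool).
 */
static bool hw_needs_promisc(unsigned int promiscuity, bool is_bridge_port)
{
	unsigned int bridge_refs = is_bridge_port ? 1 : 0;

	return promiscuity > bridge_refs;
}
```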

   Andrew



Re: [RESEND PATCH v3 04/20] mtd: spi-nor: Move erase_map to 'struct spi_nor_flash_parameter'

2019-08-26 Thread Boris Brezillon
On Mon, 26 Aug 2019 12:08:38 +
 wrote:

> From: Tudor Ambarus 
> 
> All flash parameters and settings should reside inside
> 'struct spi_nor_flash_parameter'. Move the SMPT parsed erase map
> from 'struct spi_nor' to 'struct spi_nor_flash_parameter'.
> 
> Please note that there is a roll-back mechanism for the flash
> parameter and settings, for cases when SFDP parser fails. The SFDP
> parser receives a stack-allocated copy of nor->params, called
> sfdp_params, and uses it to retrieve the serial flash discoverable
> parameters. JESD216 SFDP is a standard and has a higher priority
> than the default initialized flash parameters, so will overwrite the
> sfdp_params data when needed. All SFDP code uses the local copy of
> nor->params, that will overwrite it in the end, if the parser succeeds.
> 
> Saving and restoring the nor->params.erase_map is no longer needed,
> since the SFDP code does not touch it.
> 
> Signed-off-by: Tudor Ambarus 
> ---
> v3: Collect R-b

Looks like you actually forgot to collect them :P.


Re: linux-next: build warning after merge of the devfreq tree

2019-08-26 Thread Leonard Crestez
On 26.08.2019 14:50, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the devfreq tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> drivers/devfreq/governor_passive.c: In function 
> 'devfreq_passive_event_handler':
> drivers/devfreq/governor_passive.c:152:17: warning: unused variable 'dev' 
> [-Wunused-variable]
>struct device *dev = devfreq->dev.parent;
>   ^~~
> 
> Introduced by commit
> 
>0ef7c7cce43f ("PM / devfreq: passive: Use non-devm notifiers")
>
An older version was merged somehow; this was fixed in v3:
https://patchwork.kernel.org/patch/11091793/

--
Regards,
Leonard


Re: [PATCH] IB/mlx5: Convert to use vm_map_pages_zero()

2019-08-26 Thread Jason Gunthorpe
On Mon, Aug 26, 2019 at 01:32:09AM +0530, Souptick Joarder wrote:
> On Mon, Aug 26, 2019 at 1:13 AM Jason Gunthorpe  wrote:
> >
> > On Sun, Aug 25, 2019 at 11:37:27AM +0530, Souptick Joarder wrote:
> > > First, length passed to mmap is checked explicitly against
> > > PAGE_SIZE.
> > >
> > > Second, if vma->vm_pgoff is passed as non zero, it would return
> > > error. It appears like driver is expecting vma->vm_pgoff to
> > > be passed as 0 always.
> >
> > ? pg_off is not zero
> 
> Sorry, I mean, driver has a check against non zero to return error -EOPNOTSUPP
> which means in true scenario driver is expecting vma->vm_pgoff should be passed
> as 0.

get_index is masking vm_pgoff, it is not 0

Jason


Re: [PATCH 2/3] xfs: add kmem_alloc_io()

2019-08-26 Thread Michal Hocko
On Thu 22-08-19 16:26:42, Vlastimil Babka wrote:
> On 8/22/19 3:17 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 02:19:04PM +0200, Vlastimil Babka wrote:
> >> On 8/22/19 2:07 PM, Dave Chinner wrote:
> >> > On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> >> > 
> >> > No, the problem is this (using kmalloc as a general term for
> >> > allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
> >> > 
> >> >some random kernel code
> >> > kmalloc(GFP_KERNEL)
> >> >  reclaim
> >> >  PF_MEMALLOC
> >> >  shrink_slab
> >> >   xfs_inode_shrink
> >> >XFS_ILOCK
> >> > xfs_buf_allocate_memory()
> >> >  kmalloc(GFP_KERNEL)
> >> > 
> >> > And so locks on inodes in reclaim are seen below reclaim. Then
> >> > somewhere else we have:
> >> > 
> >> >some high level read-only xfs code like readdir
> >> > XFS_ILOCK
> >> >  xfs_buf_allocate_memory()
> >> >   kmalloc(GFP_KERNEL)
> >> >reclaim
> >> > 
> >> > And this one throws false positive lockdep warnings because we
> >> > called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
> >> 
> >> OK, and what exactly makes this positive a false one? Why can't it continue like
> >> the first example where reclaim leads to another XFS_ILOCK, thus deadlock?
> > 
> > Because above reclaim we only have operations being done on
> > referenced inodes, and below reclaim we only have unreferenced
> > inodes. We never lock the same inode both above and below reclaim
> > at the same time.
> > 
> > IOWs, an operation above reclaim cannot see, access or lock
> > unreferenced inodes, except in inode write clustering, and that uses
> > trylocks so cannot deadlock with reclaim.
> > 
> > An operation below reclaim cannot see, access or lock referenced
> > inodes except during inode write clustering, and that uses trylocks
> > so cannot deadlock with code above reclaim.
> 
> Thanks for elaborating. Perhaps lockdep experts (not me) would know how to
> express that. If not possible, then replacing GFP_NOFS with __GFP_NOLOCKDEP
> should indeed suppress the warning, while allowing FS reclaim.

This is certainly what I hoped would happen when introducing __GFP_NOLOCKDEP.
I couldn't have done the second step because that requires a deep
understanding of the code in question which is beyond my capacity. It
seems we still haven't found a brave soul to start converting GFP_NOFS
to __GFP_NOLOCKDEP. And it would be really appreciated.

Thanks.
-- 
Michal Hocko
SUSE Labs


[PATCH v4 4/4] clk: qcom: clk-rpmh: Add support for SM8150

2019-08-26 Thread Vinod Koul
Add support for rpmh clocks found in SM8150

Signed-off-by: Vinod Koul 
Reviewed-by: Bjorn Andersson 
---
 drivers/clk/qcom/clk-rpmh.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c
index 97aa092f5f40..082d0b72fe1e 100644
--- a/drivers/clk/qcom/clk-rpmh.c
+++ b/drivers/clk/qcom/clk-rpmh.c
@@ -374,6 +374,33 @@ static const struct clk_rpmh_desc clk_rpmh_sdm845 = {
.num_clks = ARRAY_SIZE(sdm845_rpmh_clocks),
 };
 
+DEFINE_CLK_RPMH_ARC(sm8150, bi_tcxo, bi_tcxo_ao, "xo.lvl", 0x3, 2);
+DEFINE_CLK_RPMH_VRM(sm8150, ln_bb_clk2, ln_bb_clk2_ao, "lnbclka2", 2);
+DEFINE_CLK_RPMH_VRM(sm8150, ln_bb_clk3, ln_bb_clk3_ao, "lnbclka3", 2);
+DEFINE_CLK_RPMH_VRM(sm8150, rf_clk1, rf_clk1_ao, "rfclka1", 1);
+DEFINE_CLK_RPMH_VRM(sm8150, rf_clk2, rf_clk2_ao, "rfclka2", 1);
+DEFINE_CLK_RPMH_VRM(sm8150, rf_clk3, rf_clk3_ao, "rfclka3", 1);
+
+static struct clk_hw *sm8150_rpmh_clocks[] = {
+   [RPMH_CXO_CLK]  = _bi_tcxo.hw,
+   [RPMH_CXO_CLK_A]= _bi_tcxo_ao.hw,
+   [RPMH_LN_BB_CLK2]   = _ln_bb_clk2.hw,
+   [RPMH_LN_BB_CLK2_A] = _ln_bb_clk2_ao.hw,
+   [RPMH_LN_BB_CLK3]   = _ln_bb_clk3.hw,
+   [RPMH_LN_BB_CLK3_A] = _ln_bb_clk3_ao.hw,
+   [RPMH_RF_CLK1]  = _rf_clk1.hw,
+   [RPMH_RF_CLK1_A]= _rf_clk1_ao.hw,
+   [RPMH_RF_CLK2]  = _rf_clk2.hw,
+   [RPMH_RF_CLK2_A]= _rf_clk2_ao.hw,
+   [RPMH_RF_CLK3]  = _rf_clk3.hw,
+   [RPMH_RF_CLK3_A]= _rf_clk3_ao.hw,
+};
+
+static const struct clk_rpmh_desc clk_rpmh_sm8150 = {
+   .clks = sm8150_rpmh_clocks,
+   .num_clks = ARRAY_SIZE(sm8150_rpmh_clocks),
+};
+
 static struct clk_hw *of_clk_rpmh_hw_get(struct of_phandle_args *clkspec,
 void *data)
 {
@@ -453,6 +480,7 @@ static int clk_rpmh_probe(struct platform_device *pdev)
 
 static const struct of_device_id clk_rpmh_match_table[] = {
{ .compatible = "qcom,sdm845-rpmh-clk", .data = _rpmh_sdm845},
+   { .compatible = "qcom,sm8150-rpmh-clk", .data = _rpmh_sm8150},
{ }
 };
 MODULE_DEVICE_TABLE(of, clk_rpmh_match_table);
-- 
2.20.1



[PATCH v4 3/4] dt-bindings: clock: Document SM8150 rpmh-clock compatible

2019-08-26 Thread Vinod Koul
Document the SM8150 rpmh-clock compatible for rpmh clock controller
found on SM8150 platforms.

Signed-off-by: Vinod Koul 
Reviewed-by: Bjorn Andersson 
---
 Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt 
b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
index 8b97968f9c88..365bbde599b1 100644
--- a/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
+++ b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
@@ -6,7 +6,9 @@ some Qualcomm Technologies Inc. SoCs. It accepts clock requests 
from
 other hardware subsystems via RSC to control clocks.
 
 Required properties :
-- compatible : shall contain "qcom,sdm845-rpmh-clk"
+- compatible : must be one of:
+  "qcom,sdm845-rpmh-clk"
+  "qcom,sm8150-rpmh-clk"
 
 - #clock-cells : must contain 1
 - clocks: a list of phandles and clock-specifier pairs,
-- 
2.20.1



[PATCH v3] sched/fair: don't assign runtime for throttled cfs_rq

2019-08-26 Thread Liangyan
do_sched_cfs_period_timer() will refill cfs_b runtime and call
distribute_cfs_runtime() to unthrottle cfs_rqs. Sometimes cfs_b->runtime
is incorrectly allocated entirely to one cfs_rq, and then the other cfs_rqs
attached to this cfs_b can't get runtime and will be throttled.

We find that one throttled cfs_rq has non-negative
cfs_rq->runtime_remaining, which causes an unexpected cast from s64 to u64
in this snippet from distribute_cfs_runtime():
	runtime = -cfs_rq->runtime_remaining + 1;
The runtime here will change to a large number and consume all
cfs_b->runtime in this cfs_b period.
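
The wrap-around is easy to reproduce outside the kernel. The sketch below
models only the arithmetic; runtime_needed() and runtime_granted() are
hypothetical helper names, with int64_t and uint64_t standing in for s64
and u64.

```c
#include <stdint.h>

/*
 * Model of the computation in distribute_cfs_runtime():
 *   runtime = -cfs_rq->runtime_remaining + 1;
 * runtime_remaining is signed, runtime is unsigned.
 */
static uint64_t runtime_needed(int64_t runtime_remaining)
{
	return (uint64_t)(-runtime_remaining + 1);
}

/* Mirrors the clamp that follows: never hand out more than is left in the pool. */
static uint64_t runtime_granted(int64_t runtime_remaining, uint64_t remaining)
{
	uint64_t runtime = runtime_needed(runtime_remaining);

	if (runtime > remaining)
		runtime = remaining;
	return runtime;
}
```

For a throttled cfs_rq with runtime_remaining = -3 the request is 4. With
runtime_remaining = 5 the signed result is -4, which becomes 2^64 - 4 as an
unsigned value, so the clamp hands over the entire pool.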

According to Ben Segall, the throttled cfs_rq can have
account_cfs_rq_runtime called on it because it is throttled before
idle_balance, and the idle_balance calls update_rq_clock to add time
that is accounted to the task.

This commit prevents a cfs_rq from being assigned new runtime once it
has been throttled, until distribute_cfs_runtime() is called.

Signed-off-by: Liangyan 
Reviewed-by: Ben Segall 
Reviewed-by: Valentin Schneider 
---
 kernel/sched/fair.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc9cfeaac8bd..500f5db0de0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4470,6 +4470,8 @@ static void __account_cfs_rq_runtime(struct cfs_rq 
*cfs_rq, u64 delta_exec)
if (likely(cfs_rq->runtime_remaining > 0))
return;
 
+   if (cfs_rq->throttled)
+   return;
/*
 * if we're unable to extend our runtime we resched so that the active
 * hierarchy can be throttled
@@ -4673,6 +4675,9 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b,
if (!cfs_rq_throttled(cfs_rq))
goto next;
 
+   /* By the above check, this should never be true */
+   SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
+
runtime = -cfs_rq->runtime_remaining + 1;
if (runtime > remaining)
runtime = remaining;
-- 
2.14.4.44.g2045bb6



[PATCH v4 2/4] clk: qcom: clk-rpmh: Convert to parent data scheme

2019-08-26 Thread Vinod Koul
Convert the rpmh clock driver to use the new parent data scheme by
specifying the parent data for board clock.

Signed-off-by: Vinod Koul 
---
 drivers/clk/qcom/clk-rpmh.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c
index c3fd632af119..97aa092f5f40 100644
--- a/drivers/clk/qcom/clk-rpmh.c
+++ b/drivers/clk/qcom/clk-rpmh.c
@@ -95,7 +95,10 @@ static DEFINE_MUTEX(rpmh_clk_lock);
.hw.init = &(struct clk_init_data){ \
.ops = _rpmh_ops,   \
.name = #_name, \
-   .parent_names = (const char *[]){ "xo_board" }, \
+   .parent_data =  &(const struct clk_parent_data){ \
+   .fw_name = "xo_board",  \
+   .name = "xo",   \
+   },  \
.num_parents = 1,   \
},  \
};  \
@@ -110,7 +113,10 @@ static DEFINE_MUTEX(rpmh_clk_lock);
.hw.init = &(struct clk_init_data){ \
.ops = _rpmh_ops,   \
.name = #_name_active,  \
-   .parent_names = (const char *[]){ "xo_board" }, \
+   .parent_data =  &(const struct clk_parent_data){ \
+   .fw_name = "xo_board",  \
+   .name = "xo",   \
+   },  \
.num_parents = 1,   \
},  \
}
-- 
2.20.1



[PATCH v4 1/4] dt-bindings: clock: Document the parent clocks

2019-08-26 Thread Vinod Koul
With clock parent data scheme we must specify the parent clocks for the
rpmhcc nodes. So describe the parent clock for rpmhcc in the bindings.

Signed-off-by: Vinod Koul 
Reviewed-by: Bjorn Andersson 
---
 Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt 
b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
index 3c007653da31..8b97968f9c88 100644
--- a/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
+++ b/Documentation/devicetree/bindings/clock/qcom,rpmh-clk.txt
@@ -9,6 +9,9 @@ Required properties :
 - compatible : shall contain "qcom,sdm845-rpmh-clk"
 
 - #clock-cells : must contain 1
+- clocks: a list of phandles and clock-specifier pairs,
+ one for each entry in clock-names.
+- clock-names: Parent board clock: "xo".
 
 Example :
 
-- 
2.20.1



[PATCH v4 0/4] clk: qcom: Add support for SM8150

2019-08-26 Thread Vinod Koul
Add support for the rpmh clock controller found in SM8150 and, while at it, update
the driver to support parent data clock scheme as suggested by Stephen.

Changes since v3:
 - Make clock parent name as xo instead of xo_board

Changes since v2:
 - Add reviewed-by from Bjorn
 - Update the parent name as xo_board
 - Fix style issue

Changes since v1:
 - Describe parent clocks for rpmhcc
 - Add support for parent data scheme for rpmhcc

Vinod Koul (4):
  dt-bindings: clock: Document the parent clocks
  clk: qcom: clk-rpmh: Convert to parent data scheme
  dt-bindings: clock: Document SM8150 rpmh-clock compatible
  clk: qcom: clk-rpmh: Add support for SM8150

 .../bindings/clock/qcom,rpmh-clk.txt  |  7 +++-
 drivers/clk/qcom/clk-rpmh.c   | 38 ++-
 2 files changed, 42 insertions(+), 3 deletions(-)

-- 
2.20.1



Re: [PATCH] sched/cpufreq: Align trace event behavior of fast switching

2019-08-26 Thread Dietmar Eggemann
On 26/08/2019 13:24, Peter Zijlstra wrote:
> On Mon, Aug 26, 2019 at 11:51:17AM +0200, Dietmar Eggemann wrote:
> 
>> Not sure about the extra  'if trace_cpu_frequency_enabled()' but I guess
>> it doesn't hurt.
> 
> Without that you do that for_each_cpu() iteration unconditionally, even
> if the tracepoint is disabled.

Makes sense, I'm wondering if we want this in
cpufreq_notify_transition() CPUFREQ_POSTCHANGE for the
non-fast-switching drivers as well.


Re: [PATCH v2,2/3] dt-bindings: arm: amlogic: Add support for the Ugoos AM6

2019-08-26 Thread Rob Herring
On Sat, Aug 24, 2019 at 3:05 AM Christian Hewitt
 wrote:
>
> The Ugoos AM6 is based on the Amlogic W400 (G12B) reference design using the
> S922X chipset.
>
> Signed-off-by: Christian Hewitt 
> ---
>  Documentation/devicetree/bindings/arm/amlogic.yaml | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Rob Herring 


Re: [PATCH v2,1/3] dt-bindings: Add vendor prefix for Ugoos

2019-08-26 Thread Rob Herring
On Sat, Aug 24, 2019 at 3:05 AM Christian Hewitt
 wrote:
>
> Ugoos Industrial Co., Ltd. are a manufacturer of ARM based TV Boxes,
> Dongles, Digital Signage and Advertisement Solutions [0].
>
> [0] (https://ugoos.com)
>
> Signed-off-by: Christian Hewitt 
> ---
>  Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
>  1 file changed, 2 insertions(+)

Acked-by: Rob Herring 


Re: [PATCH] scsi: aic7xxx: Remove dead code

2019-08-26 Thread Souptick Joarder
On Mon, Aug 26, 2019 at 2:03 PM James Bottomley  wrote:
>
> On Sat, 2019-08-24 at 20:38 +0530, Souptick Joarder wrote:
> > These are dead code since 2.6.13. If there is no plan
> > to use it further, these can be removed forever.
>
> Unless you can articulate a clear useful reason for the removal I'd
> rather keep this and the other code.  Most of the documentation for
> this chip is lost in the mists of time, so code fragments like this are
> the only way we know how it was supposed to work.

Considering your feedback, let's keep this code :)
>
> A clear reason might be that it's impossible for aic7xxx ever to make
> use of IU and QAS because they're LVD parameters and it's a SE/HVD
> card, so the documentation in the code is actively wrong, but you'd
> need to research that.

I need to look further to come up with a clear reason.


[RESEND RFC PATCH v3 20/20] mtd: spi-nor: Rework the disabling of block write protection

2019-08-26 Thread Tudor.Ambarus
From: Tudor Ambarus 

spi_nor_unlock() unlocks blocks of memory or the entire flash memory
array, if requested. clear_sr_bp() unlocks the entire flash memory
array at boot time. This calls for some unification: clear_sr_bp() is
just an optimization for the case when the unlock request covers the
entire flash size.

Merge the clear_sr_bp() and stm_lock/unlock logic and introduce
spi_nor_unlock_all(), which makes an unlock request that covers the
entire flash size.
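
The idea can be sketched in a few lines of plain C. This is an illustration
under stated assumptions, not the code in this patch: struct fake_nor,
fake_unlock_all() and record_unlock() are hypothetical names, and the hook
only records the request instead of touching any register.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for struct spi_nor. */
struct fake_nor {
	uint64_t size;	/* total flash size in bytes */
	/* unlock hook: unlock len bytes starting at offset ofs */
	int (*unlock)(struct fake_nor *nor, uint64_t ofs, uint64_t len);
};

/* "Unlock all" is an ordinary unlock request spanning the whole array. */
static int fake_unlock_all(struct fake_nor *nor)
{
	if (!nor->unlock)
		return 0;	/* no locking support: nothing to unlock */
	return nor->unlock(nor, 0, nor->size);
}

/* Example hook that only records the request, standing in for real register writes. */
static uint64_t last_ofs, last_len;

static int record_unlock(struct fake_nor *nor, uint64_t ofs, uint64_t len)
{
	(void)nor;
	last_ofs = ofs;
	last_len = len;
	return 0;
}
```

Routing the boot-time case through the same hook means there is a single
place that knows how the block protection bits are cleared.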

Get rid of the MFR handling and implement specific manufacturer
default_init() fixup hooks.

Move write_sr_cr() to avoid adding a forward declaration. Prefix
new function with 'spi_nor_'.

Note that this slightly changes the logic for the SNOR_MFR_ATMEL and
SNOR_MFR_INTEL cases. Before this patch, the Atmel and Intel chips
did not set the locking ops, but unlocked the entire flash at boot
time, while now they are setting the locking ops to stm_locking_ops.
This should work, since disabling the block protection at
boot time used the same Status Register bits to unlock the flash as
in the stm_locking_ops case.

In future, we should probably add new hooks to
'struct spi_nor_flash_parameter' to describe how to interact with the
Status and Configuration Registers in the form of:
nor->params.ops->read_sr
nor->params.ops->write_sr
nor->params.ops->read_cr
nor->params.ops->write_cr
We can retrieve this info starting with JESD216 revB, by checking the
15th DWORD of Basic Flash Parameter Table, or with later revisions of
the standard, by parsing the "Status, Control and Configuration Register
Map for SPI Memory Devices".

Suggested-by: Boris Brezillon 
Signed-off-by: Tudor Ambarus 
---
v3: new patch

 drivers/mtd/spi-nor/spi-nor.c | 337 +++---
 include/linux/mtd/spi-nor.h   |   4 +-
 2 files changed, 185 insertions(+), 156 deletions(-)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index ec70b58294ec..01d4051510b6 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -1320,8 +1320,63 @@ static int spi_nor_erase(struct mtd_info *mtd, struct 
erase_info *instr)
return ret;
 }
 
-/* Write status register and ensure bits in mask match written values */
-static int write_sr_and_check(struct spi_nor *nor, u8 status_new, u8 mask)
+/*
+ * write_sr_cr() - Write the Status Register and Configuration Register.
+ * @nor:pointer to a 'struct spi_nor'.
+ * @sr_cr:  pointer to a 2-byte buffer: the first byte is for the Status
+ *  Register, the second for the Configuration Register.
+ *
+ * Write Status Register and Configuration Register with 2 bytes. The first
+ * byte will be written to the status register, while the second byte will be
+ * written to the configuration register.
+ *
+ * Return: 0 on success, -errno otherwise.
+ */
+static int write_sr_cr(struct spi_nor *nor, u8 *sr_cr)
+{
+   int ret;
+
+   write_enable(nor);
+
+   if (nor->spimem) {
+   struct spi_mem_op op =
+   SPI_MEM_OP(SPI_MEM_OP_CMD(SPINOR_OP_WRSR, 1),
+  SPI_MEM_OP_NO_ADDR,
+  SPI_MEM_OP_NO_DUMMY,
+  SPI_MEM_OP_DATA_OUT(2, sr_cr, 1));
+
+   ret = spi_mem_exec_op(nor->spimem, );
+   } else {
+   ret = nor->write_reg(nor, SPINOR_OP_WRSR, sr_cr, 2);
+   }
+
+   if (ret < 0) {
+   dev_err(nor->dev,
+   "error while writing configuration register\n");
+   return -EINVAL;
+   }
+
+   ret = spi_nor_wait_till_ready(nor);
+   if (ret) {
+   dev_err(nor->dev,
+   "timeout while writing configuration register\n");
+   return ret;
+   }
+
+   return 0;
+}
+
+/**
+ * spi_nor_write_sr1_and_check() - Write one byte to the Status Register and
+ * ensure the bits in the mask match the written values.
+ * @nor:pointer to a 'struct spi_nor'.
+ * status_new: byte value to be written to the Status Register.
+ * mask:   mask with which to check the written values.
+ *
+ * Return: 0 on success, -errno otherwise.
+ */
+static int spi_nor_write_sr1_and_check(struct spi_nor *nor, u8 status_new,
+  u8 mask)
 {
int ret;
 
@@ -1341,6 +1396,88 @@ static int write_sr_and_check(struct spi_nor *nor, u8 
status_new, u8 mask)
return ((ret & mask) != (status_new & mask)) ? -EIO : 0;
 }
 
+/**
+ * spi_nor_write_sr_cr_and_check() - Write the Status Register and
+ * Configuration Register and ensure the bits in the mask match the written
+ * values.
+ * @nor:pointer to a 'struct spi_nor'.
+ * status_new: byte value to be written to the Status Register.
+ * mask:   mask with which to check the written values.
+ *
+ * The Configuration Register is written only when it has the Quad Enable (QE)
+ * bit set to one. When QE bit is set to 

[RESEND PATCH v3 09/20] mtd: spi-nor: Create a ->set_4byte() method

2019-08-26 Thread Tudor.Ambarus
From: Boris Brezillon 

The procedure used to enable 4 byte addressing mode depends on the NOR
device, so let's provide a hook so that manufacturer specific handling
can be implemented in a sane way.
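
A stand-alone sketch of the hook pattern, for readers who want to see the
shape without the driver around it. Everything here is hypothetical
(fake_nor, fake_params, the two vendor stubs and the last_mode field); the
stubs just record which handler ran instead of issuing flash commands.

```c
#include <stdbool.h>

struct fake_nor;

/* Hypothetical per-flash parameter block holding the hook. */
struct fake_params {
	int (*set_4byte)(struct fake_nor *nor, bool enable);
};

struct fake_nor {
	struct fake_params params;
	int last_mode;	/* records which stub ran and what it was asked, for demonstration */
};

enum fake_vendor { FAKE_VENDOR_A, FAKE_VENDOR_B };

/* Stub handlers; real ones would issue the vendor's command sequence. */
static int vendor_a_set_4byte(struct fake_nor *nor, bool enable)
{
	nor->last_mode = enable ? 1 : 0;
	return 0;
}

static int vendor_b_set_4byte(struct fake_nor *nor, bool enable)
{
	nor->last_mode = enable ? 11 : 10;
	return 0;
}

/* The handler is chosen once at init time instead of in a switch at every call. */
static void fake_init_params(struct fake_nor *nor, enum fake_vendor vendor)
{
	nor->params.set_4byte = (vendor == FAKE_VENDOR_B) ? vendor_b_set_4byte
							  : vendor_a_set_4byte;
}

/* Call site: no manufacturer knowledge needed. */
static int fake_enter_4byte(struct fake_nor *nor)
{
	return nor->params.set_4byte(nor, true);
}
```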

Signed-off-by: Boris Brezillon 
[tudor.amba...@microchip.com: use nor->params.set_4byte() instead of
nor->set_4byte()]
Signed-off-by: Tudor Ambarus 
---
v3: no changes

 drivers/mtd/spi-nor/spi-nor.c | 76 ++-
 include/linux/mtd/spi-nor.h   |  2 ++
 2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 1e7f8dc3457d..235e82a121a1 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -633,6 +633,17 @@ static int macronix_set_4byte(struct spi_nor *nor, bool 
enable)
  NULL, 0);
 }
 
+static int st_micron_set_4byte(struct spi_nor *nor, bool enable)
+{
+   int ret;
+
+   write_enable(nor);
+   ret = macronix_set_4byte(nor, enable);
+   write_disable(nor);
+
+   return ret;
+}
+
 static int spansion_set_4byte(struct spi_nor *nor, bool enable)
 {
nor->bouncebuf[0] = enable << 7;
@@ -667,45 +678,24 @@ static int spi_nor_write_ear(struct spi_nor *nor, u8 ear)
return nor->write_reg(nor, SPINOR_OP_WREAR, nor->bouncebuf, 1);
 }
 
-/* Enable/disable 4-byte addressing mode. */
-static int set_4byte(struct spi_nor *nor, bool enable)
+static int winbond_set_4byte(struct spi_nor *nor, bool enable)
 {
-   int status;
-   bool need_wren = false;
-
-   switch (JEDEC_MFR(nor->info)) {
-   case SNOR_MFR_ST:
-   case SNOR_MFR_MICRON:
-   /* Some Micron need WREN command; all will accept it */
-   need_wren = true;
-   /* fall through */
-   case SNOR_MFR_MACRONIX:
-   case SNOR_MFR_WINBOND:
-   if (need_wren)
-   write_enable(nor);
+   int ret;
 
-   status = macronix_set_4byte(nor, enable);
-   if (need_wren)
-   write_disable(nor);
+   ret = macronix_set_4byte(nor, enable);
+   if (ret || enable)
+   return ret;
 
-   if (!status && !enable &&
-   JEDEC_MFR(nor->info) == SNOR_MFR_WINBOND) {
-   /*
-* On Winbond W25Q256FV, leaving 4byte mode causes
-* the Extended Address Register to be set to 1, so all
-* 3-byte-address reads come from the second 16M.
-* We must clear the register to enable normal behavior.
-*/
-   write_enable(nor);
-   spi_nor_write_ear(nor, 0);
-   write_disable(nor);
-   }
+   /*
+* On Winbond W25Q256FV, leaving 4byte mode causes the Extended Address
+* Register to be set to 1, so all 3-byte-address reads come from the
+* second 16M. We must clear the register to enable normal behavior.
+*/
+   write_enable(nor);
+   ret = spi_nor_write_ear(nor, 0);
+   write_disable(nor);
 
-   return status;
-   default:
-   /* Spansion style */
-   return spansion_set_4byte(nor, enable);
-   }
+   return ret;
 }
 
 static int spi_nor_xread_sr(struct spi_nor *nor, u8 *sr)
@@ -4153,11 +4143,18 @@ static int spi_nor_parse_sfdp(struct spi_nor *nor,
 static void macronix_set_default_init(struct spi_nor *nor)
 {
nor->params.quad_enable = macronix_quad_enable;
+   nor->params.set_4byte = macronix_set_4byte;
 }
 
 static void st_micron_set_default_init(struct spi_nor *nor)
 {
nor->params.quad_enable = NULL;
+   nor->params.set_4byte = st_micron_set_4byte;
+}
+
+static void winbond_set_default_init(struct spi_nor *nor)
+{
+   nor->params.set_4byte = winbond_set_4byte;
 }
 
 /**
@@ -4178,6 +4175,10 @@ static void spi_nor_manufacturer_init_params(struct spi_nor *nor)
st_micron_set_default_init(nor);
break;
 
+   case SNOR_MFR_WINBOND:
+   winbond_set_default_init(nor);
+   break;
+
default:
break;
}
@@ -4222,6 +4223,7 @@ static void spi_nor_info_init_params(struct spi_nor *nor)
 
/* Initialize legacy flash parameters and settings. */
params->quad_enable = spansion_quad_enable;
+   params->set_4byte = spansion_set_4byte;
 
/* Set SPI NOR sizes. */
params->size = (u64)info->sector_size * info->n_sectors;
@@ -4587,7 +4589,7 @@ static int spi_nor_init(struct spi_nor *nor)
 */
WARN_ONCE(nor->flags & SNOR_F_BROKEN_RESET,
  "enabling reset hack; may not recover from unexpected reboots\n");
-   set_4byte(nor, true);
+   nor->params.set_4byte(nor, true);
}
 
return 0;
@@ -4611,7 +4613,7 @@ void 
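
The control flow of winbond_set_4byte() above can be sketched in plain
userspace C. This is only an illustration: struct fake_flash and every
fake_* helper are hypothetical stand-ins invented here, not part of the
spi-nor driver, and the register side effect is modelled only as the code
comment in the patch describes it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a flash chip; not a driver structure. */
struct fake_flash {
	bool four_byte;        /* current addressing mode */
	unsigned char ear;     /* modelled Extended Address Register */
	bool fail_mode_switch; /* force the mode switch to fail */
};

/* Stand-in for the generic enter/exit 4-byte mode command. */
static int fake_mode_switch(struct fake_flash *f, bool enable)
{
	if (f->fail_mode_switch)
		return -EIO;
	f->four_byte = enable;
	if (!enable)
		f->ear = 1; /* the quirk: leaving 4-byte mode sets EAR to 1 */
	return 0;
}

/* Stand-in for writing the Extended Address Register. */
static int fake_write_ear(struct fake_flash *f, unsigned char val)
{
	f->ear = val;
	return 0;
}

/*
 * Same shape as the patched function: return early on error or when
 * entering 4-byte mode; only a successful exit clears the register.
 */
static int fake_winbond_set_4byte(struct fake_flash *f, bool enable)
{
	int ret;

	assert(f != NULL);

	ret = fake_mode_switch(f, enable);
	if (ret || enable)
		return ret;

	return fake_write_ear(f, 0);
}
```

Note that, as in the patch, the value returned on the exit path is the
result of the register write rather than of the mode switch.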

[RESEND PATCH v3 17/20] mtd: spi-nor: Bring flash params init together

2019-08-26 Thread Tudor.Ambarus
From: Tudor Ambarus 

Bring all default flash parameter initialization together in
spi_nor_legacy_params_init().

Signed-off-by: Tudor Ambarus 
Reviewed-by: Boris Brezillon 
---
v3: collect R-b

 drivers/mtd/spi-nor/spi-nor.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 2699e999d21a..dcda96a20f6c 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -4453,6 +4453,7 @@ static void spi_nor_info_init_params(struct spi_nor *nor)
struct spi_nor_flash_parameter *params = &nor->params;
struct spi_nor_erase_map *map = &params->erase_map;
const struct flash_info *info = nor->info;
+   struct device_node *np = spi_nor_get_flash_node(nor);
u8 i, erase_mask;
 
/* Initialize legacy flash parameters and settings. */
@@ -4464,18 +4465,25 @@ static void spi_nor_info_init_params(struct spi_nor *nor)
params->size = (u64)info->sector_size * info->n_sectors;
params->page_size = info->page_size;
 
+   if (!(info->flags & SPI_NOR_NO_FR)) {
+   /* Default to Fast Read for DT and non-DT platform devices. */
+   params->hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
+
+   /* Mask out Fast Read if not requested at DT instantiation. */
+   if (np && !of_property_read_bool(np, "m25p,fast-read"))
+   params->hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
+   }
+
/* (Fast) Read settings. */
params->hwcaps.mask |= SNOR_HWCAPS_READ;
spi_nor_set_read_settings(&params->reads[SNOR_CMD_READ],
  0, 0, SPINOR_OP_READ,
  SNOR_PROTO_1_1_1);
 
-   if (!(info->flags & SPI_NOR_NO_FR)) {
-   params->hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
+   if (params->hwcaps.mask & SNOR_HWCAPS_READ_FAST)
spi_nor_set_read_settings(&params->reads[SNOR_CMD_READ_FAST],
  0, 8, SPINOR_OP_READ_FAST,
  SNOR_PROTO_1_1_1);
-   }
 
if (info->flags & SPI_NOR_DUAL_READ) {
params->hwcaps.mask |= SNOR_HWCAPS_READ_1_1_2;
@@ -4864,24 +4872,9 @@ int spi_nor_scan(struct spi_nor *nor, const char *name,
nor->page_size = params->page_size;
mtd->writebufsize = nor->page_size;
 
-   if (np) {
-   /* If we were instantiated by DT, use it */
-   if (of_property_read_bool(np, "m25p,fast-read"))
-   params->hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
-   else
-   params->hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
-   } else {
-   /* If we weren't instantiated by DT, default to fast-read */
-   params->hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
-   }
-
if (of_property_read_bool(np, "broken-flash-reset"))
nor->flags |= SNOR_F_BROKEN_RESET;
 
-   /* Some devices cannot do fast-read, no matter what DT tells us */
-   if (info->flags & SPI_NOR_NO_FR)
-   params->hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
-
/*
 * Configure the SPI memory:
 * - select op codes for (Fast) Read, Page Program and Sector Erase.
-- 
2.9.5
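
For readers following the Fast Read change above, the resulting decision
can be sketched as a small pure function. This is a rough userspace
illustration only: the DEMO_* constants and demo_read_hwcaps() are
hypothetical names made up here, and the two boolean arguments stand in
for "a DT node exists" and "the m25p,fast-read property is present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hwcaps bits and the NO_FR info flag. */
#define DEMO_HWCAPS_READ      (1u << 0)
#define DEMO_HWCAPS_READ_FAST (1u << 1)
#define DEMO_FLAG_NO_FR       (1u << 0)

/*
 * Mirrors the order used in the patch: Fast Read is the default unless
 * the flash cannot do it, and a DT node without the property masks it
 * out again. Plain Read is always advertised.
 */
static uint32_t demo_read_hwcaps(uint32_t info_flags, bool has_dt_node,
				 bool dt_fast_read)
{
	uint32_t mask = 0;

	if (!(info_flags & DEMO_FLAG_NO_FR)) {
		mask |= DEMO_HWCAPS_READ_FAST;

		if (has_dt_node && !dt_fast_read)
			mask &= ~DEMO_HWCAPS_READ_FAST;
	}

	mask |= DEMO_HWCAPS_READ;

	/* A flash flagged NO_FR must never end up with Fast Read set. */
	assert(!(info_flags & DEMO_FLAG_NO_FR) ||
	       !(mask & DEMO_HWCAPS_READ_FAST));

	return mask;
}
```

The four interesting cases are: no DT node, DT node with the property,
DT node without the property, and a flash flagged as unable to Fast Read.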



[RESEND PATCH v3 19/20] mtd: spi-nor: Introduce spi_nor_get_flash_info()

2019-08-26 Thread Tudor.Ambarus
From: Tudor Ambarus 

Add a dedicated function for getting the pointer to the const
struct flash_info. This trims the huge spi_nor_scan() function a bit.

Signed-off-by: Tudor Ambarus 
Reviewed-by: Boris Brezillon 
---
v3: no changes

 drivers/mtd/spi-nor/spi-nor.c | 76 +--
 1 file changed, 44 insertions(+), 32 deletions(-)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index d13317d1f372..ec70b58294ec 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -4766,10 +4766,50 @@ static int spi_nor_set_addr_width(struct spi_nor *nor)
return 0;
 }
 
+static const struct flash_info *spi_nor_get_flash_info(struct spi_nor *nor,
+  const char *name)
+{
+   const struct flash_info *info = NULL;
+
+   if (name)
+   info = spi_nor_match_id(name);
+   /* Try to auto-detect if chip name wasn't specified or not found */
+   if (!info)
+   info = spi_nor_read_id(nor);
+   if (IS_ERR_OR_NULL(info))
+   return ERR_PTR(-ENOENT);
+
+   /*
+* If caller has specified name of flash model that can normally be
+* detected using JEDEC, let's verify it.
+*/
+   if (name && info->id_len) {
+   const struct flash_info *jinfo;
+
+   jinfo = spi_nor_read_id(nor);
+   if (IS_ERR(jinfo)) {
+   return jinfo;
+   } else if (jinfo != info) {
+   /*
+* JEDEC knows better, so overwrite platform ID. We
+* can't trust partitions any longer, but we'll let
+* mtd apply them anyway, since some partitions may be
+* marked read-only, and we don't want to lose that
+* information, even if it's not 100% accurate.
+*/
+   dev_warn(nor->dev, "found %s, expected %s\n",
+jinfo->name, info->name);
+   info = jinfo;
+   }
+   }
+
+   return info;
+}
+
 int spi_nor_scan(struct spi_nor *nor, const char *name,
 const struct spi_nor_hwcaps *hwcaps)
 {
-   const struct flash_info *info = NULL;
+   const struct flash_info *info;
struct device *dev = nor->dev;
struct mtd_info *mtd = &nor->mtd;
struct device_node *np = spi_nor_get_flash_node(nor);
@@ -4800,37 +4840,9 @@ int spi_nor_scan(struct spi_nor *nor, const char *name,
if (!nor->bouncebuf)
return -ENOMEM;
 
-   if (name)
-   info = spi_nor_match_id(name);
-   /* Try to auto-detect if chip name wasn't specified or not found */
-   if (!info)
-   info = spi_nor_read_id(nor);
-   if (IS_ERR_OR_NULL(info))
-   return -ENOENT;
-
-   /*
-* If caller has specified name of flash model that can normally be
-* detected using JEDEC, let's verify it.
-*/
-   if (name && info->id_len) {
-   const struct flash_info *jinfo;
-
-   jinfo = spi_nor_read_id(nor);
-   if (IS_ERR(jinfo)) {
-   return PTR_ERR(jinfo);
-   } else if (jinfo != info) {
-   /*
-* JEDEC knows better, so overwrite platform ID. We
-* can't trust partitions any longer, but we'll let
-* mtd apply them anyway, since some partitions may be
-* marked read-only, and we don't want to lose that
-* information, even if it's not 100% accurate.
-*/
-   dev_warn(dev, "found %s, expected %s\n",
-jinfo->name, info->name);
-   info = jinfo;
-   }
-   }
+   info = spi_nor_get_flash_info(nor, name);
+   if (IS_ERR(info))
+   return PTR_ERR(info);
 
nor->info = info;
 
-- 
2.9.5
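
The helper above returns either a valid pointer or an error encoded in the
pointer, and the caller converts it back with PTR_ERR(). A minimal
userspace sketch of that convention follows. The ERR_PTR/PTR_ERR/IS_ERR
definitions here are simplified stand-ins written for this example (the
kernel's own live in include/linux/err.h), and struct demo_info,
demo_get_info() and demo_scan() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical table of known parts. */
struct demo_info {
	const char *name;
};

static const struct demo_info demo_table[] = {
	{ "chip-a" },
	{ "chip-b" },
};

/* Returns a table entry, or ERR_PTR(-ENOENT) if nothing matches. */
static const struct demo_info *demo_get_info(const char *name)
{
	size_t i;

	if (name) {
		for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
			if (!strcmp(demo_table[i].name, name))
				return &demo_table[i];
	}

	return ERR_PTR(-ENOENT);
}

/* Caller side: one IS_ERR() check replaces the inlined lookup logic. */
static int demo_scan(const char *name, const struct demo_info **out)
{
	const struct demo_info *info;

	assert(out != NULL);

	info = demo_get_info(name);
	if (IS_ERR(info))
		return (int)PTR_ERR(info);

	*out = info;
	return 0;
}
```

Returning the error inside the pointer is what lets the helper report
-ENOENT itself instead of the caller hard-coding the error value.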



[RESEND PATCH v3 15/20] mtd: spi-nor: Add s3an_post_sfdp_fixups()

2019-08-26 Thread Tudor.Ambarus
From: Tudor Ambarus 

s3an_nor_scan() was overriding the opcode selection done in
spi_nor_default_setup(). Set nor->setup() method in order to
avoid the unnecessary call to spi_nor_default_setup().

Signed-off-by: Tudor Ambarus 
Reviewed-by: Boris Brezillon 
---
v3: no changes, rebase on previous commits

 drivers/mtd/spi-nor/spi-nor.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 2aca56e07341..edf1c8badac9 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -2718,7 +2718,8 @@ static int spi_nor_check(struct spi_nor *nor)
return 0;
 }
 
-static int s3an_nor_scan(struct spi_nor *nor)
+static int s3an_nor_setup(struct spi_nor *nor,
+ const struct spi_nor_hwcaps *hwcaps)
 {
int ret;
 
@@ -4530,6 +4531,11 @@ static void spansion_post_sfdp_fixups(struct spi_nor *nor)
nor->mtd.erasesize = nor->info->sector_size;
 }
 
+static void s3an_post_sfdp_fixups(struct spi_nor *nor)
+{
+   nor->params.setup = s3an_nor_setup;
+}
+
 /**
  * spi_nor_post_sfdp_fixups() - Updates the flash's parameters and settings
  * after SFDP has been parsed (is also called for SPI NORs that do not
@@ -4551,6 +4557,9 @@ static void spi_nor_post_sfdp_fixups(struct spi_nor *nor)
break;
}
 
+   if (nor->info->flags & SPI_S3AN)
+   s3an_post_sfdp_fixups(nor);
+
if (nor->info->fixups && nor->info->fixups->post_sfdp)
nor->info->fixups->post_sfdp(nor);
 }
@@ -4899,12 +4908,6 @@ int spi_nor_scan(struct spi_nor *nor, const char *name,
return -EINVAL;
}
 
-   if (info->flags & SPI_S3AN) {
-   ret = s3an_nor_scan(nor);
-   if (ret)
-   return ret;
-   }
-
/* Send all the required SPI flash commands to initialize device */
ret = spi_nor_init(nor);
if (ret)
-- 
2.9.5
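
The idea in this patch, replacing a special-cased call in the scan path
with a setup hook installed during fixups, can be sketched as below. It
is an illustration under stated assumptions only: every demo_* name is
hypothetical, and setup_kind exists purely so the example can show which
hook ran.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FLAG_SPECIAL (1u << 0) /* hypothetical stand-in for a chip flag */

struct demo_nor;

struct demo_params {
	int (*setup)(struct demo_nor *nor);
};

struct demo_nor {
	unsigned int flags;
	struct demo_params params;
	int setup_kind; /* records which hook ran: 1 default, 2 special */
};

static int demo_default_setup(struct demo_nor *nor)
{
	nor->setup_kind = 1;
	return 0;
}

static int demo_special_setup(struct demo_nor *nor)
{
	nor->setup_kind = 2;
	return 0;
}

/* Fixup stage: flagged chips swap in their own setup method. */
static void demo_post_fixups(struct demo_nor *nor)
{
	if (nor->flags & DEMO_FLAG_SPECIAL)
		nor->params.setup = demo_special_setup;
}

/* Scan path: exactly one setup hook is called, with no special case. */
static int demo_scan(struct demo_nor *nor)
{
	assert(nor != NULL);

	nor->params.setup = demo_default_setup;
	demo_post_fixups(nor);

	return nor->params.setup(nor);
}
```

With this shape the default setup never runs for a flagged chip, so there
is nothing left for the special path to override afterwards.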


