Re: [RFC PATCH] mm, memory_hotplug: do not clear numa_node association after hot_remove

2018-11-08 Thread Michal Hocko
On Fri 09-11-18 09:12:09, Anshuman Khandual wrote:
> 
> 
> On 11/08/2018 03:59 PM, Michal Hocko wrote:
> > [Removing Wen Congyang and Tang Chen from the CC list because their
> >  emails bounce. It seems that we will never learn about their motivation]
> > 
> > On Thu 08-11-18 11:04:13, Michal Hocko wrote:
> >> From: Michal Hocko 
> >>
> >> Per-cpu numa_node provides a default node for each possible cpu. The
> >> association gets initialized during the boot when the architecture
> >> specific code explores cpu->NUMA affinity. When the whole NUMA node is
> >> removed though we are clearing this association
> >>
> >> try_offline_node
> >>   check_and_unmap_cpu_on_node
> >>     unmap_cpu_on_node
> >>       numa_clear_node
> >>         numa_set_node(cpu, NUMA_NO_NODE)
> >>
> >> This means that whoever calls cpu_to_node for a cpu associated with such
> >> a node will get NUMA_NO_NODE. This is problematic for two reasons. First
> >> it is fragile because __alloc_pages_node would simply blow up on an
> >> out-of-bound access. We have encountered this when loading kvm module
> >> BUG: unable to handle kernel paging request at 21c0
> >> IP: [] __alloc_pages_nodemask+0x93/0xb70
> >> PGD 80ffe853e067 PUD 7336bbc067 PMD 0
> >> Oops:  [#1] SMP
> >> [...]
> >> CPU: 88 PID: 1223749 Comm: modprobe Tainted: GW  
> >> 4.4.156-94.64-default #1
> >> task: 88727eff1880 ti: 88735449 task.ti: 88735449
> >> RIP: 0010:[]  [] 
> >> __alloc_pages_nodemask+0x93/0xb70
> >> RSP: 0018:887354493b40  EFLAGS: 00010202
> >> RAX: 21c0 RBX:  RCX: 
> >> RDX:  RSI: 0002 RDI: 014000c0
> >> RBP: 014000c0 R08:  R09: 
> >> R10: 88fffc89e790 R11: 00014000 R12: 0101
> >> R13: a0772cd4 R14: a0769ac0 R15: 
> >> FS:  7fdf2f2f1700() GS:88fffc88() 
> >> knlGS:
> >> CS:  0010 DS:  ES:  CR0: 80050033
> >> CR2: 21c0 CR3: 0077205ee000 CR4: 00360670
> >> DR0:  DR1:  DR2: 
> >> DR3:  DR6: fffe0ff0 DR7: 0400
> >> Stack:
> >>  0086 014000c014d20400 887354493bb8 882614d20f4c
> >>   0046 0046 810ac0c9
> >>  88ffe78c 009f e8ffe82d3500 88ff8ac55000
> >> Call Trace:
> >>  [] alloc_vmcs_cpu+0x3d/0x90 [kvm_intel]
> >>  [] hardware_setup+0x781/0x849 [kvm_intel]
> >>  [] kvm_arch_hardware_setup+0x28/0x190 [kvm]
> >>  [] kvm_init+0x7c/0x2d0 [kvm]
> >>  [] vmx_init+0x1e/0x32c [kvm_intel]
> >>  [] do_one_initcall+0xca/0x1f0
> >>  [] do_init_module+0x5a/0x1d7
> >>  [] load_module+0x1393/0x1c90
> >>  [] SYSC_finit_module+0x70/0xa0
> >>  [] entry_SYSCALL_64_fastpath+0x1e/0xb7
> >> DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb7
> >>
> >> on an older kernel but the code is basically the same in the current
> >> Linus tree as well. alloc_vmcs_cpu could use alloc_pages_nodemask which
> >> would recognize NUMA_NO_NODE and use alloc_pages_node which would translate
> >> it to numa_mem_id but that is wrong as well because it would use a cpu
> >> affinity of the local CPU which might be quite far from the original node.
> 
> But then the original node is getting/already off-lined. The allocation is
> going to come from a different node. alloc_pages_node() at least steers the
> allocation away from VM_BUG_ON() because of NUMA_NO_NODE by replacing it
> with numa_mem_id().
> 
> If node fallback order is important for this allocation then could not it
> use __alloc_pages_nodemask() directly giving preference for its zonelist
> node and nodemask. Just curious.

How does the caller get the right node to allocate from? We do have the
proper zone list for the offline node so why not use it?

> >> It is also reasonable to expect that cpu_to_node will provide a sane value
> >> and there might be many more callers like that.
> 
> AFAICS there are two choices here. Either mark them NUMA_NO_NODE for all
> cpus of a node going offline or keep the existing mapping in case the node
> comes back again.

Or update the mapping to the closest node. I have chosen to keep the
mapping because it is the easiest and the most natural one.
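
A minimal userspace sketch of the two policies under discussion, clearing
the per-cpu association versus keeping it. This is not kernel code: every
toy_* name and the cpu/node layout are assumptions made up for illustration.
It only shows why a caller that indexes per-node data by the result of a
cpu-to-node lookup is safe with the "keep" policy and needs an explicit
check with the "clear" policy.

```c
/* Userspace illustration only; toy_* names are hypothetical, not kernel APIs. */
#define TOY_NO_NODE   (-1)	/* plays the role of NUMA_NO_NODE */
#define TOY_MAX_NODES 4
#define TOY_NR_CPUS   8

/* Per-cpu default node as it might be set up at boot: two cpus per node. */
static int toy_cpu_node[TOY_NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };

static int toy_cpu_to_node(int cpu)
{
	return toy_cpu_node[cpu];
}

/* "Clear" policy: forget the association for every cpu of the removed node. */
static void toy_offline_node_clearing(int nid)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_node[cpu] == nid)
			toy_cpu_node[cpu] = TOY_NO_NODE;
}

/* "Keep" policy: the association survives the node going offline. */
static void toy_offline_node_keeping(int nid)
{
	(void)nid;	/* nothing to undo */
}

/* A caller indexing per-node arrays has to reject ids outside [0, MAX). */
static int toy_node_is_valid_index(int nid)
{
	return nid >= 0 && nid < TOY_MAX_NODES;
}
```

With the "keep" policy, toy_cpu_to_node() keeps returning an id that is a
valid array index after the node goes offline; with the "clear" policy, every
caller would have to run the validity check itself.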

> >> The second problem is that __register_one_node relies on cpu_to_node
> >> to properly associate cpus back to the node when it is onlined. We do
> >> not want to lose that link as there is no arch independent way to get it
> >> from the early boot time AFAICS.
> 
> Retaining the links seems to be right unless unmap_cpu_on_node() is sort
> of a weak callback letting arch to decide.
> 
> >>
> >> Drop the whole check_and_unmap_cpu_on_node machinery and keep the
> >> association to fix both issues. The NODE_DATA(nid) is not deallocated
> Though retaining the link is a problem 

Re: [PATCH 07/10] irqchip/gic-v3-its: Split probing from its node initialization

2018-11-08 Thread Richter, Robert
On 08.11.18 11:25:24, Julien Thierry wrote:
> On 07/11/18 22:03, Robert Richter wrote:

> >-static int its_init_domain(struct fwnode_handle *handle, struct its_node 
> >*its)
> >+static int its_init_domain(struct its_node *its)
> >  {
> >  struct irq_domain *inner_domain;
> >  struct msi_domain_info *info;
> >@@ -3384,7 +3385,8 @@ static int its_init_domain(struct fwnode_handle 
> >*handle, struct its_node *its)
> >  if (!info)
> >  return -ENOMEM;
> >
> >- inner_domain = irq_domain_create_tree(handle, &its_domain_ops, its);
> >+ inner_domain = irq_domain_create_tree(its->fwnode_handle,
> >+ &its_domain_ops, its);
> 
> Separate change?
> 
> >  if (!inner_domain) {
> >  kfree(info);
> >  return -ENOMEM;
> >@@ -3441,8 +3443,7 @@ static int its_init_vpe_domain(void)
> >  return 0;
> >  }
> >
> >-static int __init its_compute_its_list_map(struct resource *res,
> >-void __iomem *its_base)
> >+static int __init its_compute_its_list_map(struct its_node *its)
> >  {
> >  int its_number;
> >  u32 ctlr;
> >@@ -3456,15 +3457,15 @@ static int __init its_compute_its_list_map(struct 
> >resource *res,
> >  its_number = find_first_zero_bit(&its_list_map, GICv4_ITS_LIST_MAX);
> >  if (its_number >= GICv4_ITS_LIST_MAX) {
> >  pr_err("ITS@%pa: No ITSList entry available!\n",
> >- &res->start);
> >+ &its->phys_base);
> >  return -EINVAL;
> >  }
> >
> >- ctlr = readl_relaxed(its_base + GITS_CTLR);
> >+ ctlr = readl_relaxed(its->base + GITS_CTLR);
> >  ctlr &= ~GITS_CTLR_ITS_NUMBER;
> >  ctlr |= its_number << GITS_CTLR_ITS_NUMBER_SHIFT;
> >- writel_relaxed(ctlr, its_base + GITS_CTLR);
> >- ctlr = readl_relaxed(its_base + GITS_CTLR);
> >+ writel_relaxed(ctlr, its->base + GITS_CTLR);
> >+ ctlr = readl_relaxed(its->base + GITS_CTLR);
> 
> This (removal of its_base parameter) also feels like a separate change.

In a separate change the motivation of the change would not be
obvious. While the change of the variable itself is trivial from the
perspective of review and testing, I decided to keep it in the context
of the overall change of this patch.

> 
> Also, I would define a local variable its_base to avoid dereferencing
> its every time in order to get the base address.

Hmm, there is not much difference in reading the code then, while the
use of a local variable just adds more code without benefit. The
compiler does not care as the value is probably stored in a register
anyway. There are also other struct members; should all of them be
mirrored in a local variable?

> 
> >  if ((ctlr & GITS_CTLR_ITS_NUMBER) != (its_number << 
> > GITS_CTLR_ITS_NUMBER_SHIFT)) {
> >  its_number = ctlr & GITS_CTLR_ITS_NUMBER;
> >  its_number >>= GITS_CTLR_ITS_NUMBER_SHIFT;
> >@@ -3472,83 +3473,110 @@ static int __init its_compute_its_list_map(struct 
> >resource *res,
> >
> >  if (test_and_set_bit(its_number, &its_list_map)) {
> >  pr_err("ITS@%pa: Duplicate ITSList entry %d\n",
> >- &res->start, its_number);
> >+ &its->phys_base, its_number);
> >  return -EINVAL;
> >  }
> >
> >  return its_number;
> >  }
> >
> >+static void its_free(struct its_node *its)
> >+{
> >+ raw_spin_lock(&its_lock);
> >+ list_del(&its->entry);
> >+ raw_spin_unlock(&its_lock);
> >+
> >+ kfree(its);
> >+}
> >+
> >+static int __init its_init_one(struct its_node *its);
> 
> You might as well define its_init_one here, no?

This is an intermediate definition that will be removed in a later
patch. Moving the whole code here would make the change less readable.

> 
> >+
> >  static int __init its_probe_one(struct resource *res,
> >  struct fwnode_handle *handle, int numa_node)
> >  {
> >  struct its_node *its;
> >+ int err;
> >+
> >+ its = kzalloc(sizeof(*its), GFP_KERNEL);
> >+ if (!its)
> >+ return -ENOMEM;
> >+
> >+ raw_spin_lock_init(&its->lock);
> >+ INIT_LIST_HEAD(&its->entry);
> >+ INIT_LIST_HEAD(&its->its_device_list);
> >+ its->fwnode_handle = handle;
> >+ its->phys_base = res->start;
> >+ its->phys_size = resource_size(res);
> >+ its->numa_node = numa_node;
> >+
> >+ raw_spin_lock(&its_lock);
> >+ list_add_tail(&its->entry, &its_nodes);
> >+ raw_spin_unlock(&its_lock);
> >+
> >+ pr_info("ITS %pR\n", res);
> >+
> >+ err = its_init_one(its);
> >+ if (err)
> >+ its_free(its);
> >+
> >+ return err;
> >+}
> >+
> >+static int __init its_init_one(struct its_node *its)
> >+{
> >  void __iomem *its_base;
> >  u32 val, ctlr;
> >  u64 baser, tmp, typer;
> >  int err;
> >
> >- its_base = ioremap(res->start, resource_size(res));
> >+ its_base = ioremap(its->phys_base, its->phys_size);
> >  if (!its_base) {
> >- pr_warn("ITS@%pa: 

RE: [PATCH] freezer: fix freeze timeout on exec

2018-11-08 Thread Chanho Min
> >
> > Can't we simply change de_thread() to use freezable_schedule() ?
> >
> > Oleg.
> 
> We need to change it to freezable_schedule_timeout() instead.
> freezable_schedule also can't be frozen if sub-threads can't stop
> schedule().
> Furthermore, I'm not sure if it is safe to freeze it at de_thread().
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 9c5ee2a..291cbd6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -942,7 +942,7 @@ static int de_thread(struct task_struct *tsk)
> while (sig->notify_count) {
> __set_current_state(TASK_KILLABLE);
> spin_unlock_irq(lock);
> -   schedule();
> +   while (!freezable_schedule_timeout(HZ));
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> spin_lock_irq(lock);
> 
> Chanho

Sorry, I may have misunderstood the freezer.
Changing to freezable_schedule() works fine, and it looks safe.
I'll apply the patch again.

Chanho



Re: [PATCH RFC 0/3] Static calls

2018-11-08 Thread Ingo Molnar


* Ingo Molnar  wrote:

> > - Does this feature have much value without retpolines?  If not, should
> >   we make it depend on retpolines somehow?
> 
> Paravirt patching, as you mention in your later reply?

BTW., to look for candidates of this API, I'd suggest looking at the 
function call frequency of my (almost-)distro kernel vmlinux:

  $ objdump -d vmlinux | grep -w callq | cut -f3- | sort | uniq -c | sort -n | tail -100

which gives:

502 callq  8157d050 
522 callq  81aaf420 
536 callq  81547e60 <_copy_to_user>
615 callq  81a97700 
624 callq  *0x82648428
624 callq  810cc810 <__might_sleep>
625 callq  81a93b90 
649 callq  81547dd0 <_copy_from_user>
651 callq  811ba930 
654 callq  8170b6f0 <_dev_warn>
691 callq  81a93790 
693 callq  81a88dc0 
709 callq  *0x82648438
723 callq  811bdbd0 
735 callq  810feac0 
750 callq  8163e9f0 
768 callq  *0x82648430
814 callq  81ab2710 <_raw_spin_lock_irq>
841 callq  81a9e680 <__memcpy>
863 callq  812ae3d0 <__kmalloc>
899 callq  8126ac80 <__might_fault>
912 callq  81ab2970 <_raw_spin_unlock_irq>
939 callq  81aaaf10 <_cond_resched>
966 callq  811bda00 
   1069 callq  81126f50 
   1078 callq  81097760 <__warn_printk>
   1081 callq  8157b140 <__dynamic_dev_dbg>
   1351 callq  8170b630 <_dev_err>
   1365 callq  811050c0 
   1373 callq  81a977f0 
   1390 callq  8157b090 <__dynamic_pr_debug>
   1453 callq  8155c650 <__list_add_valid>
   1501 callq  812ad6f0 
   1509 callq  8155c6c0 <__list_del_entry_valid>
   1513 callq  81310ce0 
   1571 callq  81ab2780 <_raw_spin_lock_irqsave>
   1624 callq  81ab29b0 <_raw_spin_unlock_irqrestore>
   1661 callq  81126fd0 
   1986 callq  81104940 
   2050 callq  811c5110 
   2133 callq  81102c70 
   2507 callq  81ab2560 <_raw_spin_lock>
   2676 callq  81aadc40 
   3056 callq  81ab2900 <_raw_spin_unlock>
   3294 callq  81aac610 
   3628 callq  81129100 
   4462 callq  812ac2c0 
   6454 callq  8111a51e 
   6676 callq  81101420 
   7328 callq  81e014b0 <__x86_indirect_thunk_rax>
   7598 callq  81126f30 
   9065 callq  810979f0 <__stack_chk_fail>

The most prominent callers which are already function call pointers today 
are:

  $ objdump -d vmlinux | grep -w callq | grep \* | cut -f3- | sort | uniq -c | sort -n | tail -10

109 callq  *0x82648530
134 callq  *0x82648568
154 callq  *0x826483d0
260 callq  *0x826483d8
297 callq  *0x826483e0
345 callq  *0x82648440
345 callq  *0x82648558
624 callq  *0x82648428
709 callq  *0x82648438
768 callq  *0x82648430

That's all pv_ops->*() method calls:

   82648300 D pv_ops
   826485d0 D pv_info

Optimizing those thousands of function pointer calls would already be a 
nice improvement.

But retpolines:

   7328 callq  81e014b0 <__x86_indirect_thunk_rax>

  81e014b0 <__x86_indirect_thunk_rax>:
  81e014b0:   ff e0   jmpq   *%rax

... are even more prominent, and turned on in every distro as well, 
obviously.
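
To illustrate the difference between the two kinds of call sites counted
above, here is a minimal userspace sketch. It is not the proposed static
call API: toy_ops and the toy_* functions are hypothetical stand-ins for a
pv_ops-style table that is filled in once and then effectively constant.

```c
/* Userspace illustration only; all toy_* names are hypothetical. */
struct toy_ops {
	int (*irq_disable)(int flags);
	int (*irq_enable)(int flags);
};

static int toy_native_irq_disable(int flags) { return flags & ~1; }
static int toy_native_irq_enable(int flags)  { return flags | 1; }

/* Like an ops table: set up once, read on every call afterwards. */
static struct toy_ops toy_ops = {
	.irq_disable = toy_native_irq_disable,
	.irq_enable  = toy_native_irq_enable,
};

/*
 * Indirect form: each call loads a function pointer from the table.
 * Unless the compiler can prove the pointer constant, this is typically
 * emitted as an indirect call (or a retpoline thunk when enabled).
 */
static int toy_toggle_indirect(int flags)
{
	flags = toy_ops.irq_disable(flags);
	return toy_ops.irq_enable(flags);
}

/*
 * Direct form: what the same call site amounts to once the target is
 * known and the call is bound to it.
 */
static int toy_toggle_direct(int flags)
{
	flags = toy_native_irq_disable(flags);
	return toy_native_irq_enable(flags);
}
```

Both functions compute the same result; the only difference is how the
callee address reaches the call instruction, which is the part a static
call mechanism would aim to optimize.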

Thanks,

Ingo


Re: [PATCH 4.14 01/31] eeprom: at24: Add support for address-width property

2018-11-08 Thread Bartosz Golaszewski
czw., 8 lis 2018 o 23:08 Greg Kroah-Hartman
 napisał(a):
>
> 4.14-stable review patch.  If anyone has any objections, please let me know.
>

Hi Greg,

this looks like a new feature, not a fix. Are you sure this should go
into the stable branch?

Best regards,
Bartosz Golaszewski

> --
>
> [ Upstream commit a2b3bf4846e5eed62ea6abb096af2c950961033c ]
>
> Provide a flexible way to determine the addressing bits of eeprom.
> Pass the addressing bits to driver through address-width property.
>
> Signed-off-by: Alan Chiang 
> Signed-off-by: Andy Yeh 
> Signed-off-by: Bartosz Golaszewski 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/misc/eeprom/at24.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c
> index 4cc0b42f2acc..ded48a0c77ee 100644
> --- a/drivers/misc/eeprom/at24.c
> +++ b/drivers/misc/eeprom/at24.c
> @@ -577,6 +577,23 @@ static void at24_get_pdata(struct device *dev, struct 
> at24_platform_data *chip)
> if (device_property_present(dev, "read-only"))
> chip->flags |= AT24_FLAG_READONLY;
>
> +   err = device_property_read_u32(dev, "address-width", &val);
> +   if (!err) {
> +   switch (val) {
> +   case 8:
> +   if (chip->flags & AT24_FLAG_ADDR16)
> +   dev_warn(dev, "Override address width to be 8, while default is 16\n");
> +   chip->flags &= ~AT24_FLAG_ADDR16;
> +   break;
> +   case 16:
> +   chip->flags |= AT24_FLAG_ADDR16;
> +   break;
> +   default:
> +   dev_warn(dev, "Bad \"address-width\" property: %u\n",
> +val);
> +   }
> +   }
> +
> err = device_property_read_u32(dev, "pagesize", &val);
> if (!err) {
> chip->page_size = val;
> --
> 2.17.1
>
>
>
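
For readers who want to see the flag handling from the quoted hunk in
isolation, here is a small userspace sketch of the same switch. It is an
illustration under assumptions, not the driver code: TOY_FLAG_ADDR16 and
toy_apply_address_width() are hypothetical names, the flag value is
arbitrary, and the warnings printed by the real patch are replaced by a
return code.

```c
#include <stdint.h>

/* Hypothetical flag; plays the role of AT24_FLAG_ADDR16 in the patch. */
#define TOY_FLAG_ADDR16 0x80u

/*
 * Apply an "address-width" value to a flags word.
 * Returns 0 when the width is 8 or 16, -1 otherwise (flags untouched).
 */
static int toy_apply_address_width(uint32_t *flags, uint32_t width)
{
	switch (width) {
	case 8:
		/* An explicit 8 overrides a 16-bit default. */
		*flags &= ~TOY_FLAG_ADDR16;
		return 0;
	case 16:
		*flags |= TOY_FLAG_ADDR16;
		return 0;
	default:
		/* The patch only warns here and keeps the existing flags. */
		return -1;
	}
}
```

As in the patch, an unsupported width leaves the chip flags as they were,
so a bad property value cannot silently change the addressing mode.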


[PATCH] greybus: gpio: switch GPIO portions to use GPIOLIB_IRQCHIP

2018-11-08 Thread Nishad Kamdar
Convert the GPIO driver to use the GPIO irqchip library
GPIOLIB_IRQCHIP instead of reimplementing the same.

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/greybus/Kconfig |   1 +
 drivers/staging/greybus/gpio.c  | 123 ++--
 2 files changed, 21 insertions(+), 103 deletions(-)

diff --git a/drivers/staging/greybus/Kconfig b/drivers/staging/greybus/Kconfig
index ab096bcef98c..b571e4e8060b 100644
--- a/drivers/staging/greybus/Kconfig
+++ b/drivers/staging/greybus/Kconfig
@@ -148,6 +148,7 @@ if GREYBUS_BRIDGED_PHY
 config GREYBUS_GPIO
tristate "Greybus GPIO Bridged PHY driver"
depends on GPIOLIB
+   select GPIOLIB_IRQCHIP
---help---
  Select this option if you have a device that follows the
  Greybus GPIO Bridged PHY Class specification.
diff --git a/drivers/staging/greybus/gpio.c b/drivers/staging/greybus/gpio.c
index b1d4698019a1..32c228bad33a 100644
--- a/drivers/staging/greybus/gpio.c
+++ b/drivers/staging/greybus/gpio.c
@@ -9,9 +9,7 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
+#include 
 #include 
 
 #include "greybus.h"
@@ -40,8 +38,6 @@ struct gb_gpio_controller {
struct gpio_chipchip;
struct irq_chip irqc;
struct irq_chip *irqchip;
-   struct irq_domain   *irqdomain;
-   unsigned intirq_base;
irq_flow_handler_t  irq_handler;
unsigned intirq_default_type;
struct mutexirq_lock;
@@ -365,6 +361,7 @@ static int gb_gpio_request_handler(struct gb_operation *op)
 {
struct gb_connection *connection = op->connection;
struct gb_gpio_controller *ggc = gb_connection_get_data(connection);
+   struct gpio_chip *gc = &ggc->chip;
struct device *dev = &ggc->gbphy_dev->dev;
struct gb_message *request;
struct gb_gpio_irq_event_request *event;
@@ -391,7 +388,7 @@ static int gb_gpio_request_handler(struct gb_operation *op)
return -EINVAL;
}
 
-   irq = irq_find_mapping(ggc->irqdomain, event->which);
+   irq = irq_find_mapping(gc->irq.domain, event->which);
if (!irq) {
dev_err(dev, "failed to find IRQ\n");
return -EINVAL;
@@ -506,68 +503,6 @@ static int gb_gpio_controller_setup(struct 
gb_gpio_controller *ggc)
return ret;
 }
 
-/**
- * gb_gpio_irq_map() - maps an IRQ into a GB gpio irqchip
- * @d: the irqdomain used by this irqchip
- * @irq: the global irq number used by this GB gpio irqchip irq
- * @hwirq: the local IRQ/GPIO line offset on this GB gpio
- *
- * This function will set up the mapping for a certain IRQ line on a
- * GB gpio by assigning the GB gpio as chip data, and using the irqchip
- * stored inside the GB gpio.
- */
-static int gb_gpio_irq_map(struct irq_domain *domain, unsigned int irq,
-  irq_hw_number_t hwirq)
-{
-   struct gpio_chip *chip = domain->host_data;
-   struct gb_gpio_controller *ggc = gpio_chip_to_gb_gpio_controller(chip);
-
-   irq_set_chip_data(irq, ggc);
-   irq_set_chip_and_handler(irq, ggc->irqchip, ggc->irq_handler);
-   irq_set_noprobe(irq);
-   /*
-* No set-up of the hardware will happen if IRQ_TYPE_NONE
-* is passed as default type.
-*/
-   if (ggc->irq_default_type != IRQ_TYPE_NONE)
-   irq_set_irq_type(irq, ggc->irq_default_type);
-
-   return 0;
-}
-
-static void gb_gpio_irq_unmap(struct irq_domain *d, unsigned int irq)
-{
-   irq_set_chip_and_handler(irq, NULL, NULL);
-   irq_set_chip_data(irq, NULL);
-}
-
-static const struct irq_domain_ops gb_gpio_domain_ops = {
-   .map    = gb_gpio_irq_map,
-   .unmap  = gb_gpio_irq_unmap,
-};
-
-/**
- * gb_gpio_irqchip_remove() - removes an irqchip added to a gb_gpio_controller
- * @ggc: the gb_gpio_controller to remove the irqchip from
- *
- * This is called only from gb_gpio_remove()
- */
-static void gb_gpio_irqchip_remove(struct gb_gpio_controller *ggc)
-{
-   unsigned int offset;
-
-   /* Remove all IRQ mappings and delete the domain */
-   if (ggc->irqdomain) {
-   for (offset = 0; offset < (ggc->line_max + 1); offset++)
-   irq_dispose_mapping(irq_find_mapping(ggc->irqdomain,
-offset));
-   irq_domain_remove(ggc->irqdomain);
-   }
-
-   if (ggc->irqchip)
-   ggc->irqchip = NULL;
-}
-
 /**
  * gb_gpio_irqchip_add() - adds an irqchip to a gpio chip
  * @chip: the gpio chip to add the irqchip to
@@ -595,8 +530,7 @@ static int gb_gpio_irqchip_add(struct gpio_chip *chip,
 unsigned int type)
 {
struct gb_gpio_controller *ggc;
-   unsigned int offset;
-   unsigned int irq_base;
+   unsigned int err;
 
if (!chip || !irqchip)
return -EINVAL;
@@ -606,35 +540,21 @@ static int gb_gpio_irqchip_add(struct 

Re: [PATCH v3 0/3] Huawei laptops

2018-11-08 Thread Takashi Iwai
On Thu, 08 Nov 2018 20:59:45 +0100,
Andy Shevchenko wrote:
> 
> On Thu, Nov 8, 2018 at 7:17 PM Ayman Bagabas  wrote:
> 
> Is it supposed to go via PDx86 or ALSA tree?

I don't mind either way.  The addition in platform is more
significant, so I suppose you can take it more easily.


thanks,

Takashi


[PATCH v3 4/4] staging: iio: ad7816: Add device tree table.

2018-11-08 Thread Nishad Kamdar
Add device tree table for matching vendor ID.

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/iio/adc/ad7816.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/iio/adc/ad7816.c b/drivers/staging/iio/adc/ad7816.c
index a2fead85cd46..b8a9149fbac1 100644
--- a/drivers/staging/iio/adc/ad7816.c
+++ b/drivers/staging/iio/adc/ad7816.c
@@ -422,6 +422,12 @@ static int ad7816_probe(struct spi_device *spi_dev)
return 0;
 }
 
+static const struct of_device_id ad7816_of_match[] = {
+   { .compatible = "adi,ad7816", },
+   { }
+};
+MODULE_DEVICE_TABLE(of, ad7816_of_match);
+
 static const struct spi_device_id ad7816_id[] = {
{ "ad7816", ID_AD7816 },
{ "ad7817", ID_AD7817 },
@@ -434,6 +440,7 @@ MODULE_DEVICE_TABLE(spi, ad7816_id);
 static struct spi_driver ad7816_driver = {
.driver = {
.name = "ad7816",
+   .of_match_table = of_match_ptr(ad7816_of_match),
},
.probe = ad7816_probe,
.id_table = ad7816_id,
-- 
2.17.1
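
Illustration only, not part of the patch: a minimal userspace sketch of how a
sentinel-terminated match table such as ad7816_of_match[] can be walked.
struct compat_entry, example_match and match_compatible() are hypothetical
names. The kernel's struct of_device_id and its matching logic are more
involved (for instance, compatible is a fixed-size array there, not a
pointer), so this is a simplified model under those assumptions, not the
real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct of_device_id: only the compatible string. */
struct compat_entry {
	const char *compatible;
};

/* Same shape as ad7816_of_match[]: one entry followed by an empty sentinel. */
static const struct compat_entry example_match[] = {
	{ .compatible = "adi,ad7816" },
	{ .compatible = NULL }
};

/*
 * Walk the table until the sentinel (NULL compatible) and return the first
 * entry whose string equals node_compat, or NULL when nothing matches.
 */
const struct compat_entry *
match_compatible(const struct compat_entry *table, const char *node_compat)
{
	for (; table->compatible; table++) {
		if (strcmp(table->compatible, node_compat) == 0)
			return table;
	}
	return NULL;
}
```

In this model, match_compatible(example_match, "adi,ad7816") returns the
first entry, while a string that is not listed in the table, such as
"adi,ad7817", returns NULL.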



[PATCH v3 3/4] staging: iio: ad7816: Set RD/WR pin and CONVST pin as outputs.

2018-11-08 Thread Nishad Kamdar
The RD/WR pin and CONVST pin are logical inputs to the AD78xx
chip as per the datasheet. Hence configure the corresponding GPIOs
as outputs on the host side.

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/iio/adc/ad7816.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/adc/ad7816.c b/drivers/staging/iio/adc/ad7816.c
index 3cda5cd09365..a2fead85cd46 100644
--- a/drivers/staging/iio/adc/ad7816.c
+++ b/drivers/staging/iio/adc/ad7816.c
@@ -369,14 +369,15 @@ static int ad7816_probe(struct spi_device *spi_dev)
chip->oti_data[i] = 203;
 
chip->id = spi_get_device_id(spi_dev)->driver_data;
-   chip->rdwr_pin = devm_gpiod_get(&spi_dev->dev, "rdwr", GPIOD_IN);
+   chip->rdwr_pin = devm_gpiod_get(&spi_dev->dev, "rdwr", GPIOD_OUT_HIGH);
if (IS_ERR(chip->rdwr_pin)) {
ret = PTR_ERR(chip->rdwr_pin);
dev_err(&spi_dev->dev, "Failed to request rdwr GPIO: %d\n",
ret);
return ret;
}
-   chip->convert_pin = devm_gpiod_get(&spi_dev->dev, "convert", GPIOD_IN);
+   chip->convert_pin = devm_gpiod_get(&spi_dev->dev, "convert",
+  GPIOD_OUT_HIGH);
if (IS_ERR(chip->convert_pin)) {
ret = PTR_ERR(chip->convert_pin);
dev_err(&spi_dev->dev, "Failed to request convert GPIO: %d\n",
-- 
2.17.1



[PATCH v3 2/4] staging: iio: ad7816: Do not use busy_pin in case of AD7818

2018-11-08 Thread Nishad Kamdar
AD7818 does not support busy_pin functionality as per datasheet.
Hence drop busy_pin when AD7818 is used.

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/iio/adc/ad7816.c | 35 ++--
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/iio/adc/ad7816.c b/drivers/staging/iio/adc/ad7816.c
index 12c4e0ab4713..3cda5cd09365 100644
--- a/drivers/staging/iio/adc/ad7816.c
+++ b/drivers/staging/iio/adc/ad7816.c
@@ -43,6 +43,7 @@
  */
 
 struct ad7816_chip_info {
+   kernel_ulong_t id;
struct spi_device *spi_dev;
struct gpio_desc *rdwr_pin;
struct gpio_desc *convert_pin;
@@ -52,6 +53,12 @@ struct ad7816_chip_info {
u8  mode;
 };
 
+enum ad7816_type {
+   ID_AD7816,
+   ID_AD7817,
+   ID_AD7818,
+};
+
 /*
  * ad7816 data access by SPI
  */
@@ -78,8 +85,10 @@ static int ad7816_spi_read(struct ad7816_chip_info *chip, u16 *data)
gpiod_set_value(chip->convert_pin, 1);
}
 
-   while (gpiod_get_value(chip->busy_pin))
-   cpu_relax();
+   if (chip->id == ID_AD7816 || chip->id == ID_AD7817) {
+   while (gpiod_get_value(chip->busy_pin))
+   cpu_relax();
+   }
 
gpiod_set_value(chip->rdwr_pin, 0);
gpiod_set_value(chip->rdwr_pin, 1);
@@ -359,6 +368,7 @@ static int ad7816_probe(struct spi_device *spi_dev)
for (i = 0; i <= AD7816_CS_MAX; i++)
chip->oti_data[i] = 203;
 
+   chip->id = spi_get_device_id(spi_dev)->driver_data;
chip->rdwr_pin = devm_gpiod_get(&spi_dev->dev, "rdwr", GPIOD_IN);
if (IS_ERR(chip->rdwr_pin)) {
ret = PTR_ERR(chip->rdwr_pin);
@@ -373,12 +383,15 @@ static int ad7816_probe(struct spi_device *spi_dev)
ret);
return ret;
}
-   chip->busy_pin = devm_gpiod_get(&spi_dev->dev, "busy", GPIOD_IN);
-   if (IS_ERR(chip->busy_pin)) {
-   ret = PTR_ERR(chip->busy_pin);
-   dev_err(&spi_dev->dev, "Failed to request busy GPIO: %d\n",
-   ret);
-   return ret;
+   if (chip->id == ID_AD7816 || chip->id == ID_AD7817) {
+   chip->busy_pin = devm_gpiod_get(&spi_dev->dev, "busy",
+   GPIOD_IN);
+   if (IS_ERR(chip->busy_pin)) {
+   ret = PTR_ERR(chip->busy_pin);
+   dev_err(&spi_dev->dev, "Failed to request busy GPIO: %d\n",
+   ret);
+   return ret;
+   }
}
 
indio_dev->name = spi_get_device_id(spi_dev)->name;
@@ -409,9 +422,9 @@ static int ad7816_probe(struct spi_device *spi_dev)
 }
 
 static const struct spi_device_id ad7816_id[] = {
-   { "ad7816", 0 },
-   { "ad7817", 0 },
-   { "ad7818", 0 },
+   { "ad7816", ID_AD7816 },
+   { "ad7817", ID_AD7817 },
+   { "ad7818", ID_AD7818 },
{}
 };
 
-- 
2.17.1
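
Illustration only, not part of the patch: a minimal userspace sketch of the
idea above, where the per-variant value stored in the ID table decides
whether the busy pin is used at all. struct id_entry, example_ids,
lookup_id() and uses_busy_pin() are hypothetical names standing in for
struct spi_device_id, ad7816_id[], spi_get_device_id() and the inline
condition in the patch; only the enum values and the condition itself are
taken from the diff.

```c
#include <stddef.h>
#include <string.h>

enum ad7816_type {
	ID_AD7816,
	ID_AD7817,
	ID_AD7818,
};

/* Hypothetical stand-in for struct spi_device_id. */
struct id_entry {
	const char *name;
	unsigned long driver_data;
};

/* Same contents as ad7816_id[] after the patch, with a NULL-name sentinel. */
static const struct id_entry example_ids[] = {
	{ "ad7816", ID_AD7816 },
	{ "ad7817", ID_AD7817 },
	{ "ad7818", ID_AD7818 },
	{ NULL, 0 }
};

/* Return the table entry for name, or NULL when the name is not listed. */
const struct id_entry *lookup_id(const char *name)
{
	const struct id_entry *e;

	for (e = example_ids; e->name; e++) {
		if (strcmp(e->name, name) == 0)
			return e;
	}
	return NULL;
}

/* Same condition the patch checks before requesting or polling busy_pin. */
int uses_busy_pin(unsigned long id)
{
	return id == ID_AD7816 || id == ID_AD7817;
}
```

With this model, uses_busy_pin(lookup_id("ad7818")->driver_data) is false,
so neither the GPIO request nor the busy-wait loop would run for that
variant, while both still run for "ad7816" and "ad7817".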



[PATCH v3 1/4] staging: iio: ad7816: Switch to the gpio descriptor interface

2018-11-08 Thread Nishad Kamdar
Use the gpiod interface for rdwr_pin, convert_pin and busy_pin
instead of the deprecated old non-descriptor interface.

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/iio/adc/ad7816.c | 80 ++--
 1 file changed, 34 insertions(+), 46 deletions(-)

diff --git a/drivers/staging/iio/adc/ad7816.c b/drivers/staging/iio/adc/ad7816.c
index bf76a8620bdb..12c4e0ab4713 100644
--- a/drivers/staging/iio/adc/ad7816.c
+++ b/drivers/staging/iio/adc/ad7816.c
@@ -7,7 +7,7 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -44,9 +44,9 @@
 
 struct ad7816_chip_info {
struct spi_device *spi_dev;
-   u16 rdwr_pin;
-   u16 convert_pin;
-   u16 busy_pin;
+   struct gpio_desc *rdwr_pin;
+   struct gpio_desc *convert_pin;
+   struct gpio_desc *busy_pin;
u8  oti_data[AD7816_CS_MAX + 1];
u8  channel_id; /* 0 always be temperature */
u8  mode;
@@ -61,28 +61,28 @@ static int ad7816_spi_read(struct ad7816_chip_info *chip, u16 *data)
int ret = 0;
__be16 buf;
 
-   gpio_set_value(chip->rdwr_pin, 1);
-   gpio_set_value(chip->rdwr_pin, 0);
+   gpiod_set_value(chip->rdwr_pin, 1);
+   gpiod_set_value(chip->rdwr_pin, 0);
ret = spi_write(spi_dev, &chip->channel_id, sizeof(chip->channel_id));
if (ret < 0) {
dev_err(&spi_dev->dev, "SPI channel setting error\n");
return ret;
}
-   gpio_set_value(chip->rdwr_pin, 1);
+   gpiod_set_value(chip->rdwr_pin, 1);
 
if (chip->mode == AD7816_PD) { /* operating mode 2 */
-   gpio_set_value(chip->convert_pin, 1);
-   gpio_set_value(chip->convert_pin, 0);
+   gpiod_set_value(chip->convert_pin, 1);
+   gpiod_set_value(chip->convert_pin, 0);
} else { /* operating mode 1 */
-   gpio_set_value(chip->convert_pin, 0);
-   gpio_set_value(chip->convert_pin, 1);
+   gpiod_set_value(chip->convert_pin, 0);
+   gpiod_set_value(chip->convert_pin, 1);
}
 
-   while (gpio_get_value(chip->busy_pin))
+   while (gpiod_get_value(chip->busy_pin))
cpu_relax();
 
-   gpio_set_value(chip->rdwr_pin, 0);
-   gpio_set_value(chip->rdwr_pin, 1);
+   gpiod_set_value(chip->rdwr_pin, 0);
+   gpiod_set_value(chip->rdwr_pin, 1);
ret = spi_read(spi_dev, &buf, sizeof(*data));
if (ret < 0) {
dev_err(&spi_dev->dev, "SPI data read error\n");
@@ -99,8 +99,8 @@ static int ad7816_spi_write(struct ad7816_chip_info *chip, u8 data)
struct spi_device *spi_dev = chip->spi_dev;
int ret = 0;
 
-   gpio_set_value(chip->rdwr_pin, 1);
-   gpio_set_value(chip->rdwr_pin, 0);
+   gpiod_set_value(chip->rdwr_pin, 1);
+   gpiod_set_value(chip->rdwr_pin, 0);
ret = spi_write(spi_dev, &data, sizeof(data));
if (ret < 0)
dev_err(&spi_dev->dev, "SPI oti data write error\n");
@@ -129,10 +129,10 @@ static ssize_t ad7816_store_mode(struct device *dev,
struct ad7816_chip_info *chip = iio_priv(indio_dev);
 
if (strcmp(buf, "full")) {
-   gpio_set_value(chip->rdwr_pin, 1);
+   gpiod_set_value(chip->rdwr_pin, 1);
chip->mode = AD7816_FULL;
} else {
-   gpio_set_value(chip->rdwr_pin, 0);
+   gpiod_set_value(chip->rdwr_pin, 0);
chip->mode = AD7816_PD;
}
 
@@ -345,15 +345,9 @@ static int ad7816_probe(struct spi_device *spi_dev)
 {
struct ad7816_chip_info *chip;
struct iio_dev *indio_dev;
-   unsigned short *pins = dev_get_platdata(&spi_dev->dev);
int ret = 0;
int i;
 
-   if (!pins) {
-   dev_err(&spi_dev->dev, "No necessary GPIO platform data.\n");
-   return -EINVAL;
-   }
-
indio_dev = devm_iio_device_alloc(&spi_dev->dev, sizeof(*chip));
if (!indio_dev)
return -ENOMEM;
@@ -364,34 +358,28 @@ static int ad7816_probe(struct spi_device *spi_dev)
chip->spi_dev = spi_dev;
for (i = 0; i <= AD7816_CS_MAX; i++)
chip->oti_data[i] = 203;
-   chip->rdwr_pin = pins[0];
-   chip->convert_pin = pins[1];
-   chip->busy_pin = pins[2];
-
-   ret = devm_gpio_request(&spi_dev->dev, chip->rdwr_pin,
-   spi_get_device_id(spi_dev)->name);
-   if (ret) {
-   dev_err(&spi_dev->dev, "Fail to request rdwr gpio PIN %d.\n",
-   chip->rdwr_pin);
+
+   chip->rdwr_pin = devm_gpiod_get(&spi_dev->dev, "rdwr", GPIOD_IN);
+   if (IS_ERR(chip->rdwr_pin)) {
+   ret = PTR_ERR(chip->rdwr_pin);
+   dev_err(&spi_dev->dev, "Failed to request rdwr GPIO: %d\n",
+   ret);
return ret;
}
-   gpio_direction_input(chip->rdwr_pin);
-   ret = devm_gpio_request(&spi_dev->dev, chip->convert_pin,
-   
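
Illustration only, not part of the patch: the switch from integer GPIO
numbers to descriptors changes the error handling from "if (ret)" to the
IS_ERR()/PTR_ERR() pattern, because devm_gpiod_get() reports failure through
the returned pointer. Below is a minimal userspace model of that encoding.
The sketch_* names and SKETCH_MAX_ERRNO are hypothetical; the value 4095
mirrors the kernel's MAX_ERRNO, and the pointer/integer casts assume a
conventional flat address space as on common Linux targets.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer into the last 4095 addresses. */
void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* A pointer is an encoded error if it falls within that top range. */
int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno value from an encoded pointer. */
long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A caller in this model checks sketch_is_err(p) first and only then converts
with sketch_ptr_err(p), which is the same order the probe function follows
for rdwr_pin, convert_pin and busy_pin. Note that NULL is not treated as an
error by this check.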

[PATCH v3 0/4] staging: iio: ad7816: Switch to the gpio descriptor interface

2018-11-08 Thread Nishad Kamdar
Changes in v4:
 - Drop busy pin in case of AD7818.
 - Set RD/WR pin and CONVST pin as outputs.
 - Add device tree table.

Nishad Kamdar (4):
  staging: iio: ad7816: Switch to the gpio descriptor interface
  staging: iio: ad7816: Do not use busy_pin in case of AD7818
  staging: iio: ad7816: Set RD/WR pin and CONVST pin as outputs.
  staging: iio: ad7816: Add device tree table.

 drivers/staging/iio/adc/ad7816.c | 111 +--
 1 file changed, 60 insertions(+), 51 deletions(-)

-- 
2.17.1



Re: [PATCH RFC 0/3] Static calls

2018-11-08 Thread Ingo Molnar


* Josh Poimboeuf  wrote:

> These patches are related to two similar patch sets from Ard and Steve:
> 
> - https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
> - https://lkml.kernel.org/r/20181006015110.653946...@goodmis.org
> 
> The code is also heavily inspired by the jump label code, as some of the
> concepts are very similar.
> 
> There are three separate implementations, depending on what the arch
> supports:
> 
>   1) CONFIG_HAVE_STATIC_CALL_OPTIMIZED: patched call sites - requires
>  objtool and a small amount of arch code
>   
>   2) CONFIG_HAVE_STATIC_CALL_UNOPTIMIZED: patched trampolines - requires
>  a small amount of arch code
>   
>   3) If no arch support, fall back to regular function pointers
> 
> 
> TODO:
> 
> - I'm not sure about the objtool approach.  Objtool is (currently)
>   x86-64 only, which means we have to use the "unoptimized" version
>   everywhere else.  I may experiment with a GCC plugin instead.

I'd prefer the objtool approach. It's a pretty reliable first-principles 
approach while GCC plugin would have to be replicated for Clang and any 
other compilers, etc.

> - Does this feature have much value without retpolines?  If not, should
>   we make it depend on retpolines somehow?

Paravirt patching, as you mention in your later reply?

> - Find some actual users of the interfaces (tracepoints? crypto?)

I'd be very happy with a demonstrated paravirt optimization already - 
i.e. seeing the before/after effect on the vmlinux with an x86 distro 
config.

All major Linux distributions enable CONFIG_PARAVIRT=y and
CONFIG_PARAVIRT_XXL=y on x86 at the moment, so optimizing it away as much
as possible in the 99.999% of cases where it's not used is a primary
concern.

All other use cases are a bonus, but it would certainly be interesting to
investigate the impact of using these APIs for tracing: that too is a
feature enabled everywhere but utilized only by a small fraction of Linux
users - so literally every single cycle or instruction saved or hot-path
shortened is a major win.

Thanks,

Ingo
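
Illustration only, not taken from the RFC: a minimal sketch of what option 3
in the quoted cover letter ("fall back to regular function pointers") could
look like in plain C. All names here (bump_fn, bump_target, call_bump,
update_bump, add_one, add_two) are hypothetical and are not the interface
proposed in the patch set; the sketch only shows the indirection that the
patched call sites and patched trampolines are meant to avoid.

```c
/* Hypothetical signature shared by all possible targets of this call. */
typedef int (*bump_fn)(int);

int add_one(int x)
{
	return x + 1;
}

int add_two(int x)
{
	return x + 2;
}

/* The "key": an ordinary function pointer holding the current target. */
static bump_fn bump_target = add_one;

/* Every call goes through the pointer, i.e. it is an indirect branch. */
int call_bump(int x)
{
	return bump_target(x);
}

/* Retargeting is a plain pointer store; no code patching is involved. */
void update_bump(bump_fn fn)
{
	bump_target = fn;
}
```

In this fallback every call_bump() pays for an indirect call. The optimized
variants discussed in the thread would instead rewrite the call instruction
or a trampoline when the target changes, so that the hot path executes a
direct call.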


Re: [PATCH v1 2/4] thermal: tegra: remove unnecessary warnings

2018-11-08 Thread Wei Ni



On 8/11/2018 8:47 PM, Thierry Reding wrote:
> On Mon, Nov 05, 2018 at 05:32:32PM +0800, Wei Ni wrote:
>> Convert warnings to info as not all platforms may
>> have all the thresholds and sensors enabled.
>>
>> Signed-off-by: Wei Ni 
>> ---
>>  drivers/thermal/tegra/soctherm.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> This seems overly generalized to me. Shouldn't we be checking in a more
> fine-grained way for the absence of thresholds and/or sensors?
> 
> Otherwise, how are we going to tell the difference between the sensor not
> being enabled and the device tree just missing the information?
> 
Whether a sensor is enabled is controlled by the device tree: if the dts
has the corresponding nodes, the sensors should be enabled. The
thresholds for a sensor are optional, so I think we just need to
print them out.
BTW, in my patch 1/4, I should print the sensor name if the sensor
is not enabled and registration fails.
I will update it.

> Thierry
> 


[GIT PULL] s390 patches for 4.20 #2

2018-11-08 Thread Martin Schwidefsky
Hi Linus,

please pull s390 fixes for 4.20-rc2

The following changes since commit e5f6d9afa3415104e402cd69288bb03f7165eeba:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2018-10-25 18:14:31 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git s390-4.20-2

for you to fetch changes up to 0bb2ae1b26e1fb7543ec7474cdd374ac4b88c4da:

  s390/perf: Change CPUM_CF return code in event init function (2018-11-08 07:58:16 +0100)


s390 updates for 4.20-rc2

 - A fix for the pgtable_bytes misaccounting on s390. The patch changes
   common code part in regard to page table folding and adds extra
   checks to mm_[inc|dec]_nr_[pmds|puds].

 - Add FORCE for all build targets using if_changed

 - Use non-loadable phdr for the .vmlinux.info section to avoid
   a segment overlap that confuses kexec

 - Cleanup the attribute definition for the diagnostic sampling

 - Increase stack size for CONFIG_KASAN=y builds

 - Export __node_distance to fix a build error

 - Correct return code of a PMU event init function

 - An update for the default configs
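
As a rough illustration of the first item: an empty "#define
__PAGETABLE_PMD_FOLDED" can only be tested with #ifdef, whereas a define
that expands to 1 can also be used as a value in a C expression. The
user-space sketch below models how an accounting helper might then skip
folded levels. It is not the kernel code from this pull request: struct
mm_model, model_inc_nr_pmds(), model_inc_nr_puds() and MODEL_TABLE_BYTES
are hypothetical stand-ins, and the fallback to 0 for a level that is not
declared folded is an assumption.

```c
/* Stand-in for an architecture header that folds the PMD level. */
#define __PAGETABLE_PMD_FOLDED 1

/* Assumed generic fallback: a level not declared folded counts as 0. */
#ifndef __PAGETABLE_PUD_FOLDED
#define __PAGETABLE_PUD_FOLDED 0
#endif

/* Hypothetical, simplified mm: only the page-table byte counter. */
struct mm_model {
	long pgtables_bytes;
};

/* An architecture could override these with a per-mm runtime test. */
#ifndef mm_pmd_folded
#define mm_pmd_folded(mm) __PAGETABLE_PMD_FOLDED
#endif
#ifndef mm_pud_folded
#define mm_pud_folded(mm) __PAGETABLE_PUD_FOLDED
#endif

#define MODEL_TABLE_BYTES 4096L /* made-up table size, for illustration */

static void model_inc_nr_pmds(struct mm_model *mm)
{
	if (mm_pmd_folded(mm))
		return; /* folded level: no table allocated, nothing to count */
	mm->pgtables_bytes += MODEL_TABLE_BYTES;
}

static void model_inc_nr_puds(struct mm_model *mm)
{
	if (mm_pud_folded(mm))
		return;
	mm->pgtables_bytes += MODEL_TABLE_BYTES;
}
```

With the PMD level folded, model_inc_nr_pmds() leaves the counter alone,
while the unfolded PUD level is still accounted.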


Heiko Carstens (1):
  s390: update defconfigs

Justin M. Forbes (1):
  s390/mm: Fix ERROR: "__node_distance" undefined!

Martin Schwidefsky (4):
  mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
  mm: introduce mm_[p4d|pud|pmd]_folded
  mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  s390/mm: fix mis-accounting of pgtable_bytes

Thomas Richter (2):
  s390/cpum_sf: Rework attribute definition for diagnostic sampling
  s390/perf: Change CPUM_CF return code in event init function

Vasily Gorbik (4):
  s390/decompressor: add missing FORCE to build targets
  s390/vdso: add missing FORCE to build targets
  s390: avoid vmlinux segments overlap
  s390/kasan: increase instrumented stack size to 64k

 arch/arm/include/asm/pgtable-2level.h|  2 +-
 arch/m68k/include/asm/pgtable_mm.h   |  4 +-
 arch/microblaze/include/asm/pgtable.h|  2 +-
 arch/nds32/include/asm/pgtable.h |  2 +-
 arch/parisc/include/asm/pgtable.h|  2 +-
 arch/s390/Makefile   |  2 +-
 arch/s390/boot/compressed/Makefile   | 16 +++
 arch/s390/configs/debug_defconfig| 14 --
 arch/s390/configs/performance_defconfig  | 13 +-
 arch/s390/defconfig  | 79 +---
 arch/s390/include/asm/mmu_context.h  |  5 --
 arch/s390/include/asm/pgalloc.h  |  6 +--
 arch/s390/include/asm/pgtable.h  | 18 
 arch/s390/include/asm/thread_info.h  |  2 +-
 arch/s390/include/asm/tlb.h  |  6 +--
 arch/s390/kernel/entry.S |  6 +--
 arch/s390/kernel/perf_cpum_cf.c  |  2 +-
 arch/s390/kernel/perf_cpum_sf.c  | 33 +++--
 arch/s390/kernel/vdso32/Makefile |  6 +--
 arch/s390/kernel/vdso64/Makefile |  6 +--
 arch/s390/kernel/vmlinux.lds.S   |  4 +-
 arch/s390/mm/pgalloc.c   |  1 +
 arch/s390/numa/numa.c|  1 +
 include/asm-generic/4level-fixup.h   |  2 +-
 include/asm-generic/5level-fixup.h   |  2 +-
 include/asm-generic/pgtable-nop4d-hack.h |  2 +-
 include/asm-generic/pgtable-nop4d.h  |  2 +-
 include/asm-generic/pgtable-nopmd.h  |  2 +-
 include/asm-generic/pgtable-nopud.h  |  2 +-
 include/asm-generic/pgtable.h| 16 +++
 include/linux/mm.h   |  8 
 31 files changed, 175 insertions(+), 93 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 92fd2c8..12659ce 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -10,7 +10,7 @@
 #ifndef _ASM_PGTABLE_2LEVEL_H
 #define _ASM_PGTABLE_2LEVEL_H
 
-#define __PAGETABLE_PMD_FOLDED
+#define __PAGETABLE_PMD_FOLDED 1
 
 /*
  * Hardware-wise, we have a two level page table structure, where the first
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index 6181e41..fe3ddd7 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -55,12 +55,12 @@
  */
 #ifdef CONFIG_SUN3
 #define PTRS_PER_PTE   16
-#define __PAGETABLE_PMD_FOLDED
+#define __PAGETABLE_PMD_FOLDED 1
 #define PTRS_PER_PMD   1
 #define PTRS_PER_PGD   2048
 #elif defined(CONFIG_COLDFIRE)
 #define PTRS_PER_PTE   512
-#define __PAGETABLE_PMD_FOLDED
+#define __PAGETABLE_PMD_FOLDED 1
 #define PTRS_PER_PMD   1
 #define PTRS_PER_PGD   1024
 #else
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index f64ebb9..e14b662 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -63,7 +63,7 @@ extern int mem_init_done;
 
 #include 
 

Re: RFC: userspace exception fixups

2018-11-08 Thread Christoph Hellwig
On Thu, Nov 08, 2018 at 12:05:42PM -0800, Andy Lutomirski wrote:
> This whole thing is a mess.  I'm starting to think that the cleanest
> solution would be to provide a way to just tell the kernel that
> certain RIP values have exception fixups.

By far the cleanest solution would be to say that SGX is such a mess
that we are not going to support it at all.  It's not like it is a
must-have feature to start with.


[PATCH] of: reserved_mem: disable kmemleak scan on removed memory blocks

2018-11-08 Thread Prateek Patel
From: Sri Krishna chowdary 

Memory reserved with the "nomap" DT property in of_reserved_mem.c is
removed from memblock. The removed memory blocks have no VA to PA
mapping in the kernel page table, so a kmemleak scan of these blocks
causes page faults and leads to a kernel panic. Disable the kmemleak
scan on the removed memory blocks.

Following is the observed crash log:
[  154.846370] Unable to handle kernel paging request at virtual address 
ffc070a0
<1>[  154.846576] Mem abort info:
<1>[  154.846635]   Exception class = DABT (current EL), IL = 32 bits
<1>[  154.846737]   SET = 0, FnV = 0
<1>[  154.846796]   EA = 0, S1PTW = 0
<1>[  154.846859] Data abort info:
<1>[  154.846913]   ISV = 0, ISS = 0x0006
<1>[  154.846983]   CM = 0, WnR = 0
<1>[  154.847053] swapper pgtable: 4k pages, 39-bit VAs, pgd = ff8009df7000
<1>[  154.847228] [ffc070a0] *pgd=00087fff5803, 
*pud=00087fff5803, *pmd=
<0>[  154.847408] Internal error: Oops: 9606 [#1] PREEMPT SMP
<4>[  154.847511] Modules linked in: nvs_led_test nvs_bmi160 nvs_cm3218 
nvs_bh1730fvc nvi_bmpX80 nvi_ak89xx nvi_mpu cdc_acm uas lr388k7_ts imx268 
imx318 imx204 imx274 imx185 lc898212 ov23850 ov10823 ov9281 ov5693 tc358840 
pca9570 nvs snd_soc_tegra_machine_driver_mobile lp855x_bl spidev input_cfboost 
pwm_tegra tegra_cryptodev tegra_se_nvhost tegra_se_elp tegra_se ghash_ce 
sha2_ce sha1_ce aes_ce_ccm cryptd nvgpu cpufreq_userspace 
snd_soc_tegra186_alt_dspk snd_soc_tegra186_alt_asrc snd_soc_tegra186_alt_arad 
snd_soc_tegra210_alt_ope snd_soc_tegra210_alt_mvc snd_soc_tegra210_alt_dmic 
snd_soc_tegra210_alt_amx snd_soc_tegra210_alt_adx snd_soc_tegra210_alt_afc 
snd_soc_tegra210_alt_mixer snd_soc_tegra210_alt_i2s snd_soc_tegra210_alt_sfc 
snd_soc_tegra210_alt_adsp snd_soc_tegra210_alt_admaif snd_soc_tegra210_alt_xbar
<4>[  154.882606]  snd_soc_tegra_alt_utils snd_hda_tegra
<4>[  154.888133] CPU: 2 PID: 8079 Comm: sh Not tainted 
4.14.53-tegra-05132-g9c33465 #2
<4>[  154.895983] Hardware name: e3360_1099 (DT)
<4>[  154.900447] task: ffc7d62dda00 task.stack: ff800e2b
<4>[  154.906502] PC is at scan_block+0x7c/0x148
<4>[  154.911234] LR is at scan_block+0x78/0x148
<4>[  154.915689] pc : [] lr : [] pstate: 
804000c9
<4>[  154.923290] sp : ff800e2b3b80
<4>[  154.927228] x29: ff800e2b3b80 x28: ffc7d62dda00
<4>[  154.932999] x27: ff8009aaa000 x26: ffc070c0
<4>[  154.938769] x25: 00c0 x24: ff8009d90608
<4>[  154.944287] x23: ffc7dc6c6000 x22: ff8009d9
<4>[  154.950320] x21: ff8009aeb320 x20: ffc070a00ff9
<4>[  154.955919] x19: ffc070a0 x18: bec4c3f2
<4>[  154.961438] x17: 002224777924 x16: ff80080bb0e0
<4>[  154.967124] x15:  x14: 0f75
<4>[  154.973069] x13: 000f x12: ffbf1e9f4240
<4>[  154.978670] x11: 0040 x10: 0ad0
<4>[  154.984107] x9 : ff800e2b3ab0 x8 : ffc7d62de530
<4>[  154.989958] x7 : 00078000 x6 : 0018
<4>[  154.995645] x5 :  x4 : 
<4>[  155.001245] x3 : ff8009aaa000 x2 : 0047f6712000
<4>[  155.006846] x1 : ffc7d1ae6900 x0 : 

Signed-off-by: Sri Krishna chowdary 
Signed-off-by: Prateek 
---
 drivers/of/of_reserved_mem.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 1977ee0..ac8f377 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_RESERVED_REGIONS   32
 static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS];
@@ -50,8 +51,10 @@ int __init __weak early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
 	}
 
 	*res_base = base;
-	if (nomap)
+	if (nomap) {
+		kmemleak_no_scan(__va(base));
 		return memblock_remove(base, size);
+	}
 	return 0;
 }
 
-- 
2.1.4
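
The reasoning in the commit message above can be pictured with a small
user-space toy model. It only illustrates why a scanner must skip blocks
that have no mapping; struct block and scan_faults() are hypothetical and
are not kmemleak's data structures or API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a tracked memory block; not kmemleak's real object. */
struct block {
	bool mapped;  /* false models a "nomap" region without a kernel mapping */
	bool no_scan; /* models the effect of marking the block as not scannable */
};

/* Count the blocks whose scan would touch unmapped memory (a fault). */
static size_t scan_faults(const struct block *blocks, size_t n)
{
	size_t faults = 0;

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].no_scan)
			continue; /* the scanner never dereferences this block */
		if (!blocks[i].mapped)
			faults++; /* would be a page fault in the kernel */
	}
	return faults;
}
```

In this model an unmapped block that is still scanned counts as one
fault, and setting no_scan on it brings the count back to zero.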



[PATCH v6 6/9] ARM: l2x0: add marvell,ecc-enable property for aurora

2018-11-08 Thread Chris Packham
The aurora cache on the Marvell Armada-XP SoC supports ECC protection
for the L2 data arrays. Add a "marvell,ecc-enable" device tree property
which can be used to enable this.

Signed-off-by: Chris Packham 
[j...@pengutronix.de: use aurora specific define AURORA_ACR_ECC_EN]
Signed-off-by: Jan Luebbe 
---
 arch/arm/mm/cache-l2x0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index b70bee74750d..644f786e4fa9 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1505,6 +1505,13 @@ static void __init aurora_of_parse(const struct device_node *np,
 		mask |= AURORA_ACR_FORCE_WRITE_POLICY_MASK;
 	}
 
+	if (of_property_read_bool(np, "marvell,ecc-enable")) {
+		mask |= AURORA_ACR_ECC_EN;
+		val |= AURORA_ACR_ECC_EN;
+	} else if (of_property_read_bool(np, "marvell,ecc-disable")) {
+		mask |= AURORA_ACR_ECC_EN;
+	}
+
 	if (of_property_read_bool(np, "arm,parity-enable")) {
 		mask |= AURORA_ACR_PARITY_EN;
 		val |= AURORA_ACR_PARITY_EN;
-- 
2.19.1
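
The hunk above distinguishes three cases: force ECC on, force ECC off, or
leave the bit as it was. A user-space sketch of that mask/val pattern
follows. The bit position matches AURORA_ACR_ECC_EN from patch 3/9;
struct ecc_props, parse_ecc(), apply_acr() and acr_after_parse() are
hypothetical names, and the read-modify-write in apply_acr() is an
assumption about how the mask/val pair is consumed later, not code taken
from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as AURORA_ACR_ECC_EN in patch 3/9. */
#define ACR_ECC_EN (UINT32_C(1) << 20)

/* Hypothetical stand-in for the two boolean DT properties. */
struct ecc_props {
	bool ecc_enable;  /* "marvell,ecc-enable" present */
	bool ecc_disable; /* "marvell,ecc-disable" present */
};

/* Mirror of the hunk: mask selects the bits to touch, val their new value. */
static void parse_ecc(const struct ecc_props *p, uint32_t *mask, uint32_t *val)
{
	if (p->ecc_enable) {
		*mask |= ACR_ECC_EN;
		*val |= ACR_ECC_EN;
	} else if (p->ecc_disable) {
		*mask |= ACR_ECC_EN; /* touch the bit but leave it 0 in val */
	}
}

/* Assumed read-modify-write: only bits set in mask change. */
static uint32_t apply_acr(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}

static uint32_t acr_after_parse(uint32_t old, bool enable, bool disable)
{
	struct ecc_props p = { enable, disable };
	uint32_t mask = 0, val = 0;

	parse_ecc(&p, &mask, &val);
	return apply_acr(old, mask, val);
}
```

With neither property present the mask stays empty, so whatever the
bootloader programmed is preserved. If both are present, the enable
branch is taken first and wins.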


[PATCH v6 7/9] EDAC: Add missing debugfs_create_x32 wrapper

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

We already have wrappers for x8 and x16, so add the missing x32 one.

Signed-off-by: Jan Luebbe 
Reviewed-by: Borislav Petkov 
Signed-off-by: Chris Packham 
---
 drivers/edac/debugfs.c | 11 +++
 drivers/edac/edac_module.h |  5 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/edac/debugfs.c b/drivers/edac/debugfs.c
index 92dbb7e2320c..268ede7a60b2 100644
--- a/drivers/edac/debugfs.c
+++ b/drivers/edac/debugfs.c
@@ -161,3 +161,14 @@ struct dentry *edac_debugfs_create_x16(const char *name, umode_t mode,
 	return debugfs_create_x16(name, mode, parent, value);
 }
 EXPORT_SYMBOL_GPL(edac_debugfs_create_x16);
+
+/* Wrapper for debugfs_create_x32() */
+struct dentry *edac_debugfs_create_x32(const char *name, umode_t mode,
+				       struct dentry *parent, u32 *value)
+{
+	if (!parent)
+		parent = edac_debugfs;
+
+	return debugfs_create_x32(name, mode, parent, value);
+}
+EXPORT_SYMBOL_GPL(edac_debugfs_create_x32);
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index dec88dcea036..546b16e29221 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -82,6 +82,8 @@ struct dentry *
 edac_debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, 
u8 *value);
 struct dentry *
 edac_debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, 
u16 *value);
+struct dentry *
+edac_debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, 
u32 *value);
 #else
 static inline int edac_debugfs_init(void)  
{ return -ENODEV; }
 static inline void edac_debugfs_exit(void) 
{ }
@@ -98,6 +100,9 @@ edac_debugfs_create_x8(const char *name, umode_t mode,
 static inline struct dentry *
 edac_debugfs_create_x16(const char *name, umode_t mode,
   struct dentry *parent, u16 *value)   
{ return NULL; }
+static inline struct dentry *
+edac_debugfs_create_x32(const char *name, umode_t mode,
+  struct dentry *parent, u32 *value)   
{ return NULL; }
 #endif
 
 /*
-- 
2.19.1


[PATCH v6 3/9] ARM: aurora-l2: add defines for parity and ECC registers

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

These defines will be used by subsequent patches to add support for the
parity check and error correction functionality in the Aurora L2 cache
controller.

Signed-off-by: Jan Luebbe 
Signed-off-by: Chris Packham 
---
 .../include/asm/hardware/cache-aurora-l2.h| 48 +++
 1 file changed, 48 insertions(+)

diff --git a/arch/arm/include/asm/hardware/cache-aurora-l2.h b/arch/arm/include/asm/hardware/cache-aurora-l2.h
index dc5c479ec4c3..39769ffa0051 100644
--- a/arch/arm/include/asm/hardware/cache-aurora-l2.h
+++ b/arch/arm/include/asm/hardware/cache-aurora-l2.h
@@ -31,6 +31,9 @@
 #define AURORA_ACR_REPLACEMENT_TYPE_SEMIPLRU \
(3 << AURORA_ACR_REPLACEMENT_OFFSET)
 
+#define AURORA_ACR_PARITY_EN   (1 << 21)
+#define AURORA_ACR_ECC_EN  (1 << 20)
+
 #define AURORA_ACR_FORCE_WRITE_POLICY_OFFSET   0
 #define AURORA_ACR_FORCE_WRITE_POLICY_MASK \
(0x3 << AURORA_ACR_FORCE_WRITE_POLICY_OFFSET)
@@ -41,6 +44,51 @@
 #define AURORA_ACR_FORCE_WRITE_THRO_POLICY \
(2 << AURORA_ACR_FORCE_WRITE_POLICY_OFFSET)
 
+#define AURORA_ERR_CNT_REG  0x600
+#define AURORA_ERR_ATTR_CAP_REG 0x608
+#define AURORA_ERR_ADDR_CAP_REG 0x60c
+#define AURORA_ERR_WAY_CAP_REG  0x610
+#define AURORA_ERR_INJECT_CTL_REG   0x614
+#define AURORA_ERR_INJECT_MASK_REG  0x618
+
+#define AURORA_ERR_CNT_CLR_OFFSET 31
+#define AURORA_ERR_CNT_CLR\
+   (0x1 << AURORA_ERR_CNT_CLR_OFFSET)
+#define AURORA_ERR_CNT_UE_OFFSET  16
+#define AURORA_ERR_CNT_UE_MASK \
+   (0x7fff << AURORA_ERR_CNT_UE_OFFSET)
+#define AURORA_ERR_CNT_CE_OFFSET   0
+#define AURORA_ERR_CNT_CE_MASK \
+   (0x << AURORA_ERR_CNT_CE_OFFSET)
+
+#define AURORA_ERR_ATTR_SRC_OFF   16
+#define AURORA_ERR_ATTR_SRC_MSK\
+   (0x7 << AURORA_ERR_ATTR_SRC_OFF)
+#define AURORA_ERR_ATTR_TXN_OFF   12
+#define AURORA_ERR_ATTR_TXN_MSK\
+   (0xf << AURORA_ERR_ATTR_TXN_OFF)
+#define AURORA_ERR_ATTR_ERR_OFF8
+#define AURORA_ERR_ATTR_ERR_MSK\
+   (0x3 << AURORA_ERR_ATTR_ERR_OFF)
+#define AURORA_ERR_ATTR_CAP_VALID_OFF  0
+#define AURORA_ERR_ATTR_CAP_VALID  \
+   (0x1 << AURORA_ERR_ATTR_CAP_VALID_OFF)
+
+#define AURORA_ERR_ADDR_CAP_ADDR_MASK 0xffe0
+
+#define AURORA_ERR_WAY_IDX_OFF 8
+#define AURORA_ERR_WAY_IDX_MSK \
+   (0xfff << AURORA_ERR_WAY_IDX_OFF)
+#define AURORA_ERR_WAY_CAP_WAY_OFFSET  1
+#define AURORA_ERR_WAY_CAP_WAY_MASK\
+   (0xf << AURORA_ERR_WAY_CAP_WAY_OFFSET)
+
+#define AURORA_ERR_INJECT_CTL_ADDR_MASK 0xfff0
+#define AURORA_ERR_ATTR_TXN_OFF   12
+#define AURORA_ERR_INJECT_CTL_EN_MASK  0x3
+#define AURORA_ERR_INJECT_CTL_EN_PARITY0x2
+#define AURORA_ERR_INJECT_CTL_EN_ECC   0x1
+
 #define AURORA_MAX_RANGE_SIZE  1024
 
 #define AURORA_WAY_SIZE_SHIFT  2
-- 
2.19.1
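
The OFFSET/MASK pairs above are meant for extracting fields from the
error registers. A minimal user-space sketch for the error counter
register follows. The offsets and the 0x7fff width of the UE field are
taken from the patch; the width of the CE field is garbled in this copy
of the mail, so the 0xffff used here is an assumption based on the field
layout. err_cnt_ue() and err_cnt_ce() are hypothetical helper names.

```c
#include <stdint.h>

/* Offsets as given in the patch. */
#define ERR_CNT_CLR_OFFSET 31
#define ERR_CNT_UE_OFFSET  16
#define ERR_CNT_CE_OFFSET  0

#define ERR_CNT_CLR     (UINT32_C(0x1) << ERR_CNT_CLR_OFFSET)
#define ERR_CNT_UE_MASK (UINT32_C(0x7fff) << ERR_CNT_UE_OFFSET)
/* Assumption: 16-bit CE field; the value is not legible in the archive. */
#define ERR_CNT_CE_MASK (UINT32_C(0xffff) << ERR_CNT_CE_OFFSET)

/* Uncorrectable error count: mask first, then shift down. */
static uint32_t err_cnt_ue(uint32_t reg)
{
	return (reg & ERR_CNT_UE_MASK) >> ERR_CNT_UE_OFFSET;
}

/* Correctable error count. */
static uint32_t err_cnt_ce(uint32_t reg)
{
	return (reg & ERR_CNT_CE_MASK) >> ERR_CNT_CE_OFFSET;
}
```

Masking before shifting keeps the clear bit at position 31 out of the UE
count even when it is set in the raw register value.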


[PATCH v6 1/9] ARM: l2c: move cache-aurora-l2.h to asm/hardware

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

This include file will be used by the AURORA EDAC code.

Signed-off-by: Jan Luebbe 
Reviewed-by: Gregory CLEMENT 
Signed-off-by: Chris Packham 
---
 arch/arm/{mm => include/asm/hardware}/cache-aurora-l2.h | 0
 arch/arm/mm/cache-l2x0.c| 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/arm/{mm => include/asm/hardware}/cache-aurora-l2.h (100%)

diff --git a/arch/arm/mm/cache-aurora-l2.h b/arch/arm/include/asm/hardware/cache-aurora-l2.h
similarity index 100%
rename from arch/arm/mm/cache-aurora-l2.h
rename to arch/arm/include/asm/hardware/cache-aurora-l2.h
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 808efbb89b88..a00d6f7fd34c 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -30,8 +30,8 @@
 #include 
 #include 
 #include 
+#include 
 #include "cache-tauros3.h"
-#include "cache-aurora-l2.h"
 
 struct l2c_init_data {
const char *type;
-- 
2.19.1


[PATCH v6 7/9] EDAC: Add missing debugfs_create_x32 wrapper

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

We already have wrappers for x8 and x16, so add the missing x32 one.

Signed-off-by: Jan Luebbe 
Reviewed-by: Borislav Petkov 
Signed-off-by: Chris Packham 
---
 drivers/edac/debugfs.c | 11 +++
 drivers/edac/edac_module.h |  5 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/edac/debugfs.c b/drivers/edac/debugfs.c
index 92dbb7e2320c..268ede7a60b2 100644
--- a/drivers/edac/debugfs.c
+++ b/drivers/edac/debugfs.c
@@ -161,3 +161,14 @@ struct dentry *edac_debugfs_create_x16(const char *name, 
umode_t mode,
return debugfs_create_x16(name, mode, parent, value);
 }
 EXPORT_SYMBOL_GPL(edac_debugfs_create_x16);
+
+/* Wrapper for debugfs_create_x32() */
+struct dentry *edac_debugfs_create_x32(const char *name, umode_t mode,
+  struct dentry *parent, u32 *value)
+{
+   if (!parent)
+   parent = edac_debugfs;
+
+   return debugfs_create_x32(name, mode, parent, value);
+}
+EXPORT_SYMBOL_GPL(edac_debugfs_create_x32);
diff --git a/drivers/edac/edac_module.h b/drivers/edac/edac_module.h
index dec88dcea036..546b16e29221 100644
--- a/drivers/edac/edac_module.h
+++ b/drivers/edac/edac_module.h
@@ -82,6 +82,8 @@ struct dentry *
 edac_debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, 
u8 *value);
 struct dentry *
 edac_debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, 
u16 *value);
+struct dentry *
+edac_debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, 
u32 *value);
 #else
 static inline int edac_debugfs_init(void)  
{ return -ENODEV; }
 static inline void edac_debugfs_exit(void) 
{ }
@@ -98,6 +100,9 @@ edac_debugfs_create_x8(const char *name, umode_t mode,
 static inline struct dentry *
 edac_debugfs_create_x16(const char *name, umode_t mode,
   struct dentry *parent, u16 *value)   
{ return NULL; }
+static inline struct dentry *
+edac_debugfs_create_x32(const char *name, umode_t mode,
+  struct dentry *parent, u32 *value)   
{ return NULL; }
 #endif
 
 /*
-- 
2.19.1


[PATCH v6 3/9] ARM: aurora-l2: add defines for parity and ECC registers

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

These defines will be used by subsequent patches to add support for the
parity check and error correction functionality in the Aurora L2 cache
controller.

Signed-off-by: Jan Luebbe 
Signed-off-by: Chris Packham 
---
 .../include/asm/hardware/cache-aurora-l2.h| 48 +++
 1 file changed, 48 insertions(+)

diff --git a/arch/arm/include/asm/hardware/cache-aurora-l2.h 
b/arch/arm/include/asm/hardware/cache-aurora-l2.h
index dc5c479ec4c3..39769ffa0051 100644
--- a/arch/arm/include/asm/hardware/cache-aurora-l2.h
+++ b/arch/arm/include/asm/hardware/cache-aurora-l2.h
@@ -31,6 +31,9 @@
 #define AURORA_ACR_REPLACEMENT_TYPE_SEMIPLRU \
(3 << AURORA_ACR_REPLACEMENT_OFFSET)
 
+#define AURORA_ACR_PARITY_EN   (1 << 21)
+#define AURORA_ACR_ECC_EN  (1 << 20)
+
 #define AURORA_ACR_FORCE_WRITE_POLICY_OFFSET   0
 #define AURORA_ACR_FORCE_WRITE_POLICY_MASK \
(0x3 << AURORA_ACR_FORCE_WRITE_POLICY_OFFSET)
@@ -41,6 +44,51 @@
 #define AURORA_ACR_FORCE_WRITE_THRO_POLICY \
(2 << AURORA_ACR_FORCE_WRITE_POLICY_OFFSET)
 
+#define AURORA_ERR_CNT_REG  0x600
+#define AURORA_ERR_ATTR_CAP_REG 0x608
+#define AURORA_ERR_ADDR_CAP_REG 0x60c
+#define AURORA_ERR_WAY_CAP_REG  0x610
+#define AURORA_ERR_INJECT_CTL_REG   0x614
+#define AURORA_ERR_INJECT_MASK_REG  0x618
+
+#define AURORA_ERR_CNT_CLR_OFFSET 31
+#define AURORA_ERR_CNT_CLR\
+   (0x1 << AURORA_ERR_CNT_CLR_OFFSET)
+#define AURORA_ERR_CNT_UE_OFFSET  16
+#define AURORA_ERR_CNT_UE_MASK \
+   (0x7fff << AURORA_ERR_CNT_UE_OFFSET)
+#define AURORA_ERR_CNT_CE_OFFSET   0
+#define AURORA_ERR_CNT_CE_MASK \
+   (0x << AURORA_ERR_CNT_CE_OFFSET)
+
+#define AURORA_ERR_ATTR_SRC_OFF   16
+#define AURORA_ERR_ATTR_SRC_MSK\
+   (0x7 << AURORA_ERR_ATTR_SRC_OFF)
+#define AURORA_ERR_ATTR_TXN_OFF   12
+#define AURORA_ERR_ATTR_TXN_MSK\
+   (0xf << AURORA_ERR_ATTR_TXN_OFF)
+#define AURORA_ERR_ATTR_ERR_OFF8
+#define AURORA_ERR_ATTR_ERR_MSK\
+   (0x3 << AURORA_ERR_ATTR_ERR_OFF)
+#define AURORA_ERR_ATTR_CAP_VALID_OFF  0
+#define AURORA_ERR_ATTR_CAP_VALID  \
+   (0x1 << AURORA_ERR_ATTR_CAP_VALID_OFF)
+
+#define AURORA_ERR_ADDR_CAP_ADDR_MASK 0xffe0
+
+#define AURORA_ERR_WAY_IDX_OFF 8
+#define AURORA_ERR_WAY_IDX_MSK \
+   (0xfff << AURORA_ERR_WAY_IDX_OFF)
+#define AURORA_ERR_WAY_CAP_WAY_OFFSET  1
+#define AURORA_ERR_WAY_CAP_WAY_MASK\
+   (0xf << AURORA_ERR_WAY_CAP_WAY_OFFSET)
+
+#define AURORA_ERR_INJECT_CTL_ADDR_MASK 0xfff0
+#define AURORA_ERR_ATTR_TXN_OFF   12
+#define AURORA_ERR_INJECT_CTL_EN_MASK  0x3
+#define AURORA_ERR_INJECT_CTL_EN_PARITY0x2
+#define AURORA_ERR_INJECT_CTL_EN_ECC   0x1
+
 #define AURORA_MAX_RANGE_SIZE  1024
 
 #define AURORA_WAY_SIZE_SHIFT  2
-- 
2.19.1


[PATCH v6 1/9] ARM: l2c: move cache-aurora-l2.h to asm/hardware

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

This include file will be used by the AURORA EDAC code.

Signed-off-by: Jan Luebbe 
Reviewed-by: Gregory CLEMENT 
Signed-off-by: Chris Packham 
---
 arch/arm/{mm => include/asm/hardware}/cache-aurora-l2.h | 0
 arch/arm/mm/cache-l2x0.c| 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/arm/{mm => include/asm/hardware}/cache-aurora-l2.h (100%)

diff --git a/arch/arm/mm/cache-aurora-l2.h 
b/arch/arm/include/asm/hardware/cache-aurora-l2.h
similarity index 100%
rename from arch/arm/mm/cache-aurora-l2.h
rename to arch/arm/include/asm/hardware/cache-aurora-l2.h
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 808efbb89b88..a00d6f7fd34c 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -30,8 +30,8 @@
 #include 
 #include 
 #include 
+#include 
 #include "cache-tauros3.h"
-#include "cache-aurora-l2.h"
 
 struct l2c_init_data {
const char *type;
-- 
2.19.1


[PATCH v6 6/9] ARM: l2x0: add marvell,ecc-enable property for aurora

2018-11-08 Thread Chris Packham
The aurora cache on the Marvell Armada-XP SoC supports ECC protection
for the L2 data arrays. Add a "marvell,ecc-enable" device tree property
which can be used to enable this.

Signed-off-by: Chris Packham 
[j...@pengutronix.de: use aurora specific define AURORA_ACR_ECC_EN]
Signed-off-by: Jan Luebbe 
---
 arch/arm/mm/cache-l2x0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index b70bee74750d..644f786e4fa9 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1505,6 +1505,13 @@ static void __init aurora_of_parse(const struct device_node *np,
mask |= AURORA_ACR_FORCE_WRITE_POLICY_MASK;
}
 
+   if (of_property_read_bool(np, "marvell,ecc-enable")) {
+   mask |= AURORA_ACR_ECC_EN;
+   val |= AURORA_ACR_ECC_EN;
+   } else if (of_property_read_bool(np, "marvell,ecc-disable")) {
+   mask |= AURORA_ACR_ECC_EN;
+   }
+
if (of_property_read_bool(np, "arm,parity-enable")) {
mask |= AURORA_ACR_PARITY_EN;
val |= AURORA_ACR_PARITY_EN;
-- 
2.19.1


[PATCH v6 8/9] EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

Add support for the ECC functionality as found in the DDR RAM and L2
cache controllers on the MV78230/MV78x60 SoCs. This driver has been
tested on the MV78460 (on a custom board with a DDR3 ECC DIMM).

Signed-off-by: Jan Luebbe 
[cp use SPDX license]
Signed-off-by: Chris Packham 
---
 MAINTAINERS   |   6 +
 drivers/edac/Kconfig  |   7 +
 drivers/edac/Makefile |   1 +
 drivers/edac/armada_xp_edac.c | 630 ++
 4 files changed, 644 insertions(+)
 create mode 100644 drivers/edac/armada_xp_edac.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c7ed26e84fa..7ae4cfa5c121 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5242,6 +5242,12 @@ L:   linux-e...@vger.kernel.org
 S: Maintained
 F: drivers/edac/amd64_edac*
 
+EDAC-ARMADA
+M: Jan Luebbe 
+L: linux-e...@vger.kernel.org
+S: Maintained
+F: drivers/edac/armada_xp_*
+
 EDAC-CALXEDA
 M: Robert Richter 
 L: linux-e...@vger.kernel.org
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 57304b2e989f..4567757d9f82 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -439,6 +439,13 @@ config EDAC_ALTERA_SDMMC
  Support for error detection and correction on the
  Altera SDMMC FIFO Memory for Altera SoCs.
 
+config EDAC_ARMADA_XP
+   bool "Marvell Armada XP DDR and L2 Cache ECC"
+   depends on MACH_MVEBU_V7
+   help
+ Support for error correction and detection on the Marvell Armada XP
+ DDR RAM and L2 cache controllers.
+
 config EDAC_SYNOPSYS
tristate "Synopsys DDR Memory Controller"
depends on ARCH_ZYNQ
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 02b43a7d8c3e..f3ea40b0ce9c 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_EDAC_OCTEON_PCI) += octeon_edac-pci.o
 obj-$(CONFIG_EDAC_THUNDERX) += thunderx_edac.o
 
 obj-$(CONFIG_EDAC_ALTERA)  += altera_edac.o
+obj-$(CONFIG_EDAC_ARMADA_XP)   += armada_xp_edac.o
 obj-$(CONFIG_EDAC_SYNOPSYS) += synopsys_edac.o
 obj-$(CONFIG_EDAC_XGENE)   += xgene_edac.o
 obj-$(CONFIG_EDAC_TI)  += ti_edac.o
diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
new file mode 100644
index ..3759a4fbbdee
--- /dev/null
+++ b/drivers/edac/armada_xp_edac.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017 Pengutronix, Jan Luebbe 
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "edac_mc.h"
+#include "edac_device.h"
+#include "edac_module.h"
+
+/* EDAC MC (DDR RAM) */
+
+#define SDRAM_NUM_CS 4
+
+#define SDRAM_CONFIG_REG 0x0
+#define SDRAM_CONFIG_ECC_MASK BIT(18)
+#define SDRAM_CONFIG_REGISTERED_MASK  BIT(17)
+#define SDRAM_CONFIG_BUS_WIDTH_MASK   BIT(15)
+
+#define SDRAM_ADDR_CTRL_REG 0x10
+#define SDRAM_ADDR_CTRL_SIZE_HIGH_OFFSET(cs) (20+cs)
+#define SDRAM_ADDR_CTRL_SIZE_HIGH_MASK(cs)   (0x1 << SDRAM_ADDR_CTRL_SIZE_HIGH_OFFSET(cs))
+#define SDRAM_ADDR_CTRL_ADDR_SEL_MASK(cs) BIT(16+cs)
+#define SDRAM_ADDR_CTRL_SIZE_LOW_OFFSET(cs)  (cs*4+2)
+#define SDRAM_ADDR_CTRL_SIZE_LOW_MASK(cs) (0x3 << SDRAM_ADDR_CTRL_SIZE_LOW_OFFSET(cs))
+#define SDRAM_ADDR_CTRL_STRUCT_OFFSET(cs) (cs*4)
+#define SDRAM_ADDR_CTRL_STRUCT_MASK(cs)  (0x3 << SDRAM_ADDR_CTRL_STRUCT_OFFSET(cs))
+
+#define SDRAM_ERR_DATA_H_REG 0x40
+#define SDRAM_ERR_DATA_L_REG 0x44
+
+#define SDRAM_ERR_RECV_ECC_REG  0x48
+#define SDRAM_ERR_RECV_ECC_VALUE_MASK 0xff
+
+#define SDRAM_ERR_CALC_ECC_REG  0x4c
+#define SDRAM_ERR_CALC_ECC_ROW_OFFSET 8
+#define SDRAM_ERR_CALC_ECC_ROW_MASK   (0x << SDRAM_ERR_CALC_ECC_ROW_OFFSET)
+#define SDRAM_ERR_CALC_ECC_VALUE_MASK 0xff
+
+#define SDRAM_ERR_ADDR_REG  0x50
+#define SDRAM_ERR_ADDR_BANK_OFFSET 23
+#define SDRAM_ERR_ADDR_BANK_MASK  (0x7 << SDRAM_ERR_ADDR_BANK_OFFSET)
+#define SDRAM_ERR_ADDR_COL_OFFSET 8
+#define SDRAM_ERR_ADDR_COL_MASK   (0x7fff << SDRAM_ERR_ADDR_COL_OFFSET)
+#define SDRAM_ERR_ADDR_CS_OFFSET  1
+#define SDRAM_ERR_ADDR_CS_MASK (0x3 << SDRAM_ERR_ADDR_CS_OFFSET)
+#define SDRAM_ERR_ADDR_TYPE_MASK  BIT(0)
+
+#define SDRAM_ERR_CTRL_REG  0x54
+#define SDRAM_ERR_CTRL_THR_OFFSET 16
+#define SDRAM_ERR_CTRL_THR_MASK   (0xff << SDRAM_ERR_CTRL_THR_OFFSET)
+#define SDRAM_ERR_CTRL_PROP_MASK  BIT(9)
+
+#define SDRAM_ERR_SBE_COUNT_REG 0x58
+#define SDRAM_ERR_DBE_COUNT_REG 0x5c
+
+#define SDRAM_ERR_CAUSE_ERR_REG 0xd0
+#define SDRAM_ERR_CAUSE_MSG_REG 0xd8
+#define SDRAM_ERR_CAUSE_DBE_MASK  BIT(1)
+#define SDRAM_ERR_CAUSE_SBE_MASK  BIT(0)
+
+#define SDRAM_RANK_CTRL_REG 0x1e0
+#define SDRAM_RANK_CTRL_EXIST_MASK(cs) BIT(cs)
+
+struct axp_mc_drvdata {
+   void __iomem *base;
+   /* width in bytes */
+   unsigned int width;
+   /* bank interleaving */
+   bool 

[PATCH v6 4/9] ARM: l2x0: support parity-enable/disable on aurora

2018-11-08 Thread Chris Packham
The aurora cache on the Marvell Armada-XP SoC supports the same tag
parity features as the other l2x0 cache implementations.

Signed-off-by: Chris Packham 
[j...@pengutronix.de: use aurora specific define AURORA_ACR_PARITY_EN]
Signed-off-by: Jan Luebbe 
---
 arch/arm/mm/cache-l2x0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7d2d2a3c67d0..b70bee74750d 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1505,6 +1505,13 @@ static void __init aurora_of_parse(const struct device_node *np,
mask |= AURORA_ACR_FORCE_WRITE_POLICY_MASK;
}
 
+   if (of_property_read_bool(np, "arm,parity-enable")) {
+   mask |= AURORA_ACR_PARITY_EN;
+   val |= AURORA_ACR_PARITY_EN;
+   } else if (of_property_read_bool(np, "arm,parity-disable")) {
+   mask |= AURORA_ACR_PARITY_EN;
+   }
+
*aux_val &= ~mask;
*aux_val |= val;
*aux_mask &= ~mask;
-- 
2.19.1
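
The mask/val bookkeeping in the hunk above encodes three cases per feature bit: force on, force off, or leave the hardware default alone. What follows is a minimal userspace sketch of that idea, not kernel code. The bit position EXAMPLE_ACR_PARITY_EN and all example_* names are hypothetical, and the final step (hardware value ANDed with aux_mask, then ORed with aux_val) is an assumption about how the l2x0 core consumes the pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only; the real
 * AURORA_ACR_PARITY_EN is defined in cache-aurora-l2.h. */
#define EXAMPLE_ACR_PARITY_EN (UINT32_C(1) << 21)

/* Same three statements as the tail of aurora_of_parse() in the patch:
 * claim the bits in 'mask', then set the ones requested in 'val'. */
static void example_fold(uint32_t *aux_val, uint32_t *aux_mask,
			 uint32_t mask, uint32_t val)
{
	*aux_val &= ~mask;
	*aux_val |= val;
	*aux_mask &= ~mask;
}

/* 'enable' and 'disable' stand in for the two of_property_read_bool()
 * results; if neither is set, the bit is left untouched. */
static void example_parse_parity(bool enable, bool disable,
				 uint32_t *aux_val, uint32_t *aux_mask)
{
	uint32_t mask = 0, val = 0;

	if (enable) {
		mask |= EXAMPLE_ACR_PARITY_EN;
		val |= EXAMPLE_ACR_PARITY_EN;
	} else if (disable) {
		mask |= EXAMPLE_ACR_PARITY_EN;
	}

	example_fold(aux_val, aux_mask, mask, val);
}

/* Assumed application step: keep the hardware bits still present in
 * aux_mask, then OR in the forced bits from aux_val. */
static uint32_t example_apply_aux(uint32_t hw, uint32_t aux_val,
				  uint32_t aux_mask)
{
	return (hw & aux_mask) | aux_val;
}
```

Starting from aux_val = 0 and aux_mask = ~0, the "disable" case clears the bit in aux_mask without setting it in aux_val, so the bit ends up cleared regardless of the hardware reset value. With neither property present the hardware value passes through unchanged.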


[PATCH v6 9/9] EDAC: armada_xp: Add support for more SoCs

2018-11-08 Thread Chris Packham
The Armada 38x and other integrated SoCs use a reduced pin count, so their
SDRAM interface is narrower than on the Armada XP SoCs. This means that
the definition of "full" and "half" width is reduced from 64/32 bits to
32/16 bits.

Signed-off-by: Chris Packham 
---
 drivers/edac/armada_xp_edac.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/edac/armada_xp_edac.c b/drivers/edac/armada_xp_edac.c
index 3759a4fbbdee..7f227bdcbc84 100644
--- a/drivers/edac/armada_xp_edac.c
+++ b/drivers/edac/armada_xp_edac.c
@@ -332,6 +332,11 @@ static int axp_mc_probe(struct platform_device *pdev)
 
axp_mc_read_config(mci);
 
+   /* These SoCs have a reduced width bus */
+   if (of_machine_is_compatible("marvell,armada380") ||
+   of_machine_is_compatible("marvell,armadaxp-98dx3236"))
+   drvdata->width /= 2;
+
/* configure SBE threshold */
/* it seems that SBEs are not captured otherwise */
writel(1 << SDRAM_ERR_CTRL_THR_OFFSET, drvdata->base + SDRAM_ERR_CTRL_REG);
-- 
2.19.1
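
To make the width arithmetic concrete, here is a small userspace sketch of how the halving in this patch composes with the bus-width configuration bit. It assumes that a set SDRAM_CONFIG_BUS_WIDTH_MASK bit means "full" width and that full/half on Armada XP correspond to 8/4 bytes (64/32 bits); the function name and the boolean flag are hypothetical stand-ins for the of_machine_is_compatible() checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define SDRAM_CONFIG_BUS_WIDTH_MASK (UINT32_C(1) << 15) /* BIT(15), as in patch 8/9 */

/* Returns the SDRAM data bus width in bytes.
 * Assumption: bit set = full width, bit clear = half width. */
static unsigned int example_sdram_width_bytes(uint32_t config,
					      bool reduced_width_soc)
{
	unsigned int width = (config & SDRAM_CONFIG_BUS_WIDTH_MASK) ? 8 : 4;

	/* Armada 38x and 98dx3236 class SoCs: 32/16 bits instead of 64/32. */
	if (reduced_width_soc)
		width /= 2;

	return width;
}
```

Under these assumptions a reduced-width SoC configured for "half" width reports 2 bytes, which matches the 16 bit case mentioned in the cover letter.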


[PATCH v6 5/9] dt-bindings: ARM: document marvell,ecc-enable binding

2018-11-08 Thread Chris Packham
Add documentation for the marvell,ecc-enable and marvell,ecc-disable
properties which can be used to enable/disable ECC on the Marvell aurora
cache.

Signed-off-by: Chris Packham 
---

Notes:
Changes in v6:
- new (split binding doc from implementation).

 Documentation/devicetree/bindings/arm/l2c2x0.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/l2c2x0.txt b/Documentation/devicetree/bindings/arm/l2c2x0.txt
index fbe6cb21f4cf..15a84f0ba9f1 100644
--- a/Documentation/devicetree/bindings/arm/l2c2x0.txt
+++ b/Documentation/devicetree/bindings/arm/l2c2x0.txt
@@ -76,6 +76,8 @@ Optional properties:
   specified to indicate that such transforms are precluded.
 - arm,parity-enable : enable parity checking on the L2 cache (L220 or PL310).
 - arm,parity-disable : disable parity checking on the L2 cache (L220 or PL310).
+- marvell,ecc-enable : enable ECC protection on the L2 cache
+- marvell,ecc-disable : disable ECC protection on the L2 cache
 - arm,outer-sync-disable : disable the outer sync operation on the L2 cache.
   Some core tiles, especially ARM PB11MPCore have a faulty L220 cache that
   will randomly hang unless outer sync operations are disabled.
-- 
2.19.1


[PATCH v6 0/9] EDAC drivers for Armada XP L2 and DDR

2018-11-08 Thread Chris Packham
The current plan is for these to go in via the ARM tree once appropriate
Reviews/Acks have been given.

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-August/525561.html

This series adds drivers for the L2 cache and DDR RAM ECC functionality as
found on the MV78230/MV78x60 SoCs. Jan has tested these changes with the
MV78460 (on a custom board with a DDR3 ECC DIMM); Chris has tested them
with the 88F6820 and 98dx3236 (both on custom boards with fixed DDR3 + ECC).

Also contained in this series is an additional debugfs wrapper.

Compared to the previous v5 series I've split the dt-binding documentation into
its own patch and updated armada_xp_edac.c to use an SPDX license identifier.

Compared to the previous v4 series I've added my s-o-b to some of Jan's
patches and rebased against v4.19.0.

Compared to the previous v3 series, the following changes have been made:
- Use shorter names for the AURORA ECC and parity registers
- Numerous formatting changes to edac/armada_xp.c (as requested by Boris)
- Added support for Armada-38x and 98dx3236 SoCs

Compared to the previous v2 series, the following changes have been made:
- Allocate EDAC structures later during probing and drop devres support
  patches (requested by Boris)
- Make drvdata->width usage consistent according to the comment (suggested by
  Chris)

Compared to the previous v1 series, the following changes have been made:
- Add the aurora-l2 register defines earlier in the series (suggested by
  Russell King and Gregory CLEMENT)
- Changed the DT vendor prefix from "arm" to "marvell" for the
  ecc-enable/disable properties on the aurora-l2 (suggested by Russell King)
- Fix some warnings reported by checkpatch

Compared to the original RFC series, the following changes have been made:
- Integrated Chris' patches for parity and ECC configuration via DT
- Merged the DDR RAM and L2 cache drivers (as requested by Boris, analogous
  to fsl_ddr_edac.c and mpc85xx_edac.c)
- Added myself to MAINTAINERS (requested by Boris)
- L2 cache: Track the msg size and use snprintf (review comment by Chris)
- L2 cache: Split error injection from the check function (review comment by
  Chris)
- DDR RAM: Allow 16 bit width in addition to 32 and 64 bit (review comment by
  Chris)
- Use of_match_ptr() (review comments by Chris)
- Minor checkpatch cleanups

Chris Packham (4):
  ARM: l2x0: support parity-enable/disable on aurora
  dt-bindings: ARM: document marvell,ecc-enable binding
  ARM: l2x0: add marvell,ecc-enable property for aurora
  EDAC: armada_xp: Add support for more SoCs

Jan Luebbe (5):
  ARM: l2c: move cache-aurora-l2.h to asm/hardware
  ARM: aurora-l2: add prefix to MAX_RANGE_SIZE
  ARM: aurora-l2: add defines for parity and ECC registers
  EDAC: Add missing debugfs_create_x32 wrapper
  EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC

 .../devicetree/bindings/arm/l2c2x0.txt|   2 +
 MAINTAINERS   |   6 +
 .../asm/hardware}/cache-aurora-l2.h   |  50 +-
 arch/arm/mm/cache-l2x0.c  |  20 +-
 drivers/edac/Kconfig  |   7 +
 drivers/edac/Makefile |   1 +
 drivers/edac/armada_xp_edac.c | 635 ++
 drivers/edac/debugfs.c|  11 +
 drivers/edac/edac_module.h|   5 +
 9 files changed, 733 insertions(+), 4 deletions(-)
 rename arch/arm/{mm => include/asm/hardware}/cache-aurora-l2.h (50%)
 create mode 100644 drivers/edac/armada_xp_edac.c

-- 
2.19.1


[PATCH v6 2/9] ARM: aurora-l2: add prefix to MAX_RANGE_SIZE

2018-11-08 Thread Chris Packham
From: Jan Luebbe 

The macro name is too generic, so add an AURORA_ prefix.

Signed-off-by: Jan Luebbe 
Reviewed-by: Gregory CLEMENT 
Signed-off-by: Chris Packham 
---
 arch/arm/include/asm/hardware/cache-aurora-l2.h | 2 +-
 arch/arm/mm/cache-l2x0.c                        | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/hardware/cache-aurora-l2.h b/arch/arm/include/asm/hardware/cache-aurora-l2.h
index c86124769831..dc5c479ec4c3 100644
--- a/arch/arm/include/asm/hardware/cache-aurora-l2.h
+++ b/arch/arm/include/asm/hardware/cache-aurora-l2.h
@@ -41,7 +41,7 @@
 #define AURORA_ACR_FORCE_WRITE_THRO_POLICY \
(2 << AURORA_ACR_FORCE_WRITE_POLICY_OFFSET)
 
-#define MAX_RANGE_SIZE 1024
+#define AURORA_MAX_RANGE_SIZE  1024
 
 #define AURORA_WAY_SIZE_SHIFT  2
 
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index a00d6f7fd34c..7d2d2a3c67d0 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1364,8 +1364,8 @@ static unsigned long aurora_range_end(unsigned long start, unsigned long end)
 * since cache range operations stall the CPU pipeline
 * until completion.
 */
-   if (end > start + MAX_RANGE_SIZE)
-   end = start + MAX_RANGE_SIZE;
+   if (end > start + AURORA_MAX_RANGE_SIZE)
+   end = start + AURORA_MAX_RANGE_SIZE;
 
/*
 * Cache range operations can't straddle a page boundary.
-- 
2.19.1
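
For context on what the renamed constant controls, here is a userspace sketch of the clamping that the aurora_range_end() hunk above hints at: limit each cache range operation to AURORA_MAX_RANGE_SIZE bytes, and never let it cross a page boundary. The page-boundary step is reconstructed from the comment in the hunk rather than from code shown in this patch, and EXAMPLE_PAGE_SIZE (4096) and the function name are assumptions.

```c
#define AURORA_MAX_RANGE_SIZE	1024UL
#define EXAMPLE_PAGE_SIZE	4096UL	/* assumption: 4 KiB pages */

/* Given a requested [start, end) range, return the end of the first
 * chunk that may be issued as a single cache range operation. */
static unsigned long example_range_end(unsigned long start, unsigned long end)
{
	unsigned long next_page;

	/* Limit the chunk size, since range operations stall the CPU
	 * pipeline until completion. */
	if (end > start + AURORA_MAX_RANGE_SIZE)
		end = start + AURORA_MAX_RANGE_SIZE;

	/* First page boundary strictly above 'start': a range operation
	 * must not straddle it. */
	next_page = (start / EXAMPLE_PAGE_SIZE + 1) * EXAMPLE_PAGE_SIZE;
	if (end > next_page)
		end = next_page;

	return end;
}
```

A caller would loop, issuing one operation for [start, example_range_end(start, end)) and then advancing start to the returned value until the whole range is covered.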


Re: [PATCH 2/2] phy: qualcomm: Add Synopsys High-Speed USB PHY driver

2018-11-08 Thread Shawn Guo
On Fri, Nov 09, 2018 at 10:52:17AM +0530, Vinod Koul wrote:
> On 08-11-18, 15:04, Shawn Guo wrote:
> > +static int qcom_snps_hsphy_config_regulators(struct hsphy_priv *priv, int high)
> > +{
> > +   int min, ret, i;
> > +
> > +   min = high ? 1 : 0; /* low or none? */
> > +
> > +   for (i = 0; i < VREG_NUM; i++) {
> > +   ret = regulator_set_voltage(priv->vregs[i].consumer,
> > +   priv->voltages[i][min],
> > +   priv->voltages[i][VOL_MAX]);
> > +   if (ret)
> > +   return ret;
> 
> should we not roll back the set voltages on error?

Yes.  I will get that handled in v2.  Thanks.
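
One possible shape for such a rollback, as a userspace sketch only and not the v2 code: apply the new minimum voltages in order, and on the first failure walk back over the regulators already changed and restore their previous minimum. Everything named example_* here is hypothetical, including the stub that stands in for regulator_set_voltage() and records what was applied.

```c
#define EXAMPLE_VREG_NUM 3

/* Test double standing in for regulator_set_voltage(): fails with
 * -22 (-EINVAL) for index 'example_fail_at', otherwise records the
 * requested minimum and returns 0. */
static int example_fail_at = -1;
static int example_applied[EXAMPLE_VREG_NUM];

static int example_set_voltage(int idx, int min_uV, int max_uV)
{
	(void)max_uV;
	if (idx == example_fail_at)
		return -22;
	example_applied[idx] = min_uV;
	return 0;
}

/* Set every regulator to new_min[i]; if one fails, restore old_min[]
 * on the ones already switched and return the original error. */
static int example_config_regulators(const int *old_min, const int *new_min,
				     const int *max, int n)
{
	int ret, i;

	for (i = 0; i < n; i++) {
		ret = example_set_voltage(i, new_min[i], max[i]);
		if (ret)
			goto rollback;
	}
	return 0;

rollback:
	/* Undo in reverse order; errors during rollback are ignored here. */
	while (--i >= 0)
		example_set_voltage(i, old_min[i], max[i]);
	return ret;
}
```

Whether a failed rollback should itself be reported is a design choice this sketch leaves open.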

> 
> > +static int qcom_snps_hsphy_por_reset(struct hsphy_priv *priv)
> > +{
> > +   int ret;
> > +
> > +   ret = reset_control_assert(priv->por_reset);
> > +   if (ret)
> > +   return ret;
> > +
> > +   /*
> > +* The Femto PHY is POR reset in the following scenarios.
> 
> POR?

Hmm, I do not understand this comment.  POR is commonly used as the
abbreviation of power-on reset.  What do you mean exactly?

> 
> > +static int qcom_snps_hsphy_init(struct phy *phy)
> > +{
> > +   struct hsphy_priv *priv = phy_get_drvdata(phy);
> > +   int state;
> > +   int ret;
> 
> perhaps they can be in a single line :)

I prefer to keep them on separate lines, as that makes the addition and
removal of one of them relatively easier.

> 
> > +static int qcom_snps_hsphy_probe(struct platform_device *pdev)
> > +{
> > +   struct device *dev = &pdev->dev;
> > +   struct phy_provider *provider;
> > +   struct hsphy_priv *priv;
> > +   struct resource *res;
> > +   struct phy *phy;
> > +   int size;
> > +   int ret;
> > +   int i;
> > +
> > +   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > +   if (!priv)
> > +   return -ENOMEM;
> > +
> > +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +   priv->base = devm_ioremap_resource(dev, res);
> > +   if (IS_ERR(priv->base))
> > +   return PTR_ERR(priv->base);
> > +
> > +   priv->num_clks = ARRAY_SIZE(qcom_snps_hsphy_clks);
> > +   priv->clks = devm_kcalloc(dev, priv->num_clks, sizeof(*priv->clks),
> > + GFP_KERNEL);
> > +   if (!priv->clks)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < priv->num_clks; i++)
> > +   priv->clks[i].id = qcom_snps_hsphy_clks[i];
> > +
> > +   ret = devm_clk_bulk_get(dev, priv->num_clks, priv->clks);
> > +   if (ret)
> > +   return ret;
> > +
> > +   priv->phy_reset = devm_reset_control_get(dev, "phy");
> > +   if (IS_ERR(priv->phy_reset))
> > +   return PTR_ERR(priv->phy_reset);
> > +
> > +   priv->por_reset = devm_reset_control_get(dev, "por");
> > +   if (IS_ERR(priv->por_reset))
> > +   return PTR_ERR(priv->por_reset);
> > +
> > +   priv->vregs[VDD].supply = "vdd";
> > +   priv->vregs[VDDA_1P8].supply = "vdda1p8";
> > +   priv->vregs[VDDA_3P3].supply = "vdda3p3";
> > +
> > +   ret = devm_regulator_bulk_get(dev, VREG_NUM, priv->vregs);
> > +   if (ret)
> > +   return ret;
> > +
> > +   priv->voltages[VDDA_1P8][VOL_NONE] = 0;
> > +   priv->voltages[VDDA_1P8][VOL_MIN] = 180;
> > +   priv->voltages[VDDA_1P8][VOL_MAX] = 180;
> > +
> > +   priv->voltages[VDDA_3P3][VOL_NONE] = 0;
> > +   priv->voltages[VDDA_3P3][VOL_MIN] = 305;
> > +   priv->voltages[VDDA_3P3][VOL_MAX] = 330;
> > +
> > +   ret = of_property_read_u32_array(dev->of_node, "qcom,vdd-voltage-level",
> > +priv->voltages[VDD], VOL_NUM);
> > +   if (ret) {
> > +   dev_err(dev, "failed to read qcom,vdd-voltage-level\n");
> > +   return ret;
> > +   }
> > +
> > +   size = of_property_count_u32_elems(dev->of_node, "qcom,init-seq");
> > +   if (size < 0)
> > +   size = 0;
> > +
> > +   priv->init_seq = devm_kcalloc(dev, (size / 3) + 1,
> 
> size/3? I think it would be good to add a comment explaining this

The property is a sequence of  tuples, and we are
figuring out how many tuples there are.  Yep, I will add a comment in
there for v2.

Shawn


[RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-08 Thread Anthony Yznaga
When the THP enabled policy is "always", or the mode is "madvise" and a
region is marked as MADV_HUGEPAGE, a hugepage is allocated on a page
fault if the PMD is empty.  This yields the best VA translation
performance but increases memory consumption if a significant part of
the huge page is never accessed.

A while back a former colleague presented a patch to help address this
bloat [1]. Feedback from the community suggested investigating an alternate
approach to allocating THP hugepages using reservations, and since then
I have taken my colleague's work and expanded on it to implement a form
of reservation-based THP for private anonymous memory.  What I hope to
learn from this RFC is whether this approach is viable and what issues
there may be that I have not considered.  Apologies for the monolithic
patch.

The basic idea as outlined by Mel Gorman in [2] is:

1) On first fault in a sufficiently sized range, allocate a huge page
   sized and aligned block of base pages.  Map the base page
   corresponding to the fault address and hold the rest of the pages in
   reserve.
2) On subsequent faults in the range, map the pages from the reservation.
3) When enough pages have been mapped, promote the mapped pages and
   remaining pages in the reservation to a huge page.
4) When there is memory pressure, release the unused pages from their
   reservations.

[1] https://marc.info/?l=linux-mm&m=151631857310828&w=2
[2] https://lkml.org/lkml/2018/1/25/571
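
To make the four steps listed above concrete, here is a minimal userspace
model of a single reservation.  It is an illustrative sketch only, not code
from the patch: struct thp_res, res_init(), res_fault(), res_promote() and
res_shrink() are hypothetical names, and the 512 base pages per huge page
assume 4 KiB base pages with 2 MiB huge pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: 4 KiB base pages and 2 MiB huge pages. */
#define BASE_PAGES_PER_HUGE 512u

/* Hypothetical model of one reservation; not the patch's data structure. */
struct thp_res {
	bool mapped[BASE_PAGES_PER_HUGE];	/* base pages already faulted in */
	unsigned int nr_mapped;			/* number of true entries above */
	unsigned int threshold;			/* stands in for promotion_threshold */
	bool promoted;				/* collapsed into a huge page */
	bool released;				/* unused pages were given back */
};

/* Step 1: set up an empty reservation on the first fault in the range. */
static void res_init(struct thp_res *res, unsigned int threshold)
{
	assert(threshold >= 1 && threshold <= BASE_PAGES_PER_HUGE);
	memset(res, 0, sizeof(*res));
	res->threshold = threshold;
}

/*
 * Steps 1 and 2: map base page idx from the reservation.
 * Returns true once enough pages are mapped to allow promotion.
 */
static bool res_fault(struct thp_res *res, unsigned int idx)
{
	assert(idx < BASE_PAGES_PER_HUGE);
	if (res->released)
		return false;
	if (!res->promoted && !res->mapped[idx]) {
		res->mapped[idx] = true;
		res->nr_mapped++;
	}
	return res->nr_mapped >= res->threshold;
}

/* Step 3: promote the mapped and still-reserved pages to one huge page. */
static bool res_promote(struct thp_res *res)
{
	unsigned int i;

	if (res->promoted || res->released || res->nr_mapped < res->threshold)
		return false;
	for (i = 0; i < BASE_PAGES_PER_HUGE; i++)
		res->mapped[i] = true;
	res->nr_mapped = BASE_PAGES_PER_HUGE;
	res->promoted = true;
	return true;
}

/* Step 4: under memory pressure, give back the pages never faulted in. */
static unsigned int res_shrink(struct thp_res *res)
{
	if (res->promoted || res->released)
		return 0;
	res->released = true;
	return BASE_PAGES_PER_HUGE - res->nr_mapped;
}
```

In this model, with a threshold of 256 the reservation becomes eligible for
promotion on the 256th distinct page fault, and a reservation with a single
mapped page gives back 511 base pages when it is shrunk.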

To test the idea I wrote a simple test that repeatedly forks children,
where each child attempts to allocate a very large chunk of memory and
then touches either 1 page or a random number of pages in each huge page
region of the chunk.  On a machine with 256GB of memory and a test chunk
size of 16GB, the test ends when the 17th child fails to map its chunk.
With THP reservations enabled, the test ends when the 118th child fails.

Below are some additional implementation details and known issues.

User-visible files:

/sys/kernel/mm/transparent_hugepage/promotion_threshold

    The number of base pages within a huge page aligned region that
    must be faulted in before the region is eligible for promotion
    to a huge page.

    1
        On the first page fault in a huge page sized and aligned
        region, allocate and map a huge page.

    > 1
        On the first page fault in a huge page sized and aligned
        region, allocate and reserve a huge page sized and aligned
        block of pages and map a single page from the reservation.
        Continue to map pages from the reservation on subsequent
        faults.  Once the number of pages faulted from the reservation
        is equal to or greater than the promotion_threshold, the
        reservation is eligible to be promoted to a huge page by
        khugepaged.

    Currently the default value is HPAGE_PMD_NR / 2.

/sys/kernel/mm/transparent_hugepage/khugepaged/res_pages_collapsed

    The number of THP reservations promoted to huge pages
    by khugepaged.

    This total is also included in the total reported in pages_collapsed.

Counters added to /proc/vmstat:

nr_thp_reserved

    The total number of small pages in existing reservations
    that have not had a page fault since their respective
    reservations were created.  The amount is also included
    in the estimated total memory available as reported
    in MemAvailable in /proc/meminfo.

thp_res_alloc

    Incremented every time the pages for a reservation have been
    successfully allocated to handle a page fault.

thp_res_alloc_failed

    Incremented if pages could not be successfully allocated for
    a reservation.

Known Issues:

- COW handling of reservations is insufficient.   While the pages of a
reservation are shared between parent and child after fork, currently
the reservation data structures are not shared and remain with the
parent.  A COW fault by the child allocates a new small page and a new
reservation is not allocated.  A COW fault by the parent allocates a new
small page and releases the reservation if one exists.

- If the pages in a reservation are remapped read-only (e.g. after fork
and child exit), khugepaged will never promote the pages to a huge page
until at least one page is written.

- A reservation is allocated even if the first fault on a pmd range maps
a zero page.  It may be more space efficient to allocate the reservation
on the first write fault.

- To facilitate the shrinker implementation, reservations are kept in a
global struct list_lru.  The list_lru internal implementation puts items
added to a list_lru on to per-node lists based on the node id derived
from the address of the item passed to list_lru_add().  For the current
reservations shrinker implementation this means that reservations will
be placed on the internal per-node list corresponding to the node where
the reservation data structure is located rather than the node where the
reserved 

[RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h

2018-11-08 Thread Naoya Horiguchi
num_poisoned_pages_inc/dec should be visible to files such as
mm/sparse.c and mm/page_alloc.c (for a subsequent patch). So let's
move them to include/linux/mm.h.

Signed-off-by: Naoya Horiguchi 
---
 include/linux/mm.h  | 13 -
 include/linux/swapops.h | 16 
 mm/sparse.c |  2 +-
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index 59df394..22623ba 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2741,7 +2741,7 @@ extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(struct page *page, int flags);
 
-
+#ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error handlers for various types of pages.
  */
@@ -2777,6 +2777,17 @@ enum mf_action_page_type {
MF_MSG_UNKNOWN,
 };
 
+static inline void num_poisoned_pages_inc(void)
+{
+   atomic_long_inc(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_dec(void)
+{
+   atomic_long_dec(&num_poisoned_pages);
+}
+#endif
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
unsigned long addr_hint,
diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
index 4d96166..88137e9 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
@@ -320,8 +320,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
 
 #ifdef CONFIG_MEMORY_FAILURE
 
-extern atomic_long_t num_poisoned_pages __read_mostly;
-
 /*
  * Support for hardware poisoned pages
  */
@@ -336,16 +334,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
return swp_type(entry) == SWP_HWPOISON;
 }
 
-static inline void num_poisoned_pages_inc(void)
-{
-   atomic_long_inc(&num_poisoned_pages);
-}
-
-static inline void num_poisoned_pages_dec(void)
-{
-   atomic_long_dec(&num_poisoned_pages);
-}
-
 #else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
@@ -357,10 +345,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
 {
return 0;
 }
-
-static inline void num_poisoned_pages_inc(void)
-{
-}
 #endif
 
 #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
diff --git v4.19-mmotm-2018-10-30-16-08/mm/sparse.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
index 33307fc..7ada2e5 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/sparse.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
@@ -726,7 +726,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
 
for (i = 0; i < nr_pages; i++) {
if (PageHWPoison(&memmap[i])) {
-   atomic_long_sub(1, &num_poisoned_pages);
+   num_poisoned_pages_dec();
ClearPageHWPoison(&memmap[i]);
}
}
-- 
2.7.0



Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"

2018-11-08 Thread Sergey Senozhatsky
On (11/01/18 09:05), Daniel Wang wrote:
> > Another deadlock scenario could be the following one:
> >
> > printk()
> >  console_trylock()
> >   down_trylock()
> >    raw_spin_lock_irqsave(&sem->lock, flags)
> >     <NMI>
> >      panic()
> >       console_flush_on_panic()
> >        console_trylock()
> >         raw_spin_lock_irqsave(&sem->lock, flags)    // deadlock
> >
> > There are no patches addressing this one at the moment. And it's
> > unclear if you are hitting this scenario.
> 
> I am not sure, but Steven's patches did make the deadlock I saw go away...

You certainly can find cases when "busy spin on console_sem owner" logic
can reduce some possibilities.

But spin_lock(); NMI; spin_lock(); code is still in the kernel.
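
The spin_lock(); NMI; spin_lock(); pattern can be illustrated with a toy
userspace model.  This is a sketch only, not kernel code: struct toy_lock,
nmi_path_try_flush() and printk_interrupted_by_nmi() are hypothetical names,
and the NMI is simulated by a plain function call on the same thread.  The
simulated NMI path gives up after a bounded number of attempts; a real
spinlock would spin forever, because the interrupted lock holder cannot run
again until the NMI path returns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy test-and-set lock standing in for a non-recursive raw spinlock. */
struct toy_lock {
	atomic_flag held;
};

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Hypothetical stand-in for the panic path entered from the NMI.
 * It retries a bounded number of times instead of spinning forever.
 * Returns true if the lock could be taken.
 */
static bool nmi_path_try_flush(struct toy_lock *l, unsigned int max_spins)
{
	unsigned int i;

	for (i = 0; i < max_spins; i++) {
		if (toy_trylock(l)) {
			toy_unlock(l);
			return true;
		}
	}
	return false;
}

/*
 * Interrupted context: take the lock, then let the simulated NMI run on
 * the same "CPU" before the lock is dropped.  Returns what the NMI path
 * observed.
 */
static bool printk_interrupted_by_nmi(struct toy_lock *l)
{
	bool nmi_got_lock;

	if (!toy_trylock(l))
		return false;
	nmi_got_lock = nmi_path_try_flush(l, 1000);	/* NMI arrives here */
	toy_unlock(l);
	return nmi_got_lock;
}
```

In this model the NMI path never obtains the lock while the interrupted
context holds it, no matter how many times it retries; once the holder has
released the lock, a single attempt succeeds.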

> A little swamped by other things lately but I'll run a test with it.
> If it works, would you recommend taking your patch alone

Let's first figure out if it works.

-ss


[RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()

2018-11-08 Thread Naoya Horiguchi
set_hwpoison_free_buddy_page() can fail, in which case the target page
is not isolated after all, so it's better to report -EBUSY so that
userspace learns of the failure and can retry.

And for consistency, this patch moves the set_hwpoison_free_buddy_page()
call from unmap_and_move() to __soft_offline_page().
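
The error propagation described above can be sketched in userspace as
follows.  This is an illustration under stated assumptions, not the kernel
code: struct toy_page, toy_num_poisoned and the toy_* functions are
hypothetical stand-ins for the kernel's page state, poisoned-page counter
and helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the kernel's definitions. */
struct toy_page {
	bool in_buddy;		/* page currently sits in the free lists */
	bool hwpoison;		/* models PG_hwpoison */
};

static long toy_num_poisoned;

/*
 * Models set_hwpoison_free_buddy_page(): it only succeeds for a page
 * that is free in the allocator at the time of the call.
 */
static bool toy_set_hwpoison_free_buddy_page(struct toy_page *p)
{
	if (!p->in_buddy)
		return false;
	p->in_buddy = false;	/* taken out of the allocator */
	p->hwpoison = true;
	return true;
}

/* Tail of a soft-offline path after a successful migration or dissolve. */
static int toy_finish_soft_offline(struct toy_page *p)
{
	if (toy_set_hwpoison_free_buddy_page(p)) {
		toy_num_poisoned++;
		return 0;
	}
	return -EBUSY;		/* page was not isolated; caller may retry */
}
```

A page that is free at the time of the call is poisoned and counted once; a
page that is no longer free yields -EBUSY and leaves the counter untouched.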

Fixes: 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c | 15 ---
 mm/migrate.c|  9 -
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 9f09bf3..11e283e 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1719,14 +1719,18 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
/*
 * We set PG_hwpoison only when the migration source hugepage
 * was successfully dissolved, because otherwise hwpoisoned
-* hugepage remains on free hugepage list, then userspace will
-* find it as SIGBUS by allocation failure. That's not expected
-* in soft-offlining.
+* hugepage remains on free hugepage list. The allocator ignores
+* such a hwpoisoned page so it's never allocated, but it could
+* kill a process because of no-memory rather than hwpoison.
+* Soft-offline never impacts the userspace, so this is
+* undesired.
 */
ret = dissolve_free_huge_page(page);
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
+   else
+   ret = -EBUSY;
}
}
return ret;
@@ -1804,6 +1808,11 @@ static int __soft_offline_page(struct page *page, int 
flags)
pfn, ret, page->flags, &page->flags);
if (ret > 0)
ret = -EIO;
+   } else {
+   if (set_hwpoison_free_buddy_page(page))
+   num_poisoned_pages_inc();
+   else
+   ret = -EBUSY;
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count 
%d, type %lx (%pGp)\n",
diff --git v4.19-mmotm-2018-10-30-16-08/mm/migrate.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
index f7e4bfd..1742372 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/migrate.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
@@ -1199,15 +1199,6 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 */
if (rc == MIGRATEPAGE_SUCCESS) {
put_page(page);
-   if (reason == MR_MEMORY_FAILURE) {
-   /*
-* Set PG_HWPoison on just freed page
-* intentionally. Although it's rather weird,
-* it's how HWPoison flag works at the moment.
-*/
-   if (set_hwpoison_free_buddy_page(page))
-   num_poisoned_pages_inc();
-   }
} else {
if (rc != -EAGAIN) {
if (likely(!__PageMovable(page))) {
-- 
2.7.0



[RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED

2018-11-08 Thread Naoya Horiguchi
Currently madvise_inject_error() pins the target page when calling the
memory error handler, but that is undesirable because the refcount is
merely an artifact of the error injector and mimics nothing about a real
hardware error.  In other words, pinning the error page is part of the
error handler's task, so let's stop doing it in the injector.

Signed-off-by: Naoya Horiguchi 
---
 mm/madvise.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 6cb1ca9..9fa0225 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
ret = get_user_pages_fast(start, 1, 0, &page);
if (ret != 1)
return ret;
+   /*
+* The get_user_pages_fast() is just to get the pfn of the
+* given address, and the refcount has nothing to do with
+* what we try to test, so it should be released immediately.
+* This is racy but it's intended because the real hardware
+* errors could happen at any moment and memory error handlers
+* must properly handle the race.
+*/
+   put_page(page);
+
pfn = page_to_pfn(page);
 
/*
@@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
 */
order = compound_order(compound_head(page));
 
-   if (PageHWPoison(page)) {
-   put_page(page);
-   continue;
-   }
-
if (behavior == MADV_SOFT_OFFLINE) {
pr_info("Soft offlining pfn %#lx at process virtual 
address %#lx\n",
pfn, start);
 
-   ret = soft_offline_page(page, MF_COUNT_INCREASED);
+   ret = soft_offline_page(page, 0);
if (ret)
return ret;
continue;
@@ -663,14 +668,6 @@ static int madvise_inject_error(int behavior,
 
pr_info("Injecting memory failure for pfn %#lx at process 
virtual address %#lx\n",
pfn, start);
-
-   /*
-* Drop the page reference taken by get_user_pages_fast(). In
-* the absence of MF_COUNT_INCREASED the memory_failure()
-* routine is responsible for pinning the page to prevent it
-* from being released back to the page allocator.
-*/
-   put_page(page);
ret = memory_failure(pfn, 0);
if (ret)
return ret;
-- 
2.7.0



[RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()

2018-11-08 Thread Naoya Horiguchi
The new function is the reverse operation of set_hwpoison_free_buddy_page()
and adjusts unpoison_memory() to the new semantics.
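
The test-and-clear idea behind the reverse operation can be sketched in
userspace as follows.  This is only an illustration: struct toy_page,
toy_num_poisoned and the toy_* functions are hypothetical names, the zone
lock taken by the real function is omitted, and a C11 atomic exchange stands
in for TestClearPageHWPoison().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's struct page or counters. */
struct toy_page {
	atomic_bool hwpoison;	/* models PG_hwpoison */
	bool in_buddy;		/* true while the page sits on a free list */
};

static long toy_num_poisoned;

/* Put a page into the "poisoned and isolated from the allocator" state. */
static void toy_poison_isolated_page(struct toy_page *p)
{
	atomic_init(&p->hwpoison, true);
	p->in_buddy = false;
	toy_num_poisoned++;
}

/* Models clear_hwpoison_free_buddy_page(): test-and-clear, then free. */
static bool toy_clear_hwpoison_free_buddy_page(struct toy_page *p)
{
	if (!atomic_exchange(&p->hwpoison, false))
		return false;	/* flag was not set: nothing to undo */
	p->in_buddy = true;	/* stands in for __free_one_page() */
	return true;
}

/* Models the free-page branch of unpoison_memory() after the patch. */
static int toy_unpoison(struct toy_page *p)
{
	if (!toy_clear_hwpoison_free_buddy_page(p))
		return 0;
	toy_num_poisoned--;
	return 0;
}
```

Because the counter is only decremented when the flag was actually cleared,
unpoisoning the same page twice in this model decrements it once.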

Signed-off-by: Naoya Horiguchi 
---
 include/linux/page-flags.h |  8 +++-
 mm/memory-failure.c|  5 +++--
 mm/page_alloc.c| 21 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
index 50ce1bd..ab0bde0 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
@@ -382,11 +382,17 @@ PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 extern bool set_hwpoison_free_buddy_page(struct page *page);
+extern bool clear_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
 static inline bool set_hwpoison_free_buddy_page(struct page *page)
 {
-   return 0;
+   return false;
+}
+
+static inline bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+   return false;
 }
 #define __PG_HWPOISON 0
 #endif
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index af541141..a0e1cd4 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1590,8 +1590,9 @@ int unpoison_memory(unsigned long pfn)
}
 
if (!get_hwpoison_page(p)) {
-   if (TestClearPageHWPoison(p))
-   num_poisoned_pages_dec();
+   if (!clear_hwpoison_free_buddy_page(p))
+   return 0;
+   num_poisoned_pages_dec();
unpoison_pr_info("Unpoison: Software-unpoisoned free page 
%#lx\n",
 pfn, &unpoison_rs);
return 0;
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index 27826b3..9a90f93 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8270,4 +8270,25 @@ bool set_hwpoison_free_buddy_page(struct page *page)
 
return hwpoisoned;
 }
+
+/*
+ * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
+ * to work only on error pages isolated from buddy allocator.
+ */
+bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+   struct zone *zone = page_zone(page);
+   bool unpoisoned = false;
+
+   spin_lock(&zone->lock);
+   if (TestClearPageHWPoison(page)) {
+   unsigned long pfn = page_to_pfn(page);
+   int migratetype = get_pfnblock_migratetype(page, pfn);
+
+   __free_one_page(page, pfn, zone, 0, migratetype);
+   unpoisoned = true;
+   }
+   spin_unlock(&zone->lock);
+   return unpoisoned;
+}
 #endif
-- 
2.7.0



[RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter()

2018-11-08 Thread Naoya Horiguchi
Another memory error injection interface, debugfs:hwpoison/corrupt-pfn,
also takes a bogus refcount for hwpoison_filter(). Dropping it is
justified because this is only a coarse filter, and memory_failure()
reliably redoes the check.

Signed-off-by: Naoya Horiguchi 
---
 mm/hwpoison-inject.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
index b6ac706..766062c 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
@@ -25,11 +25,6 @@ static int hwpoison_inject(void *data, u64 val)
 
p = pfn_to_page(pfn);
hpage = compound_head(p);
-   /*
-* This implies unable to support free buddy pages.
-*/
-   if (!get_hwpoison_page(p))
-   return 0;
 
if (!hwpoison_filter_enable)
goto inject;
@@ -39,23 +34,20 @@ static int hwpoison_inject(void *data, u64 val)
 * This implies unable to support non-LRU pages.
 */
if (!PageLRU(hpage) && !PageHuge(p))
-   goto put_out;
+   return 0;
 
/*
-* do a racy check with elevated page count, to make sure PG_hwpoison
-* will only be set for the targeted owner (or on a free page).
+* do a racy check to make sure PG_hwpoison will only be set for
+* the targeted owner (or on a free page).
 * memory_failure() will redo the check reliably inside page lock.
 */
err = hwpoison_filter(hpage);
if (err)
-   goto put_out;
+   return 0;
 
 inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-   return memory_failure(pfn, MF_COUNT_INCREASED);
-put_out:
-   put_hwpoison_page(p);
-   return 0;
+   return memory_failure(pfn, 0);
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
-- 
2.7.0



[RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check

2018-11-08 Thread Naoya Horiguchi
memory_failure() branches to memory_failure_hugetlb() for hugetlb pages,
so a PageHuge() check after the branch should not be necessary.

Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 0cd3de3..9f09bf3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1357,10 +1357,7 @@ int memory_failure(unsigned long pfn, int flags)
 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
 * correctly, we save a copy of the page flags at this time.
 */
-   if (PageHuge(p))
-   page_flags = hpage->flags;
-   else
-   page_flags = p->flags;
+   page_flags = p->flags;
 
/*
 * unpoison always clear PG_hwpoison inside page lock
-- 
2.7.0



[RFC][PATCH v1 09/11] mm: hwpoison: apply buddy page handling code to hard-offline

2018-11-08 Thread Naoya Horiguchi
Hard-offline of free buddy pages can be handled in the same manner as
soft-offline, so this patch applies the new semantics to hard-offline
for more complete isolation of the offlined page. As a result, the
successful case now merits MF_RECOVERED instead of MF_DELAYED, so this
patch changes that as well.

Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c | 38 --
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index ecafd4a..af541141 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -772,6 +772,16 @@ static int me_swapcache_clean(struct page *p, unsigned 
long pfn)
return MF_FAILED;
 }
 
+static int me_huge_free_page(struct page *p)
+{
+   int rc = dissolve_free_huge_page(p);
+
+   if (!rc && set_hwpoison_free_buddy_page(p))
+   return MF_RECOVERED;
+   else
+   return MF_FAILED;
+}
+
 /*
  * Huge pages. Needs work.
  * Issues:
@@ -799,8 +809,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 */
if (PageAnon(hpage))
put_page(hpage);
-   dissolve_free_huge_page(p);
-   res = MF_RECOVERED;
+   res = me_huge_free_page(p);
lock_page(hpage);
}
 
@@ -1108,8 +1117,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int 
flags)
}
}
unlock_page(head);
-   dissolve_free_huge_page(p);
-   action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
+
+   res = me_huge_free_page(p);
+   if (res == MF_FAILED)
+   num_poisoned_pages_dec();
+   action_result(pfn, MF_MSG_FREE_HUGE, res);
return 0;
}
 
@@ -1270,6 +1282,13 @@ int memory_failure(unsigned long pfn, int flags)
p = pfn_to_page(pfn);
if (PageHuge(p))
return memory_failure_hugetlb(pfn, flags);
+
+   if (set_hwpoison_free_buddy_page(p)) {
+   action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
+   num_poisoned_pages_inc();
+   return 0;
+   }
+
if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
pfn);
@@ -1281,8 +1300,7 @@ int memory_failure(unsigned long pfn, int flags)
 
/*
 * We need/can do nothing about count=0 pages.
-* 1) it's a free page, and therefore in safe hand:
-*prep_new_page() will be the gate keeper.
+* 1) it's a free page, and removed from buddy allocator.
 * 2) it's part of a non-compound high order page.
 *Implies some kernel user: cannot stop them from
 *R/W the page; let's pray that the page has been
@@ -1291,8 +1309,8 @@ int memory_failure(unsigned long pfn, int flags)
 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 */
if (!get_hwpoison_page(p)) {
-   if (is_free_buddy_page(p)) {
-   action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
+   if (set_hwpoison_free_buddy_page(p)) {
+   action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
return 0;
} else {
action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, 
MF_IGNORED);
@@ -1330,8 +1348,8 @@ int memory_failure(unsigned long pfn, int flags)
 */
shake_page(p, 0);
/* shake_page could have turned it free. */
-   if (!PageLRU(p) && is_free_buddy_page(p)) {
-   action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+   if (!PageLRU(p) && set_hwpoison_free_buddy_page(p)) {
+   action_result(pfn, MF_MSG_BUDDY_2ND, MF_RECOVERED);
return 0;
}
 
-- 
2.7.0



[RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()

2018-11-08 Thread Naoya Horiguchi
set_hwpoison_free_buddy_page() can fail, in which case the target page
ends up not isolated, so it's better to report -EBUSY to let userspace
know about the failure and the chance to retry.

And for consistency, this patch moves the set_hwpoison_free_buddy_page()
call from unmap_and_move() to __soft_offline_page().

Fixes: 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c | 15 ---
 mm/migrate.c|  9 -
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 9f09bf3..11e283e 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1719,14 +1719,18 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
/*
 * We set PG_hwpoison only when the migration source hugepage
 * was successfully dissolved, because otherwise hwpoisoned
-* hugepage remains on free hugepage list, then userspace will
-* find it as SIGBUS by allocation failure. That's not expected
-* in soft-offlining.
+* hugepage remains on free hugepage list. The allocator ignores
+* such a hwpoisoned page so it's never allocated, but it could
+* kill a process because of no-memory rather than hwpoison.
+* Soft-offline never impacts the userspace, so this is
+* undesired.
 */
ret = dissolve_free_huge_page(page);
if (!ret) {
if (set_hwpoison_free_buddy_page(page))
num_poisoned_pages_inc();
+   else
+   ret = -EBUSY;
}
}
return ret;
@@ -1804,6 +1808,11 @@ static int __soft_offline_page(struct page *page, int 
flags)
pfn, ret, page->flags, &page->flags);
if (ret > 0)
ret = -EIO;
+   } else {
+   if (set_hwpoison_free_buddy_page(page))
+   num_poisoned_pages_inc();
+   else
+   ret = -EBUSY;
}
} else {
pr_info("soft offline: %#lx: isolation failed: %d, page count 
%d, type %lx (%pGp)\n",
diff --git v4.19-mmotm-2018-10-30-16-08/mm/migrate.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
index f7e4bfd..1742372 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/migrate.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
@@ -1199,15 +1199,6 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 */
if (rc == MIGRATEPAGE_SUCCESS) {
put_page(page);
-   if (reason == MR_MEMORY_FAILURE) {
-   /*
-* Set PG_HWPoison on just freed page
-* intentionally. Although it's rather weird,
-* it's how HWPoison flag works at the moment.
-*/
-   if (set_hwpoison_free_buddy_page(page))
-   num_poisoned_pages_inc();
-   }
} else {
if (rc != -EAGAIN) {
if (likely(!__PageMovable(page))) {
-- 
2.7.0
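
The error propagation introduced by this patch can be sketched in user
space. This is a minimal model, not kernel code: try_isolate() is a
hypothetical stand-in for set_hwpoison_free_buddy_page(), num_poisoned
stands in for the num_poisoned_pages counter, and finish_soft_offline()
is an invented name for the tail of the soft-offline path.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's num_poisoned_pages counter (assumption). */
static long num_poisoned;

/*
 * Hypothetical stand-in for set_hwpoison_free_buddy_page(): reports
 * whether the page was found free and could be isolated.
 */
static bool try_isolate(bool page_is_free)
{
	return page_is_free;
}

/*
 * Model of the tail of the soft-offline path after this patch: count
 * the page as poisoned only if isolation succeeded, otherwise return
 * -EBUSY so that the caller can retry.
 */
static int finish_soft_offline(bool page_is_free)
{
	if (try_isolate(page_is_free)) {
		num_poisoned++;
		return 0;
	}
	return -EBUSY;
}
```

The point of the pattern is that the counter and the return value can no
longer disagree: a page that was not isolated is neither counted nor
reported as successfully offlined.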



[RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED

2018-11-08 Thread Naoya Horiguchi
Currently madvise_inject_error() pins the target page when calling the
memory error handler, but that's not good because the refcount is just
an artifact of the error injector and mocks nothing about the hw error
itself. IOW, pinning the error page is part of the error handler's task,
so let's stop doing it.

Signed-off-by: Naoya Horiguchi 
---
 mm/madvise.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 6cb1ca9..9fa0225 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
ret = get_user_pages_fast(start, 1, 0, &page);
if (ret != 1)
return ret;
+   /*
+* The get_user_pages_fast() is just to get the pfn of the
+* given address, and the refcount has nothing to do with
+* what we try to test, so it should be released immediately.
+* This is racy but it's intended because the real hardware
+* errors could happen at any moment and memory error handlers
+* must properly handle the race.
+*/
+   put_page(page);
+
pfn = page_to_pfn(page);
 
/*
@@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
 */
order = compound_order(compound_head(page));
 
-   if (PageHWPoison(page)) {
-   put_page(page);
-   continue;
-   }
-
if (behavior == MADV_SOFT_OFFLINE) {
pr_info("Soft offlining pfn %#lx at process virtual 
address %#lx\n",
pfn, start);
 
-   ret = soft_offline_page(page, MF_COUNT_INCREASED);
+   ret = soft_offline_page(page, 0);
if (ret)
return ret;
continue;
@@ -663,14 +668,6 @@ static int madvise_inject_error(int behavior,
 
pr_info("Injecting memory failure for pfn %#lx at process 
virtual address %#lx\n",
pfn, start);
-
-   /*
-* Drop the page reference taken by get_user_pages_fast(). In
-* the absence of MF_COUNT_INCREASED the memory_failure()
-* routine is responsible for pinning the page to prevent it
-* from being released back to the page allocator.
-*/
-   put_page(page);
ret = memory_failure(pfn, 0);
if (ret)
return ret;
-- 
2.7.0



[RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()

2018-11-08 Thread Naoya Horiguchi
The new function is the reverse operation of set_hwpoison_free_buddy_page(),
added to adapt unpoison_memory() to the new semantics.

Signed-off-by: Naoya Horiguchi 
---
 include/linux/page-flags.h |  8 +++-
 mm/memory-failure.c|  5 +++--
 mm/page_alloc.c| 21 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
index 50ce1bd..ab0bde0 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
@@ -382,11 +382,17 @@ PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 extern bool set_hwpoison_free_buddy_page(struct page *page);
+extern bool clear_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
 static inline bool set_hwpoison_free_buddy_page(struct page *page)
 {
-   return 0;
+   return false;
+}
+
+static inline bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+   return false;
 }
 #define __PG_HWPOISON 0
 #endif
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index af541141..a0e1cd4 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1590,8 +1590,9 @@ int unpoison_memory(unsigned long pfn)
}
 
if (!get_hwpoison_page(p)) {
-   if (TestClearPageHWPoison(p))
-   num_poisoned_pages_dec();
+   if (!clear_hwpoison_free_buddy_page(p))
+   return 0;
+   num_poisoned_pages_dec();
unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
 pfn, &unpoison_rs);
return 0;
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index 27826b3..9a90f93 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8270,4 +8270,25 @@ bool set_hwpoison_free_buddy_page(struct page *page)
 
return hwpoisoned;
 }
+
+/*
+ * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
+ * to work only on error pages isolated from buddy allocator.
+ */
+bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+   struct zone *zone = page_zone(page);
+   bool unpoisoned = false;
+
+   spin_lock(&zone->lock);
+   if (TestClearPageHWPoison(page)) {
+   unsigned long pfn = page_to_pfn(page);
+   int migratetype = get_pfnblock_migratetype(page, pfn);
+
+   __free_one_page(page, pfn, zone, 0, migratetype);
+   unpoisoned = true;
+   }
+   spin_unlock(&zone->lock);
+   return unpoisoned;
+}
 #endif
-- 
2.7.0



[RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter()

2018-11-08 Thread Naoya Horiguchi
Another memory error injection interface, debugfs:hwpoison/corrupt-pfn,
also takes a bogus refcount for hwpoison_filter(). Dropping it is justified
because this is only a coarse filter, and memory_failure() is expected to
redo the check reliably.

Signed-off-by: Naoya Horiguchi 
---
 mm/hwpoison-inject.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
index b6ac706..766062c 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
@@ -25,11 +25,6 @@ static int hwpoison_inject(void *data, u64 val)
 
p = pfn_to_page(pfn);
hpage = compound_head(p);
-   /*
-* This implies unable to support free buddy pages.
-*/
-   if (!get_hwpoison_page(p))
-   return 0;
 
if (!hwpoison_filter_enable)
goto inject;
@@ -39,23 +34,20 @@ static int hwpoison_inject(void *data, u64 val)
 * This implies unable to support non-LRU pages.
 */
if (!PageLRU(hpage) && !PageHuge(p))
-   goto put_out;
+   return 0;
 
/*
-* do a racy check with elevated page count, to make sure PG_hwpoison
-* will only be set for the targeted owner (or on a free page).
+* do a racy check to make sure PG_hwpoison will only be set for
+* the targeted owner (or on a free page).
 * memory_failure() will redo the check reliably inside page lock.
 */
err = hwpoison_filter(hpage);
if (err)
-   goto put_out;
+   return 0;
 
 inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-   return memory_failure(pfn, MF_COUNT_INCREASED);
-put_out:
-   put_hwpoison_page(p);
-   return 0;
+   return memory_failure(pfn, 0);
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
-- 
2.7.0



[PATCH RFC v1 00/11] hwpoison improvement part 1

2018-11-08 Thread Naoya Horiguchi
Hi everyone,

I wrote hwpoison patches which partially address the problems
discussed recently in this area [1].

The main point of this series is how we isolate faulty pages more
safely/reliably. As pointed out by Michal in thread [2], we can
have better isolation functions than what we currently have.
Patch 8/11 gives the implementation. As a result, the behavior of
poisoned pages (at least from soft-offline) is more predictable
and I think that memory hotremove should properly work with it.

The structure of this series:
  - patches 1-7 are small fixes, preparation, and/or cleanup.
    I can separate these out from the main part if you like.
  - patch 8 is the core part of this series, providing some code
    to pick out the target page from the buddy allocator,
  - patches 9-11 are changes on the caller side (hard-offline,
    hotremove and unpoison.)

One big issue not addressed by this series is hard-offlining hugetlb,
which is still a todo unfortunately.

Another remaining task is to rework the behavior of the PG_hwpoison
flag in hard-offlining of in-use pages. Even with this series,
hard-offline for in-use pages works as in the past (i.e. we still take
the racy "set PG_hwpoison at first, then do some handling" approach.)
Without changing this, we can't get rid of the many "if (PageHWPoison)"
checks in mm code. So I'll think/try more about it after this one.

Anyway this is the first step toward a better solution (I believe),
and any kind of help is appreciated.

Thanks,
Naoya Horiguchi

[1]: https://lwn.net/Articles/753261/
[2]: https://lkml.org/lkml/2018/7/17/60
---
Summary:

Naoya Horiguchi (11):
  mm: hwpoison: cleanup unused PageHuge() check
  mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
  mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
  mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
  mm: hwpoison-inject: don't pin for hwpoison_filter()
  mm: hwpoison: remove MF_COUNT_INCREASED
  mm: remove flag argument from soft offline functions
  mm: soft-offline: isolate error pages from buddy freelist
  mm: hwpoison: apply buddy page handling code to hard-offline
  mm: clear PageHWPoison in memory hotremove
  mm: hwpoison: introduce clear_hwpoison_free_buddy_page()

 drivers/base/memory.c  |   2 +-
 include/linux/mm.h |  22 ++---
 include/linux/page-flags.h |   8 +++-
 include/linux/swapops.h|  16 ---
 mm/hwpoison-inject.c   |  18 ++--
 mm/madvise.c   |  25 +-
 mm/memory-failure.c| 112 ++---
 mm/migrate.c   |   9 
 mm/page_alloc.c|  95 +++---
 mm/sparse.c|   2 +-
 10 files changed, 193 insertions(+), 116 deletions(-)


[RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove

2018-11-08 Thread Naoya Horiguchi
One promising use case of memory hotplug is replacing half-broken DIMMs
with new ones, so it makes sense to clear hwpoison info at the time of
memory hotremove.

I hope that this patch covers the topic discussed in
https://lkml.org/lkml/2018/1/17/1228

Signed-off-by: Naoya Horiguchi 
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index 970d6ff..27826b3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8139,8 +8139,9 @@ __offline_isolated_pages(unsigned long start_pfn, 
unsigned long end_pfn)
 * The HWPoisoned page may be not in buddy system, and
 * page_count() is not 0.
 */
-   if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+   if (unlikely(!PageBuddy(page) && TestClearPageHWPoison(page))) {
pfn++;
+   num_poisoned_pages_dec();
SetPageReserved(page);
continue;
}
-- 
2.7.0
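
The switch from PageHWPoison() to TestClearPageHWPoison() relies on
test-and-clear semantics: the flag is cleared and its previous value is
returned in one atomic step, so the counter is decremented exactly once
per poisoned page. A minimal user-space sketch with C11 atomics follows.
The bit position, the num_poisoned counter and both function names are
assumptions made for illustration, and the !PageBuddy() half of the real
condition is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_HWPOISON_BIT (1UL << 0)  /* illustrative bit, not the kernel's */

static atomic_long num_poisoned;    /* stand-in for num_poisoned_pages */

/* Atomically clear the poison bit and report whether it was set before. */
static bool test_clear_hwpoison(atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_and(flags, ~PG_HWPOISON_BIT);

	return (old & PG_HWPOISON_BIT) != 0;
}

/* Model of the offline loop body: drop the count once per poisoned page. */
static void offline_page(atomic_ulong *flags)
{
	if (test_clear_hwpoison(flags))
		atomic_fetch_sub(&num_poisoned, 1);
}
```

Calling offline_page() a second time on the same flags word is harmless
in this model, because the second test-and-clear sees the bit already
cleared and leaves the counter alone.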



[RFC][PATCH v1 06/11] mm: hwpoison: remove MF_COUNT_INCREASED

2018-11-08 Thread Naoya Horiguchi
Now there's no user of MF_COUNT_INCREASED, so we can safely remove
the flag and all the code that checks it.

Signed-off-by: Naoya Horiguchi 
---
 include/linux/mm.h  |  7 +++
 mm/memory-failure.c | 16 +++-
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index 22623ba..f85b450 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2725,10 +2725,9 @@ void register_page_bootmem_memmap(unsigned long 
section_nr, struct page *map,
  unsigned long nr_pages);
 
 enum mf_flags {
-   MF_COUNT_INCREASED = 1 << 0,
-   MF_ACTION_REQUIRED = 1 << 1,
-   MF_MUST_KILL = 1 << 2,
-   MF_SOFT_OFFLINE = 1 << 3,
+   MF_ACTION_REQUIRED = 1 << 0,
+   MF_MUST_KILL = 1 << 1,
+   MF_SOFT_OFFLINE = 1 << 2,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 11e283e..ed347f8 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1094,7 +1094,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int 
flags)
 
num_poisoned_pages_inc();
 
-   if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+   if (!get_hwpoison_page(p)) {
/*
 * Check "filter hit" and "race with other subpage."
 */
@@ -1290,7 +1290,7 @@ int memory_failure(unsigned long pfn, int flags)
 * In fact it's dangerous to directly bump up page count from 0,
 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 */
-   if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+   if (!get_hwpoison_page(p)) {
if (is_free_buddy_page(p)) {
action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
return 0;
@@ -1331,10 +1331,7 @@ int memory_failure(unsigned long pfn, int flags)
shake_page(p, 0);
/* shake_page could have turned it free. */
if (!PageLRU(p) && is_free_buddy_page(p)) {
-   if (flags & MF_COUNT_INCREASED)
-   action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-   else
-   action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+   action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
return 0;
}
 
@@ -1622,9 +1619,6 @@ static int __get_any_page(struct page *p, unsigned long 
pfn, int flags)
 {
int ret;
 
-   if (flags & MF_COUNT_INCREASED)
-   return 1;
-
/*
 * When the target page is a free hugepage, just remove it
 * from free hugepage list.
@@ -1906,15 +1900,11 @@ int soft_offline_page(struct page *page, int flags)
if (is_zone_device_page(page)) {
pr_debug_ratelimited("soft_offline: %#lx page is device page\n",
pfn);
-   if (flags & MF_COUNT_INCREASED)
-   put_page(page);
return -EIO;
}
 
if (PageHWPoison(page)) {
pr_info("soft offline: %#lx page already poisoned\n", pfn);
-   if (flags & MF_COUNT_INCREASED)
-   put_hwpoison_page(page);
return -EBUSY;
}
 
-- 
2.7.0
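
Removing the first enumerator shifts every remaining flag down by one
bit. The sketch below copies the resulting enum from the diff and adds a
hypothetical helper, must_kill(), to show that code testing flags by name
is unaffected by the renumbering. Anything that stored or passed raw
numeric values would see different bits, which is an observation about
the model, not a claim that such users exist.

```c
/* enum mf_flags as it reads after this patch (values copied from the diff). */
enum mf_flags {
	MF_ACTION_REQUIRED = 1 << 0,
	MF_MUST_KILL = 1 << 1,
	MF_SOFT_OFFLINE = 1 << 2,
};

/* Hypothetical helper: callers test flags by name, never by raw value. */
static int must_kill(int flags)
{
	return (flags & MF_MUST_KILL) != 0;
}
```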



[RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist

2018-11-08 Thread Naoya Horiguchi
Soft-offline shares PG_hwpoison with hard-offline to keep track
of memory errors, but recently we found that this approach can be
undesirable for soft-offline because, unlike hard-offline, it is never
expected to stop applications.

So this patch suggests that the memory error handler not only sets
PG_hwpoison but also isolates error pages from the buddy allocator in
its context.

In previous work [1], we allowed the soft-offline handler to set
PG_hwpoison only after successful page migration and page freeing.
This patch, along with that, makes the isolation always happen via
set_hwpoison_free_buddy_page() under zone->lock, so the behavior
should be less racy and more predictable.

Note that for isolation alone we don't have to set PG_hwpoison,
but my analysis shows that to make memory hotremove work properly
we still need some flag to clearly distinguish memory error pages
from any other type of page. So this patch doesn't change this.
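
The splitting that pickout_buddy_page() performs can be illustrated with
a small user-space model. This is only a sketch under stated assumptions:
struct free_block and pickout_model() are hypothetical names, page frames
are plain numbers, the target order is fixed at 0, and guard pages,
migratetypes and locking are left out.

```c
/* One free sub-block handed back to the (modelled) free lists. */
struct free_block {
	unsigned long pfn;   /* first page frame number of the block */
	unsigned int order;  /* block covers 1 << order pages */
};

/*
 * Split the free block [base, base + (1 << high)) until only the single
 * page `target` remains. Each half that does not contain the target is
 * recorded in `out` as a smaller free block; `out` needs room for `high`
 * entries. Returns the number of blocks recorded, or -1 if the target
 * lies outside the block.
 */
static int pickout_model(unsigned long base, unsigned int high,
			 unsigned long target, struct free_block *out)
{
	unsigned long size = 1UL << high;
	int n = 0;

	if (target < base || target >= base + size)
		return -1;

	while (high > 0) {
		high--;
		size >>= 1;
		if (target >= base + size) {
			/* target is in the upper half: free the lower half */
			out[n].pfn = base;
			base += size;
		} else {
			/* target is in the lower half: free the upper half */
			out[n].pfn = base + size;
		}
		out[n].order = high;
		n++;
	}
	return n;
}
```

For a free block of order 3 starting at pfn 8 and a target of pfn 13,
the model hands back pfn 8 at order 2, pfn 14 at order 1 and pfn 12 at
order 0, leaving only pfn 13 isolated.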

[1]:
  commit 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
  commit d4ae9916ea29 ("mm: soft-offline: close the race against page allocation")

Signed-off-by: Naoya Horiguchi 
---
 mm/memory-failure.c |  8 +++---
 mm/page_alloc.c | 71 -
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 869ff8f..ecafd4a 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1762,9 +1762,11 @@ static int __soft_offline_page(struct page *page)
if (ret == 1) {
put_hwpoison_page(page);
pr_info("soft_offline: %#lx: invalidated\n", pfn);
-   SetPageHWPoison(page);
-   num_poisoned_pages_inc();
-   return 0;
+   if (set_hwpoison_free_buddy_page(page)) {
+   num_poisoned_pages_inc();
+   return 0;
+   } else
+   return -EBUSY;
}
 
/*
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index ae31839..970d6ff 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8183,10 +8183,55 @@ bool is_free_buddy_page(struct page *page)
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Pick out a free page from buddy allocator. Unlike expand(), this
+ * function can choose the target page by @target which is not limited
+ * to the first page of some free block.
+ *
+ * This function changes zone state, so callers need to hold zone->lock.
+ */
+static inline void pickout_buddy_page(struct zone *zone, struct page *page,
+   struct page *target, int torder, int low, int high,
+   struct free_area *area, int migratetype)
+{
+   unsigned long size = 1 << high;
+   struct page *current_buddy, *next_page;
+
+   while (high > low) {
+   area--;
+   high--;
+   size >>= 1;
+
+   if (target >= &page[size]) { /* target is in higher buddy */
+   next_page = page + size;
+   current_buddy = page;
+   } else { /* target is in lower buddy */
+   next_page = page;
+   current_buddy = page + size;
+   }
+   VM_BUG_ON_PAGE(bad_range(zone, current_buddy), current_buddy);
+
+   if (set_page_guard(zone, &page[size], high, migratetype))
+   continue;
+
+   list_add(&current_buddy->lru, &area->free_list[migratetype]);
+   area->nr_free++;
+   set_page_order(current_buddy, high);
+   page = next_page;
+   }
+}
+
 /*
- * Set PG_hwpoison flag if a given page is confirmed to be a free page.  This
- * test is performed under the zone lock to prevent a race against page
- * allocation.
+ * Isolate hwpoisoned free page which actually does the following
+ *   - confirm that a given page is a free page under zone->lock,
+ *   - set PG_hwpoison flag,
+ *   - remove the page from buddy allocator, subdividing buddy page
+ * of each order.
+ *
+ * Just setting PG_hwpoison flag is not safe enough for complete isolation
+ * because rapidly-changing memory allocator code is always with the
+ * risk of mishandling the flag and potential race.
  */
 bool set_hwpoison_free_buddy_page(struct page *page)
 {
@@ -8199,10 +8244,24 @@ bool set_hwpoison_free_buddy_page(struct page *page)
spin_lock_irqsave(&zone->lock, flags);
for (order = 0; order < MAX_ORDER; order++) {
struct page *page_head = page - (pfn & ((1 << order) - 1));
+   unsigned int forder = page_order(page_head);
+   struct free_area *area = &(zone->free_area[forder]);
 
-   if (PageBuddy(page_head) && page_order(page_head) >= 

[RFC][PATCH v1 07/11] mm: remove flag argument from soft offline functions

2018-11-08 Thread Naoya Horiguchi
The argument @flag no longer affects the behavior of soft_offline_page()
and its variants, so let's remove them.

Signed-off-by: Naoya Horiguchi 
---
 drivers/base/memory.c |  2 +-
 include/linux/mm.h|  2 +-
 mm/madvise.c  |  2 +-
 mm/memory-failure.c   | 27 +--
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/drivers/base/memory.c 
v4.19-mmotm-2018-10-30-16-08_patched/drivers/base/memory.c
index 0e59856..4a554a5 100644
--- v4.19-mmotm-2018-10-30-16-08/drivers/base/memory.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/drivers/base/memory.c
@@ -548,7 +548,7 @@ store_soft_offline_page(struct device *dev,
pfn >>= PAGE_SHIFT;
if (!pfn_valid(pfn))
return -ENXIO;
-   ret = soft_offline_page(pfn_to_page(pfn), 0);
+   ret = soft_offline_page(pfn_to_page(pfn));
return ret == 0 ? count : ret;
 }
 
diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h 
v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index f85b450..6c496da 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2738,7 +2738,7 @@ extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
-extern int soft_offline_page(struct page *page, int flags);
+extern int soft_offline_page(struct page *page);
 
 #ifdef CONFIG_MEMORY_FAILURE
 /*
diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 9fa0225..86453f3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -660,7 +660,7 @@ static int madvise_inject_error(int behavior,
pr_info("Soft offlining pfn %#lx at process virtual 
address %#lx\n",
pfn, start);
 
-   ret = soft_offline_page(page, 0);
+   ret = soft_offline_page(page);
if (ret)
return ret;
continue;
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c 
v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index ed347f8..869ff8f 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1482,7 +1482,7 @@ static void memory_failure_work_func(struct work_struct 
*work)
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
-   soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
+   soft_offline_page(pfn_to_page(entry.pfn));
else
memory_failure(entry.pfn, entry.flags);
}
@@ -1615,7 +1615,7 @@ static struct page *new_page(struct page *p, unsigned 
long private)
  * that is not free, and 1 for any other page type.
  * For 1 the page is returned with increased page count, otherwise not.
  */
-static int __get_any_page(struct page *p, unsigned long pfn, int flags)
+static int __get_any_page(struct page *p, unsigned long pfn)
 {
int ret;
 
@@ -1642,9 +1642,9 @@ static int __get_any_page(struct page *p, unsigned long 
pfn, int flags)
return ret;
 }
 
-static int get_any_page(struct page *page, unsigned long pfn, int flags)
+static int get_any_page(struct page *page, unsigned long pfn)
 {
-   int ret = __get_any_page(page, pfn, flags);
+   int ret = __get_any_page(page, pfn);
 
if (ret == 1 && !PageHuge(page) &&
!PageLRU(page) && !__PageMovable(page)) {
@@ -1657,7 +1657,7 @@ static int get_any_page(struct page *page, unsigned long 
pfn, int flags)
/*
 * Did it turn free?
 */
-   ret = __get_any_page(page, pfn, 0);
+   ret = __get_any_page(page, pfn);
if (ret == 1 && !PageLRU(page)) {
/* Drop page reference which is from __get_any_page() */
put_hwpoison_page(page);
@@ -1669,7 +1669,7 @@ static int get_any_page(struct page *page, unsigned long 
pfn, int flags)
return ret;
 }
 
-static int soft_offline_huge_page(struct page *page, int flags)
+static int soft_offline_huge_page(struct page *page)
 {
int ret;
unsigned long pfn = page_to_pfn(page);
@@ -1730,7 +1730,7 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
return ret;
 }
 
-static int __soft_offline_page(struct page *page, int flags)
+static int __soft_offline_page(struct page *page)
 {
int ret;
unsigned long pfn = page_to_pfn(page);
@@ -1815,7 +1815,7 @@ static int __soft_offline_page(struct page *page, int 
flags)
return ret;
 }
 
-static int soft_offline_in_use_page(struct page *page, int flags)
+static int 

Re: [PATCH 04/13] elf-em.h: add EM_XTENSA

2018-11-08 Thread Max Filippov
On Thu, Nov 8, 2018 at 7:15 PM Dmitry V. Levin  wrote:
> The uapi/linux/audit.h header is going to use EM_XTENSA in order
> to define AUDIT_ARCH_XTENSA which is needed to implement
> syscall_get_arch() which in turn is required to extend
> the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
>
> The value for EM_XTENSA has been taken from
> http://www.sco.com/developers/gabi/2012-12-31/ch4.eheader.html
>
> Signed-off-by: Dmitry V. Levin 
> ---
>  include/uapi/linux/elf-em.h | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Max Filippov 

-- 
Thanks.
-- Max


Re: [PATCH v1 4/4] thermal: tegra: fix coverity defect

2018-11-08 Thread Wei Ni



On 8/11/2018 8:37 PM, Thierry Reding wrote:
> On Mon, Nov 05, 2018 at 05:32:34PM +0800, Wei Ni wrote:
>> Fix dereference dev before null check.
>>
>> Signed-off-by: Wei Ni 
>> ---
>>  drivers/thermal/tegra/soctherm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/tegra/soctherm.c 
>> b/drivers/thermal/tegra/soctherm.c
>> index 3042837364e8..96527df91f2a 100644
>> --- a/drivers/thermal/tegra/soctherm.c
>> +++ b/drivers/thermal/tegra/soctherm.c
>> @@ -397,7 +397,7 @@ static int throttrip_program(struct device *dev,
>>   struct soctherm_throt_cfg *stc,
>>   int trip_temp)
>>  {
>> -struct tegra_soctherm *ts = dev_get_drvdata(dev);
>> +struct tegra_soctherm *ts;
>>  int temp, cpu_throt, gpu_throt;
>>  unsigned int throt;
>>  u32 r, reg_off;
>> @@ -405,6 +405,8 @@ static int throttrip_program(struct device *dev,
>>  if (!sg || !stc || !stc->init)
>>  return -EINVAL;
>>  
>> +ts = dev_get_drvdata(dev);
> 
> I think coverity is wrong. How is dev ever going to be NULL in this
> case? We allocate all of these struct tegra_thermctl_zone structures in
> tegra_soctherm_probe() and assign zone->dev = >dev, which can
> never be NULL.
> 
> And even if it could, the code would've crashed earlier in
> tegra_soctherm_probe() already.
> 
> Furthermore, I fail to see how your patch would fix the defect. None of
> the checks in the conditional above actually check the dev value.
> 
Yes, you are right, we don't need this change. The driver would not
pass a NULL dev in any case.
This driver also already has the change "1fba81cc09bd thermal: tegra:
remove null check for dev pointer", which removes this "dev" check.

Thanks.
Wei.

> Thierry
> 


Re: [PATCH 1/2] dt-bindings: phy: Add Qualcomm Synopsys High-Speed USB PHY binding

2018-11-08 Thread Shawn Guo
On Fri, Nov 09, 2018 at 10:38:19AM +0530, Vinod Koul wrote:
> On 08-11-18, 15:04, Shawn Guo wrote:
> > From: Sriharsha Allenki 
> > 
> > It adds bindings for Synopsys 28nm femto phy controller that supports
> > LS/FS/HS usb connectivity on Qualcomm chipsets.
> > 
> > Signed-off-by: Sriharsha Allenki 
> > Signed-off-by: Anu Ramanathan 
> > Signed-off-by: Bjorn Andersson 
> > Signed-off-by: Shawn Guo 
> > ---
> >  .../phy/qcom,snps-28nm-usb-hs-phy.txt | 101 ++
> >  1 file changed, 101 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/phy/qcom,snps-28nm-usb-hs-phy.txt
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/phy/qcom,snps-28nm-usb-hs-phy.txt 
> > b/Documentation/devicetree/bindings/phy/qcom,snps-28nm-usb-hs-phy.txt
> > new file mode 100644
> > index ..75e7a09dd558
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/phy/qcom,snps-28nm-usb-hs-phy.txt
> > @@ -0,0 +1,101 @@
> > +Qualcomm Synopsys 28nm Femto phy controller
> > +===
> > +
> > +Synopsys 28nm femto phy controller supports LS/FS/HS usb connectivity on
> > +Qualcomm chipsets.
> > +
> > +Required properties:
> > +
> > +- compatible:
> > +Value type: 
> > +Definition: Should contain "qcom,usb-snps-hsphy".
> > +
> > +- reg:
> > +Value type: 
> > +Definition: USB PHY base address and length of the register map.
> > +
> > +- #phy-cells:
> > +Value type: 
> > +Definition: Should be 0.
> 
> I don't quite understand the definition that it should be 0; maybe you
> mean the allowed value is 0? If so, why have this property?

The property is defined by the generic phy bindings in
phy/phy-bindings.txt. I can add a pointer to it if you think that's
necessary.  The property should be 0 for our device because the phy
specifier in the dwc3 node carries no cells after the phandle, as shown
in the example.

dwc3@78c {
...
phys = <_phy_prim>;
phy-names = "usb2-phy";
}

And for that reason, we can use the generic .of_xlate implementation
of_phy_simple_xlate() provided by phy core.  There are some comments
in kernel doc of of_phy_simple_xlate() which might be helpful.
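
To make the zero-cell case concrete, here is a toy model in plain C
(not the phy core code; struct toy_phy, struct toy_phandle_args and
toy_simple_xlate() are invented for illustration, as are the node
names). With no cells after the phandle there is nothing to decode, so
the lookup only has to match the provider node. The args_count check is
an addition of this sketch to make the zero-cell assumption explicit.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in: one PHY per provider node, identified by node name. */
struct toy_phy {
	const char *node;
};

/* Toy stand-in for a parsed "phys = <&label cell0 cell1 ...>" entry. */
struct toy_phandle_args {
	const char *node;	/* provider node the phandle points at */
	int args_count;		/* number of cells after the phandle */
	unsigned int args[4];
};

/*
 * Return the PHY registered for the referenced provider node, or NULL
 * if the specifier carries cells (which a #phy-cells = <0> provider
 * does not expect) or no PHY matches the node.
 */
static struct toy_phy *toy_simple_xlate(struct toy_phy *phys, size_t nphys,
					const struct toy_phandle_args *spec)
{
	size_t i;

	if (spec->args_count != 0)
		return NULL;

	for (i = 0; i < nphys; i++)
		if (strcmp(phys[i].node, spec->node) == 0)
			return &phys[i];

	return NULL;
}
```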

Shawn


Re: [PATCH 2/2] arm64: defconfig: Enable some qcom remoteproc configs

2018-11-08 Thread Vinod Koul
On 08-11-18, 22:27, Bjorn Andersson wrote:
> On Thu 08 Nov 22:16 PST 2018, Vinod Koul wrote:
> 
> > From: Bjorn Andersson 
> > 
> > Enable remoteproc configs to boot the remoteprocs on QC chipsets. These
> > are common configs and not specific to a specific SoC so should be enabled
> > across the board.
> > 
> > Signed-off-by: Bjorn Andersson 
> > Signed-off-by: Vinod Koul 
> > ---
> >  arch/arm64/configs/defconfig | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index b02da2e7a39a..b38d92c96c81 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -614,9 +614,15 @@ CONFIG_TEGRA_IOMMU_SMMU=y
> >  CONFIG_ARM_SMMU=y
> >  CONFIG_ARM_SMMU_V3=y
> >  CONFIG_QCOM_IOMMU=y
> > +CONFIG_REMOTEPROC=m
> > +CONFIG_QCOM_ADSP_PIL=m
> 
> This is now CONFIG_QCOM_Q6V5_PAS
> 
> > +CONFIG_QCOM_Q6V5_PIL=m
> 
> and CONFIG_QCOM_Q6V5_MSS

Sure I will update this

-- 
~Vinod


Re: [LINUX PATCH v12 1/3] dt-bindings: mtd: arasan: Add device tree binding documentation

2018-11-08 Thread Boris Brezillon
On Fri, 9 Nov 2018 10:30:39 +0530
Naga Sureshkumar Relli  wrote:

> This patch adds the dts binding document for arasan nand flash controller
> 
> Signed-off-by: Naga Sureshkumar Relli 
> ---
> Changes in v12:
>  - Removed interrupt-parent description as it is implied as suggested by
>Rob Herring
>  - Added missing ';' as required
> Changes in v11:
>  - Updated compatible description as suggested by Boris
>  - Removed arasan-has-dma property
> Changes in v10:
>  - None
> Changes in v9:
>  - None
> Changes in v8:
>  - Updated compatible and clock-names as per Boris comments
> Changes in v7:
>  - Corrected the acronyms those should be in caps
> Changes in v6:
>  - Removed num-cs property
>  - Separated nandchip from nand controller
> Changes in v5:
>  - None
> Changes in v4:
>  - Added num-cs property
>  - Added clock support
> Changes in v3:
>  - None
> Changes in v2:
>  - None
> ---
>  .../devicetree/bindings/mtd/arasan_nand.txt| 32 
> ++
>  1 file changed, 32 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mtd/arasan_nand.txt
> 
> diff --git a/Documentation/devicetree/bindings/mtd/arasan_nand.txt 
> b/Documentation/devicetree/bindings/mtd/arasan_nand.txt
> new file mode 100644
> index 000..b522daf
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mtd/arasan_nand.txt
> @@ -0,0 +1,32 @@
> +Arasan NAND Flash Controller with ONFI 3.1 support
> +
> +Required properties:
> +- compatible:Should be "xlnx,zynqmp-nand", "arasan,nfc-v3p10"
> +- reg:   Memory map for module access
> +- interrupts:Should contain the interrupt for the device
> +- clock-name:List of input clocks - "sys", "flash"
> + (See clock bindings for details)
> +- clocks:Clock phandles (see clock bindings for details)
> +
> +Required properties for child node:
> +- nand-ecc-mode: see nand.txt

Why is it required? Can't you fall back to HW when this prop is missing?
Oh, and reg is not listed in the required props.
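
A minimal sketch of such a fallback, in plain C and detached from the
driver (enum toy_ecc_mode and toy_ecc_mode_or_default() are
hypothetical names, and only a subset of the mode strings is handled):
a missing property selects HW ECC, while an unrecognized string is
still reported as an error.

```c
#include <stddef.h>
#include <string.h>

enum toy_ecc_mode {
	TOY_ECC_NONE,
	TOY_ECC_SOFT,
	TOY_ECC_HW,
	TOY_ECC_INVALID,
};

/*
 * `prop` is the nand-ecc-mode string from the child node, or NULL when
 * the property is absent.
 */
static enum toy_ecc_mode toy_ecc_mode_or_default(const char *prop)
{
	if (!prop)
		return TOY_ECC_HW;	/* property missing: default to HW */
	if (strcmp(prop, "none") == 0)
		return TOY_ECC_NONE;
	if (strcmp(prop, "soft") == 0)
		return TOY_ECC_SOFT;
	if (strcmp(prop, "hw") == 0)
		return TOY_ECC_HW;
	return TOY_ECC_INVALID;		/* present but not recognized */
}
```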

> +
> +For NAND partition information please refer the below file
> +Documentation/devicetree/bindings/mtd/partition.txt
> +
> +Example:
> + nfc: nand@ff10 {
> + compatible = "xlnx,zynqmp-nand", "arasan,nfc-v3p10";
> + reg = <0x0 0xff10 0x1000>;
> + clock-name = "sys", "flash";
> + clocks = <_clk _clk>;
> + interrupt-parent = <>;
> + interrupts = <0 14 4>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + nand@0 {
> + reg = <0>;
> + nand-ecc-mode = "hw";
> + };
> + };



Re: [PATCH 2/2] arm64: defconfig: Enable some qcom remoteproc configs

2018-11-08 Thread Bjorn Andersson
On Thu 08 Nov 22:16 PST 2018, Vinod Koul wrote:

> From: Bjorn Andersson 
> 
> Enable remoteproc configs to boot the remoteprocs on QC chipsets. These
> are common configs and not specific to a specific SoC so should be enabled
> across the board.
> 
> Signed-off-by: Bjorn Andersson 
> Signed-off-by: Vinod Koul 
> ---
>  arch/arm64/configs/defconfig | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index b02da2e7a39a..b38d92c96c81 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -614,9 +614,15 @@ CONFIG_TEGRA_IOMMU_SMMU=y
>  CONFIG_ARM_SMMU=y
>  CONFIG_ARM_SMMU_V3=y
>  CONFIG_QCOM_IOMMU=y
> +CONFIG_REMOTEPROC=m
> +CONFIG_QCOM_ADSP_PIL=m

This is now CONFIG_QCOM_Q6V5_PAS

> +CONFIG_QCOM_Q6V5_PIL=m

and CONFIG_QCOM_Q6V5_MSS

> +CONFIG_QCOM_SYSMON=m
>  CONFIG_RPMSG_QCOM_GLINK_RPM=y
> +CONFIG_RPMSG_QCOM_GLINK_SMEM=m
>  CONFIG_RPMSG_QCOM_SMD=y
>  CONFIG_RASPBERRYPI_POWER=y
> +CONFIG_QCOM_GLINK_SSR=m
>  CONFIG_QCOM_SMEM=y
>  CONFIG_QCOM_SMD_RPM=y
>  CONFIG_QCOM_SMP2P=y

Apart from that this looks good.

Regards,
Bjorn


Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.

2018-11-08 Thread NeilBrown
On Thu, Nov 08 2018, J. Bruce Fields wrote:

> On Fri, Nov 09, 2018 at 11:38:19AM +1100, NeilBrown wrote:
>> On Thu, Nov 08 2018, J. Bruce Fields wrote:
>> 
>> > On Mon, Nov 05, 2018 at 12:30:48PM +1100, NeilBrown wrote:
>> >> When we find an existing lock which conflicts with a request,
>> >> and the request wants to wait, we currently add the request
>> >> to a list.  When the lock is removed, the whole list is woken.
>> >> This can cause the thundering-herd problem.
>> >> To reduce the problem, we make use of the (new) fact that
>> >> a pending request can itself have a list of blocked requests.
>> >> When we find a conflict, we look through the existing blocked requests.
>> >> If any one of them blocks the new request, the new request is attached
>> >> below that request, otherwise it is added to the list of blocked
>> >> requests, which are now known to be mutually non-conflicting.
>> >> 
>> >> This way, when the lock is released, only a set of non-conflicting
>> >> locks will be woken, the rest can stay asleep.
>> >> If the lock request cannot be granted and the request needs to be
>> >> requeued, all the other requests it blocks will then be woken
>> >
>> > So, to make sure I understand: the tree of blocking locks only ever has
>> > three levels (the active lock, the locks blocking on it, and their
>> > children?)
>> 
>> Not correct.
>> Blocking is only vertical, never horizontal.  Siblings never block each
>> other.
>> So if one process holds a lock on a byte, and 27 other processes want a
>> lock on that byte, then there will be 28 levels in a narrow tree - it is
>> effectively a queue.
>> Branching (via siblings) only happens when a child conflicts with only
>> part of the lock held by the parent.
>> So if one process locks 32K, then two other processes request locks on
>> the 2 16K halves, then 4 processes request locks on the 8K quarters, and
>> so on, then you could end up with 32767 processes in a binary tree, with
>> half of them all waiting on different individual bytes.
>
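The insertion rule described above can be sketched as follows. This is only an illustration, not the fs/locks code: the names LockRequest, conflicts, insert_blocked and depth are made up for the example, and every lock is assumed to be exclusive so that two requests conflict exactly when their byte ranges overlap.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class LockRequest:
    """A byte-range lock request; start and end are inclusive offsets."""
    owner: str
    start: int
    end: int
    blocked: List["LockRequest"] = field(default_factory=list)


def conflicts(a, b):
    # Assumption: all locks are exclusive, so requests from different
    # owners conflict exactly when their ranges overlap.
    return a.owner != b.owner and a.start <= b.end and b.start <= a.end


def insert_blocked(blocker, waiter):
    """Attach waiter below the deepest request that blocks it."""
    while True:
        for child in blocker.blocked:
            if conflicts(child, waiter):
                blocker = child  # descend one level and rescan
                break
        else:
            # No child conflicts, so waiter becomes a sibling of the
            # existing children, which it does not conflict with.
            blocker.blocked.append(waiter)
            return


def depth(node):
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(c) for c in node.blocked), default=0)


# One holder and 27 waiters on the same byte form a queue.
holder = LockRequest("p0", 0, 0)
for i in range(1, 28):
    insert_blocked(holder, LockRequest(f"p{i}", 0, 0))
print(depth(holder))  # 28
```

Requests for disjoint sub-ranges of the held range end up as siblings instead, which is where the branching comes from.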
> Maybe I should actually read the code carefully instead of just skimming
> the changelog and jumping to conclusions.
>
> I think this is correct, but I wish we had an actual written-out
> argument that it's correct, because intuition isn't a great guide for
> posix file locks.
>
> Maybe:
>
> Waiting and applied locks are all kept in trees whose properties are:
>
>   - the root of a tree may be an applied or unapplied lock.
>   - every other node in the tree is an unapplied lock that
> conflicts with every ancestor of that node.
>
> Every such tree begins life as an unapplied singleton which obviously
> satisfies the above properties.
>
> The only ways we modify trees preserve these properties:
>
>   1. We may add a new child, but only after first verifying that it
>  conflicts with all of its ancestors.
>   2. We may remove the root of a tree, creating a new singleton
>  tree from the root and N new trees rooted in the immediate
>  children.
>   3. If the root of a tree is not currently an applied lock, we may
>  apply it (if possible).
>   4. We may upgrade the root of the tree (either extend its range,
>  or upgrade its entire range from read to write).
>
> When an applied lock is modified in a way that reduces or downgrades any
> part of its range, we remove all its children (2 above).
>
> For each of those child trees: if the root of the tree applies, we do so
> (3).  If it doesn't, it must conflict with some applied lock.  We remove
> all of its children (2), and add it as a new leaf to the tree rooted in
> the applied lock (1).  We then repeat the process recursively with those
> children.
>
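A rough model of the release procedure described above, with the numbered steps marked in comments. It is a sketch under stated assumptions rather than the kernel implementation: LockRequest, conflicts, insert_blocked and release are hypothetical names, all locks are treated as exclusive, and "applied" is just a Python list of the locks currently held.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class LockRequest:
    """A byte-range lock request; start and end are inclusive offsets."""
    owner: str
    start: int
    end: int
    blocked: List["LockRequest"] = field(default_factory=list)


def conflicts(a, b):
    # Assumption: all locks are exclusive.
    return a.owner != b.owner and a.start <= b.end and b.start <= a.end


def insert_blocked(blocker, waiter):
    """(1) Add waiter as a leaf below every request it conflicts with."""
    while True:
        for child in blocker.blocked:
            if conflicts(child, waiter):
                blocker = child
                break
        else:
            blocker.blocked.append(waiter)
            return


def release(applied, lock):
    """Drop an applied lock and retry everything that waited on it."""
    applied.remove(lock)
    # (2) Removing the root turns each immediate child into its own tree.
    pending, lock.blocked = lock.blocked, []
    while pending:
        req = pending.pop(0)
        blocker = next((a for a in applied if conflicts(a, req)), None)
        if blocker is None:
            # (3) The root of this tree can be applied; its children
            # still conflict with it and stay attached.
            applied.append(req)
        else:
            # (2) Detach the children, (1) requeue req as a new leaf
            # under the conflicting applied lock, then retry the children.
            children, req.blocked = req.blocked, []
            insert_blocked(blocker, req)
            pending.extend(children)
```

In this model a child can be applied ahead of its requeued parent, which matches the description: once the parent has to wait again, its children are retried on their own.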

That's pretty thorough - and it even looks correct.
I'll re-read it some time when it isn't late, and maybe make it into a
comment in the code.
I agree, this sort of documentation can be quite helpful.

Thanks,
NeilBrown




Re: [PATCH 2/5] mtd: rawnand: qcom: remove driver specific block_markbad function

2018-11-08 Thread Abhishek Sahu

On 2018-11-04 21:26, Boris Brezillon wrote:

Hi Abhishek,

On Fri, 20 Jul 2018 15:03:48 +0200
Boris Brezillon  wrote:


On Fri, 20 Jul 2018 17:46:38 +0530
Abhishek Sahu  wrote:

> Hi Boris,
>
> On 2018-07-19 03:13, Boris Brezillon wrote:
> > On Wed, 18 Jul 2018 23:23:50 +0200
> > Miquel Raynal  wrote:
> >
> >> Boris,
> >>
> >> Can you please check the change in qcom_nandc_write_oob() is
> >> valid? I think it is but as this is a bit of a hack I prefer double
> >> checking.
> >
> > Indeed, it's hack-ish.
> >
> >>
> >> Thanks,
> >> Miquèl
> >>
> >>
> >> Abhishek Sahu  wrote on Fri,  6 Jul 2018
> >> 13:21:56 +0530:
> >>
> >> > The NAND base layer calls write_oob() by setting bytes at
> >> > chip->badblockpos with value non 0xFF for updating bad block status.
> >> > The QCOM NAND controller skips the bad block bytes while doing normal
> >> > write with ECC enabled. When initial support for this driver was
> >> > added, the driver specific function was added temporarily for
> >> > block_markbad() with assumption to change for raw read in NAND base
> >> > layer. Moving to raw read for block_markbad() seems to take more time
> >> > so this patch removes driver specific block_markbad() function by
> >> > using following HACK in write_oob() function.
> >> >
> >> > Check for BBM bytes in OOB and accordingly do raw write for updating
> >> > BBM bytes in NAND flash or normal write for updating available OOB
> >> > bytes.
> >
> > Why don't we change that instead of patching the qcom driver to guess
> > when the core tries to mark a block bad? If you're afraid of breaking
> > existing drivers that might rely on the "write/read BBM in non-raw
> > mode" solution (I'm sure some drivers are), you can always add a new
> > flag in chip->options (NAND_ACCESS_BBM_IN_RAW_MODE) and only use raw
> > accessors when this flag is set.
> >
>
>   We started with that Only
>
>   http://patchwork.ozlabs.org/patch/508565/
>
>   and since we didn't conclude, we went for driver
>   specific bad block check and mark bad block functions.
>
>   Now, we wanted to get rid of driver specific functions
>
>   1. For bad block check, we found the way to get the BBM bytes
>  with ECC read. Controller updates BBM in separate register
>  which we can read and update the same in OOB. Patch #1 of
>  series does the same.
>
>   2. For bad block mark, there is no way to update in ECC mode
>  that's why we went for HACK to get rid of driver specific
>  function.
>
>   If adding flag is fine now then this HACK won't be required.

Yep. I'm fine with that. Can you rebase the patch you pointed out on top
of nand/next and move the flag to chip->options instead of
chip->bbt_options + prefix it with NAND_ instead of NAND_BBT_?
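To illustrate the flag-based approach discussed above, here is a small Python model of the decision the core would make when marking a block bad. It is a sketch only: the thread proposes the flag name NAND_ACCESS_BBM_IN_RAW_MODE but not its value, and the Chip type, its fields and mark_bad are hypothetical stand-ins for the real rawnand structures.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical bit value; only the flag name comes from the discussion.
NAND_ACCESS_BBM_IN_RAW_MODE = 1 << 0


@dataclass
class Chip:
    options: int
    badblockpos: int
    oob_size: int
    write_oob: Callable[[int, bytes], None]      # OOB write with ECC enabled
    write_oob_raw: Callable[[int, bytes], None]  # OOB write bypassing ECC


def mark_bad(chip, page):
    """Write a bad block marker into the OOB area of the given page."""
    oob = bytearray(b"\xff" * chip.oob_size)
    oob[chip.badblockpos] = 0x00  # any non-0xFF value marks the block bad
    if chip.options & NAND_ACCESS_BBM_IN_RAW_MODE:
        # Controllers that skip the BBM bytes in ECC mode opt in to
        # the raw accessor.
        chip.write_oob_raw(page, bytes(oob))
    else:
        # Existing drivers keep the non-raw behaviour they rely on.
        chip.write_oob(page, bytes(oob))
```

With this shape a driver would only set the flag in its options, and the write_oob() path would no longer need to guess when the core is marking a block bad.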


I'm currently trying to get rid of chip->block_bad() (now placed in
chip->legacy.block_bad()), and I wanted to know if you were still
planning to submit the changes we discussed in this thread. If you
don't have time, please let me know and I'll try to do it.



 Sorry Boris, I couldn't work on these patches.

 Currently, I am working on non-open-source projects, so
 I can't submit any patches to open source until this project
 is completed, due to legal guidelines.

 If this is urgent then you can try. I will help with
 QCOM-related work and testing.

 Thanks,
 Abhishek


