Re: [PATCH v5 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-24 Thread Kirill A. Shutemov
On Mon, Jul 24, 2017 at 11:06:12AM -0600, Ross Zwisler wrote:
> @@ -1658,14 +1658,35 @@ static int insert_pfn(struct vm_area_struct *vma, 
> unsigned long addr,
>   if (!pte)
>   goto out;
>   retval = -EBUSY;
> - if (!pte_none(*pte))
> - goto out_unlock;
> + if (!pte_none(*pte)) {
> + if (mkwrite) {
> + /*
> +  * For read faults on private mappings the PFN passed
> +  * in may not match the PFN we have mapped if the
> +  * mapped PFN is a writeable COW page.  In the mkwrite
> +  * case we are creating a writable PTE for a shared
> +  * mapping and we expect the PFNs to match.
> +  */

Can we?

I guess it's up to the filesystem whether it wants to reuse the same spot to
write data or not. I think your assumption works for ext4 and xfs. I wouldn't
be that sure for btrfs or other filesystems with CoW support.

-- 
 Kirill A. Shutemov


Re: [PATCH] kthread: Fix documentation build warning

2017-07-24 Thread Jonathan Corbet
On Mon, 24 Jul 2017 14:24:44 -0700
Randy Dunlap  wrote:

> > + * @arg...: arguments for @namefmt.
> >   *  
> 
> Hm, Documentation/doc-guide/kernel-doc.rst says:
> If a function parameter is ``...`` (varargs), it should be listed
> in kernel-doc notation as: ``@...:``.
> 
> but the patch here is for a macro, not a function.
> Does that make a difference?

Macros are a little different.  I've tried a couple of times to figure out
and rationalize the "..." handling, should maybe do so again.  Meanwhile,
this makes the warning go away, enough for one day :)
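
For reference, the convention in question looks like this when applied to a
varargs comment (a minimal sketch using a made-up macro name, shown only to
illustrate the @...: notation; whether scripts/kernel-doc parses the macro
case identically is exactly the open question):

  /**
   * example_pr_note - log a formatted note (hypothetical example macro)
   * @fmt: printf-style format string
   * @...: arguments for @fmt
   */
  #define example_pr_note(fmt, ...) printk(KERN_INFO fmt, ##__VA_ARGS__)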

Thanks,

jon


Re: [PATCH] kthread: Fix documentation build warning

2017-07-24 Thread Randy Dunlap
On 07/24/2017 12:59 PM, Jonathan Corbet wrote:
> The kerneldoc comment for kthread_create() had an incorrect argument name,
> leading to a warning in the docs build.  Correct it, and make one more
> small step toward a warning-free build.
> 
> Signed-off-by: Jonathan Corbet 
> ---
>  include/linux/kthread.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 4fec8b775895..82e197eeac91 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -15,7 +15,7 @@ struct task_struct *kthread_create_on_node(int 
> (*threadfn)(void *data),
>   * @threadfn: the function to run in the thread
>   * @data: data pointer for @threadfn()
>   * @namefmt: printf-style format string for the thread name
> - * @...: arguments for @namefmt.
> + * @arg...: arguments for @namefmt.
>   *

Hm, Documentation/doc-guide/kernel-doc.rst says:
If a function parameter is ``...`` (varargs), it should be listed
in kernel-doc notation as: ``@...:``.

but the patch here is for a macro, not a function.
Does that make a difference?

>   * This macro will create a kthread on the current node, leaving it in
>   * the stopped state.  This is just a helper for kthread_create_on_node();
> 


-- 
~Randy


Re: [PATCH 00/13] net: dsa: lan9303: unicast offload, fdb,mdb,STP

2017-07-24 Thread David Miller
From: Egil Hjelmeland 
Date: Mon, 24 Jul 2017 16:47:51 +0200

> These are my first patches submitted to the kernel, so I am looking
> forward to comments.

Please clean up how the dates are handled in your submission.

They are all over the place, over a period of 3 days.

Instead, they should be consecutive, near the moment the patch is
submitted.

We manage patches in patchwork, and there the patches are ordered in
my queue based upon date.  So instead of a nice clean order of changes
showing up recently at the top of my queue, yours got mixed in deep
near the bottom of the queue, intermixed with other unrelated changes.

This seriously makes things more difficult for me.

The best thing to do is to apply your series into a fresh tree (which
you pretty much _MUST_ do anyways, to make sure your changes apply,
build and work properly in my GIT tree, right?) and then extract those
commits for your patch series emails.

You must also say in your subject line which of my two GIT networking
trees ('net' or 'net-next') your changes are targeting.  If you don't
know, you need to figure that out before submitting.

I'm not applying this series until you fix your process up.

Thank you.


[PATCH] kthread: Fix documentation build warning

2017-07-24 Thread Jonathan Corbet
The kerneldoc comment for kthread_create() had an incorrect argument name,
leading to a warning in the docs build.  Correct it, and make one more
small step toward a warning-free build.

Signed-off-by: Jonathan Corbet 
---
 include/linux/kthread.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 4fec8b775895..82e197eeac91 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -15,7 +15,7 @@ struct task_struct *kthread_create_on_node(int 
(*threadfn)(void *data),
  * @threadfn: the function to run in the thread
  * @data: data pointer for @threadfn()
  * @namefmt: printf-style format string for the thread name
- * @...: arguments for @namefmt.
+ * @arg...: arguments for @namefmt.
  *
  * This macro will create a kthread on the current node, leaving it in
  * the stopped state.  This is just a helper for kthread_create_on_node();
-- 
2.9.4



[PATCH] sched/wait: Clean up some documentation warnings

2017-07-24 Thread Jonathan Corbet
A couple of kerneldoc comments in <linux/wait.h> had incorrect names for
macro parameters, with this unsightly result:

  ./include/linux/wait.h:555: warning: No description found for parameter 'wq'
  ./include/linux/wait.h:555: warning: Excess function parameter 'wq_head' 
description in 'wait_event_interruptible_hrtimeout'
  ./include/linux/wait.h:759: warning: No description found for parameter 
'wq_head'
  ./include/linux/wait.h:759: warning: Excess function parameter 'wq' 
description in 'wait_event_killable'

Correct the comments and kill the warnings.

Signed-off-by: Jonathan Corbet 
---
 include/linux/wait.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index b289c96151ee..5b74e36c0ca8 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -529,13 +529,13 @@ do {  
\
 
 /**
  * wait_event_interruptible_hrtimeout - sleep until a condition gets true or a 
timeout elapses
- * @wq_head: the waitqueue to wait on
+ * @wq: the waitqueue to wait on
  * @condition: a C expression for the event to wait for
  * @timeout: timeout, as a ktime_t
  *
  * The process is put to sleep (TASK_INTERRUPTIBLE) until the
  * @condition evaluates to true or a signal is received.
- * The @condition is checked each time the waitqueue @wq_head is woken up.
+ * The @condition is checked each time the waitqueue @wq is woken up.
  *
  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.
@@ -735,12 +735,12 @@ extern int do_wait_intr_irq(wait_queue_head_t *, 
wait_queue_entry_t *);
 
 /**
  * wait_event_killable - sleep until a condition gets true
- * @wq: the waitqueue to wait on
+ * @wq_head: the waitqueue to wait on
  * @condition: a C expression for the event to wait for
  *
  * The process is put to sleep (TASK_KILLABLE) until the
  * @condition evaluates to true or a signal is received.
- * The @condition is checked each time the waitqueue @wq is woken up.
+ * The @condition is checked each time the waitqueue @wq_head is woken up.
  *
  * wake_up() has to be called after changing any variable that could
  * change the result of the wait condition.
-- 
2.9.4



[PATCH] sched/core: Fix some documentation build warnings

2017-07-24 Thread Jonathan Corbet
The kerneldoc comments for try_to_wake_up_local() were out of date, leading
to these documentation build warnings:

  ./kernel/sched/core.c:2080: warning: No description found for parameter 'rf'
  ./kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' 
description in 'try_to_wake_up_local'

Update the comment to reflect current reality and give us some peace and
quiet.

Signed-off-by: Jonathan Corbet 
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17c667b427b4..0869b20fba81 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2069,7 +2069,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, 
int wake_flags)
 /**
  * try_to_wake_up_local - try to wake up a local task with rq lock held
  * @p: the thread to be awakened
- * @cookie: context's cookie for pinning
+ * @rf: request-queue flags for pinning
  *
  * Put @p on the run-queue if it's not already there. The caller must
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
-- 
2.9.4



[PATCH] mod_devicetable.h: Fix docs build warnings

2017-07-24 Thread Jonathan Corbet
Commit 0afef45654ae908536278ecb143ded5bbc713391 (staging: fsl-mc: add
support for device table matching) added kerneldoc comments for two
nonexistent structure fields, leading to these warnings in the docs build:

  ./include/linux/mod_devicetable.h:687: warning: Excess 
struct/union/enum/typedef member 'ver_major' description in 'fsl_mc_device_id'
  ./include/linux/mod_devicetable.h:687: warning: Excess 
struct/union/enum/typedef member 'ver_minor' description in 'fsl_mc_device_id'

Remove the offending lines to make the docs build a bit quieter.

CC: Stuart Yoder 
Signed-off-by: Jonathan Corbet 
---
Greg, you seem as likely a person as any to take this one...?

 include/linux/mod_devicetable.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 3f74ef2281e8..694cebb50f72 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -674,8 +674,6 @@ struct ulpi_device_id {
  * struct fsl_mc_device_id - MC object device identifier
  * @vendor: vendor ID
  * @obj_type: MC object type
- * @ver_major: MC object version major number
- * @ver_minor: MC object version minor number
  *
  * Type of entries in the "device Id" table for MC object devices supported by
  * a MC object device driver. The last entry of the table has vendor set to 0x0
-- 
2.9.4



[PATCH v5 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-24 Thread Ross Zwisler
To be able to use the common 4k zero page in DAX we need to have our PTE
fault path look more like our PMD fault path where a PTE entry can be
marked as dirty and writeable as it is first inserted rather than waiting
for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.

Right now we can rely on having a dax_pfn_mkwrite() call because we can
distinguish between these two cases in do_wp_page():

case 1: 4k zero page => writable DAX storage
case 2: read-only DAX storage => writeable DAX storage

This distinction is made via vm_normal_page().  vm_normal_page() returns
false for the common 4k zero page, though, just as it does for DAX ptes.
Instead of special casing the DAX + 4k zero page case we will simplify our
DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
get rid of the dax_pfn_mkwrite() helper.  We will instead use
dax_iomap_fault() to handle write-protection faults.

This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set insert_pfn()
will do the work that was previously done by wp_page_reuse() as part of the
dax_pfn_mkwrite() call path.
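
To make the intended use concrete, here is a caller-side sketch (illustrative
only; the 'write' flag stands in for the real fs/dax.c decision logic in the
later patches, and both helpers return 0 or a negative errno):

	static int dax_insert_pte_sketch(struct vm_fault *vmf, pfn_t pfn,
					 bool write)
	{
		if (write)
			/* PTE is made dirty and writeable as it is inserted */
			return vm_insert_mixed_mkwrite(vmf->vma, vmf->address,
						       pfn);

		/* read fault: insert a clean, read-only mapping as before */
		return vm_insert_mixed(vmf->vma, vmf->address, pfn);
	}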

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 include/linux/mm.h |  2 ++
 mm/memory.c| 50 +++---
 2 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5..483e84c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2293,6 +2293,8 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
unsigned long addr,
unsigned long pfn, pgprot_t pgprot);
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
pfn_t pfn);
+int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
+   pfn_t pfn);
 int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned 
long len);
 
 
diff --git a/mm/memory.c b/mm/memory.c
index 0e517be..b29dd42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1646,7 +1646,7 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
long addr,
 EXPORT_SYMBOL(vm_insert_page);
 
 static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
-   pfn_t pfn, pgprot_t prot)
+   pfn_t pfn, pgprot_t prot, bool mkwrite)
 {
struct mm_struct *mm = vma->vm_mm;
int retval;
@@ -1658,14 +1658,35 @@ static int insert_pfn(struct vm_area_struct *vma, 
unsigned long addr,
if (!pte)
goto out;
retval = -EBUSY;
-   if (!pte_none(*pte))
-   goto out_unlock;
+   if (!pte_none(*pte)) {
+   if (mkwrite) {
+   /*
+* For read faults on private mappings the PFN passed
+* in may not match the PFN we have mapped if the
+* mapped PFN is a writeable COW page.  In the mkwrite
+* case we are creating a writable PTE for a shared
+* mapping and we expect the PFNs to match.
+*/
+   if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
+   goto out_unlock;
+   entry = *pte;
+   goto out_mkwrite;
+   } else
+   goto out_unlock;
+   }
 
/* Ok, finally just insert the thing.. */
if (pfn_t_devmap(pfn))
entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
else
entry = pte_mkspecial(pfn_t_pte(pfn, prot));
+
+out_mkwrite:
+   if (mkwrite) {
+   entry = pte_mkyoung(entry);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   }
+
set_pte_at(mm, addr, pte, entry);
update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
 
@@ -1736,14 +1757,15 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
unsigned long addr,
 
track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV));
 
-   ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot);
+   ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot,
+   false);
 
return ret;
 }
 EXPORT_SYMBOL(vm_insert_pfn_prot);
 
-int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
-   pfn_t pfn)
+static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+   pfn_t pfn, bool mkwrite)
 {
pgprot_t pgprot = vma->vm_page_prot;
 
@@ -1772,10 +1794,24 @@ int vm_insert_mixed(struct vm_area_struct *vma, 
unsigned long addr,
page = pfn_to_page(pfn_t_to_pfn(pfn));
return insert_page(vma, addr, page, pgprot);
}
-   return insert_pfn(vma, addr, pfn, pgprot);
+   

[PATCH v5 4/5] dax: remove DAX code from page_cache_tree_insert()

2017-07-24 Thread Ross Zwisler
Now that we no longer insert struct page pointers in DAX radix trees we can
remove the special casing for DAX in page_cache_tree_insert().  This also
allows us to make dax_wake_mapping_entry_waiter() local to fs/dax.c,
removing it from dax.h.

Signed-off-by: Ross Zwisler 
Suggested-by: Jan Kara 
Reviewed-by: Jan Kara 
---
 fs/dax.c|  2 +-
 include/linux/dax.h |  2 --
 mm/filemap.c| 13 ++---
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 8f8bcb8..a0484a1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -127,7 +127,7 @@ static int wake_exceptional_entry_func(wait_queue_entry_t 
*wait, unsigned int mo
  * correct waitqueue where tasks might be waiting for that old 'entry' and
  * wake them.
  */
-void dax_wake_mapping_entry_waiter(struct address_space *mapping,
+static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
pgoff_t index, void *entry, bool wake_all)
 {
struct exceptional_entry_key key;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 29cced8..afa99bb 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -122,8 +122,6 @@ int dax_iomap_fault(struct vm_fault *vmf, enum 
page_entry_size pe_size,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
  pgoff_t index);
-void dax_wake_mapping_entry_waiter(struct address_space *mapping,
-   pgoff_t index, void *entry, bool wake_all);
 
 #ifdef CONFIG_FS_DAX
 int __dax_zero_page_range(struct block_device *bdev,
diff --git a/mm/filemap.c b/mm/filemap.c
index a497024..1bf1265 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -130,17 +130,8 @@ static int page_cache_tree_insert(struct address_space 
*mapping,
return -EEXIST;
 
mapping->nrexceptional--;
-   if (!dax_mapping(mapping)) {
-   if (shadowp)
-   *shadowp = p;
-   } else {
-   /* DAX can replace empty locked entry with a hole */
-   WARN_ON_ONCE(p !=
-   dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
-   /* Wakeup waiters for exceptional entry lock */
-   dax_wake_mapping_entry_waiter(mapping, page->index, p,
- true);
-   }
+   if (shadowp)
+   *shadowp = p;
}
__radix_tree_replace(&mapping->page_tree, node, slot, page,
 workingset_update_node, mapping);
-- 
2.9.4



[PATCH v5 5/5] dax: move all DAX radix tree defs to fs/dax.c

2017-07-24 Thread Ross Zwisler
Now that we no longer insert struct page pointers in DAX radix trees the
page cache code no longer needs to know anything about DAX exceptional
entries.  Move all the DAX exceptional entry definitions from dax.h to
fs/dax.c.

Signed-off-by: Ross Zwisler 
Suggested-by: Jan Kara 
Reviewed-by: Jan Kara 
---
 fs/dax.c| 34 ++
 include/linux/dax.h | 41 -
 2 files changed, 34 insertions(+), 41 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a0484a1..75760f7 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -54,6 +54,40 @@ static int __init init_dax_wait_table(void)
 }
 fs_initcall(init_dax_wait_table);
 
+/*
+ * We use lowest available bit in exceptional entry for locking, one bit for
+ * the entry size (PMD) and two more to tell us if the entry is a zero page or
+ * an empty entry that is just used for locking.  In total four special bits.
+ *
+ * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE
+ * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem
+ * block allocation.
+ */
+#define RADIX_DAX_SHIFT        (RADIX_TREE_EXCEPTIONAL_SHIFT + 4)
+#define RADIX_DAX_ENTRY_LOCK   (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
+#define RADIX_DAX_PMD          (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
+#define RADIX_DAX_ZERO_PAGE    (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
+#define RADIX_DAX_EMPTY        (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3))
+
+static unsigned long dax_radix_sector(void *entry)
+{
+   return (unsigned long)entry >> RADIX_DAX_SHIFT;
+}
+
+static void *dax_radix_locked_entry(sector_t sector, unsigned long flags)
+{
+   return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
+   ((unsigned long)sector << RADIX_DAX_SHIFT) |
+   RADIX_DAX_ENTRY_LOCK);
+}
+
+static unsigned int dax_radix_order(void *entry)
+{
+   if ((unsigned long)entry & RADIX_DAX_PMD)
+   return PMD_SHIFT - PAGE_SHIFT;
+   return 0;
+}
+
 static int dax_is_pmd_entry(void *entry)
 {
return (unsigned long)entry & RADIX_DAX_PMD;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index afa99bb..d0e3272 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -88,33 +88,6 @@ void dax_flush(struct dax_device *dax_dev, pgoff_t pgoff, 
void *addr,
size_t size);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 
-/*
- * We use lowest available bit in exceptional entry for locking, one bit for
- * the entry size (PMD) and two more to tell us if the entry is a zero page or
- * an empty entry that is just used for locking.  In total four special bits.
- *
- * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE
- * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem
- * block allocation.
- */
-#define RADIX_DAX_SHIFT        (RADIX_TREE_EXCEPTIONAL_SHIFT + 4)
-#define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
-#define RADIX_DAX_PMD (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
-#define RADIX_DAX_ZERO_PAGE (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
-#define RADIX_DAX_EMPTY (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3))
-
-static inline unsigned long dax_radix_sector(void *entry)
-{
-   return (unsigned long)entry >> RADIX_DAX_SHIFT;
-}
-
-static inline void *dax_radix_locked_entry(sector_t sector, unsigned long 
flags)
-{
-   return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
-   ((unsigned long)sector << RADIX_DAX_SHIFT) |
-   RADIX_DAX_ENTRY_LOCK);
-}
-
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
 int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
@@ -136,20 +109,6 @@ static inline int __dax_zero_page_range(struct 
block_device *bdev,
 }
 #endif
 
-#ifdef CONFIG_FS_DAX_PMD
-static inline unsigned int dax_radix_order(void *entry)
-{
-   if ((unsigned long)entry & RADIX_DAX_PMD)
-   return PMD_SHIFT - PAGE_SHIFT;
-   return 0;
-}
-#else
-static inline unsigned int dax_radix_order(void *entry)
-{
-   return 0;
-}
-#endif
-
 static inline bool dax_mapping(struct address_space *mapping)
 {
return mapping->host && IS_DAX(mapping->host);
-- 
2.9.4



[PATCH v5 3/5] dax: use common 4k zero page for dax mmap reads

2017-07-24 Thread Ross Zwisler
When servicing mmap() reads from file holes the current DAX code allocates
a page cache page of all zeroes and places the struct page pointer in the
mapping->page_tree radix tree.  This has three major drawbacks:

1) It consumes memory unnecessarily.  For every 4k page that is read via a
DAX mmap() over a hole, we allocate a new page cache page.  This means that
if you read 1GiB worth of pages, you end up using 1GiB of zeroed memory.
This is easily visible by looking at the overall memory consumption of the
system or by looking at /proc/[pid]/smaps:

7f62e72b3000-7f63272b3000 rw-s  103:00 12   /root/dax/data
Size:1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean:   1048576 kB
Private_Dirty: 0 kB
Referenced:  1048576 kB
Anonymous: 0 kB
LazyFree:  0 kB
AnonHugePages: 0 kB
ShmemPmdMapped:0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Locked:0 kB

2) It is slower than using a common zero page because each page fault has
more work to do.  Instead of just inserting a common zero page we have to
allocate a page cache page, zero it, and then insert it.  Here are the
average latencies of dax_load_hole() as measured by ftrace on a random test
box:

Old method, using zeroed page cache pages:  3.4 us
New method, using the common 4k zero page:  0.8 us

This was the average latency over 1 GiB of sequential reads done by this
simple fio script:

  [global]
  size=1G
  filename=/root/dax/data
  fallocate=none
  [io]
  rw=read
  ioengine=mmap

3) The fact that we had to check for both DAX exceptional entries and for
page cache pages in the radix tree made the DAX code more complex.

Solve these issues by following the lead of the DAX PMD code and using a
common 4k zero page instead.  As with the PMD code we will now insert a DAX
exceptional entry into the radix tree instead of a struct page pointer
which allows us to remove all the special casing in the DAX code.
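
In rough terms, the hole-read path then looks like the sketch below (heavily
simplified from the actual dax_load_hole() in this series; the radix tree
bookkeeping done via dax_insert_mapping_entry() is elided):

	/* Sketch: service a read fault over a hole with the shared zero
	 * page.  The real code also records a RADIX_DAX_ZERO_PAGE entry in
	 * the mapping's radix tree before inserting the PTE. */
	static int dax_load_hole_sketch(struct vm_fault *vmf)
	{
		struct page *zero_page = ZERO_PAGE(0);

		if (vm_insert_mixed(vmf->vma, vmf->address,
				    page_to_pfn_t(zero_page)))
			return VM_FAULT_SIGBUS;

		return VM_FAULT_NOPAGE;
	}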

Note that we do still pretty aggressively check for regular pages in the
DAX radix tree, especially where we take action based on the bits set in
the page.  If we ever find a regular page in our radix tree now that most
likely means that someone besides DAX is inserting pages (which has
happened lots of times in the past), and we want to find that out early and
fail loudly.

This solution also removes the extra memory consumption.  Here is that same
/proc/[pid]/smaps after 1GiB of reading from a hole with the new code:

7f2054a74000-7f2094a74000 rw-s  103:00 12   /root/dax/data
Size:1048576 kB
Rss:   0 kB
Pss:   0 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced:0 kB
Anonymous: 0 kB
LazyFree:  0 kB
AnonHugePages: 0 kB
ShmemPmdMapped:0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Locked:0 kB

Overall system memory consumption is similarly improved.

Another major change is that we remove dax_pfn_mkwrite() from our fault
flow, and instead rely on the page fault itself to make the PTE dirty and
writeable.  The following description from the patch adding the
vm_insert_mixed_mkwrite() call explains this a little more:

***
To be able to use the common 4k zero page in DAX we need to have our
PTE fault path look more like our PMD fault path where a PTE entry can
be marked as dirty and writeable as it is first inserted rather than
waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault()
call.

Right now we can rely on having a dax_pfn_mkwrite() call because we can
distinguish between these two cases in do_wp_page():

case 1: 4k zero page => writable DAX storage
case 2: read-only DAX storage => writeable DAX storage

This distinction is made via vm_normal_page().  vm_normal_page()
returns false for the common 4k zero page, though, just as it does for
DAX ptes.  Instead of special casing the DAX + 4k zero page case we
will simplify our DAX PTE page fault sequence so that it matches our
DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.  We will
instead use dax_iomap_fault() to handle write-protection faults.

This means that 

[PATCH v5 2/5] dax: relocate some dax functions

2017-07-24 Thread Ross Zwisler
dax_load_hole() will soon need to call dax_insert_mapping_entry(), so it
needs to be moved lower in dax.c so the definition exists.

dax_wake_mapping_entry_waiter() will soon be removed from dax.h and be made
static to dax.c, so we need to move its definition above all its callers.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 138 +++
 1 file changed, 69 insertions(+), 69 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 306c2b6..197067f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -121,6 +121,31 @@ static int wake_exceptional_entry_func(wait_queue_entry_t 
*wait, unsigned int mo
 }
 
 /*
+ * We do not necessarily hold the mapping->tree_lock when we call this
+ * function so it is possible that 'entry' is no longer a valid item in the
+ * radix tree.  This is okay because all we really need to do is to find the
+ * correct waitqueue where tasks might be waiting for that old 'entry' and
+ * wake them.
+ */
+void dax_wake_mapping_entry_waiter(struct address_space *mapping,
+   pgoff_t index, void *entry, bool wake_all)
+{
+   struct exceptional_entry_key key;
+   wait_queue_head_t *wq;
+
+   wq = dax_entry_waitqueue(mapping, index, entry, &key);
+
+   /*
+* Checking for locked entry and prepare_to_wait_exclusive() happens
+* under mapping->tree_lock, ditto for entry handling in our callers.
+* So at this point all tasks that could have seen our entry locked
+* must be in the waitqueue and the following check will see them.
+*/
+   if (waitqueue_active(wq))
+   __wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
+}
+
+/*
  * Check whether the given slot is locked. The function must be called with
  * mapping->tree_lock held
  */
@@ -392,31 +417,6 @@ static void *grab_mapping_entry(struct address_space 
*mapping, pgoff_t index,
return entry;
 }
 
-/*
- * We do not necessarily hold the mapping->tree_lock when we call this
- * function so it is possible that 'entry' is no longer a valid item in the
- * radix tree.  This is okay because all we really need to do is to find the
- * correct waitqueue where tasks might be waiting for that old 'entry' and
- * wake them.
- */
-void dax_wake_mapping_entry_waiter(struct address_space *mapping,
-   pgoff_t index, void *entry, bool wake_all)
-{
-   struct exceptional_entry_key key;
-   wait_queue_head_t *wq;
-
-   wq = dax_entry_waitqueue(mapping, index, entry, &key);
-
-   /*
-* Checking for locked entry and prepare_to_wait_exclusive() happens
-* under mapping->tree_lock, ditto for entry handling in our callers.
-* So at this point all tasks that could have seen our entry locked
-* must be in the waitqueue and the following check will see them.
-*/
-   if (waitqueue_active(wq))
-   __wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
-}
-
 static int __dax_invalidate_mapping_entry(struct address_space *mapping,
  pgoff_t index, bool trunc)
 {
@@ -468,50 +468,6 @@ int dax_invalidate_mapping_entry_sync(struct address_space 
*mapping,
return __dax_invalidate_mapping_entry(mapping, index, false);
 }
 
-/*
- * The user has performed a load from a hole in the file.  Allocating
- * a new page in the file would cause excessive storage usage for
- * workloads with sparse files.  We allocate a page cache page instead.
- * We'll kick it out of the page cache if it's ever written to,
- * otherwise it will simply fall out of the page cache under memory
- * pressure without ever having been dirtied.
- */
-static int dax_load_hole(struct address_space *mapping, void **entry,
-struct vm_fault *vmf)
-{
-   struct inode *inode = mapping->host;
-   struct page *page;
-   int ret;
-
-   /* Hole page already exists? Return it...  */
-   if (!radix_tree_exceptional_entry(*entry)) {
-   page = *entry;
-   goto finish_fault;
-   }
-
-   /* This will replace locked radix tree entry with a hole page */
-   page = find_or_create_page(mapping, vmf->pgoff,
-  vmf->gfp_mask | __GFP_ZERO);
-   if (!page) {
-   ret = VM_FAULT_OOM;
-   goto out;
-   }
-
-finish_fault:
-   vmf->page = page;
-   ret = finish_fault(vmf);
-   vmf->page = NULL;
-   *entry = page;
-   if (!ret) {
-   /* Grab reference for PTE that is now referencing the page */
-   get_page(page);
-   ret = VM_FAULT_NOPAGE;
-   }
-out:
-   trace_dax_load_hole(inode, vmf, ret);
-   return ret;
-}
-
 static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
sector_t sector, size_t size, struct page *to,
unsigned long vaddr)
@@ -938,6 +894,50 @@ int 

[PATCH v5 0/5] DAX common 4k zero page

2017-07-24 Thread Ross Zwisler
Changes since v4:
 - Added static __vm_insert_mixed() to mm/memory.c that holds the common
   code for both vm_insert_mixed() and vm_insert_mixed_mkwrite() so we
   don't have duplicate code and we don't have to pass boolean flags
   around; a rough sketch of the split appears after this list.  (Dan & Jan)

 - Added a comment for the PFN sanity checking done in the mkwrite case of
   insert_pfn().

 - Added Jan's reviewed-by tags.
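
A rough sketch of that __vm_insert_mixed() split (illustrative only; the real
code lives in patch 1/5 of this series and also keeps the insert_page() path
for pfns that have a struct page):

	static int __vm_insert_mixed(struct vm_area_struct *vma,
				     unsigned long addr, pfn_t pfn,
				     bool mkwrite)
	{
		/* ...the existing vm_insert_mixed() checks go here... */
		return insert_pfn(vma, addr, pfn, vma->vm_page_prot, mkwrite);
	}

	int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
			    pfn_t pfn)
	{
		return __vm_insert_mixed(vma, addr, pfn, false);
	}

	int vm_insert_mixed_mkwrite(struct vm_area_struct *vma,
				    unsigned long addr, pfn_t pfn)
	{
		return __vm_insert_mixed(vma, addr, pfn, true);
	}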

This series has passed a full xfstests run on both XFS and ext4.

---

When servicing mmap() reads from file holes the current DAX code allocates
a page cache page of all zeroes and places the struct page pointer in the
mapping->page_tree radix tree.  This has three major drawbacks:

1) It consumes memory unnecessarily.  For every 4k page that is read via a
DAX mmap() over a hole, we allocate a new page cache page.  This means that
if you read 1GiB worth of pages, you end up using 1GiB of zeroed memory.

2) It is slower than using a common zero page because each page fault has
more work to do.  Instead of just inserting a common zero page we have to
allocate a page cache page, zero it, and then insert it.

3) The fact that we had to check for both DAX exceptional entries and for
page cache pages in the radix tree made the DAX code more complex.

This series solves these issues by following the lead of the DAX PMD code
and using a common 4k zero page instead.  This reduces memory usage and
decreases latencies for some workloads, and it simplifies the DAX code,
removing over 100 lines in total.

Ross Zwisler (5):
  mm: add vm_insert_mixed_mkwrite()
  dax: relocate some dax functions
  dax: use common 4k zero page for dax mmap reads
  dax: remove DAX code from page_cache_tree_insert()
  dax: move all DAX radix tree defs to fs/dax.c

 Documentation/filesystems/dax.txt |   5 +-
 fs/dax.c  | 345 --
 fs/ext2/file.c|  25 +--
 fs/ext4/file.c|  32 +---
 fs/xfs/xfs_file.c |   2 +-
 include/linux/dax.h   |  45 -
 include/linux/mm.h|   2 +
 include/trace/events/fs_dax.h |   2 -
 mm/filemap.c  |  13 +-
 mm/memory.c   |  50 +-
 10 files changed, 196 insertions(+), 325 deletions(-)

-- 
2.9.4



Re: [PATCH 13/13] net: dsa: lan9303: lan9303_port_mdb_del remove port 0

2017-07-24 Thread Florian Fainelli
On 07/20/2017 06:57 AM, Egil Hjelmeland wrote:
> Workaround for dsa_switch_mdb_add adding CPU port to group,
> but forgetting to remove it:
> 
> Remove port 0 if port 0 is the only port left.
> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index 54646eb38185..61c915eed649 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1424,6 +1424,17 @@ static int lan9303_port_mdb_del(
>   if (mdb->vid)
>   return -EOPNOTSUPP;
>   lan9303_alr_del_port(chip, mdb->addr, port);
> +
> + {

No need for curly braces here.

> + /* Workaround for dsa_switch_mdb_add adding CPU port to
> +  * group, but forgetting to remove it. Remove port 0
> +  * if it is the only port left

Should we not move this logic one layer up into DSA, so that
insertions and removals are strictly symmetrical in which and how many
ports are targeted?

> +  */
> + struct lan9303_alr_cache_entry *entr =
> + lan9303_alr_cache_find_mac(chip, mdb->addr);
> + if (entr && (entr->port_map == BIT(0)))
> + lan9303_alr_del_port(chip, mdb->addr, 0);
> + }
>   return 0;
>  }
>  
> 


-- 
Florian


Re: [PATCH 12/13] net: dsa: lan9303: Added "stp_enable" sysfs attribute

2017-07-24 Thread Florian Fainelli
On 07/20/2017 06:42 AM, Egil Hjelmeland wrote:
> Must be set to 1 by user space when STP is used on the lan9303.
> If bridging without local STP, leave at 0, so external STP BPDUs
> are forwarded.
> 
> Hopefully the kernel can be improved so the driver can handle this
> without user intervention, and this control can be removed.

Same here: we can't have a driver-specific sysfs attribute just for
this. Either we find a way to have the bridge's STP settings propagate
correctly to the switch driver, or you have to make better decisions
based on the hints/calls you are getting from switchdev -> dsa -> driver.
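
For example, DSA already exposes a .port_stp_state_set operation that the
bridge layer drives through switchdev; a rough sketch of keying the BPDU trap
off that hook instead of a sysfs knob (illustrative only: it reuses the helper
names from this series and ignores that the trap is global rather than per
port):

	static void lan9303_port_stp_state_set(struct dsa_switch *ds, int port,
					       u8 state)
	{
		struct lan9303 *chip = ds->priv;

		if (state == BR_STATE_DISABLED)
			/* no STP on this port: stop trapping BPDUs */
			lan9303_alr_del_port(chip, eth_stp_addr, 0);
		else
			/* trap 01:80:c2:00:00:00 to the CPU port (port 0) */
			lan9303_alr_add_port(chip, eth_stp_addr, 0, true);
	}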

> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  Documentation/networking/dsa/lan9303.txt | 23 ++
>  drivers/net/dsa/lan9303-core.c   | 33 
> 
>  2 files changed, 56 insertions(+)
> 
> diff --git a/Documentation/networking/dsa/lan9303.txt 
> b/Documentation/networking/dsa/lan9303.txt
> index ace91c821ce7..0694b6646d2a 100644
> --- a/Documentation/networking/dsa/lan9303.txt
> +++ b/Documentation/networking/dsa/lan9303.txt
> @@ -40,6 +40,10 @@ When a user port is enabled, the driver creates sysfs 
> directory
>   - alr_dump (RO): List the first 168 entries of the ALR table.
>Including port 0 entries. This file is identical for both ports.
>Format: MAC; list of ports; (l)earned / (s)tatic
> + - stp_enable (RW): Must be set to 1 when STP is used. Installs an ALR
> +  entry so that received STP BPDUs are only sent to port 0.
> +  When 0 (default) received STP BPDUs are forwarded to all ports.
> +  This file is identical for both ports.
>   - swe_bcst_throt (RW): Set/get 6.4.7 Broadcast Storm Control
>Throttle Level for the port. Accesses the corresponding bits of
>the SWE_BCST_THROT register (13.4.3.23).
> @@ -49,3 +53,22 @@ Driver limitations
>  ==
>  
>   - No support for VLAN
> +
> +
> +Bridging notes
> +==
> +When the user ports are bridged, broadcasts, multicasts and unknown
> +frames with unknown destination are flooded by the chip. Therefore SW
> +flooding must be disabled by:
> +
> +   echo 0 > /sys/class/net/p1/brport/broadcast_flood
> +   echo 0 > /sys/class/net/p1/brport/multicast_flood
> +   echo 0 > /sys/class/net/p1/brport/unicast_flood
> +   echo 0 > /sys/class/net/p2/brport/broadcast_flood
> +   echo 0 > /sys/class/net/p2/brport/multicast_flood
> +   echo 0 > /sys/class/net/p2/brport/unicast_flood
> +
> +If enabling local STP, the LAN9303 must be configured to forward
> +BPDUs only to port 0, by writing 1 to "stp_enable" of one of the ports:
> +
> +   echo 1 > /sys/class/net/p1/lan9303/stp_enable
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index b682aa4f1fca..54646eb38185 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -187,6 +187,8 @@
>  #define MII_LAN911X_SPECIAL_MODES 0x12
>  #define MII_LAN911X_SPECIAL_CONTROL_STATUS 0x1f
>  
> +#define eth_stp_addr eth_reserved_addr_base
> +
>  static const struct regmap_range lan9303_valid_regs[] = {
>   regmap_reg_range(0x14, 0x17), /* misc, interrupt */
>   regmap_reg_range(0x19, 0x19), /* endian test */
> @@ -988,9 +990,40 @@ alr_dump_show(struct device *dev, struct 
> device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(alr_dump);
>  
> +static ssize_t
> +stp_enable_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
> + struct lan9303 *chip = dp->ds->priv;
> + char result = lan9303_alr_cache_find_mac(chip, eth_stp_addr) ?
> + '1' : '0';
> + return scnprintf(buf, PAGE_SIZE, "%c\n", result);
> +}
> +
> +static ssize_t
> +stp_enable_store(struct device *dev, struct device_attribute *attr,
> +  const char *buf, size_t len)
> +{
> + struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
> + struct lan9303 *chip = dp->ds->priv;
> + unsigned long enable;
> + int ret = kstrtoul(buf, 0, &enable);
> +
> + if (ret)
> + return ret;
> + if (enable)
> + lan9303_alr_add_port(chip, eth_stp_addr, 0, true);
> + else
> + lan9303_alr_del_port(chip, eth_stp_addr, 0);
> + return len;
> +}
> +static DEVICE_ATTR_RW(stp_enable);
> +
>  static struct attribute *lan9303_attrs[] = {
>   &dev_attr_swe_bcst_throt.attr,
>   &dev_attr_alr_dump.attr,
> + &dev_attr_stp_enable.attr,
>   NULL
>  };
>  
> 


-- 
Florian


Re: [PATCH 10/13] net: dsa: lan9303: Only allocate 3 ports

2017-07-24 Thread Florian Fainelli
On 07/20/2017 03:35 AM, Egil Hjelmeland wrote:
> Saving 2628 bytes.
> 
> Signed-off-by: Egil Hjelmeland 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH 11/13] net: dsa: lan9303: Added "alr_dump" sysfs port attribute

2017-07-24 Thread Florian Fainelli
On 07/20/2017 01:49 AM, Egil Hjelmeland wrote:
> Added read-only file /sys/class/net//lan9303/alr_dump,
> that outputs the first 168 ALR entries.
> 
> Currently "bridge fdb show" does not include the CPU port, while
> "alr_dump" list all three ports per entry.

Agreed, and this is a limitation we would probably want to remove in the
future, but duplicating what already exists with "bridge fdb show" into
a sysfs node is a non-starter.
> 
> Example output:
> 
> 9c:57:ad:79:d0:84  1  l
> 01:80:c2:00:00:00 0   s
> 00:13:cb:0d:01:95 0   s
> 10:f3:11:f5:6f:cf   2 l
> 48:4d:7e:f4:59:a8   2 l
> 01:00:5e:00:01:0a 0 2 s
> ec:f4:bb:0f:e2:fd   2 l



> 
> Signed-off-by: Egil Hjelmeland 
> ---
>  Documentation/networking/dsa/lan9303.txt |  3 ++
>  drivers/net/dsa/lan9303-core.c   | 58 
> 
>  2 files changed, 61 insertions(+)
> 
> diff --git a/Documentation/networking/dsa/lan9303.txt 
> b/Documentation/networking/dsa/lan9303.txt
> index 1fd72ff4b492..ace91c821ce7 100644
> --- a/Documentation/networking/dsa/lan9303.txt
> +++ b/Documentation/networking/dsa/lan9303.txt
> @@ -37,6 +37,9 @@ Sysfs nodes
>  When a user port is enabled, the driver creates sysfs directory
>  /sys/class/net/xxx/lan9303 with the following files:
>  
> + - alr_dump (RO): List the first 168 entries of the ALR table.
> +  Including port 0 entries. This file is identical for both ports.
> +  Format: MAC; list of ports; (l)earned / (s)tatic
>   - swe_bcst_throt (RW): Set/get 6.4.7 Broadcast Storm Control
>Throttle Level for the port. Accesses the corresponding bits of
>the SWE_BCST_THROT register (13.4.3.23).
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index ad7a4c72e1fb..b682aa4f1fca 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -642,6 +642,47 @@ static void alr_loop_cb_fdb_port_dump(
>   dump_ctx->cb(>obj);
>  }
>  
> +/* /sys/class/net/xxx/lan9303/alr_dump: display the first 168 ALR entries,
> + * including cpu port
> + */
> +struct port_sysfs_dump_ctx {
> + char *buf;
> + int pos;
> +};
> +
> +static void alr_loop_cb_sysfs_dump(
> + struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx)
> +{
> +#define LINE_LEN 24
> + struct port_sysfs_dump_ctx *dump_ctx = ctx;
> + char *buf = dump_ctx->buf;
> + int  pos =  dump_ctx->pos;
> +
> + u8 mac[ETH_ALEN];
> + int p;
> + char ports[LAN9303_NUM_PORTS + 1];
> + const char trunc_txt[] = "Truncated!\n";
> +
> + if (pos >= PAGE_SIZE - LINE_LEN - (sizeof(trunc_txt) - 1)) {
> + if (pos < PAGE_SIZE - LINE_LEN)
> + pos += sprintf(buf + pos, trunc_txt);
> + dump_ctx->pos = pos;
> + return;
> + }
> +
> + _alr_reg_to_mac(dat0, dat1, mac);
> +
> + /* print ports as list of port numbers: */
> + for (p = 0; p < LAN9303_NUM_PORTS; p++)
> + ports[p] = (portmap & BIT(p)) ? '0' + p : ' ';
> + ports[LAN9303_NUM_PORTS] = 0;
> +
> + pos += sprintf(buf + pos, "%pM %s %s\n",
> +mac, ports,
> +(dat1 & ALR_DAT1_STATIC) ? "s" : "l");
> + dump_ctx->pos = pos;
> +}
> +
>  /* ALR: Add/modify/delete ALR entries */
>  
>  /* Set a static ALR entry. Delete entry if port_map is zero */
> @@ -931,8 +972,25 @@ swe_bcst_throt_store(struct device *dev, struct 
> device_attribute *attr,
>  
>  static DEVICE_ATTR_RW(swe_bcst_throt);
>  
> +static ssize_t
> +alr_dump_show(struct device *dev, struct device_attribute *attr,
> +   char *buf)
> +{
> + struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
> + struct lan9303 *chip = dp->ds->priv;
> + struct port_sysfs_dump_ctx dump_ctx = {
> + .buf = buf,
> + .pos = 0,
> + };
> +
> + lan9303_alr_loop(chip, alr_loop_cb_sysfs_dump, &dump_ctx);
> + return dump_ctx.pos;
> +}
> +static DEVICE_ATTR_RO(alr_dump);
> +
>  static struct attribute *lan9303_attrs[] = {
>   &dev_attr_swe_bcst_throt.attr,
> + &dev_attr_alr_dump.attr,
>   NULL
>  };
>  
> 


-- 
Florian


Re: [PATCH v2 3/4] fs/dcache: Enable automatic pruning of negative dentries

2017-07-24 Thread Waiman Long
On 07/21/2017 07:07 PM, James Bottomley wrote:
> On Fri, 2017-07-21 at 16:17 -0400, Waiman Long wrote:
>> On 07/21/2017 03:30 PM, James Bottomley wrote:
>>> On Fri, 2017-07-21 at 09:43 -0400, Waiman Long wrote:
 Having a limit for the number of negative dentries does have an
 undesirable side effect that no new negative dentries will be
 allowed when the limit is reached. This will have performance
 implication for some types of workloads.
>>> This really seems like a significant problem: negative dentries
>>> should be released in strict lru order because the chances are no-
>>> one cares about the least recently used one, but they may care
>>> about having the most recently created one.
>> This should not happen under normal circumstances as the asynchronous
>> shrinker should be able to keep enough free negative dentry available
>> in the pool that direct negative dentry killing will rarely happen.
> But that's an argument for not bothering with the patch set at all: if
> it's a corner case that rarely occurs.
>
> Perhaps we should start with why the series is important and then get
> back to what we do if the condition is hit?

There is concern that unlimited negative dentry growth can be used as a
kind of DoS attack by robbing the system of all its memory and triggering
massive memory reclaim or even OOM kill. This patchset is aimed to
prevent this kind of worst case situation from happening.

>>> [...]
 @@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
 +
 +	/*
 +	 * Initiate negative dentry pruning if free pool has less than
 +	 * 1/4 of its initial value.
 +	 */
 +	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
 +		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
 +		schedule_delayed_work(_neg_dentry_work,
 +				      NEG_PRUNING_DELAY);
 +	}
>>> So here, why not run the negative dentry shrinker synchronously to
>>> see if we can shrink the cache and avoid killing the current
>>> negative dentry.  If there are context problems doing that, we
>>> should at least make the effort to track down the least recently
>>> used negative dentry and mark that for killing instead.
>> Only one CPU will be calling the asynchronous shrinker. So its effect
>> on the overall performance of the system should be negligible.
>>
>> Allowing all CPUs to potentially do synchronous shrinking can cause a
>> lot of lock and cacheline contention.
> Does that matter on something you thought was a rare occurrence above?
>  Plus, if the only use case is malicious user, as you say below, then
> they get to pay the biggest overhead penalty because they're the ones
> trying to create negative dentries.
>
> Perhaps if the threat model truly is malicious users creating negative
> dentries then a better fix is to add a delay penalty to true negative
> dentry creation (say short msleep) when we get into this situation (if
> you only pay the penalty on true negative dentry creation, not negative
> promotes to positive, then we'll mostly penalise the process trying to
> be malicious).

I can insert a delay loop before killing the negative dentry. Taking a
msleep can be problematic as the callers of dput() may not be in a
sleepable state.

>>  I will look further to see if there is opportunity to do some
>> optimistic synchronous shrinking. If that fails because of a
>> contended lock, for example, we will need to fall back to killing the
>> dentry. That should only happen under the worst case situation, like
>> when a malicious process is running.
> Right, so I can force this by doing a whole load of non existent file
> lookups then what happens:  negative dentries stop working for everyone
> because they're killed as soon as they're created.  Negative dentries
> are useful performance enhancements for things like the usual does
> x.lock exist, if not modify x things that applications do.

I am well aware of the performance benefit offered by negative dentries.
That is why I added this patch to prune the LRU list before it is too
late and is forced to kill negative dentries. However, negative dentry
killing may still happen if the negative dentry generation rate is
faster than the pruning rate.
This can be caused by bugs in the applications or malicious intent. If
this happens, it is likely that negative dentries will consume most of
the system memory without this patchset. Application performance will
suffer somewhat, but it will not be as bad as when most of the memory is
consumed by negative dentries.

> It also looks like the dentry never loses DCACHE_KILL_NEGATIVE, so if
> I'm creating the x.lock file, and we're in this situation, it gets a
> positive dentry with the DCACHE_KILL_NEGATIVE flag set (because we
> start the lookup finding a negative dentry which gets the flag and then
> promote it to positive).


[PATCH 08/13] net: dsa: lan9303: Added ALR/fdb/mdb handling

2017-07-24 Thread Egil Hjelmeland
Added functions for accessing / managing the lan9303 ALR (Address Logic
Resolution).

Implemented DSA methods: set_addr, port_fast_age, port_fdb_prepare,
port_fdb_add, port_fdb_del, port_fdb_dump, port_mdb_prepare,
port_mdb_add and port_mdb_del.

Since the lan9303 does not offer reading a specific ALR entry, the driver
caches all static entries in a flat table.
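
The entries in that flat table need only carry a MAC address and a port
bitmap; a minimal sketch of that shape is below (illustrative only; the real
definition added to drivers/net/dsa/lan9303.h by this patch may lay it out
differently):

	struct lan9303_alr_cache_entry {
		u8 mac_addr[ETH_ALEN];
		u8 port_map;	/* bitmap of ports; zero marks a free slot */
		u8 pad;		/* assumed padding: keeps sizeof() even for
				 * the ether_addr_equal() alignment check */
	};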

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 369 +
 drivers/net/dsa/lan9303.h  |  11 ++
 2 files changed, 380 insertions(+)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 426a75bd89f4..dc95973d62ed 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "lan9303.h"
 
@@ -121,6 +122,21 @@
 #define LAN9303_MAC_RX_CFG_2 0x0c01
 #define LAN9303_MAC_TX_CFG_2 0x0c40
 #define LAN9303_SWE_ALR_CMD 0x1800
+# define ALR_CMD_MAKE_ENTRY    BIT(2)
+# define ALR_CMD_GET_FIRST BIT(1)
+# define ALR_CMD_GET_NEXT  BIT(0)
+#define LAN9303_SWE_ALR_WR_DAT_0 0x1801
+#define LAN9303_SWE_ALR_WR_DAT_1 0x1802
+# define ALR_DAT1_VALID        BIT(26)
+# define ALR_DAT1_END_OF_TABL  BIT(25)
+# define ALR_DAT1_AGE_OVERRID  BIT(25)
+# define ALR_DAT1_STATIC   BIT(24)
+# define ALR_DAT1_PORT_BITOFFS  16
+# define ALR_DAT1_PORT_MASK(7 << ALR_DAT1_PORT_BITOFFS)
+#define LAN9303_SWE_ALR_RD_DAT_0 0x1805
+#define LAN9303_SWE_ALR_RD_DAT_1 0x1806
+#define LAN9303_SWE_ALR_CMD_STS 0x1808
+# define ALR_STS_MAKE_PEND BIT(0)
 #define LAN9303_SWE_VLAN_CMD 0x180b
 # define LAN9303_SWE_VLAN_CMD_RNW BIT(5)
 # define LAN9303_SWE_VLAN_CMD_PVIDNVLAN BIT(4)
@@ -473,6 +489,229 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
return 0;
 }
 
+/* - Address Logic Resolution (ALR)--*/
+
+/* Map ALR-port bits to port bitmap, and back*/
+static const int alrport_2_portmap[] = {1, 2, 4, 0, 3, 5, 6, 7 };
+static const int portmap_2_alrport[] = {3, 0, 1, 4, 2, 5, 6, 7 };
+
+/* ALR: Cache static entries: mac address + port bitmap */
+
+/* Return pointer to first free ALR cache entry, return NULL if none */
+static struct lan9303_alr_cache_entry *lan9303_alr_cache_find_free(
+   struct lan9303 *chip)
+{
+   int i;
+   struct lan9303_alr_cache_entry *entr = chip->alr_cache;
+
+   for (i = 0; i < LAN9303_NUM_ALR_RECORDS; i++, entr++)
+   if (entr->port_map == 0)
+   return entr;
+   return NULL;
+}
+
+/* Return pointer to ALR cache entry matching MAC address */
+static struct lan9303_alr_cache_entry *lan9303_alr_cache_find_mac(
+   struct lan9303 *chip,
+   const u8 *mac_addr)
+{
+   int i;
+   struct lan9303_alr_cache_entry *entr = chip->alr_cache;
+
+   BUILD_BUG_ON_MSG(sizeof(struct lan9303_alr_cache_entry) & 1,
+"ether_addr_equal require u16 alignment");
+
+   for (i = 0; i < LAN9303_NUM_ALR_RECORDS; i++, entr++)
+   if (ether_addr_equal(entr->mac_addr, mac_addr))
+   return entr;
+   return NULL;
+}
+
+/* ALR: Actual register access functions */
+
+/* This function will wait a while until mask & reg == value */
+/* Otherwise, return timeout */
+static int lan9303_csr_reg_wait(struct lan9303 *chip, int regno,
+   int mask, char value)
+{
+   int i;
+
+   for (i = 0; i < 0x1000; i++) {
+   u32 reg;
+
+   lan9303_read_switch_reg(chip, regno, &reg);
+   if ((reg & mask) == value)
+   return 0;
+   }
+   return -ETIMEDOUT;
+}
+
+static int _lan9303_alr_make_entry_raw(struct lan9303 *chip, u32 dat0, u32 
dat1)
+{
+   lan9303_write_switch_reg(
+   chip, LAN9303_SWE_ALR_WR_DAT_0, dat0);
+   lan9303_write_switch_reg(
+   chip, LAN9303_SWE_ALR_WR_DAT_1, dat1);
+   lan9303_write_switch_reg(
+   chip, LAN9303_SWE_ALR_CMD, ALR_CMD_MAKE_ENTRY);
+   lan9303_csr_reg_wait(
+   chip, LAN9303_SWE_ALR_CMD_STS, ALR_STS_MAKE_PEND, 0);
+   lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
+   return 0;
+}
+
+typedef void alr_loop_cb_t(
+   struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx);
+
+static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void 
*ctx)
+{
+   int i;
+
+   lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, ALR_CMD_GET_FIRST);
+   lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0);
+
+   for (i = 1; i < LAN9303_NUM_ALR_RECORDS; i++) {
+   u32 dat0, dat1;
+   int alrport, portmap;
+
+   lan9303_read_switch_reg(chip, LAN9303_SWE_ALR_RD_DAT_0, &dat0);
+   lan9303_read_switch_reg(chip, LAN9303_SWE_ALR_RD_DAT_1, &dat1);
+   if (dat1 & ALR_DAT1_END_OF_TABL)
+   break;
+
+   

[PATCH 02/13] net: dsa: lan9303: Do not disable/enable switch fabric port 0 at startup

2017-07-24 Thread Egil Hjelmeland
For some mysterious reason, enabling switch fabric port 0 TX fails to
work when TX has previously been disabled. Resolve this by not
disabling/enabling switch fabric port 0 at startup. Ports 1 and 2 are
still disabled in early init.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index e622db586c3d..c2b53659f58f 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -557,9 +557,6 @@ static int lan9303_disable_processing(struct lan9303 *chip)
 {
int ret;
 
-   ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_0_OFFSET);
-   if (ret)
-   return ret;
ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
if (ret)
return ret;
@@ -633,10 +630,6 @@ static int lan9303_setup(struct dsa_switch *ds)
if (ret)
dev_err(chip->dev, "failed to separate ports %d\n", ret);
 
-   ret = lan9303_enable_packet_processing(chip, LAN9303_PORT_0_OFFSET);
-   if (ret)
-   dev_err(chip->dev, "failed to re-enable switching %d\n", ret);
-
return 0;
 }
 
-- 
2.11.0




[PATCH 13/13] net: dsa: lan9303: lan9303_port_mdb_del remove port 0

2017-07-24 Thread Egil Hjelmeland
Workaround for dsa_switch_mdb_add adding CPU port to group,
but forgetting to remove it:

Remove port 0 if port 0 is the only port left.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 54646eb38185..61c915eed649 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1424,6 +1424,17 @@ static int lan9303_port_mdb_del(
if (mdb->vid)
return -EOPNOTSUPP;
lan9303_alr_del_port(chip, mdb->addr, port);
+
+   {
+   /* Workaround for dsa_switch_mdb_add adding CPU port to
+* group, but forgetting to remove it. Remove port 0
+* if it is the only port left
+*/
+   struct lan9303_alr_cache_entry *entr =
+   lan9303_alr_cache_find_mac(chip, mdb->addr);
+   if (entr && (entr->port_map == BIT(0)))
+   lan9303_alr_del_port(chip, mdb->addr, 0);
+   }
return 0;
 }
 
-- 
2.11.0




[PATCH 12/13] net: dsa: lan9303: Added "stp_enable" sysfs attribute

2017-07-24 Thread Egil Hjelmeland
The attribute must be set to 1 by user space when STP is used on the
lan9303. If bridging without local STP, leave it at 0 so that external
STP BPDUs are forwarded.

Hopefully the kernel can be improved so the driver can handle this
without user intervention, and this control can be removed.

Signed-off-by: Egil Hjelmeland 
---
 Documentation/networking/dsa/lan9303.txt | 23 ++
 drivers/net/dsa/lan9303-core.c   | 33 
 2 files changed, 56 insertions(+)

diff --git a/Documentation/networking/dsa/lan9303.txt 
b/Documentation/networking/dsa/lan9303.txt
index ace91c821ce7..0694b6646d2a 100644
--- a/Documentation/networking/dsa/lan9303.txt
+++ b/Documentation/networking/dsa/lan9303.txt
@@ -40,6 +40,10 @@ When a user port is enabled, the driver creates sysfs 
directory
  - alr_dump (RO): List the 168 first entries of the ALR table.
  Including port 0 entries. This file is identical for both ports.
   Format: MAC; list of ports; (l)earned / (s)tatic
+ - stp_enable (RW): Must be set to 1 when STP is used. Installs an ALR
+  entry so that received STP BPDUs are only sent to port 0.
+  When 0 (default) received STP BPDUs are forwarded to all ports.
+  This file is identical for both ports.
  - swe_bcst_throt (RW): Set/get 6.4.7 Broadcast Storm Control
   Throttle Level for the port. Accesses the corresponding bits of
   the SWE_BCST_THROT register (13.4.3.23).
@@ -49,3 +53,22 @@ Driver limitations
 ==
 
  - No support for VLAN
+
+
+Bridging notes
+==
+When the user ports are bridged, broadcasts, multicasts and frames with
+unknown destination are flooded by the chip. Therefore SW flooding must
+be disabled by:
+
+   echo 0 > /sys/class/net/p1/brport/broadcast_flood
+   echo 0 > /sys/class/net/p1/brport/multicast_flood
+   echo 0 > /sys/class/net/p1/brport/unicast_flood
+   echo 0 > /sys/class/net/p2/brport/broadcast_flood
+   echo 0 > /sys/class/net/p2/brport/multicast_flood
+   echo 0 > /sys/class/net/p2/brport/unicast_flood
+
+If enabling local STP, the LAN9303 must be configured to forward
+BPDUs only to port 0, by writing 1 to "stp_enable" of one of the ports:
+
+   echo 1 > /sys/class/net/p1/lan9303/stp_enable
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index b682aa4f1fca..54646eb38185 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -187,6 +187,8 @@
 #define MII_LAN911X_SPECIAL_MODES 0x12
 #define MII_LAN911X_SPECIAL_CONTROL_STATUS 0x1f
 
+#define eth_stp_addr eth_reserved_addr_base
+
 static const struct regmap_range lan9303_valid_regs[] = {
regmap_reg_range(0x14, 0x17), /* misc, interrupt */
regmap_reg_range(0x19, 0x19), /* endian test */
@@ -988,9 +990,40 @@ alr_dump_show(struct device *dev, struct device_attribute 
*attr,
 }
 static DEVICE_ATTR_RO(alr_dump);
 
+static ssize_t
+stp_enable_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
+   struct lan9303 *chip = dp->ds->priv;
+   char result = lan9303_alr_cache_find_mac(chip, eth_stp_addr) ?
+   '1' : '0';
+   return scnprintf(buf, PAGE_SIZE, "%c\n", result);
+}
+
+static ssize_t
+stp_enable_store(struct device *dev, struct device_attribute *attr,
+const char *buf, size_t len)
+{
+   struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
+   struct lan9303 *chip = dp->ds->priv;
+   unsigned long enable;
+   int ret = kstrtoul(buf, 0, &enable);
+
+   if (ret)
+   return ret;
+   if (enable)
+   lan9303_alr_add_port(chip, eth_stp_addr, 0, true);
+   else
+   lan9303_alr_del_port(chip, eth_stp_addr, 0);
+   return len;
+}
+static DEVICE_ATTR_RW(stp_enable);
+
 static struct attribute *lan9303_attrs[] = {
&dev_attr_swe_bcst_throt.attr,
&dev_attr_alr_dump.attr,
+   &dev_attr_stp_enable.attr,
NULL
 };
 
-- 
2.11.0




[PATCH 11/13] net: dsa: lan9303: Added "alr_dump" sysfs port attribute

2017-07-24 Thread Egil Hjelmeland
Added read-only file /sys/class/net//lan9303/alr_dump,
which outputs the first 168 ALR entries.

Currently "bridge fdb show" does not include the CPU port, while
"alr_dump" lists all three ports per entry.

Example output:

9c:57:ad:79:d0:84  1  l
01:80:c2:00:00:00 0   s
00:13:cb:0d:01:95 0   s
10:f3:11:f5:6f:cf   2 l
48:4d:7e:f4:59:a8   2 l
01:00:5e:00:01:0a 0 2 s
ec:f4:bb:0f:e2:fd   2 l

Signed-off-by: Egil Hjelmeland 
---
 Documentation/networking/dsa/lan9303.txt |  3 ++
 drivers/net/dsa/lan9303-core.c   | 58 
 2 files changed, 61 insertions(+)

diff --git a/Documentation/networking/dsa/lan9303.txt 
b/Documentation/networking/dsa/lan9303.txt
index 1fd72ff4b492..ace91c821ce7 100644
--- a/Documentation/networking/dsa/lan9303.txt
+++ b/Documentation/networking/dsa/lan9303.txt
@@ -37,6 +37,9 @@ Sysfs nodes
 When a user port is enabled, the driver creates sysfs directory
 /sys/class/net/xxx/lan9303 with the following files:
 
+ - alr_dump (RO): List the 168 first entries of the ALR table.
+  Including port 0 entries. This file is identical for both ports.
+  Format: MAC; list of ports; (l)earned / (s)tatic
  - swe_bcst_throt (RW): Set/get 6.4.7 Broadcast Storm Control
   Throttle Level for the port. Accesses the corresponding bits of
   the SWE_BCST_THROT register (13.4.3.23).
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index ad7a4c72e1fb..b682aa4f1fca 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -642,6 +642,47 @@ static void alr_loop_cb_fdb_port_dump(
dump_ctx->cb(>obj);
 }
 
+/* /sys/class/net/xxx/lan9303/alr_dump: display the first 168 ALR entries,
+ * including the CPU port
+ */
+struct port_sysfs_dump_ctx {
+   char *buf;
+   int pos;
+};
+
+static void alr_loop_cb_sysfs_dump(
+   struct lan9303 *chip, u32 dat0, u32 dat1, int portmap, void *ctx)
+{
+#  define LINE_LEN 24
+   struct port_sysfs_dump_ctx *dump_ctx = ctx;
+   char *buf = dump_ctx->buf;
+   int  pos =  dump_ctx->pos;
+
+   u8 mac[ETH_ALEN];
+   int p;
+   char ports[LAN9303_NUM_PORTS + 1];
+   const char trunc_txt[] = "Truncated!\n";
+
+   if (pos >= PAGE_SIZE - LINE_LEN - (sizeof(trunc_txt) - 1)) {
+   if (pos < PAGE_SIZE - LINE_LEN)
+   pos += sprintf(buf + pos, trunc_txt);
+   dump_ctx->pos = pos;
+   return;
+   }
+
+   _alr_reg_to_mac(dat0, dat1, mac);
+
+   /* print ports as list of port numbers: */
+   for (p = 0; p < LAN9303_NUM_PORTS; p++)
+   ports[p] = (portmap & BIT(p)) ? '0' + p : ' ';
+   ports[LAN9303_NUM_PORTS] = 0;
+
+   pos += sprintf(buf + pos, "%pM %s %s\n",
+  mac, ports,
+  (dat1 & ALR_DAT1_STATIC) ? "s" : "l");
+   dump_ctx->pos = pos;
+}
+
 /* ALR: Add/modify/delete ALR entries */
 
 /* Set a static ALR entry. Delete entry if port_map is zero */
@@ -931,8 +972,25 @@ swe_bcst_throt_store(struct device *dev, struct 
device_attribute *attr,
 
 static DEVICE_ATTR_RW(swe_bcst_throt);
 
+static ssize_t
+alr_dump_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+   struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
+   struct lan9303 *chip = dp->ds->priv;
+   struct port_sysfs_dump_ctx dump_ctx = {
+   .buf = buf,
+   .pos = 0,
+   };
+
+   lan9303_alr_loop(chip, alr_loop_cb_sysfs_dump, &dump_ctx);
+   return dump_ctx.pos;
+}
+static DEVICE_ATTR_RO(alr_dump);
+
 static struct attribute *lan9303_attrs[] = {
&dev_attr_swe_bcst_throt.attr,
+   &dev_attr_alr_dump.attr,
NULL
 };
 
-- 
2.11.0




[PATCH 10/13] net: dsa: lan9303: Only allocate 3 ports

2017-07-24 Thread Egil Hjelmeland
Saving 2628 bytes.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index dc95973d62ed..ad7a4c72e1fb 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -23,6 +23,8 @@
 
 #include "lan9303.h"
 
+#define LAN9303_NUM_PORTS 3
+
 /* 13.2 System Control and Status Registers
  * Multiply register number by 4 to get address offset.
  */
@@ -1361,7 +1363,7 @@ static struct dsa_switch_ops lan9303_switch_ops = {
 
 static int lan9303_register_switch(struct lan9303 *chip)
 {
-   chip->ds = dsa_switch_alloc(chip->dev, DSA_MAX_PORTS);
+   chip->ds = dsa_switch_alloc(chip->dev, LAN9303_NUM_PORTS);
if (!chip->ds)
return -ENOMEM;
 
-- 
2.11.0




[PATCH 09/13] net: dsa: lan9303: Added Documentation/networking/dsa/lan9303.txt

2017-07-24 Thread Egil Hjelmeland
Signed-off-by: Egil Hjelmeland 
---
 Documentation/networking/dsa/lan9303.txt | 48 
 1 file changed, 48 insertions(+)
 create mode 100644 Documentation/networking/dsa/lan9303.txt

diff --git a/Documentation/networking/dsa/lan9303.txt 
b/Documentation/networking/dsa/lan9303.txt
new file mode 100644
index ..1fd72ff4b492
--- /dev/null
+++ b/Documentation/networking/dsa/lan9303.txt
@@ -0,0 +1,48 @@
+LAN9303 Ethernet switch driver
+==
+
+The LAN9303 is a three-port 10/100 Ethernet switch with integrated PHYs
+for the two external Ethernet ports. The third port is an RMII/MII
+interface to a host master network interface (e.g. fixed link).
+
+
+Driver details
+==
+
+The driver is implemented as a DSA driver, see
+Documentation/networking/dsa/dsa.txt.
+
+See Documentation/devicetree/bindings/net/dsa/lan9303.txt for device
+tree binding.
+
+The LAN9303 can be managed both via MDIO and I2C, both supported by this
+driver.
+
+At startup the driver configures the device to provide two separate
+network interfaces (which is the default state of a DSA device).
+
+When both user ports are joined to the same bridge, the normal
+HW MAC learning is enabled. This means that unicast traffic is forwarded
+in HW. STP is also supported in this mode.
+
+If one of the user ports leaves the bridge,
+the ports go back to the initial separated operation.
+
+The driver implements the port_fdb_xxx/port_mdb_xxx methods.
+
+
+Sysfs nodes
+===
+
+When a user port is enabled, the driver creates sysfs directory
+/sys/class/net/xxx/lan9303 with the following files:
+
+ - swe_bcst_throt (RW): Set/get 6.4.7 Broadcast Storm Control
+  Throttle Level for the port. Accesses the corresponding bits of
+  the SWE_BCST_THROT register (13.4.3.23).
+
+
+Driver limitations
+==
+
+ - No support for VLAN
-- 
2.11.0




[PATCH 03/13] net: dsa: lan9303: Refactor lan9303_enable_packet_processing()

2017-07-24 Thread Egil Hjelmeland
lan9303_enable_packet_processing() and lan9303_disable_packet_processing()
now take the port number (0, 1, 2) as a parameter instead of the port
offset. Simplify accordingly.
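
For example, a call site that used to pass a per-port register offset now
simply passes the port number (illustration based on the diff below):

    /* before the refactor */
    lan9303_disable_packet_processing(chip, LAN9303_PORT_2_OFFSET);

    /* after the refactor */
    lan9303_disable_packet_processing(chip, 2);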

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 66 --
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index c2b53659f58f..0806a0684d55 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -159,9 +159,7 @@
 # define LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT1 (BIT(9) | BIT(8))
 # define LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0 (BIT(1) | BIT(0))
 
-#define LAN9303_PORT_0_OFFSET 0x400
-#define LAN9303_PORT_1_OFFSET 0x800
-#define LAN9303_PORT_2_OFFSET 0xc00
+#define LAN9303_SWITCH_PORT_REG(port, reg0) (0x400 * (port) + (reg0))
 
 /* the built-in PHYs are of type LAN911X */
 #define MII_LAN911X_SPECIAL_MODES 0x12
@@ -457,24 +455,25 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
return 0;
 }
 
-#define LAN9303_MAC_RX_CFG_OFFS (LAN9303_MAC_RX_CFG_0 - LAN9303_PORT_0_OFFSET)
-#define LAN9303_MAC_TX_CFG_OFFS (LAN9303_MAC_TX_CFG_0 - LAN9303_PORT_0_OFFSET)
-
 static int lan9303_disable_packet_processing(struct lan9303 *chip,
 unsigned int port)
 {
int ret;
 
/* disable RX, but keep register reset default values else */
-   ret = lan9303_write_switch_reg(chip, LAN9303_MAC_RX_CFG_OFFS + port,
-  LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES);
+   ret = lan9303_write_switch_reg(
+   chip,
+   LAN9303_SWITCH_PORT_REG(port, LAN9303_MAC_RX_CFG_0),
+   LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES);
if (ret)
return ret;
 
/* disable TX, but keep register reset default values else */
-   return lan9303_write_switch_reg(chip, LAN9303_MAC_TX_CFG_OFFS + port,
-   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
-   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE);
+   return lan9303_write_switch_reg(
+   chip,
+   LAN9303_SWITCH_PORT_REG(port, LAN9303_MAC_TX_CFG_0),
+   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
+   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE);
 }
 
 static int lan9303_enable_packet_processing(struct lan9303 *chip,
@@ -483,17 +482,21 @@ static int lan9303_enable_packet_processing(struct 
lan9303 *chip,
int ret;
 
/* enable RX and keep register reset default values else */
-   ret = lan9303_write_switch_reg(chip, LAN9303_MAC_RX_CFG_OFFS + port,
-  LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES |
-  LAN9303_MAC_RX_CFG_X_RX_ENABLE);
+   ret = lan9303_write_switch_reg(
+   chip,
+   LAN9303_SWITCH_PORT_REG(port, LAN9303_MAC_RX_CFG_0),
+   LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES |
+   LAN9303_MAC_RX_CFG_X_RX_ENABLE);
if (ret)
return ret;
 
/* enable TX and keep register reset default values else */
-   return lan9303_write_switch_reg(chip, LAN9303_MAC_TX_CFG_OFFS + port,
-   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
-   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE |
-   LAN9303_MAC_TX_CFG_X_TX_ENABLE);
+   return lan9303_write_switch_reg(
+   chip,
+   LAN9303_SWITCH_PORT_REG(port, LAN9303_MAC_TX_CFG_0),
+   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
+   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE |
+   LAN9303_MAC_TX_CFG_X_TX_ENABLE);
 }
 
 /* We want a special working switch:
@@ -555,12 +558,14 @@ static int lan9303_handle_reset(struct lan9303 *chip)
 /* stop processing packets for all ports */
 static int lan9303_disable_processing(struct lan9303 *chip)
 {
-   int ret;
+   int ret, p;
 
-   ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
-   if (ret)
-   return ret;
-   return lan9303_disable_packet_processing(chip, LAN9303_PORT_2_OFFSET);
+   for (p = 1; p <= 2; p++) {
+   ret = lan9303_disable_packet_processing(chip, p);
+   if (ret)
+   return ret;
+   }
+   return 0;
 }
 
 static int lan9303_check_device(struct lan9303 *chip)
@@ -696,7 +701,7 @@ static void lan9303_get_ethtool_stats(struct dsa_switch 
*ds, int port,
unsigned int u, poff;
int ret;
 
-   poff = port * 0x400;
+   poff = LAN9303_SWITCH_PORT_REG(port, 0);
 
for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
ret = lan9303_read_switch_reg(chip,
@@ -749,11 

[PATCH 07/13] net: dsa: lan9303: Added basic offloading of unicast traffic

2017-07-24 Thread Egil Hjelmeland
When both user ports are joined to the same bridge, the normal
HW MAC learning is enabled. This means that unicast traffic is forwarded
in HW. Support for STP is also added.

If one of the user ports leaves the bridge,
the ports go back to the initial separated operation.

Added bridge methods port_bridge_join, port_bridge_leave and
port_stp_state_set.
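
The join hook only switches to hardware forwarding once both user ports
are members of the same bridge (sketch, simplified from the diff below):

    if (ds->ports[1].bridge_dev == ds->ports[2].bridge_dev) {
        lan9303_bridge_ports(chip);     /* remove port mirroring */
        chip->is_bridged = true;        /* unleash stp_state_set() */
    }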

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 115 ++---
 drivers/net/dsa/lan9303.h  |   1 +
 2 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index b70acb73aad6..426a75bd89f4 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "lan9303.h"
 
@@ -143,6 +144,7 @@
 # define LAN9303_SWE_PORT_STATE_FORWARDING_PORT0 (0)
 # define LAN9303_SWE_PORT_STATE_LEARNING_PORT0 BIT(1)
 # define LAN9303_SWE_PORT_STATE_BLOCKING_PORT0 BIT(0)
+# define LAN9303_SWE_PORT_STATE_DISABLED_PORT0 (3)
 #define LAN9303_SWE_PORT_MIRROR 0x1846
 # define LAN9303_SWE_PORT_MIRROR_SNIFF_ALL BIT(8)
 # define LAN9303_SWE_PORT_MIRROR_SNIFFER_PORT2 BIT(7)
@@ -515,11 +517,30 @@ static int lan9303_enable_packet_processing(struct 
lan9303 *chip,
LAN9303_MAC_TX_CFG_X_TX_ENABLE);
 }
 
+/* forward special tagged packets from port 0 to port 1 *or* port 2 */
+static int lan9303_setup_tagging(struct lan9303 *chip)
+{
+   int ret;
+   /* enable defining the destination port via special VLAN tagging
+* for port 0
+*/
+   ret = lan9303_write_switch_reg(chip, LAN9303_SWE_INGRESS_PORT_TYPE,
+  0x03);
+   if (ret)
+   return ret;
+
+   /* tag incoming packets at port 1 and 2 on their way to port 0 to be
+* able to discover their source port
+*/
+   return lan9303_write_switch_reg(
+   chip, LAN9303_BM_EGRSS_PORT_TYPE,
+   LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0);
+}
+
 /* We want a special working switch:
  * - do not forward packets between port 1 and 2
  * - forward everything from port 1 to port 0
  * - forward everything from port 2 to port 0
- * - forward special tagged packets from port 0 to port 1 *or* port 2
  */
 static int lan9303_separate_ports(struct lan9303 *chip)
 {
@@ -534,22 +555,6 @@ static int lan9303_separate_ports(struct lan9303 *chip)
if (ret)
return ret;
 
-   /* enable defining the destination port via special VLAN tagging
-* for port 0
-*/
-   ret = lan9303_write_switch_reg(chip, LAN9303_SWE_INGRESS_PORT_TYPE,
-  0x03);
-   if (ret)
-   return ret;
-
-   /* tag incoming packets at port 1 and 2 on their way to port 0 to be
-* able to discover their source port
-*/
-   ret = lan9303_write_switch_reg(chip, LAN9303_BM_EGRSS_PORT_TYPE,
-   LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0);
-   if (ret)
-   return ret;
-
/* prevent port 1 and 2 from forwarding packets by their own */
return lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_STATE,
LAN9303_SWE_PORT_STATE_FORWARDING_PORT0 |
@@ -557,6 +562,12 @@ static int lan9303_separate_ports(struct lan9303 *chip)
LAN9303_SWE_PORT_STATE_BLOCKING_PORT2);
 }
 
+static void lan9303_bridge_ports(struct lan9303 *chip)
+{
+   /* ports bridged: remove mirroring */
+   lan9303_write_switch_reg(chip, LAN9303_SWE_PORT_MIRROR, 0);
+}
+
 static int lan9303_handle_reset(struct lan9303 *chip)
 {
if (!chip->reset_gpio)
@@ -707,6 +718,10 @@ static int lan9303_setup(struct dsa_switch *ds)
return -EINVAL;
}
 
+   ret = lan9303_setup_tagging(chip);
+   if (ret)
+   dev_err(chip->dev, "failed to setup port tagging %d\n", ret);
+
ret = lan9303_separate_ports(chip);
if (ret)
dev_err(chip->dev, "failed to separate ports %d\n", ret);
@@ -898,17 +913,81 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
}
 }
 
+static int lan9303_port_bridge_join(struct dsa_switch *ds, int port,
+   struct net_device *br)
+{
+   struct lan9303 *chip = ds->priv;
+
+   dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
+   if (ds->ports[1].bridge_dev ==  ds->ports[2].bridge_dev) {
+   lan9303_bridge_ports(chip);
+   chip->is_bridged = true;  /* unleash stp_state_set() */
+   }
+
+   return 0;
+}
+
+static void lan9303_port_bridge_leave(struct dsa_switch *ds, int port,
+ struct net_device *br)
+{
+   struct lan9303 *chip = ds->priv;
+
+   dev_dbg(chip->dev, "%s(port %d)\n", __func__, port);
+  

[PATCH 06/13] net: dsa: lan9303: added sysfs node swe_bcst_throt

2017-07-24 Thread Egil Hjelmeland
Allows per-port access to the Switch Engine Broadcast Throttling Register.

Also added lan9303_write_switch_reg_mask()
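
lan9303_write_switch_reg_mask() does a read-modify-write of a switch
register; the new attribute uses it to update only the 9 bits that belong
to one port (taken from the change below):

    lan9303_write_switch_reg_mask(chip, LAN9303_SWE_BCST_THROT,
                                  level << (9 * port),
                                  0x1ff << (9 * port));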

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 83 ++
 1 file changed, 83 insertions(+)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index be6d78f45a5f..b70acb73aad6 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -154,6 +154,7 @@
 # define LAN9303_SWE_PORT_MIRROR_ENABLE_RX_MIRRORING BIT(1)
 # define LAN9303_SWE_PORT_MIRROR_ENABLE_TX_MIRRORING BIT(0)
 #define LAN9303_SWE_INGRESS_PORT_TYPE 0x1847
+#define LAN9303_SWE_BCST_THROT 0x1848
 #define LAN9303_BM_CFG 0x1c00
 #define LAN9303_BM_EGRSS_PORT_TYPE 0x1c0c
 # define LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT2 (BIT(17) | BIT(16))
@@ -426,6 +427,20 @@ static int lan9303_read_switch_reg(struct lan9303 *chip, 
u16 regnum, u32 *val)
return ret;
 }
 
+static int lan9303_write_switch_reg_mask(
+   struct lan9303 *chip, u16 regnum, u32 val, u32 mask)
+{
+   int ret;
+   u32 reg;
+
+   ret = lan9303_read_switch_reg(chip, regnum, &reg);
+   if (ret)
+   return ret;
+   reg = (reg & ~mask) | val;
+
+   return lan9303_write_switch_reg(chip, regnum, reg);
+}
+
 static int lan9303_detect_phy_setup(struct lan9303 *chip)
 {
int reg;
@@ -614,6 +629,66 @@ static int lan9303_check_device(struct lan9303 *chip)
return 0;
 }
 
+/* -- Sysfs on slave port --*/
+/*13.4.3.23 Switch Engine Broadcast Throttling Register (SWE_BCST_THROT)*/
+static ssize_t
+swe_bcst_throt_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
+   struct lan9303 *chip = dp->ds->priv;
+   int port = dp->index;
+   int reg;
+
+   if (lan9303_read_switch_reg(chip, LAN9303_SWE_BCST_THROT, &reg))
+   return 0;
+
+   reg = (reg >> (9 * port)) & 0x1ff; /*extract port N*/
+   if (reg & 0x100)
+   reg &= 0xff; /* remove enable bit */
+   else
+   reg = 0; /* not enabled*/
+
+   return scnprintf(buf, PAGE_SIZE, "%d\n", reg);
+}
+
+static ssize_t
+swe_bcst_throt_store(struct device *dev, struct device_attribute *attr,
+const char *buf, size_t len)
+{
+   struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
+   struct lan9303 *chip = dp->ds->priv;
+   int port = dp->index;
+   int ret;
+   unsigned long level;
+
+   ret = kstrtoul(buf, 0, &level);
+   if (ret)
+   return ret;
+   level &= 0xff; /* ensure valid range */
+   if (level)
+   level |= 0x100; /* Set enable bit  */
+
+   ret = lan9303_write_switch_reg_mask(chip, LAN9303_SWE_BCST_THROT,
+   level << (9 * port),
+   0x1ff << (9 * port));
+   if (ret)
+   return ret;
+   return len;
+}
+
+static DEVICE_ATTR_RW(swe_bcst_throt);
+
+static struct attribute *lan9303_attrs[] = {
+   &dev_attr_swe_bcst_throt.attr,
+   NULL
+};
+
+static struct attribute_group lan9303_group = {
+   .name = "lan9303",
+   .attrs = lan9303_attrs,
+};
+
 /*  DSA ---*/
 
 static enum dsa_tag_protocol lan9303_get_tag_protocol(struct dsa_switch *ds)
@@ -787,6 +862,11 @@ static int lan9303_port_enable(struct dsa_switch *ds, int 
port,
switch (port) {
case 1:
case 2:
+   /* lan9303_setup is too early to attach sysfs nodes... */
+   if (sysfs_create_group(
+   &ds->ports[port].netdev->dev.kobj,
+   &lan9303_group))
+   dev_dbg(chip->dev, "cannot create sysfs group\n");
return lan9303_enable_packet_processing(chip, port);
default:
dev_dbg(chip->dev,
@@ -805,6 +885,9 @@ static void lan9303_port_disable(struct dsa_switch *ds, int 
port,
switch (port) {
case 1:
case 2:
+   sysfs_remove_group(&ds->ports[port].netdev->dev.kobj,
+  &lan9303_group);
+
lan9303_disable_packet_processing(chip, port);
lan9303_phy_write(ds, chip->phy_addr_sel_strap + port,
  MII_BMCR, BMCR_PDOWN);
-- 
2.11.0



[PATCH 05/13] net: dsa: added dsa_net_device_to_dsa_port()

2017-07-24 Thread Egil Hjelmeland
Allow DSA drivers to attach sysfs nodes.
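
A minimal sketch of how a driver-side sysfs callback can use it (modelled
on the lan9303 patches in this series; the attribute name is illustrative):

    static ssize_t example_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
    {
        struct dsa_port *dp = dsa_net_device_to_dsa_port(to_net_dev(dev));
        struct lan9303 *chip;

        if (!dp)
            return -ENODEV;
        chip = dp->ds->priv;        /* switch driver private data */
        /* ... read some state from chip and format it into buf ... */
        return scnprintf(buf, PAGE_SIZE, "%d\n", chip->phy_addr_sel_strap);
    }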

Signed-off-by: Egil Hjelmeland 
---
 include/net/dsa.h |  1 +
 net/dsa/slave.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 88da272d20d0..a71c0a2401ee 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -450,6 +450,7 @@ void unregister_switch_driver(struct dsa_switch_driver 
*type);
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev);
 
 struct net_device *dsa_dev_to_net_device(struct device *dev);
+struct dsa_port *dsa_net_device_to_dsa_port(struct net_device *dev);
 
 /* Keep inline for faster access in hot path */
 static inline bool netdev_uses_dsa(struct net_device *dev)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9507bd38cf04..40410f1740de 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -209,6 +209,16 @@ static int dsa_slave_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
return -EOPNOTSUPP;
 }
 
+struct dsa_port *dsa_net_device_to_dsa_port(struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+
+   if (!dsa_slave_dev_check(dev))
+   return NULL;
+   return p->dp;
+}
+EXPORT_SYMBOL_GPL(dsa_net_device_to_dsa_port);
+
 static int dsa_slave_port_attr_set(struct net_device *dev,
   const struct switchdev_attr *attr,
   struct switchdev_trans *trans)
-- 
2.11.0




[PATCH] kbuild: Update example for ccflags-y usage

2017-07-24 Thread Sedat Dilek
From: Sedat Dilek 

The old example describing ccflags-y usage is no longer valid.

Signed-off-by: Sedat Dilek 
---
 Documentation/kbuild/makefiles.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/kbuild/makefiles.txt 
b/Documentation/kbuild/makefiles.txt
index 7003141a6d4f..cadfa882c275 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -297,9 +297,9 @@ more details, with real examples.
ccflags-y specifies options for compiling with $(CC).
 
Example:
-   # drivers/acpi/Makefile
-   ccflags-y := -Os
-   ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT
+   # drivers/acpi/acpica/Makefile
+   ccflags-y   := -Os -D_LINUX -DBUILDING_ACPICA
+   ccflags-$(CONFIG_ACPI_DEBUG)+= -DACPI_DEBUG_OUTPUT
 
This variable is necessary because the top Makefile owns the
variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
-- 
2.11.0



Re: [PATCH v4 1/5] mm: add mkwrite param to vm_insert_mixed()

2017-07-24 Thread Ross Zwisler
On Mon, Jul 24, 2017 at 01:25:30PM +0200, Jan Kara wrote:
> > @@ -1658,14 +1658,28 @@ static int insert_pfn(struct vm_area_struct *vma, 
> > unsigned long addr,
> > if (!pte)
> > goto out;
> > retval = -EBUSY;
> > -   if (!pte_none(*pte))
> > -   goto out_unlock;
> > +   if (!pte_none(*pte)) {
> > +   if (mkwrite) {
> > +   if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
> 
> Is the WARN_ON_ONCE() really appropriate here? Your testcase with private
> mappings has triggered this situation if I'm right...

Yep, I think this WARN_ON_ONCE() is correct.  The test with private mappings
had collisions between read-only DAX mappings which were being faulted in via
insert_pfn(), and read/write COW page cache mappings which were being faulted
in by wp_page_copy().

I was hitting a false-positive warning when I had the WARN_ON_ONCE() in
insert_pfn() outside of the mkwrite case, i.e.:

if (!pte_none(*pte)) {
if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
goto out_unlock;
if (mkwrite) {
entry = *pte;
goto out_mkwrite;
} else
goto out_unlock;
}

This was triggering when one thread was faulting in a read-only DAX mapping
when another thread had already faulted in a read-write COW page cache page.

The patches I sent out have the warning in the mkwrite case, which would mean
that we were getting a fault for a read/write PTE in insert_pfn() and the PFN
didn't match what was already in the PTE.
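
For contrast, with the check inside the mkwrite branch the flow in
insert_pfn() looks roughly like this (simplified):

    if (!pte_none(*pte)) {
        if (mkwrite) {
            if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))
                goto out_unlock;
            entry = *pte;
            goto out_mkwrite;
        }
        goto out_unlock;
    }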

This can't ever happen in the private mapping case because we will never
install a read/write PTE for normal storage, only for COW page cache pages.
Essentially I don't think we should ever be able to hit this warning, and if
we do I'd like to get the bug report so that I can track down how it was
happening and make sure that it's safe.  It is in the mkwrite path of
insert_pfn() which is currently only used by the DAX code.

Does that make sense to you, or would you recommend leaving it out?  (If so,
why?)


Re: [PATCH v4 1/5] mm: add mkwrite param to vm_insert_mixed()

2017-07-24 Thread Ross Zwisler
On Mon, Jul 24, 2017 at 01:15:31PM +0200, Jan Kara wrote:
> On Sat 22-07-17 09:21:31, Dan Williams wrote:
> > On Fri, Jul 21, 2017 at 3:39 PM, Ross Zwisler
> >  wrote:
> > > To be able to use the common 4k zero page in DAX we need to have our PTE
> > > fault path look more like our PMD fault path where a PTE entry can be
> > > marked as dirty and writeable as it is first inserted, rather than waiting
> > > for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > >
> > > Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > > distinguish between these two cases in do_wp_page():
> > >
> > > case 1: 4k zero page => writable DAX storage
> > > case 2: read-only DAX storage => writeable DAX storage
> > >
> > > This distinction is made by via vm_normal_page().  vm_normal_page() 
> > > returns
> > > false for the common 4k zero page, though, just as it does for DAX ptes.
> > > Instead of special casing the DAX + 4k zero page case, we will simplify 
> > > our
> > > DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
> > > get rid of the dax_pfn_mkwrite() helper.  We will instead use
> > > dax_iomap_fault() to handle write-protection faults.
> > >
> > > This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> > > and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set 
> > > insert_pfn()
> > > will do the work that was previously done by wp_page_reuse() as part of 
> > > the
> > > dax_pfn_mkwrite() call path.
> > >
> > > Signed-off-by: Ross Zwisler 
> > > ---
> > >  drivers/dax/device.c|  2 +-
> > >  drivers/gpu/drm/exynos/exynos_drm_gem.c |  3 ++-
> > >  drivers/gpu/drm/gma500/framebuffer.c|  2 +-
> > >  drivers/gpu/drm/msm/msm_gem.c   |  3 ++-
> > >  drivers/gpu/drm/omapdrm/omap_gem.c  |  6 --
> > >  drivers/gpu/drm/ttm/ttm_bo_vm.c |  2 +-
> > >  fs/dax.c|  2 +-
> > >  include/linux/mm.h  |  2 +-
> > >  mm/memory.c | 27 +--
> > >  9 files changed, 34 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> > > index e9f3b3e..3973521 100644
> > > --- a/drivers/dax/device.c
> > > +++ b/drivers/dax/device.c
> > > @@ -273,7 +273,7 @@ static int __dev_dax_pte_fault(struct dev_dax 
> > > *dev_dax, struct vm_fault *vmf)
> > >
> > > pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
> > >
> > > -   rc = vm_insert_mixed(vmf->vma, vmf->address, pfn);
> > > +   rc = vm_insert_mixed(vmf->vma, vmf->address, pfn, false);
> > 
> > Ugh, I generally find bool flags unreadable. They place a tax on
> > jumping to function definition to recall what true and false mean. If
> > we want to go this 'add an argument' route can we at least add an enum
> > like:
> > 
> > enum {
> > PTE_MKDIRTY,
> > PTE_MKCLEAN,
> > };
> > 
> > ...to differentiate the two cases?
> 
> So how I usually deal with this is that I create e.g.:
> 
> __vm_insert_mixed() that takes the bool argument, make vm_insert_mixed()
> pass false, and vm_insert_mixed_mkwrite() pass true. That way there's no
> code duplication, old call sites can stay unchanged, the naming clearly
> says what's going on...

Ah, that does seem cleaner.  I'll try that for v5.


Re: [PATCH v2] kmemleak: add oom=<disable|ignore> runtime parameter

2017-07-24 Thread Catalin Marinas
On Mon, Jul 24, 2017 at 05:16:34PM +0800, shuw...@redhat.com wrote:
> When running memory stress tests, kmemleak could be easily disabled in
> function create_object as system is out of memory and kmemleak failed to
> alloc from object_cache. Since there's no way to enable kmemleak after
> it's off, simply ignore the object_cache alloc failure will just loses
> track of some memory objects, but could increase the usability of kmemleak
> under memory stress.

I wonder how usable kmemleak is when not recording all the allocated
objects. If some of these memory blocks contain references to others,
such referenced objects could be reported as leaks (basically increasing
the false positives rate).
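
A minimal illustration (hypothetical structures, not taken from the patch):
if the kmemleak metadata allocation for 'p' fails and the block stays
untracked, its memory is never scanned, so an object referenced only
through it can be reported as a leak:

    struct parent { void *child; };

    struct parent *p = kmalloc(sizeof(*p), GFP_KERNEL); /* untracked if create_object() failed */
    p->child = kmalloc(64, GFP_KERNEL); /* tracked, but only reachable via p */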

-- 
Catalin


[PATCH] pkeys: fix macro typo in protection-keys.txt

2017-07-24 Thread Wang Kai
Replace PKEY_DENY_WRITE with PKEY_DISABLE_WRITE,
which corresponds with the source code.

Signed-off-by: Wang Kai 
---
 Documentation/x86/protection-keys.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/protection-keys.txt 
b/Documentation/x86/protection-keys.txt
index b643045..fa46dcb 100644
--- a/Documentation/x86/protection-keys.txt
+++ b/Documentation/x86/protection-keys.txt
@@ -34,7 +34,7 @@ with a key.  In this example WRPKRU is wrapped by a C function
 called pkey_set().
 
int real_prot = PROT_READ|PROT_WRITE;
-   pkey = pkey_alloc(0, PKEY_DENY_WRITE);
+   pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 
0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
... application runs here
@@ -42,9 +42,9 @@ called pkey_set().
 Now, if the application needs to update the data at 'ptr', it can
 gain access, do the update, then remove its write access:
 
-   pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
+   pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
*ptr = foo; // assign something
-   pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again
+   pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
 
 Now when it frees the memory, it will also free the pkey since it
 is no longer in use:
-- 
2.10.1



Re: [PATCH v2 0/8] Add a script to check for Sphinx install requirements

2017-07-24 Thread Mauro Carvalho Chehab
Em Sun, 23 Jul 2017 20:01:44 -0300
Mauro Carvalho Chehab  escreveu:

> Em Sun, 23 Jul 2017 16:08:29 -0600
> Jonathan Corbet  escreveu:
> 
> > On Mon, 17 Jul 2017 18:46:34 -0300
> > Mauro Carvalho Chehab  wrote:
> >   
> > > Sphinx installation is not trivial, as not all versions are supported,
> > > and it requires a lot of stuff for math, images and PDF/LaTeX output
> > > to work.
> > > 
> > > So, add a script that checks if everything is fine, providing
> > > distro-specific hints about what's needed for it to work.
> > 
> > I've applied these (including the Mageia one), thanks.
> > 
> > It occurs to me that adding CentOS would be easy, just treat it like an
> > RHEL system.  Maybe I'll give that a shot in a bit.  
> 
> Heh, I sent a patch for it today :-)

Sorry, the e-mail was queued, but was actually sent only today.


Thanks,
Mauro


[PATCH] scripts/sphinx-pre-install: add minimum support for RHEL

2017-07-24 Thread Mauro Carvalho Chehab
RHEL 7.x and clone distros are shipped with Sphinx 1.1.x,
which is incompatible with kernel ReST markups.

So, on those systems, the only alternative is to install
it via a Python virtual environment.

While seeking for "pip" on CentOS 7.3, I noticed that it
is not really needed, as python-virtualenv has its version
packaged there already. So, remove this from the list of
requirements for all distributions.

With regards to PDF, we need at least the texlive-tabulary
extension, but that is not shipped there (at least on
CentOS). So, disable PDF packages as a whole.

Please notice, however, that texlive + amsmath is needed for
ReST ".. math::" tags to be properly handled. Yet, Sphinx
falls back to displaying the LaTeX math expressions as-is if
such an extension is not available.

So, let's just disable all texlive packages as a whole.

Signed-off-by: Mauro Carvalho Chehab 
---

This patch comes after this patch:
[PATCH] sphinx-pre-install: add support for Mageia
and after this series
[PATCH v2 0/8] Add a script to check for Sphinx install requirements

The full patch series is available at:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=sphinx_install_v2


 scripts/sphinx-pre-install | 54 +++---
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index 5d2799dcfceb..677756ae34c9 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -216,7 +216,6 @@ sub check_sphinx()
$prog = findprog("virtualenv-3.5") if (!$prog);
 
check_program("virtualenv", 0) if (!$prog);
-   check_program("pip", 0) if (!findprog("pip3"));
$need_sphinx = 1;
} else {
add_package("python-sphinx", 0);
@@ -256,7 +255,6 @@ sub give_debian_hints()
"python-sphinx" => "python3-sphinx",
"sphinx_rtd_theme"  => "python3-sphinx-rtd-theme",
"virtualenv"=> "virtualenv",
-   "pip"   => "python3-pip",
"dot"   => "graphviz",
"convert"   => "imagemagick",
"Pod::Usage"=> "perl-modules",
@@ -282,7 +280,6 @@ sub give_redhat_hints()
"python-sphinx" => "python3-sphinx",
"sphinx_rtd_theme"  => "python3-sphinx_rtd_theme",
"virtualenv"=> "python3-virtualenv",
-   "pip"   => "python3-pip",
"dot"   => "graphviz",
"convert"   => "ImageMagick",
"Pod::Usage"=> "perl-Pod-Usage",
@@ -302,6 +299,13 @@ sub give_redhat_hints()
"dejavu-sans-mono-fonts",
);
 
+   #
+   # Checks valid for RHEL/CentOS version 7.x.
+   #
+   if (! $system_release =~ /Fedora/) {
+   $map{"virtualenv"} = "python-virtualenv";
+   }
+
my $release;
 
$release = $1 if ($system_release =~ /Fedora\s+release\s+(\d+)/);
@@ -312,7 +316,14 @@ sub give_redhat_hints()
check_missing(\%map);
 
return if (!$need && !$optional);
-   printf("You should run:\n\n\tsudo dnf install -y $install\n");
+
+   if ($release >= 18) {
+   # dnf, for Fedora 18+
+   printf("You should run:\n\n\tsudo dnf install -y $install\n");
+   } else {
+   # yum, for RHEL (and clones) or Fedora version < 18
+   printf("You should run:\n\n\tsudo yum install -y $install\n");
+   }
 }
 
 sub give_opensuse_hints()
@@ -321,7 +332,6 @@ sub give_opensuse_hints()
"python-sphinx" => "python3-sphinx",
"sphinx_rtd_theme"  => "python3-sphinx_rtd_theme",
"virtualenv"=> "python3-virtualenv",
-   "pip"   => "python3-pip",
"dot"   => "graphviz",
"convert"   => "ImageMagick",
"Pod::Usage"=> "perl-Pod-Usage",
@@ -360,7 +370,6 @@ sub give_mageia_hints()
"python-sphinx" => "python3-sphinx",
"sphinx_rtd_theme"  => "python3-sphinx_rtd_theme",
"virtualenv"=> "python3-virtualenv",
-   "pip"   => "python3-pip",
"dot"   => "graphviz",
"convert"   => "ImageMagick",
"Pod::Usage"=> "perl-Pod-Usage",
@@ -372,8 +381,6 @@ sub give_mageia_hints()
"texlive-fontsextra",
);
 
-   my $release;
-
check_rpm_missing(\@tex_pkgs, 1) if ($pdf);
check_missing(\%map);
 
@@ -386,7 +393,6 @@ sub give_arch_linux_hints()
my %map = (
"sphinx_rtd_theme"  => 

Re: [PATCH v4 4/5] dax: remove DAX code from page_cache_tree_insert()

2017-07-24 Thread Jan Kara
On Fri 21-07-17 16:39:54, Ross Zwisler wrote:
> Now that we no longer insert struct page pointers in DAX radix trees we can
> remove the special casing for DAX in page_cache_tree_insert().  This also
> allows us to make dax_wake_mapping_entry_waiter() local to fs/dax.c,
> removing it from dax.h.
> 
> Signed-off-by: Ross Zwisler 
> Suggested-by: Jan Kara 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza


> ---
>  fs/dax.c|  2 +-
>  include/linux/dax.h |  2 --
>  mm/filemap.c| 13 ++---
>  3 files changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index fb0e4c1..0e27d90 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -127,7 +127,7 @@ static int wake_exceptional_entry_func(wait_queue_entry_t 
> *wait, unsigned int mo
>   * correct waitqueue where tasks might be waiting for that old 'entry' and
>   * wake them.
>   */
> -void dax_wake_mapping_entry_waiter(struct address_space *mapping,
> +static void dax_wake_mapping_entry_waiter(struct address_space *mapping,
>   pgoff_t index, void *entry, bool wake_all)
>  {
>   struct exceptional_entry_key key;
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 29cced8..afa99bb 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -122,8 +122,6 @@ int dax_iomap_fault(struct vm_fault *vmf, enum 
> page_entry_size pe_size,
>  int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
>  int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
> pgoff_t index);
> -void dax_wake_mapping_entry_waiter(struct address_space *mapping,
> - pgoff_t index, void *entry, bool wake_all);
>  
>  #ifdef CONFIG_FS_DAX
>  int __dax_zero_page_range(struct block_device *bdev,
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a497024..1bf1265 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -130,17 +130,8 @@ static int page_cache_tree_insert(struct address_space 
> *mapping,
>   return -EEXIST;
>  
>   mapping->nrexceptional--;
> - if (!dax_mapping(mapping)) {
> - if (shadowp)
> - *shadowp = p;
> - } else {
> - /* DAX can replace empty locked entry with a hole */
> - WARN_ON_ONCE(p !=
> - dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
> - /* Wakeup waiters for exceptional entry lock */
> - dax_wake_mapping_entry_waiter(mapping, page->index, p,
> -   true);
> - }
> + if (shadowp)
> + *shadowp = p;
>   }
>   __radix_tree_replace(&mapping->page_tree, node, slot, page,
>workingset_update_node, mapping);
> -- 
> 2.9.4
> 
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v4 3/5] dax: use common 4k zero page for dax mmap reads

2017-07-24 Thread Jan Kara
On Fri 21-07-17 16:39:53, Ross Zwisler wrote:
> When servicing mmap() reads from file holes the current DAX code allocates
> a page cache page of all zeroes and places the struct page pointer in the
> mapping->page_tree radix tree.  This has three major drawbacks:
> 
> 1) It consumes memory unnecessarily.  For every 4k page that is read via a
> DAX mmap() over a hole, we allocate a new page cache page.  This means that
> if you read 1GiB worth of pages, you end up using 1GiB of zeroed memory.
> This is easily visible by looking at the overall memory consumption of the
> system or by looking at /proc/[pid]/smaps:
> 
>   7f62e72b3000-7f63272b3000 rw-s  103:00 12   /root/dax/data
>   Size:1048576 kB
>   Rss: 1048576 kB
>   Pss: 1048576 kB
>   Shared_Clean:  0 kB
>   Shared_Dirty:  0 kB
>   Private_Clean:   1048576 kB
>   Private_Dirty: 0 kB
>   Referenced:  1048576 kB
>   Anonymous: 0 kB
>   LazyFree:  0 kB
>   AnonHugePages: 0 kB
>   ShmemPmdMapped:0 kB
>   Shared_Hugetlb:0 kB
>   Private_Hugetlb:   0 kB
>   Swap:  0 kB
>   SwapPss:   0 kB
>   KernelPageSize:4 kB
>   MMUPageSize:   4 kB
>   Locked:0 kB
> 
> 2) It is slower than using a common zero page because each page fault has
> more work to do.  Instead of just inserting a common zero page we have to
> allocate a page cache page, zero it, and then insert it.  Here are the
> average latencies of dax_load_hole() as measured by ftrace on a random test
> box:
> 
> Old method, using zeroed page cache pages:3.4 us
> New method, using the common 4k zero page:0.8 us
> 
> This was the average latency over 1 GiB of sequential reads done by this
> simple fio script:
> 
>   [global]
>   size=1G
>   filename=/root/dax/data
>   fallocate=none
>   [io]
>   rw=read
>   ioengine=mmap
> 
> 3) The fact that we had to check for both DAX exceptional entries and for
> page cache pages in the radix tree made the DAX code more complex.
> 
> Solve these issues by following the lead of the DAX PMD code and using a
> common 4k zero page instead.  As with the PMD code we will now insert a DAX
> exceptional entry into the radix tree instead of a struct page pointer
> which allows us to remove all the special casing in the DAX code.
> 
> Note that we do still pretty aggressively check for regular pages in the
> DAX radix tree, especially where we take action based on the bits set in
> the page.  If we ever find a regular page in our radix tree now that most
> likely means that someone besides DAX is inserting pages (which has
> happened lots of times in the past), and we want to find that out early and
> fail loudly.
> 
> This solution also removes the extra memory consumption.  Here is that same
> /proc/[pid]/smaps after 1GiB of reading from a hole with the new code:
> 
>   7f2054a74000-7f2094a74000 rw-s  103:00 12   /root/dax/data
>   Size:1048576 kB
>   Rss:   0 kB
>   Pss:   0 kB
>   Shared_Clean:  0 kB
>   Shared_Dirty:  0 kB
>   Private_Clean: 0 kB
>   Private_Dirty: 0 kB
>   Referenced:0 kB
>   Anonymous: 0 kB
>   LazyFree:  0 kB
>   AnonHugePages: 0 kB
>   ShmemPmdMapped:0 kB
>   Shared_Hugetlb:0 kB
>   Private_Hugetlb:   0 kB
>   Swap:  0 kB
>   SwapPss:   0 kB
>   KernelPageSize:4 kB
>   MMUPageSize:   4 kB
>   Locked:0 kB
> 
> Overall system memory consumption is similarly improved.
> 
> Another major change is that we remove dax_pfn_mkwrite() from our fault
> flow, and instead rely on the page fault itself to make the PTE dirty and
> writeable.  The following description from the patch adding the
> vm_insert_mixed_mkwrite() call explains this a little more:
> 
> ***
>   To be able to use the common 4k zero page in DAX we need to have our PTE
>   fault path look more like our PMD fault path where a PTE entry can be
>   marked as dirty and writeable as it is first inserted, rather than
>   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> 
>   Right now we can rely on having a dax_pfn_mkwrite() call because we can
>   distinguish between these two cases in do_wp_page():
> 
>   case 1: 4k zero page => writable DAX storage
>   case 2: read-only DAX storage => writeable DAX storage
> 
>   This distinction is made by via vm_normal_page().  vm_normal_page()
>   returns false for the common 4k zero page, though, just as it does for
>   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we will
>   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
>   

Re: [PATCH v4 2/5] dax: relocate some dax functions

2017-07-24 Thread Jan Kara
On Fri 21-07-17 16:39:52, Ross Zwisler wrote:
> dax_load_hole() will soon need to call dax_insert_mapping_entry(), so it
> needs to be moved lower in dax.c so the definition exists.
> 
> dax_wake_mapping_entry_waiter() will soon be removed from dax.h and be made
> static to dax.c, so we need to move its definition above all its callers.
> 
> Signed-off-by: Ross Zwisler 

Looks good. You can add:

Reviewed-by: Jan Kara 

Honza


> ---
>  fs/dax.c | 138 
> +++
>  1 file changed, 69 insertions(+), 69 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index c844a51..779dc5e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -121,6 +121,31 @@ static int 
> wake_exceptional_entry_func(wait_queue_entry_t *wait, unsigned int mo
>  }
>  
>  /*
> + * We do not necessarily hold the mapping->tree_lock when we call this
> + * function so it is possible that 'entry' is no longer a valid item in the
> + * radix tree.  This is okay because all we really need to do is to find the
> + * correct waitqueue where tasks might be waiting for that old 'entry' and
> + * wake them.
> + */
> +void dax_wake_mapping_entry_waiter(struct address_space *mapping,
> + pgoff_t index, void *entry, bool wake_all)
> +{
> + struct exceptional_entry_key key;
> + wait_queue_head_t *wq;
> +
> + wq = dax_entry_waitqueue(mapping, index, entry, &key);
> +
> + /*
> +  * Checking for locked entry and prepare_to_wait_exclusive() happens
> +  * under mapping->tree_lock, ditto for entry handling in our callers.
> +  * So at this point all tasks that could have seen our entry locked
> +  * must be in the waitqueue and the following check will see them.
> +  */
> + if (waitqueue_active(wq))
> + __wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
> +}
> +
> +/*
>   * Check whether the given slot is locked. The function must be called with
>   * mapping->tree_lock held
>   */
> @@ -392,31 +417,6 @@ static void *grab_mapping_entry(struct address_space 
> *mapping, pgoff_t index,
>   return entry;
>  }
>  
> -/*
> - * We do not necessarily hold the mapping->tree_lock when we call this
> - * function so it is possible that 'entry' is no longer a valid item in the
> - * radix tree.  This is okay because all we really need to do is to find the
> - * correct waitqueue where tasks might be waiting for that old 'entry' and
> - * wake them.
> - */
> -void dax_wake_mapping_entry_waiter(struct address_space *mapping,
> - pgoff_t index, void *entry, bool wake_all)
> -{
> - struct exceptional_entry_key key;
> - wait_queue_head_t *wq;
> -
> - wq = dax_entry_waitqueue(mapping, index, entry, &key);
> -
> - /*
> -  * Checking for locked entry and prepare_to_wait_exclusive() happens
> -  * under mapping->tree_lock, ditto for entry handling in our callers.
> -  * So at this point all tasks that could have seen our entry locked
> -  * must be in the waitqueue and the following check will see them.
> -  */
> - if (waitqueue_active(wq))
> - __wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
> -}
> -
>  static int __dax_invalidate_mapping_entry(struct address_space *mapping,
> pgoff_t index, bool trunc)
>  {
> @@ -468,50 +468,6 @@ int dax_invalidate_mapping_entry_sync(struct 
> address_space *mapping,
>   return __dax_invalidate_mapping_entry(mapping, index, false);
>  }
>  
> -/*
> - * The user has performed a load from a hole in the file.  Allocating
> - * a new page in the file would cause excessive storage usage for
> - * workloads with sparse files.  We allocate a page cache page instead.
> - * We'll kick it out of the page cache if it's ever written to,
> - * otherwise it will simply fall out of the page cache under memory
> - * pressure without ever having been dirtied.
> - */
> -static int dax_load_hole(struct address_space *mapping, void **entry,
> -  struct vm_fault *vmf)
> -{
> - struct inode *inode = mapping->host;
> - struct page *page;
> - int ret;
> -
> - /* Hole page already exists? Return it...  */
> - if (!radix_tree_exceptional_entry(*entry)) {
> - page = *entry;
> - goto finish_fault;
> - }
> -
> - /* This will replace locked radix tree entry with a hole page */
> - page = find_or_create_page(mapping, vmf->pgoff,
> -vmf->gfp_mask | __GFP_ZERO);
> - if (!page) {
> - ret = VM_FAULT_OOM;
> - goto out;
> - }
> -
> -finish_fault:
> - vmf->page = page;
> - ret = finish_fault(vmf);
> - vmf->page = NULL;
> - *entry = page;
> - if (!ret) {
> - /* Grab reference for PTE that is now referencing the page */
> - get_page(page);
> - ret = VM_FAULT_NOPAGE;
> - 

Re: [PATCH v8 00/20] ILP32 for ARM64

2017-07-24 Thread Yury Norov
> > The decision to merge upstream will be revisited every 6 months,
> > > assessing the progress on the points I mentioned above, with a time
> > > limit of 2 years
> > 
> > IIUC, this is your personal decision based on responses and comments
> > from community?
> 
> Yes, as arm64 kernel maintainer.
> 
> > If so, I would like to ask you to do the first ILP32 community poll
> > now, not in 6 months. So we'll collect opinions and requests from
> > people interested in ILP32, and in 6 month will be able to check the
> > progress. I would like to see this thread public because if we are not
> > taking ILP32 to official sources right now, this is the only way to
> > inform people that the project exists and is ready to use.
> 
> That's an ongoing process, I'm not going to ask for people's opinion
> every 6 months. It's just that I will revisit periodically the progress
> on automated testing, public availability of a cross-toolchain,
> Tested/Acked/Reviewed-by tags on these patches from interested parties.
> Since I haven't seen any of these now, I don't see any point in asking.
> 
> To be clear, I'm not really interested in "we need this too" opinions, I
> get lots of these via the marketing channels. I'm looking for actual
> users with a long-term view of making/keeping ILP32 a first class ABI.

From my side, there are people who ask me for help with ILP32, and who
write from big companies' mail servers. But they don't want to ask
their questions publicly for some reason. From my point of view, there
is a small but stable interest in ILP32.

Nevertheless.

This is the 4.12 and linux-next - based kernel patches:
https://github.com/norov/linux/tree/ilp32-4.12
https://github.com/norov/linux/tree/ilp32-20170724

And this is the glibc series I've created based on Steve's patches in
glibc-alpha mail list (for reference only):
https://github.com/norov/glibc/tree/ilp32-2.26

I hope I didn't miss any reviewers' comments, but if that happened I
kindly ask you to excuse me. Should I resend the kernel patches to LKML,
or are the links above enough for you?

Yury


Re: [PATCH v4 1/5] mm: add mkwrite param to vm_insert_mixed()

2017-07-24 Thread Jan Kara
> @@ -1658,14 +1658,28 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>   if (!pte)
>   goto out;
>   retval = -EBUSY;
> - if (!pte_none(*pte))
> - goto out_unlock;
> + if (!pte_none(*pte)) {
> + if (mkwrite) {
> + if (WARN_ON_ONCE(pte_pfn(*pte) != pfn_t_to_pfn(pfn)))

Is the WARN_ON_ONCE() really appropriate here? If I'm right, your testcase
with private mappings has already triggered this very situation...

Otherwise the patch looks good to me.

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-24 Thread Jan Kara
On Fri 21-07-17 11:44:05, Ross Zwisler wrote:
> On Wed, Jul 19, 2017 at 03:58:31PM -0600, Ross Zwisler wrote:
> > On Wed, Jul 19, 2017 at 11:51:12AM -0600, Ross Zwisler wrote:
> > > On Wed, Jul 19, 2017 at 04:16:59PM +0200, Jan Kara wrote:
> > > > On Wed 28-06-17 16:01:48, Ross Zwisler wrote:
> > > > > To be able to use the common 4k zero page in DAX we need to have our 
> > > > > PTE
> > > > > fault path look more like our PMD fault path where a PTE entry can be
> > > > > marked as dirty and writeable as it is first inserted, rather than 
> > > > > waiting
> > > > > for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > > > > 
> > > > > Right now we can rely on having a dax_pfn_mkwrite() call because we 
> > > > > can
> > > > > distinguish between these two cases in do_wp_page():
> > > > > 
> > > > >   case 1: 4k zero page => writable DAX storage
> > > > >   case 2: read-only DAX storage => writeable DAX storage
> > > > > 
> > > > > This distinction is made via vm_normal_page().  vm_normal_page() 
> > > > > returns
> > > > > false for the common 4k zero page, though, just as it does for DAX 
> > > > > ptes.
> > > > > Instead of special casing the DAX + 4k zero page case, we will 
> > > > > simplify our
> > > > > DAX PTE page fault sequence so that it matches our DAX PMD sequence, 
> > > > > and
> > > > > get rid of dax_pfn_mkwrite() completely.
> > > > > 
> > > > > This means that insert_pfn() needs to follow the lead of 
> > > > > insert_pfn_pmd()
> > > > > and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set 
> > > > > insert_pfn()
> > > > > will do the work that was previously done by wp_page_reuse() as part 
> > > > > of the
> > > > > dax_pfn_mkwrite() call path.
> > > > > 
> > > > > Signed-off-by: Ross Zwisler 
> > > > 
> > > > Just one small comment below.
> > > > 
> > > > > @@ -1658,14 +1658,26 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> > > > >   if (!pte)
> > > > >   goto out;
> > > > >   retval = -EBUSY;
> > > > > - if (!pte_none(*pte))
> > > > > - goto out_unlock;
> > > > > + if (!pte_none(*pte)) {
> > > > > + if (mkwrite) {
> > > > > + entry = *pte;
> > > > > + goto out_mkwrite;
> > > > 
> > > > Can we maybe check here that (pte_pfn(*pte) == pfn_t_to_pfn(pfn)) and
> > > > return -EBUSY otherwise? That way we are sure insert_pfn() isn't doing
> > > > anything we don't expect 
> > > 
> > > Sure, that's fine.  I'll add it as a WARN_ON_ONCE() so it's a very loud
> > > failure.  If the pfns don't match I think we're insane (and would have 
> > > been
> > > insane prior to this patch series as well) because we are getting a page 
> > > fault
> > > and somehow have a different PFN already mapped at that location.
> > 
> > Umm...well, I added the warning, and during my regression testing hit a case
> > where the PFNs didn't match.  (generic/437 with both ext4 & XFS)
> > 
> > I've verified that this behavior happens with vanilla v4.12, so it's not a 
> > new
> > condition introduced by my patch.
> > 
> > I'm off tracking that down - there's a bug lurking somewhere, I think.
> 
> Actually, I think we're fine.  What was happening was that two faults were
> racing for a private mapping.  One was installing a RW PTE for the COW page
> cache page via wp_page_copy(), and the second was trying to install a
> read-only PTE in insert_pfn().  The PFNs don't match because the two faults
> are trying to map very different PTEs - one for DAX storage, one for a page
> cache page.

OK, so two threads (sharing page tables) were doing read and write faults at
the same offset of a private mapping. OK, makes sense.

> This collision is handled by insert_pfn() by just returning -EBUSY, which will
> bail out of the fault and either re-fault if necessary, or use the PTE that
> the other thread installed.  For the case I described above I think both
> faults will just happily use the page cache page, and the RO DAX fault won't
> be retried.
> 
> I think this is fine, and I'll preserve this behavior as you suggest in the
> mkwrite case by validating that the PTE is what we think it should be after we
> grab the PTL.

Yeah, that seems to be essential for racing faults in private mappings
to work as they should. Thanks for analysing this!

Honza
-- 
Jan Kara 
SUSE Labs, CR
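
For reference, the check being discussed in this thread would look roughly
like the following inside insert_pfn(), reusing the names from the hunks
quoted above. This is only a sketch; the PTE-upgrade details are an
assumption modelled on what wp_page_reuse() does today, not the final patch:

        retval = -EBUSY;
        if (!pte_none(*pte)) {
                if (!mkwrite)
                        goto out_unlock;
                /*
                 * A racing fault (e.g. wp_page_copy() on a private
                 * mapping) may have installed a different page here;
                 * bail out quietly instead of overwriting it.
                 */
                if (pte_pfn(*pte) != pfn_t_to_pfn(pfn))
                        goto out_unlock;
                /* Same PFN: upgrade the existing PTE to dirty/writable. */
                entry = pte_mkyoung(*pte);
                entry = maybe_mkwrite(pte_mkdirty(entry), vma);
                if (ptep_set_access_flags(vma, addr, pte, entry, 1))
                        update_mmu_cache(vma, addr, pte);
                retval = 0;
                goto out_unlock;
        }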


Re: [PATCH v4 1/5] mm: add mkwrite param to vm_insert_mixed()

2017-07-24 Thread Jan Kara
On Sat 22-07-17 09:21:31, Dan Williams wrote:
> On Fri, Jul 21, 2017 at 3:39 PM, Ross Zwisler
>  wrote:
> > To be able to use the common 4k zero page in DAX we need to have our PTE
> > fault path look more like our PMD fault path where a PTE entry can be
> > marked as dirty and writeable as it is first inserted, rather than waiting
> > for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> >
> > Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > distinguish between these two cases in do_wp_page():
> >
> > case 1: 4k zero page => writable DAX storage
> > case 2: read-only DAX storage => writeable DAX storage
> >
> > This distinction is made via vm_normal_page().  vm_normal_page() returns
> > false for the common 4k zero page, though, just as it does for DAX ptes.
> > Instead of special casing the DAX + 4k zero page case, we will simplify our
> > DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
> > get rid of the dax_pfn_mkwrite() helper.  We will instead use
> > dax_iomap_fault() to handle write-protection faults.
> >
> > This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> > and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set insert_pfn()
> > will do the work that was previously done by wp_page_reuse() as part of the
> > dax_pfn_mkwrite() call path.
> >
> > Signed-off-by: Ross Zwisler 
> > ---
> >  drivers/dax/device.c|  2 +-
> >  drivers/gpu/drm/exynos/exynos_drm_gem.c |  3 ++-
> >  drivers/gpu/drm/gma500/framebuffer.c|  2 +-
> >  drivers/gpu/drm/msm/msm_gem.c   |  3 ++-
> >  drivers/gpu/drm/omapdrm/omap_gem.c  |  6 --
> >  drivers/gpu/drm/ttm/ttm_bo_vm.c |  2 +-
> >  fs/dax.c|  2 +-
> >  include/linux/mm.h  |  2 +-
> >  mm/memory.c | 27 +--
> >  9 files changed, 34 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> > index e9f3b3e..3973521 100644
> > --- a/drivers/dax/device.c
> > +++ b/drivers/dax/device.c
> > @@ -273,7 +273,7 @@ static int __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
> >
> > pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
> >
> > -   rc = vm_insert_mixed(vmf->vma, vmf->address, pfn);
> > +   rc = vm_insert_mixed(vmf->vma, vmf->address, pfn, false);
> 
> Ugh, I generally find bool flags unreadable. They place a tax on
> jumping to function definition to recall what true and false mean. If
> we want to go this 'add an argument' route can we at least add an enum
> like:
> 
> enum {
> PTE_MKDIRTY,
> PTE_MKCLEAN,
> };
> 
> ...to differentiate the two cases?

So how I usually deal with this is that I create e.g.:

__vm_insert_mixed() that takes the bool argument, make vm_insert_mixed()
pass false, and vm_insert_mixed_mkwrite() pass true. That way there's no
code duplication, old call sites can stay unchanged, the naming clearly
says what's going on...
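
E.g., roughly (a sketch of the naming scheme only, not the final patch; the
helper body just forwards mkwrite down to insert_pfn() as in Ross' patch):

static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
                        pfn_t pfn, bool mkwrite)
{
        /* ... existing vm_insert_mixed() body, passing mkwrite down ... */
        return insert_pfn(vma, addr, pfn, vma->vm_page_prot, mkwrite);
}

int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
                        pfn_t pfn)
{
        return __vm_insert_mixed(vma, addr, pfn, false);
}
EXPORT_SYMBOL(vm_insert_mixed);

int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
                        pfn_t pfn)
{
        return __vm_insert_mixed(vma, addr, pfn, true);
}
EXPORT_SYMBOL(vm_insert_mixed_mkwrite);

That also keeps the bool out of all the unrelated callers.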

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-24 Thread Akira Yokosawa
On 2017/07/24 14:34:07 +0800, Boqun Feng wrote:
> On Mon, Jul 24, 2017 at 09:04:57AM +0900, Akira Yokosawa wrote:
> [...]
>>>
>>> ->8
>>> Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored
>>>
>>> READ_ONCE() is used around the kernel to provide a control dependency,
>>> and to make the control dependency valid, we must 1) make the load of
>>> READ_ONCE() actually happen and 2) make sure compilers take the return
>>> value of READ_ONCE() seriously. 1) is already done and commented,
>>> and in the current implementation, 2) is also considered done in the
>>> same way as 1): a 'volatile' load.
>>>
>>> Whereas, Akira Yokosawa recently reported a problem that would be
>>> triggered if 2) is not achieved. 
>>
>> To clarify the timeline, it was Paul who pointed out it would become
>> easier for compilers to optimize away the "if" statements in response
>> to my suggestion of a partial revert (">" -> ">=").
>>
> 
> Ah.. right, I missed that part. I will use proper sentences here like:
> 
>   During a recent discussion brought up by Akira Yokosawa on
>   memory-barriers.txt, a problem was discovered that would be
>   triggered if 2) is not achieved.
> 
> Works for you?

Looks fine. Thanks!

Akira

> 
>>>  Moreover, according to Paul McKenney,
>>> using volatile might not actually give us what we want for 2) depending
>>> on compiler writers' definition of 'volatile'. Therefore it's necessary
>>> to emphasize 2) as a part of the semantics of READ_ONCE(); this not only
>>> fits the conceptual semantics we have been using, but also makes the
>>> implementation requirement more accurate.
>>>
>>> In the future, we can either make compiler writers accept our use of
>>> 'volatile', or(if that fails) find another way to provide this
>>> guarantee.
>>>
>>> Cc: Akira Yokosawa 
>>> Cc: Paul E. McKenney 
>>> Signed-off-by: Boqun Feng 
>>> ---
>>>  include/linux/compiler.h | 25 +
>>>  1 file changed, 25 insertions(+)
>>>
>>> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>>> index 219f82f3ec1a..8094f594427c 100644
>>> --- a/include/linux/compiler.h
>>> +++ b/include/linux/compiler.h
>>> @@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
>>>   * mutilate accesses that either do not require ordering or that interact
>>>   * with an explicit memory barrier or atomic instruction that provides the
>>>   * required ordering.
>>> + *
>>> + * The return value of READ_ONCE() should be honored by compilers, IOW,
>>> + * compilers must treat the return value of READ_ONCE() as an unknown 
>>> value at
>>> + * compile time, i.e. no optimization should be done based on the value of 
>>> a
>>> + * READ_ONCE(). For example, the following code snippet:
>>> + *
>>> + * int a = 0;
>>> + * int x = 0;
>>> + *
>>> + * void some_func() {
>>> + * int t = READ_ONCE(a);
>>> + * if (!t)
>>> + * WRITE_ONCE(x, 1);
>>> + * }
>>> + *
>>> + * , should never be optimized as:
>>> + *
>>> + * void some_func() {
>>> + * WRITE_ONCE(x, 1);
>>> + * }
>> READ_ONCE() should still be honored, so maybe the following?
>>
> 
> Make sense. Thanks!
> 
> Regaords,
> Boqun
> 
>> + * , should never be optimized as:
>> + *
>> + *  void some_func() {
>> + *  int t = READ_ONCE(a);
>> + *  WRITE_ONCE(x, 1);
>> + *  }
>>
>>  Thanks, Akira
>>
>>> + *
>>> + * because the compiler is 'smart' enough to think the value of 'a' is 
>>> never
>>> + * changed.
>>> + *
>>> + * We provide this guarantee by making READ_ONCE() a *volatile* load.
>>>   */
>>>  
>>>  #define __READ_ONCE(x, check)  
>>> \
>>>
>>



[PATCH v2] kmemleak: add oom=<disable|ignore> runtime parameter

2017-07-24 Thread shuwang
From: Shu Wang 

When running memory stress tests, kmemleak can easily get disabled in
create_object() when the system is out of memory and kmemleak fails to
allocate from object_cache. Since there is no way to re-enable kmemleak
once it is off, simply ignoring the object_cache allocation failure only
loses track of some memory objects, but it increases the usability of
kmemleak under memory stress.

The default action on OOM is still to disable kmemleak;
echo oom=ignore > /sys/kernel/debug/kmemleak changes the action to
ignore OOM.

Signed-off-by: Shu Wang 
---
 Documentation/dev-tools/kmemleak.rst |  5 +
 mm/kmemleak.c| 10 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kmemleak.rst b/Documentation/dev-tools/kmemleak.rst
index cb88626..3013809 100644
--- a/Documentation/dev-tools/kmemleak.rst
+++ b/Documentation/dev-tools/kmemleak.rst
@@ -60,6 +60,11 @@ Memory scanning parameters can be modified at run-time by writing to the
 or free all kmemleak objects if kmemleak has been disabled.
 - dump=
 dump information about the object found at 
+- oom=disable
+disable kmemleak after system out of memory (default)
+- oom=ignore
+do not disable kmemleak after system out of memory
+(useful for memory stress test, but will lose some objects)
 
 Kmemleak can also be disabled at boot-time by passing ``kmemleak=off`` on
 the kernel command line.
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7780cd8..b5cb2c6 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -236,6 +236,9 @@ static DEFINE_MUTEX(scan_mutex);
 static int kmemleak_skip_disable;
 /* If there are leaks that can be reported */
 static bool kmemleak_found_leaks;
+/* If disable kmemleak after out of memory */
+static bool kmemleak_oom_disable = true;
+
 
 /*
  * Early object allocation/freeing logging. Kmemleak is initialized after the
@@ -556,7 +559,8 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
if (!object) {
pr_warn("Cannot allocate a kmemleak_object structure\n");
-   kmemleak_disable();
+   if (kmemleak_oom_disable)
+   kmemleak_disable();
return NULL;
}
 
@@ -1888,6 +1892,10 @@ static ssize_t kmemleak_write(struct file *file, const char __user *user_buf,
kmemleak_scan();
else if (strncmp(buf, "dump=", 5) == 0)
ret = dump_str_object_info(buf + 5);
+   else if (strncmp(buf, "oom=ignore", 10) == 0)
+   kmemleak_oom_disable = false;
+   else if (strncmp(buf, "oom=disable", 11) == 0)
+   kmemleak_oom_disable = true;
else
ret = -EINVAL;
 
-- 
2.5.0



[PATCH] docs: driver-api: Remove trailing blank line

2017-07-24 Thread Thierry Reding
From: Thierry Reding 

There's no use for this blank line at the end of the file. Remove it.

Signed-off-by: Thierry Reding 
---
 Documentation/driver-api/miscellaneous.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/driver-api/miscellaneous.rst b/Documentation/driver-api/miscellaneous.rst
index 8da7d115bafc..304ffb146cf9 100644
--- a/Documentation/driver-api/miscellaneous.rst
+++ b/Documentation/driver-api/miscellaneous.rst
@@ -47,4 +47,3 @@ used by one consumer at a time.
 
 .. kernel-doc:: drivers/pwm/core.c
:export:
-   
-- 
2.13.3



Re: [RFC PATCH 0/4] Documentation: generated module param documentation

2017-07-24 Thread Jani Nikula
On Mon, 24 Jul 2017, Jonathan Corbet  wrote:
> On Wed, 19 Jul 2017 16:05:01 +0300
> Jani Nikula  wrote:
>
>> Hi Jon, all, here are some quick'n'dirty patches to semi-automatically
>> generate module param documentation from the source, via module build
>> and modinfo(8). No polish or proper design, just a hacked up
>> proof-of-concept to think about.
>> 
>> Do we want nice documentation for module parameters, somewhere that
>> search engines can find them? And do we want to reuse the documentation
>> already in place for parameters?
>
> Certainly I like the idea of automatically generating module parameter
> docs.  I will confess that I don't like committing a duplicated version
> of those docs into the repository, though; nobody will ever update them
> until somebody complains.  I'm also concerned about removing
> documentation from kernel-parameters.rst, since that's where people tend
> to go looking for such information now.
>
> I wonder if we could hack up some sort of trick using "cc -E" and a
> special include file to extract the info from the source?  Then maybe a
> simple module-doc extension to run that trick and include the results?
> It means adding a little stanza for each module we want to document, but
> I'm not convinced that's worse than committing the documentation
> separately.
>
> Thoughts?

Just a quick reply: one alternative I came up with afterwards is doing
what I do here, but with a separate "module param docs" target that does
an allmodconfig build and runs modinfo on all the .ko files. Likely slow
as molasses, but it bypasses committing intermediate files and doesn't
slow down the main docs build.

I'll have to think about your proposal a bit more.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center


Re: [PATCH 1/3] irq/irq_sim: add a simple interrupt simulator framework

2017-07-24 Thread Lars-Peter Clausen
On 07/19/2017 02:20 PM, Bartosz Golaszewski wrote:
[...]
> +void irq_sim_fini(struct irq_sim *sim)
> +{

Not very likely to happen in practice, but for correctness we should
probably put an irq_work_sync() here for each of the IRQs, to make sure that
the memory associated with the irq_sim_work_ctx struct is no longer accessed
and that handle_simple_irq() is not called after irq_free_descs(). (A sketch
follows the quoted hunk below.)


> + irq_free_descs(sim->irq_base, sim->irq_count);
> + kfree(sim->irqs);
> +}
> +EXPORT_SYMBOL_GPL(irq_sim_fini);
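
Something along these lines (a sketch only; the irq_count, irq_base and
per-IRQ work member names are assumptions, since the struct layout isn't
quoted in this hunk):

void irq_sim_fini(struct irq_sim *sim)
{
        int i;

        /* Make sure no queued irq_work still references sim->irqs. */
        for (i = 0; i < sim->irq_count; i++)
                irq_work_sync(&sim->irqs[i].work);

        irq_free_descs(sim->irq_base, sim->irq_count);
        kfree(sim->irqs);
}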
