[PATCH v2] Input: cap11xx - switch to using set_brightness_blocking()

2019-02-07 Thread Dmitry Torokhov
Updating LED state requires access to regmap and therefore we may sleep,
so we could not do that directly form set_brightness() method.
Historically we used private work to adjust the brightness, but with the
introduction of set_brightness_blocking() we no longer need it.

As a bonus, not having our own work item means we do not have
use-after-free issue as we neglected to cancel outstanding work on
driver unbind.

Reported-by: Sven Van Asbroeck 
Reviewed-by: Sven Van Asbroeck 
Signed-off-by: Dmitry Torokhov 
---

v2: no longer checking current brightness before trying to update since
regmap should take care of caching and skipping updates when needed
(Jacek).

 drivers/input/keyboard/cap11xx.c | 35 ++--
 1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/drivers/input/keyboard/cap11xx.c b/drivers/input/keyboard/cap11xx.c
index 312916f99597..73686c2460ce 100644
--- a/drivers/input/keyboard/cap11xx.c
+++ b/drivers/input/keyboard/cap11xx.c
@@ -75,9 +75,7 @@
 struct cap11xx_led {
struct cap11xx_priv *priv;
struct led_classdev cdev;
-   struct work_struct work;
u32 reg;
-   enum led_brightness new_brightness;
 };
 #endif
 
@@ -233,30 +231,21 @@ static void cap11xx_input_close(struct input_dev *idev)
 }
 
 #ifdef CONFIG_LEDS_CLASS
-static void cap11xx_led_work(struct work_struct *work)
+static int cap11xx_led_set(struct led_classdev *cdev,
+   enum led_brightness value)
 {
-   struct cap11xx_led *led = container_of(work, struct cap11xx_led, work);
+   struct cap11xx_led *led = container_of(cdev, struct cap11xx_led, cdev);
struct cap11xx_priv *priv = led->priv;
-   int value = led->new_brightness;
 
/*
-* All LEDs share the same duty cycle as this is a HW limitation.
-* Brightness levels per LED are either 0 (OFF) and 1 (ON).
+* All LEDs share the same duty cycle as this is a HW
+* limitation. Brightness levels per LED are either
+* 0 (OFF) and 1 (ON).
 */
-   regmap_update_bits(priv->regmap, CAP11XX_REG_LED_OUTPUT_CONTROL,
-   BIT(led->reg), value ? BIT(led->reg) : 0);
-}
-
-static void cap11xx_led_set(struct led_classdev *cdev,
-  enum led_brightness value)
-{
-   struct cap11xx_led *led = container_of(cdev, struct cap11xx_led, cdev);
-
-   if (led->new_brightness == value)
-   return;
-
-   led->new_brightness = value;
-   schedule_work(>work);
+   return regmap_update_bits(priv->regmap,
+ CAP11XX_REG_LED_OUTPUT_CONTROL,
+ BIT(led->reg),
+ value ? BIT(led->reg) : 0);
 }
 
 static int cap11xx_init_leds(struct device *dev,
@@ -299,7 +288,7 @@ static int cap11xx_init_leds(struct device *dev,
led->cdev.default_trigger =
of_get_property(child, "linux,default-trigger", NULL);
led->cdev.flags = 0;
-   led->cdev.brightness_set = cap11xx_led_set;
+   led->cdev.brightness_set_blocking = cap11xx_led_set;
led->cdev.max_brightness = 1;
led->cdev.brightness = LED_OFF;
 
@@ -312,8 +301,6 @@ static int cap11xx_init_leds(struct device *dev,
led->reg = reg;
led->priv = priv;
 
-   INIT_WORK(>work, cap11xx_led_work);
-
error = devm_led_classdev_register(dev, >cdev);
if (error) {
of_node_put(child);
-- 
2.20.1.791.gb4d0f1c61a-goog


-- 
Dmitry


[PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2019-02-07 Thread john . hubbard
From: John Hubbard 

Introduces put_user_page(), which simply calls put_page().
This provides a way to update all get_user_pages*() callers,
so that they call put_user_page(), instead of put_page().

Also introduces put_user_pages(), and a few dirty/locked variations,
as a replacement for release_pages(), and also as a replacement
for open-coded loops that release multiple pages.
These may be used for subsequent performance improvements,
via batching of pages to be released.

This is the first step of fixing a problem (also described in [1] and
[2]) with interactions between get_user_pages ("gup") and filesystems.

Problem description: let's start with a bug report. Below, is what happens
sometimes, under memory pressure, when a driver pins some pages via gup,
and then marks those pages dirty, and releases them. Note that the gup
documentation actually recommends that pattern. The problem is that the
filesystem may do a writeback while the pages were gup-pinned, and then the
filesystem believes that the pages are clean. So, when the driver later
marks the pages as dirty, that conflicts with the filesystem's page
tracking and results in a BUG(), like this one that I experienced:

kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
backtrace:
ext4_writepage
__writepage
write_cache_pages
ext4_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork

...which is due to the file system asserting that there are still buffer
heads attached:

({  \
BUG_ON(!PagePrivate(page)); \
((struct buffer_head *)page_private(page)); \
})

Dave Chinner's description of this is very clear:

"The fundamental issue is that ->page_mkwrite must be called on every
write access to a clean file backed page, not just the first one.
How long the GUP reference lasts is irrelevant, if the page is clean
and you need to dirty it, you must call ->page_mkwrite before it is
marked writeable and dirtied. Every. Time."

This is just one symptom of the larger design problem: filesystems do not
actually support get_user_pages() being called on their pages, and letting
hardware write directly to those pages--even though that patter has been
going on since about 2005 or so.

The steps are to fix it are:

1) (This patch): provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem.

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Cc: Al Viro 
Cc: Christoph Hellwig 
Cc: Christopher Lameter 
Cc: Dan Williams 
Cc: Dave Chinner 
Cc: Jan Kara 
Cc: Jason Gunthorpe 
Cc: Jerome Glisse 
Cc: Matthew Wilcox 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Ralph Campbell 

Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 include/linux/mm.h | 24 ++
 mm/swap.c  | 82 ++
 2 files changed, 106 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..809b7397d41e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -993,6 +993,30 @@ static inline void put_page(struct page *page)
__put_page(page);
 }
 
+/**
+ * put_user_page() - release a gup-pinned page
+ * @page:pointer to page to be released
+ *
+ * Pages that were pinned via get_user_pages*() must be released via
+ * either put_user_page(), or one of the put_user_pages*() routines
+ * below. This is so that eventually, pages that are pinned via
+ * get_user_pages*() can be separately tracked and uniquely handled. In
+ * particular, interactions with RDMA and filesystems need special
+ * handling.
+ *
+ * put_user_page() and put_page() are not interchangeable, despite this early
+ * implementation that makes them look the same. put_user_page() calls must
+ * be perfectly matched up with get_user_page() calls.
+ */
+static inline void put_user_page(struct page *page)
+{
+   put_page(page);
+}
+
+void put_user_pages_dirty(struct page **pages, unsigned long npages);
+void put_user_pages_dirty_lock(struct page **pages, unsigned long 

[PATCH 0/2] mm: put_user_page() call site conversion first

2019-02-07 Thread john . hubbard
From: John Hubbard 

Hi,

It seems about time to post these initial patches: I think we have pretty
good consensus on the concept and details of the put_user_pages() approach.
Therefore, here are the first two patches, to get started on converting the
get_user_pages() call sites to use put_user_page(), instead of put_page().
This is in order to implement tracking of get_user_page() pages.

A discussion of the overall problem is below.

As mentioned in patch 0001, the steps are to fix the problem are:

1) Provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem.

This write up is lifted from the RFC v2 patchset cover letter [1]:

Overview


Some kernel components (file systems, device drivers) need to access
memory that is specified via process virtual address. For a long time, the
API to achieve that was get_user_pages ("GUP") and its variations. However,
GUP has critical limitations that have been overlooked; in particular, GUP
does not interact correctly with filesystems in all situations. That means
that file-backed memory + GUP is a recipe for potential problems, some of
which have already occurred in the field.

GUP was first introduced for Direct IO (O_DIRECT), allowing filesystem code
to get the struct page behind a virtual address and to let storage hardware
perform a direct copy to or from that page. This is a short-lived access
pattern, and as such, the window for a concurrent writeback of GUP'd page
was small enough that there were not (we think) any reported problems.
Also, userspace was expected to understand and accept that Direct IO was
not synchronized with memory-mapped access to that data, nor with any
process address space changes such as munmap(), mremap(), etc.

Over the years, more GUP uses have appeared (virtualization, device
drivers, RDMA) that can keep the pages they get via GUP for a long period
of time (seconds, minutes, hours, days, ...). This long-term pinning makes
an underlying design problem more obvious.

In fact, there are a number of key problems inherent to GUP:

Interactions with file systems
==

File systems expect to be able to write back data, both to reclaim pages,
and for data integrity. Allowing other hardware (NICs, GPUs, etc) to gain
write access to the file memory pages means that such hardware can dirty
the pages, without the filesystem being aware. This can, in some cases
(depending on filesystem, filesystem options, block device, block device
options, and other variables), lead to data corruption, and also to kernel
bugs of the form:

kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
backtrace:
ext4_writepage
__writepage
write_cache_pages
ext4_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork

...which is due to the file system asserting that there are still buffer
heads attached:

({  \
BUG_ON(!PagePrivate(page)); \
((struct buffer_head *)page_private(page)); \
})

Dave Chinner's description of this is very clear:

"The fundamental issue is that ->page_mkwrite must be called on every
write access to a clean file backed page, not just the first one.
How long the GUP reference lasts is irrelevant, if the page is clean
and you need to dirty it, you must call ->page_mkwrite before it is
marked writeable and dirtied. Every. Time."

This is just one symptom of the larger design problem: filesystems do not
actually support get_user_pages() being called on their pages, and letting
hardware write directly to those pages--even though that pattern has been
going on since about 2005 or so.

Long term GUP
=

Long term GUP is an issue when FOLL_WRITE is specified to GUP (so, a
writeable mapping is created), and the pages are file-backed. That can lead
to filesystem corruption. What happens is that when a file-backed page is
being written back, it is first mapped read-only in all of the CPU page
tables; the file system then assumes that nobody can write to the page, and
that the page content is therefore stable. Unfortunately, the GUP callers
generally 

[PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*()

2019-02-07 Thread john . hubbard
From: John Hubbard 

For infiniband code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(), or
put_user_pages*(), instead of put_page()

This is a tiny part of the second step of fixing the problem described
in [1]. The steps are:

1) Provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Christian Benvenuti 

Reviewed-by: Jan Kara 
Reviewed-by: Dennis Dalessandro 
Acked-by: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  |  7 ---
 drivers/infiniband/core/umem_odp.c  |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
 7 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c6144df47ea4..c2898bc7b3b2 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -58,9 +58,10 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
 
page = sg_page(sg);
-   if (!PageDirty(page) && umem->writable && dirty)
-   set_page_dirty_lock(page);
-   put_page(page);
+   if (umem->writable && dirty)
+   put_user_pages_dirty_lock(, 1);
+   else
+   put_user_page(page);
}
 
sg_free_table(>sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c 
b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..d32757c1f77e 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -663,7 +663,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
ret = -EFAULT;
break;
}
-   put_page(local_page_list[j]);
+   put_user_page(local_page_list[j]);
continue;
}
 
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..99ccc0483711 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -121,13 +121,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, 
unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   size_t i;
-
-   for (i = 0; i < npages; i++) {
-   if (dirty)
-   set_page_dirty_lock(p[i]);
-   put_page(p[i]);
-   }
+   if (dirty)
+   put_user_pages_dirty_lock(p, npages);
+   else
+   put_user_pages(p, npages);
 
if (mm) { /* during close after signal, mm can be NULL */
down_write(>mmap_sem);
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 112d2f38e0de..99108f3dcf01 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
 
ret = pci_map_sg(dev->pdev, _tab->page[i].mem, 1, PCI_DMA_TODEVICE);
if (ret < 0) {
-   put_page(pages[0]);
+   put_user_page(pages[0]);
goto out;
}
 
@@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
 mthca_uarc_virt(dev, uar, i));
if (ret) {
pci_unmap_sg(dev->pdev, _tab->page[i].mem, 1, 
PCI_DMA_TODEVICE);
-   put_page(sg_page(_tab->page[i].mem));
+   put_user_page(sg_page(_tab->page[i].mem));
goto out;
}
 
@@ -555,7 +555,7 @@ void 

Re: [PATCH 07/32] timens/kernel: Take into account timens clock offsets in clock_nanosleep

2019-02-07 Thread Thomas Gleixner
On Wed, 6 Feb 2019, Dmitry Safonov wrote:
>  
> @@ -1721,9 +1722,16 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
>  {
>   struct restart_block *restart;
>   struct hrtimer_sleeper t;
> + struct timespec64 tp;
>   int ret = 0;
>   u64 slack;
>  
> + if (!(mode & HRTIMER_MODE_REL)) {
> + tp = *rqtp;
> + rqtp = 

So every invocation of hrtimer_nanosleep() gains a copy of the timespec64
even if the namespace muck is disabled.

The only relevant caller of this is common_nsleep(). So it might make sense
to have common_nsleep() separated for CLOCK_MONOTONIC/BOOTTIME and handle
the thing there. That again avoids the switch() to and out of line calls.

Thanks,

tglx




Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release

2019-02-07 Thread Christoph Hellwig
Yes, we should not reset the DMA ops before releasing all resources:

Acked-by: Christoph Hellwig 


Re: [BUGFIX PATCH] crypto: ccree: fix resume race condition on init

2019-02-07 Thread Herbert Xu
On Thu, Feb 07, 2019 at 03:36:11PM +0200, Gilad Ben-Yossef wrote:
> We were enabling autosuspend, which is using data set by the
> hash module, prior to the hash module being inited, casuing
> a crash on resume as part of the startup sequence if the race
> was lost.
> 
> This was never a real problem because the PM infra was using low
> res timers so we were always winning the race, until commit 8234f6734c5d
> ("PM-runtime: Switch autosuspend over to using hrtimers") changed that :-)
> 
> Fix this by seperating the PM setup and enablement and doing the
> latter only at the end of the init sequence.
> 
> Signed-off-by: Gilad Ben-Yossef 
> Cc: Vincent Guittot 
> Cc: sta...@kernel.org # v4.20
> ---
> Herbert, could you please take this for 5.0-rc6 ? thanks.
> 
>  drivers/crypto/ccree/cc_driver.c |  7 ---
>  drivers/crypto/ccree/cc_pm.c | 13 ++---
>  drivers/crypto/ccree/cc_pm.h |  3 +++
>  3 files changed, 13 insertions(+), 10 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] MAINTAINERS: crypto: ccree: remove co-maintainer

2019-02-07 Thread Herbert Xu
On Thu, Feb 07, 2019 at 03:44:15PM +0200, Gilad Ben-Yossef wrote:
> The best-laid plans of mice and men often go awry.
> Remove Yael C. as co-maintainer as she moved on to other endeavours.
> 
> Signed-off-by: Gilad Ben-Yossef 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 00/15] crypto: improved skcipher, aead, and hash tests

2019-02-07 Thread Herbert Xu
On Thu, Jan 31, 2019 at 11:51:35PM -0800, Eric Biggers wrote:
> Hello,
> 
> Crypto algorithms must produce the same output for the same input
> regardless of data layout, i.e. how the src and dst scatterlists are
> divided into chunks and how each chunk is aligned.  Request flags such
> as CRYPTO_TFM_REQ_MAY_SLEEP must not affect the result either.
> 
> However, testing of this currently has many gaps.  For example,
> individual algorithms are responsible for providing their own chunked
> test vectors.  But many don't bother to do this or test only one or two
> cases, providing poor test coverage.  Also, other things such as buffers
> spanning a page boundary, misaligned IVs, and CRYPTO_TFM_REQ_MAY_SLEEP
> are never tested at all.
> 
> Test code is also duplicated between the chunked and non-chunked cases,
> making it difficult to make other improvements.
> 
> To improve the situation, this patch series basically moves the chunk
> descriptions into the testmgr itself so that they are shared by all
> algorithms.  However, it's done in an extensible way via a new struct
> 'testvec_config', which describes not just the scaled chunk lengths but
> also all other aspects of the crypto operation besides the data itself
> such as the buffer alignments, the request flags, whether the operation
> is in-place or not, the IV alignment, and for hash algorithms when to do
> each update() and when to use finup() vs. final() vs. digest().
> 
> Then, this patch series makes skcipher, aead, and hash algorithms be
> tested against a list of default testvec_configs, replacing the current
> test code.  This improves overall test coverage, without reducing test
> performance too much.  Note that the test vectors themselves are not
> changed, except for removing the chunk lists.
> 
> This series also adds randomized fuzz tests, enabled by a new kconfig
> option intended for developer use only, where skcipher, aead, and hash
> algorithms are tested against many randomly generated testvec_configs.
> This provides much more comprehensive test coverage.
> 
> I've run these improved tests on x86, arm32, and arm64 with all crypto
> algorithms enabled, and they have already found many bugs.  Patches 1-7
> and the patches from Ard Biesheuvel fix most of the bugs found so far.
> A bug was also detected in the Rockchip crypto driver which remains to
> be fixed.  Also many AEADs incorrectly change aead_request::base.tfm,
> but for now I'm temporarily working around that in the tests as I plan
> to fix it later after the other types of bugs are addressed.
> 
> If anyone reading this has access to systems with other architectures or
> crypto drivers that may not have been tested yet, you can help by
> applying these patches on your system, enabling
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS, and reporting or fixing any test
> failures.
> 
> This patch series can also be found in git at
> https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git
> branch "testmgr-improvements".
> 
> Changed since v1:
> 
> - Made CONFIG_CRYPTO_MANAGER_EXTRA_TESTS depend on CONFIG_DEBUG_KERNEL.
> - Improved commit description of AEGIS and MORUS fixes.
> - A few very minor cleanups to the test code.
> 
> Eric Biggers (15):
>   crypto: aegis - fix handling chunked inputs
>   crypto: morus - fix handling chunked inputs
>   crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP
>   crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP
>   crypto: x86/aesni-gcm - fix crash on empty plaintext
>   crypto: ahash - fix another early termination in hash walk
>   crypto: arm64/aes-neonbs - fix returning final keystream block
>   crypto: testmgr - add testvec_config struct and helper functions
>   crypto: testmgr - introduce CONFIG_CRYPTO_MANAGER_EXTRA_TESTS
>   crypto: testmgr - implement random testvec_config generation
>   crypto: testmgr - convert skcipher testing to use testvec_configs
>   crypto: testmgr - convert aead testing to use testvec_configs
>   crypto: testmgr - convert hash testing to use testvec_configs
>   crypto: testmgr - check for skcipher_request corruption
>   crypto: testmgr - check for aead_request corruption
> 
>  arch/arm64/crypto/aes-neonbs-core.S|8 +-
>  arch/x86/crypto/aegis128-aesni-glue.c  |   38 +-
>  arch/x86/crypto/aegis128l-aesni-glue.c |   38 +-
>  arch/x86/crypto/aegis256-aesni-glue.c  |   38 +-
>  arch/x86/crypto/aesni-intel_glue.c |   13 +-
>  arch/x86/crypto/morus1280_glue.c   |   40 +-
>  arch/x86/crypto/morus640_glue.c|   39 +-
>  crypto/Kconfig |   10 +
>  crypto/aegis128.c  |   14 +-
>  crypto/aegis128l.c |   14 +-
>  crypto/aegis256.c  |   14 +-
>  crypto/ahash.c |   14 +-
>  crypto/morus1280.c |   13 +-
>  crypto/morus640.c  |   13 +-
>  crypto/testmgr.c   | 2549 +---
>  crypto/testmgr.h   

Re: [PATCH v2 0/6] General Key Derivation Function Support

2019-02-07 Thread Herbert Xu
On Wed, Jan 30, 2019 at 03:39:10PM +0100, Stephan Mueller wrote:
> Am Mittwoch, 30. Januar 2019, 11:08:54 CET schrieb Herbert Xu:
> 
> Hi Herbert,
> 
> > I'm still not convinced why this needs to go into the crypto API
> > instead of being hosted in a helper which should achieve pretty
> > much the same result.
> 
> How do you propose to handle the FIPS 140-2 related requirements of enforcing 
> the integrity test result and the known-answer self tests?

We could easily add self-tests for the helper.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: ccp - fix the SEV probe in kexec boot path

2019-02-07 Thread Herbert Xu
On Wed, Jan 30, 2019 at 08:57:52PM +, Singh, Brijesh wrote:
> A kexec reboot may leave the firmware in INIT or WORKING state.
> Currently, we issue PLATFORM_INIT command during the probe without
> checking the current state. The PLATFORM_INIT command fails if the
> FW is already in INIT state. Lets check the current state, if FW
> is not in UNINIT state then transition it to UNINIT before
> initializing or upgrading the FW.
> 
> Signed-off-by: Brijesh Singh 
> Cc: Tom Lendacky 
> Cc: Gary Hook 
> Reviewed-by: Tom Lendacky 
> ---
>  drivers/crypto/ccp/psp-dev.c | 16 
>  1 file changed, 16 insertions(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net: stmmac: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized

2019-02-07 Thread Maxime Ripard
On Thu, Feb 07, 2019 at 09:46:23AM -0800, Yizhuo wrote:
> In function sun8i_dwmac_set_syscon(), local variable "val" could
> be uninitialized if function regmap_read() returns -EINVAL.
> However, it will be used directly in the if statement, which
> is potentially unsafe.
> 
> Signed-off-by: Yizhuo 

Acked-by: Maxime Ripard 

Thanks!
Maxime

-- 
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


signature.asc
Description: PGP signature


Re: [patch-next] crypto: testmgr - use kmemdup

2019-02-07 Thread Herbert Xu
On Mon, Jan 28, 2019 at 07:01:18PM -0500, Christopher Diaz Riveros wrote:
> Fixes coccinnelle alerts:
> 
> /crypto/testmgr.c:2112:13-20: WARNING opportunity for kmemdup
> /crypto/testmgr.c:2130:13-20: WARNING opportunity for kmemdup
> /crypto/testmgr.c:2152:9-16: WARNING opportunity for kmemdup
> 
> Signed-off-by: Christopher Diaz Riveros 
> ---
>  crypto/testmgr.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2] lib/scatterlist: Provide a DMA page iterator

2019-02-07 Thread Thomas Hellstrom
On Thu, 2019-02-07 at 22:26 +, Jason Gunthorpe wrote:
> Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists
> w/o
> backing pages") introduced the sg_page_iter_dma_address() function
> without
> providing a way to use it in the general case. If the sg_dma_len() is
> not
> equal to the sg length callers cannot safely use the
> for_each_sg_page/sg_page_iter_dma_address combination.
> 
> Resolve this API mistake by providing a DMA specific iterator,
> for_each_sg_dma_page(), that uses the right length so
> sg_page_iter_dma_address() works as expected with all sglists.
> 
> A new iterator type is introduced to provide compile-time safety
> against
> wrongly mixing accessors and iterators.
> 
> Acked-by: Christoph Hellwig  (for scatterlist)
> Signed-off-by: Jason Gunthorpe 
> 

For the vmwgfx part, 
Acked-by: Thomas Hellstrom 

I'll take a deeper look to provide a vmwgfx fix as a follow up.

Thanks,
Thomas



[PATCH resend] regulator: fix device unlinking

2019-02-07 Thread Guennadi Liakhovetski
From: Guennadi Liakhovetski 

Device links are refcounted, device_link_remove() has to be called as
many times as device_link_add().

Signed-off-by: Guennadi Liakhovetski 
---

Resending with the "PATCH" subject line modifier.

 drivers/regulator/core.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 2c66b528aede..342102e8bc21 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1837,15 +1837,7 @@ static void _regulator_put(struct regulator *regulator)
debugfs_remove_recursive(regulator->debugfs);
 
if (regulator->dev) {
-   int count = 0;
-   struct regulator *r;
-
-   list_for_each_entry(r, >consumer_list, list)
-   if (r->dev == regulator->dev)
-   count++;
-
-   if (count == 1)
-   device_link_remove(regulator->dev, >dev);
+   device_link_remove(regulator->dev, >dev);
 
/* remove any sysfs entries */
sysfs_remove_link(>dev.kobj, regulator->supply_name);
-- 
2.17.1



Re: linux-next: build failure after merge of the sound-asoc tree

2019-02-07 Thread Takashi Iwai
On Fri, 08 Feb 2019 03:18:23 +0100,
Stephen Rothwell wrote:
> 
> Hi all,
> 
> After merging the sound-asoc tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> sound/soc/xilinx/xlnx_formatter_pcm.c: In function 'xlnx_formatter_pcm_new':
> sound/soc/xilinx/xlnx_formatter_pcm.c:539:9: error: void value not ignored as 
> it ought to be
>   return snd_pcm_lib_preallocate_pages_for_all(rtd->pcm,
>  ^~~
> SNDRV_DMA_TYPE_DEV, component->dev,
> ~~~
> xlnx_pcm_hardware.buffer_bytes_max,
> ~~~
> xlnx_pcm_hardware.buffer_bytes_max);
> ~~~
> sound/soc/xilinx/xlnx_formatter_pcm.c:543:1: warning: control reaches end of 
> non-void function [-Wreturn-type]
>  }
>  ^
> 
> Caused by commit
> 
>   6f6c3c36f091 ("ASoC: xlnx: add pcm formatter platform driver")
> 
> interacting with commit
> 
>   9adb5165f1de ("ALSA: pcm: Define snd_pcm_lib_preallocate_*() as returning 
> void")
> 
> from the sound tree.
> 
> I have applied the following merge fix patch for today:
> 
> From: Stephen Rothwell 
> Date: Fri, 8 Feb 2019 13:14:24 +1100
> Subject: [PATCH] Asoc: xlnx: fix up for 
> snd_pcm_lib_preallocate_pages_for_all() API change
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  sound/soc/xilinx/xlnx_formatter_pcm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/xilinx/xlnx_formatter_pcm.c 
> b/sound/soc/xilinx/xlnx_formatter_pcm.c
> index 97177d35652e..dc8721f4f56b 100644
> --- a/sound/soc/xilinx/xlnx_formatter_pcm.c
> +++ b/sound/soc/xilinx/xlnx_formatter_pcm.c
> @@ -536,10 +536,11 @@ static int xlnx_formatter_pcm_new(struct 
> snd_soc_pcm_runtime *rtd)
>  {
>   struct snd_soc_component *component = snd_soc_rtdcom_lookup(rtd,
>   DRV_NAME);
> - return snd_pcm_lib_preallocate_pages_for_all(rtd->pcm,
> + snd_pcm_lib_preallocate_pages_for_all(rtd->pcm,
>   SNDRV_DMA_TYPE_DEV, component->dev,
>   xlnx_pcm_hardware.buffer_bytes_max,
>   xlnx_pcm_hardware.buffer_bytes_max);
> + return 0;
>  }
>  
>  static const struct snd_pcm_ops xlnx_formatter_pcm_ops = {
> -- 
> 2.20.1

Thanks Stephen.

Mark, could you apply it on your tree?
Or let me pull your tree into mine.


Takashi


Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-07 Thread Naoya Horiguchi
On Thu, Feb 07, 2019 at 09:50:30PM -0800, Mike Kravetz wrote:
> On 2/7/19 6:31 PM, Naoya Horiguchi wrote:
> > On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote:
> >> On 1/30/19 1:14 PM, Mike Kravetz wrote:
> >>> +++ b/fs/hugetlbfs/inode.c
> >>> @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct 
> >>> address_space *mapping,
> >>>   rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> >>>   if (rc != MIGRATEPAGE_SUCCESS)
> >>>   return rc;
> >>> +
> >>> + /*
> >>> +  * page_private is subpool pointer in hugetlb pages, transfer
> >>> +  * if needed.
> >>> +  */
> >>> + if (page_private(page) && !page_private(newpage)) {
> >>> + set_page_private(newpage, page_private(page));
> >>> + set_page_private(page, 0);
> > 
> > You don't have to copy PagePrivate flag?
> > 
> 
> Well my original thought was no.  For hugetlb pages, PagePrivate is not
> associated with page_private.  It indicates a reservation was consumed.
> It is set  when a hugetlb page is newly allocated and the allocation is
> associated with a reservation and the global reservation count is
> decremented.  When the page is added to the page cache or rmap,
> PagePrivate is cleared.  If the page is free'ed before being added to page
> cache or rmap, PagePrivate tells free_huge_page to restore (increment) the
> reserve count as we did not 'instantiate' the page.
> 
> So, PagePrivate is only set from the time a huge page is allocated until
> it is added to page cache or rmap.  My original thought was that the page
> could not be migrated during this time.  However, I am not sure if that
> reasoning is correct.  The page is not locked, so it would appear that it
> could be migrated?  But, if it can be migrated at this time then perhaps
> there are bigger issues for the (hugetlb) page fault code?

In my understanding, free hugetlb pages are not expected to be passed to
migrate_pages(), and currently that's ensured by each migration caller
which checks and avoids free hugetlb pages on its own.
migrate_pages() and its internal code are probably not aware of handling
free hugetlb pages, so if they are accidentally passed to migration code,
that's a big problem as you are concerned.
So the above reasoning should work at least this assumption is correct.

Most of migration callers are not intersted in moving free hugepages.
The one I'm not sure of is the code path from alloc_contig_range().
If someone think it's worthwhile to migrate free hugepage to get bigger
contiguous memory, he/she tries to enable that code path and the assumption
will be broken.

Thanks,
Naoya Horiguchi

> 
> >>> +
> >>> + }
> >>> +
> >>>   if (mode != MIGRATE_SYNC_NO_COPY)
> >>>   migrate_page_copy(newpage, page);
> >>>   else
> >>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>> index f7e4bfdc13b7..0d9708803553 100644
> >>> --- a/mm/migrate.c
> >>> +++ b/mm/migrate.c
> >>> @@ -703,8 +703,14 @@ void migrate_page_states(struct page *newpage, 
> >>> struct page *page)
> >>>*/
> >>>   if (PageSwapCache(page))
> >>>   ClearPageSwapCache(page);
> >>> - ClearPagePrivate(page);
> >>> - set_page_private(page, 0);
> >>> + /*
> >>> +  * Unlikely, but PagePrivate and page_private could potentially
> >>> +  * contain information needed at hugetlb free page time.
> >>> +  */
> >>> + if (!PageHuge(page)) {
> >>> + ClearPagePrivate(page);
> >>> + set_page_private(page, 0);
> >>> + }
> > 
> > # This argument is mainly for existing code...
> > 
> > According to the comment on migrate_page():
> > 
> > /*
> >  * Common logic to directly migrate a single LRU page suitable for
> >  * pages that do not use PagePrivate/PagePrivate2.
> >  *
> >  * Pages are locked upon entry and exit.
> >  */
> > int migrate_page(struct address_space *mapping, ...
> > 
> > So this common logic assumes that page_private is not used, so why do
> > we explicitly clear page_private in migrate_page_states()?
> 
> Perhaps someone else knows.  If not, I can do some git research and
> try to find out why.
> 
> > buffer_migrate_page(), which is commonly used for the case when
> > page_private is used, does that clearing outside migrate_page_states().
> > So I thought that hugetlbfs_migrate_page() could do in the similar manner.
> > IOW, migrate_page_states() should not do anything on PagePrivate.
> > But there're a few other .migratepage callbacks, and I'm not sure all of
> > them are safe for the change, so this approach might not fit for a small 
> > fix.
> 
> I will look at those as well unless someone knows without researching.
> 
> > 
> > # BTW, there seems a typo in $SUBJECT.
> 
> Thanks!
> 
> -- 
> Mike Kravetz
> 


Re: [PATCH] input: keyboard: gpio-keys-polled: use input name from pdata if available

2019-02-07 Thread Dmitry Torokhov
Hi Enrico,

On Thu, Feb 07, 2019 at 06:05:31PM +0100, Enrico Weigelt, metux IT consult 
wrote:
> Instead of hardcoding the input name to the driver name ('gpio-keys-polled'),
> allow the passing a name via platform data ('name' field was already present),
> but default to old behaviour in case of NULL.

I thought the world is moving away from platform data and towards
OF/ACPI systems. What device are you targeting with this change?

I would want to convert gpio-keys[-polled] to generic device properties
and away form platform data...

> 
> Signed-off-by: Enrico Weigelt, metux IT consult 
> ---
>  drivers/input/keyboard/gpio_keys_polled.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/keyboard/gpio_keys_polled.c 
> b/drivers/input/keyboard/gpio_keys_polled.c
> index edc7262..3312186 100644
> --- a/drivers/input/keyboard/gpio_keys_polled.c
> +++ b/drivers/input/keyboard/gpio_keys_polled.c
> @@ -272,7 +272,7 @@ static int gpio_keys_polled_probe(struct platform_device 
> *pdev)
>  
>   input = poll_dev->input;
>  
> - input->name = pdev->name;
> + input->name = (pdata->name ? pdata->name : pdev->name);
>   input->phys = DRV_NAME"/input0";
>  
>   input->id.bustype = BUS_HOST;
> -- 
> 1.9.1
> 

Thanks.

-- 
Dmitry


[PATCH] lib/string.c: make strncmp() smaller

2019-02-07 Thread Alexey Dobriyan
Space savings on x86_64:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-29 (-29)
Function old new   delta
strncmp   67  38 -29

Space savings on arm:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-60 (-60)
Function old new   delta
strncmp  116  56 -60

Signed-off-by: Alexey Dobriyan 
---

 lib/string.c |   10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

--- a/lib/string.c
+++ b/lib/string.c
@@ -344,16 +344,14 @@ EXPORT_SYMBOL(strcmp);
  */
 int strncmp(const char *cs, const char *ct, size_t count)
 {
-   unsigned char c1, c2;
+   while (count--) {
+   unsigned int c1 = *(unsigned char *)cs++;
+   unsigned int c2 = *(unsigned char *)ct++;
 
-   while (count) {
-   c1 = *cs++;
-   c2 = *ct++;
if (c1 != c2)
-   return c1 < c2 ? -1 : 1;
+   return c1 - c2;
if (!c1)
break;
-   count--;
}
return 0;
 }


linux-next: build failure after merge of the apparmor tree

2019-02-07 Thread Stephen Rothwell
Hi all,

After merging the apparmor tree, today's linux-next build (powerpc
allyesconfig) failed like this:

security/apparmor/policy_unpack.c: In function 'deflate_compress':
security/apparmor/policy_unpack.c:1064:4: error: implicit declaration of 
function 'vfree'; did you mean 'kfree'? [-Werror=implicit-function-declaration]
vfree(stgbuf);
^
kfree

Caused by commit

  876dd866c084 ("apparmor: Initial implementation of raw policy blob 
compression")

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell


pgpiKNGVRbqKT.pgp
Description: OpenPGP digital signature


Re: [PATCH v8 2/8] mfd: bd70528: Support ROHM bd70528 PMIC - core

2019-02-07 Thread Matti Vaittinen
Hello Again Lee,

After a good night sleep few things came to my mind =)

On Thu, Feb 07, 2019 at 02:00:53PM +, Lee Jones wrote:
> On Thu, 07 Feb 2019, Matti Vaittinen wrote:
> 
> > +
> > +static struct mfd_cell bd70528_mfd_cells[] = {
> > +   { .name = "bd70528-pmic", },
> > +   { .name = "bd70528-gpio", },
> > +   /*
> > +* We use BD71837 driver to drive the clk block. Only differences to
> > +* BD70528 clock gate are the register address and mask.
> > +*/
> > +   { .name = "bd718xx-clk", },
> > +   { .name = "bd70528-wdt", },
> > +   {
> > +   .name = "bd70528-power",
> > +   .resources = _irqs[0],
> > +   .num_resources = ARRAY_SIZE(charger_irqs),
> > +   },
> > +   {
> 
> These should be on the same line.

I know I said 'Ok' yesterday. And I can change the styling to what ever
suits you - but I am not entirely sure what you mean by this? Do you
mean that the brackets should be on same line? After a quick look to
few other MFD devices it seems to be common convention to have these on
separate lines - and such style is used also at other locations
throughout this file. 

> > +static const struct regmap_access_table volatile_regs = {
> > +   .yes_ranges = _ranges[0],
> > +   .n_yes_ranges = ARRAY_SIZE(volatile_ranges),
> > +};
> > +
> > +static struct regmap_config bd70528_regmap = {
> > +   .reg_bits = 8,
> > +   .val_bits = 8,
> > +   .volatile_table = _regs,
> > +   .max_register = BD70528_MAX_REGISTER,
> > +   .cache_type = REGCACHE_RBTREE,
> > +};
> 
> '\n' here.
> 
> > +/* bit [0] - Shutdown register */
> > +unsigned int bit0_offsets[] = {0};
> > +/* bit [1] - Power failure register */
> > +unsigned int bit1_offsets[] = {1};
> > +/* bit [2] - VR FAULT register */
> > +unsigned int bit2_offsets[] = {2};
> > +/* bit [3] - PMU register interrupts */
> > +unsigned int bit3_offsets[] = {3};
> > +/* bit [4] - Charger 1 and Charger 2 registers */
> > +unsigned int bit4_offsets[] = {4, 5};
> > +/* bit [5] - RTC register */
> > +unsigned int bit5_offsets[] = {6};
> > +/* bit [6] - GPIO register */
> > +unsigned int bit6_offsets[] = {7};
> > +/* bit [7] - Invalid operation register */
> > +unsigned int bit7_offsets[] = {8};
> 
> What on earth is this?

Would this comment help:
/*
 * Mapping of main IRQ register bits to sub irq register offsets so
 * that we can access corect sub IRQ registers based on bits that
 * are set in main IRQ register.
 */

/* bit [0] - Shutdown register */
unsigned int bit0_offsets[] = {0};
/* bit [1] - Power failure register */
unsigned int bit1_offsets[] = {1};
/* bit [2] - VR FAULT register */
unsigned int bit2_offsets[] = {2};
/* bit [3] - PMU register interrupts */
unsigned int bit3_offsets[] = {3};
/* bit [4] - Charger 1 and Charger 2 registers */
unsigned int bit4_offsets[] = {4, 5};
/* bit [5] - RTC register */
unsigned int bit5_offsets[] = {6};
/* bit [6] - GPIO register */
unsigned int bit6_offsets[] = {7};
/* bit [7] - Invalid operation register */
unsigned int bit7_offsets[] = {8};

> > +static int bd70528_wdt_set(struct bd70528 *bd70528, int enable, int 
> > *old_state)
> > +{
[..]
> > +}
> 
> Shouldn't this be one in the WDT driver?

Maybe I should explain it like this:

/*
 * Both the WDT and RTC drivers need to be able to control WDT. WDT uses
 * RTC for timeouts and setting the RTC may trigger watchdog. Thus the
 * RTC must disable the WDT when RTC time is set. We provide WDT disabling
 * code from the MFD parent as we don't want to make direct dependency
 * between RTC and WDT. Some may want to use only WDT or only RTC.
 */

#define WD_CTRL_MAGIC1 0x55
#define WD_CTRL_MAGIC2 0xAA

static int bd70528_wdt_set(struct bd70528 *bd70528, int enable, int *old_state)
{

> > +   /*
> > +* Disallow type setting for all IRQs by default as
> > +*  most of them do not support setting type.
> > +*/
> > +   for (i = 0; i < ARRAY_SIZE(irqs); i++)
> > +   irqs[i].type.types_supported = 0;
> > +
> > +   irqs[BD70528_INT_GPIO0].type.type_reg_offset = 0;
> > +   irqs[BD70528_INT_GPIO0].type.type_rising_val = 0x20;
> > +   irqs[BD70528_INT_GPIO0].type.type_falling_val = 0x10;
> > +   irqs[BD70528_INT_GPIO0].type.type_level_high_val = 0x40;
> > +   irqs[BD70528_INT_GPIO0].type.type_level_low_val = 0x50;
> > +   irqs[BD70528_INT_GPIO0].type.types_supported = (IRQ_TYPE_EDGE_BOTH |
> > +   IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW);
> > +   irqs[BD70528_INT_GPIO1].type.type_reg_offset = 2;
> > +   irqs[BD70528_INT_GPIO1].type.type_rising_val = 0x20;
> > +   irqs[BD70528_INT_GPIO1].type.type_falling_val = 0x10;
> > +   irqs[BD70528_INT_GPIO1].type.type_level_high_val = 0x40;
> > +   irqs[BD70528_INT_GPIO1].type.type_level_low_val = 0x50;
> > +   irqs[BD70528_INT_GPIO1].type.types_supported = (IRQ_TYPE_EDGE_BOTH |
> > +   IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW);
> > +   irqs[BD70528_INT_GPIO2].type.type_reg_offset = 4;
> > +   irqs[BD70528_INT_GPIO2].type.type_rising_val = 0x20;
> > 

Re: [PATCH] Input: ps2-gpio - flush TX work when closing port

2019-02-07 Thread Dmitry Torokhov
On Thu, Feb 07, 2019 at 06:03:03PM -0500, Sven Van Asbroeck wrote:
> On Thu, Feb 7, 2019 at 5:27 PM Dmitry Torokhov
>  wrote:
> >
> > +   flush_work(>tx_work.work);
> 
> Would cancel_work_sync() be better than flush_work() ?

No, because we want to have interrupt and gpios in a consistent state.
If we cancel then we need to see if we should disable it or it may
already be disabled, etc. This way we know it is enabled after
flush_delayed_work() returns, and we need to disable it.

Thanks.

-- 
Dmitry


[GIT PULL] xfs: fixes for v5.0-rc6

2019-02-07 Thread Darrick J. Wong
Hi Linus,

Here are a handful of XFS fixes to fix a data corruption problem, a
crasher bug, and a deadlock.  It merged cleanly against this morning's
master, so please let me know if you encounter any problems.

--D

The following changes since commit 8834f5600cf3c8db365e18a3d5cac2c2780c81e5:

  Linux 5.0-rc5 (2019-02-03 13:48:04 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/xfs-5.0-fixes-1

for you to fetch changes up to add46b3b021263c02d5a7080c58e5b459479fafd:

  xfs: set buffer ops when repair probes for btree type (2019-02-03 14:03:59 
-0800)


Changes since last update:
- Fix cache coherency problem with writeback mappings
- Fix buffer deadlock when shutting fs down
- Fix a null pointer dereference when running online repair


Brian Foster (2):
  xfs: eof trim writeback mapping as soon as it is cached
  xfs: end sync buffer I/O properly on shutdown error

Darrick J. Wong (1):
  xfs: set buffer ops when repair probes for btree type

 fs/xfs/scrub/repair.c | 11 ---
 fs/xfs/xfs_aops.c |  2 ++
 fs/xfs/xfs_buf.c  | 19 +--
 3 files changed, 27 insertions(+), 5 deletions(-)


Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Dan Williams
On Thu, Feb 7, 2019 at 9:19 PM Jason Gunthorpe  wrote:
>
> On Thu, Feb 07, 2019 at 03:54:58PM -0800, Dan Williams wrote:
>
> > > The only production worthy way is to have the FS be a partner in
> > > making this work without requiring revoke, so the critical RDMA
> > > traffic can operate safely.
> >
> > ...belies a path forward. Just swap out "FS be a partner" with "system
> > administrator be a partner". In other words, If the RDMA stack can't
> > tolerate an MR being disabled then the administrator needs to actively
> > disable the paths that would trigger it. Turn off reflink, don't
> > truncate, avoid any future FS feature that might generate unwanted
> > lease breaks.
>
> This is what I suggested already, except with explicit kernel aid, not
> left as some gordian riddle for the administrator to unravel.

It's a riddle either way. "Why is my truncate failing?"

The lease path allows the riddle to be solved in a way that moves the
ecosystem forwards. It provides a mechanism to notify (effectively mmu
notifers plumbed to userspace), an opportunity for capable RDMA apps /
drivers to do better than SIGKILL, and a path for filesystems to
continue to innovate and not make users choose filesystems just on the
chance they might need to do RDMA.

> You already said it is too hard for expert FS developers to maintain a
> mode switch

I do disagree with a truncate behavior switch, but reflink already has
a mkfs switch so it's obviously possible for any future feature that
might run afoul of the RDMA restrictions to have fs-feature control.

> , it seems like a really big stretch to think application
> and systems architects will have any hope to do better.

Certainly they can, it's just a matter of documenting options. It can
be made easier if we can get commonly named options across filesystems
to disable lease dependent functionality.

> It makes much more sense for the admin to flip some kind of bit and
> the FS guarentees the safety that you are asking the admin to create.

Flipping the bit changes the ABI contract in backwards incompatible
ways. I'm saying go the other way, audit the configuration for legacy
RDMA safety.

> > We would need to make sure that lease notifications include the
> > information to identify the lease breaker to debug escapes that
> > might happen, but it is a solution that can be qualified to not
> > lease break.
>
> I think building a complicated lease framework and then telling
> everyone in user space to design around it so it never gets used would
> be very hard to explain and justify.

There is no requirement to design around it. If an RDMA-implementation
doesn't use it the longterm-GUPs are already blocked. If the
implementation does use it, but fails to service lease breaks it gets
SIGKILL with information of what lead to the SIGKILL so the
configuration can be fixed. Implementations that want to do better
have an opportunity to be a partner to the filesytem and repair the
MR.

> Never mind the security implications if some seemingly harmless future
> filesystem change causes unexpected lease revokes across something
> like a tenant boundary.

Fileystems innovate quickly, but not that quickly. Ongoing
communication between FS and RDMA developers is not insurmountable.

> > In any event, this lets end users pick their filesystem
> > (modulo RDMA incompatible features), provides an enumeration of
> > lease break sources in the kernel, and opens up FS-DAX to a wider
> > array of RDMA adapters. In general this is what Linux has
> > historically done, give end users technology freedom.
>
> I think this is not the Linux model. The kernel should not allow
> unpriv user space to do an operation that could be unsafe.

There's permission to block unprivileged writes/truncates to a file,
otherwise I'm missing what hole is being opened? That said, the horse
already left the barn. Linux has already shipped in the page-cache
case "punch hole in the middle of a MR succeeds and leaves the state
of the file relative to ongoing RDMA inconsistent". Now that we know
about the bug the question is how do we do better than the current
status quo of taking all of the functionality away.

> I continue to think this is is the best idea that has come up - but
> only if the filesystem is involved and expressly tells the kernel
> layers that this combination of DAX & filesystem is safe.

I think we're getting into "need to discuss at LSF/MM territory",
because the concept of "DAX safety", or even DAX as an explicit FS
capability has been a point of contention since day one. We're trying
change DAX to be defined by mmap API flags like MAP_SYNC and maybe
MAP_DIRECT in the future.

For example, if the MR was not established to a MAP_SYNC vma then the
kernel should be free to indirect the RDMA through the page-cache like
the typical non-DAX case. DAX as a global setting is too coarse.


Re: [RFC/PATCH 0/5] DVFS in the OPP core

2019-02-07 Thread Viresh Kumar
On 07-02-19, 14:37, Ulf Hansson wrote:
> I think we also need to consider cross SoC drivers. One SoC may have
> both clocks and OPPs to manage, while another may have only clocks.

We already have that case with CPUs as well and dev_pm_opp_set_rate()
takes care of it.

> Even it this may be fairly uncommon, we should consider it, before we
> decide to fold in additional clock management, like
> clk_prepare|unprepare() for example, behind the dev_pm_opp_set_rate()
> API.
> 
> The point is, the driver may need to call clk_prepare|enable()
> anyways, unless we make that conditional depending on a DT compatible
> string, for example. Of course, because the clock prepare/enable is
> reference counted, there may not be a problem in practice to have both
> the OPP and driver to deal with it.

-- 
viresh


Re: [PATCH net-next] mlxsw: spectrum_router: Use struct_size() in kzalloc()

2019-02-07 Thread Jiri Pirko
Fri, Feb 08, 2019 at 04:42:41AM CET, gust...@embeddedor.com wrote:
>One of the more common cases of allocation size calculations is finding
>the size of a structure that has a zero-sized array at the end, along
>with memory for some number of elements for that array. For example:
>
>struct foo {
>int stuff;
>struct boo entry[];
>};
>
>size = sizeof(struct foo) + count * sizeof(struct boo);
>instance = kzalloc(size, GFP_KERNEL)
>
>Instead of leaving these open-coded and prone to type mistakes, we can
>now use the new struct_size() helper:
>
>instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)
>
>Notice that, in this case, variable alloc_size is not necessary, hence
>it is removed.
>
>This code was detected with the help of Coccinelle.
>
>Signed-off-by: Gustavo A. R. Silva 

Acked-by: Jiri Pirko 


Re: [RFC/PATCH 0/5] DVFS in the OPP core

2019-02-07 Thread Viresh Kumar
On 06-02-19, 23:58, Stephen Boyd wrote:
> Quoting Viresh Kumar (2019-01-31 01:23:49)
> > FWIW, I suggested exactly this solution sometime back [1]
> > 
> > - Drivers need to use two API sets to change clock rate (OPP helpers)
> >   and enable/disable them (CLK framework helpers) and this leads us to
> >   exactly that combination. Is that acceptable ? It doesn't look great
> >   to me as well..
> 
> Agreed. I don't think anyone thinks this looks great, but I'll argue
> that it's improving OPP for devices that already use it so that we can
> remove voltage requirements when their clk is off. Think about CPUs that
> are in their own clk domain where we want to remove the voltage
> requirement when those CPUs are offline, or a GPU that wants to remove
> its voltage requirement when it turns off clks. The combination is
> already out there, just OPP hasn't solved this problem.
> 
> The only other plan I had was to implement another API like
> dev_pm_set_state() or something like that and have that do something
> like what the OPP rate API does right now. The main difference being
> that the argument to the function would be some opaque u64 that's
> converted by the bus/class/genpd attached to the device into whatever
> frequency/voltage/performance state is desired (and sequenced in the
> right order too). And then I was thinking that runtime PM or explicit
> dev_pm_set_state() calls would be used to tell this new layer that the
> device was going to a lower power mode with some other number (sub-kHz
> integer?) and have that be translated again into some
> frequency/voltage/performance state.
> 
> Either way, each driver will have to change from using the clk APIs to
> changes rates to something else like one of these APIs, so I don't see a
> huge difference. Drivers will have to change.

I agree, that's why I wrote the dev_pm_opp_set_rate() API initially.

> > 
> > - Do we expect the callers will disable clk before calling
> >   opp-set-rate with 0 ? We should remove the regulator requirements as
> >   well along with perf-state.
> 
> Yes, that's the plan. Problems come along with this though, like shared
> resource constraints and actually knowing the clk on/off state,
> frequency, voltage, etc. at boot time and making sure to keep those
> constraints satisfied during normal operation.

But that isn't any different from drivers doing clk_disable directly,
right ? So that shouldn't worry us.

> > - What about enabling/disabling clock as well from OPP framework. We
> >   can enable it on the very first call to opp-set-rate and disable
> >   when freq is 0. That will simplify the drivers as well.
> 
> It works when those drivers aren't calling clk_disable() directly from
> some irq handler. I don't think that's very common, but in those cases
> we would probably want to allow drivers to quickly gate and ungate their
> clks but then defer the sleeping stuff (voltages and off chip clks) to
> the schedulable contexts. We'll still be left with a small number of
> drivers that want to only call clk_prepare() and clk_unprepare() from
> within OPP and keep calling clk_enable() and clk_disable() from their
> driver. So introduce different APIs for those drivers to indicate this
> to OPP? And only do that when it becomes a requirement?

I am not sure I understood this well. The reference counting within
clk/regulator should let both the layers (driver and opp core) work
just fine. Why would a driver don't want OPP core to call
clk_prepare_enable() all the time ?

> Otherwise I don't really see a problem with the OPP call handling the
> enable state of the clk as well.

Right, so I would like that to be part of this series when this gets
implemented.

> > > One nice feature of this approach is that we don't need to change the
> > > OPP binding to support this. We can specify only the max frequencies and
> > > the voltage requirements in DT with the existing binding and slightly
> > > tweak the OPP code to achieve these results. 
> > > 
> > > This series includes a conversion of the uart and spi drivers on
> > > qcom sdm845 where these patches are being developed. It shows how a
> > > driver is converted from the clk APIs to the OPP APIs and how
> > > enable/disable state of the clk is communicated to the OPP layer.
> > > 
> > > Some open topics and initial points for discussion are:
> > > 
> > > 1) The dev_pm_opp_set_rate() API changes may break something that's 
> > > relying on the rate rounding that OPP provides. If those exist,
> > > we may need to implement another API that is more explicit about using
> > > the clk API instead of the OPP tables.

I don't remember any such cases, I may have forgotten about those
though.

> > > 2) dev_pm_opp_set_rate(0) is an interesting solution to the problem of
> > > dropping the rate requirement. Is there anything better than this?

I am okay with it. I don't want to invent another set of APIs to
enable / disable the resources.

> > > 3) How do we handle devices that already 

Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt

2019-02-07 Thread Florian Westphal
Sander Eikelenboom  wrote:
> L.S.,
> 
> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression 
> with NAT.
> (using an nftables firewall with NAT and connection tracking).
> 
> Unfortunately it isn't too obvious since no errors are logged, but on clients 
> it
> causes symptoms like firefox intermittently not being able to load pages with:
> Network Protocol Error
> An error occurred during a connection to www.example.com
> The page you are trying to view cannot be shown because an error in the 
> network protocol was detected.
> Please contact the website owners to inform them of this problem.
> 
> But it's only intermittently, so i can still visit some webpages with 
> clients, 
> could be that packet size and or fragments are at play ?
> 
> So I tried testing with 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have 
> the latest netdev has to offer,
> but to no avail. 
> 
> After that I tried to git bisect and ended up with:
> 
> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
> commit faec18dbb0405c7d4dda025054511dc3a6696918
> Author: Florian Westphal 
> Date:   Thu Dec 13 16:01:33 2018 +0100
> 
> netfilter: nat: remove l4proto->manip_pkt

Thanks, this is immensely helpful.

I think I see the bug, we can't use target->dst.protonum in
nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
with a related icmp packet.

I will send a patch in a few hours when I get back.


[PATCH] crypto: qat - Remove unused goto label

2019-02-07 Thread Herbert Xu
On Mon, Feb 04, 2019 at 11:01:06AM +1100, Stephen Rothwell wrote:
> Hi Herbert,
> 
> After merging the crypto tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> drivers/crypto/qat/qat_common/adf_transport.c: In function 
> 'adf_init_etr_data':
> drivers/crypto/qat/qat_common/adf_transport.c:501:1: warning: label 
> 'err_bank_debug' defined but not used [-Wunused-label]
>  err_bank_debug:
>  ^~
> 
> Introduced by commit
> 
>   f0fcf9ade46a ("crypto: qat - no need to check return value of 
> debugfs_create functions")

Thanks for the report Stephen!

---8<---
This patch removes an unused label.

Reported-by: Stephen Rothwell 
Fixes: f0fcf9ade46a ("crypto: qat - no need to check return...")
Signed-off-by: Herbert Xu 

diff --git a/drivers/crypto/qat/qat_common/adf_transport.c 
b/drivers/crypto/qat/qat_common/adf_transport.c
index ac658ce46836..2136cbe4bf6c 100644
--- a/drivers/crypto/qat/qat_common/adf_transport.c
+++ b/drivers/crypto/qat/qat_common/adf_transport.c
@@ -498,7 +498,6 @@ int adf_init_etr_data(struct adf_accel_dev *accel_dev)
 
 err_bank_all:
debugfs_remove(etr_data->debug);
-err_bank_debug:
kfree(etr_data->banks);
 err_bank:
kfree(etr_data);
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC v1 0/3] Address potential user-after-free on module unload

2019-02-07 Thread Greg KH
On Wed, Feb 06, 2019 at 09:30:29AM -0800, Dmitry Torokhov wrote:
> On Wed, Feb 6, 2019 at 8:47 AM Greg KH  wrote:
> >
> > On Tue, Feb 05, 2019 at 02:12:31PM -0500, Sven Van Asbroeck wrote:
> > > On Tue, Feb 5, 2019 at 1:43 PM Greg KH  wrote:
> > > >
> > > >
> > > > It really should happen when the device is removed (if it is a driver
> > > > that binds to a device.)
> > >
> > > Absolutely. That's why I'm advocating adding a devm_init_work(),
> > > which will take care of this automatically.
> > >
> > > But it's of course not universally applicable. Not all drivers use devm.
> >
> > Ick, no, watch out for devm() calls.  Odds are this is _NOT_ what you
> > want to do for a device.  Remember when devm calls get freed (hint, not
> > at driver unbind/unload, but at device structure removal.
> 
> 
> ??? We unwind devm on probe() failure and after remove() is called.
> The device can live on.

{sigh} you are right, I don't know what I was thinking.  Then why were
the DRM developers so upset that they didn't see this happening
recently?  Anyway, all should be fine here, nevermind...

thanks,

greg k-h


Re: [PATCH 0/2] cpufreq/opp: rework regulator initialization

2019-02-07 Thread Viresh Kumar
On 07-02-19, 13:22, Marek Szyprowski wrote:
> Dear All,
> 
> Recent commit 9ac6cb5fbb17 ("i2c: add suspended flag and accessors for
> i2c adapters") added a visible warning for an attempt to do i2c transfer
> over a suspended i2c bus. This revealed a long standing issue in the
> cpufreq-dt driver, which gives a following warning during system
> suspend/resume cycle:
> 
> --->8---
> Enabling non-boot CPUs ...
> CPU1 is up
> CPU2 is up
> CPU3 is up
> [ cut here ]
> WARNING: CPU: 4 PID: 29 at drivers/i2c/i2c-core-base.c:1869 
> __i2c_transfer+0x6f8/0xa50
> Modules linked in:
> CPU: 4 PID: 29 Comm: cpuhp/4 Tainted: GW 
> 5.0.0-rc4-next-20190131-00024-g54b06b29cc65 #5324
> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x90/0xc8)
> [] (dump_stack) from [] (__warn+0xf8/0x124)
> [] (__warn) from [] (warn_slowpath_null+0x40/0x48)
> [] (warn_slowpath_null) from [] 
> (__i2c_transfer+0x6f8/0xa50)
> [] (__i2c_transfer) from [] (i2c_transfer+0x70/0xe4)
> [] (i2c_transfer) from [] (regmap_i2c_read+0x48/0x64)
> [] (regmap_i2c_read) from [] (_regmap_raw_read+0xf8/0x450)
> [] (_regmap_raw_read) from [] (_regmap_bus_read+0x38/0x68)
> [] (_regmap_bus_read) from [] (_regmap_read+0x60/0x250)
> [] (_regmap_read) from [] (regmap_read+0x3c/0x5c)
> [] (regmap_read) from [] 
> (regulator_is_enabled_regmap+0x20/0x90)
> [] (regulator_is_enabled_regmap) from [] 
> (_regulator_is_enabled+0x34/0x40)
> [] (_regulator_is_enabled) from [] 
> (create_regulator+0x1a4/0x25c)
> [] (create_regulator) from [] (_regulator_get+0xe4/0x278)
> [] (_regulator_get) from [] 
> (dev_pm_opp_set_regulators+0xa0/0x1c0)
> [] (dev_pm_opp_set_regulators) from [] 
> (cpufreq_init+0x98/0x2d0)
> [] (cpufreq_init) from [] (cpufreq_online+0xc8/0x71c)
> [] (cpufreq_online) from [] 
> (cpuhp_cpufreq_online+0x8/0x10)
> [] (cpuhp_cpufreq_online) from [] 
> (cpuhp_invoke_callback+0xf4/0xebc)
> [] (cpuhp_invoke_callback) from [] 
> (cpuhp_thread_fun+0x1d8/0x320)
> [] (cpuhp_thread_fun) from [] 
> (smpboot_thread_fn+0x194/0x340)
> [] (smpboot_thread_fn) from [] (kthread+0x124/0x160)
> [] (kthread) from [] (ret_from_fork+0x14/0x20)
> Exception stack(0xe897dfb0 to 0xe897dff8)
> dfa0:    
> dfc0:        
> dfe0:     0013 
> irq event stamp: 3865
> hardirqs last  enabled at (3873): [] vprintk_emit+0x228/0x2a4
> hardirqs last disabled at (3880): [] vprintk_emit+0x12c/0x2a4
> softirqs last  enabled at (3052): [] __do_softirq+0x3a4/0x66c
> softirqs last disabled at (3043): [] irq_exit+0x140/0x168
> ---[ end trace db48b455d924fec2 ]---
> CPU4 is up
> CPU5 is up
> CPU6 is up
> CPU7 is up
> --->8---
> 
> This is a scenario that triggers the above issue:
> 
> 1. system disables non-boot cpu's at the end of system suspend procedure,
> 2. this in turn deinitializes cpufreq drivers for the disabled cpus,
> 3. early in the system resume procedure all cpus are got back to online
>state,
> 4. this in turn causes cpufreq to be initialized for the newly onlined
>cpus,
> 5. cpufreq-dt acquires all its resources (clocks, regulators) during
>->init() callback,
> 6. getting regulator require to check its state, what in turn requires
>i2c transfer,
> 7. during system early resume stage this is not really possible.
> 
> The issue is caused by cpufreq-dt driver not keeping its resources for
> the whole driver lifetime and relying that they can be always acquired
> at any system context. This problem has been observed on Samsung
> Exynos based Odroid XU3/XU4 boards, but it happens on all boards, which
> have separate regulators for different CPU clusters.

Why don't you get similar problem during suspend? I think you can get
it when the CPUs are offlined as I2C would have gone by then. The
cpufreq or OPP core can try and run some regulator or genpd or clk
calls to disable resources, etc. Even if doesn't happen, it certainly
can.

Also at resume the cpufreq core may try to change the frequency right
from ->init() on certain cases, though not everytime and so the
problem can come despite of this series.

We guarantee that the resources are available during probe but not
during resume, that's where the problem is.

@Rafael any suggestions on how to fix this ?

-- 
viresh


Re: [PATCH 4.4 00/34] 4.4.174-stable review

2019-02-07 Thread Greg Kroah-Hartman
On Fri, Feb 08, 2019 at 11:43:55AM +0530, Naresh Kamboju wrote:
> On Thu, 7 Feb 2019 at 17:12, Greg Kroah-Hartman
>  wrote:
> >
> > This is the start of the stable review cycle for the 4.4.174 release.
> > There are 34 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat Feb  9 11:30:10 UTC 2019.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.174-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.4.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm, x86_64, and i386.

Great, thanks for testing and letting me know.

greg k-h


Re: [LSF/MM TOPIC] Non standard size THP

2019-02-07 Thread Anshuman Khandual



On 02/08/2019 09:54 AM, Matthew Wilcox wrote:
> On Fri, Feb 08, 2019 at 07:43:57AM +0530, Anshuman Khandual wrote:
>> How non-standard huge pages can be supported for THP
>>
>>  - THP starts recognizing non standard huge page (exported by arch) like 
>> HPAGE_CONT_(PMD|PTE)_SIZE
>>  - THP starts operating for either on HPAGE_PMD_SIZE or 
>> HPAGE_CONT_PMD_SIZE or HPAGE_CONT_PTE_SIZE
>>  - set_pmd_at() only recognizes HPAGE_PMD_SIZE hence replace 
>> set_pmd_at() with set_huge_pmd_at()
>>  - set_huge_pmd_at() could differentiate between HPAGE_PMD_SIZE or 
>> HPAGE_CONT_PMD_SIZE
>>  - In case for HPAGE_CONT_PTE_SIZE extend page table walker till PTE 
>> level
>>  - Use set_huge_pte_at() which can operate on multiple contiguous PTE 
>> bits
> 
> I think your proposed solution reflects thinking like a hardware person
> rather than like a software person.  Or maybe like an MM person rather
> than a FS person.  I see the same problem with Kirill's solutions ;-)

You might be right on this :) I was trying to derive a solution based on
all existing semantics with limited code addition rather than inventing
something completely different.

> 
> Perhaps you don't realise that using larger pages when appropriate
> would also benefit filesystems as well as CPUs.  You didn't include
> linux-fsdevel on this submission, so that's a plausible explanation.

Yes that was an omission. Thanks for adding linux-fsdevel to the thread.

> 
> The XArray currently supports arbitrary power-of-two-naturally-aligned
> page sizes, and conveniently so does the page allocator [1].  The problem
> is that various bits of the MM have a very fixed mindset that pages are
> PTE, PMD or PUD in size.

I agree. But in general it works as allocated page with required order do
reside in one of these levels in the page table.

> 
> We should enhance routines like vmf_insert_page() to handle
> arbitrary sized pages rather than having separate vmf_insert_pfn()
> and vmf_insert_pfn_pmd().  We probably need to enhance the set_pxx_at()
> API to pass in an order, rather than explicitly naming pte/pmd/pud/...

I agree. set_huge_pte_at() actually does that to some extent on ARM64.
But thats just for HugeTLB.

> 
> First, though, we need to actually get arbitrary sized pages handled
> correctly in the page cache.  So if anyone's interested in talking about
> this, but hasn't been reviewing or commenting on the patches I've been
> sending to make this happen, I'm going to seriously question their actual
> commitment to wanting this to happen, rather than wanting a nice holiday
> in Puerto Rico.
> 
> Sorry to be so blunt about this, but I've only had review from Kirill,
> which makes me think that nobody else actually cares about getting
> this fixed.

To be honest I have not been following your work in this regard. I started
looking into this problem late last year and my goal has been more focused 
towards a THP solution for intermediate page table level sized huge pages.

But I agree to your point that there should be an wider solution which can
make generic MM deal with page sizes of any order rather than page table
level ones like PTE/PMD/PUD etc.

> 
> [1] Support for arbitrary sized and aligned entries is in progress for
> the XArray, but I don't think there's any appetite for changing the buddy
> allocator to let us allocate "pages" that are an arbitrary extent in size.
> 
> 


Re: [PATCH v2] livepatch: core: Return EOPNOTSUPP instead of ENOSYS

2019-02-07 Thread Kamalesh Babulal
On Wed, Feb 06, 2019 at 11:28:32AM +0100, Petr Mladek wrote:
> On Tue 2019-02-05 09:59:33, Josh Poimboeuf wrote:
> > On Tue, Feb 05, 2019 at 03:33:28AM +0900, Alice Ferrazzi wrote:
> > > From: Alice Ferrazzi 
> > > 
> > > As a result of an unsupported operation is better to use EOPNOTSUPP
> > > as error code.
> > > ENOSYS is only used for 'invalid syscall nr' and nothing else.
> > > 
> > > Signed-off-by: Alice Ferrazzi 
> > 
> > Acked-by: Josh Poimboeuf 
> 
> I have applied the patch into for-5.1/atomic-replace branch.

Sorry to jump into the discussion so late. Thinking a little more about
the check itself, previously with immediate flag an architecture can do
livepatching with limitations and without the reliable stack trace
implemented yet.

After removal of the immediate flag by
commit d0807da78e11 ("livepatch: Remove immediate feature"), every
architecture enabling livepatching is required to have implemented
reliable stack trace.  Is it a better idea to make
HAVE_RELIABLE_STACKTRACE a config dependency, which will disable
livepatching support for architectures without reliable stack trace
function during kernel build?

The idea is to remove klp_have_reliable_stack() by moving
CONFIG_HAVE_RELIABLE_STACKTRACE as a config dependency to Kconfig file
and adding the other CONFIG_STACKTRACE as a config dependency is not
required, as it's selected via CONFIG_DYNAMIC_FTRACE_WITH_REGS
dependency chain. With the patch on architecture without
HAVE_RELIABLE_STACKTRACE, the user should see:

# insmod ./livepatch-sample.ko 
insmod: ERROR: could not insert module ./livepatch-sample.ko: Invalid module 
format

# dmesg
...
[  286.453463] livepatch_sample: module is marked as livepatch module, but 
livepatch support is disabled

I have done limited testing on PowerPC and to test the unsupported case,
the config dependency HAVE_RELIABLE_STACKTRACE was misspelled in Kconfig
file. If the idea sounds ok I will send a formal patch.

---8<

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 53551f470722..7848c7bbffbb 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -214,12 +214,6 @@ static inline bool klp_patch_pending(struct task_struct 
*task)
return test_tsk_thread_flag(task, TIF_PATCH_PENDING);
 }
 
-static inline bool klp_have_reliable_stack(void)
-{
-   return IS_ENABLED(CONFIG_STACKTRACE) &&
-  IS_ENABLED(CONFIG_HAVE_RELIABLE_STACKTRACE);
-}
-
 typedef int (*klp_shadow_ctor_t)(void *obj,
 void *shadow_data,
 void *ctor_data);
diff --git a/kernel/livepatch/Kconfig b/kernel/livepatch/Kconfig
index ec4565122e65..16b3692ddf9f 100644
--- a/kernel/livepatch/Kconfig
+++ b/kernel/livepatch/Kconfig
@@ -9,6 +9,7 @@ config LIVEPATCH
depends on MODULES
depends on SYSFS
depends on KALLSYMS_ALL
+   depends on HAVE_RELIABLE_STACKTRACE
depends on HAVE_LIVEPATCH
depends on !TRIM_UNUSED_KSYMS
help
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index fe1993399823..9a80f7574d75 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -1002,12 +1002,6 @@ int klp_enable_patch(struct klp_patch *patch)
if (!klp_initialized())
return -ENODEV;
 
-   if (!klp_have_reliable_stack()) {
-   pr_err("This architecture doesn't have support for the 
livepatch consistency model.\n");
-   return -ENOSYS;
-   }
-
-
mutex_lock(_mutex);
 
ret = klp_init_patch_early(patch);



Re: [PATCH 4.4 00/34] 4.4.174-stable review

2019-02-07 Thread Naresh Kamboju
On Thu, 7 Feb 2019 at 17:12, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 4.4.174 release.
> There are 34 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sat Feb  9 11:30:10 UTC 2019.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.174-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.4.174-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 968fa3bc933965940f7fb2a56789dac2bd12e4da
git describe: v4.4.173-35-g968fa3bc9339
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.173-35-g968fa3bc9339

No regressions (compared to build v4.4.173)

No fixes (compared to build v4.4.173)



Ran 17146 total tests in the following environments and test suites.

Environments
--
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* boot
* kselftest
* libhugetlbfs
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* spectre-meltdown-checker-test
* ltp-cap_bounds-tests
* install-android-platform-tools-r2600
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

Summary


kernel: 4.4.174-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.174-rc1-hikey-20190207-370
git commit: b7ab7cb3251d307982d559727fff755b29867bd2
git describe: 4.4.174-rc1-hikey-20190207-370
Test details: 
https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.174-rc1-hikey-20190207-370


No regressions (compared to build 4.4.173-rc2-hikey-20190204-367)


No fixes (compared to build 4.4.173-rc2-hikey-20190204-367)

Ran 2562 total tests in the following environments and test suites.

Environments
--
- hi6220-hikey - arm64
- qemu_arm64

Test Suites
---
* boot
* install-android-platform-tools-r2600
* ltp-cap_bounds-tests
* ltp-cpuhotplug-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* libhugetlbfs
* ltp-containers-tests
* ltp-cve-tests
* ltp-fs-tests
* spectre-meltdown-checker-test

-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH 4/4] Makefile: lld: set -O2 linker flag when linking with LLD

2019-02-07 Thread Nathan Chancellor
On Thu, Feb 07, 2019 at 02:01:52PM -0800, ndesaulni...@google.com wrote:
> -O2 enables tail merging of string table strings.
> 
> For arm64:
> 0.34% size improvement with lld -O2 over lld for vmlinux.
> 3.30% size improvement with lld -O2 over lld for Image.lz4-dtb.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/343
> Suggested-by: Rui Ueyama 
> Suggested-by: Nathan Chancellor 
> Signed-off-by: Nick Desaulniers 

Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 

> ---
>  Makefile | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 6307c17259ea..c07208ec49d4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -718,6 +718,10 @@ else
>  KBUILD_CFLAGS += -Wno-unused-but-set-variable
>  endif
>  
> +ifdef CONFIG_LD_IS_LLD
> +KBUILD_LDFLAGS += -O2
> +endif
> +
>  KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
>  ifdef CONFIG_FRAME_POINTER
>  KBUILD_CFLAGS+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
> 


[PATCH 0/3] generic asm: mman cleanup

2019-02-07 Thread Michael S. Tsirkin
Now that we have MAP_SHARED, MAP_PRIVATE and MAP_SHARED_VALIDATE on all
architectures, it probably makes sense to de-duplicate these
and put them into a common header.

Please review and consider merging though the generic tree.

Build tested on x86 only. Has been in linux-next for a while now.

Michael S. Tsirkin (3):
  x86/mpx: tweak header name
  drm: tweak header name
  arch: move common mmap flags to linux/mman.h

 arch/alpha/include/uapi/asm/mman.h | 4 +---
 arch/mips/include/uapi/asm/mman.h  | 4 +---
 arch/parisc/include/uapi/asm/mman.h| 4 +---
 arch/x86/mm/mpx.c  | 2 +-
 arch/xtensa/include/uapi/asm/mman.h| 4 +---
 include/drm/drmP.h | 3 +--
 include/uapi/asm-generic/mman-common.h | 4 +---
 include/uapi/linux/mman.h  | 4 
 8 files changed, 11 insertions(+), 18 deletions(-)

-- 
MST



[PATCH 2/3] drm: tweak header name

2019-02-07 Thread Michael S. Tsirkin
Use linux/mman.h to make sure we get all mmap flags we need.

Signed-off-by: Michael S. Tsirkin 
---
 include/drm/drmP.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index bdb0d5548f39..a3184416ddc5 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -57,8 +57,7 @@
 #include 
 #include 
 #include 
-
-#include 
+#include 
 #include 
 #include 
 
-- 
MST



[PATCH 1/3] x86/mpx: tweak header name

2019-02-07 Thread Michael S. Tsirkin
Use linux/mman.h to make sure we get all mmap flags we need.

Signed-off-by: Michael S. Tsirkin 
---
 arch/x86/mm/mpx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index de1851d15699..c805db6236b4 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -9,12 +9,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
MST



[PATCH 2/2] arch: move common mmap flags to linux/mman.h

2019-02-07 Thread Michael S. Tsirkin
Now that we have 3 mmap flags shared by all architectures,
let's move them into the common header.

This will help discourage future architectures from duplicating code.

Signed-off-by: Michael S. Tsirkin 
---
 arch/alpha/include/uapi/asm/mman.h | 4 +---
 arch/mips/include/uapi/asm/mman.h  | 4 +---
 arch/parisc/include/uapi/asm/mman.h| 4 +---
 arch/xtensa/include/uapi/asm/mman.h| 4 +---
 include/uapi/asm-generic/mman-common.h | 4 +---
 include/uapi/linux/mman.h  | 4 
 6 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index f9d4e6b6d4bd..ac23379b7a87 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -10,9 +10,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x0f/* Mask for type of mapping (OSF/1 is 
_wrong_) */
 #define MAP_FIXED  0x100   /* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x10/* don't use a file */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 3035ca499cd8..c2b40969eb1f 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -27,9 +27,7 @@
 /*
  * Flags for mmap
  */
-#define MAP_SHARED 0x001   /* Share changes */
-#define MAP_PRIVATE0x002   /* Changes are private */
-#define MAP_SHARED_VALIDATE 0x003  /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x00f   /* Mask for type of mapping */
 #define MAP_FIXED  0x010   /* Interpret addr exactly */
 
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 870fbf8c7088..c98162f494db 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -10,9 +10,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x2b/* Mask for type of mapping, includes 
bits 0x08 and 0x20 */
 #define MAP_FIXED  0x04/* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x10/* don't use a file */
diff --git a/arch/xtensa/include/uapi/asm/mman.h 
b/arch/xtensa/include/uapi/asm/mman.h
index 58f29a9d895d..be726062412b 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -34,9 +34,7 @@
 /*
  * Flags for mmap
  */
-#define MAP_SHARED 0x001   /* Share changes */
-#define MAP_PRIVATE0x002   /* Changes are private */
-#define MAP_SHARED_VALIDATE 0x003  /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x00f   /* Mask for type of mapping */
 #define MAP_FIXED  0x010   /* Interpret addr exactly */
 
diff --git a/include/uapi/asm-generic/mman-common.h 
b/include/uapi/asm-generic/mman-common.h
index e7ee32861d51..abd238d0f7a4 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -15,9 +15,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x0f/* Mask for type of mapping */
 #define MAP_FIXED  0x10/* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x20/* don't use a file */
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index d0f515d53299..fc1a64c3447b 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -12,6 +12,10 @@
 #define OVERCOMMIT_ALWAYS  1
 #define OVERCOMMIT_NEVER   2
 
+#define MAP_SHARED 0x01/* Share changes */
+#define MAP_PRIVATE0x02/* Changes are private */
+#define 

[PATCH 3/3] arch: move common mmap flags to linux/mman.h

2019-02-07 Thread Michael S. Tsirkin
Now that we have 3 mmap flags shared by all architectures,
let's move them into the common header.

This will help discourage future architectures from duplicating code.

Signed-off-by: Michael S. Tsirkin 
---
 arch/alpha/include/uapi/asm/mman.h | 4 +---
 arch/mips/include/uapi/asm/mman.h  | 4 +---
 arch/parisc/include/uapi/asm/mman.h| 4 +---
 arch/xtensa/include/uapi/asm/mman.h| 4 +---
 include/uapi/asm-generic/mman-common.h | 4 +---
 include/uapi/linux/mman.h  | 4 
 6 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h 
b/arch/alpha/include/uapi/asm/mman.h
index f9d4e6b6d4bd..ac23379b7a87 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -10,9 +10,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x0f/* Mask for type of mapping (OSF/1 is 
_wrong_) */
 #define MAP_FIXED  0x100   /* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x10/* don't use a file */
diff --git a/arch/mips/include/uapi/asm/mman.h 
b/arch/mips/include/uapi/asm/mman.h
index 3035ca499cd8..c2b40969eb1f 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -27,9 +27,7 @@
 /*
  * Flags for mmap
  */
-#define MAP_SHARED 0x001   /* Share changes */
-#define MAP_PRIVATE0x002   /* Changes are private */
-#define MAP_SHARED_VALIDATE 0x003  /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x00f   /* Mask for type of mapping */
 #define MAP_FIXED  0x010   /* Interpret addr exactly */
 
diff --git a/arch/parisc/include/uapi/asm/mman.h 
b/arch/parisc/include/uapi/asm/mman.h
index 870fbf8c7088..c98162f494db 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -10,9 +10,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x2b/* Mask for type of mapping, includes 
bits 0x08 and 0x20 */
 #define MAP_FIXED  0x04/* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x10/* don't use a file */
diff --git a/arch/xtensa/include/uapi/asm/mman.h 
b/arch/xtensa/include/uapi/asm/mman.h
index 58f29a9d895d..be726062412b 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -34,9 +34,7 @@
 /*
  * Flags for mmap
  */
-#define MAP_SHARED 0x001   /* Share changes */
-#define MAP_PRIVATE0x002   /* Changes are private */
-#define MAP_SHARED_VALIDATE 0x003  /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x00f   /* Mask for type of mapping */
 #define MAP_FIXED  0x010   /* Interpret addr exactly */
 
diff --git a/include/uapi/asm-generic/mman-common.h 
b/include/uapi/asm-generic/mman-common.h
index e7ee32861d51..abd238d0f7a4 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -15,9 +15,7 @@
 #define PROT_GROWSDOWN 0x0100  /* mprotect flag: extend change to 
start of growsdown vma */
 #define PROT_GROWSUP   0x0200  /* mprotect flag: extend change to end 
of growsup vma */
 
-#define MAP_SHARED 0x01/* Share changes */
-#define MAP_PRIVATE0x02/* Changes are private */
-#define MAP_SHARED_VALIDATE 0x03   /* share + validate extension flags */
+/* 0x01 - 0x03 are defined in linux/mman.h */
 #define MAP_TYPE   0x0f/* Mask for type of mapping */
 #define MAP_FIXED  0x10/* Interpret addr exactly */
 #define MAP_ANONYMOUS  0x20/* don't use a file */
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index d0f515d53299..fc1a64c3447b 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -12,6 +12,10 @@
 #define OVERCOMMIT_ALWAYS  1
 #define OVERCOMMIT_NEVER   2
 
+#define MAP_SHARED 0x01/* Share changes */
+#define MAP_PRIVATE0x02/* Changes are private */
+#define 

linux-next: build failure after merge of the rtc tree

2019-02-07 Thread Stephen Rothwell
Hi Alexandre,

After merging the rtc tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/rtc/rtc-x1205: struct of_device_id is 200 bytes.  The last of 1 is:
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x78 0x69 0x72 0x63 0x6f 0x6d 0x2c 0x78 0x31 0x32 0x30 0x35 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
FATAL: drivers/rtc/rtc-x1205: struct of_device_id is not terminated with a NULL 
entry!

Caused by commit

  08bb868190c2 ("rtc: x1205: Add DT probing support")

I have used the rtc tree from next-20190207 for today.

-- 
Cheers,
Stephen Rothwell


pgpdC66xO48v_.pgp
Description: OpenPGP digital signature


Re: [PATCH] mm/page_poison: update comment after code moved

2019-02-07 Thread Michael S. Tsirkin
On Thu, Feb 07, 2019 at 09:01:41PM -0800, Andrew Morton wrote:
> On Thu, 7 Feb 2019 14:11:16 -0500 "Michael S. Tsirkin"  
> wrote:
> 
> > mm/debug-pagealloc.c is no more, so of course header now needs to be
> > updated. This seems like something checkpatch should be
> > able to catch - worth looking into?
> > 
> > Cc: triv...@kernel.org
> > Cc: linux...@kvack.org
> > Cc: Linus Torvalds 
> > Cc: a...@linux-foundation.org
> > Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate 
> > option")
> 
> Please send along a signed-off-by: for this.

Oh sorry.

Signed-off-by: Michael S. Tsirkin 

Good enough or should I repost?


Re: [PATCH] huegtlbfs: fix page leak during migration of file pages

2019-02-07 Thread Mike Kravetz
On 2/7/19 6:31 PM, Naoya Horiguchi wrote:
> On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote:
>> On 1/30/19 1:14 PM, Mike Kravetz wrote:
>>> +++ b/fs/hugetlbfs/inode.c
>>> @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_space 
>>> *mapping,
>>> rc = migrate_huge_page_move_mapping(mapping, newpage, page);
>>> if (rc != MIGRATEPAGE_SUCCESS)
>>> return rc;
>>> +
>>> +   /*
>>> +* page_private is subpool pointer in hugetlb pages, transfer
>>> +* if needed.
>>> +*/
>>> +   if (page_private(page) && !page_private(newpage)) {
>>> +   set_page_private(newpage, page_private(page));
>>> +   set_page_private(page, 0);
> 
> You don't have to copy PagePrivate flag?
> 

Well my original thought was no.  For hugetlb pages, PagePrivate is not
associated with page_private.  It indicates a reservation was consumed.
It is set  when a hugetlb page is newly allocated and the allocation is
associated with a reservation and the global reservation count is
decremented.  When the page is added to the page cache or rmap,
PagePrivate is cleared.  If the page is free'ed before being added to page
cache or rmap, PagePrivate tells free_huge_page to restore (increment) the
reserve count as we did not 'instantiate' the page.

So, PagePrivate is only set from the time a huge page is allocated until
it is added to page cache or rmap.  My original thought was that the page
could not be migrated during this time.  However, I am not sure if that
reasoning is correct.  The page is not locked, so it would appear that it
could be migrated?  But, if it can be migrated at this time then perhaps
there are bigger issues for the (hugetlb) page fault code?

>>> +
>>> +   }
>>> +
>>> if (mode != MIGRATE_SYNC_NO_COPY)
>>> migrate_page_copy(newpage, page);
>>> else
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index f7e4bfdc13b7..0d9708803553 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -703,8 +703,14 @@ void migrate_page_states(struct page *newpage, struct 
>>> page *page)
>>>  */
>>> if (PageSwapCache(page))
>>> ClearPageSwapCache(page);
>>> -   ClearPagePrivate(page);
>>> -   set_page_private(page, 0);
>>> +   /*
>>> +* Unlikely, but PagePrivate and page_private could potentially
>>> +* contain information needed at hugetlb free page time.
>>> +*/
>>> +   if (!PageHuge(page)) {
>>> +   ClearPagePrivate(page);
>>> +   set_page_private(page, 0);
>>> +   }
> 
> # This argument is mainly for existing code...
> 
> According to the comment on migrate_page():
> 
> /*
>  * Common logic to directly migrate a single LRU page suitable for
>  * pages that do not use PagePrivate/PagePrivate2.
>  *
>  * Pages are locked upon entry and exit.
>  */
> int migrate_page(struct address_space *mapping, ...
> 
> So this common logic assumes that page_private is not used, so why do
> we explicitly clear page_private in migrate_page_states()?

Perhaps someone else knows.  If not, I can do some git research and
try to find out why.

> buffer_migrate_page(), which is commonly used for the case when
> page_private is used, does that clearing outside migrate_page_states().
> So I thought that hugetlbfs_migrate_page() could do in the similar manner.
> IOW, migrate_page_states() should not do anything on PagePrivate.
> But there're a few other .migratepage callbacks, and I'm not sure all of
> them are safe for the change, so this approach might not fit for a small fix.

I will look at those as well unless someone knows without researching.

> 
> # BTW, there seems a typo in $SUBJECT.

Thanks!

-- 
Mike Kravetz


Re: [PATCH 3/4] Makefile: lld: tell clang to use lld

2019-02-07 Thread Nathan Chancellor
On Thu, Feb 07, 2019 at 02:01:51PM -0800, ndesaulni...@google.com wrote:
> This is needed because clang doesn't select which linker to use based on
> $LD but rather -fuse-ld=$(LD). This is problematic especially for
> cc-ldoption, which checks for linker flag support via invoking the
> compiler, rather than the linker.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/342
> Suggested-by: Nathan Chancellor 
> Signed-off-by: Nick Desaulniers 
> ---
>  Makefile | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 0eae4277206e..6307c17259ea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -502,6 +502,9 @@ endif
>  CLANG_FLAGS  += -no-integrated-as
>  KBUILD_CFLAGS+= $(CLANG_FLAGS)
>  KBUILD_AFLAGS+= $(CLANG_FLAGS)
> +ifneq ($(shell $(LD) --version 2>&1 | head -n 1 | grep LLD),)
> +CLANG_FLAGS  += -fuse-ld=lld
> +endif

This section needs to be moved up above KBUILD_CFLAGS otherwise it never
actually gets passed along.

With that change, please add:

Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 

>  export CLANG_FLAGS
>  endif
>  
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
> 


Re: [PATCH 1/1] mm: page_cache_add_speculative(): refactor out some code duplication

2019-02-07 Thread Andrew Morton
On Wed,  6 Feb 2019 15:10:16 -0800 john.hubb...@gmail.com wrote:

> From: John Hubbard 
> 
> This combines the common elements of these routines:
> 
> page_cache_get_speculative()
> page_cache_add_speculative()
> 
> This was anticipated by the original author, as shown by the comment
> in commit ce0ad7f095258 ("powerpc/mm: Lockless get_user_pages_fast()
> for 64-bit (v3)"):
> 
> "Same as above, but add instead of inc (could just be merged)"
> 
> There is no intention to introduce any behavioral change, but there is a
> small risk of that, due to slightly differing ways of expressing the
> TINY_RCU and related configurations.
> 
> This also removes the VM_BUG_ON(in_interrupt()) that was in
> page_cache_add_speculative(), but not in page_cache_get_speculative(). This
> provides slightly less detection of such bugs, but it given that it was
> only there on the "add" path anyway, we can likely do without it just fine.

It removes the 
VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page);
also.

We'll live ;)




Re: [PATCH 2/4] Makefile: clang: choose GCC_TOOLCHAIN_DIR not on LD

2019-02-07 Thread Nathan Chancellor
On Thu, Feb 07, 2019 at 02:01:50PM -0800, ndesaulni...@google.com wrote:
> This causes an issue when trying to build with `make LD=ld.lld` if
> ld.lld and the rest of your cross tools aren't in the same directory
> (ex. /usr/local/bin) (as is the case for Android's build system), as the
> GCC_TOOLCHAIN_DIR then gets set based on `which $(LD)` which will point
> where LLVM tools are, not GCC/binutils tools are located.
> 
> Instead, select the GCC_TOOLCHAIN_DIR based on another tool provided by
> binutils for which LLVM does not provide a substitute for, such as
> elfedit.
> 
> Fixes commit 785f11aa595b ("kbuild: Add better clang cross build support")
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/341
> Suggested-by: Nathan Chancellor 
> Signed-off-by: Nick Desaulniers 

Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 

> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index 3142e67d03f1..0eae4277206e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -492,7 +492,7 @@ endif
>  ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
>  ifneq ($(CROSS_COMPILE),)
>  CLANG_FLAGS  := --target=$(notdir $(CROSS_COMPILE:%-=%))
> -GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
> +GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
>  CLANG_FLAGS  += --prefix=$(GCC_TOOLCHAIN_DIR)
>  GCC_TOOLCHAIN:= $(realpath $(GCC_TOOLCHAIN_DIR)/..)
>  endif
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
> 


Re: [PATCH v03] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

2019-02-07 Thread Srikar Dronamraju
> 
>  int arch_update_cpu_topology(void)
>  {
> - return numa_update_cpu_topology(true);
> + int changed = topology_changed;
> +
> + topology_changed = 0;
> + return changed;
>  }
> 

Do we need Powerpc override for arch_update_cpu_topology() now?  That
topology_changed sometime back doesn't seem to have help. The scheduler
atleast now is neglecting whether the topology changed or not.

Also we can do away with the new topology_changed.

>  static void topology_work_fn(struct work_struct *work)
>  {
> - rebuild_sched_domains();
> + lock_device_hotplug();
> + if (numa_update_cpu_topology(true))
> + rebuild_sched_domains();
> + unlock_device_hotplug();
>  }

Should this hunk be a separate patch by itself to say why
rebuild_sched_domains with a changelog that explains why it should be under
lock_device_hotplug? rebuild_sched_domains already takes cpuset_mutex. 
So I am not sure if we need to take device_hotplug_lock.

>  static DECLARE_WORK(topology_work, topology_work_fn);
> 
> -static void topology_schedule_update(void)
> +void topology_schedule_update(void)
>  {
> - schedule_work(_work);
> + if (!topology_update_in_progress)
> + schedule_work(_work);
>  }
> 
>  static void topology_timer_fn(struct timer_list *unused)
>  {
> + bool sdo = false;

Is sdo any abbrevation?

> +
> + if (topology_scans < 1)
> + bitmap_fill(cpumask_bits(_associativity_changes_mask),
> + nr_cpumask_bits);

Why do we need topology_scan? Just to make sure
cpu_associativity_changes_mask is populated only once?
cant we use a static bool inside the function for the same?


> +
>   if (prrn_enabled && cpumask_weight(_associativity_changes_mask))
> - topology_schedule_update();
> - else if (vphn_enabled) {
> + sdo =  true;
> + if (vphn_enabled) {

Any reason to remove the else above?

>   if (update_cpu_associativity_changes_mask() > 0)
> - topology_schedule_update();
> + sdo =  true;
>   reset_topology_timer();
>   }
> + if (sdo)
> + topology_schedule_update();
> + topology_scans++;
>  }

Are the above two hunks necessary? Not getting how the current changes are
different from the previous.

-- 
Thanks and Regards
Srikar Dronamraju



Re: [PATCH 1/4] init/Kconfig: add config support for detecting linker

2019-02-07 Thread Nathan Chancellor
On Thu, Feb 07, 2019 at 07:57:20PM -0500, Mathieu Desnoyers wrote:
> 
> - ndesaulni...@google.com wrote:
> > Similar to how we differentiate between CONFIG_CC_IS_GCC and
> > CONFIG_CC_IS_CLANG, add CONFIG_LD_IS_BFD, CONFIG_LD_IS_GOLD, and
> > CONFIG_LD_IS_LLD.
> > 
> > This simiplifies patches to Makefiles that need to do different things
> > for different linkers.
> 

Hi Mathieu,

> What guarantees that the linker used for e.g. make defconfig is the same 
> linker used for make ?
> 
> Is it required with this patch ?
> 

The build system ensures that the compiler (and after this patch, the
linker) config values are correct before compiling (and prompting the
user for the new choices that are available to them in that case).

For example:

$ make ARCH=arm64 CC=clang CROSS_COMPILE=aarch64-linux-gnu- defconfig
...

$ rg "CONFIG_CC_IS|CONFIG_LD_IS" .config
10:CONFIG_CC_IS_CLANG=y
12:CONFIG_LD_IS_BFD=y

$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LD=ld.lld
scripts/kconfig/conf  --syncconfig Kconfig
*
* Restart config...
*
*
* General architecture-dependent options
*
Kprobes (KPROBES) [N/y/?] n
Optimize very unlikely/likely branches (JUMP_LABEL) [N/y/?] (NEW)
Stack Protector buffer overflow detection (STACKPROTECTOR) [Y/n/?] y
  Strong Stack Protector (STACKPROTECTOR_STRONG) [Y/n/?] y
Use a virtually-mapped stack (VMAP_STACK) [Y/n/?] y
Perform full reference count validation at the expense of speed (REFCOUNT_FULL) 
[Y/?] y
*
* GCC plugins
*
GCC plugins (GCC_PLUGINS) [N/y/?] (NEW)
...

$ rg "CONFIG_CC_IS|CONFIG_LD_IS" .config
9:CONFIG_CC_IS_GCC=y
13:CONFIG_LD_IS_LLD=y

> How does it work in a cross compile environment ?
> 

Since $(LD) is defined based on $(CROSS_COMPILE), it should work fine.
I tested it with both arm64 and x86_64 and CONFIG_LD_IS_BFD gets
set by default and I can see CONFIG_LD_IS_GOLD get set with both
LD=aarch64-linux-gnu-ld.gold and LD=ld.gold.

If I misunderstood the questions or my explanation wasn't correct,
please let me know!

Cheers,
Nathan

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Cc: Nathan Chancellor 
> > Cc: Sami Tolvanen 
> > Signed-off-by: Nick Desaulniers 
> > ---
> >  init/Kconfig | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index c9386a365eea..b6046dcf7794 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -26,6 +26,15 @@ config CLANG_VERSION
> >  config CC_HAS_ASM_GOTO
> > def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))
> >  
> > +config LD_IS_BFD
> > +   def_bool $(success,$(LD) --version | head -n 1 | grep -q 'GNU ld')
> > +
> > +config LD_IS_GOLD
> > +   def_bool $(success,$(LD) --version | head -n 1 | grep -q 'GNU gold')
> > +
> > +config LD_IS_LLD
> > +   def_bool $(success,$(LD) --version | head -n 1 | grep -q 'LLD')
> > +
> >  config CONSTRUCTORS
> > bool
> > depends on !UML
> > -- 
> > 2.20.1.791.gb4d0f1c61a-goog
> > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com


Re: [PATCH 1/2] Revert "mm: don't reclaim inodes with many attached pages"

2019-02-07 Thread Andrew Morton
On Thu, 7 Feb 2019 11:27:50 +0100 Jan Kara  wrote:

> On Fri 01-02-19 09:19:04, Dave Chinner wrote:
> > Maybe for memcgs, but that's exactly the oppose of what we want to
> > do for global caches (e.g. filesystem metadata caches). We need to
> > make sure that a single, heavily pressured cache doesn't evict small
> > caches that lower pressure but are equally important for
> > performance.
> > 
> > e.g. I've noticed recently a significant increase in RMW cycles in
> > XFS inode cache writeback during various benchmarks. It hasn't
> > affected performance because the machine has IO and CPU to burn, but
> > on slower machines and storage, it will have a major impact.
> 
> Just as a data point, our performance testing infrastructure has bisected
> down to the commits discussed in this thread as the cause of about 40%
> regression in XFS file delete performance in bonnie++ benchmark.
> 

Has anyone done significant testing with Rik's maybe-fix?



From: Rik van Riel 
Subject: mm, slab, vmscan: accumulate gradual pressure on small slabs

There are a few issues with the way the number of slab objects to scan is
calculated in do_shrink_slab.  First, for zero-seek slabs, we could leave
the last object around forever.  That could result in pinning a dying
cgroup into memory, instead of reclaiming it.  The fix for that is
trivial.

Secondly, small slabs receive much more pressure, relative to their size,
than larger slabs, due to "rounding up" the minimum number of scanned
objects to batch_size.

We can keep the pressure on all slabs equal relative to their size by
accumulating the scan pressure on small slabs over time, resulting in
sometimes scanning an object, instead of always scanning several.

This results in lower system CPU use, and a lower major fault rate, as
actively used entries from smaller caches get reclaimed less aggressively,
and need to be reloaded/recreated less often.

[a...@linux-foundation.org: whitespace fixes, per Roman]
[r...@surriel.com: couple of fixes]
  Link: http://lkml.kernel.org/r/20190129142831.6a373...@imladris.surriel.com
Link: http://lkml.kernel.org/r/20190128143535.7767c...@imladris.surriel.com
Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of 
objects")
Signed-off-by: Rik van Riel 
Tested-by: Chris Mason 
Acked-by: Roman Gushchin 
Acked-by: Johannes Weiner 
Cc: Dave Chinner 
Cc: Jonathan Lemon 
Cc: Jan Kara 
Cc: 

Signed-off-by: Andrew Morton 
---


--- 
a/include/linux/shrinker.h~mmslabvmscan-accumulate-gradual-pressure-on-small-slabs
+++ a/include/linux/shrinker.h
@@ -65,6 +65,7 @@ struct shrinker {
 
long batch; /* reclaim batch size, 0 = default */
int seeks;  /* seeks to recreate an obj */
+   int small_scan; /* accumulate pressure on slabs with few objects */
unsigned flags;
 
/* These are for internal use */
--- a/mm/vmscan.c~mmslabvmscan-accumulate-gradual-pressure-on-small-slabs
+++ a/mm/vmscan.c
@@ -488,18 +488,30 @@ static unsigned long do_shrink_slab(stru
 * them aggressively under memory pressure to keep
 * them from causing refetches in the IO caches.
 */
-   delta = freeable / 2;
+   delta = (freeable + 1) / 2;
}
 
/*
 * Make sure we apply some minimal pressure on default priority
-* even on small cgroups. Stale objects are not only consuming memory
+* even on small cgroups, by accumulating pressure across multiple
+* slab shrinker runs. Stale objects are not only consuming memory
 * by themselves, but can also hold a reference to a dying cgroup,
 * preventing it from being reclaimed. A dying cgroup with all
 * corresponding structures like per-cpu stats and kmem caches
 * can be really big, so it may lead to a significant waste of memory.
 */
-   delta = max_t(unsigned long long, delta, min(freeable, batch_size));
+   if (!delta && shrinker->seeks) {
+   unsigned long nr_considered;
+
+   shrinker->small_scan += freeable;
+   nr_considered = shrinker->small_scan >> priority;
+
+   delta = 4 * nr_considered;
+   do_div(delta, shrinker->seeks);
+
+   if (delta)
+   shrinker->small_scan -= nr_considered << priority;
+   }
 
total_scan += delta;
if (total_scan < 0) {
_



Re: [PATCHv2 1/9] mm: Introduce new vm_insert_range and vm_insert_range_buggy API

2019-02-07 Thread Souptick Joarder
On Thu, Feb 7, 2019 at 10:17 PM Matthew Wilcox  wrote:
>
> On Thu, Feb 07, 2019 at 09:19:47PM +0530, Souptick Joarder wrote:
> > Just thought to take opinion for documentation before placing it in v3.
> > Does it looks fine ?
> >
> > +/**
> > + * __vm_insert_range - insert range of kernel pages into user vma
> > + * @vma: user vma to map to
> > + * @pages: pointer to array of source kernel pages
> > + * @num: number of pages in page array
> > + * @offset: user's requested vm_pgoff
> > + *
> > + * This allow drivers to insert range of kernel pages into a user vma.
> > + *
> > + * Return: 0 on success and error code otherwise.
> > + */
> > +static int __vm_insert_range(struct vm_area_struct *vma, struct page 
> > **pages,
> > +   unsigned long num, unsigned long offset)
>
> For static functions, I prefer to leave off the second '*', ie make it
> formatted like a docbook comment, but not be processed like a docbook
> comment.  That avoids cluttering the html with descriptions of internal
> functions that people can't actually call.
>
> > +/**
> > + * vm_insert_range - insert range of kernel pages starts with non zero 
> > offset
> > + * @vma: user vma to map to
> > + * @pages: pointer to array of source kernel pages
> > + * @num: number of pages in page array
> > + *
> > + * Maps an object consisting of `num' `pages', catering for the user's
>
> Rather than using `num', you should use @num.
>
> > + * requested vm_pgoff
> > + *
> > + * If we fail to insert any page into the vma, the function will return
> > + * immediately leaving any previously inserted pages present.  Callers
> > + * from the mmap handler may immediately return the error as their caller
> > + * will destroy the vma, removing any successfully inserted pages. Other
> > + * callers should make their own arrangements for calling unmap_region().
> > + *
> > + * Context: Process context. Called by mmap handlers.
> > + * Return: 0 on success and error code otherwise.
> > + */
> > +int vm_insert_range(struct vm_area_struct *vma, struct page **pages,
> > +   unsigned long num)
> >
> >
> > +/**
> > + * vm_insert_range_buggy - insert range of kernel pages starts with zero 
> > offset
> > + * @vma: user vma to map to
> > + * @pages: pointer to array of source kernel pages
> > + * @num: number of pages in page array
> > + *
> > + * Similar to vm_insert_range(), except that it explicitly sets @vm_pgoff 
> > to
>
> But vm_pgoff isn't a parameter, so it's misleading to format it as such.
>
> > + * 0. This function is intended for the drivers that did not consider
> > + * @vm_pgoff.
> > + *
> > + * Context: Process context. Called by mmap handlers.
> > + * Return: 0 on success and error code otherwise.
> > + */
> > +int vm_insert_range_buggy(struct vm_area_struct *vma, struct page **pages,
> > +   unsigned long num)
>
> I don't think we should call it 'buggy'.  'zero' would make more sense
> as a suffix.

suffix can be *zero or zero_offset* whichever suits better.

>
> Given how this interface has evolved, I'm no longer sure than
> 'vm_insert_range' makes sense as the name for it.  Is it perhaps
> 'vm_map_object' or 'vm_map_pages'?
>

I prefer vm_map_pages. Considering it, both the interface name can be changed
to *vm_insert_range -> vm_map_pages* and *vm_insert_range_buggy ->
vm_map_pages_{zero/zero_offset}.

As this is only change in interface name and rest of code remain same
shall I post it in v3 ( with additional change log mentioned about interface
name changed) ?

or,

It will be a new patch series ( with carry forward all the Reviewed-by
/ Tested-by on
vm_insert_range/ vm_insert_range_buggy ) ?


Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Jason Gunthorpe
On Thu, Feb 07, 2019 at 03:54:58PM -0800, Dan Williams wrote:

> > The only production worthy way is to have the FS be a partner in
> > making this work without requiring revoke, so the critical RDMA
> > traffic can operate safely.
> 
> ...belies a path forward. Just swap out "FS be a partner" with "system
> administrator be a partner". In other words, If the RDMA stack can't
> tolerate an MR being disabled then the administrator needs to actively
> disable the paths that would trigger it. Turn off reflink, don't
> truncate, avoid any future FS feature that might generate unwanted
> lease breaks. 

This is what I suggested already, except with explicit kernel aid, not
left as some gordian riddle for the administrator to unravel.

You already said it is too hard for expert FS developers to maintain a
mode switch, it seems like a really big stretch to think application
and systems architects will have any hope to do better.

It makes much more sense for the admin to flip some kind of bit and
the FS guarentees the safety that you are asking the admin to create.

> We would need to make sure that lease notifications include the
> information to identify the lease breaker to debug escapes that
> might happen, but it is a solution that can be qualified to not
> lease break. 

I think building a complicated lease framework and then telling
everyone in user space to design around it so it never gets used would
be very hard to explain and justify.

Never mind the security implications if some seemingly harmless future
filesystem change causes unexpected lease revokes across something
like a tenant boundary.

> In any event, this lets end users pick their filesystem
> (modulo RDMA incompatible features), provides an enumeration of
> lease break sources in the kernel, and opens up FS-DAX to a wider
> array of RDMA adapters. In general this is what Linux has
> historically done, give end users technology freedom.

I think this is not the Linux model. The kernel should not allow
unpriv user space to do an operation that could be unsafe.

I continue to think this is is the best idea that has come up - but
only if the filesystem is involved and expressly tells the kernel
layers that this combination of DAX & filesystem is safe.

Jason


Re: [PATCH 1/4] init/Kconfig: add config support for detecting linker

2019-02-07 Thread Nathan Chancellor
On Thu, Feb 07, 2019 at 02:01:49PM -0800, ndesaulni...@google.com wrote:
> Similar to how we differentiate between CONFIG_CC_IS_GCC and
> CONFIG_CC_IS_CLANG, add CONFIG_LD_IS_BFD, CONFIG_LD_IS_GOLD, and
> CONFIG_LD_IS_LLD.
> 
> This simiplifies patches to Makefiles that need to do different things
> for different linkers.
> 
> Cc: Nathan Chancellor 
> Cc: Sami Tolvanen 
> Signed-off-by: Nick Desaulniers 

Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 

> ---
>  init/Kconfig | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index c9386a365eea..b6046dcf7794 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -26,6 +26,15 @@ config CLANG_VERSION
>  config CC_HAS_ASM_GOTO
>   def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))
>  
> +config LD_IS_BFD
> + def_bool $(success,$(LD) --version | head -n 1 | grep -q 'GNU ld')
> +
> +config LD_IS_GOLD
> + def_bool $(success,$(LD) --version | head -n 1 | grep -q 'GNU gold')
> +
> +config LD_IS_LLD
> + def_bool $(success,$(LD) --version | head -n 1 | grep -q 'LLD')
> +
>  config CONSTRUCTORS
>   bool
>   depends on !UML
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
> 


Re: [PATCHv7 6/6] coresight: cpu-debug: Add support for Qualcomm Kryo

2019-02-07 Thread Sai Prakash Ranjan

On 2/8/2019 1:58 AM, Mathieu Poirier wrote:

On Thu, 31 Jan 2019 at 17:55, Sai Prakash Ranjan
 wrote:


Add support for coresight CPU debug module on Qualcomm
Kryo CPUs. This patch adds the UCI entries for Kryo CPUs
found on MSM8996 which shares the same PIDs as ETMs.


[..]


I have reviewed your work and things look good to me.  Once again
there isn't much I can do with it right now.  Please re-submit when
Mike's set has been merged.



Sure, thanks.


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2] exec: don't force_sigsegv processes with a pending fatal signal

2019-02-07 Thread Eric W. Biederman
Ivan Delalande  writes:

> On Tue, Feb 05, 2019 at 01:11:19PM -0800, Andrew Morton wrote:
>> On Mon, 4 Feb 2019 18:53:08 -0800 Ivan Delalande  wrote:
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1660,7 +1660,12 @@ int search_binary_handler(struct linux_binprm *bprm)
>> >if (retval < 0 && !bprm->mm) {
>> >/* we got to flush_old_exec() and failed after it */
>> >read_unlock(_lock);
>> > -  force_sigsegv(SIGSEGV, current);
>> > +  if (!fatal_signal_pending(current)) {
>> > +  if (print_fatal_signals)
>> > +  pr_info("load_binary() failed: %d\n",
>> > +  retval);
>> 
>> Should we be using print_fatal_signal() here?
>
> I don't think so, the force_sigsegv() already ensures it will be called
> from get_signal() when the signal is handled, and so the process
> information will be printed then.
>
>> > +  force_sigsegv(SIGSEGV, current);
>> > +  }
>> >return retval;
>> >}
>> >if (retval != -ENOEXEC || !bprm->file) {
>


I just noticed this.  From  my patch queue that I intend to send to
Linus tomorrow. I think this change fixes your issue of getting
the SIGSEGV instead of the already pending fatal signal.

So I think this fixes your issue without any other code changes.
Ivan can you verify that the patch below is enough?


diff --git a/kernel/signal.c b/kernel/signal.c
index 9ca8e5278c8e..5424cb0006bc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2393,6 +2393,11 @@ bool get_signal(struct ksignal *ksig)
goto relock;
}
 
+   /* Has this task already been marked for death? */
+   ksig->info.si_signo = signr = SIGKILL;
+   if (signal_group_exit(signal))
+   goto fatal;
+
for (;;) {
struct k_sigaction *ka;
 
@@ -2488,6 +2493,7 @@ bool get_signal(struct ksignal *ksig)
continue;
}
 
+   fatal:
spin_unlock_irq(>siglock);
 
   


Re: [PATCH v2 8/9] PCI: dwc: Remove Keystone specific dw_pcie_host_ops

2019-02-07 Thread Kishon Vijay Abraham I
Hi,

On 08/02/19 2:56 AM, Bjorn Helgaas wrote:
> On Thu, Feb 07, 2019 at 04:39:23PM +0530, Kishon Vijay Abraham I wrote:
>> Now that Keystone started using it's own msi_irq_chip, remove
>> Keystone specific callback function defined in dw_pcie_host_ops.
> 
> s/it's/its/
> s/callback function/callback functions/

okay.
> 
>> Signed-off-by: Kishon Vijay Abraham I 
>> Acked-by: Gustavo Pimentel 
>> ---
>>  .../pci/controller/dwc/pcie-designware-host.c | 45 ++-
>>  drivers/pci/controller/dwc/pcie-designware.h  |  5 ---
>>  2 files changed, 14 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
>> b/drivers/pci/controller/dwc/pcie-designware-host.c
>> index 042de09b0451..9492b05e8ff0 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>> @@ -126,18 +126,12 @@ static void dw_pci_setup_msi_msg(struct irq_data 
>> *data, struct msi_msg *msg)
>>  struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>  u64 msi_target;
>>  
>> -if (pp->ops->get_msi_addr)
>> -msi_target = pp->ops->get_msi_addr(pp);
>> -else
>> -msi_target = (u64)pp->msi_data;
>> +msi_target = (u64)pp->msi_data;
>>  
>>  msg->address_lo = lower_32_bits(msi_target);
>>  msg->address_hi = upper_32_bits(msi_target);
>>  
>> -if (pp->ops->get_msi_data)
>> -msg->data = pp->ops->get_msi_data(pp, data->hwirq);
>> -else
>> -msg->data = data->hwirq;
>> +msg->data = data->hwirq;
>>  
>>  dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n",
>>  (int)data->hwirq, msg->address_hi, msg->address_lo);
>> @@ -157,17 +151,13 @@ static void dw_pci_bottom_mask(struct irq_data *data)
>>  
>>  raw_spin_lock_irqsave(>lock, flags);
>>  
>> -if (pp->ops->msi_clear_irq) {
>> -pp->ops->msi_clear_irq(pp, data->hwirq);
>> -} else {
>> -ctrl = data->hwirq / MAX_MSI_IRQS_PER_CTRL;
>> -res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
>> -bit = data->hwirq % MAX_MSI_IRQS_PER_CTRL;
>> +ctrl = data->hwirq / MAX_MSI_IRQS_PER_CTRL;
>> +res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
>> +bit = data->hwirq % MAX_MSI_IRQS_PER_CTRL;
>>  
>> -pp->irq_status[ctrl] &= ~(1 << bit);
>> -dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
>> -~pp->irq_status[ctrl]);
>> -}
>> +pp->irq_status[ctrl] &= ~(1 << bit);
>> +dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
>> +~pp->irq_status[ctrl]);
>>  
>>  raw_spin_unlock_irqrestore(>lock, flags);
>>  }
>> @@ -180,17 +170,13 @@ static void dw_pci_bottom_unmask(struct irq_data *data)
>>  
>>  raw_spin_lock_irqsave(>lock, flags);
>>  
>> -if (pp->ops->msi_set_irq) {
>> -pp->ops->msi_set_irq(pp, data->hwirq);
>> -} else {
>> -ctrl = data->hwirq / MAX_MSI_IRQS_PER_CTRL;
>> -res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
>> -bit = data->hwirq % MAX_MSI_IRQS_PER_CTRL;
>> +ctrl = data->hwirq / MAX_MSI_IRQS_PER_CTRL;
>> +res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
>> +bit = data->hwirq % MAX_MSI_IRQS_PER_CTRL;
>>  
>> -pp->irq_status[ctrl] |= 1 << bit;
>> -dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
>> -~pp->irq_status[ctrl]);
>> -}
>> +pp->irq_status[ctrl] |= 1 << bit;
>> +dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
>> +~pp->irq_status[ctrl]);
>>  
>>  raw_spin_unlock_irqrestore(>lock, flags);
>>  }
>> @@ -209,9 +195,6 @@ static void dw_pci_bottom_ack(struct irq_data *d)
>>  
>>  dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + res, 4, 1 << bit);
>>  
>> -if (pp->ops->msi_irq_ack)
>> -pp->ops->msi_irq_ack(d->hwirq, pp);
>> -
>>  raw_spin_unlock_irqrestore(>lock, flags);
>>  }
>>  
>> diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
>> b/drivers/pci/controller/dwc/pcie-designware.h
>> index 95e0c3c93f48..ea4b215b605d 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware.h
>> +++ b/drivers/pci/controller/dwc/pcie-designware.h
>> @@ -142,14 +142,9 @@ struct dw_pcie_host_ops {
>>  int (*wr_other_conf)(struct pcie_port *pp, struct pci_bus *bus,
>>   unsigned int devfn, int where, int size, u32 val);
>>  int (*host_init)(struct pcie_port *pp);
>> -void (*msi_set_irq)(struct pcie_port *pp, int irq);
>> -void (*msi_clear_irq)(struct pcie_port *pp, int irq);
>> -phys_addr_t (*get_msi_addr)(struct pcie_port *pp);
>> -u32 (*get_msi_data)(struct pcie_port *pp, int pos);
> 
> I don't see the whole series on linux-pci (I only see patches 2, 6, 8, 9),

I do see all the patches here
https://patchwork.kernel.org/project/linux-pci/list/. Maybe it arrived little 
late?
> but I expected to somewhere see the removal of assignments to 

Re: [PATCHv7 5/6] coresight: etm4x: Add ETM PIDs for SDM845 and MSM8996

2019-02-07 Thread Sai Prakash Ranjan

On 2/8/2019 1:56 AM, Mathieu Poirier wrote:

On Thu, 31 Jan 2019 at 17:54, Sai Prakash Ranjan
 wrote:




I am good with this patch but there isn't much I can do with it until
Mike's patchset has been merged.  Please submit again when that has
been done.



Sure, thanks.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [RFC PATCH] leds: multicolor: Add sysfs interface definition

2019-02-07 Thread Vesa Jääskeläinen

Hi All,

On 08/02/2019 6.55, Vesa Jääskeläinen wrote:

Hi All,

On 31/01/2019 0.35, Pavel Machek wrote:

On Wed 2019-01-30 12:30:05, Dan Murphy wrote:

Add a documentation of LED Multicolor LED class specific
sysfs attributes.


No, sorry. This does not most of the requirements.

    Pavel

Requirements for RGB LED interface:


...

I have tried to capture relevant parts of ideas and requirements and 
usage in a wiki page in github:


https://github.com/vesajaaskelainen/linux-multicolor-leds/wiki

I believe the discussion is good to perform in mailing list -- I am 
happy to update or give you access to update the wiki so we could have 
easy to use summary/details source during discussions.


Ideas on the interface seem to be a bit drifting so I apologize if some 
of Your ideas were not captured.


There has been at least two direct proposals for userspace interface and 
I tried to provide usage example also for Dan's proposal. Feel free to 
correct me if I made a mistake.


I believe it is a good to create summary page as there seems to many 
aspects to be though out. What do you feel on this approach?


And should we perhaps move this discussion only to linux-leds mailing 
list for successive replies :) ? So we don't generate too broad traffic.


Thanks,
Vesa Jääskeläinen


Clang warning in drivers/net/ethernet/intel/igc/igc_ethtool.c

2019-02-07 Thread Nathan Chancellor
Hi all,

After commit 8c5ad0dae93c ("igc: Add ethtool support"), Clang warns:

drivers/net/ethernet/intel/igc/igc_ethtool.c:9:19: warning: variable 
'igc_priv_flags_strings' is not needed and will not be emitted 
[-Wunneeded-internal-declaration]
static const char igc_priv_flags_strings[][ETH_GSTRING_LEN] = {
  ^
1 warning generated.

igc_priv_flags_strings is only used in an ARRAY_SIZE macro, which is a
compile time evaluation, so no reference to it is being emitted in the
final assembly. Is it actually needed and was forgotten to be used
somewhere or could it be eliminated so that Clang no longer warns?

Thanks,
Nathan


Re: [PATCH v2 3/4] dt-bindings: phy: ti: Add dt binding documentation for SERDES in AM654x SoC

2019-02-07 Thread Kishon Vijay Abraham I
Hi Roger,

On 07/02/19 7:52 PM, Roger Quadros wrote:
> 
> 
> On 07/02/19 14:19, Kishon Vijay Abraham I wrote:
>> Hi Roger,
>>
>> On 07/02/19 4:56 PM, Roger Quadros wrote:
>>>
>>>
>>> On 06/02/19 13:07, Kishon Vijay Abraham I wrote:
 AM654x has two SERDES instances. Each instance has three input clocks
 (left input, externel reference clock and right input) and two output
 clocks (left output and right output) in addition to a PLL mux clock
 which the SERDES uses for Clock Multiplier Unit (CMU refclock).
 The PLL mux clock can select from one of the three input clocks.
 The right output can select between left input and external reference
 clock while the left output can select between the right input and
 external reference clock.

 The left and right input reference clock of SERDES0 and SERDES1
 respectively are connected to the SoC clock. In the case of two lane
 SERDES personality card, the left input of SERDES1 is connected to
 the right output of SERDES0 in a chained fashion.

 See section "Reference Clock Distribution" of AM65x Sitara Processors
 TRM (SPRUID7 – April 2018) for more details.

 Add dt-binding documentation in order to represent all these different
 configurations in device tree.

 Signed-off-by: Kishon Vijay Abraham I 
 ---
  .../devicetree/bindings/phy/ti-phy.txt| 77 +++
  include/dt-bindings/phy/phy-am654-serdes.h| 13 
  2 files changed, 90 insertions(+)
  create mode 100644 include/dt-bindings/phy/phy-am654-serdes.h

 diff --git a/Documentation/devicetree/bindings/phy/ti-phy.txt 
 b/Documentation/devicetree/bindings/phy/ti-phy.txt
 index 57dfda8a7a1d..fc2fff6b2c37 100644
 --- a/Documentation/devicetree/bindings/phy/ti-phy.txt
 +++ b/Documentation/devicetree/bindings/phy/ti-phy.txt
 @@ -132,3 +132,80 @@ sata_phy: phy@4a096000 {
syscon-pllreset = <_conf 0x3fc>;
#phy-cells = <0>;
  };
 +
 +
 +TI AM654 SERDES
 +
 +Required properties:
 + - compatible: Should be "ti,phy-am654-serdes"
 + - reg : Address and length of the register set for the device.
 + - reg-names: Should be "serdes" which corresponds to the register space
 +  populated in "reg".
 + - #phy-cells: determine the number of cells that should be given in the
 +  phandle while referencing this phy. Should be "2". The 1st cell
 +  corresponds to the phy type (should be one of the types specified in
 +  include/dt-bindings/phy/phy.h) and the 2nd cell should be the serdes
 +  lane function.
 +  If SERDES0 is referenced 2nd cell should be:
 +  0 - USB3
 +  1 - PCIe0 Lane0
 +  2 - ICSS2 SGMII Lane0
 +  If SERDES1 is referenced 2nd cell should be:
 +  0 - PCIe1 Lane0
 +  1 - PCIe0 Lane1
 +  2 - ICSS2 SGMII Lane1
>>>
>>> Instead of these magic numbers and expecting a human to decipher this
>>> which is prone to error, is it better to create the following defines and
>>> check for valid configuration in the driver?
>>>
>>> AM654_SERDES_LANE_USB3,
>>> AM654_SERDES_LANE_PCIE_LANE0,
>>> AM654_SERDES_LANE_PCIE_LANE1,
>>> AM654_SERDES_LANE_SGMII,
>>>
>>> So if you pass AM654_SERDES_LANE_PCIE_LANE0 to SERDES1, driver can easily
>>> figure out that it should be 1 if it is serdes0 and 0 if serdes1
>>>
>>> Which means the DT must contain something so that you can identify
>>> if it is serdes0 or serdes1.
>>
>> Generally I'd like to avoid drivers having to know instance numbers. That 
>> gets
>> more complicated to handle when the same IP is used in different platforms.
> 
> But this PHY driver is for AM654 platform. Are you saying that variants of 
> this
> platform have different lane configurations?

It can have different lane configurations (we've already seen that in dra72).
Nothing prevents from having the same SERDES IP in a future platform with
different lane configuration.

This is more of a system configuration which might get complicated if we try to
check all the valid configurations in driver.


> 
> You have already specified of the possibilities that can be in 2nd cell in
> the binding document.
> 
> Is it better to create a directory ti/ and put this binding in 
> ti,phy-am654-serdes.txt
> instead of the already so large ti,phy.txt?

sure.

Thanks
Kishon


Re: [PATCH] mm/page_poison: update comment after code moved

2019-02-07 Thread Andrew Morton
On Thu, 7 Feb 2019 14:11:16 -0500 "Michael S. Tsirkin"  wrote:

> mm/debug-pagealloc.c is no more, so of course header now needs to be
> updated. This seems like something checkpatch should be
> able to catch - worth looking into?
> 
> Cc: triv...@kernel.org
> Cc: linux...@kvack.org
> Cc: Linus Torvalds 
> Cc: a...@linux-foundation.org
> Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate 
> option")

Please send along a signed-off-by: for this.


[PATCH] ASoC: codecs: jz4725b: Remove unnecessary const qualifier

2019-02-07 Thread Nathan Chancellor
Clang warns:

sound/soc/codecs/jz4725b.c:177:14: warning: duplicate 'const' declaration 
specifier [-Wduplicate-decl-specifier]
static const SOC_VALUE_ENUM_SINGLE_DECL(jz4725b_codec_adc_src_enum,
 ^
include/sound/soc.h:356:2: note: expanded from macro 
'SOC_VALUE_ENUM_SINGLE_DECL'
SOC_VALUE_ENUM_DOUBLE_DECL(name, xreg, xshift, xshift, xmask, xtexts, 
xvalues)
^
include/sound/soc.h:353:2: note: expanded from macro 
'SOC_VALUE_ENUM_DOUBLE_DECL'
const struct soc_enum name = SOC_VALUE_ENUM_DOUBLE(xreg, xshift_l, 
xshift_r, xmask, \
^

As it points out, SOC_VALUE_ENUM_DOUBLE_DECL has the const attribute in
its definition so remove it here.

Fixes: e9d97b05a80f ("ASoC: codecs: Add jz4725b-codec driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/354
Signed-off-by: Nathan Chancellor 
---
 sound/soc/codecs/jz4725b.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sound/soc/codecs/jz4725b.c b/sound/soc/codecs/jz4725b.c
index e3dba92f30bd..5cc8c7ca24c6 100644
--- a/sound/soc/codecs/jz4725b.c
+++ b/sound/soc/codecs/jz4725b.c
@@ -174,12 +174,12 @@ static const char * const jz4725b_codec_adc_src_texts[] = 
{
"Mic 1", "Mic 2", "Line In", "Mixer",
 };
 static const unsigned int jz4725b_codec_adc_src_values[] = { 0, 1, 2, 3, };
-static const SOC_VALUE_ENUM_SINGLE_DECL(jz4725b_codec_adc_src_enum,
-   JZ4725B_CODEC_REG_CR3,
-   REG_CR3_INSEL_OFFSET,
-   REG_CR3_INSEL_MASK,
-   jz4725b_codec_adc_src_texts,
-   jz4725b_codec_adc_src_values);
+static SOC_VALUE_ENUM_SINGLE_DECL(jz4725b_codec_adc_src_enum,
+ JZ4725B_CODEC_REG_CR3,
+ REG_CR3_INSEL_OFFSET,
+ REG_CR3_INSEL_MASK,
+ jz4725b_codec_adc_src_texts,
+ jz4725b_codec_adc_src_values);
 static const struct snd_kcontrol_new jz4725b_codec_adc_src_ctrl =
SOC_DAPM_ENUM("Route", jz4725b_codec_adc_src_enum);
 
-- 
2.20.1



Re: [RFC PATCH] leds: multicolor: Add sysfs interface definition

2019-02-07 Thread Vesa Jääskeläinen

Hi All,

On 31/01/2019 0.35, Pavel Machek wrote:

On Wed 2019-01-30 12:30:05, Dan Murphy wrote:

Add a documentation of LED Multicolor LED class specific
sysfs attributes.


No, sorry. This does not most of the requirements.

Pavel

Requirements for RGB LED interface:


...

I have tried to capture relevant parts of ideas and requirements and 
usage in a wiki page in github:


https://github.com/vesajaaskelainen/linux-multicolor-leds/wiki

I believe the discussion is good to perform in mailing list -- I am 
happy to update or give you access to update the wiki so we could have 
easy to use summary/details source during discussions.


Ideas on the interface seem to be a bit drifting so I apologize if some 
of Your ideas were not captured.


There has been at least two direct proposals for userspace interface and 
I tried to provide usage example also for Dan's proposal. Feel free to 
correct me if I made a mistake.


I believe it is a good to create summary page as there seems to many 
aspects to be though out. What do you feel on this approach?


Thanks,
Vesa Jääskeläinen


[PATCH] ARM: dts: add description of Netgear RN NV+v2 LCD

2019-02-07 Thread David Coles
GPIO pin mapping taken from NetGear NV+ v2 5.3.13 kernel sources[1]
(see arch/arm/plat-feroceon/mv_hal/rtc/ext_rtc/usiLCD.c).

1. https://www.downloads.netgear.com/files/GPL/RND_5.3.13_WW.src.zip

Signed-off-by: David Coles 
---
 .../boot/dts/kirkwood-netgear_readynas_nv+_v2.dts  | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm/boot/dts/kirkwood-netgear_readynas_nv+_v2.dts 
b/arch/arm/boot/dts/kirkwood-netgear_readynas_nv+_v2.dts
index 8cc8550242ef..a61b56f44ec7 100644
--- a/arch/arm/boot/dts/kirkwood-netgear_readynas_nv+_v2.dts
+++ b/arch/arm/boot/dts/kirkwood-netgear_readynas_nv+_v2.dts
@@ -184,6 +184,20 @@
gpios = < 30 GPIO_ACTIVE_LOW>;
};
 
+   auxdisplay {
+   compatible = "hit,hd44780";
+   data-gpios = < 17 GPIO_ACTIVE_HIGH>,
+< 1 GPIO_ACTIVE_HIGH>,
+< 3 GPIO_ACTIVE_HIGH>,
+< 17 GPIO_ACTIVE_HIGH>;
+   enable-gpios = < 16 GPIO_ACTIVE_HIGH>;
+   rs-gpios = < 14 GPIO_ACTIVE_HIGH>;
+   rw-gpios = < 15 GPIO_ACTIVE_HIGH>;
+   backlight-gpios = < 12 GPIO_ACTIVE_LOW>;
+   display-height-chars = <2>;
+   display-width-chars = <16>;
+   };
+
regulators {
compatible = "simple-bus";
#address-cells = <1>;
-- 
2.17.1



[PATCH net-next] ethtool: Remove unnecessary null check in ethtool_rx_flow_rule_create

2019-02-07 Thread Nathan Chancellor
net/core/ethtool.c:3023:19: warning: address of array
'ext_m_spec->h_dest' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (ext_m_spec->h_dest) {
~~  ^~

h_dest is an array, it can't be null so remove this check.

Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure 
translator")
Link: https://github.com/ClangBuiltLinux/linux/issues/353
Signed-off-by: Nathan Chancellor 
---
 net/core/ethtool.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 0fbf39239b29..d2c47cdf25da 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -3020,17 +3020,15 @@ ethtool_rx_flow_rule_create(const struct 
ethtool_rx_flow_spec_input *input)
const struct ethtool_flow_ext *ext_h_spec = >h_ext;
const struct ethtool_flow_ext *ext_m_spec = >m_ext;
 
-   if (ext_m_spec->h_dest) {
-   memcpy(match->key.eth_addrs.dst, ext_h_spec->h_dest,
-  ETH_ALEN);
-   memcpy(match->mask.eth_addrs.dst, ext_m_spec->h_dest,
-  ETH_ALEN);
-
-   match->dissector.used_keys |=
-   BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS);
-   match->dissector.offset[FLOW_DISSECTOR_KEY_ETH_ADDRS] =
-   offsetof(struct ethtool_rx_flow_key, eth_addrs);
-   }
+   memcpy(match->key.eth_addrs.dst, ext_h_spec->h_dest,
+  ETH_ALEN);
+   memcpy(match->mask.eth_addrs.dst, ext_m_spec->h_dest,
+  ETH_ALEN);
+
+   match->dissector.used_keys |=
+   BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS);
+   match->dissector.offset[FLOW_DISSECTOR_KEY_ETH_ADDRS] =
+   offsetof(struct ethtool_rx_flow_key, eth_addrs);
}
 
act = >rule->action.entries[0];
-- 
2.20.1



Re: [PATCH v2 6/9] PCI: dwc: Add support to use non default msi_irq_chip

2019-02-07 Thread Kishon Vijay Abraham I
Hi Lorenzo,

On 07/02/19 10:18 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 07, 2019 at 04:39:21PM +0530, Kishon Vijay Abraham I wrote:
>> Platforms using Designware IP uses dw_pci_msi_bottom_irq_chip for
>> configuring the MSI controller logic within the Designware IP. However
>> certain platforms like Keystone (K2G) which uses Desingware IP has
>> it's own MSI controller logic. For handling such platforms,
>> the irqchip ops uses msi_irq_ack, msi_set_irq, msi_clear_irq callback
>> functions.
>>
>> Add support to use different msi_irq_chip with default as
>> dw_pci_msi_bottom_irq_chip. This is in preparation to get rid off
>> msi_irq_ack, msi_set_irq, msi_clear_irq and other Keystone specific
>> dw_pcie_host_ops. This will also help to get rid of get_msi_addr and
>> get_msi_data ops.
>>
>> Signed-off-by: Kishon Vijay Abraham I 
>> ---
>>  drivers/pci/controller/dwc/pcie-designware-host.c | 5 -
>>  drivers/pci/controller/dwc/pcie-designware.h  | 1 +
>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
>> b/drivers/pci/controller/dwc/pcie-designware-host.c
>> index 721d60a5d9e4..042de09b0451 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>> @@ -245,7 +245,7 @@ static int dw_pcie_irq_domain_alloc(struct irq_domain 
>> *domain,
>>  
>>  for (i = 0; i < nr_irqs; i++)
>>  irq_domain_set_info(domain, virq + i, bit + i,
>> -_pci_msi_bottom_irq_chip,
>> +pp->msi_irq_chip,
>>  pp, handle_edge_irq,
>>  NULL, NULL);
>>  
>> @@ -277,6 +277,9 @@ int dw_pcie_allocate_domains(struct pcie_port *pp)
>>  struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>  struct fwnode_handle *fwnode = of_node_to_fwnode(pci->dev->of_node);
>>  
>> +if (!pp->msi_irq_chip)
>> +pp->msi_irq_chip = _pci_msi_bottom_irq_chip;
> 
> I think it is better to initialize pp->msi_irq_chip outside
> dw_pcie_allocate_domains(), it makes things clearer.
> 
> In:
> 
> dw_pcie_host_init() for dwc
> 
> or
> 
> msi_host_init() for platforms with that hook implemented.
> 
> Is there any gotcha I am missing ?

I added here only to avoid breaking "git bisect". Next patch adds
ks_pcie_msi_irq_chip in msi_host_init of keystone. However till then it has to
use dw_pci_msi_bottom_irq_chip. Adding anywhere else in dw_pcie_host_init would
mean msi_irq_chip is uninitialized for keystone.

Maybe I can add that in the commit log and move it to dw_pcie_host_init?

Thanks
Kishon


Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-07 Thread Dave Chinner
On Thu, Feb 07, 2019 at 04:55:37PM +, Christopher Lameter wrote:
> One approach that may be a clean way to solve this:
> 3. Filesystems that allow bypass of the page cache (like XFS / DAX) will
>provide the virtual mapping when the PIN is done and DO NO OPERATIONS
>on the longterm pinned range until the long term pin is removed.

So, ummm, how do we do block allocation then, which is done on
demand during writes?

IOWs, this requires the application to set up the file in the
correct state for the filesystem to lock it down so somebody else
can write to it.  That means the file can't be sparse, it can't be
preallocated (i.e. can't contain unwritten extents), it must have zeroes
written to it's full size before being shared because otherwise it
exposes stale data to the remote client (secure sites are going to
love that!), they can't be extended, etc.

IOWs, once the file is prepped and leased out for RDMA, it becomes
an immutable for the purposes of local access.

Which, essentially we can already do. Prep the file, map it
read/write, mark it immutable, then pin it via the longterm gup
interface which can do the necessary checks.

Simple to implement, the reasons for errors trying to modify the
file are already documented and queriable, and it's hard for
applications to get wrong.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC/PATCH 0/5] DVFS in the OPP core

2019-02-07 Thread Rajendra Nayak




On 2/8/2019 1:17 AM, Stephen Boyd wrote:

Quoting Rajendra Nayak (2019-02-06 22:57:12)



3) How do we handle devices that already have power-domains specified in
DT? The opp binding for required-opps doesn't let us specify the power
domain to target, instead it assumes that whatever power domain is
attached to a device is the one that OPP needs to use to change the
genpd performance state. Do we need a
dev_pm_opp_set_required_opps_name() or something to be explicit about
this? Can we have some way for the power domain that required-opps
correspond to be expressed in the OPP tables themselves?


I was converting a few more drivers to use the proposed approach in this
RFC, in order to identify all outstanding issues we need to deal with,
and specifically for UFS, I end up with this exact scenario where UFS already
has an existing power domain (gdsc) and I need to add another one (rpmhpd) for
setting the performance state.

If I use dev_pm_opp_of_add_table() to add the opp table from DT, the opp
layer assumes its the same device on which it can do a 
dev_pm_genpd_set_performance_state()
with, however the device that's actually associated with the pm_domain when we
have multiple power domains is infact the one (dummy) that we create when
the driver makes a call to dev_pm_domain_attach_by_name/id().

Any thoughts on whats a good way to handle this?



Ulf mentioned that we can use dev_pm_opp_set_genpd_virt_dev() for this.
Does that API work here?


Ah, yes, that should work, I hadn't noticed this API existed.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v10 12/25] arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking

2019-02-07 Thread Nathan Chancellor
On Thu, Jan 31, 2019 at 02:58:50PM +, Julien Thierry wrote:
> Instead disabling interrupts by setting the PSR.I bit, use a priority
> higher than the one used for interrupts to mask them via PMR.
> 
> When using PMR to disable interrupts, the value of PMR will be used
> instead of PSR.[DAIF] for the irqflags.
> 
> Signed-off-by: Julien Thierry 
> Suggested-by: Daniel Thompson 
> Acked-by: Ard Biesheuvel 
> Reviewed-by: Catalin Marinas 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Ard Biesheuvel 
> Cc: Oleg Nesterov 
> ---
>  arch/arm64/include/asm/efi.h  |  11 +
>  arch/arm64/include/asm/irqflags.h | 100 
> +++---
>  2 files changed, 83 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index 7ed3208..c9e9a69 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -44,6 +44,17 @@
>  
>  #define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | 
> PSR_F_BIT)
>  
> +/*
> + * Even when Linux uses IRQ priorities for IRQ disabling, EFI does not.
> + * And EFI shouldn't really play around with priority masking as it is not 
> aware
> + * which priorities the OS has assigned to its interrupts.
> + */
> +#define arch_efi_save_flags(state_flags) \
> + ((void)((state_flags) = read_sysreg(daif)))
> +
> +#define arch_efi_restore_flags(state_flags)  write_sysreg(state_flags, daif)
> +
> +
>  /* arch specific definitions used by the stub code */
>  
>  /*
> diff --git a/arch/arm64/include/asm/irqflags.h 
> b/arch/arm64/include/asm/irqflags.h
> index 24692ed..d4597b2 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -18,7 +18,9 @@
>  
>  #ifdef __KERNEL__
>  
> +#include 
>  #include 
> +#include 
>  
>  /*
>   * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts 
> and
> @@ -36,33 +38,27 @@
>  /*
>   * CPU interrupt mask handling.
>   */
> -static inline unsigned long arch_local_irq_save(void)
> -{
> - unsigned long flags;
> - asm volatile(
> - "mrs%0, daif// arch_local_irq_save\n"
> - "msrdaifset, #2"
> - : "=r" (flags)
> - :
> - : "memory");
> - return flags;
> -}
> -
>  static inline void arch_local_irq_enable(void)
>  {
> - asm volatile(
> - "msrdaifclr, #2 // arch_local_irq_enable"
> - :
> + asm volatile(ALTERNATIVE(
> + "msrdaifclr, #2 // arch_local_irq_enable\n"
> + "nop",
> + "msr_s  " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + "dsbsy",
> + ARM64_HAS_IRQ_PRIO_MASKING)
>   :
> + : "r" (GIC_PRIO_IRQON)
>   : "memory");
>  }
>  
>  static inline void arch_local_irq_disable(void)
>  {
> - asm volatile(
> - "msrdaifset, #2 // arch_local_irq_disable"
> - :
> + asm volatile(ALTERNATIVE(
> + "msrdaifset, #2 // arch_local_irq_disable",
> + "msr_s  " __stringify(SYS_ICC_PMR_EL1) ", %0",
> + ARM64_HAS_IRQ_PRIO_MASKING)
>   :
> + : "r" (GIC_PRIO_IRQOFF)
>   : "memory");
>  }
>  
> @@ -71,12 +67,44 @@ static inline void arch_local_irq_disable(void)
>   */
>  static inline unsigned long arch_local_save_flags(void)
>  {
> + unsigned long daif_bits;
>   unsigned long flags;
> - asm volatile(
> - "mrs%0, daif// arch_local_save_flags"
> - : "=r" (flags)
> - :
> +
> + daif_bits = read_sysreg(daif);
> +
> + /*
> +  * The asm is logically equivalent to:
> +  *
> +  * if (system_uses_irq_prio_masking())
> +  *  flags = (daif_bits & PSR_I_BIT) ?
> +  *  GIC_PRIO_IRQOFF :
> +  *  read_sysreg_s(SYS_ICC_PMR_EL1);
> +  * else
> +  *  flags = daif_bits;
> +  */
> + asm volatile(ALTERNATIVE(
> + "mov%0, %1\n"
> + "nop\n"
> + "nop",
> + "mrs_s  %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
> + "ands   %1, %1, " __stringify(PSR_I_BIT) "\n"
> + "csel   %0, %0, %2, eq",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> + : "=" (flags), "+r" (daif_bits)
> + : "r" (GIC_PRIO_IRQOFF)
>   : "memory");
> +
> + return flags;
> +}
> +
> +static inline unsigned long arch_local_irq_save(void)
> +{
> + unsigned long flags;
> +
> + flags = arch_local_save_flags();
> +
> + arch_local_irq_disable();
> +
>   return flags;
>  }
>  
> @@ -85,16 +113,32 @@ static inline unsigned long arch_local_save_flags(void)
>   */
>  static inline void arch_local_irq_restore(unsigned long flags)
>  {
> - asm volatile(
> - 

Re: [PATCH v2 3/9] PCI: keystone: Add separate functions for configuring MSI and legacy interrupt

2019-02-07 Thread Kishon Vijay Abraham I
Hi,

On 07/02/19 9:14 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 07, 2019 at 04:39:18PM +0530, Kishon Vijay Abraham I wrote:
>> ks_pcie_get_irq_controller_info() was used to configure both MSI and
>> legacy interrupt. This will prevent MSI or legacy interrupt specific
>> intializations. Add separate functions to configure MSI and legacy
>> interrupts.
>>
>> Signed-off-by: Kishon Vijay Abraham I 
>> ---
>>  drivers/pci/controller/dwc/pci-keystone.c | 188 +++---
>>  1 file changed, 96 insertions(+), 92 deletions(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
>> b/drivers/pci/controller/dwc/pci-keystone.c
>> index 4cf9849d5a1d..b1d01751c1af 100644
>> --- a/drivers/pci/controller/dwc/pci-keystone.c
>> +++ b/drivers/pci/controller/dwc/pci-keystone.c
>> @@ -87,11 +87,8 @@ struct keystone_pcie {
>>  struct dw_pcie  *pci;
>>  /* PCI Device ID */
>>  u32 device_id;
>> -int num_legacy_host_irqs;
>> -int legacy_host_irqs[PCI_NUM_INTX];
>>  struct  device_node *legacy_intc_np;
>>  
>> -int num_msi_host_irqs;
>>  int msi_host_irqs[MAX_MSI_HOST_IRQS];
>>  int num_lanes;
>>  u32 num_viewport;
>> @@ -201,14 +198,6 @@ static int ks_pcie_msi_host_init(struct pcie_port *pp)
>>  return dw_pcie_allocate_domains(pp);
>>  }
>>  
>> -static void ks_pcie_enable_legacy_irqs(struct keystone_pcie *ks_pcie)
>> -{
>> -int i;
>> -
>> -for (i = 0; i < PCI_NUM_INTX; i++)
>> -ks_pcie_app_writel(ks_pcie, IRQ_ENABLE_SET(i), 0x1);
>> -}
>> -
>>  static void ks_pcie_handle_legacy_irq(struct keystone_pcie *ks_pcie,
>>int offset)
>>  {
>> @@ -490,17 +479,6 @@ static int __init ks_pcie_dw_host_init(struct 
>> keystone_pcie *ks_pcie)
>>  
>>  ks_pcie->app = *res;
>>  
>> -/* Create legacy IRQ domain */
>> -ks_pcie->legacy_irq_domain =
>> -irq_domain_add_linear(ks_pcie->legacy_intc_np,
>> -  PCI_NUM_INTX,
>> -  _pcie_legacy_irq_domain_ops,
>> -  NULL);
>> -if (!ks_pcie->legacy_irq_domain) {
>> -dev_err(dev, "Failed to add irq domain for legacy irqs\n");
>> -return -EINVAL;
>> -}
>> -
>>  return dw_pcie_host_init(pp);
>>  }
>>  
>> @@ -624,85 +602,117 @@ static void ks_pcie_legacy_irq_handler(struct 
>> irq_desc *desc)
>>  chained_irq_exit(chip, desc);
>>  }
>>  
>> -static int ks_pcie_get_irq_controller_info(struct keystone_pcie *ks_pcie,
>> -   char *controller, int *num_irqs)
>> +static int ks_pcie_config_msi_irq(struct keystone_pcie *ks_pcie)
>>  {
>> -int temp, max_host_irqs, legacy = 1, *host_irqs;
>>  struct device *dev = ks_pcie->pci->dev;
>> -struct device_node *np_pcie = dev->of_node, **np_temp;
>> -
>> -if (!strcmp(controller, "msi-interrupt-controller"))
>> -legacy = 0;
>> -
>> -if (legacy) {
>> -np_temp = _pcie->legacy_intc_np;
>> -max_host_irqs = PCI_NUM_INTX;
>> -host_irqs = _pcie->legacy_host_irqs[0];
>> -} else {
>> -np_temp = _pcie->msi_intc_np;
>> -max_host_irqs = MAX_MSI_HOST_IRQS;
>> -host_irqs =  _pcie->msi_host_irqs[0];
>> -}
>> +struct device_node *np = ks_pcie->np;
>> +struct device_node *intc_np;
>> +int irq_count;
>> +int irq;
>> +int ret;
>> +int i;
> 
> Nit: all int can be in one line.

Okay.
> 
>> -/* interrupt controller is in a child node */
>> -*np_temp = of_get_child_by_name(np_pcie, controller);
>> -if (!(*np_temp)) {
>> -dev_err(dev, "Node for %s is absent\n", controller);
>> -return -EINVAL;
>> -}
>> +if (!IS_ENABLED(CONFIG_PCI_MSI))
>> +return 0;
>>  
>> -temp = of_irq_count(*np_temp);
>> -if (!temp) {
>> -dev_err(dev, "No IRQ entries in %s\n", controller);
>> -of_node_put(*np_temp);
>> +intc_np = of_get_child_by_name(np, "msi-interrupt-controller");
>> +if (!intc_np) {
>> +dev_WARN(dev, "msi-interrupt-controller node is absent\n");
> 
> I do not think you can justify a backtrace for this error path, so
> convert it to a dev_warn() please.

Sure.
> 
>>  return -EINVAL;
>>  }
>>  
>> -if (temp > max_host_irqs)
>> -dev_warn(dev, "Too many %s interrupts defined %u\n",
>> -(legacy ? "legacy" : "MSI"), temp);
>> +irq_count = of_irq_count(intc_np);
>> +if (!irq_count) {
>> +dev_err(dev, "No IRQ entries in msi-interrupt-controller\n");
>> +ret = -EINVAL;
>> +goto err;
>> +}
>>  
>> -/*
>> - * support upto max_host_irqs. In dt from index 0 to 3 (legacy) or 0 

Re: [PATCH v2 2/9] PCI: keystone: Modify legacy_irq_handler to check the IRQ_STATUS of INTA/B/C/D

2019-02-07 Thread Kishon Vijay Abraham I
Hi Lorenzo,

On 07/02/19 9:45 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 07, 2019 at 04:39:17PM +0530, Kishon Vijay Abraham I wrote:
>> The legacy interrupt handler directly checks the IRQ_STATUS register
>> corresponding to a interrupt line inorder to invoke generic_handle_irq.
>>
>> While this is okay for K2G platform which has separate interrupt line for
>> each of the 4 legacy interrupts, AM654 which uses the same PCIe wrapper
>> has a single interrupt line for all the legacy interrupts. So for AM654
>> the interrupt handler won't be able to directly check the IRQ_STATUS
>> register corresponding to the interrupt line.
>>
>> Also the legacy interrupt handler uses 'virq' obtained from
>> irq_of_parse_and_map to find the correct interrupt line which raised the
>> interrupt. There is no guarantee that virq assigned for contiguous hardware
>> irq will be contiguous and the interrupt handler might end up checking
>> the wrong IRQ_STATUS register.
>>
>> In order to overcome the above issues, read the IRQ_STATUS register of
>> all the 4 legacy interrupts to determine which interrupt was raised.
>>
>> Link: https://lkml.kernel.org/r/bb081d21-7c03-0357-4294-7e92d95d8...@arm.com
>> Signed-off-by: Kishon Vijay Abraham I 
>> ---
>>  drivers/pci/controller/dwc/pci-keystone.c | 22 --
>>  1 file changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
>> b/drivers/pci/controller/dwc/pci-keystone.c
>> index 5286a480f76b..4cf9849d5a1d 100644
>> --- a/drivers/pci/controller/dwc/pci-keystone.c
>> +++ b/drivers/pci/controller/dwc/pci-keystone.c
>> @@ -214,16 +214,11 @@ static void ks_pcie_handle_legacy_irq(struct 
>> keystone_pcie *ks_pcie,
>>  {
>>  struct dw_pcie *pci = ks_pcie->pci;
>>  struct device *dev = pci->dev;
>> -u32 pending;
>>  int virq;
>>  
>> -pending = ks_pcie_app_readl(ks_pcie, IRQ_STATUS(offset));
>> -
>> -if (BIT(0) & pending) {
>> -virq = irq_linear_revmap(ks_pcie->legacy_irq_domain, offset);
>> -dev_dbg(dev, ": irq: irq_offset %d, virq %d\n", offset, virq);
>> -generic_handle_irq(virq);
>> -}
>> +virq = irq_linear_revmap(ks_pcie->legacy_irq_domain, offset);
>> +dev_dbg(dev, ": irq: irq_offset %d, virq %d\n", offset, virq);
>> +generic_handle_irq(virq);
>>  
>>  /* EOI the INTx interrupt */
>>  ks_pcie_app_writel(ks_pcie, IRQ_EOI, offset);
>> @@ -607,8 +602,9 @@ static void ks_pcie_legacy_irq_handler(struct irq_desc 
>> *desc)
>>  struct keystone_pcie *ks_pcie = irq_desc_get_handler_data(desc);
>>  struct dw_pcie *pci = ks_pcie->pci;
>>  struct device *dev = pci->dev;
>> -u32 irq_offset = irq - ks_pcie->legacy_host_irqs[0];
>>  struct irq_chip *chip = irq_desc_get_chip(desc);
>> +unsigned int irq_no;
>> +u32 reg;
>>  
>>  dev_dbg(dev, ": Handling legacy irq %d\n", irq);
>>  
>> @@ -618,7 +614,13 @@ static void ks_pcie_legacy_irq_handler(struct irq_desc 
>> *desc)
>>   * ack operation.
>>   */
>>  chained_irq_enter(chip, desc);
>> -ks_pcie_handle_legacy_irq(ks_pcie, irq_offset);
>> +for (irq_no = 0; irq_no < PCI_NUM_INTX; irq_no++) {
> 
> I understand the aim of this code but now on platforms where there
> is a 1:1 relationship between Linux IRQ and INTX this loop has
> steps carried out for nothing.
> 
> If I understand the code correctly what this code does is force
> looping over INTX status regs regardless of what linux IRQ number was
> actually active.

right. This driver is used by 2 platforms K2G and AM654 (The patches are there
on the list). K2G has 4 interrupt lines for each of the 4 legacy interrups
while AM654 has a single interrupt line. One of the purpose of this patch is to
have a single legacy interrupt handler for both K2G and AM654.
> 
> You could do something faster by creating a matrix LinuxIRQ x INTx to
> detect what INTx status register should actually be checked.
> 
> This seems overkill to me but it is not that complicated to implement
> and may clarify the code (and avoid reading up to three registers for
> nothing on the IRQ code path, which can make things faster too).

Agreed. But that's not possible for AM654 which has a single interrupt line and
all the registers has to be read to identify the interrupt source.

Thanks
Kishon


Re: [PATCH v2] lib/scatterlist: Provide a DMA page iterator

2019-02-07 Thread Miguel Ojeda
On Thu, Feb 7, 2019 at 11:28 PM Jason Gunthorpe  wrote:
>
> Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists w/o
> backing pages") introduced the sg_page_iter_dma_address() function without
> providing a way to use it in the general case. If the sg_dma_len() is not
> equal to the sg length callers cannot safely use the
> for_each_sg_page/sg_page_iter_dma_address combination.
>
> Resolve this API mistake by providing a DMA specific iterator,
> for_each_sg_dma_page(), that uses the right length so
> sg_page_iter_dma_address() works as expected with all sglists.
>
> A new iterator type is introduced to provide compile-time safety against
> wrongly mixing accessors and iterators.
>
> Acked-by: Christoph Hellwig  (for scatterlist)
> Signed-off-by: Jason Gunthorpe 
> ---
>  .clang-format  |  1 +

Thanks for updating the .clang-format, Jason! :-)

Cheers,
Miguel


Re: [RFC v1 0/3] Address potential user-after-free on module unload

2019-02-07 Thread Miguel Ojeda
On Thu, Feb 7, 2019 at 11:33 PM Sven Van Asbroeck  wrote:
>
> On Thu, Feb 7, 2019 at 5:21 PM Dmitry Torokhov
>  wrote:
> >
> > > ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> > > of INIT_WORK/INIT_DELAYED_WORK initialized here
> >
> > This is not as simple.
> >
>
> PS If you change
> flush_work(>work.work);
> to
> flush_delayed_work(>work);
>
> then the Coccinelle script works correctly, and does not flag
> this driver.

Similarly, in drivers/auxdisplay/ht16k33.c, the cancel_delayed_work()
is there, instead of cancel_delayed_work_sync(). Having the script
suggest this change would be useful, too (i.e. instead of the devm_
change, assuming the cancel_delayed_work() is already there).

Thanks!
Miguel


Re: [PATCH v10, RESEND 4/6] tpm: move tpm_chip definition to include/linux/tpm.h

2019-02-07 Thread Nathan Chancellor
p(d) container_of(d, struct tpm_chip, dev)
>  
>  struct tpm_header {
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index afd022fc9d3d..816e686a73ac 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -22,6 +22,10 @@
>  #ifndef __LINUX_TPM_H__
>  #define __LINUX_TPM_H__
>  
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
>  
>  #define TPM_DIGEST_SIZE 20   /* Max TPM v1.2 PCR size */
> @@ -75,6 +79,93 @@ struct tpm_class_ops {
>   void (*clk_enable)(struct tpm_chip *chip, bool value);
>  };
>  
> +#define TPM_NUM_EVENT_LOG_FILES  3
> +
> +/* Indexes the duration array */
> +enum tpm_duration {
> + TPM_SHORT = 0,
> + TPM_MEDIUM = 1,
> + TPM_LONG = 2,
> + TPM_LONG_LONG = 3,
> + TPM_UNDEFINED,
> + TPM_NUM_DURATIONS = TPM_UNDEFINED,
> +};
> +
> +#define TPM_PPI_VERSION_LEN  3
> +
> +struct tpm_space {
> + u32 context_tbl[3];
> + u8 *context_buf;
> + u32 session_tbl[3];
> + u8 *session_buf;
> +};
> +
> +struct tpm_bios_log {
> + void *bios_event_log;
> + void *bios_event_log_end;
> +};
> +
> +struct tpm_chip_seqops {
> + struct tpm_chip *chip;
> + const struct seq_operations *seqops;
> +};
> +
> +struct tpm_chip {
> + struct device dev;
> + struct device devs;
> + struct cdev cdev;
> + struct cdev cdevs;
> +
> + /* A driver callback under ops cannot be run unless ops_sem is held
> +  * (sometimes implicitly, eg for the sysfs code). ops becomes null
> +  * when the driver is unregistered, see tpm_try_get_ops.
> +  */
> + struct rw_semaphore ops_sem;
> + const struct tpm_class_ops *ops;
> +
> + struct tpm_bios_log log;
> + struct tpm_chip_seqops bin_log_seqops;
> + struct tpm_chip_seqops ascii_log_seqops;
> +
> + unsigned int flags;
> +
> + int dev_num;/* /dev/tpm# */
> + unsigned long is_open;  /* only one allowed */
> +
> + char hwrng_name[64];
> + struct hwrng hwrng;
> +
> + struct mutex tpm_mutex; /* tpm is processing */
> +
> + unsigned long timeout_a; /* jiffies */
> + unsigned long timeout_b; /* jiffies */
> + unsigned long timeout_c; /* jiffies */
> + unsigned long timeout_d; /* jiffies */
> + bool timeout_adjusted;
> + unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
> + bool duration_adjusted;
> +
> + struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
> +
> + const struct attribute_group *groups[3];
> + unsigned int groups_cnt;
> +
> + u32 nr_allocated_banks;
> + struct tpm_bank_info *allocated_banks;
> +#ifdef CONFIG_ACPI
> + acpi_handle acpi_dev_handle;
> + char ppi_version[TPM_PPI_VERSION_LEN + 1];
> +#endif /* CONFIG_ACPI */
> +
> + struct tpm_space work_space;
> + u32 last_cc;
> + u32 nr_commands;
> + u32 *cc_attrs_tbl;
> +
> + /* active locality */
> + int locality;
> +};
> +
>  #if defined(CONFIG_TCG_TPM) || defined(CONFIG_TCG_TPM_MODULE)
>  
>  extern int tpm_is_tpm2(struct tpm_chip *chip);
> -- 
> 2.17.1
> 

Hi Robert,

This patch causes a build error with Clang (bisected on next-20190207):

security/integrity/ima/ima.h:191:2: error: redefinition of enumerator 'NONE'
__ima_hooks(__ima_hook_enumify)
^
security/integrity/ima/ima.h:176:7: note: expanded from macro '__ima_hooks'
hook(NONE)  \
 ^
include/linux/efi.h:1709:2: note: previous definition is here
NONE,
^
1 error generated.

I am not sure how to reconcile this otherwise I would have sent a patch.

Thanks,
Nathan


Re: [LSF/MM TOPIC] Non standard size THP

2019-02-07 Thread Matthew Wilcox
On Fri, Feb 08, 2019 at 07:43:57AM +0530, Anshuman Khandual wrote:
> How non-standard huge pages can be supported for THP
> 
>   - THP starts recognizing non standard huge page (exported by arch) like 
> HPAGE_CONT_(PMD|PTE)_SIZE
>   - THP starts operating for either on HPAGE_PMD_SIZE or 
> HPAGE_CONT_PMD_SIZE or HPAGE_CONT_PTE_SIZE
>   - set_pmd_at() only recognizes HPAGE_PMD_SIZE hence replace 
> set_pmd_at() with set_huge_pmd_at()
>   - set_huge_pmd_at() could differentiate between HPAGE_PMD_SIZE or 
> HPAGE_CONT_PMD_SIZE
>   - In case for HPAGE_CONT_PTE_SIZE extend page table walker till PTE 
> level
>   - Use set_huge_pte_at() which can operate on multiple contiguous PTE 
> bits

I think your proposed solution reflects thinking like a hardware person
rather than like a software person.  Or maybe like an MM person rather
than a FS person.  I see the same problem with Kirill's solutions ;-)

Perhaps you don't realise that using larger pages when appropriate
would also benefit filesystems as well as CPUs.  You didn't include
linux-fsdevel on this submission, so that's a plausible explanation.

The XArray currently supports arbitrary power-of-two-naturally-aligned
page sizes, and conveniently so does the page allocator [1].  The problem
is that various bits of the MM have a very fixed mindset that pages are
PTE, PMD or PUD in size.

We should enhance routines like vmf_insert_page() to handle
arbitrary sized pages rather than having separate vmf_insert_pfn()
and vmf_insert_pfn_pmd().  We probably need to enhance the set_pxx_at()
API to pass in an order, rather than explicitly naming pte/pmd/pud/...

First, though, we need to actually get arbitrary sized pages handled
correctly in the page cache.  So if anyone's interested in talking about
this, but hasn't been reviewing or commenting on the patches I've been
sending to make this happen, I'm going to seriously question their actual
commitment to wanting this to happen, rather than wanting a nice holiday
in Puerto Rico.

Sorry to be so blunt about this, but I've only had review from Kirill,
which makes me think that nobody else actually cares about getting
this fixed.

[1] Support for arbitrary sized and aligned entries is in progress for
the XArray, but I don't think there's any appetite for changing the buddy
allocator to let us allocate "pages" that are an arbitrary extent in size.



[PATCH net-next] ixgbe: Use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 62e6499e4146..cc3196ae5aea 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -836,12 +836,10 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter 
*adapter,
struct ixgbe_ring *ring;
int node = NUMA_NO_NODE;
int cpu = -1;
-   int ring_count, size;
+   int ring_count;
u8 tcs = adapter->hw_tcs;
 
ring_count = txr_count + rxr_count + xdp_count;
-   size = sizeof(struct ixgbe_q_vector) +
-  (sizeof(struct ixgbe_ring) * ring_count);
 
/* customize cpu for Flow Director mapping */
if ((tcs <= 1) && !(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)) {
@@ -855,9 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter 
*adapter,
}
 
/* allocate q_vector and rings */
-   q_vector = kzalloc_node(size, GFP_KERNEL, node);
+   q_vector = kzalloc_node(struct_size(q_vector, ring, ring_count),
+   GFP_KERNEL, node);
if (!q_vector)
-   q_vector = kzalloc(size, GFP_KERNEL);
+   q_vector = kzalloc(struct_size(q_vector, ring, ring_count),
+  GFP_KERNEL);
if (!q_vector)
return -ENOMEM;
 
-- 
2.20.1



[PATCH net-next] igc: Use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/intel/igc/igc_main.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
b/drivers/net/ethernet/intel/igc/igc_main.c
index 11c75433fe22..87a11879bf2d 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2958,22 +2958,21 @@ static int igc_alloc_q_vector(struct igc_adapter 
*adapter,
 {
struct igc_q_vector *q_vector;
struct igc_ring *ring;
-   int ring_count, size;
+   int ring_count;
 
/* igc only supports 1 Tx and/or 1 Rx queue per vector */
if (txr_count > 1 || rxr_count > 1)
return -ENOMEM;
 
ring_count = txr_count + rxr_count;
-   size = sizeof(struct igc_q_vector) +
-   (sizeof(struct igc_ring) * ring_count);
 
/* allocate q_vector and rings */
q_vector = adapter->q_vector[v_idx];
if (!q_vector)
-   q_vector = kzalloc(size, GFP_KERNEL);
+   q_vector = kzalloc(struct_size(q_vector, ring, ring_count),
+  GFP_KERNEL);
else
-   memset(q_vector, 0, size);
+   memset(q_vector, 0, struct_size(q_vector, ring, ring_count));
if (!q_vector)
return -ENOMEM;
 
-- 
2.20.1



Re: [RFC PATCH 1/4] watchdog: hpwdt: Don't disable watchdog on NMI

2019-02-07 Thread Guenter Roeck

On 2/7/19 5:26 PM, Jerry Hoemann wrote:

On Sat, Feb 02, 2019 at 09:55:29AM +0500, Ivan Mironov wrote:

On Tue, 2019-01-15 at 19:27 -0700, Jerry Hoemann wrote:

On Mon, Jan 14, 2019 at 07:36:14AM +0500, Ivan Mironov wrote:


Somehow I missed the whole pretimout thing when reading about the
watchdog API. Thanks for clarification, now code makes much more sense
=).

Still, I do not really understand the point of enabling of kdump
support in hpwdt driver by default while kdump is not enabled by
default.


Kdump is enabled by default by our Distro partners.

HPE works with distro partners to deliver a validated system which we support.

The ability to generate crash dumps is one of the means we use to
support our customers.  Even if kdump isn't configured, panic will
at least print stack trace to indicate system activity.




Also, existing code may call hpwdt_stop() (and thus break watchdog)
even if pretimout is disabled.

Also, "panic=N" option is not providing a way to *not* panic on NMI
unrelated with iLO. This could be circumvented by blacklisting the
hpwdt module entirely, but normal watchdog functionality would be lost
then.


panic=N provides for reset upon receipt of NMI if user wants timeout
to reset system but not a crash dump.

The panic is for error containment.  On the legacy systems within
the context of hpwdt_pretimeout we cannot determine if the error
is recoverable or not.  So, we have little choice but to panic.




It is possible to rebuild kernel without HPWDT_NMI_DECODING (which is
enabled in Fedora, for example). But it is nearly impossible to come to
this solution without examining the source code, because description of
this option does not mention that it is really about pretimout support
and panics and not about something else...


The name is not the best given its current use, but I'm not sure a
name change would be allowed.



I would be open to accepting an improved help text of this configuration
option. I am not going to entertain (or accept) a name change.
That would open up a can of worms, with everyone in the world requesting
name changes. That would be pretty pointless and make the kernel all but
unmanageable.

Guenter


However, I will send a patch to update the documentation in Kconfig.






[PATCH net-next] igb: use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 6d812e96572d..69b230c53fed 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1189,15 +1189,15 @@ static int igb_alloc_q_vector(struct igb_adapter 
*adapter,
 {
struct igb_q_vector *q_vector;
struct igb_ring *ring;
-   int ring_count, size;
+   int ring_count;
+   size_t size;
 
/* igb only supports 1 Tx and/or 1 Rx queue per vector */
if (txr_count > 1 || rxr_count > 1)
return -ENOMEM;
 
ring_count = txr_count + rxr_count;
-   size = sizeof(struct igb_q_vector) +
-  (sizeof(struct igb_ring) * ring_count);
+   size = struct_size(q_vector, ring, ring_count);
 
/* allocate q_vector and rings */
q_vector = adapter->q_vector[v_idx];
-- 
2.20.1



[PATCH net-next] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs

2019-02-07 Thread Callum Sinclair
Currently the only way to clear the mfc cache was to delete the entries
one by one using the MRT_DEL_MFC socket option or to destroy and
recreate the socket.

Create a new socket option which will clear the multicast forwarding
cache on the socket without destroying the socket. The new socket option
MRT_FLUSH_ENTRIES will clear all multicast entries on the sockets table
and the MRT_FLUSH_VIFS will delete all multicast vifs on the socket
table.

Signed-off-by: Callum Sinclair 
---
 include/uapi/linux/mroute.h  |  7 +++-
 include/uapi/linux/mroute6.h |  7 +++-
 net/ipv4/ipmr.c  | 69 -
 net/ipv6/ip6mr.c | 74 ++--
 4 files changed, 100 insertions(+), 57 deletions(-)

diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 5d37a9ccce63..673495ca3495 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -28,12 +28,17 @@
 #define MRT_TABLE  (MRT_BASE+9)/* Specify mroute table ID  
*/
 #define MRT_ADD_MFC_PROXY  (MRT_BASE+10)   /* Add a (*,*|G) mfc entry  
*/
 #define MRT_DEL_MFC_PROXY  (MRT_BASE+11)   /* Del a (*,*|G) mfc entry  
*/
-#define MRT_MAX(MRT_BASE+11)
+#define MRT_FLUSH  (MRT_BASE+12)   /* Flush all multicast entries and vifs 
*/
+#define MRT_MAX(MRT_BASE+12)
 
 #define SIOCGETVIFCNT  SIOCPROTOPRIVATE/* IP protocol privates */
 #define SIOCGETSGCNT   (SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF (SIOCPROTOPRIVATE+2)
 
+/* MRT_FLUSH optional flags */
+#define MRT_FLUSH_ENTRIES  1   /* For flushing all multicast entries */
+#define MRT_FLUSH_VIFS 2   /* For flushing all multicast vifs */
+
 #define MAXVIFS32
 typedef unsigned long vifbitmap_t; /* User mode code depends on this lot */
 typedef unsigned short vifi_t;
diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
index cc006390..1f7ac3f0bc20 100644
--- a/include/uapi/linux/mroute6.h
+++ b/include/uapi/linux/mroute6.h
@@ -31,12 +31,17 @@
 #define MRT6_TABLE (MRT6_BASE+9)   /* Specify mroute table ID  
*/
 #define MRT6_ADD_MFC_PROXY (MRT6_BASE+10)  /* Add a (*,*|G) mfc entry  
*/
 #define MRT6_DEL_MFC_PROXY (MRT6_BASE+11)  /* Del a (*,*|G) mfc entry  
*/
-#define MRT6_MAX   (MRT6_BASE+11)
+#define MRT6_FLUSH (MRT6_BASE+12)  /* Flush all multicast entries and vifs 
*/
+#define MRT6_MAX   (MRT6_BASE+12)
 
 #define SIOCGETMIFCNT_IN6  SIOCPROTOPRIVATE/* IP protocol privates 
*/
 #define SIOCGETSGCNT_IN6   (SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF (SIOCPROTOPRIVATE+2)
 
+/* MRT6_FLUSH optional flags */
+#define MRT6_FLUSH_ENTRIES 1   /* For flushing all multicast entries */
+#define MRT6_FLUSH_VIFS2   /* For flushing all multicast 
vifs */
+
 #define MAXMIFS32
 typedef unsigned long mifbitmap_t; /* User mode code depends on this lot */
 typedef unsigned short mifi_t;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index e536970557dd..dc7a4ee1d7ca 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -110,7 +110,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 int cmd);
 static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
-static void mroute_clean_tables(struct mr_table *mrt, bool all);
+static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags);
 static void ipmr_expire_process(struct timer_list *t);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -415,7 +415,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 
id)
 static void ipmr_free_table(struct mr_table *mrt)
 {
del_timer_sync(>ipmr_expire_timer);
-   mroute_clean_tables(mrt, true);
+   mroute_clean_tables(mrt, true, MRT_FLUSH_VIFS | MRT_FLUSH_ENTRIES);
rhltable_destroy(>mfc_hash);
kfree(mrt);
 }
@@ -1296,7 +1296,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table 
*mrt,
 }
 
 /* Close the multicast socket, and clear the vif tables etc */
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+static void mroute_clean_tables(struct mr_table *mrt, bool all, int flags)
 {
struct net *net = read_pnet(>net);
struct mr_mfc *c, *tmp;
@@ -1305,35 +1305,39 @@ static void mroute_clean_tables(struct mr_table *mrt, 
bool all)
int i;
 
/* Shut down all active vif entries */
-   for (i = 0; i < mrt->maxvif; i++) {
-   if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-   continue;
-   vif_delete(mrt, i, 0, );
+   if (flags & MRT_FLUSH_VIFS) {
+   for (i = 0; i < mrt->maxvif; i++) {
+   if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+   continue;
+   

[PATCH net-next] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs

2019-02-07 Thread Callum Sinclair
Created a way to clear the multicast forwarding cache on a socket
without having to either remove the entries manually using the delete
entry socket option or destroy and recreate the multicast socket.

Using the flags MRT_FLUSH_ENTRIES and MRT_FLUSH_VIFS, all multicast
entries can be cleared, all multicast interfaces can be closed or both
can be cleared using one sockopt call.

Callum Sinclair (1):
  ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs

 include/uapi/linux/mroute.h  |  7 +++-
 include/uapi/linux/mroute6.h |  7 +++-
 net/ipv4/ipmr.c  | 69 -
 net/ipv6/ip6mr.c | 74 ++--
 4 files changed, 100 insertions(+), 57 deletions(-)

-- 
2.20.1



Re: [LSF/MM TOPIC] The end of the DAX experiment

2019-02-07 Thread Kani, Toshi
On Wed, 2019-02-06 at 13:12 -0800, Dan Williams wrote:
> Before people get too excited this isn't a proposal to kill DAX. The
> topic proposal is a discussion to resolve lingering open questions
> that currently motivate ext4 and xfs to scream "EXPERIMENTAL" when the
> current DAX facilities are enabled. The are 2 primary concerns to
> resolve. Enumerate the remaining features/fixes, and identify a path
> to implement it all without regressing any existing application use
> cases.
> 
> An enumeration of remaining projects follows, please expand this list
> if I missed something:
> 
> * "DAX" has no specific meaning by itself, users have 2 use cases for
> "DAX" capabilities: userspace cache management via MAP_SYNC, and page
> cache avoidance where the latter aspect of DAX has no current api to
> discover / use it. The project is to supplement MAP_SYNC with a
> MAP_DIRECT facility and MADV_SYNC / MADV_DIRECT to indicate the same
> dynamically via madvise. Similar to O_DIRECT, MAP_DIRECT would be an
> application hint to avoid / minimiize page cache usage, but no strict
> guarantee like what MAP_SYNC provides.

Agreed that basic dax programming model needs to be settled.

> * Resolve all "if (dax) goto fail;" patterns in the kernel. Outside of
> longterm-GUP (a topic in its own right) the projects here are
> XFS-reflink and XFS-realtime-device support. DAX+reflink effectively
> requires a given physical page to be mapped into two different inodes
> at different (page->index) offsets. The challenge is to support
> DAX-reflink without violating any existing application visible
> semantics, the operating assumption / strawman to debate is that
> experimental status is not blanket permission to go change existing
> semantics in backwards incompatible ways.

I do not think "if (dax) goto fail;" is actually bad.  It helps users to
immediately realize that dax may not be enabled and they have a setup
issue.  I also think that FS-specific enhancements like DAX-reflink=1
support should be managed separately and should be possible after
EXPERIMENTAL is removed from 'mount -o dax'.  That is, removing
EXPERIMENTAL for DAX-reflink=0 should not require support of DAX-
reflink=1.

I've also had multiple users complained about 'mount -o dax' succeeded
by falling back to non-dax despite of their intent.  Such users wasted
many hours without knowing their setup error / current restrictions. 
The next think they always ask is how to check if dax is enabled in such
case, and they are not happy about limited interfaces (ex. look for
mount entry in /proc/mounts), either.

> * Deprecate, but not remove, the DAX mount option. Too many flows
> depend on the option so it will never go away, but the facility is too
> coarse. Provide an option to enable MAP_SYNC and
> more-likely-to-do-something-useful-MAP_DIRECT on a per-directory
> basis. The current proposal is to allow this property to only be
> toggled while the directory is empty to avoid the complications of
> racing page invalidation with new DAX mappings.

Same as above.  Such enhancement should be possible after EXPERIMENTAL
is removed from 'mount -o dax'.  IOW, a separate EXPERIMENTAL message
can be shown when user requests per-directory dax.

Thanks,
-Toshi


Re: [PATCH][next] Bluetooth: a2mp: Use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva



On 2/7/19 10:00 PM, Joe Perches wrote:
> On Thu, 2019-02-07 at 18:28 -0600, Gustavo A. R. Silva wrote:
>> One of the more common cases of allocation size calculations is finding
>> the size of a structure that has a zero-sized array at the end, along
>> with memory for some number of elements for that array. For example:
>>
>> struct foo {
>> int stuff;
>> struct boo entry[];
>> };
>>
>> size = sizeof(struct foo) + count * sizeof(struct boo);
>> instance = alloc(size, GFP_KERNEL)
>>
>> Instead of leaving these open-coded and prone to type mistakes, we can
>> now use the new struct_size() helper:
>>
>> size = struct_size(instance, entry, count);
>> instance = alloc(size, GFP_KERNEL)
>>
>> This code was detected with the help of Coccinelle.
> []
>> diff --git a/net/bluetooth/a2mp.c b/net/bluetooth/a2mp.c
> []
>> @@ -174,7 +174,7 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct 
>> sk_buff *skb,
>>  num_ctrl++;
>>  }
>>  
>> -len = num_ctrl * sizeof(struct a2mp_cl) + sizeof(*rsp);
>> +len = struct_size(rsp, cl, num_ctrl);
>>  rsp = kmalloc(len, GFP_ATOMIC);
>>  if (!rsp) {
>>  read_unlock(_dev_list_lock);
> 
> At least a weakness in this code is len is u16
> and struct_size is size_t so there's a size
> truncation possible.
> 
> 

That's true.  I didn't change the type to size_t because of the call
to le16_to_cpu():

u16 len = le16_to_cpu(hdr->len);

I've been changing the type of the variable in other cases.

--
Gustavo


Re: [PATCH][next] Bluetooth: hci_event: Use struct_size() helper

2019-02-07 Thread Joe Perches
On Thu, 2019-02-07 at 18:40 -0600, Gustavo A. R. Silva wrote:
> Make use of the struct_size() helper instead of an open-coded version
> in order to avoid any potential type mistakes, in particular in the
> context in which this code is being used.
> 
> So, change the following form:
> 
> sizeof(*ev) + ev->num_hndl * sizeof(struct hci_comp_pkts_info)
> 
>  to :
> 
> struct_size(ev, handles, ev->num_hndl)
> 
> This code was detected with the help of Coccinelle.
[]
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
[]
> @@ -3556,8 +3556,8 @@ static void hci_num_comp_pkts_evt(struct hci_dev *hdev, 
> struct sk_buff *skb)
>   return;
>   }
>  
> - if (skb->len < sizeof(*ev) || skb->len < sizeof(*ev) +
> - ev->num_hndl * sizeof(struct hci_comp_pkts_info)) {
> + if (skb->len < sizeof(*ev) ||
> + skb->len < struct_size(ev, handles, ev->num_hndl)) {
>   BT_DBG("%s bad parameters", hdev->name);
>   return;
>   }
> @@ -3644,8 +3644,8 @@ static void hci_num_comp_blocks_evt(struct hci_dev 
> *hdev, struct sk_buff *skb)
>   return;
>   }
>  
> - if (skb->len < sizeof(*ev) || skb->len < sizeof(*ev) +
> - ev->num_hndl * sizeof(struct hci_comp_blocks_info)) {
> + if (skb->len < sizeof(*ev) ||
> + skb->len < struct_size(ev, handles, ev->num_hndl)) {

Unless these are in a real fast path where skb->len < sizeof(*ev)
is pretty likely, it seems a bit redundant as the multiplication
is pretty cheap and num_hndl is unsigned (actually __u8)

This could/should be simply

if (skb->len < struct_size(...))




Re: [PATCH][next] Bluetooth: a2mp: Use struct_size() helper

2019-02-07 Thread Joe Perches
On Thu, 2019-02-07 at 18:28 -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
> 
> struct foo {
> int stuff;
> struct boo entry[];
> };
> 
> size = sizeof(struct foo) + count * sizeof(struct boo);
> instance = alloc(size, GFP_KERNEL)
> 
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
> 
> size = struct_size(instance, entry, count);
> instance = alloc(size, GFP_KERNEL)
> 
> This code was detected with the help of Coccinelle.
[]
> diff --git a/net/bluetooth/a2mp.c b/net/bluetooth/a2mp.c
[]
> @@ -174,7 +174,7 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct 
> sk_buff *skb,
>   num_ctrl++;
>   }
>  
> - len = num_ctrl * sizeof(struct a2mp_cl) + sizeof(*rsp);
> + len = struct_size(rsp, cl, num_ctrl);
>   rsp = kmalloc(len, GFP_ATOMIC);
>   if (!rsp) {
>   read_unlock(_dev_list_lock);

At least a weakness in this code is len is u16
and struct_size is size_t so there's a size
truncation possible.




[PATCH net-next] fm10k: use struct_size() in kzalloc()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 6fd15a734324..5a0419421511 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1603,14 +1603,12 @@ static int fm10k_alloc_q_vector(struct fm10k_intfc 
*interface,
 {
struct fm10k_q_vector *q_vector;
struct fm10k_ring *ring;
-   int ring_count, size;
+   int ring_count;
 
ring_count = txr_count + rxr_count;
-   size = sizeof(struct fm10k_q_vector) +
-  (sizeof(struct fm10k_ring) * ring_count);
 
/* allocate q_vector and rings */
-   q_vector = kzalloc(size, GFP_KERNEL);
+   q_vector = kzalloc(struct_size(q_vector, ring, ring_count), GFP_KERNEL);
if (!q_vector)
return -ENOMEM;
 
-- 
2.20.1



Re: [PATCH net-next] xprtrdma: Use struct_size() in kzalloc()

2019-02-07 Thread Gustavo A. R. Silva



On 1/31/19 8:11 AM, Chuck Lever wrote:
> 
> 
>> On Jan 30, 2019, at 7:46 PM, Gustavo A. R. Silva  
>> wrote:
>>
>> One of the more common cases of allocation size calculations is finding
>> the size of a structure that has a zero-sized array at the end, along
>> with memory for some number of elements for that array. For example:
>>
>> struct foo {
>>int stuff;
>>struct boo entry[];
>> };
>>
>> instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), 
>> GFP_KERNEL);
>>
>> Instead of leaving these open-coded and prone to type mistakes, we can
>> now use the new struct_size() helper:
>>
>> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
>>
>> This code was detected with the help of Coccinelle.
>>
>> Signed-off-by: Gustavo A. R. Silva 
> 
> Reviewed-by: Chuck Lever 
> 

Thanks, Chuck.

--
Gustavo


[PATCH net-next] nfp: flower: cmsg: use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

size = sizeof(struct foo) + count * sizeof(void *);
instance = alloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = alloc(struct_size(instance, entry, count), GFP_KERNEL);

Notice that, in this case, variable size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 56b22ea32474..cf9e1118ee8f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -45,11 +45,9 @@ nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned 
int num_ports)
 {
struct nfp_flower_cmsg_mac_repr *msg;
struct sk_buff *skb;
-   unsigned int size;
 
-   size = sizeof(*msg) + num_ports * sizeof(msg->ports[0]);
-   skb = nfp_flower_cmsg_alloc(app, size, NFP_FLOWER_CMSG_TYPE_MAC_REPR,
-   GFP_KERNEL);
+   skb = nfp_flower_cmsg_alloc(app, struct_size(msg, ports, num_ports),
+   NFP_FLOWER_CMSG_TYPE_MAC_REPR, GFP_KERNEL);
if (!skb)
return NULL;
 
-- 
2.20.1



[PATCH net-next] mlxsw: spectrum_router: Use struct_size() in kzalloc()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable alloc_size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c| 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 230e1f6e192b..e3a3535f9f23 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3813,13 +3813,11 @@ mlxsw_sp_nexthop4_group_create(struct mlxsw_sp 
*mlxsw_sp, struct fib_info *fi)
struct mlxsw_sp_nexthop_group *nh_grp;
struct mlxsw_sp_nexthop *nh;
struct fib_nh *fib_nh;
-   size_t alloc_size;
int i;
int err;
 
-   alloc_size = sizeof(*nh_grp) +
-fi->fib_nhs * sizeof(struct mlxsw_sp_nexthop);
-   nh_grp = kzalloc(alloc_size, GFP_KERNEL);
+   nh_grp = kzalloc(struct_size(nh_grp, nexthops, fi->fib_nhs),
+GFP_KERNEL);
if (!nh_grp)
return ERR_PTR(-ENOMEM);
nh_grp->priv = fi;
@@ -5045,13 +5043,11 @@ mlxsw_sp_nexthop6_group_create(struct mlxsw_sp 
*mlxsw_sp,
struct mlxsw_sp_nexthop_group *nh_grp;
struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
struct mlxsw_sp_nexthop *nh;
-   size_t alloc_size;
int i = 0;
int err;
 
-   alloc_size = sizeof(*nh_grp) +
-fib6_entry->nrt6 * sizeof(struct mlxsw_sp_nexthop);
-   nh_grp = kzalloc(alloc_size, GFP_KERNEL);
+   nh_grp = kzalloc(struct_size(nh_grp, nexthops, fib6_entry->nrt6),
+GFP_KERNEL);
if (!nh_grp)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(_grp->fib_list);
-- 
2.20.1



Re: [PATCH 0/1] Start conversion of PowerPC docs

2019-02-07 Thread Michael Ellerman
Jonathan Corbet  writes:
> On Thu,  7 Feb 2019 17:03:15 +1100
> "Tobin C. Harding"  wrote:
>
>> As discussed at LCA here is the start to the docs conversion for PowerPC
>> to RST.
>> 
>> This applies cleanly on top of the mainline (5.20-rc5) and Jon's tree
>> (docs-next branch).
>> 
>> I'm guessing it should go in through the PowerPC tree because I doubt
>> you want to review this Jon, it's one big single patch (all blame for
>> that falls on mpe ;)
>
> Well, I went and took a look anyway, being a glutton for punishment.  So
> naturally I do have some comments...
>
> - I don't think this should be a top-level directory full of docs; the top
>   level is already rather overpopulated.  At worst, we should create an
>   arch/ directory for architecture-specific docs.

We currently have arch specific directories for arm, arm64, ia64, m68k,
nios2, openrisc, parisc, powerpc, s390, sh, sparc, x86, xtensa.

Do you mean they should all be moved to Documentation/arch ?

>   I kind of think that
>   this should be thought through a bit more, though, with an eye toward
>   who the audience is.  Some of it is clearly developer documentation, and
>   some of it is aimed at admins; ptrace.rst is user-space API stuff.
>   Nobody ever welcomes me saying this, but we should really split things
>   into the appropriate manuals according to audience.

I don't think any of it's aimed at admins, but I haven't read every
word. I see it as aimed at kernel devs or people writing directly to the
kernel API, eg. gdb developers reading ptrace.rst.

If Documentation/ wants to be more user focused and nicely curated
perhaps we need arch/foo/docs/ for these developer centric docs?

> - It would be good to know how much of this stuff is still relevant.
>   bootwrapper.txt hasn't been modified since it was added in 2008.

It hasn't been modified but AFAIK it's still pretty much accurate and
definitely something we want to have documented.

>   cpu_features.txt predates the git era

But so does the code it's documenting.

>   as does mpc52xx.txt; hvcs.txt is nearly as old. And so on. Can we
>   perhaps stop dragging some of those docs around?

We support some hardware that is ~25 years old, so we have some old
documentation too, and I'd rather we didn't drop things just because
they're old.

> - The use of flat-table in isa-versions.rst totally wrecks the readability
>   of those tables in the plain-text version.  Said tables are pretty close
>   to being RST in their original form; it would be far better to just fix
>   anything needing fixing but to keep that form.

Yes agree, I'd like the docs to be as readable as possible as plain text.

> - I'm glad you're adding SPDX lines, but do you know that the license is
>   correct in each case?  It's best to be careful with such things.

None of the files have licenses so I think we just fall back to COPYING
don't we? In which case GPL-2.0 is correct for all files.

cheers


[PATCH net-next] bnx2x: Use struct_size() in kzalloc()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable fsz is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 8e0a317b31f7..a9bdc21873d3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -1654,13 +1654,9 @@ static int bnx2x_vf_mbx_macvlan_list(struct bnx2x *bp,
 {
int i, j;
struct bnx2x_vf_mac_vlan_filters *fl = NULL;
-   size_t fsz;
 
-   fsz = tlv->n_mac_vlan_filters *
- sizeof(struct bnx2x_vf_mac_vlan_filter) +
- sizeof(struct bnx2x_vf_mac_vlan_filters);
-
-   fl = kzalloc(fsz, GFP_KERNEL);
+   fl = kzalloc(struct_size(fl, filters, tlv->n_mac_vlan_filters),
+GFP_KERNEL);
if (!fl)
return -ENOMEM;
 
-- 
2.20.1



[PATCH net-next] wimax/i2400m: use struct_size() helper

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

size = sizeof(struct foo) + count * sizeof(void *);
instance = alloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

size = struct_size(instance, entry, count);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/wimax/i2400m/rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
index 0b602951ff6b..d28b96d06919 100644
--- a/drivers/net/wimax/i2400m/rx.c
+++ b/drivers/net/wimax/i2400m/rx.c
@@ -1260,8 +1260,8 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
goto error_msg_hdr_check;
result = -EIO;
num_pls = le16_to_cpu(msg_hdr->num_pls);
-   pl_itr = sizeof(*msg_hdr) + /* Check payload descriptor(s) */
-   num_pls * sizeof(msg_hdr->pld[0]);
+   /* Check payload descriptor(s) */
+   pl_itr = struct_size(msg_hdr, pld, num_pls);
pl_itr = ALIGN(pl_itr, I2400M_PL_ALIGN);
if (pl_itr > skb_len) { /* got all the payload descriptors? */
dev_err(dev, "RX: HW BUG? message too short (%u bytes) for "
-- 
2.20.1



[PATCH net-next] wan: wanxl: use struct_size() in kzalloc()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = alloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable alloc_size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/wan/wanxl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wan/wanxl.c b/drivers/net/wan/wanxl.c
index d573a57bc301..459ce52a0a4d 100644
--- a/drivers/net/wan/wanxl.c
+++ b/drivers/net/wan/wanxl.c
@@ -565,7 +565,7 @@ static int wanxl_pci_init_one(struct pci_dev *pdev,
u32 plx_phy;/* PLX PCI base address */
u32 mem_phy;/* memory PCI base addr */
u8 __iomem *mem;/* memory virtual base addr */
-   int i, ports, alloc_size;
+   int i, ports;
 
 #ifndef MODULE
pr_info_once("%s\n", version);
@@ -601,8 +601,7 @@ static int wanxl_pci_init_one(struct pci_dev *pdev,
default: ports = 4;
}
 
-   alloc_size = sizeof(struct card) + ports * sizeof(struct port);
-   card = kzalloc(alloc_size, GFP_KERNEL);
+   card = kzalloc(struct_size(card, ports, ports), GFP_KERNEL);
if (card == NULL) {
pci_release_regions(pdev);
pci_disable_device(pdev);
-- 
2.20.1



[PATCH net-next] net: usb: cdc-phonet: use struct_size() in alloc_netdev()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

instance = alloc(sizeof(struct foo) + count * sizeof(void *));

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = alloc(struct_size(instance, entry, count));

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/usb/cdc-phonet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 78b16eb9e58c..63aaae487995 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -361,8 +361,8 @@ static int usbpn_probe(struct usb_interface *intf, const 
struct usb_device_id *i
else
return -EINVAL;
 
-   dev = alloc_netdev(sizeof(*pnd) + sizeof(pnd->urbs[0]) * rxq_size,
-  ifname, NET_NAME_UNKNOWN, usbpn_setup);
+   dev = alloc_netdev(struct_size(pnd, urbs, rxq_size), ifname,
+  NET_NAME_UNKNOWN, usbpn_setup);
if (!dev)
return -ENOMEM;
 
-- 
2.20.1



Re: Kernel panics with recent 4.9 Kernels

2019-02-07 Thread Mike Galbraith
On Fri, 2019-02-08 at 04:09 +0100, Mike Galbraith wrote:
> On Thu, 2019-02-07 at 20:13 +0100, Michael Brunnbauer wrote:
> > hi,
> > 
> > no replies to my mail. Any suggestions what I should do or where I
> > could ask for help?
> 
> I'd suggest trying to capture events with something better than an
> incomplete screenshot, ie serial console, netconsole, anything that
> will let you show the full event and a few lines leading up to it.

P.S. CONFIG_PRINTK_TIME improves log quality as well.


Re: Kernel panics with recent 4.9 Kernels

2019-02-07 Thread Mike Galbraith
On Thu, 2019-02-07 at 20:13 +0100, Michael Brunnbauer wrote:
> hi,
> 
> no replies to my mail. Any suggestions what I should do or where I
> could ask for help?

I'd suggest trying to capture events with something better than an
incomplete screenshot, ie serial console, netconsole, anything that
will let you show the full event and a few lines leading up to it.

-Mike


[PATCH net-next] can: kvaser_usb: Use struct_size() in alloc_candev()

2019-02-07 Thread Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
void *entry[];
};

instance = alloc(sizeof(struct foo) + count * sizeof(void *));

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = alloc(struct_size(instance, entry, count));

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c 
b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
index c89c7d4900d7..0f1d3e807d63 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
@@ -643,8 +643,7 @@ static int kvaser_usb_init_one(struct kvaser_usb *dev,
return err;
}
 
-   netdev = alloc_candev(sizeof(*priv) +
- dev->max_tx_urbs * sizeof(*priv->tx_contexts),
+   netdev = alloc_candev(struct_size(priv, tx_contexts, dev->max_tx_urbs),
  dev->max_tx_urbs);
if (!netdev) {
dev_err(>intf->dev, "Cannot alloc candev\n");
-- 
2.20.1



Re: [PATCH] mm/memory-hotplug: Add sysfs hot-remove trigger

2019-02-07 Thread Anshuman Khandual



On 02/07/2019 09:02 PM, Robin Murphy wrote:
> On 07/02/2019 13:36, Oscar Salvador wrote:
>> On Wed, Feb 06, 2019 at 05:03:53PM +, Robin Murphy wrote:
>>> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
>>> but being able to exercise the (arguably trickier) hot-remove path would
>>> be even more useful. Extend the feature to allow removal of offline
>>> sections to be triggered manually to aid development.
>>>
>>> Signed-off-by: Robin Murphy 
>>> ---
>>>
>>> This is inspired by a previous proposal[1], but in coming up with a
>>> more robust interface I ended up rewriting the whole thing from
>>> scratch. The lack of documentation is semi-deliberate, since I don't
>>> like the idea of anyone actually relying on this interface as ABI, but
>>> as a handy tool it felt useful enough to be worth sharing :)
>>
>> Hi Robin,
>>
>> I think this might come in handy, especially when trying to test hot-remove
>> on arch's that do not have any means to hot-remove memory, or even on virtual
>> platforms that do not have yet support for hot-remove depending on the 
>> platform,
>> like qemu/arm64.
>>
>>
>> I could have used this while testing hot-remove on other archs for [1]
>>
>>>
>>> Robin.
>>>
>>> [1] 
>>> https://lore.kernel.org/lkml/22d34fe30df0fbacbfceeb47e20cb1184af73585.1511433386.git...@linux.vnet.ibm.com/
>>>
>>
>>> +    if (mem->state != MEM_OFFLINE)
>>> +    return -EBUSY;
>>
>> We do have the helper "is_memblock_offlined()", although it is only used in 
>> one place now.
>> So, I would rather use it here as well.
> 
> Ooh, if I'd actually noticed that that helper existed, I would indeed have 
> used it - fixed.
> 
>>> +
>>> +    ret = lock_device_hotplug_sysfs();
>>> +    if (ret)
>>> +    return ret;
>>> +
>>> +    if (device_remove_file_self(dev, attr)) {
>>> +    __remove_memory(pfn_to_nid(start_pfn), PFN_PHYS(start_pfn),
>>> +    MIN_MEMORY_BLOCK_SIZE * sections_per_block);
>>
>> Sorry, I am not into sysfs inners, but I thought that:
>> device_del::device_remove_attrs::device_remove_groups::sysfs_remove_groups
>> would be enough to remove the dev attributes.
>> I guess in this case that is not enough, could you explain why?
> 
> As I found out the hard way, since the "remove" attribute itself belongs to 
> the device being removed, the standard device teardown callchain would end up 
> trying to remove the file from its own method, which results in deadlock. 
> Fortunately, the PCI sysfs code has a similar "remove" attribute which showed 
> me how it should be handled - following the kerneldoc breadcrumb trail to 
> kernfs_remove_self() hopefully explains it more completely.
Instead we could have an interface like 
/sys/devices/system/memory/[unprobe|remove]
which would take a memory block start address looking into the existing ones and
attempt to remove [addr, addr + memory_block_size] provided its already 
offlined.
This will be exact opposite for /sys/devices/system/memory/probe except the fact
that it can trigger onlining of the memory automatically (even this new one 
could
trigger offlining automatically as well). But I dont have a preference between 
the
proposed one or this one. Either of them should be okay.


Re: [PATCH v3 1/2] mm: add probe_user_read()

2019-02-07 Thread Michael Ellerman
Jann Horn  writes:
> On Thu, Feb 7, 2019 at 10:22 AM Christophe Leroy
>  wrote:
>> In powerpc code, there are several places implementing safe
>> access to user data. This is sometimes implemented using
>> probe_kernel_address() with additional access_ok() verification,
>> sometimes with get_user() enclosed in a pagefault_disable()/enable()
>> pair, etc. :
>> show_user_instructions()
>> bad_stack_expansion()
>> p9_hmi_special_emu()
>> fsl_pci_mcheck_exception()
>> read_user_stack_64()
>> read_user_stack_32() on PPC64
>> read_user_stack_32() on PPC32
>> power_pmu_bhrb_to()
>>
>> In the same spirit as probe_kernel_read(), this patch adds
>> probe_user_read().
>>
>> probe_user_read() does the same as probe_kernel_read() but
>> first checks that it is really a user address.
>>
>> The patch defines this function as a static inline so the "size"
>> variable can be examined for const-ness by the check_object_size()
>> in __copy_from_user_inatomic()
>>
>> Signed-off-by: Christophe Leroy 
>
>
>
>> ---
>>  v3: Moved 'Returns:" comment after description.
>>  Explained in the commit log why the function is defined static inline
>>
>>  v2: Added "Returns:" comment and removed probe_user_address()
>>
>>  include/linux/uaccess.h | 34 ++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>> index 37b226e8df13..ef99edd63da3 100644
>> --- a/include/linux/uaccess.h
>> +++ b/include/linux/uaccess.h
>> @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void 
>> *unsafe_addr, long count);
>>  #define probe_kernel_address(addr, retval) \
>> probe_kernel_read(, addr, sizeof(retval))
>>
>> +/**
>> + * probe_user_read(): safely attempt to read from a user location
>> + * @dst: pointer to the buffer that shall take the data
>> + * @src: address to read from
>> + * @size: size of the data chunk
>> + *
>> + * Safely read from address @src to the buffer at @dst.  If a kernel fault
>> + * happens, handle that and return -EFAULT.
>> + *
>> + * We ensure that the copy_from_user is executed in atomic context so that
>> + * do_page_fault() doesn't attempt to take mmap_sem.  This makes
>> + * probe_user_read() suitable for use within regions where the caller
>> + * already holds mmap_sem, or other locks which nest inside mmap_sem.
>> + *
>> + * Returns: 0 on success, -EFAULT on error.
>> + */
>> +
>> +#ifndef probe_user_read
>> +static __always_inline long probe_user_read(void *dst, const void __user 
>> *src,
>> +   size_t size)
>> +{
>> +   long ret;
>> +
>> +   if (!access_ok(src, size))
>> +   return -EFAULT;
>
> If this happens in code that's running with KERNEL_DS, the access_ok()
> is a no-op. If this helper is only intended for accessing real
> userspace memory, it would be more robust to add
> set_fs(USER_DS)/set_fs(oldfs) around this thing. Looking at the
> functions you're referring to in the commit message, e.g.
> show_user_instructions() does an explicit `__access_ok(pc,
> NR_INSN_TO_PRINT * sizeof(int), USER_DS)` to get the same effect.

Yeah I raised the same question up thread.

I think we're both right :) - it should explicitly set USER_DS.

There's precedent for that in the code you mentioned and also in the
perf code, eg:

  88b0193d9418 ("perf/callchain: Force USER_DS when invoking 
perf_callchain_user()")


cheers


[RFC] Bluetooth: Retry configure request if result is L2CAP_CONF_UNKNOWN

2019-02-07 Thread Andrey Smirnov
Due to:

 - current implementation of l2cap_config_rsp() dropping BT
   connection if sender of configuration response replied with unknown
   option failure (Result=0x0003/L2CAP_CONF_UNKNOWN)

 - current implementation of l2cap_build_conf_req() adding
   L2CAP_CONF_RFC(0x04) option to initial configure request sent by
   the Linux host.

devices that do no recongninze L2CAP_CONF_RFC, such as Xbox One S
controllers, will get stuck in endless connect -> configure ->
disconnect loop, never connect and be generaly unusable.

To avoid this problem add code to do the following:

 1. Store a mask of supported conf option types per connection

 2. Parse the body of response L2CAP_CONF_UNKNOWN and adjust
connection's supported conf option types mask

 3. Retry configuration step the same way it's done for
L2CAP_CONF_UNACCEPT

Signed-off-by: Andrey Smirnov 
Cc: Pierre-Loup A. Griffais 
Cc: Florian Dollinger 
Cc: Marcel Holtmann 
Cc: Johan Hedberg 
Cc: linux-blueto...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---

Everyone:

I marked this as an RFC, since I don't have a lot of experience with
Bluetooth subsystem and don't have hight degree of confidence about
choices made in this patch. I do, however, thins is is good enough to
start a discussion about the problem.

Thanks,
Andrey Smirnov

 include/net/bluetooth/l2cap.h |  1 +
 net/bluetooth/l2cap_core.c| 58 ++-
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
index 093aedebdf0c..6898bba5d9a8 100644
--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -632,6 +632,7 @@ struct l2cap_conn {
unsigned intmtu;
 
__u32   feat_mask;
+   __u32   known_options;
__u8remote_fixed_chan;
__u8local_fixed_chan;
 
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index f17e393b43b4..49be98b6de72 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -3243,8 +3243,10 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, 
void *data, size_t data
rfc.monitor_timeout = 0;
rfc.max_pdu_size= 0;
 
-   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
-  (unsigned long) , endptr - ptr);
+   if (chan->conn->known_options & BIT(L2CAP_CONF_RFC)) {
+   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
+  (unsigned long), endptr - ptr);
+   }
break;
 
case L2CAP_MODE_ERTM:
@@ -3263,8 +3265,10 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, 
void *data, size_t data
rfc.txwin_size = min_t(u16, chan->tx_win,
   L2CAP_DEFAULT_TX_WINDOW);
 
-   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
-  (unsigned long) , endptr - ptr);
+   if (chan->conn->known_options & BIT(L2CAP_CONF_RFC)) {
+   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
+  (unsigned long), endptr - ptr);
+   }
 
if (test_bit(FLAG_EFS_ENABLE, >flags))
l2cap_add_opt_efs(, chan, endptr - ptr);
@@ -3295,8 +3299,10 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, 
void *data, size_t data
 L2CAP_FCS_SIZE);
rfc.max_pdu_size = cpu_to_le16(size);
 
-   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
-  (unsigned long) , endptr - ptr);
+   if (chan->conn->known_options & BIT(L2CAP_CONF_RFC)) {
+   l2cap_add_conf_opt(, L2CAP_CONF_RFC, sizeof(rfc),
+  (unsigned long), endptr - ptr);
+   }
 
if (test_bit(FLAG_EFS_ENABLE, >flags))
l2cap_add_opt_efs(, chan, endptr - ptr);
@@ -3550,11 +3556,47 @@ static int l2cap_parse_conf_rsp(struct l2cap_chan 
*chan, void *rsp, int len,
void *endptr = data + size;
int type, olen;
unsigned long val;
+   const bool unknown_options = *result == L2CAP_CONF_UNKNOWN;
struct l2cap_conf_rfc rfc = { .mode = L2CAP_MODE_BASIC };
struct l2cap_conf_efs efs;
 
BT_DBG("chan %p, rsp %p, len %d, req %p", chan, rsp, len, data);
 
+   /* throw out any old stored conf requests */
+   *result = L2CAP_CONF_SUCCESS;
+
+   if (unknown_options) {
+   const u8 *option_type = rsp;
+
+   if (!len) {
+   /* If no list of unknown option types is
+* provided there's nothing for us to do
+*/
+   return -ECONNREFUSED;
+   

  1   2   3   4   5   6   7   8   9   10   >