date:20180701

Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

2018-07-01 Thread Jan Kara

On Mon 02-07-18 08:52:51, Leon Romanovsky wrote:
> On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote:
> > On Wed 27-06-18 19:42:01, John Hubbard wrote:
> > > On 06/27/2018 10:02 AM, Jan Kara wrote:
> > > > On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote:
> > > >> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote:
> > > >>> On Wed 27-06-18 13:59:27, Michal Hocko wrote:
> > > >>>> On Wed 27-06-18 13:53:49, Jan Kara wrote:
> > > >>>>> On Wed 27-06-18 13:32:21, Michal Hocko wrote:
> > > >>>> [...]
> > > >>>>>> Apart from that, do we really care about 32b here? Big DIO, IB users
> > > >>>>>> seem to be 64b only AFAIU.
> > > >>>>>
> > > >>>>> IMO it is a bad habit to leave unprivileged-user-triggerable oops in the
> > > >>>>> kernel even for uncommon platforms...
> > > >>>>
> > > >>>> Absolutely agreed! I didn't mean to keep the blow up for 32b. I just
> > > >>>> wanted to say that we can stay with a simple solution for 32b. I thought
> > > >>>> the g-u-p-longterm has plugged the most obvious breakage already. But
> > > >>>> maybe I just misunderstood.
> > > >>>
> > > >>> Most yes, but if you try hard enough, you can still trigger the oops e.g.
> > > >>> with appropriately set up direct IO when racing with writeback / reclaim.
> > > >>
> > > >> gup longterm is only different from normal gup if you have DAX and few
> > > >> people do, which really means it doesn't help at all.. AFAIK??
> > > >
> > > > Right, what I wrote works only for DAX. For non-DAX situation g-u-p
> > > > longterm does not currently help at all. Sorry for confusion.
> > > >
> > >
> > > OK, I've got an early version of this up and running, reusing the page->lru
> > > fields. I'll clean it up and do some heavier testing, and post as a PATCH v2.
> >
> > Cool.
> >
> > > One question though: I'm still vague on the best actions to take in the
> > > following functions:
> > >
> > > page_mkclean_one
> > > try_to_unmap_one
> > >
> > > At the moment, they are both just doing an evil little early-out:
> > >
> > >   if (PageDmaPinned(page))
> > >   return false;
> > >
> > > ...but we talked about maybe waiting for the condition to clear, instead?
> > > Thoughts?
> >
> > What needs to happen in page_mkclean() depends on the caller. Most of the
> > callers really need to be sure the page is write-protected once
> > page_mkclean() returns. Those are:
> >
> >   pagecache_isize_extended()
> >   fb_deferred_io_work()
> >   clear_page_dirty_for_io() if called for data-integrity writeback - which
> > is currently known only in its caller (e.g. write_cache_pages()) where
> > it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this
> > information into page_mkclean() will require some plumbing and
> > clear_page_dirty_for_io() has some 50 callers but it's doable.
> >
> > clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode !=
> > WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as
> > otherwise memory cleaning would get stuck on pinned pages until RDMA
> > drivers release its pins.
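
The caller-dependent rule quoted above can be sketched in plain C. This is
only an illustration under stated assumptions: the enum pin_action values and
pinned_page_action() are hypothetical names invented here, the sync_mode enum
is a local stand-in for the kernel's writeback modes, and what exactly a
data-integrity caller should do with a pinned page (for example wait for the
pin to clear) is still open in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the two writeback modes named in the thread. */
enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Hypothetical outcomes; these names do not exist in the kernel. */
enum pin_action {
	PIN_PROCEED,		/* not pinned: write-protect as usual */
	PIN_SKIP,		/* cleaning writeback: leave the page dirty */
	PIN_MUST_WRITE_PROTECT,	/* data integrity: skipping is not allowed;
				 * how to get there is left open */
};

static enum pin_action pinned_page_action(bool dma_pinned, enum sync_mode mode)
{
	if (!dma_pinned)
		return PIN_PROCEED;

	/* Only data-integrity writeback needs the write-protect guarantee. */
	return mode == WB_SYNC_ALL ? PIN_MUST_WRITE_PROTECT : PIN_SKIP;
}
```

In this model the sync mode has to reach the decision point, which is the
plumbing through clear_page_dirty_for_io() mentioned above.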
> 
> Sorry for naive question, but won't it create too much dirty pages
> so writeback will be called "non-stop" to rebalance watermarks without
> ability to progress?

If the number of pinned pages exceeds the allowed dirty limit, then yes.
However, the dirty limit is there exactly to prevent too many
difficult-to-get-rid-of pages in the page cache. So if your number of pinned
pages crosses the dirty limit, you have just violated this mm constraint and
you either need to modify your workload not to pin so many pages, or you
need to verify that so many dirty pages are indeed safe and increase the
dirty limit.
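
As a toy model of that arithmetic (not kernel code; both helper names are
made up, and it assumes every pinned page is also dirty), writeback can only
clean the dirty pages that are not pinned, so it can get back under the limit
only while the pinned pages alone fit within it:

```c
#include <assert.h>
#include <stdbool.h>

/* Dirty pages that writeback is actually able to clean. */
static unsigned long cleanable_pages(unsigned long dirty, unsigned long pinned)
{
	return pinned >= dirty ? 0 : dirty - pinned;
}

/*
 * Even after cleaning everything it can, writeback is left with the pinned
 * pages. It reaches the limit only if those fit under it.
 */
static bool writeback_can_reach_limit(unsigned long pinned,
				      unsigned long dirty_limit)
{
	return pinned <= dirty_limit;
}
```

With a limit of 100 pages, 50 pinned pages still leave room to make
progress, while 150 pinned pages keep the system over the limit no matter how
often writeback runs.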

Realistically, I think we need to come up with a hard limit (similar to
mlock, or account them as mlock) on these pinned pages because they are even
worse than plain dirty pages. However, I'd strongly prefer to keep that
discussion separate from this one about how to avoid memory corruption /
oopses, because that is another can of worms with big bikeshedding
potential.

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [PATCH v3 2/2] ubi: expose the volume CRC check skip flag

2018-07-01 Thread Richard Weinberger

Am Montag, 2. Juli 2018, 08:52:27 CEST schrieb Boris Brezillon:
> On Mon, 2 Jul 2018 08:44:33 +0200
> Quentin Schulz  wrote:
> 
> > Hi Richard, Boris,
> > 
> > On Sun, Jul 01, 2018 at 10:50:41PM +0200, Richard Weinberger wrote:
> > > Am Sonntag, 1. Juli 2018, 22:33:47 CEST schrieb Boris Brezillon:  
> > > > On Sun, 01 Jul 2018 21:35:57 +0200
> > > > Richard Weinberger  wrote:
> > > >   
> > > > > Quentin,
> > > > > 
> > > > > Am Donnerstag, 28. Juni 2018, 09:40:53 CEST schrieb Quentin Schulz:  
> > > > > > > Now that we have the logic for skipping CRC check for static UBI volumes
> > > > > > > in the core, let's expose it to users.
> > > > > > > 
> > > > > > > This makes use of a padding byte in the volume description data
> > > > > > > structure as a flag. This flag only tell for now whether we should skip
> > > > > > > the CRC check of a volume.
> > > > > > > 
> > > > > > > This checks the UBI volume for which we are trying to skip the CRC check
> > > > > > > is static.
> > > > > > 
> > > > > > Suggested-by: Boris Brezillon 
> > > > > > Signed-off-by: Quentin Schulz 
> > > > > > Reviewed-by: Boris Brezillon 
> > > > > > ---
> > > > > >  drivers/mtd/ubi/cdev.c  |  4 
> > > > > >  drivers/mtd/ubi/vmt.c   |  3 +++
> > > > > >  include/uapi/mtd/ubi-user.h | 16 ++--
> > > > > >  3 files changed, 21 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
> > > > > > index 45c3296..3eea1df 100644
> > > > > > --- a/drivers/mtd/ubi/cdev.c
> > > > > > +++ b/drivers/mtd/ubi/cdev.c
> > > > > > @@ -622,6 +622,10 @@ static int verify_mkvol_req(const struct 
> > > > > > ubi_device *ubi,
> > > > > > req->vol_type != UBI_STATIC_VOLUME)
> > > > > > goto bad;
> > > > > >  
> > > > > > +   if (req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG &&  
> > > > 
> > > > Oops, missed that req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG check was
> > > > missing parens (checkpatch --strict should complain about that).  
> > > 
> > > > We would have noticed at the latest when building my local branch or in
> > > > linux-next. No need to worry.
> > >
> > > > > > +   req->vol_type != UBI_STATIC_VOLUME)
> > > > > > +   goto bad;
> > > > > 
> > > > > We should also reject unknown flags here.  
> > > > 
> > > > I agree.  
> > 
> > Should I send another version of my patches for it?

Yes. Please.

> Yes please, respin your series with this additional check. Just define
> 
> #define UBI_VOL_VALID_FLGS  (UBI_VOL_SKIP_CRC_CHECK_FLG)
> 
> and then, in verify_mkvol_req() add
> 
>   if (req->flags & ~UBI_VOL_VALID_FLGS)
>   goto bad;

Yep.
 
> > Same for
> > parenthesis around the flags masking above?
> 
> No need to fix that one (unless Richard cares), as it seems I had it
> wrong.

Nah. If both gcc and checkpatch don't complain, let's keep it as-is.

Thanks,
//richard

Re: [PATCH 1/3] IIO: st_accel_i2c.c: Use fallback if DT/ACPI enum failed

2018-07-01 Thread Nikolaus Voss


On Sat, 30 Jun 2018, Jonathan Cameron wrote:

> On Fri, 29 Jun 2018 10:10:10 +0200
> Nikolaus Voss  wrote:
>
> > Currently, the driver bails out if not explicitly referred to in
> > DT or ACPI tables. This prevents fallback mechanisms from coming
> > into effect, e.g. I2C device ID table match via DT or ACPI
> > PRP0001 HID. However DT/ACPI enum should take precedence over
> > the fallback, so evaluate that first.
> >
> > Signed-off-by: Nikolaus Voss 
>
> Is the change to probe_new actually related to the rest of the change?
>
> I can't immediately see why...  If not I would prefer that as a separate
> change.

Well, it is, because the id table pointer of the old probe() is not
used any more.

> > ---
> >  drivers/iio/accel/st_accel_i2c.c | 21 -
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/iio/accel/st_accel_i2c.c b/drivers/iio/accel/st_accel_i2c.c
> > index 6bdec8c451e0..e360da407027 100644
> > --- a/drivers/iio/accel/st_accel_i2c.c
> > +++ b/drivers/iio/accel/st_accel_i2c.c
> > @@ -138,8 +138,7 @@ static const struct i2c_device_id st_accel_id_table[] = {
> >  };
> >  MODULE_DEVICE_TABLE(i2c, st_accel_id_table);
> >
> > -static int st_accel_i2c_probe(struct i2c_client *client,
> > -   const struct i2c_device_id *id)
> > +static int st_accel_i2c_probe(struct i2c_client *client)
> >  {
> > struct iio_dev *indio_dev;
> > struct st_sensor_data *adata;
> > @@ -156,14 +155,18 @@ static int st_accel_i2c_probe(struct i2c_client *client,
> >  client->name, sizeof(client->name));
> > } else if (ACPI_HANDLE(&client->dev)) {
> > ret = st_sensors_match_acpi_device(&client->dev);
> > -   if ((ret < 0) || (ret >= ST_ACCEL_MAX))
> > -   return -ENODEV;
> > -
> > -   strlcpy(client->name, st_accel_id_table[ret].name,
> > +   if ((ret >= 0) && (ret < ST_ACCEL_MAX))
> > +   strlcpy(client->name, st_accel_id_table[ret].name,
> > sizeof(client->name));
> > -   } else if (!id)
> > -   return -ENODEV;
> > +   }
> >
> > +   /*
> > +* If OF and ACPI enumeration failed, there could still be platform
> > +* information via fallback enumeration or explicit instantiation, so
> > +* check if id table has been matched via client->name.
> > +*/
> > +   if (!client->name)
> > +   return -ENODEV;
> >
> > st_sensors_i2c_configure(indio_dev, client, adata);
> >
> > @@ -187,7 +190,7 @@ static struct i2c_driver st_accel_driver = {
> > .of_match_table = of_match_ptr(st_accel_of_match),
> > .acpi_match_table = ACPI_PTR(st_accel_acpi_match),
> > },
> > -   .probe = st_accel_i2c_probe,
> > +   .probe_new = st_accel_i2c_probe,
> > .remove = st_accel_i2c_remove,
> > .id_table = st_accel_id_table,
> >  };
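
One detail in the new check may deserve a second look. The strlcpy() calls
in the patch use sizeof(client->name), which suggests that name is a
fixed-size character array inside the structure. If so, the expression
"!client->name" tests the address of that array and can never be true. A
minimal userspace sketch of the alternative, testing for an empty string
instead (struct client_model, NAME_SIZE and both helpers are hypothetical
stand-ins, not driver code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_SIZE 20	/* illustrative; stands in for the real array size */

/* Hypothetical stand-in for the client structure: name is an array. */
struct client_model {
	char name[NAME_SIZE];
};

/*
 * An array member always has a non-NULL address, so the only way to tell
 * an unset name from a matched one is to look at its first character.
 */
static bool name_is_set(const struct client_model *c)
{
	return c->name[0] != '\0';
}

/* Bounded copy that always leaves the name NUL-terminated. */
static void set_name(struct client_model *c, const char *name)
{
	strncpy(c->name, name, sizeof(c->name) - 1);
	c->name[sizeof(c->name) - 1] = '\0';
}
```

Whether an empty name is the right signal for "no match" in the real driver
is an assumption of this sketch.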

Re: [PATCH v3 2/2] ubi: expose the volume CRC check skip flag

2018-07-01 Thread Boris Brezillon

On Mon, 2 Jul 2018 08:44:33 +0200
Quentin Schulz  wrote:

> Hi Richard, Boris,
> 
> On Sun, Jul 01, 2018 at 10:50:41PM +0200, Richard Weinberger wrote:
> > Am Sonntag, 1. Juli 2018, 22:33:47 CEST schrieb Boris Brezillon:  
> > > On Sun, 01 Jul 2018 21:35:57 +0200
> > > Richard Weinberger  wrote:
> > >   
> > > > Quentin,
> > > > 
> > > > Am Donnerstag, 28. Juni 2018, 09:40:53 CEST schrieb Quentin Schulz:  
> > > > > > Now that we have the logic for skipping CRC check for static UBI volumes
> > > > > > in the core, let's expose it to users.
> > > > > > 
> > > > > > This makes use of a padding byte in the volume description data
> > > > > > structure as a flag. This flag only tell for now whether we should skip
> > > > > > the CRC check of a volume.
> > > > > > 
> > > > > > This checks the UBI volume for which we are trying to skip the CRC check
> > > > > > is static.
> > > > > 
> > > > > Suggested-by: Boris Brezillon 
> > > > > Signed-off-by: Quentin Schulz 
> > > > > Reviewed-by: Boris Brezillon 
> > > > > ---
> > > > >  drivers/mtd/ubi/cdev.c  |  4 
> > > > >  drivers/mtd/ubi/vmt.c   |  3 +++
> > > > >  include/uapi/mtd/ubi-user.h | 16 ++--
> > > > >  3 files changed, 21 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
> > > > > index 45c3296..3eea1df 100644
> > > > > --- a/drivers/mtd/ubi/cdev.c
> > > > > +++ b/drivers/mtd/ubi/cdev.c
> > > > > @@ -622,6 +622,10 @@ static int verify_mkvol_req(const struct 
> > > > > ubi_device *ubi,
> > > > >   req->vol_type != UBI_STATIC_VOLUME)
> > > > >   goto bad;
> > > > >  
> > > > > + if (req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG &&  
> > > 
> > > Oops, missed that req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG check was
> > > missing parens (checkpatch --strict should complain about that).  
> > 
> > > We would have noticed at the latest when building my local branch or in
> > > linux-next. No need to worry.
> >
> > > > > + req->vol_type != UBI_STATIC_VOLUME)
> > > > > + goto bad;
> > > > 
> > > > We should also reject unknown flags here.  
> > > 
> > > I agree.  
> 
> Should I send another version of my patches for it?

Yes please, respin your series with this additional check. Just define

#define UBI_VOL_VALID_FLGS  (UBI_VOL_SKIP_CRC_CHECK_FLG)

and then, in verify_mkvol_req() add

if (req->flags & ~UBI_VOL_VALID_FLGS)
goto bad;
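
Put together with the static-volume check from the patch, the validation
could look roughly like the sketch below. It is a userspace model only: the
numeric values are assumptions chosen for illustration, and struct
mkvol_req_model and mkvol_flags_ok() are hypothetical names, not the real
request structure or verify_mkvol_req().

```c
#include <assert.h>
#include <stdbool.h>

/* Values are illustrative assumptions, not copied from ubi-user.h. */
#define UBI_DYNAMIC_VOLUME		3
#define UBI_STATIC_VOLUME		4
#define UBI_VOL_SKIP_CRC_CHECK_FLG	0x1
#define UBI_VOL_VALID_FLGS		(UBI_VOL_SKIP_CRC_CHECK_FLG)

/* Hypothetical, trimmed-down request; the real struct has more fields. */
struct mkvol_req_model {
	int vol_type;
	unsigned char flags;
};

static bool mkvol_flags_ok(const struct mkvol_req_model *req)
{
	/* Reject any bit outside the known set. */
	if (req->flags & ~UBI_VOL_VALID_FLGS)
		return false;

	/* Skipping the CRC check is only accepted for static volumes. */
	if ((req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG) &&
	    req->vol_type != UBI_STATIC_VOLUME)
		return false;

	return true;
}
```

Rejecting unknown bits now means a future flag can be added later without
old kernels silently accepting and ignoring it.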

> Same for
> parenthesis around the flags masking above?

No need to fix that one (unless Richard cares), as it seems I had it
wrong.

Re: [PATCH 0/3] IIO: st_sensors_i2c: improve device enumeration

2018-07-01 Thread Nikolaus Voss


On Fri, 29 Jun 2018, Andy Shevchenko wrote:

> I'm not sure I understand how ->probe_new() is supposed to work
> against i2c_id_table, but I don't care for legacy platform data
> anyway.
>
> What I would like to point to is device_get_match_data() API which
> should simplify / unify the case how you get driver data.

This driver doesn't need any driver data / platform_data beyond the
i2c_id_table name (which has already been matched when probe() /
probe_new() is called), so strictly speaking neither of_match_table nor
acpi_match_table would be necessary, because I2C DT / ACPI enumeration also
matches against i2c_table names.


But thanks for the hint ;-).

Niko

Re: [PATCH] siox: don't create a thread without starting it

2018-07-01 Thread Uwe Kleine-König

Hello,

On Mon, Jul 02, 2018 at 09:34:04AM +1000, Stephen Rothwell wrote:
> On Fri, 29 Jun 2018 09:38:58 +0200 Uwe Kleine-König 
>  wrote:
> >
> > On Thu, Jun 28, 2018 at 09:57:42AM +0200, Uwe Kleine-König wrote:
> > > Greg, you applied the initial patches creating drivers/siox. I assume
> > > you will continue to apply siox patches and tell if I should search a
> > > different path for these.  
> > 
> > I put the two patches that I'd like to get in during the next merge
> > window into a branch in my repository:
> > 
> > https://git.pengutronix.de/git/ukl/linux siox/next
> > 
> > I assume you wouldn't object to add this branch to linux-next?
> 
> Added from today. 

Ok, great. 

> Is this a continuing effort, or should I just remove the tree once
> Greg has merged it?

I expect the patch count to be zero for most future releases because
there are probably no users apart from Eckelmann. On the other hand if
we still need a patch it would continue to be good to have the exposure
in next.

So I'd say, if it reduces your burden feel free to drop it and if there
is something relevant later, it seems to be easy enough to readd it
then.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |

Re: [PATCH v2 0/4] PCI: mediatek: fixup find_port, enable_msi and add pm, module support

2018-07-01 Thread Ryder Lee

On Mon, 2018-07-02 at 11:49 +0800, honghui.zh...@mediatek.com wrote:
> From: Honghui Zhang 
> 
> This patchset includes misc patches:
> 
> The first patch fixes the mtk_pcie_find_port logic, which left the system
> unable to access the configuration space of an EP connected to PCIe slot 1.
> 
> The second patch fixes the MSI enable logic: MSI must be enabled after the
> system clock is enabled. The definition of mtk_pcie_startup_port_v2 is moved
> to avoid a forward declaration of mtk_pcie_enable_msi, and
> mtk_pcie_enable_msi is now called from mtk_pcie_startup_port_v2, since all
> clocks are enabled by that time.
> 
> The third patch is a rebase and refactor of the v4 patch [1]; the changes are:
>  -Add PM support for MT7622.
>  -Using mtk_pcie_enable_port to re-establish the link when resumed.
>  -Rebase on the previous two patches.
> 
> The fourth patch adds loadable kernel module support.
> 
> Some of those patches were already reviewed by Ryder Lee,
> so I just added the Reviewed-by tags to those patches.
> 
> [1] https://patchwork.kernel.org/patch/10479079
> 
> Change since v1:
>  - A bit of code refact of the first patch suggested by Andy Shevchenko, and
>commit message updated.
>  - Using __maybe_unused.
>  - Remove the redundant list_empty check of the fourth patch.
> 
> Honghui Zhang (4):
>   PCI: mediatek: fixup mtk_pcie_find_port logical
>   PCI: mediatek: enable msi after clock enabled
>   PCI: mediatek: Add system pm support for MT2712 and MT7622
>   PCI: mediatek: Add loadable kernel module support
> 
>  drivers/pci/controller/Kconfig |   2 +-
>  drivers/pci/controller/pcie-mediatek.c | 289 
> -
>  2 files changed, 213 insertions(+), 78 deletions(-)
> 

For the series:

Acked-by: Ryder Lee 


Thanks.

Re: [PATCH v3 2/2] ubi: expose the volume CRC check skip flag

2018-07-01 Thread Boris Brezillon

On Sun, 01 Jul 2018 13:54:32 -0700
Joe Perches  wrote:

> On Sun, 2018-07-01 at 22:33 +0200, Boris Brezillon wrote:
> > On Sun, 01 Jul 2018 21:35:57 +0200 Richard Weinberger wrote:
> > > Am Donnerstag, 28. Juni 2018, 09:40:53 CEST schrieb Quentin Schulz:  
> > > > Now that we have the logic for skipping CRC check for static UBI volumes
> > > > in the core, let's expose it to users.  
> []
> > > > diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c  
> []
> > > > @@ -622,6 +622,10 @@ static int verify_mkvol_req(const struct ubi_device *ubi,
> > > > req->vol_type != UBI_STATIC_VOLUME)
> > > > goto bad;
> > > >  
> > > > +   if (req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG &&  
> > 
> > Oops, missed that req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG check was
> > missing parens (checkpatch --strict should complain about that).  
> 
> Why should checkpatch complain?
> & has higher precedence than &&.
> 

Yes, I know, but I remember checkpatch complaining about that in one
of my patch (maybe it was a slightly different case though). Anyway, I
double checked and, as you report, checkpatch does not complain, so
please ignore this comment (sorry for the noise).
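
For the record, the precedence point can be checked with a small userspace
sketch. SKIP_CRC_FLG and both function names are made up for this example.
Because & binds tighter than &&, the two forms below evaluate identically:

```c
#include <assert.h>
#include <stdbool.h>

#define SKIP_CRC_FLG 0x1	/* illustrative flag bit */

/* Written as in the patch: no parentheses around the mask test. */
static bool bad_without_parens(unsigned int flags, bool is_static)
{
	return flags & SKIP_CRC_FLG && !is_static;
}

/* Same test with explicit parentheses around the mask test. */
static bool bad_with_parens(unsigned int flags, bool is_static)
{
	return (flags & SKIP_CRC_FLG) && !is_static;
}
```

The parentheses are therefore a readability choice here, not a correctness
fix.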

Re: [PATCH v3 2/2] ubi: expose the volume CRC check skip flag

2018-07-01 Thread Quentin Schulz

Hi Richard, Boris,

On Sun, Jul 01, 2018 at 10:50:41PM +0200, Richard Weinberger wrote:
> Am Sonntag, 1. Juli 2018, 22:33:47 CEST schrieb Boris Brezillon:
> > On Sun, 01 Jul 2018 21:35:57 +0200
> > Richard Weinberger  wrote:
> > 
> > > Quentin,
> > > 
> > > Am Donnerstag, 28. Juni 2018, 09:40:53 CEST schrieb Quentin Schulz:
> > > > Now that we have the logic for skipping CRC check for static UBI volumes
> > > > in the core, let's expose it to users.
> > > > 
> > > > This makes use of a padding byte in the volume description data
> > > > structure as a flag. This flag only tell for now whether we should skip
> > > > the CRC check of a volume.
> > > > 
> > > > This checks the UBI volume for which we are trying to skip the CRC check
> > > > is static.
> > > > 
> > > > Suggested-by: Boris Brezillon 
> > > > Signed-off-by: Quentin Schulz 
> > > > Reviewed-by: Boris Brezillon 
> > > > ---
> > > >  drivers/mtd/ubi/cdev.c  |  4 
> > > >  drivers/mtd/ubi/vmt.c   |  3 +++
> > > >  include/uapi/mtd/ubi-user.h | 16 ++--
> > > >  3 files changed, 21 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
> > > > index 45c3296..3eea1df 100644
> > > > --- a/drivers/mtd/ubi/cdev.c
> > > > +++ b/drivers/mtd/ubi/cdev.c
> > > > @@ -622,6 +622,10 @@ static int verify_mkvol_req(const struct ubi_device *ubi,
> > > > req->vol_type != UBI_STATIC_VOLUME)
> > > > goto bad;
> > > >  
> > > > +   if (req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG &&
> > 
> > Oops, missed that req->flags & UBI_VOL_SKIP_CRC_CHECK_FLG check was
> > missing parens (checkpatch --strict should complain about that).
> 
> We would have noticed at the latest when building my local branch or in
> linux-next. No need to worry.
>  
> > > > +   req->vol_type != UBI_STATIC_VOLUME)
> > > > +   goto bad;  
> > > 
> > > We should also reject unknown flags here.
> > 
> > I agree.

Should I send another version of my patches for it? Same for
parenthesis around the flags masking above?

Quentin



Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

2018-07-01 Thread John Hubbard

On 07/01/2018 11:34 PM, Leon Romanovsky wrote:
> On Sun, Jul 01, 2018 at 11:10:04PM -0700, John Hubbard wrote:
>> On 07/01/2018 10:52 PM, Leon Romanovsky wrote:
>>> On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote:
>>>> On Wed 27-06-18 19:42:01, John Hubbard wrote:
>>>>> On 06/27/2018 10:02 AM, Jan Kara wrote:
>>>>>> On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote:
>>>>>>> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote:
>>>>>>>> On Wed 27-06-18 13:59:27, Michal Hocko wrote:
>>>>>>>>> On Wed 27-06-18 13:53:49, Jan Kara wrote:
>>>>>>>>>> On Wed 27-06-18 13:32:21, Michal Hocko wrote:
>>>>> [...]
>>>>> One question though: I'm still vague on the best actions to take in the
>>>>> following functions:
>>>>>
>>>>> page_mkclean_one
>>>>> try_to_unmap_one
>>>>>
>>>>> At the moment, they are both just doing an evil little early-out:
>>>>>
>>>>>   if (PageDmaPinned(page))
>>>>>       return false;
>>>>>
>>>>> ...but we talked about maybe waiting for the condition to clear, instead?
>>>>> Thoughts?
>>>>
>>>> What needs to happen in page_mkclean() depends on the caller. Most of the
>>>> callers really need to be sure the page is write-protected once
>>>> page_mkclean() returns. Those are:
>>>>
>>>>   pagecache_isize_extended()
>>>>   fb_deferred_io_work()
>>>>   clear_page_dirty_for_io() if called for data-integrity writeback - which
>>>> is currently known only in its caller (e.g. write_cache_pages()) where
>>>> it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this
>>>> information into page_mkclean() will require some plumbing and
>>>> clear_page_dirty_for_io() has some 50 callers but it's doable.
>>>>
>>>> clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode !=
>>>> WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as
>>>> otherwise memory cleaning would get stuck on pinned pages until RDMA
>>>> drivers release its pins.
>>>
>>> Sorry for naive question, but won't it create too much dirty pages
>>> so writeback will be called "non-stop" to rebalance watermarks without
>>> ability to progress?
>>>
>>
>> That is an interesting point.
>>
>> Holding off page writeback of this region does seem like it could cause
>> problems under memory pressure. Maybe adjusting the watermarks so that we
>> tell the writeback  system, "all is well, just ignore this region until
>> we're done with it" might help? Any ideas here are welcome...
> 
> AFAIR, it is per-zone, so the solution to count dirty-but-untouchable
> number of pages to take them into account for accounting can work, but
> it seems like an overkill. Can we create special ZONE for such gup
> pages, or this is impossible too?
> 

Let's see what Michal and others prefer. The zone idea intrigues me. 

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v7 2/4] phy: General struct and field cleanup

2018-07-01 Thread Vivek Gautam

Hi Can,


On Tue, Jun 19, 2018 at 2:06 PM, Can Guo  wrote:
> Move MSM8996 specific PHY vreg list struct name to a general one as it is
> used by all PHYs. Add a specific field to handle dual lane situation.
>
> Signed-off-by: Can Guo 
> ---
>  drivers/phy/qualcomm/phy-qcom-qmp.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index ccb8578..9be9754 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -649,6 +649,8 @@ struct qmp_phy_cfg {
>
> /* true, if PHY has a separate DP_COM control block */
> bool has_phy_dp_com_ctrl;
> +   /* true, if PHY has secondary tx/rx lanes to be configured */
> +   bool is_dual_lane_phy;
> /* Register offset of secondary tx/rx lanes for USB DP combo PHY */
> unsigned int tx_b_lane_offset;
> unsigned int rx_b_lane_offset;
> @@ -758,7 +760,7 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
>  };
>
>  /* list of regulators */
> -static const char * const msm8996_phy_vreg_l[] = {
> +static const char * const qmp_phy_vreg_l[] = {
> "vdda-phy", "vdda-pll",
>  };
>
> @@ -778,8 +780,8 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
> .num_clks   = ARRAY_SIZE(msm8996_phy_clk_l),
> .reset_list = msm8996_pciephy_reset_l,
> .num_resets = ARRAY_SIZE(msm8996_pciephy_reset_l),
> -   .vreg_list  = msm8996_phy_vreg_l,
> -   .num_vregs  = ARRAY_SIZE(msm8996_phy_vreg_l),
> +   .vreg_list  = qmp_phy_vreg_l,
> +   .num_vregs  = ARRAY_SIZE(qmp_phy_vreg_l),
> .regs   = pciephy_regs_layout,
>
> .start_ctrl = PCS_START | PLL_READY_GATE_EN,
> @@ -809,8 +811,8 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
> .num_clks   = ARRAY_SIZE(msm8996_phy_clk_l),
> .reset_list = msm8996_usb3phy_reset_l,
> .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l),
> -   .vreg_list  = msm8996_phy_vreg_l,
> -   .num_vregs  = ARRAY_SIZE(msm8996_phy_vreg_l),
> +   .vreg_list  = qmp_phy_vreg_l,
> +   .num_vregs  = ARRAY_SIZE(qmp_phy_vreg_l),
> .regs   = usb3phy_regs_layout,
>
> .start_ctrl = SERDES_START | PCS_START,
> @@ -870,8 +872,8 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
> .num_clks   = ARRAY_SIZE(qmp_v3_phy_clk_l),
> .reset_list = msm8996_usb3phy_reset_l,
> .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l),
> -   .vreg_list  = msm8996_phy_vreg_l,
> -   .num_vregs  = ARRAY_SIZE(msm8996_phy_vreg_l),
> +   .vreg_list  = qmp_phy_vreg_l,
> +   .num_vregs  = ARRAY_SIZE(qmp_phy_vreg_l),
> .regs   = qmp_v3_usb3phy_regs_layout,
>
> .start_ctrl = SERDES_START | PCS_START,
> @@ -883,6 +885,7 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
> .pwrdn_delay_max= POWER_DOWN_DELAY_US_MAX,
>
> .has_phy_dp_com_ctrl= true,
> +   .is_dual_lane_phy   = true,
> .tx_b_lane_offset   = 0x400,
> .rx_b_lane_offset   = 0x400,
>  };
> @@ -903,8 +906,8 @@ static inline void qphy_clrbits(void __iomem *base, u32 offset, u32 val)
> .num_clks   = ARRAY_SIZE(qmp_v3_phy_clk_l),
> .reset_list = msm8996_usb3phy_reset_l,
> .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l),
> -   .vreg_list  = msm8996_phy_vreg_l,
> -   .num_vregs  = ARRAY_SIZE(msm8996_phy_vreg_l),
> +   .vreg_list  = qmp_phy_vreg_l,
> +   .num_vregs  = ARRAY_SIZE(qmp_phy_vreg_l),
> .regs   = qmp_v3_usb3phy_regs_layout,
>
> .start_ctrl = SERDES_START | PCS_START,
> @@ -1116,12 +1119,12 @@ static int qcom_qmp_phy_init(struct phy *phy)
> /* Tx, Rx, and PCS configurations */
> qcom_qmp_phy_configure(tx, cfg->regs, cfg->tx_tbl, cfg->tx_tbl_num);
> /* Configuration for other LANE for USB-DP combo PHY */
> -   if (cfg->has_phy_dp_com_ctrl)
> +   if (cfg->is_dual_lane_phy)
> qcom_qmp_phy_configure(tx + cfg->tx_b_lane_offset, cfg->regs,
>cfg->tx_tbl, cfg->tx_tbl_num);
>
> qcom_qmp_phy_configure(rx, cfg->regs, cfg->rx_tbl, cfg->rx_tbl_num);
> -   if (cfg->has_phy_dp_com_ctrl)
> +   if (cfg->is_dual_lane_phy)
> qcom_qmp_phy_configure(rx + cfg->rx_b_lane_offset, cfg->regs,
>
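
The effect of the new flag can be modelled outside the kernel. In the sketch
below, struct phy_cfg_model and tx_lanes_to_configure() are hypothetical
names that only mirror the two booleans from the patch. The point is that
the second lane is programmed based on is_dual_lane_phy alone, independent
of whether the PHY has a DP_COM control block:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the PHY config discussed in the patch. */
struct phy_cfg_model {
	bool has_phy_dp_com_ctrl;	/* separate DP_COM control block */
	bool is_dual_lane_phy;		/* secondary tx/rx lanes to configure */
};

/* Counts how many tx lane blocks an init path would program. */
static int tx_lanes_to_configure(const struct phy_cfg_model *cfg)
{
	int lanes = 1;			/* the primary lane is always set up */

	if (cfg->is_dual_lane_phy)	/* no longer keyed on DP_COM */
		lanes++;

	return lanes;
}
```

A dual-lane PHY without a DP_COM block would be configured with one lane too
few under the old condition, which is the case the separate field appears
intended to cover.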

Re: [PATCH] rpmsg: smd: Add missing include of sizes.h

2018-07-01 Thread Bjorn Andersson

On Fri 29 Jun 10:01 PDT 2018, Niklas Cassel wrote:

> Add missing include of sizes.h.
> 
> drivers/rpmsg/qcom_smd.c: In function ‘qcom_smd_channel_open’:
> drivers/rpmsg/qcom_smd.c:809:36: error: ‘SZ_4K’ undeclared (first use in this function)
>   bb_size = min(channel->fifo_size, SZ_4K);
> ^
> 
> Signed-off-by: Niklas Cassel 

Applied

Thanks,
Bjorn

> ---
>  drivers/rpmsg/qcom_smd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c
> index 6437bbeebc91..8695cb041c31 100644
> --- a/drivers/rpmsg/qcom_smd.c
> +++ b/drivers/rpmsg/qcom_smd.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> 2.17.1
>
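
For readers outside the kernel tree, the failing expression clamps a size to
4 KiB. A userspace sketch of the same computation follows. SZ_4K is defined
locally here, the ternary stands in for the kernel's min() macro, and
bounce_buffer_size() is a made-up name that assumes "bb" stands for a bounce
buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Defined locally for the sketch; the kernel gets it from its sizes header. */
#define SZ_4K 0x00001000

/* Plain function standing in for the kernel's type-checked min() macro. */
static size_t bounce_buffer_size(size_t fifo_size)
{
	return fifo_size < SZ_4K ? fifo_size : SZ_4K;
}
```

Without a definition of SZ_4K in scope the expression cannot compile, which
matches the error quoted in the commit message.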

Re: [PATCH 3.18 00/85] 3.18.114-stable review

2018-07-01 Thread Greg Kroah-Hartman

On Sun, Jul 01, 2018 at 12:37:07PM -0700, Nathan Chancellor wrote:
> On Sun, Jul 01, 2018 at 06:01:18PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.18.114 release.
> > There are 85 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Tue Jul  3 15:31:04 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.114-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Merged, compiled with -Werror, and installed onto my Pixel XL.
> 
> No initial issues noticed in dmesg or general usage.

Wonderful, thanks for testing all three of those kernels and letting me
know.

greg k-h

Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

2018-07-01 Thread Leon Romanovsky

On Sun, Jul 01, 2018 at 11:10:04PM -0700, John Hubbard wrote:
> On 07/01/2018 10:52 PM, Leon Romanovsky wrote:
> > On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote:
> >> On Wed 27-06-18 19:42:01, John Hubbard wrote:
> >>> On 06/27/2018 10:02 AM, Jan Kara wrote:
> >>>> On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote:
> >>>>> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote:
> >>>>>> On Wed 27-06-18 13:59:27, Michal Hocko wrote:
> >>>>>>> On Wed 27-06-18 13:53:49, Jan Kara wrote:
> >>>>>>>> On Wed 27-06-18 13:32:21, Michal Hocko wrote:
> >>> [...]
> >>> One question though: I'm still vague on the best actions to take in the
> >>> following functions:
> >>>
> >>> page_mkclean_one
> >>> try_to_unmap_one
> >>>
> >>> At the moment, they are both just doing an evil little early-out:
> >>>
> >>>   if (PageDmaPinned(page))
> >>>   return false;
> >>>
> >>> ...but we talked about maybe waiting for the condition to clear, instead?
> >>> Thoughts?
> >>
> >> What needs to happen in page_mkclean() depends on the caller. Most of the
> >> callers really need to be sure the page is write-protected once
> >> page_mkclean() returns. Those are:
> >>
> >>   pagecache_isize_extended()
> >>   fb_deferred_io_work()
> >>   clear_page_dirty_for_io() if called for data-integrity writeback - which
> >> is currently known only in its caller (e.g. write_cache_pages()) where
> >> it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this
> >> information into page_mkclean() will require some plumbing and
> >> clear_page_dirty_for_io() has some 50 callers but it's doable.
> >>
> >> clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode !=
> >> WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as
> >> otherwise memory cleaning would get stuck on pinned pages until RDMA
> >> drivers release its pins.
> >
> > Sorry for naive question, but won't it create too much dirty pages
> > so writeback will be called "non-stop" to rebalance watermarks without
> > ability to progress?
> >
>
> That is an interesting point.
>
> Holding off page writeback of this region does seem like it could cause
> problems under memory pressure. Maybe adjusting the watermarks so that we
> tell the writeback  system, "all is well, just ignore this region until
> we're done with it" might help? Any ideas here are welcome...

AFAIR, it is per-zone, so the solution to count dirty-but-untouchable
number of pages to take them into account for accounting can work, but
it seems like an overkill. Can we create special ZONE for such gup
pages, or this is impossible too?

>
> Longer term, maybe some additional work could allow the kernel to be able
> to writeback the gup-pinned pages (while DMA is happening--snapshots), but
> that seems like a pretty big overhaul.
>
> thanks,
> --
> John Hubbard
> NVIDIA



Re: [PATCH 4.17 154/220] UBIFS: Fix potential integer overflow in allocation

2018-07-01 Thread Greg Kroah-Hartman

On Mon, Jul 02, 2018 at 08:32:55AM +0200, Greg Kroah-Hartman wrote:
> On Sun, Jul 01, 2018 at 08:48:07PM +0200, Richard Weinberger wrote:
> > On Sun, Jul 1, 2018 at 6:22 PM, Greg Kroah-Hartman
> >  wrote:
> > > > 4.17-stable review patch.  If anyone has any objections, please let me know.
> > >
> > > --
> > >
> > > From: Silvio Cesare 
> > >
> > > commit 353748a359f1821ee934afc579cf04572406b420 upstream.
> > >
> > > There is potential for the size and len fields in ubifs_data_node to be
> > > too large causing either a negative value for the length fields or an
> > > integer overflow leading to an incorrect memory allocation. Likewise,
> > > when the len field is small, an integer underflow may occur.
> > >
> > > Signed-off-by: Silvio Cesare 
> > > Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
> > > Cc: sta...@vger.kernel.org
> > > Signed-off-by: Kees Cook 
> > > Signed-off-by: Greg Kroah-Hartman 
> > 
> > Guys, this patch was never on linux-mtd nor was I CC'ed.
> > I don't see it so super security critical which argues to bypass the
> > whole community review process.
> > 
> > Anyway, I don't like this patch for two reasons.
> > > 1. Instead of doing the kmalloc_array() dance, just check whether size
> > > is > 0 and <= UBIFS_BLOCK_SIZE, in the caller.
> > 2. It will not apply to most stable kernels since it targets the code
> > path with UBIFS encryption available.
> 
> Can you get a fix into Linus's tree that I can also queue up for a
> stable release?

Ah, never mind, I see your revert/add patch now, sorry for the noise.

greg k-h

Re: [PATCH 4.17 154/220] UBIFS: Fix potential integer overflow in allocation

2018-07-01 Thread Greg Kroah-Hartman

On Sun, Jul 01, 2018 at 08:48:07PM +0200, Richard Weinberger wrote:
> On Sun, Jul 1, 2018 at 6:22 PM, Greg Kroah-Hartman
>  wrote:
> > 4.17-stable review patch.  If anyone has any objections, please let me know.
> >
> > --
> >
> > From: Silvio Cesare 
> >
> > commit 353748a359f1821ee934afc579cf04572406b420 upstream.
> >
> > There is potential for the size and len fields in ubifs_data_node to be
> > too large causing either a negative value for the length fields or an
> > integer overflow leading to an incorrect memory allocation. Likewise,
> > when the len field is small, an integer underflow may occur.
> >
> > Signed-off-by: Silvio Cesare 
> > Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Kees Cook 
> > Signed-off-by: Greg Kroah-Hartman 
> 
> Guys, this patch was never on linux-mtd nor was I CC'ed.
> I don't see it so super security critical which argues to bypass the
> whole community review process.
> 
> Anyway, I don't like this patch for two reasons.
> 1. Instead of doing the kmalloc_array() dance, just check whether size
> is > 0 and <= UBIFS_BLOCK_SIZE, in the caller.
> 2. It will not apply to most stable kernels since it targets the code
> path with UBIFS encryption available.

Can you get a fix into Linus's tree that I can also queue up for a
stable release?

thanks,

greg k-h
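
For reference, the range check Richard suggests doing in the caller could look roughly like the sketch below. This is a standalone userspace illustration, not necessarily the fix that ended up in Linus's tree: UBIFS_BLOCK_SIZE is defined locally with its usual value of 4096, and ubifs_data_len_ok() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for this sketch; the kernel defines it in the UBIFS headers. */
#define UBIFS_BLOCK_SIZE 4096

/*
 * Hypothetical helper: accept an on-flash length only if it is non-zero and
 * no larger than one UBIFS block. Taking the value as unsigned rules out the
 * negative-length case mentioned in the patch description.
 */
static bool ubifs_data_len_ok(uint32_t len)
{
	return len > 0 && len <= UBIFS_BLOCK_SIZE;
}
```

With a check like this in front of the allocation, a length that passes is bounded by one block, so the size arithmetic that follows cannot overflow.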

Re: h8300: BUG: Bad page state in process swapper (was: Re: why do we still need bootmem allocator?)

2018-07-01 Thread Yoshinori Sato

On Sun, 01 Jul 2018 21:22:46 +0900,
Mike Rapoport wrote:
> 
> (added Yoshinori Sato, here's the beginning of the discussion:
> https://lore.kernel.org/lkml/20180625140754.gb29...@dhcp22.suse.cz/)
> 
> On Wed, Jun 27, 2018 at 07:02:06PM +0300, Mike Rapoport wrote:
> > On Wed, Jun 27, 2018 at 07:33:55AM -0600, Rob Herring wrote:
> > > On Wed, Jun 27, 2018 at 5:27 AM Mike Rapoport  
> > > wrote:
> > > >
> > > > I've tried running the current upstream on h8300 gdb simulator and it
> > > > failed:
> > > 
> > > It seems my patch[1] is still not applied. The maintainer said he applied 
> > > it.
> > 
> > I've applied it manually. Without it unflatten_and_copy_device_tree() fails
> > to allocate memory. It indeed can be fixed with moving bootmem_init()
> > before, as you've noted in the commit message.
> > 
> > I'll try to dig deeper into it.
> >  
> > > > [0.00] BUG: Bad page state in process swapper  pfn:4
> > > > [0.00] page:007ed080 count:0 mapcount:-128 mapping:
> > > > index:0x0
> > > > [0.00] flags: 0x0()
> > > > [0.00] raw:  0040bdac 0040bdac   
> > > > 0002
> > > > ff7f 
> > > > [0.00] page dumped because: nonzero mapcount
> > > > ---Type <return> to continue, or q <return> to quit---
> > > > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc2+ #50
> > > > [0.00] Stack from 00401f2c:
> > > > [0.00]   00401f2c 001116cb 007ed080 00401f40 000e20e6 00401f54
> > > > 0004df14 
> > > > [0.00]   007ed080 007ed000 00401f5c 0004df8c 00401f90 0004e982
> > > > 0044 00401fd1
> > > > [0.00]   007ed000 007ed000  0004 0008 
> > > > 0003 0011
> > > > [0.00]
> > > > [0.00] Call Trace:
> > > > [0.00] [<000e20e6>] [<0004df14>] [<0004df8c>] 
> > > > [<0004e982>]
> > > > [0.00] [<00051a28>] [<1000>] [<0100>]
> > > > [0.00] Disabling lock debugging due to kernel taint
> > > >
> > > > With v4.13 I was able to get to "no valid init found".
> > > >
> > > > I had a quick look at h8300 memory initialization and it seems it has
> > > > starting pfn set to 0 while fdt defines memory start at 4M.
> > > 
> > > Perhaps there's another issue.
> 
> In my setup this is caused by __ffs() clobbering start pfn in
> nobootmem.c::__free_pages_memory().
> 
> If I change the __ffs() implementation from the inline assembly to generic
> bitops everything is fine.

OK.
The current bitops.h implementation depends on gcc's behavior in places.
I think it needs to be reworked in a more generic way so that it also
works with newer gcc.

Please wait until it gets fixed.


> I'm using gcc 8.1.0 from [1] and gdb 8.1.0.20180625-git
> 
> [1] http://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/
> 
> 
> -- 
> Sincerely yours,
> 

-- 
Yosinori Sato
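
For readers following along, "generic bitops" for __ffs() means a plain C binary search with no inline assembly, so it does not depend on how gcc treats asm constraints. The sketch below is written in that style. It is not the h8300 fix, and generic_ffs_index() is a name made up for this illustration.

```c
#include <limits.h>

/*
 * Portable "find first set bit": returns the index of the least significant
 * set bit by halving the search window at each step. The result is undefined
 * for word == 0, so callers must check for that first.
 */
static unsigned long generic_ffs_index(unsigned long word)
{
	unsigned long num = 0;

#if ULONG_MAX > 0xffffffffUL
	/* Only needed when unsigned long is wider than 32 bits. */
	if ((word & 0xffffffffUL) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffff) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xff) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xf) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1) == 0)
		num += 1;
	return num;
}
```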

Re: [PATCH v2 1/4] perf test shell: Replace '|&' with '2>&1 |' to work with more shells

2018-07-01 Thread Thomas-Mich Richter

On 06/29/2018 07:46 PM, Kim Phillips wrote:
> Since we do not specify bash (and/or zsh) as a requirement, use the
> standard error redirection that is more widely supported.
> 
> BEFORE:
> 
>  $ sudo ./perf test -v 62
>  62: Check open filename arg using perf trace + vfs_getname:
>  --- start ---
>  test child forked, pid 27305
>  ./tests/shell/trace+probe_vfs_getname.sh: 20: 
> ./tests/shell/trace+probe_vfs_getname.sh: Syntax error: "&" unexpected
>  test child finished with -2
>   end 
>  Check open filename arg using perf trace + vfs_getname: Skip
> 
> AFTER:
> 
>  $ sudo ./perf test -v 62
>  64: Check open filename arg using perf trace + vfs_getname   :
>  --- start ---
>  test child forked, pid 23008
>  Added new event:
>probe:vfs_getname(on getname_flags:72 with 
> pathname=result->name:string)
> 
>  You can now use it in all perf tools, such as:
> 
>  perf record -e probe:vfs_getname -aR sleep 1
> 
>   0.361 ( 0.008 ms): touch/23032 openat(dfd: CWD, filename: 
> /tmp/temporary_file.VEh0n, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 
> IRUGO|IWUGO) = 4
>  test child finished with 0
>   end 
>  Check open filename arg using perf trace + vfs_getname: Ok
> 
> Similar to commit 35435cd06081, with the same title.
> 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Thomas Richter 
> Cc: Michael Petlan 
> Signed-off-by: Kim Phillips 
> ---
> v2: indent terminal session logs with a space to avoid git-am parsing
> '--- start ---' as the end of the description text.
> 
>  tools/perf/tests/shell/trace+probe_vfs_getname.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/shell/trace+probe_vfs_getname.sh 
> b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> index 55ad9793d544..4ce276efe6b4 100755
> --- a/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> +++ b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> @@ -17,7 +17,7 @@ skip_if_no_perf_probe || exit 2
>  file=$(mktemp /tmp/temporary_file.X)
> 
>  trace_open_vfs_getname() {
> - evts=$(echo $(perf list syscalls:sys_enter_open* |& egrep 'open(at)? ' 
> | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
> + evts=$(echo $(perf list syscalls:sys_enter_open* 2>&1 | egrep 
> 'open(at)? ' | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
>   perf trace -e $evts touch $file 2>&1 | \
>   egrep " +[0-9]+\.[0-9]+ +\( +[0-9]+\.[0-9]+ ms\): +touch\/[0-9]+ 
> open(at)?\((dfd: +CWD, +)?filename: +${file}, +flags: 
> CREAT\|NOCTTY\|NONBLOCK\|WRONLY, +mode: +IRUGO\|IWUGO\) += +[0-9]+$"
>  }
> 


Applied and tested for s390. You have my Tested-by.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB
243294

Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

2018-07-01 Thread John Hubbard

On 07/01/2018 10:52 PM, Leon Romanovsky wrote:
> On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote:
>> On Wed 27-06-18 19:42:01, John Hubbard wrote:
>>> On 06/27/2018 10:02 AM, Jan Kara wrote:
>>>> On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote:
>>>>> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote:
>>>>>> On Wed 27-06-18 13:59:27, Michal Hocko wrote:
>>>>>>> On Wed 27-06-18 13:53:49, Jan Kara wrote:
>>>>>>>> On Wed 27-06-18 13:32:21, Michal Hocko wrote:
>>> [...]
>>> One question though: I'm still vague on the best actions to take in the
>>> following functions:
>>>
>>> page_mkclean_one
>>> try_to_unmap_one
>>>
>>> At the moment, they are both just doing an evil little early-out:
>>>
>>> if (PageDmaPinned(page))
>>> return false;
>>>
>>> ...but we talked about maybe waiting for the condition to clear, instead?
>>> Thoughts?
>>
>> What needs to happen in page_mkclean() depends on the caller. Most of the
>> callers really need to be sure the page is write-protected once
>> page_mkclean() returns. Those are:
>>
>>   pagecache_isize_extended()
>>   fb_deferred_io_work()
>>   clear_page_dirty_for_io() if called for data-integrity writeback - which
>> is currently known only in its caller (e.g. write_cache_pages()) where
>> it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this
>> information into page_mkclean() will require some plumbing and
>> clear_page_dirty_for_io() has some 50 callers but it's doable.
>>
>> clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode !=
>> WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as
>> otherwise memory cleaning would get stuck on pinned pages until RDMA
>> drivers release their pins.
> 
> Sorry for the naive question, but won't it create too many dirty pages,
> so that writeback is called "non-stop" to rebalance watermarks without
> being able to make progress?
> 

That is an interesting point.

Holding off page writeback of this region does seem like it could cause
problems under memory pressure. Maybe adjusting the watermarks so that we
tell the writeback system, "all is well, just ignore this region until
we're done with it" might help? Any ideas here are welcome...

Longer term, maybe some additional work could allow the kernel to
write back the gup-pinned pages (while DMA is happening--snapshots), but
that seems like a pretty big overhaul.

thanks,
-- 
John Hubbard
NVIDIA
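
To make the caller-dependent behaviour Jan describes a bit more concrete, here is a small userspace model of the decision. It is only a sketch under assumptions: the enum names are local to this example, and "wait for the pin to go away" in the data-integrity case is one of the options being discussed in this thread, not a settled design.

```c
#include <stdbool.h>

/* Stand-ins for the two writeback modes (names local to this sketch). */
enum sketch_sync_mode {
	SKETCH_SYNC_NONE, /* cleaning writeback, wbc->sync_mode != WB_SYNC_ALL */
	SKETCH_SYNC_ALL,  /* data-integrity writeback, WB_SYNC_ALL */
};

enum pinned_action {
	PAGE_PROCEED,      /* not pinned: write-protect and clean as usual */
	PAGE_SKIP,         /* cleaning writeback: leave the pinned page dirty */
	PAGE_WAIT_FOR_PIN, /* integrity writeback: page must end up write-protected */
};

/* Pick what clear_page_dirty_for_io() might do for a given page and mode. */
static enum pinned_action pinned_page_action(bool dma_pinned,
					     enum sketch_sync_mode mode)
{
	if (!dma_pinned)
		return PAGE_PROCEED;
	return mode == SKETCH_SYNC_ALL ? PAGE_WAIT_FOR_PIN : PAGE_SKIP;
}
```

The skip branch is the one Leon's question is about: every skipped page stays dirty, so it still counts against the dirty limits while writeback cannot make progress on it.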

[PATCH 1/2] ARM: dts: pxa: add pincontrol helpers

2018-07-01 Thread Robert Jarzmik

Add 3 helpers so that pincontrol definitions for pxa25x and pxa27x are
easier, and can be easily converted from old mfp mach-pxa code to
devicetree.

An example of such a conversion would be:
	static unsigned long mioa701_pin_config[] = {
		GPIO32_MMC_CLK,
		GPIO92_MMC_DAT_0,
		GPIO109_MMC_DAT_1,
		GPIO110_MMC_DAT_2,
		GPIO111_MMC_DAT_3,
		GPIO112_MMC_CMD,
		MIO_CFG_IN(GPIO78_SDIO_RO, AF0),
		MIO_CFG_IN(GPIO15_SDIO_INSERT, AF0),
		MIO_CFG_OUT(GPIO91_SDIO_EN, AF0, DRIVE_LOW),
	};
into:
	pinctrl_mmc_default: mmc-default {
		PMMUX(sd-insert, 15, gpio_in);
		PMMUX(mmclk, 32, MMCLK);
		PMMUX(sd-ro, 78, gpio_in);
		PMMUX_LPM_LOW(sd-enable, 91, gpio_out);
		PMMUX(mmdat0, 92, MMDAT<0>);
		PMMUX(mmdat1, 109, MMDAT<1>);
		PMMUX(mmdat2, 110, MMDAT<2>);
		PMMUX(mmdat3, 111, MMDAT<3>);
		PMMUX(mmcmd, 112, MMCMD);
	};

The third column of PMMUX*() helpers can be found in pincontrol muxing
functions, either in pinctrl-pxa27x.c (or pinctrl-pxa25x.c), or by
inspecting the pincontrol once booted in debugfs.
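
The helpers rely on two preprocessor operators: "##" pastes "P" and the pin number into a group name, and "#" turns the group and the function into strings. The plain C sketch below shows the same mechanics. PIN_MUX() and struct pin_mux are hypothetical names for this illustration only, and the "mux- ## func" node name is left out because it only makes sense in devicetree source.

```c
/* Same definition as in the patch: stringify the (already pasted) token. */
#define PMGROUP(pin) #pin

struct pin_mux {
	const char *groups;   /* e.g. "P32" */
	const char *function; /* e.g. "MMCLK" */
};

/* Hypothetical C counterpart of PMMUX(): same pasting and stringification. */
#define PIN_MUX(pin, af) { PMGROUP(P ## pin), #af }

static const struct pin_mux mmc_default[] = {
	PIN_MUX(32, MMCLK),    /* { "P32",  "MMCLK"    } */
	PIN_MUX(92, MMDAT<0>), /* { "P92",  "MMDAT<0>" } */
	PIN_MUX(112, MMCMD),   /* { "P112", "MMCMD"    } */
};
```

The paste happens while the outer macro is expanded, so PMGROUP() already sees the single token P32 when it stringifies its argument.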

Signed-off-by: Robert Jarzmik 
---
 arch/arm/boot/dts/pxa2xx.dtsi | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm/boot/dts/pxa2xx.dtsi b/arch/arm/boot/dts/pxa2xx.dtsi
index e4ebcde17837..f609dbb86abf 100644
--- a/arch/arm/boot/dts/pxa2xx.dtsi
+++ b/arch/arm/boot/dts/pxa2xx.dtsi
@@ -9,6 +9,25 @@
 #include "skeleton.dtsi"
 #include "dt-bindings/clock/pxa-clock.h"
 
+#define PMGROUP(pin) #pin
+#define PMMUX(func, pin, af)   \
+   mux- ## func {  \
+   groups = PMGROUP(P ## pin); \
+   function = #af; \
+   }
+#define PMMUX_LPM_LOW(func, pin, af)   \
+   mux- ## func {  \
+   groups = PMGROUP(P ## pin); \
+   function = #af; \
+   low-power-disable;  \
+   }
+#define PMMUX_LPM_HIGH(func, pin, af)  \
+   mux- ## func {  \
+   groups = PMGROUP(P ## pin); \
+   function = #af; \
+   low-power-enable;   \
+   }
+
 / {
model = "Marvell PXA2xx family SoC";
compatible = "marvell,pxa2xx";
-- 
2.11.0

[PATCH 2/2] ARM: dts: pxa: add mioa701 board description

2018-07-01 Thread Robert Jarzmik

Add device-tree description of the Mitac MIO A701 board.
This is aimed at replacing the mioa701.c board file. Once stabilized,
the leftovers, such as the suspend/resume mechanics, will rely on a new
IPL rather than the legacy Windows CE one.

Signed-off-by: Robert Jarzmik 
---
This patch deserves some special "love review". As it will probably
serve for a more broad pxa conversion to devicetree of the other boards,
and because it touches almost all domains for a pxa platform (camera,
video, audio, i2c, ...), it should be as clean as possible so that
mistakes are not carried on ...

Therefore I expect the review of this one to be long (ie. it won't land
for v4.19), until it looks good enough.
---
 arch/arm/boot/dts/Makefile|   1 +
 arch/arm/boot/dts/mioa701.dts | 565 ++
 2 files changed, 566 insertions(+)
 create mode 100644 arch/arm/boot/dts/mioa701.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 37a3de760d40..6b12ab50c8d1 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -756,6 +756,7 @@ dtb-$(CONFIG_ARCH_PRIMA2) += \
 dtb-$(CONFIG_ARCH_OXNAS) += \
ox810se-wd-mbwe.dtb \
ox820-cloudengines-pogoplug-series-3.dtb
+dtb-$(CONFIG_ARCH_PXA) += mioa701.dtb
 dtb-$(CONFIG_ARCH_QCOM) += \
qcom-apq8060-dragonboard.dtb \
qcom-apq8064-arrow-sd-600eval.dtb \
diff --git a/arch/arm/boot/dts/mioa701.dts b/arch/arm/boot/dts/mioa701.dts
new file mode 100644
index ..680e0e44d526
--- /dev/null
+++ b/arch/arm/boot/dts/mioa701.dts
@@ -0,0 +1,565 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Copyright (C) 2018 Robert Jarzmik 
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+/dts-v1/;
+#include "pxa27x.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mitac Mio A701 Board";
+   /* compatible = "mitac,mioa701"; */
+   compatible = "marvell,pxa270";
+
+   chosen {
+   bootargs = 
"mtdparts=docg3.0:256k@3456k(barebox)ro,256k(barebox-logo),128k(barebox-env),4M(kernel),-(root)
 ubi.mtd=4 rootfstype=ubifs root=ubi0:linux_root ro";
+   };
+
+   memory {
+   reg = <0xa000 0x0400>;
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   pstore_region:region@0xa200 {
+   compatible = "linux,contiguous-memory-region";
+   reg = <0xa200 1048576>;
+   };
+   };
+   };
+
+   cpus {
+   cpu {
+   cpu-supply = <&vcc_core>;
+   };
+   };
+
+   pxabus {
+   pinctrl: pinctrl@40e0 {
+   status = "okay";
+   pinctrl_ac97_default: ac97-default {
+   PMMUX(hpjack-detect, 12, gpio_in);
+   PMMUX(ac97-bitclk, 28, AC97_BITCLK);
+   PMMUX(ac97-sdata-in-0, 29, AC97_SDATA_IN_0);
+   PMMUX(ac97-sdata-out, 30, AC97_SDATA_OUT);
+   PMMUX(ac97-sync, 31, AC97_SYNC);
+   PMMUX(ac97-sysclk, 89, AC97_SYSCLK);
+   };
+   pinctrl_btuart_default: btuart-default {
+   PMMUX(btuart-nactivity, 14, gpio_in);
+   PMMUX(btuart-rxd, 42, BTRXD);
+   PMMUX(btuart-txd, 43, BTTXD);
+   PMMUX(btuart-cts, 44, BTCTS);
+   PMMUX(btuart-rts, 45, BTRTS);
+   PMMUX_LPM_LOW(bt-on, 83, gpio_out);
+   PMMUX_LPM_HIGH(bt-unknown, 77, gpio_out);
+   PMMUX_LPM_HIGH(bt-nreset, 86, gpio_out);
+   };
+   pinctrl_ffuart_default: ffuart-default {
+   PMMUX(ffuart-rxd, 34, FFRXD);
+   PMMUX(ffuart-cts, 35, FFCTS);
+   PMMUX(ffuart-dcd, 36, FFDCD);
+   PMMUX(ffuart-dsr, 37, FFDSR);
+   PMMUX(ffuart-txd, 39, FFTXD);
+   PMMUX(ffuart-dtr, 40, FFDTR);
+   PMMUX(ffuart-rts, 41, FFRTS);
+   PMMUX_LPM_LOW(gsm-reset, 24, gpio_out);
+   PMMUX(gsm-is-on, 25, gpio_in);
+   PMMUX_LPM_HIGH(gsm-nset-on, 88, gpio_out);
+   PMMUX_LPM_HIGH(gsm-nset-off, 90, gpio_out);
+   PMMUX(gsm-event-available, 113, gpio_in);
+   PMMUX_LPM_HIGH(gsm-dte-state, 1

Re: [PATCH v2 0/6] mm/fs: gup: don't unmap or drop filesystem buffers

2018-07-01 Thread John Hubbard

On 07/01/2018 05:56 PM, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 

There were some typos in patches #4 and #5, which I've fixed locally.
Let me know if anyone would like me to repost with those right away, otherwise
I'll wait for other review besides the kbuild test robot.

Meanwhile, for convenience, you can pull down the latest version of the
patchset from:

g...@github.com:johnhubbard/linux (branch: gup_dma_next)

thanks,
-- 
John Hubbard
NVIDIA

[PATCH v5 3/6] fs/dcache: Enable automatic pruning of negative dentries

2018-07-01 Thread Waiman Long

It is not good enough to have a soft limit for the number of
negative dentries in the system and print a warning if that limit is
exceeded. We need to do something about it when this happens.

This patch enables automatic pruning of negative dentries when
the soft limit is going to be exceeded.  This is done by using the
workqueue API to do the pruning gradually when a threshold is reached
to minimize performance impact on other running tasks.

The current threshold is 1/4 of the initial value of the free pool
count. Once the threshold is reached, the automatic pruning process
kicks in to replenish the free pool. Each pruning run scans 64 dentries
per LRU list and can remove up to 256 negative dentries, which keeps the
LRU lock hold time short. The pruning rate is 50 Hz if the free pool
count is less than 1/8 of the original value and 10 Hz otherwise.

The dentry pruning operation may also free some least recently used
positive dentries.

In the unlikely event that a superblock is being umount'ed while in
negative dentry pruning mode, the umount may face an additional delay
of up to 0.1s.
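
The thresholds above boil down to two comparisons. The sketch below models them in userspace, using milliseconds instead of jiffies. The function names are made up for this illustration, and the real code in the diff works on ndblk.nfree and neg_dentry_nfree_init.

```c
#include <stdbool.h>

/* Start pruning once the free pool drops below 1/4 of its initial value. */
static bool neg_pruning_needed(long nfree, long nfree_init)
{
	return nfree < nfree_init / 4;
}

/*
 * Delay until the next pruning run, in milliseconds: 50 Hz (20 ms) while the
 * free pool is below 1/8 of its initial value, 10 Hz (100 ms) otherwise.
 */
static unsigned int neg_pruning_delay_ms(long nfree, long nfree_init)
{
	return nfree < nfree_init / 8 ? 1000 / 50 : 1000 / 10;
}
```

The 100 ms slow-rate delay is also where the "up to 0.1s" umount delay comes from.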

Signed-off-by: Waiman Long 
---
 fs/dcache.c  | 158 +++
 include/linux/list_lru.h |   1 +
 mm/list_lru.c|   4 +-
 3 files changed, 162 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 889d3bb..6d00f52 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -136,6 +136,11 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_PC_DEFAULT  2
 #define NEG_DENTRY_BATCH   (1 << 8)
+#define NEG_PRUNING_SIZE   (1 << 6)
+#define NEG_PRUNING_SLOW_RATE  (HZ/10)
+#define NEG_PRUNING_FAST_RATE  (HZ/50)
+#define NEG_IS_SB_UMOUNTING(sb)\
+   unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
@@ -143,8 +148,17 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
raw_spinlock_t nfree_lock;
+   int niter;  /* Pruning iteration count */
+   int lru_count;  /* Per-LRU pruning count */
+   long n_neg; /* # of negative dentries pruned */
+   long n_pos; /* # of positive dentries pruned */
long nfree; /* Negative dentry free pool */
+   struct super_block *prune_sb;   /* Super_block for pruning */
 } ndblk cacheline_aligned_in_smp;
+
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry_neg);
 #endif
 
@@ -330,6 +344,25 @@ static void __neg_dentry_inc(struct dentry *dentry)
 */
if (!cnt)
pr_warn_once("Too many negative dentries.");
+
+   /*
+* Initiate negative dentry pruning if free pool has less than
+* 1/4 of its initial value.
+*/
+   if ((READ_ONCE(ndblk.nfree) < READ_ONCE(neg_dentry_nfree_init)/4) &&
+   !READ_ONCE(ndblk.prune_sb) &&
+   !cmpxchg(&ndblk.prune_sb, NULL, dentry->d_sb)) {
+   /*
+* Abort if umounting is in progress, otherwise take a
+* reference and move on.
+*/
+   if (NEG_IS_SB_UMOUNTING(ndblk.prune_sb)) {
+   WRITE_ONCE(ndblk.prune_sb, NULL);
+   } else {
+   atomic_inc(&ndblk.prune_sb->s_active);
+   schedule_delayed_work(&prune_neg_dentry_work, 1);
+   }
+   }
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -1368,6 +1401,131 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+   struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+   struct list_head *freeable = arg;
+   struct dentry   *dentry = container_of(item, struct dentry, d_lru);
+   enum lru_status status = LRU_SKIP;
+
+   /*
+* Limit amount of dentry walking in each LRU list.
+*/
+   if (ndblk.lru_count >= NEG_PRUNING_SIZE) {
+   ndblk.lru_count = 0;
+   return LRU_STOP;
+   }
+   ndblk.lru_count++;
+
+   /*
+* we are inverting the lru lock/dentry->d_lock here,
+* so use a trylock. If we fail to get the lock, just skip
+* it
+*/
+   if (!spin_trylock(&dentry->d_lock))
+   return LRU_SKIP;
+
+   /*
+* Referenced dentries are still in use. If they have active
+* counts, just remove th

[PATCH v5 4/6] fs/dcache: Spread negative dentry pruning across multiple CPUs

2018-07-01 Thread Waiman Long

Doing negative dentry pruning using schedule_delayed_work() will
typically concentrate the pruning effort on one particular CPU. That is
not fair to the tasks running on that CPU. In addition, it is possible
that one CPU can have all its negative dentries pruned away while the
others can still have more negative dentries than the percpu limit.

To be fair, negative dentries pruning is now done across all the online
CPUs, if they all have close to the percpu limit of negative dentries.
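
The CPU selection amounts to a round-robin walk over the online CPUs plus one comparison. Below is a userspace model of that logic, with the online mask reduced to a plain bitmask. Both function names are hypothetical, and the sketch assumes at most 32 CPU ids.

```c
/*
 * Return the next set bit after 'cpu' in 'online_mask', wrapping around to
 * the lowest set bit. Models cpumask_next() followed by cpumask_first().
 * Returns -1 if the mask is empty. Assumes nr_cpu_ids <= 32.
 */
static int next_online_cpu(int cpu, unsigned int online_mask, int nr_cpu_ids)
{
	for (int step = 1; step <= nr_cpu_ids; step++) {
		int candidate = (cpu + step) % nr_cpu_ids;

		if (online_mask & (1u << candidate))
			return candidate;
	}
	return -1;
}

/*
 * Decide where the next pruning run goes: stay on 'cpu' unless it is already
 * below the per-CPU limit and the next CPU holds more negative dentries.
 */
static int pick_pruning_cpu(int cpu, int next_cpu, long this_count,
			    long next_count, long percpu_limit)
{
	if (percpu_limit - this_count > 0 && next_count > this_count)
		return next_cpu;
	return cpu;
}
```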

Signed-off-by: Waiman Long 
---
 fs/dcache.c | 43 ++-
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6d00f52..4f34f53 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -360,7 +360,8 @@ static void __neg_dentry_inc(struct dentry *dentry)
WRITE_ONCE(ndblk.prune_sb, NULL);
} else {
atomic_inc(&ndblk.prune_sb->s_active);
-   schedule_delayed_work(&prune_neg_dentry_work, 1);
+   schedule_delayed_work_on(smp_processor_id(),
+   &prune_neg_dentry_work, 1);
}
}
 }
@@ -1467,8 +1468,9 @@ static enum lru_status dentry_negative_lru_isolate(struct 
list_head *item,
  */
 static void prune_negative_dentry(struct work_struct *work)
 {
+   int cpu = smp_processor_id();
int freed, last_n_neg;
-   long nfree;
+   long nfree, excess;
struct super_block *sb = READ_ONCE(ndblk.prune_sb);
LIST_HEAD(dispose);
 
@@ -1502,9 +1504,40 @@ static void prune_negative_dentry(struct work_struct 
*work)
(nfree >= neg_dentry_nfree_init/2) || NEG_IS_SB_UMOUNTING(sb))
goto stop_pruning;
 
-   schedule_delayed_work(&prune_neg_dentry_work,
-(nfree < neg_dentry_nfree_init/8)
-? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
+   /*
+* If the negative dentry count in the current cpu is less than the
+* per_cpu limit, schedule the pruning in the next cpu if it has
+* more negative dentries. This will make the negative dentry count
+* reduction spread more evenly across multiple per-cpu counters.
+*/
+   excess = neg_dentry_percpu_limit - __this_cpu_read(nr_dentry_neg);
+   if (excess > 0) {
+   int next_cpu = cpumask_next(cpu, cpu_online_mask);
+
+   if (next_cpu >= nr_cpu_ids)
+   next_cpu = cpumask_first(cpu_online_mask);
+   if (per_cpu(nr_dentry_neg, next_cpu) >
+   __this_cpu_read(nr_dentry_neg)) {
+   cpu = next_cpu;
+
+   /*
+* Transfer some of the excess negative dentry count
+* to the free pool if the current percpu pool is less
+* than 3/4 of the limit.
+*/
+   if ((excess > neg_dentry_percpu_limit/4) &&
+   raw_spin_trylock(&ndblk.nfree_lock)) {
+   WRITE_ONCE(ndblk.nfree,
+  ndblk.nfree + NEG_DENTRY_BATCH);
+   __this_cpu_add(nr_dentry_neg, NEG_DENTRY_BATCH);
+   raw_spin_unlock(&ndblk.nfree_lock);
+   }
+   }
+   }
+
+   schedule_delayed_work_on(cpu, &prune_neg_dentry_work,
+   (nfree < neg_dentry_nfree_init/8)
+   ? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
return;
 
 stop_pruning:
-- 
1.8.3.1

[PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries

2018-07-01 Thread Waiman Long

 v4->v5:
  - Rebased to the latest 4.18 kernel and modified the code
    accordingly. Patch 1 "Relocate dentry_kill() after lock_parent()"
    is no longer necessary.
  - Make tracking and limiting of negative dentries a user-configurable
    option (CONFIG_DCACHE_TRACK_NEG_ENTRY) so that users can decide if
they want to include this capability in the kernel.
  - Make killing excess negative dentries an optional feature that can be
enabled via a boot command line option or a sysctl parameter.
  - Spread negative dentry pruning across multiple CPUs.

 v4: https://lkml.org/lkml/2017/9/18/739

A rogue application can potentially create a large number of negative
dentries in the system, consuming most of the available memory, if it
is not under the direct control of a memory controller that enforces a
kernel memory limit.

This patchset introduces changes to the dcache subsystem to track and
optionally limit the number of negative dentries, either by pruning
excess negative dentries in the background or by killing them right
after use. This capability will help to limit the amount of memory that
can be consumed by negative dentries.

Patch 1 tracks the number of negative dentries present in the LRU
lists and reports it in /proc/sys/fs/dentry-state.

Patch 2 makes negative dentry tracking a user-configurable option
(CONFIG_DCACHE_TRACK_NEG_ENTRY) and adds a "neg_dentry_pc=" boot
command line option to specify a soft limit on the number of negative
dentries allowed as a percentage of total system memory. The default is 2%.

Patch 3 enables automatic pruning of least recently used negative
dentries when the total number is close to the preset limit.

Patch 4 spreads the negative dentry pruning effort to multiple CPUs to
make it more fair.

Patch 5 extends the "neg_dentry_pc=" boot command line option to
optionally enable enforcing the limit by killing off excess negative
dentries immediately after use.

Patch 6 makes the limit enforcing option a sysctl parameter so that it
can be dynamically enabled at run time if the need arises, for example,
when a rogue application generating a lot of negative dentries is
detected.
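
As a rough feel for what the percentage-based soft limit from patch 2 means in dentries, here is a back-of-the-envelope sketch. The series' actual formula is not shown in this cover letter, so the function name, the order of operations, and the 192-byte dentry size (typical for 64-bit builds) are all assumptions.

```c
/* Assumed dentry size for this sketch; 192 bytes is typical on 64-bit. */
#define SKETCH_DENTRY_SIZE 192ULL

/*
 * Estimate how many negative dentries fit into 'percent' of total memory.
 * A percentage of 0 means "no limit" in the series; here it yields 0.
 */
static unsigned long long neg_dentry_limit(unsigned long long total_mem_bytes,
					   unsigned int percent)
{
	return total_mem_bytes * percent / 100 / SKETCH_DENTRY_SIZE;
}
```

Under these assumptions, 2% of 1 GiB works out to a bit over 110k dentries, which is in line with the "about 100k" figure quoted in the patch 5 comment.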

Waiman Long (6):
  fs/dcache: Track & report number of negative dentries
  fs/dcache: Make negative dentry tracking configurable
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Spread negative dentry pruning across multiple CPUs
  fs/dcache: Allow optional enforcement of negative dentry limit
  fs/dcache: Make negative dentry limit enforcement sysctl parameter

 Documentation/admin-guide/kernel-parameters.txt |  12 +
 Documentation/sysctl/fs.txt |  30 +-
 fs/Kconfig  |  10 +
 fs/dcache.c | 452 +++-
 include/linux/dcache.h  |  13 +-
 include/linux/list_lru.h|   1 +
 kernel/sysctl.c |  11 +
 mm/list_lru.c   |   4 +-
 8 files changed, 519 insertions(+), 14 deletions(-)

-- 
1.8.3.1

[PATCH v5 5/6] fs/dcache: Allow optional enforcement of negative dentry limit

2018-07-01 Thread Waiman Long

If a rogue application that generates a large number of negative
dentries is running, the automatic negative dentry pruning process
may not be fast enough to clear up the negative dentries in time. In
this case, it is possible that negative dentries will use up most
of the available memory in the system when that application is not
under the control of a memory cgroup that limits kernel memory.

The lack of available memory may significantly affect the operation
of other applications running in the system. It may even lead to OOM
kill of useful applications.

To allow system administrators the option to prevent this extreme
situation from happening, the "enforce" option can now be added to
the "neg_dentry_pc" kernel parameter to enforce the negative dentry
limit. When the limit is enforced, extra negative dentries that exceed
the limit will be killed after use instead of leaving them in the LRU.
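
For illustration, parsing a value of the form "number, optionally followed by ,enforce" can be sketched in userspace as below. This is not the parser from the patch (the relevant hunk is not shown here). The struct and function names are hypothetical, and the 0-10 range is taken from the kernel-parameters.txt text.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct neg_dentry_opts {
	int percent;  /* 0-10, where 0 means no limit */
	bool enforce; /* kill excess negative dentries after use */
};

/*
 * Parse "N" or "N,enforce". Returns 0 on success and -1 on malformed input
 * or an out-of-range percentage; 'opts' is only complete on success.
 */
static int parse_neg_dentry_pc(const char *arg, struct neg_dentry_opts *opts)
{
	char *end;
	long pc = strtol(arg, &end, 10);

	if (end == arg || pc < 0 || pc > 10)
		return -1;
	if (*end == '\0')
		opts->enforce = false;
	else if (strcmp(end, ",enforce") == 0)
		opts->enforce = true;
	else
		return -1;
	opts->percent = (int)pc;
	return 0;
}
```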

Signed-off-by: Waiman Long 
---
 Documentation/admin-guide/kernel-parameters.txt |  5 +-
 fs/dcache.c | 94 +++--
 include/linux/dcache.h  |  2 +-
 3 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index b7ab98a..05531a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2468,8 +2468,11 @@
allowable in a system as a percentage of the
total system memory. The default is 2% and the
valid range is 0-10 where 0 means no limit.
+   The optional "enforce" option can be added to
+   enforce the limit by killing excessive negative
+   dentries.
 
-   Format: 
+   Format: [,enforce]
 
netdev= [NET] Network devices parameters
Format: 
diff --git a/fs/dcache.c b/fs/dcache.c
index 4f34f53..77910c9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -124,7 +124,10 @@ struct dentry_stat_t dentry_stat = {
  * allowed in the super blocks' LRU lists, if enabled. The default limit
  * is 2% of the total system memory. On a 64-bit system with 1G memory,
  * that translated to about 100k dentries which is quite a lot. The limit
- * can be changed by using the "neg_dentry_pc" kernel parameter.
+ * can be changed by using the "neg_dentry_pc" kernel parameter. An
+ * optional "enforce" option can be added to enforce the limit by
+ * destroying extra negative dentries after use when the limit is
+ * exceeded.
  *
  * To avoid performance problem with a global counter on an SMP system,
  * the tracking is done mostly on a per-cpu basis. The total limit is
@@ -143,6 +146,7 @@ struct dentry_stat_t dentry_stat = {
unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+static int enforce_neg_dentry_limit __read_mostly;
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
@@ -276,6 +280,9 @@ static inline int dentry_string_cmp(const unsigned char 
*cs, const unsigned char
 #endif
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+
+static void d_lru_del(struct dentry *dentry);
+
 /*
  * Decrement negative dentry count if applicable.
  */
@@ -318,8 +325,12 @@ static long __neg_dentry_nfree_dec(void)
 
 /*
  * Increment negative dentry count if applicable.
+ *
+ * The retain flag will only be set when calling from
+ * __d_clear_type_and_inode() so as to retain the entry even
+ * if the negative dentry limit has been exceeded.
  */
-static void __neg_dentry_inc(struct dentry *dentry)
+static void __neg_dentry_inc(struct dentry *dentry, bool retain)
 {
long cnt = 0, *pcnt;
 
@@ -340,10 +351,18 @@ static void __neg_dentry_inc(struct dentry *dentry)
put_cpu_ptr(&nr_dentry_neg);
 
/*
-* Put out a warning if there are too many negative dentries.
+* Put out a warning if there are too many negative dentries or
+* kill it by removing it from the LRU and set the
+* DCACHE_KILL_NEGATIVE flag if the enforce option is on.
 */
-   if (!cnt)
-   pr_warn_once("Too many negative dentries.");
+   if (!cnt) {
+   if (enforce_neg_dentry_limit && !retain) {
+   dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+   d_lru_del(dentry);
+   } else {
+   pr_warn_once("Too many negative dentries.");
+   }
+   }
 
/*
 * Initiate negative dentry pruning if free pool has less than
@@ -369,7 +388,7 @@ static void __neg_dentry_inc(struct dentry *dentry)
 static inline void neg_dentry_inc(struct dentry *dentry)
 {
if (unlikely(d_is_negativ

[PATCH v5 1/6] fs/dcache: Track & report number of negative dentries

2018-07-01 Thread Waiman Long

The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.

As an excessive number of negative dentries can have an impact on system
performance, it is wise to track the number of positive and negative
dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the /proc/sys/fs/dentry-state
file.  The number of positive dentries in the LRU lists can be found
by subtracting the number of negative dentries from the total.
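
As a rough userspace illustration of the subtraction described above, the
sketch below parses one line in the dentry-state format and derives the
number of positive dentries in the LRU lists. The struct and function names
are hypothetical and not part of this patch; the field order is taken from
the dentry_stat layout documented in fs.txt. The clamp to zero is an
assumption, added because the counters are sampled separately.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the six dentry-state fields. */
struct dentry_state {
	long nr_dentry;    /* total allocated dentries (active + unused) */
	long nr_unused;    /* dentries kept in the LRU lists */
	long age_limit;    /* age in seconds */
	long want_pages;   /* pages requested by system */
	long nr_negative;  /* unused negative dentries (field added here) */
	long dummy;
};

/* Parse one dentry-state line; returns 0 on success, -1 on a short line. */
int parse_dentry_state(const char *line, struct dentry_state *st)
{
	int n;

	assert(line && st);
	n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
		   &st->nr_dentry, &st->nr_unused, &st->age_limit,
		   &st->want_pages, &st->nr_negative, &st->dummy);
	return n == 6 ? 0 : -1;
}

/* Positive dentries in the LRU lists = unused total - negative ones. */
long nr_positive_unused(const struct dentry_state *st)
{
	long diff;

	assert(st);
	diff = st->nr_unused - st->nr_negative;
	return diff < 0 ? 0 : diff;	/* assumed clamp, see lead-in */
}
```

A caller would read the line from /proc/sys/fs/dentry-state with fgets()
and pass it to parse_dentry_state().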

Signed-off-by: Waiman Long 
---
 Documentation/sysctl/fs.txt | 19 +--
 fs/dcache.c | 45 +
 include/linux/dcache.h  |  7 ---
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 6c00c1e..a8e3f1f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -61,19 +61,26 @@ struct {
 int nr_unused;
 int age_limit; /* age in seconds */
 int want_pages;/* pages requested by system */
-int dummy[2];
+int nr_negative;   /* # of unused negative dentries */
+int dummy;
 } dentry_stat = {0, 0, 45, 0,};
--- 
+--
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
 
-Dentries are dynamically allocated and deallocated, and
-nr_dentry seems to be 0 all the time. Hence it's safe to
-assume that only nr_unused, age_limit and want_pages are
-used. Nr_unused seems to be exactly what its name says.
 Age_limit is the age in seconds after which dcache entries
 can be reclaimed when memory is short and want_pages is
 nonzero when shrink_dcache_pages() has been called and the
 dcache isn't pruned yet.
 
+nr_negative shows the number of unused dentries that are also
+negative dentries, i.e. dentries that are not mapped to actual files,
+if negative dentry tracking is enabled.
+
 ==
 
 dquot-max & dquot-nr:
diff --git a/fs/dcache.c b/fs/dcache.c
index 0e8e5de..dbab6c2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -119,6 +119,7 @@ struct dentry_stat_t dentry_stat = {
 
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -152,11 +153,22 @@ static long get_nr_dentry_unused(void)
return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+   int i;
+   long sum = 0;
+
+   for_each_possible_cpu(i)
+   sum += per_cpu(nr_dentry_neg, i);
+   return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
   size_t *lenp, loff_t *ppos)
 {
dentry_stat.nr_dentry = get_nr_dentry();
dentry_stat.nr_unused = get_nr_dentry_unused();
+   dentry_stat.nr_negative = get_nr_dentry_neg();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char 
*cs, const unsigned char
 
 #endif
 
+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+   this_cpu_dec(nr_dentry_neg);
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+   if (unlikely(d_is_negative(dentry)))
+   __neg_dentry_dec(dentry);
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+   this_cpu_inc(nr_dentry_neg);
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+   if (unlikely(d_is_negative(dentry)))
+   __neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char 
*ct, unsigned tcount)
 {
/*
@@ -330,6 +364,8 @@ static inline void __d_clear_type_and_inode(struct dentry 
*dentry)
flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
WRITE_ONCE(dentry->d_flags, flags);
dentry->d_inode = NULL;
+   if (dentry->d_flags & DCACHE_LRU_LIST)
+   __neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -397,6 +433,7 @@ static void d_lru_add(struct dentry *dentry)
dentry->d_flags |= DCACHE_LRU_LIST;
this_cpu_inc(nr_dentry_unused);
WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, 
&dentry->d_lru));
+   neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -405,6 +442,7 @@ static void d_lru_del(struct dentry *dentry)
dentry->d_flags &= ~DCACHE_LRU_LIST;
this_cpu_dec(nr_d

Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

2018-07-01 Thread Leon Romanovsky

On Thu, Jun 28, 2018 at 11:17:43AM +0200, Jan Kara wrote:
> On Wed 27-06-18 19:42:01, John Hubbard wrote:
> > On 06/27/2018 10:02 AM, Jan Kara wrote:
> > > On Wed 27-06-18 08:57:18, Jason Gunthorpe wrote:
> > >> On Wed, Jun 27, 2018 at 02:42:55PM +0200, Jan Kara wrote:
> > >>> On Wed 27-06-18 13:59:27, Michal Hocko wrote:
> >  On Wed 27-06-18 13:53:49, Jan Kara wrote:
> > > On Wed 27-06-18 13:32:21, Michal Hocko wrote:
> >  [...]
> > > >> Apart from that, do we really care about 32b here? Big DIO, IB users
> > >> seem to be 64b only AFAIU.
> > >
> > > > IMO it is a bad habit to leave unprivileged-user-triggerable oops in 
> > > the
> > > kernel even for uncommon platforms...
> > 
> >  Absolutely agreed! I didn't mean to keep the blow up for 32b. I just
> >  wanted to say that we can stay with a simple solution for 32b. I 
> >  thought
> >  the g-u-p-longterm has plugged the most obvious breakage already. But
> >  maybe I just misunderstood.
> > >>>
> > >>> Most yes, but if you try hard enough, you can still trigger the oops 
> > >>> e.g.
> > >>> with appropriately set up direct IO when racing with writeback / 
> > >>> reclaim.
> > >>
> > >> gup longterm is only different from normal gup if you have DAX and few
> > >> people do, which really means it doesn't help at all.. AFAIK??
> > >
> > > Right, what I wrote works only for DAX. For non-DAX situation g-u-p
> > > longterm does not currently help at all. Sorry for confusion.
> > >
> >
> > OK, I've got an early version of this up and running, reusing the page->lru
> > fields. I'll clean it up and do some heavier testing, and post as a PATCH 
> > v2.
>
> Cool.
>
> > One question though: I'm still vague on the best actions to take in the
> > following functions:
> >
> > page_mkclean_one
> > try_to_unmap_one
> >
> > At the moment, they are both just doing an evil little early-out:
> >
> > if (PageDmaPinned(page))
> > return false;
> >
> > ...but we talked about maybe waiting for the condition to clear, instead?
> > Thoughts?
>
> What needs to happen in page_mkclean() depends on the caller. Most of the
> callers really need to be sure the page is write-protected once
> page_mkclean() returns. Those are:
>
>   pagecache_isize_extended()
>   fb_deferred_io_work()
>   clear_page_dirty_for_io() if called for data-integrity writeback - which
> is currently known only in its caller (e.g. write_cache_pages()) where
> it can be determined as wbc->sync_mode == WB_SYNC_ALL. Getting this
> information into page_mkclean() will require some plumbing and
> clear_page_dirty_for_io() has some 50 callers but it's doable.
>
> clear_page_dirty_for_io() for cleaning writeback (wbc->sync_mode !=
> WB_SYNC_ALL) can just skip pinned pages and we probably need to do that as
> otherwise memory cleaning would get stuck on pinned pages until RDMA
> drivers release its pins.

Sorry for the naive question, but won't this create so many dirty pages
that writeback will be called "non-stop" to rebalance the watermarks
without being able to make progress?
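
For reference, the policy Jan describes in the quoted text could be modelled
roughly as below. This is only a userspace sketch: the enum values mirror the
WB_SYNC_* names used in the thread, but pinned_page_action() and the action
names are hypothetical, and what exactly has to happen in the data-integrity
case is still open in this discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Model-only copies of the writeback modes named in the thread. */
enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/* Hypothetical outcomes for a page reaching page_mkclean(). */
enum pinned_action {
	PAGE_PROCEED,        /* not pinned: write-protect and clean as usual */
	PAGE_SKIP,           /* cleaning writeback: leave the pinned page dirty */
	PAGE_NEEDS_PROTECT,  /* data-integrity writeback: caller relies on the
			      * page being write-protected on return */
};

enum pinned_action pinned_page_action(bool dma_pinned, enum sync_mode mode)
{
	assert(mode == WB_SYNC_NONE || mode == WB_SYNC_ALL);
	if (!dma_pinned)
		return PAGE_PROCEED;
	return mode == WB_SYNC_ALL ? PAGE_NEEDS_PROTECT : PAGE_SKIP;
}
```

The PAGE_SKIP branch is the one my question is about: every skipped page
stays dirty until the pin is released.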

Thanks



[PATCH v5 6/6] fs/dcache: Make negative dentry limit enforcement sysctl parameter

2018-07-01 Thread Waiman Long

It can be useful to make negative dentry limit enforcement a runtime
tuning parameter instead of just a boot time option. This allows a system
administrator to disable limit enforcement in normal use, but turn it
on under certain circumstances.

A new /proc/sys/fs/enforce-neg-dentry-limit sysctl parameter is now
added. This is a boolean flag that accepts a value of either 0 or 1
for disabling and enabling enforcement of negative dentry limit
respectively.
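
To illustrate the intended 0/1 semantics, here is a small userspace model of
an integer bounded by extra1 = &zero and extra2 = &one, as in the sysctl
table entry below. bounded_int_write() and set_enforce_neg_dentry_limit()
are hypothetical helpers, not kernel functions. The sketch assumes that
out-of-range writes are rejected with -EINVAL rather than clamped, which is
my understanding of how proc_dointvec_minmax() behaves.

```c
#include <assert.h>
#include <errno.h>

static const int zero = 0;
static const int one = 1;

/*
 * Hypothetical model of a min/max-checked integer write: store val only
 * if it lies within [*min, *max], otherwise leave *data untouched.
 */
int bounded_int_write(int *data, int val, const int *min, const int *max)
{
	assert(data);
	if ((min && val < *min) || (max && val > *max))
		return -EINVAL;
	*data = val;
	return 0;
}

/* Model of writing the boolean enforce flag: only 0 and 1 are accepted. */
int set_enforce_neg_dentry_limit(int *flag, int val)
{
	return bounded_int_write(flag, val, &zero, &one);
}
```

On a running system the equivalent operation is simply writing "0" or "1"
to the sysctl file.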

Signed-off-by: Waiman Long 
---
 Documentation/sysctl/fs.txt | 11 +++
 fs/dcache.c |  4 +++-
 include/linux/dcache.h  |  4 
 kernel/sysctl.c | 11 +++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index a8e3f1f..ef8fd32 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -24,6 +24,7 @@ Currently, these files are in /proc/sys/fs:
 - dentry-state
 - dquot-max
 - dquot-nr
+- enforce-neg-dentry-limit
 - file-max
 - file-nr
 - inode-max
@@ -97,6 +98,16 @@ you might want to raise the limit.
 
 ==
 
+enforce-neg-dentry-limit:
+
+The file enforce-neg-dentry-limit, if present, contains a boolean
+flag (0 or 1) indicating if the negative dentries limit set by
+the "neg_dentry_pc" kernel parameter should be enforced or not.
+If enforced, excess negative dentries over the limit will be killed
+immediately after use.
+
+==
+
 file-max & file-nr:
 
 The value in file-max denotes the maximum number of file-
diff --git a/fs/dcache.c b/fs/dcache.c
index 77910c9..f50886e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -146,7 +146,9 @@ struct dentry_stat_t dentry_stat = {
unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
-static int enforce_neg_dentry_limit __read_mostly;
+int enforce_neg_dentry_limit __read_mostly;
+EXPORT_SYMBOL_GPL(enforce_neg_dentry_limit);
+
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 69b8cb3..bd7238a0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -610,4 +610,8 @@ struct name_snapshot {
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+extern int enforce_neg_dentry_limit;
+#endif
+
 #endif /* __LINUX_DCACHE_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2d9837c..94f6f6c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1849,6 +1849,17 @@ static int sysrq_sysctl_handler(struct ctl_table *table, 
int write,
.proc_handler   = proc_dointvec_minmax,
.extra1 = &one,
},
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+   {
+   .procname   = "enforce-neg-dentry-limit",
+   .data   = &enforce_neg_dentry_limit,
+   .maxlen = sizeof(enforce_neg_dentry_limit),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &zero,
+   .extra2 = &one,
+   },
+#endif
{ }
 };
 
-- 
1.8.3.1

[PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable

2018-07-01 Thread Waiman Long

The negative dentry tracking is made a configurable option so that
users who don't care about negative dentry tracking will have the
option to disable it. The new config option DCACHE_TRACK_NEG_ENTRY
is disabled by default.

If this option is enabled, a new kernel parameter "neg_dentry_pc=<%>"
allows users to set the soft limit on how many negative dentries are
allowed as a percentage of the total system memory. The default is 2%
and the new parameter accepts a range of 0-10%, where 0% means there
is no limit.

When the soft limit is reached, a warning message will be printed to
the console to alert the system administrator.
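
As a worked example of how the percentage translates into a dentry count,
the sketch below models the arithmetic in userspace. It is not the code in
this patch: the function and struct names are hypothetical, the 192-byte
dentry size is an assumed typical value for 64-bit systems, and the 80/20
split between per-cpu counters and the global free pool follows the comment
added to fs/dcache.c.

```c
#include <assert.h>

/* Hypothetical result of the limit calculation. */
struct neg_dentry_limits {
	long long total;      /* system-wide soft limit, 0 = no limit */
	long long percpu;     /* share given to each per-cpu counter */
	long long free_pool;  /* remainder kept in the global free pool */
};

struct neg_dentry_limits
neg_dentry_limits_calc(unsigned long long mem_bytes, int pc,
		       unsigned long dentry_size, int ncpus)
{
	struct neg_dentry_limits l = { 0, 0, 0 };

	assert(pc >= 0 && pc <= 10);	/* documented valid range */
	assert(dentry_size > 0 && ncpus > 0);
	if (pc == 0)
		return l;		/* 0% means no limit */

	/* pc percent of memory, expressed as a number of dentries. */
	l.total = (long long)(mem_bytes / 100 * pc / dentry_size);
	/* 80% is spread evenly over the CPUs ... */
	l.percpu = l.total * 4 / 5 / ncpus;
	/* ... and the rest (about 20% plus rounding) forms the free pool. */
	l.free_pool = l.total - l.percpu * ncpus;
	return l;
}
```

With 1G of memory, the default 2% and the assumed dentry size, this gives a
limit of a little over 110k dentries, which is in line with the "about 100k"
figure quoted in the code comment.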

Signed-off-by: Waiman Long 
---
 Documentation/admin-guide/kernel-parameters.txt |   9 ++
 fs/Kconfig  |  10 ++
 fs/dcache.c | 170 +++-
 3 files changed, 184 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7..b7ab98a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2462,6 +2462,15 @@
 
n2= [NET] SDL Inc. RISCom/N2 synchronous serial card
 
+   neg_dentry_pc=
+   With "CONFIG_DCACHE_TRACK_NEG_ENTRY=y", specify
+   the limit for the number negative dentries
+   allowable in a system as a percentage of the
+   total system memory. The default is 2% and the
+   valid range is 0-10 where 0 means no limit.
+
+   Format: 
+
netdev= [NET] Network devices parameters
Format: 
Note that mem_start is often overloaded to mean
diff --git a/fs/Kconfig b/fs/Kconfig
index ac474a6..2e81637 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -113,6 +113,16 @@ source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
 
+#
+# Track and limit the number of negative dentries allowed in the system.
+#
+config DCACHE_TRACK_NEG_ENTRY
+   bool "Track & limit negative dcache entries"
+   default n
+   help
+ This option enables the tracking and limiting of the total
+ number of negative dcache entries in the filesystem.
+
 menu "Caches"
 
 source "fs/fscache/Kconfig"
diff --git a/fs/dcache.c b/fs/dcache.c
index dbab6c2..889d3bb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -14,6 +14,8 @@
  * the dcache entry is deleted or garbage collected.
  */
 
+#define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -117,9 +119,37 @@ struct dentry_stat_t dentry_stat = {
.age_limit = 45,
 };
 
+/*
+ * There is a system-wide soft limit to the number of negative dentries
+ * allowed in the super blocks' LRU lists, if enabled. The default limit
+ * is 2% of the total system memory. On a 64-bit system with 1G memory,
+ * that translated to about 100k dentries which is quite a lot. The limit
+ * can be changed by using the "neg_dentry_pc" kernel parameter.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+#define NEG_DENTRY_PC_DEFAULT  2
+#define NEG_DENTRY_BATCH   (1 << 8)
+
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
+static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
+static struct {
+   raw_spinlock_t nfree_lock;
+   long nfree; /* Negative dentry free pool */
+} ndblk cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(long, nr_dentry_neg);
+#endif
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
-static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -153,6 +183,7 @@ static long get_nr_dentry_unused(void)
return sum < 0 ? 0 : sum;
 }
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
 static long get_nr_dentry_neg(void)
 {
int i;
@@ -160,8 +191,12 @@ static long get_nr_dentry_neg(void)
 
for_each_possible_cpu(i)
sum += per_cpu(nr_dentry_neg, i);
+   sum += neg_dentry_nfree_init - ndblk.nfree;
return sum < 0 ? 0 : sum;
 }
+#else
+static long get_nr_dentry_neg(void){ return 0L; }
+#endif
 
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
   size_t *lenp, loff_t *ppos)
@@ -226,9 +261,23 @@ static inline int dentry_string_cmp(const unsigned

Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()

2018-07-01 Thread Huang, Ying

Matthew Wilcox  writes:

> On Fri, Jun 22, 2018 at 11:51:33AM +0800, Huang, Ying wrote:
>> +++ b/mm/swap_state.c
>> @@ -433,7 +433,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
>> gfp_t gfp_mask,
>>  /*
>>   * Swap entry may have been freed since our caller observed it.
>>   */
>> -err = swapcache_prepare(entry);
>> +err = swapcache_prepare(entry, false);
>>  if (err == -EEXIST) {
>>  radix_tree_preload_end();
>>  /*
>
> This commit should be just a textual conflict.

Yes.  Will check it.

Best Regards,
Huang, Ying

Re: [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-07-01 Thread Huang, Ying

Matthew Wilcox  writes:

> On Fri, Jun 29, 2018 at 09:17:16AM +0800, Huang, Ying wrote:
>> Matthew Wilcox  writes:
>> > I'll take a look.  Honestly, my biggest problem with this patch set is
>> > overuse of tagging:
>> >
>> > 59832 Jun 22 Huang, Ying ( 131) [PATCH -mm -v4 00/21] mm, THP, 
>> > swap: Swa
>> > There's literally zero useful information displayed in the patch subjects.
>> 
>> Thanks!  What's your suggestion on tagging?  Only keep "mm" or "swap"?
>
> Subject: [PATCH v14 10/74] xarray: Add XArray tags
>
> I'm not sure where the extra '-' in front of '-v4' comes from.  I also
> wouldn't put the '-mm' in front of it -- that information can live in
> the cover letter's body rather than any patch's subject.
>
> I think 'swap:' implies "mm:", so yeah I'd just go with that.
>
> Subject: [PATCH v4 00/21] swap: Useful information here
>
> I'd see that as:
>
> 59832 Jun 22 Huang, Ying ( 131) [PATCH v4 00/21] swap: Useful 
> informatio

Looks good!  I will use this naming convention in the future.

> I had a quick look at your patches.  I think only two are affected by
> the XArray, and I'll make some general comments about them soon.

Thanks!

Best Regards,
Huang, Ying

[lkp-robot] [x86/kernel] b1ff47aace: WARNING:at_kernel/jump_label.c:#__jump_label_update

2018-07-01 Thread kernel test robot


FYI, we noticed the following commit (built with gcc-7):

commit: b1ff47aacea95e5be1bedf2aee740395b52f4591 ("[PATCH 5/5] x86/kernel: jump_table: use relative references")
url: https://github.com/0day-ci/linux/commits/Ard-Biesheuvel/add-support-for-relative-references-in-jump-tables/20180628-021246


in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 360M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


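+-----------------------------------------------------+------------+------------+
|                                                     | 1843c4017f | b1ff47aace |
+-----------------------------------------------------+------------+------------+
| boot_successes                                      | 57         | 46         |
| boot_failures                                       | 2          | 14         |
| Mem-Info                                            | 2          | 3          |
| invoked_oom-killer:gfp_mask=0x                      | 1          | 3          |
| Out_of_memory:Kill_process                          | 1          | 3          |
| WARNING:at_kernel/jump_label.c:#__jump_label_update | 0          | 11         |
| EIP:__jump_label_update                             | 0          | 11         |
+-----------------------------------------------------+------------+------------+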



[   43.154660] WARNING: CPU: 0 PID: 351 at kernel/jump_label.c:388 __jump_label_update+0x101/0x130
[   43.172391] Modules linked in:
[   43.176312] CPU: 0 PID: 351 Comm: trinity-main Not tainted 4.18.0-rc2-00124-gb1ff47a #206
[   43.186389] EIP: __jump_label_update+0x101/0x130
[   43.192131] Code: a5 bf fd ff 6a 01 31 c9 ba 01 00 00 00 b8 c0 02 cd b1 c6 05 ba 2e cb b1 01 e8 8b bf fd ff ff 33 68 8b 35 b2 b1 e8 cf 74 f3 ff <0f> 0b 6a 01 31 c9 ba 01 00 00 00 b8 a8 02 cd b1 e8 6a bf fd ff 83
[   43.215879] EAX: 0021 EBX: b1cb67b0 ECX:  EDX: 
[   43.223498] ESI: b1cb67b8 EDI: b1cb2fbc EBP: b89c9dc0 ESP: b89c9d9c
[   43.231212] DS: 007b ES: 007b FS:  GS: 00e0 SS: 0068 EFLAGS: 00010292
[   43.239602] CR0: 80050033 CR2: 0805a000 CR3: 08979000 CR4: 0690
[   43.247344] Call Trace:
[   43.250614]  jump_label_update+0x95/0x120
[   43.255705]  static_key_slow_inc_cpuslocked+0xcd/0xe0
[   43.261993]  static_key_slow_inc+0xd/0x10
[   43.266986]  tracepoint_probe_register_prio+0x257/0x320
[   43.273467]  tracepoint_probe_register+0xf/0x20
[   43.279104]  trace_event_reg+0x90/0x100
[   43.283964]  perf_trace_init+0x222/0x280
[   43.288833]  perf_tp_event_init+0x1d/0x50
[   43.293947]  perf_try_init_event+0x27/0xb0
[   43.299066]  perf_event_alloc+0x757/0xb20
[   43.304996]  __do_sys_perf_event_open+0x3de/0xd60
[   43.310932]  sys_perf_event_open+0x17/0x20
[   43.315362]  do_int80_syscall_32+0x98/0x1f0
[   43.319354]  entry_INT80_32+0x33/0x33
[   43.322816] EIP: 0xa7fa41b2
[   43.325381] Code: 89 c2 31 c0 89 d7 f3 aa 8b 44 24 1c 89 30 c6 40 04 00 83 c4 2c 89 f0 5b 5e 5f 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 cd 80  8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
[   43.346022] EAX: ffda EBX: 080d3000 ECX: 015f EDX: 
[   43.355195] ESI:  EDI: 0001 EBP:  ESP: af819388
[   43.362426] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 0282
[   43.368681] ---[ end trace 323a8199e30cb153 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script # job-script is attached in this email



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.18.0-rc2 Kernel Configuration
#

#
# Compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
#
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70300
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOC

Re: [PATCH v2 5/6] mm: track gup pages with page->dma_pinned_* fields

2018-07-01 Thread John Hubbard

On 07/01/2018 07:58 PM, kbuild test robot wrote:
> Hi John,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.18-rc3]
> [cannot apply to next-20180629]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
> config: x86_64-randconfig-x010-201826 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All warnings (new ones prefixed by >>):
> 
>In file included from arch/x86/include/asm/atomic.h:5:0,
> from include/linux/atomic.h:5,
> from include/linux/page_counter.h:5,
> from mm/memcontrol.c:34:
>mm/memcontrol.c: In function 'unlock_page_lru':
>mm/memcontrol.c:2087:32: error: 'page_tail' undeclared (first use in this 
> function); did you mean 'page_pool'?
>   VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
>^
Yes, that should have been:

VM_BUG_ON_PAGE(PageDmaPinned(page), page);

Fixed locally...maybe I'll post a v3 right now, as there were half a dozen
ridiculous typos that snuck in.



thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v7 3/6] kernel/reboot.c: export pm_power_off_prepare

2018-07-01 Thread Oleksij Rempel

Hi Rafael,

It has been two weeks since this email. It probably got lost somewhere in
the space-time continuum.
Could you please respond to it? :)

On 17.06.2018 09:05, Shawn Guo wrote:
> On Tue, Jun 12, 2018 at 04:33:05PM +0200, Rafael J. Wysocki wrote:
>> On Tuesday, June 12, 2018 2:42:12 PM CEST Oleksij Rempel wrote:
>>> Hi Rafael,
>>>
>>> The last version of this patch was sent on 17.05.2018. No other comments
>>> were provided, and this patch is a blocker for other patches in this series.
>>> Can you please give some feedback on it?
>>
>> I would have done that had I not missed the patch.
>>
>> Which probably wouldn't have happened had you CCed it to linux-pm.
>>
>> Anyway, I have no particular problems with exporting pm_power_off_prepare via
>> EXPORT_SYMBOL_GPL().
> 
> Rafael,
> 
> Can we have your explicit Acked-by tag on this patch?  Thanks.
> 
> Shawn
> 




drivers/android/.tmp_gl_binder.o:undefined reference to `__user_bad'

2018-07-01 Thread kbuild test robot

Hi Martijn,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   021c91791a5e7e85c567452f1be3e4c2c6cb6063
commit: 1190b4e38f97023154e6b3bef61b251aa5f970d0 ANDROID: binder: remove 32-bit binder interface.
date:   7 weeks ago
config: microblaze-allmodconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 1190b4e38f97023154e6b3bef61b251aa5f970d0
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=microblaze 

All errors (new ones prefixed by >>):

   drivers/android/binder.o: In function `binder_thread_write':
>> drivers/android/.tmp_gl_binder.o:(.text+0xcbb0): undefined reference to `__user_bad'
   drivers/android/.tmp_gl_binder.o:(.text+0xcbdc): undefined reference to `__user_bad'
   drivers/android/.tmp_gl_binder.o:(.text+0xcfc4): undefined reference to `__user_bad'
   drivers/android/.tmp_gl_binder.o:(.text+0xd650): undefined reference to `__user_bad'
   drivers/android/.tmp_gl_binder.o:(.text+0xdbc8): undefined reference to `__user_bad'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH 1/1] ARM: dts: imx6ull: add operating points

2018-07-01 Thread Viresh Kumar

On 29-06-18, 16:52, Sébastien Szymanski wrote:
> i.MX6ULL has different operating ranges than i.MX6UL so add the
> operating points for the i.MX6ULL and removed them form board device

s/removed/remove/
s/form/from/

> trees. A 25mV offset is added to the minimum allowed values like for the
> i.MX6UL.
> The valid frequencies are now selected by the cpufreq driver according
> to ratings stored in fuses since commit 0aa9abd4c212 ("cpufreq: imx6q:
> check speed grades for i.MX6ULL")
> 
> Signed-off-by: Sébastien Szymanski 
> ---
>  arch/arm/boot/dts/imx6ull-colibri-wifi.dtsi | 14 --
>  arch/arm/boot/dts/imx6ull.dtsi  | 19 +++
>  2 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/imx6ull-colibri-wifi.dtsi 
> b/arch/arm/boot/dts/imx6ull-colibri-wifi.dtsi
> index 3dffbcd50bf6..183193e8580d 100644
> --- a/arch/arm/boot/dts/imx6ull-colibri-wifi.dtsi
> +++ b/arch/arm/boot/dts/imx6ull-colibri-wifi.dtsi
> @@ -20,20 +20,6 @@
>  
>  &cpu0 {
>   clock-frequency = <79200>;
> - operating-points = <
> - /* kHz  uV */
> - 792000  1225000
> - 528000  1175000
> - 396000  1025000
> - 198000  95
> - >;
> - fsl,soc-operating-points = <
> - /* KHz  uV */
> - 792000  1175000
> - 528000  1175000
> - 396000  1175000
> - 198000  1175000
> - >;
>  };
>  
>  &iomuxc {
> diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
> index ebc25c98e5e1..ade64bd46fab 100644
> --- a/arch/arm/boot/dts/imx6ull.dtsi
> +++ b/arch/arm/boot/dts/imx6ull.dtsi
> @@ -48,6 +48,25 @@
>  /* Delete CAAM node in AIPS-2 (i.MX6UL specific) */
>  /delete-node/ &crypto;
>  
> +&cpu0 {
> + operating-points = <
> + /* kHz  uV */
> + 90  1275000
> + 792000  1225000
> + 528000  1175000
> + 396000  1025000
> + 198000  95
> + >;
> + fsl,soc-operating-points = <
> + /* KHz  uV */
> + 90  1175000
> + 792000  1175000
> + 528000  1175000
> + 396000  1175000
> + 198000  1175000
> + >;
> +};
> +
>  / {
>   soc {
>   aips3: aips-bus@220 {

Acked-by: Viresh Kumar 

-- 
viresh

linux-next: Tree for Jul 2

2018-07-01 Thread Stephen Rothwell

Hi all,

Changes since 20180629:

New tree: siox

The btrfs-kdave tree lost its build failure.

The fbdev tree lost its build failure.

The net-next tree gained conflicts against the net and rdma trees.

The akpm-current tree still had its build failure for which I reverted
a commit.

Non-merge commits (relative to Linus' tree): 3138
 3430 files changed, 109593 insertions(+), 56376 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 282 trees (counting Linus' and 65 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (d3bc0e67f852 Merge tag 'for-4.18-rc2-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (883c9ab9eb59 Merge branch 'parisc-4.18-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging arc-current/for-curr (3d561d86e99f ARC: Enable CONFIG_SWAP)
Merging arm-current/fixes (92d44a42af81 ARM: fix kill( ,SIGFPE) breakage)
Merging arm64-fixes/for-next/fixes (24fe1b0efad4 arm64: Remove unnecessary ISBs 
from set_{pte,pmd,pud})
Merging m68k-current/for-linus (b12c8a70643f m68k: Set default dma mask for 
platform devices)
Merging powerpc-fixes/fixes (22db552b50fa powerpc/powermac: Fix rtc read/write 
functions)
Merging sparc/master (1aaccb5fa0ea Merge tag 'rtc-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (1236f22fbae1 tcp: prevent bogus FRTO undos with non-SACK 
flows)
CONFLICT (content): Merge conflict in net/smc/af_smc.c
Merging bpf/master (bf2b866a2fe2 Merge branch 'bpf-sockmap-fixes')
Merging ipsec/master (7284fdf39a91 esp6: fix memleak on error path in 
esp6_input)
Merging netfilter/master (24ac3a08e658 net/smc: rebuild nonblocking connect)
Merging ipvs/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging wireless-drivers/master (4fa9433f950a Merge ath-current from 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git)
Merging mac80211/master (95bca62fb723 nl80211: check nla_parse_nested() return 
values)
Merging rdma-fixes/for-rc (b697d7d8c741 IB/hfi1: Fix incorrect mixing of 
ERR_PTR and NULL return values)
Merging sound-current/for-linus (c9a4c63888db ALSA: seq: Fix UBSAN warning at 
SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl)
Merging sound-asoc-fixes/for-linus (1c3d1081dfd2 Merge branch 'asoc-4.18' into 
asoc-linus)
Merging regmap-fixes/for-linus (7daf201d7fe8 Linux 4.18-rc2)
Merging regulator-fixes/for-linus (b327c75b3a76 Merge branch 'regulator-4.18' 
into regulator-linus)
Merging spi-fixes/for-linus (8fac34ed031d Merge branch 'spi-4.18' into 
spi-linus)
Merging pci-current/for-linus (83235822b8b4 nfp: stop limiting VFs to 0)
Merging driver-core.current/driver-core-linus (7daf201d7fe8 Linux 4.18-rc2)
Merging tty.current/tty-linus (21eff690 vt: prevent leaking uninitialized 
data to userspace via /dev/vcs*)
Merging usb.current/usb-linus (226e2d2d31b1 Merge tag 'usb-serial-4.18-rc3' of 
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus)
Merging usb-gadget-fixes/fixes (1d8e5c002758 dwc2: gadget: Fix ISOC IN DDMA PID 
bitfield value calcula

Re: [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled

2018-07-01 Thread Joel Fernandes

On Sun, Jul 01, 2018 at 08:11:32PM -0700, Paul E. McKenney wrote:
> On Sun, Jul 01, 2018 at 05:35:53PM -0700, Joel Fernandes wrote:
> > On Sun, Jul 01, 2018 at 03:27:49PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > +/*
> > > > > + * Report a deferred quiescent state if needed and safe to do so.
> > > > > + * As with rcu_preempt_need_deferred_qs(), "safe" involves only
> > > > > + * not being in an RCU read-side critical section.  The caller must
> > > > > + * evaluate safety in terms of interrupt, softirq, and preemption
> > > > > + * disabling.
> > > > > + */
> > > > > +static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > > +{
> > > > > + unsigned long flags;
> > > > > +
> > > > > + if (!rcu_preempt_need_deferred_qs(t))
> > > > > + return;
> > > > > + local_irq_save(flags);
> > > > > + rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Handle special cases during rcu_read_unlock(), such as needing to
> > > > > + * notify RCU core processing or task having blocked during the RCU
> > > > > + * read-side critical section.
> > > > > + */
> > > > > +static void rcu_read_unlock_special(struct task_struct *t)
> > > > > +{
> > > > > + unsigned long flags;
> > > > > > + bool preempt_bh_were_disabled = !!(preempt_count() & ~HARDIRQ_MASK);
> > > > > > + bool irqs_were_disabled;
> > > > > > +
> > > > > > + /* NMI handlers cannot block and cannot safely manipulate state. */
> > > > > > + if (in_nmi())
> > > > > > + return;
> > > > > > +
> > > > > > + local_irq_save(flags);
> > > > > > + irqs_were_disabled = irqs_disabled_flags(flags);
> > > > > > + if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > > > + t->rcu_read_unlock_special.b.blocked) {
> > > > > > + /* Need to defer quiescent state until everything is enabled. */
> > > > > + raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > > + local_irq_restore(flags);
> > > > > + return;
> > > > > + }
> > > > > + rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Dump detailed information for all tasks blocking the current RCU
> > > > >   * grace period on the specified rcu_node structure.
> > > > > @@ -737,10 +784,20 @@ static void rcu_preempt_check_callbacks(void)
> > > > >   struct rcu_state *rsp = &rcu_preempt_state;
> > > > >   struct task_struct *t = current;
> > > > >  
> > > > > - if (t->rcu_read_lock_nesting == 0) {
> > > > > - rcu_preempt_qs();
> > > > > + if (t->rcu_read_lock_nesting > 0 ||
> > > > > + (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> > > > > + /* No QS, force context switch if deferred. */
> > > > > + if (rcu_preempt_need_deferred_qs(t))
> > > > > + resched_cpu(smp_processor_id());
> > > > 
> > > > 
> > > > Hi Paul,
> > > > 
> > > > I had a similar idea of checking the preempt_count() sometime back but
> > > > didn't believe this path can be called with preempt enabled (for some
> > > > reason ;-)).
> > > > Now that I've convinced myself that's possible, what do you think about
> > > > taking advantage of the opportunity to report a RCU-sched qs like below
> > > > from rcu_check_callbacks ?
> > > > 
> > > > Did some basic testing, can roll into a patch later if you're Ok with it.
> > > 
> > > The problem here is that the code path above cannot be called
> > > with CONFIG_PREEMPT_COUNT=n, but the code below can.  And if
> > > CONFIG_PREEMPT_COUNT=n, the return value from preempt_count() can be
> > > misleading.
> > > 
> > > Or am I missing something here?
> > 
> > That is true! so then I could also test if PREEMPT_RCU is enabled like
> > you're doing in the other path.
> > 
> > thanks!
> > 
> > ---8<---
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index fb440baf8ac6..03a460921dca 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2683,6 +2683,12 @@ void rcu_check_callbacks(int user)
> > rcu_note_voluntary_context_switch(current);
> > 
> > } else if (!in_softirq()) {
> > +   /*
> > +* Report RCU-sched qs if not in an RCU-sched read-side
> > +* critical section.
> > +*/
> > +   if (IS_ENABLED(PREEMPT_RCU) && !(preempt_count() & PREEMPT_MASK))
> 
> For more precision, s/PREEMPT_RCU/CONFIG_PREEMPT_COUNT/
> 
> Hmmm...  I recently queued a patch that redefines the RCU-bh update-side
> API in terms of the consolidated RCU implementation, so this "else"
> clause no longer exists.  One approach would be to fold this condition
> (with the addition of SOFTIRQ_MASK) into the previous "if" condition,
> but that would call rcu_note_voluntary_context_switch() at bad times.
> So maybe this becomes a new "else if" clause.
> 
> Another complication is an u

Re: [PATCH v2 4/6] mm/fs: add a sync_mode param for clear_page_dirty_for_io()

2018-07-01 Thread John Hubbard

On 07/01/2018 07:47 PM, kbuild test robot wrote:
> Hi John,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.18-rc3]
> [cannot apply to next-20180629]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
> config: i386-randconfig-x075-201826 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>fs/f2fs/dir.c: In function 'f2fs_delete_entry':
>>> fs/f2fs/dir.c:734:33: error: 'WB_SYNC_ALL' undeclared (first use in this 
>>> function); did you mean 'FS_SYNC_FL'?
>   clear_page_dirty_for_io(page, WB_SYNC_ALL);
> ^~~
> FS_SYNC_FL

Fixed locally, via:

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 258f9dc117f4..ca20c1262582 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -16,6 +16,7 @@
 #include "acl.h"
 #include "xattr.h"
 #include 
+#include 
 
 static unsigned long dir_blocks(struct inode *inode)
 {



thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 4/6] mm/fs: add a sync_mode param for clear_page_dirty_for_io()

2018-07-01 Thread John Hubbard

On 07/01/2018 07:11 PM, kbuild test robot wrote:
> Hi John,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.18-rc3]
> [cannot apply to next-20180629]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
> config: x86_64-randconfig-x010-201826 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
[...]
> ^
>include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
>  if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
>  ^~~~
>>> fs/f2fs/data.c:2021:4: note: in expansion of macro 'if'
>if (!clear_page_dirty_for_io(page), wbc->sync_mode)
>^~
>fs/f2fs/data.c:2021:9: error: too few arguments to function 
> 'clear_page_dirty_for_io'
>if (!clear_page_dirty_for_io(page), wbc->sync_mode)
> ^
>include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
>   __r = !!(cond); \
>^~~~
>>> fs/f2fs/data.c:2021:4: note: in expansion of macro 'if'
>if (!clear_page_dirty_for_io(page), wbc->sync_mode)
>^~

> 

Typo, that should have been:
 if (!clear_page_dirty_for_io(page, wbc->sync_mode))

...fixed locally, I'll include it in the next spin. (Somehow my last build
didn't have all the filesystems enabled, sorry for the glitches.)
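
For anyone puzzled by the compiler output quoted above: with the closing
parenthesis placed after "page", the second argument is parsed as the
right-hand operand of C's comma operator instead of as a function argument.
Below is a minimal userspace sketch of that pitfall. It is only an
illustration under stated assumptions: struct fake_page, clear_dirty() and
the skip_page_*() helpers are hypothetical names, not kernel code, and the
helper takes a single argument so that the mistaken form still compiles
(a compiler may warn about the unused left operand).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with a dirty flag; not a kernel type. */
struct fake_page {
	bool dirty;
};

/* Clear the dirty flag and report whether the page was dirty before. */
bool clear_dirty(struct fake_page *page)
{
	bool was_dirty;

	assert(page != NULL);
	was_dirty = page->dirty;
	page->dirty = false;
	return was_dirty;
}

/*
 * Mistaken form: the parenthesis closes after "page", so the condition is
 * the comma expression (!clear_dirty(page), sync_mode). Its value is just
 * sync_mode; the helper's return value is thrown away.
 */
bool skip_page_misplaced_paren(struct fake_page *page, int sync_mode)
{
	if (!clear_dirty(page), sync_mode)
		return true;
	return false;
}

/* Intended form: the decision depends only on the helper's return value. */
bool skip_page_intended(struct fake_page *page, int sync_mode)
{
	(void)sync_mode;	/* a real two-argument helper would consume this */
	if (!clear_dirty(page))
		return true;
	return false;
}
```

In the kernel case the helper's prototype takes two arguments, so the
mistaken form is rejected with "too few arguments" rather than silently
misbehaving.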
   

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2] stop_machine: Disable preemption when waking two stopper threads

2018-07-01 Thread Pavan Kondeti

Hi Isaac,

On Fri, Jun 29, 2018 at 01:55:12PM -0700, Isaac J. Manjarres wrote:
> When cpu_stop_queue_two_works() begins to wake the stopper
> threads, it does so without preemption disabled, which leads
> to the following race condition:
> 
> The source CPU calls cpu_stop_queue_two_works(), with cpu1
> as the source CPU, and cpu2 as the destination CPU. When
> adding the stopper threads to the wake queue used in this
> function, the source CPU stopper thread is added first,
> and the destination CPU stopper thread is added last.
> 
> When wake_up_q() is invoked to wake the stopper threads, the
> threads are woken up in the order that they are queued in,
> so the source CPU's stopper thread is woken up first, and
> it preempts the thread running on the source CPU.
> 
> The stopper thread will then execute on the source CPU,
> disable preemption, and begin executing multi_cpu_stop(),
> and wait for an ack from the destination CPU's stopper thread,
> with preemption still disabled. Since the worker thread that
> woke up the stopper thread on the source CPU is affine to the
> source CPU, and preemption is disabled on the source CPU, that
> thread will never run to dequeue the destination CPU's stopper
> thread from the wake queue, and thus, the destination CPU's
> stopper thread will never run, causing the source CPU's stopper
> thread to wait forever, and stall.
> 
> Disable preemption when waking the stopper threads in
> cpu_stop_queue_two_works() to ensure that the worker thread
> that is waking up the stopper threads isn't preempted
> by the source CPU's stopper thread, and permanently
> scheduled out, leaving the remaining stopper thread asleep
> in the wake queue.
> 
> Co-developed-by: Pavankumar Kondeti 
> Signed-off-by: Prasad Sodagudi 
> Signed-off-by: Pavankumar Kondeti 
> Signed-off-by: Isaac J. Manjarres 
> ---

You might want to add the below Fixes tag and CC stable.

Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")

>  kernel/stop_machine.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index f89014a..1ff523d 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -270,7 +270,11 @@ static int cpu_stop_queue_two_works(int cpu1, struct 
> cpu_stop_work *work1,
>   goto retry;
>   }
>  
> - wake_up_q(&wakeq);
> + if (!err) {
> + preempt_disable();
> + wake_up_q(&wakeq);
> + preempt_enable();
> + }
>  
>   return err;
>  }

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.
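
To make the wake-up ordering described in the commit message concrete, here
is a deliberately simplified, single-threaded userspace model of it. This is
a sketch under stated assumptions, not the scheduler's behaviour: all names
(toy_wake_stoppers() and friends) are hypothetical, and the only thing
modelled is that waking the source CPU's stopper with preemption enabled
stops the waker from ever reaching the next queue entry.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_stopper { TOY_SRC_STOPPER, TOY_DST_STOPPER, TOY_NR_STOPPERS };

/*
 * Walk the wake queue on the source CPU and return how many stopper
 * threads were woken. The queue order follows the commit message: the
 * source CPU's stopper first, the destination CPU's stopper last.
 */
int toy_wake_stoppers(bool waker_preempt_disabled)
{
	const enum toy_stopper wake_queue[TOY_NR_STOPPERS] = {
		TOY_SRC_STOPPER, TOY_DST_STOPPER,
	};
	int woken = 0;

	for (int i = 0; i < TOY_NR_STOPPERS; i++) {
		woken++;
		if (wake_queue[i] == TOY_SRC_STOPPER && !waker_preempt_disabled) {
			/*
			 * Model of the race: the source stopper preempts the
			 * waker, disables preemption and waits for the
			 * destination stopper, so the waker never resumes.
			 */
			break;
		}
	}

	assert(woken <= TOY_NR_STOPPERS);
	return woken;
}

/* The two-CPU stop can only finish if both stoppers were woken. */
bool toy_stop_completes(bool waker_preempt_disabled)
{
	return toy_wake_stoppers(waker_preempt_disabled) == TOY_NR_STOPPERS;
}
```

In this model toy_stop_completes(false) corresponds to the stall described
above, and toy_stop_completes(true) to the behaviour with the
preempt_disable()/preempt_enable() pair around wake_up_q().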

Re: [PATCH v5 2/2] arm64: KVM: export the capability to set guest SError syndrome

2018-07-01 Thread gengdongjiu

Hi James,

On 2018/6/29 23:58, James Morse wrote:
> Hi Dongjiu Geng,
> 
> This patch doesn't apply on v4.18-rc2.
> 
> Documentation/virtual/kvm/api.txt already has a 8.18 section. I guess you
> based this on v4.17.

Yes, indeed I based it on v4.17.

> 
> For posting patches, please use the latest 'rc' from Linus' tree, (or the
> maintainer's tree listed in MAINTAINERS for the tree you are targeting if the
> maintainer has started to pick up patches).
Ok, I will rebase it onto the latest 'rc' from Linus' tree. Thanks for the
reminder.


> 
> 
> Thanks,
> 
> James
> 
> 
> On 25/06/18 21:58, Dongjiu Geng wrote:
>> For the arm64 RAS Extension, user space can inject a virtual SError
>> with a specified ESR. User space therefore needs to know whether KVM
>> supports injecting such an SError; this interface adds a query for
>> that capability.
>>
>> KVM checks whether the system supports the RAS Extension; if it does,
>> KVM returns true to user space, otherwise it returns false.
> 
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 3732097..86b3808 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4628,3 +4628,14 @@ Architectures: s390
>>  This capability indicates that kvm will implement the interfaces to handle
>>  reset, migration and nested KVM for branch prediction blocking. The stfle
>>  facility 82 should not be provided to the guest without this capability.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify the syndrome value
>> +reported to the guest OS when the guest takes a virtual SError interrupt
>> +exception. If KVM has this capability, userspace can only specify the ISS
>> +field of the ESR syndrome; it cannot specify the EC field, which is not under
>> +KVM's control. If this virtual SError is taken to EL1 using AArch64, this
>> +value will be reported in the ISS field of ESR_EL1.
> 
> 
> 
> 
> .
>

Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5

2018-07-01 Thread Michael Ellerman

Linus Torvalds  writes:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger wrote:
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), 
>> page)
>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfbff, which would have been expected for a . The second call had
>> 0x, which led to the BUG.
>
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
...
>
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?

I think everyone else answered your questions here, and it should be
fixed now in your tree.

Larry let me know if you're still seeing a crash with 4.18-rc3.

cheers
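
For readers following along, the "bits are reversed" remark quoted above can
be sketched in userspace as follows. This is an illustration only: the TOY_*
constants and helpers are hypothetical, and the numeric values are my
assumption of how an inverted-bit page type encoding can look, to be checked
against include/linux/page-flags.h rather than taken as the kernel's
definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: all bits set means "no type"; a type is a cleared bit. */
#define TOY_PAGE_TYPE_BASE	UINT32_C(0xf0000000)
#define TOY_PG_TABLE		UINT32_C(0x00000400)
#define TOY_NO_TYPE		UINT32_C(0xffffffff)

/* A page is a page table if the base bits are set and the type bit is clear. */
bool toy_page_is_table(uint32_t page_type)
{
	return (page_type & (TOY_PAGE_TYPE_BASE | TOY_PG_TABLE)) ==
	       TOY_PAGE_TYPE_BASE;
}

/* "Setting" the type clears the bit. */
uint32_t toy_set_page_table(uint32_t page_type)
{
	return page_type & ~TOY_PG_TABLE;
}

/* "Clearing" the type sets the bit again; doing it twice trips the check. */
uint32_t toy_clear_page_table(uint32_t page_type)
{
	assert(toy_page_is_table(page_type));	/* plays the role of the BUG check */
	return page_type | TOY_PG_TABLE;
}
```

With this encoding a second toy_clear_page_table() call on the same value
would hit the assertion, which is the same shape as the double teardown
discussed in this thread.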

[PATCH 0/3] fix selftests compiling errors and warnings

2018-07-01 Thread Li Zhijian



Li Zhijian (3):
  selftests/android: fix compiling error
  selftests/android: initialize heap_type to avoid compiling warning
  selftests/gpio: unset OUTPUT for build tools/gpio

 tools/testing/selftests/android/ion/Makefile| 5 -
 tools/testing/selftests/android/ion/ionapp_export.c | 7 +++
 tools/testing/selftests/gpio/Makefile   | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

-- 
2.7.4

[PATCH 1/3] selftests/android: fix compiling error

2018-07-01 Thread Li Zhijian

lizhijian@haswell-OptiPlex-9020:/home/lizj/linux/tools/testing/selftests/android/ion$ make
gcc  -I. -I../../../../../drivers/staging/android/uapi/ -I../../../../../usr/include/ -Wall -O2 -g ionapp_export.c ipcsocket.c ionutils.c   -o ionapp_export
gcc  -I. -I../../../../../drivers/staging/android/uapi/ -I../../../../../usr/include/ -Wall -O2 -g ionapp_import.c ipcsocket.c ionutils.c   -o ionapp_import
gcc  -I. -I../../../../../drivers/staging/android/uapi/ -I../../../../../usr/include/ -Wall -O2 -g ionmap_test.c ipcsocket.c ionutils.c   -o ionmap_test
ionmap_test.c:12:27: fatal error: linux/dma-buf.h: No such file or directory
compilation terminated.
: recipe for target 'ionmap_test' failed
make: *** [ionmap_test] Error 1

Building ionmap_test requires the kernel headers to be installed to
$TOP/usr first (headers_install).

CC: Shuah Khan 
CC: Pintu Agarwal 
Signed-off-by: Li Zhijian 
---
 tools/testing/selftests/android/ion/Makefile | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/android/ion/Makefile 
b/tools/testing/selftests/android/ion/Makefile
index e036952..9f83299 100644
--- a/tools/testing/selftests/android/ion/Makefile
+++ b/tools/testing/selftests/android/ion/Makefile
@@ -4,7 +4,10 @@ CFLAGS := $(CFLAGS) $(INCLUDEDIR) -Wall -O2 -g
 
 TEST_GEN_FILES := ionapp_export ionapp_import ionmap_test
 
-all: $(TEST_GEN_FILES)
+all: ../../../../../usr/include/linux/dma-buf.h $(TEST_GEN_FILES)
+
+../../../../../usr/include/linux/dma-buf.h:
+   make -C ../../../../../ headers_install INSTALL_HDR_PATH=$(shell 
pwd)/../../../../../usr
 
 $(TEST_GEN_FILES): ipcsocket.c ionutils.c
 
-- 
2.7.4

[PATCH 3/3] selftests/gpio: unset OUTPUT for build tools/gpio

2018-07-01 Thread Li Zhijian

When we run 'make' to build the selftests, the top-level Makefile builds gpio like this:
selftests$ make ARCH= CROSS_COMPILE= OUTPUT=/home/lizj/linux/tools/testing/selftests/gpio -C gpio
...
make[2]: Leaving directory '/home/lizj/linux/tools/gpio'
gcc -O2 -g -std=gnu99 -Wall -I../../../../usr/include/ gpio-mockup-chardev.c ../../../gpio/gpio-utils.o ../../../../usr/include/linux/gpio.h  -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/uuid -lmount -o gpio-mockup-chardev
gcc: error: ../../../gpio/gpio-utils.o: No such file or directory
: recipe for target 'gpio-mockup-chardev' failed
make[1]: *** [gpio-mockup-chardev] Error 1
make[1]: Leaving directory '/home/lizj/linux/tools/testing/selftests/gpio'
Makefile:84: recipe for target 'all' failed
make: *** [all] Error 2

CC: Bamvor Jian Zhang 
CC: Bartosz Golaszewski 
CC: Shuah Khan 
CC: linux-g...@vger.kernel.org
Signed-off-by: Li Zhijian 
---
 tools/testing/selftests/gpio/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/gpio/Makefile 
b/tools/testing/selftests/gpio/Makefile
index 1bbb475..e64a11c 100644
--- a/tools/testing/selftests/gpio/Makefile
+++ b/tools/testing/selftests/gpio/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lmount -I/usr/include/libmount
 $(BINARIES): ../../../gpio/gpio-utils.o ../../../../usr/include/linux/gpio.h
 
 ../../../gpio/gpio-utils.o:
-   make ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C ../../../gpio
+   make OUTPUT= ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C 
../../../gpio
 
 ../../../../usr/include/linux/gpio.h:
make -C ../../../.. headers_install INSTALL_HDR_PATH=$(shell 
pwd)/../../../../usr/
-- 
2.7.4

[PATCH 2/3] selftests/android: initialize heap_type to avoid compiling warning

2018-07-01 Thread Li Zhijian

root@vm-lkp-nex04-8G-7 ~/linux-v4.18-rc2/tools/testing/selftests/android# make
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/root/linux-v4.18-rc2/tools/testing/selftests/android/ion'
gcc  -I. -I../../../../../drivers/staging/android/uapi/ -I../../../../../usr/include/ -Wall -O2 -g ionapp_export.c ipcsocket.c ionutils.c   -o ionapp_export
ionapp_export.c: In function 'main':
ionapp_export.c:91:2: warning: 'heap_type' may be used uninitialized in this function [-Wmaybe-uninitialized]
  printf("heap_type: %ld, heap_size: %ld\n", heap_type, heap_size);
  ^~~~

CC: Shuah Khan 
CC: Pintu Agarwal 
Signed-off-by: Li Zhijian 
---
 tools/testing/selftests/android/ion/ionapp_export.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/android/ion/ionapp_export.c 
b/tools/testing/selftests/android/ion/ionapp_export.c
index a944e72..e3435c2 100644
--- a/tools/testing/selftests/android/ion/ionapp_export.c
+++ b/tools/testing/selftests/android/ion/ionapp_export.c
@@ -49,6 +49,7 @@ int main(int argc, char *argv[])
return -1;
}
 
+   heap_type = -1UL;
heap_size = 0;
flags = 0;
 
@@ -82,6 +83,12 @@ int main(int argc, char *argv[])
}
}
 
+   if (heap_type == -1UL) {
+   printf("heap_type is invalid\n");
+   print_usage(argc, argv);
+   exit(1);
+   }
+
if (heap_size <= 0) {
printf("heap_size cannot be 0\n");
print_usage(argc, argv);
-- 
2.7.4
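
The approach in the patch above (start from a sentinel value, then reject
the sentinel once option parsing is done) can be sketched in isolation as
follows. This is a minimal userspace illustration, not the selftest's code:
TOY_HEAP_TYPE_UNSET, toy_parse_heap_type() and the "-i" option spelling are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sentinel meaning "the user never supplied a heap type". */
#define TOY_HEAP_TYPE_UNSET	(-1UL)

/*
 * Scan argv for a hypothetical "-i <heap type>" pair. The result starts
 * out as the sentinel, so a caller can tell "not given" apart from any
 * value the user could reasonably pass.
 */
unsigned long toy_parse_heap_type(int argc, char *const argv[])
{
	unsigned long heap_type = TOY_HEAP_TYPE_UNSET;

	assert(argv != NULL);
	for (int i = 1; i + 1 < argc; i++) {
		if (strcmp(argv[i], "-i") == 0)
			heap_type = strtoul(argv[i + 1], NULL, 0);
	}
	return heap_type;
}
```

A caller would compare the result against TOY_HEAP_TYPE_UNSET and print the
usage text on a match, which also gives the compiler a definite
initialization to reason about.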

[PATCH v2 3/4] PCI: mediatek: Add system pm support for MT2712 and MT7622

2018-07-01 Thread honghui.zhang

From: Honghui Zhang 

The MTCMOS of the PCIe host on MT2712 and MT7622 is powered off during
system suspend, so all the internal control registers are reset by the
time the system resumes. The PCIe link must be re-established and the
related control register values restored after system resume.

Signed-off-by: Honghui Zhang 
---
 drivers/pci/controller/pcie-mediatek.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/drivers/pci/controller/pcie-mediatek.c 
b/drivers/pci/controller/pcie-mediatek.c
index 86918d4..c530539 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -134,12 +134,14 @@ struct mtk_pcie_port;
 /**
  * struct mtk_pcie_soc - differentiate between host generations
  * @need_fix_class_id: whether this host's class ID needed to be fixed or not
+ * @pm_support: whether the host's MTCMOS will be off when suspend
  * @ops: pointer to configuration access functions
  * @startup: pointer to controller setting functions
  * @setup_irq: pointer to initialize IRQ functions
  */
 struct mtk_pcie_soc {
bool need_fix_class_id;
+   bool pm_support;
struct pci_ops *ops;
int (*startup)(struct mtk_pcie_port *port);
int (*setup_irq)(struct mtk_pcie_port *port, struct device_node *node);
@@ -1197,12 +1199,75 @@ static int mtk_pcie_probe(struct platform_device *pdev)
return err;
 }
 
+static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port;
+
+   if (!soc->pm_support)
+   return 0;
+
+   if (list_empty(&pcie->ports))
+   return 0;
+
+   list_for_each_entry(port, &pcie->ports, list) {
+   clk_disable_unprepare(port->pipe_ck);
+   clk_disable_unprepare(port->obff_ck);
+   clk_disable_unprepare(port->axi_ck);
+   clk_disable_unprepare(port->aux_ck);
+   clk_disable_unprepare(port->ahb_ck);
+   clk_disable_unprepare(port->sys_ck);
+   phy_power_off(port->phy);
+   phy_exit(port->phy);
+   }
+
+   mtk_pcie_subsys_powerdown(pcie);
+
+   return 0;
+}
+
+static int __maybe_unused mtk_pcie_resume_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port, *tmp;
+
+   if (!soc->pm_support)
+   return 0;
+
+   if (list_empty(&pcie->ports))
+   return 0;
+
+   if (dev->pm_domain) {
+   pm_runtime_enable(dev);
+   pm_runtime_get_sync(dev);
+   }
+
+   clk_prepare_enable(pcie->free_ck);
+
+   list_for_each_entry_safe(port, tmp, &pcie->ports, list)
+   mtk_pcie_enable_port(port);
+
+   /* In case the EP was removed during system suspend. */
+   if (list_empty(&pcie->ports))
+   mtk_pcie_subsys_powerdown(pcie);
+
+   return 0;
+}
+
+static const struct dev_pm_ops mtk_pcie_pm_ops = {
+   SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(mtk_pcie_suspend_noirq,
+ mtk_pcie_resume_noirq)
+};
+
 static const struct mtk_pcie_soc mtk_pcie_soc_v1 = {
.ops = &mtk_pcie_ops,
.startup = mtk_pcie_startup_port,
 };
 
 static const struct mtk_pcie_soc mtk_pcie_soc_mt2712 = {
+   .pm_support = true,
.ops = &mtk_pcie_ops_v2,
.startup = mtk_pcie_startup_port_v2,
.setup_irq = mtk_pcie_setup_irq,
@@ -1210,6 +1275,7 @@ static const struct mtk_pcie_soc mtk_pcie_soc_mt2712 = {
 
 static const struct mtk_pcie_soc mtk_pcie_soc_mt7622 = {
.need_fix_class_id = true,
+   .pm_support = true,
.ops = &mtk_pcie_ops_v2,
.startup = mtk_pcie_startup_port_v2,
.setup_irq = mtk_pcie_setup_irq,
@@ -1229,6 +1295,7 @@ static struct platform_driver mtk_pcie_driver = {
.name = "mtk-pcie",
.of_match_table = mtk_pcie_ids,
.suppress_bind_attrs = true,
+   .pm = &mtk_pcie_pm_ops,
},
 };
 builtin_platform_driver(mtk_pcie_driver);
-- 
2.6.4

[PATCH v2 1/4] PCI: mediatek: fixup mtk_pcie_find_port logical

2018-07-01 Thread honghui.zhang

From: Honghui Zhang 

The MediaTek host controller has two slots, each with its own control
registers. The host driver needs to identify which slot a device is
connected to in order to access that device's configuration space. The
current driver cannot reliably work out which slot a given EP device is
connected to.

Assume each slot is connected to one EP device, as below:

               host bridge
  bus 0 -->  ______|______
             |           |
             |           |
          slot 0       slot 1
  bus 1 -->  |  bus 2 -->|
             |           |
           EP 0        EP 1

During PCI enumeration, system software scans the PCI devices on each
bus starting from devfn 0. Matching on PCI_SLOT(devfn) therefore finds
the proper port for the slot0 and slot1 devices themselves. It finds the
wrong port for EP1, however: devfn starts from 0 again when scanning
behind slot1, so EP1 is matched to port0 because PCI_SLOT(EP1) equals
port0's slot value. The host driver should therefore match not on the
EP's devfn but on the devfn of the slot that the EP is connected to.

This patch fixes the logic of mtk_pcie_find_port() by using the slot's
devfn for the match.

Signed-off-by: Honghui Zhang 
---
 drivers/pci/controller/pcie-mediatek.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-mediatek.c 
b/drivers/pci/controller/pcie-mediatek.c
index 0baabe3..b43f41d 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -337,11 +337,26 @@ static struct mtk_pcie_port *mtk_pcie_find_port(struct 
pci_bus *bus,
 {
struct mtk_pcie *pcie = bus->sysdata;
struct mtk_pcie_port *port;
+   struct pci_dev *dev;
+   struct pci_bus *pbus;
 
-   list_for_each_entry(port, &pcie->ports, list)
-   if (port->slot == PCI_SLOT(devfn))
+   list_for_each_entry(port, &pcie->ports, list) {
+   if (!bus->number && port->slot == PCI_SLOT(devfn))
return port;
 
+   if (bus->number) {
+   pbus = bus;
+
+   while (pbus->number) {
+   dev = pbus->self;
+   pbus = dev->bus;
+   }
+
+   if (port->slot == PCI_SLOT(dev->devfn))
+   return port;
+   }
+   }
+
return NULL;
 }
 
-- 
2.6.4
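
The lookup described in the patch above (walk from the device's bus up to
the root bus, then match on the slot of the last bridge passed) can be
sketched in userspace like this. It is a toy under stated assumptions:
struct toy_bus, struct toy_dev and toy_root_port_slot() are hypothetical
stand-ins for the kernel's bus and device structures and do not use the
real PCI API.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bus;

struct toy_dev {
	struct toy_bus *bus;	/* bus this device sits on */
	unsigned int slot;	/* slot number taken from the device's devfn */
};

struct toy_bus {
	unsigned int number;	/* 0 for the root bus */
	struct toy_dev *self;	/* bridge leading to this bus; NULL on the root */
};

/*
 * Return the slot of the root port through which a device at (bus, slot)
 * is reached. On the root bus the device's own slot is the answer;
 * otherwise climb towards bus 0 and use the slot of the bridge found there.
 */
unsigned int toy_root_port_slot(const struct toy_bus *bus, unsigned int slot)
{
	const struct toy_dev *bridge;

	assert(bus != NULL);
	if (bus->number == 0)
		return slot;

	do {
		bridge = bus->self;
		assert(bridge != NULL);	/* every non-root bus has a bridge */
		bus = bridge->bus;
	} while (bus->number != 0);

	return bridge->slot;
}
```

With the two-slot topology from the commit message, a device at slot 0 on
bus 2 resolves to root port slot 1 instead of being mistaken for port 0.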

[PATCH v2 4/4] PCI: mediatek: Add loadable kernel module support

2018-07-01 Thread honghui.zhang

From: Honghui Zhang 

Implement remove callback function for Mediatek PCIe driver to add
loadable kernel module support.

Signed-off-by: Honghui Zhang 
---
 drivers/pci/controller/Kconfig |  2 +-
 drivers/pci/controller/pcie-mediatek.c | 60 +++---
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index 18fa09b..6c61ac65 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -234,7 +234,7 @@ config PCIE_ROCKCHIP_EP
  available to support GEN2 with 4 slots.
 
 config PCIE_MEDIATEK
-   bool "MediaTek PCIe controller"
+   tristate "MediaTek PCIe controller"
depends on ARCH_MEDIATEK || COMPILE_TEST
depends on OF
depends on PCI_MSI_IRQ_DOMAIN
diff --git a/drivers/pci/controller/pcie-mediatek.c 
b/drivers/pci/controller/pcie-mediatek.c
index c530539..b17cac6 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -184,6 +185,7 @@ struct mtk_pcie_port {
struct phy *phy;
u32 lane;
u32 slot;
+   int irq;
struct irq_domain *irq_domain;
struct irq_domain *inner_domain;
struct irq_domain *msi_domain;
@@ -538,6 +540,27 @@ static void mtk_pcie_enable_msi(struct mtk_pcie_port *port)
writel(val, port->base + PCIE_INT_MASK);
 }
 
+static void mtk_pcie_irq_teardown(struct mtk_pcie *pcie)
+{
+   struct mtk_pcie_port *port, *tmp;
+
+   list_for_each_entry_safe(port, tmp, &pcie->ports, list) {
+   irq_set_chained_handler_and_data(port->irq, NULL, NULL);
+
+   if (port->irq_domain)
+   irq_domain_remove(port->irq_domain);
+
+   if (IS_ENABLED(CONFIG_PCI_MSI)) {
+   if (port->msi_domain)
+   irq_domain_remove(port->msi_domain);
+   if (port->inner_domain)
+   irq_domain_remove(port->inner_domain);
+   }
+
+   irq_dispose_mapping(port->irq);
+   }
+}
+
 static int mtk_pcie_intx_map(struct irq_domain *domain, unsigned int irq,
 irq_hw_number_t hwirq)
 {
@@ -628,7 +651,7 @@ static int mtk_pcie_setup_irq(struct mtk_pcie_port *port,
struct mtk_pcie *pcie = port->pcie;
struct device *dev = pcie->dev;
struct platform_device *pdev = to_platform_device(dev);
-   int err, irq;
+   int err;
 
err = mtk_pcie_init_irq_domain(port, node);
if (err) {
@@ -636,8 +659,9 @@ static int mtk_pcie_setup_irq(struct mtk_pcie_port *port,
return err;
}
 
-   irq = platform_get_irq(pdev, port->slot);
-   irq_set_chained_handler_and_data(irq, mtk_pcie_intr_handler, port);
+   port->irq = platform_get_irq(pdev, port->slot);
+   irq_set_chained_handler_and_data(port->irq,
+mtk_pcie_intr_handler, port);
 
return 0;
 }
@@ -1199,6 +1223,32 @@ static int mtk_pcie_probe(struct platform_device *pdev)
return err;
 }
 
+
+static void mtk_pcie_free_resources(struct mtk_pcie *pcie)
+{
+   struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
+   struct list_head *windows = &host->windows;
+
+   pci_unmap_iospace(&pcie->pio);
+   pci_free_resource_list(windows);
+}
+
+static int mtk_pcie_remove(struct platform_device *pdev)
+{
+   struct mtk_pcie *pcie = platform_get_drvdata(pdev);
+   struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
+
+   pci_stop_root_bus(host->bus);
+   pci_remove_root_bus(host->bus);
+   mtk_pcie_free_resources(pcie);
+
+   mtk_pcie_irq_teardown(pcie);
+
+   mtk_pcie_put_resources(pcie);
+
+   return 0;
+}
+
 static int __maybe_unused mtk_pcie_suspend_noirq(struct device *dev)
 {
struct mtk_pcie *pcie = dev_get_drvdata(dev);
@@ -1291,6 +1341,7 @@ static const struct of_device_id mtk_pcie_ids[] = {
 
 static struct platform_driver mtk_pcie_driver = {
.probe = mtk_pcie_probe,
+   .remove = mtk_pcie_remove,
.driver = {
.name = "mtk-pcie",
.of_match_table = mtk_pcie_ids,
@@ -1298,4 +1349,5 @@ static struct platform_driver mtk_pcie_driver = {
.pm = &mtk_pcie_pm_ops,
},
 };
-builtin_platform_driver(mtk_pcie_driver);
+module_platform_driver(mtk_pcie_driver);
+MODULE_LICENSE("GPL v2");
-- 
2.6.4

[PATCH v2 2/4] PCI: mediatek: enable msi after clock enabled

2018-07-01 Thread honghui.zhang

From: Honghui Zhang 

The clocks were not yet enabled when MSI was enabled. This patch fixes
the issue by calling mtk_pcie_enable_msi in mtk_pcie_startup_port_v2,
since all the clocks are enabled by that time.

The definition of mtk_pcie_startup_port_v2 is moved to avoid a forward
declaration of mtk_pcie_enable_msi.

Signed-off-by: Honghui Zhang 
Reviewed-by: Ryder Lee 
---
 drivers/pci/controller/pcie-mediatek.c | 143 +
 1 file changed, 72 insertions(+), 71 deletions(-)

diff --git a/drivers/pci/controller/pcie-mediatek.c 
b/drivers/pci/controller/pcie-mediatek.c
index b43f41d..86918d4 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -398,75 +398,6 @@ static struct pci_ops mtk_pcie_ops_v2 = {
.write = mtk_pcie_config_write,
 };
 
-static int mtk_pcie_startup_port_v2(struct mtk_pcie_port *port)
-{
-   struct mtk_pcie *pcie = port->pcie;
-   struct resource *mem = &pcie->mem;
-   const struct mtk_pcie_soc *soc = port->pcie->soc;
-   u32 val;
-   size_t size;
-   int err;
-
-   /* MT7622 platforms need to enable LTSSM and ASPM from PCIe subsys */
-   if (pcie->base) {
-   val = readl(pcie->base + PCIE_SYS_CFG_V2);
-   val |= PCIE_CSR_LTSSM_EN(port->slot) |
-  PCIE_CSR_ASPM_L1_EN(port->slot);
-   writel(val, pcie->base + PCIE_SYS_CFG_V2);
-   }
-
-   /* Assert all reset signals */
-   writel(0, port->base + PCIE_RST_CTRL);
-
-   /*
-* Enable PCIe link down reset, if link status changed from link up to
-* link down, this will reset MAC control registers and configuration
-* space.
-*/
-   writel(PCIE_LINKDOWN_RST_EN, port->base + PCIE_RST_CTRL);
-
-   /* De-assert PHY, PE, PIPE, MAC and configuration reset */
-   val = readl(port->base + PCIE_RST_CTRL);
-   val |= PCIE_PHY_RSTB | PCIE_PERSTB | PCIE_PIPE_SRSTB |
-  PCIE_MAC_SRSTB | PCIE_CRSTB;
-   writel(val, port->base + PCIE_RST_CTRL);
-
-   /* Set up vendor ID and class code */
-   if (soc->need_fix_class_id) {
-   val = PCI_VENDOR_ID_MEDIATEK;
-   writew(val, port->base + PCIE_CONF_VEND_ID);
-
-   val = PCI_CLASS_BRIDGE_HOST;
-   writew(val, port->base + PCIE_CONF_CLASS_ID);
-   }
-
-   /* 100ms timeout value should be enough for Gen1/2 training */
-   err = readl_poll_timeout(port->base + PCIE_LINK_STATUS_V2, val,
-!!(val & PCIE_PORT_LINKUP_V2), 20,
-100 * USEC_PER_MSEC);
-   if (err)
-   return -ETIMEDOUT;
-
-   /* Set INTx mask */
-   val = readl(port->base + PCIE_INT_MASK);
-   val &= ~INTX_MASK;
-   writel(val, port->base + PCIE_INT_MASK);
-
-   /* Set AHB to PCIe translation windows */
-   size = mem->end - mem->start;
-   val = lower_32_bits(mem->start) | AHB2PCIE_SIZE(fls(size));
-   writel(val, port->base + PCIE_AHB_TRANS_BASE0_L);
-
-   val = upper_32_bits(mem->start);
-   writel(val, port->base + PCIE_AHB_TRANS_BASE0_H);
-
-   /* Set PCIe to AXI translation memory space.*/
-   val = fls(0x) | WIN_ENABLE;
-   writel(val, port->base + PCIE_AXI_WINDOW0);
-
-   return 0;
-}
-
 static void mtk_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data);
@@ -643,8 +574,6 @@ static int mtk_pcie_init_irq_domain(struct mtk_pcie_port 
*port,
ret = mtk_pcie_allocate_msi_domains(port);
if (ret)
return ret;
-
-   mtk_pcie_enable_msi(port);
}
 
return 0;
@@ -711,6 +640,78 @@ static int mtk_pcie_setup_irq(struct mtk_pcie_port *port,
return 0;
 }
 
+static int mtk_pcie_startup_port_v2(struct mtk_pcie_port *port)
+{
+   struct mtk_pcie *pcie = port->pcie;
+   struct resource *mem = &pcie->mem;
+   const struct mtk_pcie_soc *soc = port->pcie->soc;
+   u32 val;
+   size_t size;
+   int err;
+
+   /* MT7622 platforms need to enable LTSSM and ASPM from PCIe subsys */
+   if (pcie->base) {
+   val = readl(pcie->base + PCIE_SYS_CFG_V2);
+   val |= PCIE_CSR_LTSSM_EN(port->slot) |
+  PCIE_CSR_ASPM_L1_EN(port->slot);
+   writel(val, pcie->base + PCIE_SYS_CFG_V2);
+   }
+
+   /* Assert all reset signals */
+   writel(0, port->base + PCIE_RST_CTRL);
+
+   /*
+* Enable PCIe link down reset, if link status changed from link up to
+* link down, this will reset MAC control registers and configuration
+* space.
+*/
+   writel(PCIE_LINKDOWN_RST_EN, port->base + PCIE_RST_CTRL);
+
+   /* De-assert PHY, PE, PIPE, MAC and configuration reset */
+   val = readl(port->base + PCIE_RST_CTRL);
+   val |= PCIE_

[PATCH v2 0/4] PCI: mediatek: fixup find_port, enable_msi and add pm, module support

2018-07-01 Thread honghui.zhang

From: Honghui Zhang 

This patchset includes miscellaneous patches:

The first patch fixes the mtk_pcie_find_port logic, which prevented the system
from accessing the configuration space of an EP connected to PCIe slot 1.

The second patch fixes the MSI enable logic: MSI must be enabled only after
the system clock is enabled. The definition of mtk_pcie_startup_port_v2 is
moved to avoid a forward declaration of mtk_pcie_enable_msi, and
mtk_pcie_enable_msi is now called from mtk_pcie_startup_port_v2, since all the
clocks are enabled by that time.

The third patch is a rebase and refactor of the v4 patch[1]. The changes are:
 -Add PM support for MT7622.
 -Using mtk_pcie_enable_port to re-establish the link when resumed.
 -Rebase on the previous two patches.

The fourth patch adds loadable kernel module support.

Some of these patches were already reviewed by Ryder Lee,
so I have just added the Reviewed-by tags to those patches.

[1] https://patchwork.kernel.org/patch/10479079

Changes since v1:
 - A bit of code refactoring in the first patch, as suggested by Andy
   Shevchenko, and the commit message updated.
 - Using __maybe_unused.
 - Remove the redundant list_empty check of the fourth patch.

Honghui Zhang (4):
  PCI: mediatek: fixup mtk_pcie_find_port logical
  PCI: mediatek: enable msi after clock enabled
  PCI: mediatek: Add system pm support for MT2712 and MT7622
  PCI: mediatek: Add loadable kernel module support

 drivers/pci/controller/Kconfig |   2 +-
 drivers/pci/controller/pcie-mediatek.c | 289 -
 2 files changed, 213 insertions(+), 78 deletions(-)

-- 
2.6.4

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/01/18 at 11:28pm, Pavel Tatashin wrote:
> > > So, on the first failure, we even stop trying to populate other
> > > sections. No more memory to do so.
> >
> > This is the thing I worry about. In old sparse_mem_maps_populate_node()
> > you can see, when not present or failed to populate, just continue. This
> > is the main difference between yours and the old code. The key logic is
> > changed here.
> >
> 
> I do not see how  we can succeed after the first failure. We still
> allocate from the same node:
> 
> sparse_mem_map_populate() may fail only if we could not allocate large
> enough buffer vmemmap_buf_start earlier.
> 
> This means that in:
> sparse_mem_map_populate()
>   vmemmap_populate()
> vmemmap_populate_hugepages()
>   vmemmap_alloc_block_buf() (no buffer, so call allocator)
> vmemmap_alloc_block(size, node);
> __earlyonly_bootmem_alloc(node, size, size, 
> __pa(MAX_DMA_ADDRESS));
>   memblock_virt_alloc_try_nid_raw() -> Nothing changes for
> this call to succeed. So, all consequent calls to
> sparse_mem_map_populate() in this node will fail as well.

Yes, you are right, it's an improvement. Thanks.

> 
> > >
> > Forgot mentioning it's the vervion in mm/sparse-vmemmap.c
> 
> Sorry, I do not understand what is vervion.

Typo, it should be 'version'. Sorry for that.

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

> > So, on the first failure, we even stop trying to populate other
> > sections. No more memory to do so.
>
> This is the thing I worry about. In old sparse_mem_maps_populate_node()
> you can see, when not present or failed to populate, just continue. This
> is the main difference between yours and the old code. The key logic is
> changed here.
>

I do not see how  we can succeed after the first failure. We still
allocate from the same node:

sparse_mem_map_populate() may fail only if we could not allocate large
enough buffer vmemmap_buf_start earlier.

This means that in:
sparse_mem_map_populate()
  vmemmap_populate()
vmemmap_populate_hugepages()
  vmemmap_alloc_block_buf() (no buffer, so call allocator)
vmemmap_alloc_block(size, node);
__earlyonly_bootmem_alloc(node, size, size, __pa(MAX_DMA_ADDRESS));
  memblock_virt_alloc_try_nid_raw() -> Nothing changes for
this call to succeed. So, all consequent calls to
sparse_mem_map_populate() in this node will fail as well.

> >
> Forgot mentioning it's the vervion in mm/sparse-vmemmap.c

Sorry, I do not understand what is vervion.

Thank you,
Pavel
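
The argument above can be checked with a small user-space model. This is
only a sketch, not kernel code: pool_alloc(), populate_continue() and
populate_break() are hypothetical names, and the model assumes that every
section needs the same amount of memory and that nothing is freed on the
node while the loop runs. Under those assumptions, "continue" after a
failure and "break" on the first failure populate the same sections.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: 'budget' stands in for the memory
 * still free on one node during early boot. Nothing is freed while the
 * loop runs, so the budget can only shrink.
 */
static int pool_alloc(size_t *budget, size_t size)
{
	if (*budget < size)
		return 0;	/* allocation failed */
	*budget -= size;
	return 1;
}

/* Old loop shape: on failure, move on and try the next section. */
size_t populate_continue(size_t budget, size_t nr_sections, size_t section_size)
{
	size_t populated = 0;

	assert(section_size > 0);
	for (size_t i = 0; i < nr_sections; i++) {
		if (!pool_alloc(&budget, section_size))
			continue;
		populated++;
	}
	return populated;
}

/* New loop shape: on the first failure, stop trying. */
size_t populate_break(size_t budget, size_t nr_sections, size_t section_size)
{
	size_t populated = 0;

	assert(section_size > 0);
	for (size_t i = 0; i < nr_sections; i++) {
		if (!pool_alloc(&budget, section_size))
			break;
		populated++;
	}
	return populated;
}
```

With equal-sized requests, once one allocation fails every later one fails
too, so both loops return the same count. The model would not hold if
sections needed different sizes or if memory could be returned to the node
in between.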

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/02/18 at 11:14am, Baoquan He wrote:
> On 07/01/18 at 11:03pm, Pavel Tatashin wrote:
> > > Ah, yes, I misunderstood it, sorry for that.
> > >
> > > Then I have only one concern, for vmemmap case, if one section doesn't
> > > succeed to populate its memmap, do we need to skip all the remaining
> > > sections in that node?
> > 
> > Yes, in sparse_populate_node() we have the following:
> > 
> > 294 for (pnum = pnum_begin; map_index < map_count; pnum++) {
> > 295 if (!present_section_nr(pnum))
> > 296 continue;
> > 297 if (!sparse_mem_map_populate(pnum, nid, NULL))
> > 298 break;
> > 
> > So, on the first failure, we even stop trying to populate other
> > sections. No more memory to do so.
> 
> This is the thing I worry about. In old sparse_mem_maps_populate_node()
> you can see, when not present or failed to populate, just continue. This
> is the main difference between yours and the old code. The key logic is
> changed here.
> 
Forgot mentioning it's the vervion in mm/sparse-vmemmap.c

> void __init sparse_mem_maps_populate_node(struct page **map_map,
>   unsigned long pnum_begin,
>   unsigned long pnum_end,
>   unsigned long map_count, int nodeid)
> {
>   ...
>   for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
> struct mem_section *ms;
> 
> if (!present_section_nr(pnum))
> continue;
> 
> map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
> if (map_map[pnum])
> 
> continue;
> ms = __nr_to_section(pnum);
> pr_err("%s: sparsemem memory map backing failed some memory 
> will not be available\n",
>__func__);
> ms->section_mem_map = 0;
> }
>   ...
> }
>

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/01/18 at 11:03pm, Pavel Tatashin wrote:
> > Ah, yes, I misunderstood it, sorry for that.
> >
> > Then I have only one concern, for vmemmap case, if one section doesn't
> > succeed to populate its memmap, do we need to skip all the remaining
> > sections in that node?
> 
> Yes, in sparse_populate_node() we have the following:
> 
> 294 for (pnum = pnum_begin; map_index < map_count; pnum++) {
> 295 if (!present_section_nr(pnum))
> 296 continue;
> 297 if (!sparse_mem_map_populate(pnum, nid, NULL))
> 298 break;
> 
> So, on the first failure, we even stop trying to populate other
> sections. No more memory to do so.

This is the thing I worry about. In old sparse_mem_maps_populate_node()
you can see, when not present or failed to populate, just continue. This
is the main difference between yours and the old code. The key logic is
changed here.

void __init sparse_mem_maps_populate_node(struct page **map_map,
  unsigned long pnum_begin,
  unsigned long pnum_end,
  unsigned long map_count, int nodeid)
{
...
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
struct mem_section *ms;

if (!present_section_nr(pnum))
continue;

map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
if (map_map[pnum])  

continue;
ms = __nr_to_section(pnum);
pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
   __func__);
ms->section_mem_map = 0;
}
...
}

Re: [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled

2018-07-01 Thread Paul E. McKenney

On Sun, Jul 01, 2018 at 05:35:53PM -0700, Joel Fernandes wrote:
> On Sun, Jul 01, 2018 at 03:27:49PM -0700, Paul E. McKenney wrote:
> [...]
> > > > +/*
> > > > + * Report a deferred quiescent state if needed and safe to do so.
> > > > + * As with rcu_preempt_need_deferred_qs(), "safe" involves only
> > > > + * not being in an RCU read-side critical section.  The caller must
> > > > + * evaluate safety in terms of interrupt, softirq, and preemption
> > > > + * disabling.
> > > > + */
> > > > +static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > +{
> > > > +   unsigned long flags;
> > > > +
> > > > +   if (!rcu_preempt_need_deferred_qs(t))
> > > > +   return;
> > > > +   local_irq_save(flags);
> > > > +   rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Handle special cases during rcu_read_unlock(), such as needing to
> > > > + * notify RCU core processing or task having blocked during the RCU
> > > > + * read-side critical section.
> > > > + */
> > > > +static void rcu_read_unlock_special(struct task_struct *t)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   bool preempt_bh_were_disabled = !!(preempt_count() & 
> > > > ~HARDIRQ_MASK);
> > > > +   bool irqs_were_disabled;
> > > > +
> > > > +   /* NMI handlers cannot block and cannot safely manipulate 
> > > > state. */
> > > > +   if (in_nmi())
> > > > +   return;
> > > > +
> > > > +   local_irq_save(flags);
> > > > +   irqs_were_disabled = irqs_disabled_flags(flags);
> > > > +   if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > > > +   t->rcu_read_unlock_special.b.blocked) {
> > > > +   /* Need to defer quiescent state until everything is 
> > > > enabled. */
> > > > +   raise_softirq_irqoff(RCU_SOFTIRQ);
> > > > +   local_irq_restore(flags);
> > > > +   return;
> > > > +   }
> > > > +   rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > +}
> > > > +
> > > >  /*
> > > >   * Dump detailed information for all tasks blocking the current RCU
> > > >   * grace period on the specified rcu_node structure.
> > > > @@ -737,10 +784,20 @@ static void rcu_preempt_check_callbacks(void)
> > > > struct rcu_state *rsp = &rcu_preempt_state;
> > > > struct task_struct *t = current;
> > > >  
> > > > -   if (t->rcu_read_lock_nesting == 0) {
> > > > -   rcu_preempt_qs();
> > > > +   if (t->rcu_read_lock_nesting > 0 ||
> > > > +   (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
> > > > +   /* No QS, force context switch if deferred. */
> > > > +   if (rcu_preempt_need_deferred_qs(t))
> > > > +   resched_cpu(smp_processor_id());
> > > 
> > > 
> > > Hi Paul,
> > > 
> > > I had a similar idea of checking the preempt_count() sometime back but 
> > > didn't
> > > believe this path can be called with preempt enabled (for some reason 
> > > ;-)).
> > > Now that I've convinced myself that's possible, what do you think about
> > > taking advantage of the opportunity to report a RCU-sched qs like below 
> > > from
> > > rcu_check_callbacks ?
> > > 
> > > Did some basic testing, can roll into a patch later if you're Ok with it.
> > 
> > The problem here is that the code patch above cannot be called
> > with CONFIG_PREEMPT_COUNT=n, but the code below can.  And if
> > CONFIG_PREEMPT_COUNT=n, the return value from preempt_count() can be
> > misleading.
> > 
> > Or am I missing something here?
> 
> That is true! so then I could also test if PREEMPT_RCU is enabled like you're
> doing in the other path.
> 
> thanks!
> 
> ---8<---
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index fb440baf8ac6..03a460921dca 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2683,6 +2683,12 @@ void rcu_check_callbacks(int user)
>   rcu_note_voluntary_context_switch(current);
> 
>   } else if (!in_softirq()) {
> + /*
> +  * Report RCU-sched qs if not in an RCU-sched read-side
> +  * critical section.
> +  */
> + if (IS_ENABLED(PREEMPT_RCU) && !(preempt_count() & 
> PREEMPT_MASK))

For more precision, s/PREEMPT_RCU/CONFIG_PREEMPT_COUNT/

Hmmm...  I recently queued a patch that redefines the RCU-bh update-side
API in terms of the consolidated RCU implementation, so this "else"
clause no longer exists.  One approach would be to fold this condition
(with the addition of SOFTIRQ_MASK) into the previous "if" condition,
but that would call rcu_note_voluntary_context_switch() at bad times.
So maybe this becomes a new "else if" clause.

Another complication is an upcoming step that redefines the RCU-sched
update-side API in terms of the consolidated RCU implementation, which
will likely restructure this "if" statement yet again.

So I will try to fold this idea in (with attribution).  If I don't get
it

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

> > + if (!usemap) {
> > + pr_err("%s: usemap allocation failed", __func__);
>
> Wondering if we can provide more useful information for better debugging
> if failed. E.g here tell on what nid the usemap allocation failed.
>
> > +pnum, nid);
> > + if (!map) {
> > + pr_err("%s: memory map backing failed. Some memory 
> > will not be available.",
> > +__func__);
> And here tell nid and the memory section nr failed.

Sure, I will wait for more comments, if any, and add more info to the
error messages in the next revision.

Thank you,
Pavel

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

> Ah, yes, I misunderstood it, sorry for that.
>
> Then I have only one concern, for vmemmap case, if one section doesn't
> succeed to populate its memmap, do we need to skip all the remaining
> sections in that node?

Yes, in sparse_populate_node() we have the following:

294 for (pnum = pnum_begin; map_index < map_count; pnum++) {
295 if (!present_section_nr(pnum))
296 continue;
297 if (!sparse_mem_map_populate(pnum, nid, NULL))
298 break;

So, on the first failure, we even stop trying to populate other
sections. No more memory to do so.

Pavel

Re: [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled

2018-07-01 Thread Paul E. McKenney

On Sun, Jul 01, 2018 at 05:37:32PM -0700, Joel Fernandes wrote:
> On Sun, Jul 01, 2018 at 03:25:01PM -0700, Paul E. McKenney wrote:
> [...]
> > > > @@ -602,6 +589,66 @@ static void rcu_read_unlock_special(struct 
> > > > task_struct *t)
> > > > }
> > > >  }
> > > >  
> > > > +/*
> > > > + * Is a deferred quiescent-state pending, and are we also not in
> > > > + * an RCU read-side critical section?  It is the caller's 
> > > > responsibility
> > > > + * to ensure it is otherwise safe to report any deferred quiescent
> > > > + * states.  The reason for this is that it is safe to report a
> > > > + * quiescent state during context switch even though preemption
> > > > + * is disabled.  This function cannot be expected to understand these
> > > > + * nuances, so the caller must handle them.
> > > > + */
> > > > +static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > > > +{
> > > > +   return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
> > > > +   READ_ONCE(t->rcu_read_unlock_special.s)) &&
> > > > +  !t->rcu_read_lock_nesting;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Report a deferred quiescent state if needed and safe to do so.
> > > > + * As with rcu_preempt_need_deferred_qs(), "safe" involves only
> > > > + * not being in an RCU read-side critical section.  The caller must
> > > > + * evaluate safety in terms of interrupt, softirq, and preemption
> > > > + * disabling.
> > > > + */
> > > > +static void rcu_preempt_deferred_qs(struct task_struct *t)
> > > > +{
> > > > +   unsigned long flags;
> > > > +
> > > > +   if (!rcu_preempt_need_deferred_qs(t))
> > > > +   return;
> > > > +   local_irq_save(flags);
> > > > +   rcu_preempt_deferred_qs_irqrestore(t, flags);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Handle special cases during rcu_read_unlock(), such as needing to
> > > > + * notify RCU core processing or task having blocked during the RCU
> > > > + * read-side critical section.
> > > > + */
> > > > +static void rcu_read_unlock_special(struct task_struct *t)
> > > > +{
> > > > +   unsigned long flags;
> > > > +   bool preempt_bh_were_disabled = !!(preempt_count() & 
> > > > ~HARDIRQ_MASK);
> > > 
> > > Would it be better to just test for those bits just to be safe the higher
> > > order bits don't bleed in, such as PREEMPT_NEED_RESCHED, something like 
> > > the
> > > following based on the 'dev' branch?
> > 
> > Good point!  My plan is to merge it into the original commit with
> > attribution.  Please let me know if you have objections.
> > 
> 
> Sure! That sounds good to me.

Very good, I now have a "squash" commit queued, thank you!

Thanx, Paul
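
For readers following the masking point, here is a small user-space sketch
of the difference between the two tests. The bit values are illustrative,
modeled on include/linux/preempt.h with the x86 PREEMPT_NEED_RESCHED bit;
treat them as assumptions. The two helper names are hypothetical and do
not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative bit layout, modeled on include/linux/preempt.h with the
 * x86 PREEMPT_NEED_RESCHED bit. Treat the exact values as assumptions.
 */
#define PREEMPT_MASK		0x000000ffU
#define SOFTIRQ_MASK		0x0000ff00U
#define HARDIRQ_MASK		0x000f0000U
#define PREEMPT_NEED_RESCHED	0x80000000U

static_assert((PREEMPT_MASK & SOFTIRQ_MASK) == 0, "masks must not overlap");

/* Original test: everything except the hardirq bits counts as "disabled". */
bool disabled_by_complement(unsigned int count)
{
	return !!(count & ~HARDIRQ_MASK);
}

/* Suggested test: only the preempt and softirq bits count. */
bool disabled_by_explicit_mask(unsigned int count)
{
	return !!(count & (PREEMPT_MASK | SOFTIRQ_MASK));
}
```

A count with only the high PREEMPT_NEED_RESCHED bit set makes the
complement-based test report "disabled", while the explicit mask does not,
which is the bleed-in that the suggestion avoids.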

Re: [PATCH v2 5/6] mm: track gup pages with page->dma_pinned_* fields

2018-07-01 Thread kbuild test robot

Hi John,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.18-rc3]
[cannot apply to next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
config: x86_64-randconfig-x010-201826 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
from include/linux/atomic.h:5,
from include/linux/page_counter.h:5,
from mm/memcontrol.c:34:
   mm/memcontrol.c: In function 'unlock_page_lru':
   mm/memcontrol.c:2087:32: error: 'page_tail' undeclared (first use in this 
function); did you mean 'page_pool'?
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
   ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> include/linux/mmdebug.h:21:3: note: in expansion of macro 'if'
  if (unlikely(cond)) { \
  ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
>> include/linux/mmdebug.h:21:7: note: in expansion of macro 'unlikely'
  if (unlikely(cond)) { \
  ^~~~
   mm/memcontrol.c:2087:3: note: in expansion of macro 'VM_BUG_ON_PAGE'
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
  ^~
   mm/memcontrol.c:2087:32: note: each undeclared identifier is reported only 
once for each function it appears in
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
   ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> include/linux/mmdebug.h:21:3: note: in expansion of macro 'if'
  if (unlikely(cond)) { \
  ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
>> include/linux/mmdebug.h:21:7: note: in expansion of macro 'unlikely'
  if (unlikely(cond)) { \
  ^~~~
   mm/memcontrol.c:2087:3: note: in expansion of macro 'VM_BUG_ON_PAGE'
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
  ^~

vim +/if +21 include/linux/mmdebug.h

309381feae Sasha Levin   2014-01-23  16  
59ea746337 Jiri Slaby2008-06-12  17  #ifdef CONFIG_DEBUG_VM
59ea746337 Jiri Slaby2008-06-12  18  #define VM_BUG_ON(cond) 
BUG_ON(cond)
309381feae Sasha Levin   2014-01-23  19  #define VM_BUG_ON_PAGE(cond, 
page) \
e4f674229c Dave Hansen   2014-06-04  20 do {
\
e4f674229c Dave Hansen   2014-06-04 @21 if 
(unlikely(cond)) {   \
e4f674229c Dave Hansen   2014-06-04  22 
dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")");\
e4f674229c Dave Hansen   2014-06-04  23 BUG();  
\
e4f674229c Dave Hansen   2014-06-04  24 }   
\
e4f674229c Dave Hansen   2014-06-04  25 } while (0)
fa3759ccd5 Sasha Levin   2014-10-09  26  #define VM_BUG_ON_VMA(cond, 
vma)   \
fa3759ccd5 Sasha Levin   2014-10-09  27 do {
\
fa3759ccd5 Sasha Levin   2014-10-09  28 if 
(unlikely(cond)) {   \
fa3759ccd5 Sasha Levin   2014-10-09  29 
dump_vma(vma);  \
fa3759ccd5 Sasha Levin   2014-10-09  30 BUG();  
\
fa3759ccd5 Sasha Levin   2014-10-09  31 }   
\
fa3759ccd5 Sasha Levin   2014-10-09  32 } while (0)
31c9afa6db Sasha Levin   2014-10-09  33  #define VM_BUG_ON_MM(cond, mm) 
\
31c9afa6db Sasha Levin   2014-10-09  34 do {
\
31c9afa6db Sasha Levin   2014-10-09

Re: [PATCH] f2fs: Replace strncpy with memcpy

2018-07-01 Thread Chao Yu

On 2018/7/2 10:16, Guenter Roeck wrote:
> On 07/01/2018 06:53 PM, Chao Yu wrote:
>> On 2018/7/2 4:57, Guenter Roeck wrote:
>>> gcc 8.1.0 complains:
>>>
>>> fs/f2fs/namei.c: In function 'f2fs_update_extension_list':
>>> fs/f2fs/namei.c:257:3: warning:
>>> 'strncpy' output truncated before terminating nul copying
>>> as many bytes from a string as its length
>>> fs/f2fs/namei.c:249:3: warning:
>>> 'strncpy' output truncated before terminating nul copying
>>> as many bytes from a string as its length
>>>
>>> Using strncpy() is indeed less than perfect since the length of data to
>>> be copied has already been determined with strlen(). Replace strncpy()
>>> with memcpy() to address the warning and optimize the code a little.
>>>
>>> Signed-off-by: Guenter Roeck 
>>> ---
>>>   fs/f2fs/namei.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
>>> index 231b7f3ea7d3..e75607544f7c 100644
>>> --- a/fs/f2fs/namei.c
>>> +++ b/fs/f2fs/namei.c
>>> @@ -246,7 +246,7 @@ int f2fs_update_extension_list(struct f2fs_sb_info 
>>> *sbi, const char *name,
>>> return -EINVAL;
>>>   
>>> if (hot) {
>>> -   strncpy(extlist[count], name, strlen(name));
>>> +   memcpy(extlist[count], name, strlen(name));
>>
>> How about replacing with strcpy(extlist[count], name)? Because name length 
>> has
>> already been checked before f2fs_update_extension_list, it should be valid, 
>> and
>> will not cause overflow during copying.
>>
> 
> Your call; feel free to submit an alternative. Since it is different files, 
> static
> analysis might not know and complain, though. You might want to make sure 
> that this
> doesn't happen, and also add a comment explaining the reason for using 
> strcpy().

Yeah, that could be changed in another patch, but it will be trivial. Anyway, to
fix this gcc complaint, this patch looks good to me, thanks for the patch. :)

Reviewed-by: Chao Yu 

Thanks,

> 
> Thanks,
> Guenter
> 
>
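
As a side note on the warning itself, the following user-space sketch shows
why the length argument matters. It is not the f2fs code: EXT_SLOT_LEN,
set_extension() and extension_roundtrip_ok() are hypothetical, and the
sketch assumes the destination slot is zeroed before the copy, which the
patch above does not show either way.

```c
#include <assert.h>
#include <string.h>

#define EXT_SLOT_LEN 8	/* hypothetical slot size, not the f2fs constant */

/*
 * Copy 'name' into a zeroed slot. memcpy() with strlen(name) copies the
 * characters only, exactly like strncpy(slot, name, strlen(name)) would,
 * so the terminating NUL comes from the memset(), not from the copy.
 */
int set_extension(char slot[EXT_SLOT_LEN], const char *name)
{
	size_t len = strlen(name);

	if (len >= EXT_SLOT_LEN)
		return -1;	/* no room for the name plus a NUL */
	memset(slot, 0, EXT_SLOT_LEN);
	memcpy(slot, name, len);
	return 0;
}

/* Returns 1 if 'name' fits and reads back as the same C string. */
int extension_roundtrip_ok(const char *name)
{
	char slot[EXT_SLOT_LEN];

	if (set_extension(slot, name) != 0)
		return 0;
	assert(slot[EXT_SLOT_LEN - 1] == '\0');
	return strcmp(slot, name) == 0;
}
```

Because the copy length is strlen(name), neither strncpy() nor memcpy()
writes a terminator here. gcc 8 flags the strncpy() form for that reason,
and memcpy() states the same intent without the warning.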

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/01/18 at 10:04pm, Pavel Tatashin wrote:
> +/*
> + * Initialize sparse on a specific node. The node spans [pnum_begin, 
> pnum_end)
> + * And number of present sections in this node is map_count.
> + */
> +void __init sparse_init_nid(int nid, unsigned long pnum_begin,
> +unsigned long pnum_end,
> +unsigned long map_count)
> +{
> + unsigned long pnum, usemap_longs, *usemap, map_index;
> + struct page *map, *map_base;
> +
> + usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
> + usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
> +   usemap_size() *
> +   map_count);
> + if (!usemap) {
> + pr_err("%s: usemap allocation failed", __func__);

Wondering if we can provide more useful information for better debugging
if failed. E.g here tell on what nid the usemap allocation failed.

> + goto failed;
> + }
> + map_base = sparse_populate_node(pnum_begin, pnum_end,
> + map_count, nid);
> + map_index = 0;
> + for_each_present_section_nr(pnum_begin, pnum) {
> + if (pnum >= pnum_end)
> + break;
> +
> + BUG_ON(map_index == map_count);
> + map = sparse_populate_node_section(map_base, map_index,
> +pnum, nid);
> + if (!map) {
> + pr_err("%s: memory map backing failed. Some memory will 
> not be available.",
> +__func__);
And here tell nid and the memory section nr failed.

> + pnum_begin = pnum;
> + goto failed;
> + }
> + check_usemap_section_nr(nid, usemap);
> + sparse_init_one_section(__nr_to_section(pnum), pnum, map,
> + usemap);
> + map_index++;
> + usemap += usemap_longs;
> + }
> + return;
> +failed:
> + /* We failed to allocate, mark all the following pnums as not present */
> + for_each_present_section_nr(pnum_begin, pnum) {
> + struct mem_section *ms;
> +
> + if (pnum >= pnum_end)
> + break;
> + ms = __nr_to_section(pnum);
> + ms->section_mem_map = 0;
> + }
> +}
> +
>  /*
>   * Allocate the accumulated non-linear sections, allocate a mem_map
>   * for each and record the physical to section mapping.
> -- 
> 2.18.0
>

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/01/18 at 10:43pm, Pavel Tatashin wrote:
> On Sun, Jul 1, 2018 at 10:31 PM Baoquan He  wrote:
> >
> > On 07/01/18 at 10:18pm, Pavel Tatashin wrote:
> > > > Here, I think it might be not right to jump to 'failed' directly if one
> > > > section of the node failed to populate memmap. I think the original code
> > > > is only skipping the section which memmap failed to populate by marking
> > > > it as not present with "ms->section_mem_map = 0".
> > > >
> > >
> > > Hi Baoquan,
> > >
> > > Thank you for a careful review. This is an intended change compared to
> > > the original code. Because we operate per-node now, if we fail to
> > > allocate a single section, in this node, it means we also will fail to
> > > allocate all the consequent sections in the same node and no need to
> > > check them anymore. In the original code we could not simply bailout,
> > > because we still might have valid entries in the following nodes.
> > > Similarly, sparse_init() will call sparse_init_nid() for the next node
> > > even if previous node failed to setup all the memory.
> >
> > Hmm, say the node we are handling is node5, and there are 100 sections.
> > If you allocate memmap for section at one time, you have succeeded to
> > handle for the first 99 sections, now the 100th failed, so you will mark
> > all sections on node5 as not present. And the allocation failure is only
> > for single section memmap allocation case.
> 
> No, unless I am missing something, that's not how code works:
> 
> 463 if (!map) {
> 464 pr_err("%s: memory map backing failed.
> Some memory will not be available.",
> 465__func__);
> 466 pnum_begin = pnum;
> 467 goto failed;
> 468 }
> 
> 476 failed:
> 477 /* We failed to allocate, mark all the following pnums as
> not present */
> 478 for_each_present_section_nr(pnum_begin, pnum) {
> 
> We continue from the pnum that failed as we set pnum_begin to pnum,
> and mark all the consequent sections as not-present.

Ah, yes, I misunderstood it, sorry for that.

Then I have only one concern, for vmemmap case, if one section doesn't
succeed to populate its memmap, do we need to skip all the remaining
sections in that node?

> 
> The only change compared to the original code is that once we found an
> empty pnum we stop checking the consequent pnums in this node, as we
> know they are empty as well, because there is no more memory in this
> node to allocate from.
>
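
The "pnum_begin = pnum; goto failed" behaviour discussed above can be
modeled in user space. This is a sketch under stated assumptions, not the
kernel code: init_node_sections() is a hypothetical helper, sections are
represented as bits in a mask, and 'allocs_ok' is the number of memmap
allocations that succeed before the node runs out of memory.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code. Bit N of 'present_mask' says that
 * section N is present; 'allocs_ok' is how many memmap allocations succeed
 * before the node runs out of memory. Returns a mask of the sections that
 * stay usable.
 */
unsigned int init_node_sections(unsigned int present_mask,
				unsigned int nr_sections,
				unsigned int allocs_ok)
{
	unsigned int usable = 0;
	unsigned int done = 0;

	assert(nr_sections <= 32);
	for (unsigned int pnum = 0; pnum < nr_sections; pnum++) {
		if (!(present_mask & (1U << pnum)))
			continue;
		if (done == allocs_ok)
			break;	/* like "pnum_begin = pnum; goto failed" */
		usable |= 1U << pnum;
		done++;
	}
	/*
	 * Sections at or after the failing pnum never get a bit in
	 * 'usable', which models clearing their section_mem_map.
	 */
	return usable;
}
```

Sections initialized before the failure keep their bit, so a failure on the
last section of a node does not discard the earlier ones; only the failing
section and those after it are dropped.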

Re: [PATCH RESEND 0/3] K2G: mmc: Update mmc dt node to use

2018-07-01 Thread Kishon Vijay Abraham I




On Saturday 30 June 2018 04:29 AM, Santosh Shilimkar wrote:
> On 6/27/2018 9:15 PM, Kishon Vijay Abraham I wrote:
>> Santosh,
>>
>> On Friday 22 June 2018 03:46 PM, Kishon Vijay Abraham I wrote:
>>> Update mmc dt node to use sdhci-omap binding instead of omap_hsmmc
>>> binding.
>>>
>>> I've also updated keystone_defconfig to enable CONFIG_MMC_SDHCI_OMAP.
>>> Everyone who use a custom .config should also enable
>>> CONFIG_MMC_SDHCI_OMAP for MMC to work.
>>
>> Can this series be merged?
>>
> Applied. Sorry for missing it in earlier batch.

Thanks Santosh.

-Kishon

[PATCH] x86: make Memory Management options more visible

2018-07-01 Thread Randy Dunlap

From: Randy Dunlap 

Currently for x86, the "Memory Management" kconfig options are
displayed under "Processor type and features."  This tends to
make them hidden or difficult to find.

This patch makes Memory Management options a first-class menu by moving
it away from "Processor type and features" and into the main menu.

Also clarify "endmenu" lines with '#' comments of their respective
menu names, just to help people who are reading or editing the
Kconfig file.

Signed-off-by: Randy Dunlap 
---
 arch/x86/Kconfig |  186 ++---
 1 file changed, 95 insertions(+), 91 deletions(-)

--- lnx-418-rc3.orig/arch/x86/Kconfig
+++ lnx-418-rc3/arch/x86/Kconfig
@@ -1638,93 +1638,6 @@ config ILLEGAL_POINTER_VALUE
default 0 if X86_32
default 0xdead if X86_64
 
-source "mm/Kconfig"
-
-config X86_PMEM_LEGACY_DEVICE
-   bool
-
-config X86_PMEM_LEGACY
-   tristate "Support non-standard NVDIMMs and ADR protected memory"
-   depends on PHYS_ADDR_T_64BIT
-   depends on BLK_DEV
-   select X86_PMEM_LEGACY_DEVICE
-   select LIBNVDIMM
-   help
- Treat memory marked using the non-standard e820 type of 12 as used
- by the Intel Sandy Bridge-EP reference BIOS as protected memory.
- The kernel will offer these regions to the 'pmem' driver so
- they can be used for persistent storage.
-
- Say Y if unsure.
-
-config HIGHPTE
-   bool "Allocate 3rd-level pagetables from highmem"
-   depends on HIGHMEM
-   ---help---
- The VM uses one page table entry for each page of physical memory.
- For systems with a lot of RAM, this can be wasteful of precious
- low memory.  Setting this option will put user-space page table
- entries in high memory.
-
-config X86_CHECK_BIOS_CORRUPTION
-   bool "Check for low memory corruption"
-   ---help---
- Periodically check for memory corruption in low memory, which
- is suspected to be caused by BIOS.  Even when enabled in the
- configuration, it is disabled at runtime.  Enable it by
- setting "memory_corruption_check=1" on the kernel command
- line.  By default it scans the low 64k of memory every 60
- seconds; see the memory_corruption_check_size and
- memory_corruption_check_period parameters in
- Documentation/admin-guide/kernel-parameters.rst to adjust this.
-
- When enabled with the default parameters, this option has
- almost no overhead, as it reserves a relatively small amount
- of memory and scans it infrequently.  It both detects corruption
- and prevents it from affecting the running system.
-
- It is, however, intended as a diagnostic tool; if repeatable
- BIOS-originated corruption always affects the same memory,
- you can use memmap= to prevent the kernel from using that
- memory.
-
-config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK
-   bool "Set the default setting of memory_corruption_check"
-   depends on X86_CHECK_BIOS_CORRUPTION
-   default y
-   ---help---
- Set whether the default state of memory_corruption_check is
- on or off.
-
-config X86_RESERVE_LOW
-   int "Amount of low memory, in kilobytes, to reserve for the BIOS"
-   default 64
-   range 4 640
-   ---help---
- Specify the amount of low memory to reserve for the BIOS.
-
- The first page contains BIOS data structures that the kernel
- must not use, so that page must always be reserved.
-
- By default we reserve the first 64K of physical RAM, as a
- number of BIOSes are known to corrupt that memory range
- during events such as suspend/resume or monitor cable
- insertion, so it must not be used by the kernel.
-
- You can set this to 4 if you are absolutely sure that you
- trust the BIOS to get all its memory reservations and usages
- right.  If you know your BIOS have problems beyond the
- default 64K area, you can set this to 640 to avoid using the
- entire low memory range.
-
- If you have doubts about the BIOS (e.g. suspend/resume does
- not work or there's kernel crashes after certain hardware
- hotplug events) then you might want to enable
- X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check
- typical corruption patterns.
-
- Leave this to the default value of 64 if you are unsure.
-
 config MATH_EMULATION
bool
depends on MODIFY_LDT_SYSCALL
@@ -2392,7 +2305,98 @@ config MODIFY_LDT_SYSCALL
 
 source "kernel/livepatch/Kconfig"
 
-endmenu
+endmenu # Processor type and features
+
+menu "Memory Management options"
+
+source "mm/Kconfig"
+
+config X86_PMEM_LEGACY_DEVICE
+   bool
+
+config X86_PMEM_LEGACY
+   tristate "Support non-standard NVDIMMs and ADR protected memory"
+   depends on PHYS_ADDR_T_64BIT
+

Re: [PATCH v2 4/6] mm/fs: add a sync_mode param for clear_page_dirty_for_io()

2018-07-01 Thread kbuild test robot

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc3]
[cannot apply to next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
config: i386-randconfig-x075-201826 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/f2fs/dir.c: In function 'f2fs_delete_entry':
>> fs/f2fs/dir.c:734:33: error: 'WB_SYNC_ALL' undeclared (first use in this function); did you mean 'FS_SYNC_FL'?
  clear_page_dirty_for_io(page, WB_SYNC_ALL);
^~~
FS_SYNC_FL
   fs/f2fs/dir.c:734:33: note: each undeclared identifier is reported only once 
for each function it appears in
--
   fs/f2fs/inline.c: In function 'f2fs_convert_inline_page':
>> fs/f2fs/inline.c:139:50: error: dereferencing pointer to incomplete type 'struct writeback_control'
 dirty = clear_page_dirty_for_io(page, fio.io_wbc->sync_mode);
 ^~
--
   In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs/f2fs/checkpoint.c:11:
   fs/f2fs/checkpoint.c: In function 'commit_checkpoint':
>> fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
--
   fs/f2fs/data.c: In function 'f2fs_write_cache_pages':
>> fs/f2fs/data.c:2021:9: error: too few arguments to function 'clear_page_dirty_for_io'
   if (!clear_page_dirty_for_io(page), wbc->sync_mode)
^~~
   In file included from include/linux/pagemap.h:8:0,
from include/linux/f2fs_fs.h:14,
from fs/f2fs/data.c:12:
   include/linux/mm.h:1540:5: note: declared here
int clear_page_dirty_for_io(struct page *page, int sync_mode);
^~~
   fs/f2fs/data.c:2021:38: warning: left-hand operand of comma expression has 
no effect [-Wunused-value]
   if (!clear_page_dirty_for_io(page), wbc->sync_mode)
 ^
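
The fs/f2fs/data.c error above looks like a misplaced closing parenthesis: the second argument ended up outside the call, leaving a comma expression. A minimal user-space sketch of the pattern follows. `clear_dirty_stub()`, `needs_skip()` and the `SYNC_*_STUB` values are hypothetical stand-ins made up for illustration; they are not the kernel's clear_page_dirty_for_io() or its sync modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the sync modes; not the kernel's enum. */
enum sync_mode_stub { SYNC_NONE_STUB = 0, SYNC_ALL_STUB = 1 };

/* Hypothetical two-argument helper with the same shape as the new prototype. */
static int clear_dirty_stub(bool *dirty, int sync_mode)
{
	int was_dirty = *dirty;

	(void)sync_mode;	/* unused in this sketch */
	*dirty = false;
	return was_dirty;
}

/*
 * Correct form: both arguments are inside the call.
 * The broken form, "if (!clear_dirty_stub(dirty), sync_mode)", passes only
 * one argument (a compile error) and, as a comma expression, would test
 * only sync_mode.
 */
static bool needs_skip(bool *dirty, int sync_mode)
{
	if (!clear_dirty_stub(dirty, sync_mode))
		return true;	/* was not dirty, nothing to write */
	return false;
}
```

Calling `needs_skip()` on a dirty flag clears it and returns false; a second call on the now-clean flag returns true.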

vim +734 fs/f2fs/dir.c

   690
   691  /*
   692   * It only removes the dentry from the dentry page, corresponding name
   693   * entry in name page does not need to be touched during deletion.
   694   */
   695  void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
   696                                          struct inode *dir, struct inode *inode)
   697  {
   698          struct  f2fs_dentry_block *dentry_blk;
   699          unsigned int bit_pos;
   700          int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
   701          int i;
   702
   703          f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
   704
   705          if (F2FS_OPTION(F2FS_I_SB(dir)).fsync_mode == FSYNC_MODE_STRICT)
   706                  f2fs_add_ino_entry(F2FS_I_SB(dir), dir->i_ino, TRANS_DIR_INO);
   707
   708          if (f2fs_has_inline_dentry(dir))
   709                  return f2fs_delete_inline_entry(dentry, page, dir, inode);
   710
   711          lock_page(page);
   712          f2fs_wait_on_page_writeback(page, DATA, true);
   713
   714          dentry_blk = page_address(page);
   715          bit_pos = dentry - dentry_blk->dentry;
   716          for (i = 0; i < slots; i++)
   717                  __clear_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap);
   718
   719          /* Let's check and deallocate this dentry page */
   720          bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
   721                          NR_DENTRY_IN_BLOCK,
   722                          0);
   723          set_page_dirty(page);
   724
   725          dir->i_ctime = dir->i_mtime = current_time(dir);
   726          f2fs_mark_inode_dirty_sync(dir, false);
   727
   728          if (inode)
   729                  f2fs_drop_nlink(dir, inode);
   730
   731          if (bit_pos == NR_DENTRY_IN_BLOCK &&
   732                  !f2fs_truncate_hole(dir, page->index, page->index + 1)) {
   733                  f2fs_clear_radix_tree_dirty_tag(page);
 > 734                  clear_page_dirty_for_io(page, WB_SYNC_ALL);
   735                  ClearPageP

[PATCH v2] kbuild: verify that $DEPMOD is installed

2018-07-01 Thread Randy Dunlap

From: Randy Dunlap 

Verify that 'depmod' ($DEPMOD) is installed.
This is a partial revert of 620c231c7a7f (from 2012):
  ("kbuild: do not check for ancient modutils tools")

Also update Documentation/process/changes.rst to refer to
kmod instead of module-init-tools.

Fixes kernel bugzilla #198965:
https://bugzilla.kernel.org/show_bug.cgi?id=198965

Signed-off-by: Randy Dunlap 
Cc: Lucas De Marchi 
Cc: Lucas De Marchi 
Cc: Michal Marek 
Cc: Jessica Yu 
Cc: Chih-Wei Huang 
Cc: sta...@vger.kernel.org # any kernel since 2012
---
v2:
- spell out that modules_install requires $DEPMOD
- remove references to module-init-tools from
  Documentation/process/changes.rst and add kmod

 Documentation/process/changes.rst |   19 +++
 scripts/depmod.sh |8 +++-
 2 files changed, 14 insertions(+), 13 deletions(-)

--- lnx-418-rc3.orig/scripts/depmod.sh
+++ lnx-418-rc3/scripts/depmod.sh
@@ -10,10 +10,16 @@ fi
 DEPMOD=$1
 KERNELRELEASE=$2
 
-if ! test -r System.map -a -x "$DEPMOD"; then
+if ! test -r System.map ; then
exit 0
 fi
 
+if [ -z $(command -v $DEPMOD) ]; then
+   echo "'make modules_install' requires $DEPMOD. Please install it." >&2
+   echo "This is probably in the kmod package." >&2
+   exit 1
+fi
+
 # older versions of depmod require the version string to start with three
 # numbers, so we cheat with a symlink here
 depmod_hack_needed=true
--- lnx-418-rc3.orig/Documentation/process/changes.rst
+++ lnx-418-rc3/Documentation/process/changes.rst
@@ -35,7 +35,7 @@ binutils   2.20
 flex   2.5.35   flex --version
 bison  2.0  bison --version
 util-linux 2.10ofdformat --version
-module-init-tools  0.9.10   depmod -V
+kmod   13   depmod -V
 e2fsprogs  1.41.4   e2fsck -V
 jfsutils   1.1.3fsck.jfs -V
 reiserfsprogs  3.6.3reiserfsck -V
@@ -156,12 +156,6 @@ is not build with ``CONFIG_KALLSYMS`` an
 reproduce the Oops with that option, then you can still decode that Oops
 with ksymoops.
 
-Module-Init-Tools
--
-
-A new module loader is now in the kernel that requires ``module-init-tools``
-to use.  It is backward compatible with the 2.4.x series kernels.
-
 Mkinitrd
 
 
@@ -371,16 +365,17 @@ Util-linux
 
 - 
 
+Kmod
+
+
+- 
+- 
+
 Ksymoops
 
 
 - 
 
-Module-Init-Tools
--
-
-- 
-
 Mkinitrd

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

On Sun, Jul 1, 2018 at 10:31 PM Baoquan He  wrote:
>
> On 07/01/18 at 10:18pm, Pavel Tatashin wrote:
> > > Here, I think it might be not right to jump to 'failed' directly if one
> > > section of the node failed to populate memmap. I think the original code
> > > is only skipping the section which memmap failed to populate by marking
> > > it as not present with "ms->section_mem_map = 0".
> > >
> >
> > Hi Baoquan,
> >
> > Thank you for a careful review. This is an intended change compared to
> > the original code. Because we operate per-node now, if we fail to
> > allocate a single section, in this node, it means we also will fail to
> > allocate all the consequent sections in the same node and no need to
> > check them anymore. In the original code we could not simply bailout,
> > because we still might have valid entries in the following nodes.
> > Similarly, sparse_init() will call sparse_init_nid() for the next node
> > even if previous node failed to setup all the memory.
>
> Hmm, say the node we are handling is node5, and there are 100 sections.
> If you allocate memmap for section at one time, you have succeeded to
> handle for the first 99 sections, now the 100th failed, so you will mark
> all sections on node5 as not present. And the allocation failure is only
> for single section memmap allocation case.

No, unless I am missing something, that's not how code works:

        if (!map) {
                pr_err("%s: memory map backing failed. Some memory will not be available.",
                       __func__);
                pnum_begin = pnum;
                goto failed;
        }

failed:
        /* We failed to allocate, mark all the following pnums as not present */
        for_each_present_section_nr(pnum_begin, pnum) {

We continue from the pnum that failed as we set pnum_begin to pnum,
and mark all the consequent sections as not-present.

The only change compared to the original code is that once we found an
empty pnum we stop checking the consequent pnums in this node, as we
know they are empty as well, because there is no more memory in this
node to allocate from.
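
A rough user-space sketch of that control flow, under stated assumptions: `struct section_stub`, `init_node_stub()` and the `alloc_ok[]` array are made up for illustration and only model the "first failure marks the rest of the node not present" behaviour; this is not the mm/sparse.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-section state; the real code uses struct mem_section. */
struct section_stub {
	bool present;
	bool has_memmap;
};

/*
 * Walk the sections of one node. alloc_ok[i] says whether the simulated
 * memmap allocation for section i succeeds. On the first failure, that
 * section and every later one in the node are marked not present.
 * Returns the number of sections that ended up with a memmap.
 */
static size_t init_node_stub(struct section_stub *secs, const bool *alloc_ok,
			     size_t nr)
{
	size_t i, ok = 0;

	assert(secs && alloc_ok);

	for (i = 0; i < nr; i++) {
		if (!alloc_ok[i])
			break;	/* like "pnum_begin = pnum; goto failed;" */
		secs[i].present = true;
		secs[i].has_memmap = true;
		ok++;
	}

	/* The "failed:" path, starting at the section that failed. */
	for (; i < nr; i++) {
		secs[i].present = false;
		secs[i].has_memmap = false;
	}
	return ok;
}
```

With five sections and a failure at the third, the first two keep their memmap and the last three are marked not present, including the two whose allocation was never attempted.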

Pavel

Re: [PATCH v1 0/6] perf cs-etm: Fix tracing packet handling and minor refactoring

2018-07-01 Thread leo . yan

Hi Arnaldo,

On Tue, Jun 19, 2018 at 03:19:43PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Jun 19, 2018 at 11:46:02AM -0600, Mathieu Poirier wrote:
> > On Sun, 17 Jun 2018 at 23:10, Leo Yan  wrote:
> > >
> > > Because the current code does not handle the cs-etm start tracing packet
> > > and the CS_ETM_TRACE_ON packet, we fail to generate branch samples for them.
> > >
> > > This patch series is to fix cs-etm tracing packet handling:
> > >
> > > Patch 0001 is to add invalid address macro for readable coding;
> > >
> > > Patch 0002 is one minor fixing to return error code for instruction
> > > sample failure;
> > >
> > > Patches 0003~0006 are fixing patches for start tracing packet
> > > and CS_ETM_TRACE_ON packet.
> > >
> > > This patch series is applied on acme tree [1] on branch perf/core with
> > > latest commit: e238cf2e3d2e ("perf intel-pt: Fix packet decoding of CYC
> > > packets").  Also applied successfully this patch series on Linus tree
> > > on 4.18-rc1.
> > >
> > > This patch series has been verified on Hikey620 platform with below two
> > > commands:
> > > perf script --itrace=i1il128 -F cpu,event,ip,addr,sym -k ./vmlinux
> > > perf script -F cpu,event,ip,addr,sym -k ./vmlinux
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> > >
> > >
> > > Leo Yan (6):
> > >   perf cs-etm: Introduce invalid address macro
> > >   perf cs-etm: Bail out immediately for instruction sample failure
> > >   perf cs-etm: Fix start tracing packet handling
> > >   perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
> > >   perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON
> > > packet
> > >   perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
> > >
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++--
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  1 +
> > >  tools/perf/util/cs-etm.c| 71 
> > > +
> > >  3 files changed, 68 insertions(+), 14 deletions(-)
> > 
> > Good day Arnaldo,
> > 
> > I am good with this set:
> > 
> > Reviewed-by: Mathieu Poirier 
> > 
> > Please consider for inclusion in your tree if you are satisfied with the work.
> 
> I'll take a look and get it into perf/core, now I'm concentrating on
> perf/urgent work.

Gentle ping for this patch series.

I tested this patch series on the perf/core branch with the latest commit
87b40003b17a ("perf tests: Check that complex event name is parsed
correctly"); the series is still valid and passes testing.

Could you pick it up?  If you want me to resend this patch series,
please let me know.

Thanks,
Leo Yan

Re: [PATCH v2 1/3] arm64: dts: actions: Enable clock controller for S700

2018-07-01 Thread Manivannan Sadhasivam

Hi,

On Sun, Jul 01, 2018 at 07:50:01PM +0200, Saravanan Sekar wrote:
> Hi Mani,
> 
> 
> On 06/30/18 11:42, Manivannan Sadhasivam wrote:
> > On Thu, Jun 28, 2018 at 09:18:03PM +0200, Saravanan Sekar wrote:
> > > Added clock management controller for S700
> > > 
> > > Signed-off-by: Parthiban Nallathambi 
> > > Signed-off-by: Saravanan Sekar 
> > > ---
> > >   .../boot/dts/actions/s700-cubieboard7.dts |   7 -
> > >   arch/arm64/boot/dts/actions/s700.dtsi |   8 ++
> > >   include/dt-bindings/clock/actions,s700-cmu.h  | 128 ++
> > >   3 files changed, 136 insertions(+), 7 deletions(-)
> > >   create mode 100644 include/dt-bindings/clock/actions,s700-cmu.h
> > > 
> > > diff --git a/arch/arm64/boot/dts/actions/s700-cubieboard7.dts 
> > > b/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> > > index ef79d7905f44..28f3f4a0f7f0 100644
> > > --- a/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> > > +++ b/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> > > @@ -28,12 +28,6 @@
> > >   device_type = "memory";
> > >   reg = <0x1 0xe000 0x0 0x0>;
> > >   };
> > > -
> > > - uart3_clk: uart3-clk {
> > > - compatible = "fixed-clock";
> > > - clock-frequency = <921600>;
> > > - #clock-cells = <0>;
> > > - };
> > Sourcing CMU clock for UART should be in a separate patch.
> 
> sure
> 
> > >   };
> > >   &timer {
> > > @@ -42,5 +36,4 @@
> > >   &uart3 {
> > >   status = "okay";
> > > - clocks = <&uart3_clk>;
> > >   };
> > > diff --git a/arch/arm64/boot/dts/actions/s700.dtsi 
> > > b/arch/arm64/boot/dts/actions/s700.dtsi
> > > index 66dd5309f0a2..3530b705df90 100644
> > > --- a/arch/arm64/boot/dts/actions/s700.dtsi
> > > +++ b/arch/arm64/boot/dts/actions/s700.dtsi
> > > @@ -4,6 +4,7 @@
> > >*/
> > >   #include 
> > > +#include 
> > >   / {
> > >   compatible = "actions,s700";
> > > @@ -44,6 +45,12 @@
> > >   };
> > >   };
> > > + clock: clock-controller@e0168000 {
> > > + compatible = "actions,s700-cmu";
> > > + reg = <0 0xe0168000 0 0x1000>;
> > > + #clock-cells = <1>;
> > > + };
> > > +
> > There is no fixed rate clock like losc?
> 
> losc is 32k, I will add it
> 
> > >   reserved-memory {
> > >   #address-cells = <2>;
> > >   #size-cells = <2>;
> > > @@ -129,6 +136,7 @@
> > >   compatible = "actions,s900-uart", 
> > > "actions,owl-uart";
> > >   reg = <0x0 0xe0126000 0x0 0x2000>;
> > >   interrupts = ;
> > > + clocks = <&clock CLK_UART3>;
> > >   status = "disabled";
> > >   };
> > > diff --git a/include/dt-bindings/clock/actions,s700-cmu.h 
> > > b/include/dt-bindings/clock/actions,s700-cmu.h
> > > new file mode 100644
> > > index ..e5b4ea130953
> > > --- /dev/null
> > > +++ b/include/dt-bindings/clock/actions,s700-cmu.h
> > > @@ -0,0 +1,128 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +/*
> > > + * Actions S700 clock driver
> > > + *
> > > + * Copyright (c) 2014 Actions Semi Inc.
> > > + * Author: David Liu 
> > > + *
> > > + * Author: Pathiban Nallathambi 
> > > + * Author: Saravanan Sekar 
> > > + */
> > > +
> > > +#ifndef __DT_BINDINGS_CLOCK_S700_H
> > > +#define __DT_BINDINGS_CLOCK_S700_H
> > > +
> > > +#define CLK_NONE 0
> > > +
> > > +/* fixed rate clocks */
> > > +#define CLK_LOSC 1
> > > +#define CLK_HOSC 2
> > > +
> > > +/* pll clocks */
> > > +#define CLK_CORE_PLL 3
> > > +#define CLK_DEV_PLL  4
> > > +#define CLK_DDR_PLL  5
> > > +#define CLK_NAND_PLL 6
> > > +#define CLK_DISPLAY_PLL  7
> > > +#define CLK_TVOUT_PLL8
> > > +#define CLK_CVBS_PLL 9
> > > +#define CLK_AUDIO_PLL10
> > > +#define CLK_ETHERNET_PLL 11
> > > +
> > Remove extra new line please.
> > 
> > > +
> > > +/* system clock */
> > > +#define CLK_SYS_BASE 12
> > > +#define CLK_CPU  CLK_SYS_BASE
> > > +#define CLK_DEV  (CLK_SYS_BASE+1)
> > > +#define CLK_AHB  (CLK_SYS_BASE+2)
> > > +#define CLK_APB  (CLK_SYS_BASE+3)
> > > +#define CLK_DMAC (CLK_SYS_BASE+4)
> > > +#define CLK_NOC0_CLK_MUX (CLK_SYS_BASE+5)
> > > +#define CLK_NOC1_CLK_MUX (CLK_SYS_BASE+6)
> > > +#define CLK_HP_CLK_MUX   (CLK_SYS_BASE+7)
> > > +#define CLK_HP_CLK_DIV   (CLK_SYS_BASE+8)
> > > +#define CLK_NOC1_CLK_DIV (CLK_SYS_BASE+9)
> > > +#define CLK_NOC0 (CLK_SYS_BASE+10)
> > > +#define CLK_NOC1 (CLK_SYS_BASE+11)
> > > +#define CLK_SENOR_SRC(CLK_SYS_BASE+12)
> > > +
> > > +/* peripheral dev

Re: [PATCH v5 0/7] add virt-dma support for imx-sdma

2018-07-01 Thread Robin Gong

Hi Vinod,
Do you have any comments on this patchset? Lucas and Sascha have
acked it, and the tty patch has already been merged.

On Tue, 2018-06-26 at 17:04 +0200, Lucas Stach wrote:
> Hi Robin,
> 
> I've tested this whole series with the SDMA being used for SPI, UART
> and SSI with no regressions spotted. As this should cover most common
> use-cases, I think this series is good to go in.
> 
> Tested-by: Lucas Stach 
> 
> Regards,
> Lucas
> 
> On Wednesday, 20.06.2018 at 00:56 +0800, Robin Gong wrote:
> > 
> > > The legacy sdma driver has the following limitations or drawbacks:
> > >   1. It hardcodes the maximum number of BDs as "PAGE_SIZE / sizeof(*)"
> > >      and allocates one page per channel, even though only a few BDs
> > >      are needed most of the time. In a few cases, however, one
> > >      PAGE_SIZE may not be enough.
> > >   2. An SDMA channel can't stop immediately once the channel is
> > >      disabled, which means an SDMA interrupt may come in after the
> > >      channel has been terminated. There are some patches for this
> > >      corner case, such as commit "2746e2c389f9", but they do not
> > >      cover the non-cyclic case.
> > > 
> > > The common virt-dma overcomes the above limitations. It can allocate
> > > BDs dynamically and free them once the tx transfer is done. No memory
> > > is wasted and there is no maximum limit; it only depends on how much
> > > memory can be requested from the kernel. Issue No.2 can be worked
> > > around by checking whether a descriptor ("sdmac->desc") is available
> > > when the unwanted interrupt comes in. Finally, the common virt-dma
> > > makes the sdma driver easier to maintain.
> > 
> > > Change from v4:
> > >   1. Identified the lockdep issue caused by allocating memory with
> > >      'GFP_KERNEL'; changed to 'GFP_NOWAIT' so that the lockdep check
> > >      is skipped. That also makes sense since audio/uart drivers may
> > >      call dma functions after spin_lock_irqsave()...
> > >   2. Use a dma pool for bd allocation, since the audio driver may
> > >      call dma_terminate_all in irq. Please refer to 7/7.
> > >   3. Removed the 7/7 serial patch from v4, since the lockdep issue is
> > >      fixed by No.1.
> > > 
> > > Change from v3:
> > >   1. Added two uart patches which are impacted by this patchset.
> > >   2. Unlock 'vc.lock' before the cyclic dma callback and lock it again
> > >      afterwards, because some drivers such as uart call
> > >      dmaengine_tx_status, which acquires 'vc.lock' again and
> > >      deadlocks.
> > >   3. Removed the 'Revert commit' stuff since that patch is not wrong,
> > >      and combined two patches into one as per Sascha's comment.
> > > 
> > > Change from v2:
> > >   1. Included Sascha's patch to make the main patch easier to review.
> > >      Thanks Sascha.
> > >   2. Removed the useless 'desc'/'chan' in struct sdma_channel.
> > > 
> > > Change from v1:
> > >   1. Split the v1 patch into 5 patches.
> > >   2. Removed some unnecessary condition checks.
> > >   3. Removed the unnecessary 'pending' list.
> > 
> > > Robin Gong (6):
> > >   tty: serial: imx: correct dma cookie status
> > >   dmaengine: imx-sdma: add virt-dma support
> > >   dmaengine: imx-sdma: remove useless 'lock' and 'enabled' in 'struct sdma_channel'
> > >   dmaengine: imx-sdma: remove the maximum limitation for bd numbers
> > >   dmaengine: imx-sdma: add sdma_transfer_init to decrease code overlap
> > >   dmaengine: imx-sdma: alloclate bd memory from dma pool
> > > 
> > > Sascha Hauer (1):
> > >   dmaengine: imx-sdma: factor out a struct sdma_desc from struct sdma_channel
> > > 
> > >  drivers/dma/Kconfig  |   1 +
> > >  drivers/dma/imx-sdma.c   | 400 +++----
> > >  drivers/tty/serial/imx.c |   2 +-
> > >  3 files changed, 235 insertions(+), 168 deletions(-)
> >

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 07/01/18 at 10:18pm, Pavel Tatashin wrote:
> > Here, I think it might be not right to jump to 'failed' directly if one
> > section of the node failed to populate memmap. I think the original code
> > is only skipping the section which memmap failed to populate by marking
> > it as not present with "ms->section_mem_map = 0".
> >
> 
> Hi Baoquan,
> 
> Thank you for a careful review. This is an intended change compared to
> the original code. Because we operate per-node now, if we fail to
> allocate a single section, in this node, it means we also will fail to
> allocate all the consequent sections in the same node and no need to
> check them anymore. In the original code we could not simply bailout,
> because we still might have valid entries in the following nodes.
> Similarly, sparse_init() will call sparse_init_nid() for the next node
> even if previous node failed to setup all the memory.

Hmm, say the node we are handling is node5, and there are 100 sections.
If you allocate memmap for section at one time, you have succeeded to
handle for the first 99 sections, now the 100th failed, so you will mark
all sections on node5 as not present. And the allocation failure is only
for single section memmap allocation case.

Please think about the vmemmap case: it maps the struct page pages into
vmemmap and populates page tables for them. That is a long walk that
involves not only memory allocation but also page table checking and
populating, so one section failing to populate its memmap doesn't mean
all the subsequent sections will fail as well. I think the original
code is reasonable.
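
A rough user-space sketch of the skip-one behaviour described above, for comparison. It is not the actual mm/sparse.c code: `populate_skip_one_stub()`, its arguments, and the placeholder value 1 are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the skip-one policy: section_mem_map[i] == 0 means
 * "not present". populate_ok[i] simulates whether populating the memmap
 * (allocation plus page table setup) works for section i.
 * Returns the number of sections left usable.
 */
static size_t populate_skip_one_stub(unsigned long *section_mem_map,
				     const bool *populate_ok, size_t nr)
{
	size_t i, usable = 0;

	assert(section_mem_map && populate_ok);

	for (i = 0; i < nr; i++) {
		if (!populate_ok[i]) {
			section_mem_map[i] = 0;	/* skip only this section */
			continue;
		}
		section_mem_map[i] = 1;	/* placeholder for an encoded memmap address */
		usable++;
	}
	return usable;
}
```

Here a failure in one section leaves the sections after it usable, as long as their own populate step succeeds.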

Thanks
Baoquan

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

> Here, I think it might be not right to jump to 'failed' directly if one
> section of the node failed to populate memmap. I think the original code
> is only skipping the section which memmap failed to populate by marking
> it as not present with "ms->section_mem_map = 0".
>

Hi Baoquan,

Thank you for a careful review. This is an intended change compared to
the original code. Because we operate per-node now, if we fail to
allocate a single section, in this node, it means we also will fail to
allocate all the consequent sections in the same node and no need to
check them anymore. In the original code we could not simply bailout,
because we still might have valid entries in the following nodes.
Similarly, sparse_init() will call sparse_init_nid() for the next node
even if previous node failed to setup all the memory.

Pavel

Re: [PATCH v1] ARM: dts: imx6sl-evk: keep sw4 always on

2018-07-01 Thread Robin Gong

On Sun, 2018-07-01 at 22:17 -0300, Fabio Estevam wrote:
> On Sun, Jul 1, 2018 at 10:09 PM, Anson Huang wrote:
> 
> > 
> > On some new i.MX platforms, PFuze switches are used for supplying
> > GPU/VPU or other non-critical modules only, these switches need to be
> > turned off by runtime PM to avoid very high power leakage, like on
> > mScale850D.
> Ok, in this case I suggest adding a new property so that the switches
> can be turned off only when the new property is present.
> 
> When this new property is absent, then we keep the current behavior
> and avoid dtb breakage.
> 
> Since MX8M support is not in place yet, this is not urgent, so I will
> send a revert and then you can re-work the patch so that it does not
> affect the old dtbs.
> 
> Do you agree with such approach?
But in fact, the original dts is not correct without 'regulator-always-on',
since SW4 is the critical DDR power rail. It only stayed on in previous
kernels because the pfuze driver provided no enable/disable interface for
the switches. Adding a new property for something the common
'regulator-always-on' already covers is not a good choice. I think it is
enough to keep the dts patch that adds 'regulator-always-on' ahead of the
pfuze driver patch that adds the enable/disable interface.

Re: [PATCH v2 5/6] mm: track gup pages with page->dma_pinned_* fields

2018-07-01 Thread kbuild test robot

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc3]
[cannot apply to next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
config: i386-randconfig-x074-201826 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/asm-generic/atomic-instrumented.h:16:0,
from arch/x86/include/asm/atomic.h:283,
from include/linux/atomic.h:5,
from include/linux/page_counter.h:5,
from mm/memcontrol.c:34:
   mm/memcontrol.c: In function 'unlock_page_lru':
>> mm/memcontrol.c:2087:32: error: 'page_tail' undeclared (first use in this function); did you mean 'page_pool'?
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
   ^
   include/linux/build_bug.h:36:63: note: in definition of macro 
'BUILD_BUG_ON_INVALID'
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e
  ^
>> include/linux/mmdebug.h:46:36: note: in expansion of macro 'VM_BUG_ON'
#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
   ^
>> mm/memcontrol.c:2087:3: note: in expansion of macro 'VM_BUG_ON_PAGE'
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
  ^~
   mm/memcontrol.c:2087:32: note: each undeclared identifier is reported only 
once for each function it appears in
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
   ^
   include/linux/build_bug.h:36:63: note: in definition of macro 
'BUILD_BUG_ON_INVALID'
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e
  ^
>> include/linux/mmdebug.h:46:36: note: in expansion of macro 'VM_BUG_ON'
#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
   ^
>> mm/memcontrol.c:2087:3: note: in expansion of macro 'VM_BUG_ON_PAGE'
  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
  ^~

vim +2087 mm/memcontrol.c

  2077
  2078  static void unlock_page_lru(struct page *page, int isolated)
  2079  {
  2080          struct zone *zone = page_zone(page);
  2081
  2082          if (isolated) {
  2083                  struct lruvec *lruvec;
  2084
  2085                  lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat);
  2086                  VM_BUG_ON_PAGE(PageLRU(page), page);
> 2087                  VM_BUG_ON_PAGE(PageDmaPinned(page_tail), page);
  2088
  2089                  SetPageLRU(page);
  2090                  add_page_to_lru_list(page, lruvec, page_lru(page));
  2091          }
  2092          spin_unlock_irq(zone_lru_lock(zone));
  2093  }
  2094

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH v2 4/6] mm/fs: add a sync_mode param for clear_page_dirty_for_io()

2018-07-01 Thread kbuild test robot

Hi John,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.18-rc3]
[cannot apply to next-20180629]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/mm-fs-gup-don-t-unmap-or-drop-filesystem-buffers/20180702-090125
config: x86_64-randconfig-x010-201826 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/wait.h:7,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from fs/f2fs/checkpoint.c:11:
   fs/f2fs/checkpoint.c: In function 'commit_checkpoint':
   fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 
'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> fs/f2fs/checkpoint.c:1200:2: note: in expansion of macro 'if'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
   fs/f2fs/checkpoint.c:1200:6: note: in expansion of macro 'unlikely'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~~~
   fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 
'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> fs/f2fs/checkpoint.c:1200:2: note: in expansion of macro 'if'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
   fs/f2fs/checkpoint.c:1200:6: note: in expansion of macro 'unlikely'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~~~
   fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 
'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> fs/f2fs/checkpoint.c:1200:2: note: in expansion of macro 'if'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
   fs/f2fs/checkpoint.c:1200:6: note: in expansion of macro 'unlikely'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~~~
   fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 
'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> fs/f2fs/checkpoint.c:1200:2: note: in expansion of macro 'if'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~
   include/linux/compiler.h:48:24: note: in expansion of macro 
'__branch_check__'
#  define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
   ^~~~
   fs/f2fs/checkpoint.c:1200:6: note: in expansion of macro 'unlikely'
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
 ^~~~
   fs/f2fs/checkpoint.c:1200:49: error: invalid type argument of '->' (have 
'struct writeback_control')
 if (unlikely(!clear_page_dirty_for_io(page, wbc->sync_mode)))
^
   include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
  __r = !!(cond); \
   ^~~~
>> fs/f2fs/chec
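
The diagnostic above suggests that, at the flagged call site, wbc is a struct writeback_control object rather than a pointer to one, so the member would be reached with '.' instead of '->'. A minimal sketch of the difference, using hypothetical stand-in types and helpers (none of the names below are kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel types carry many more fields. */
enum sync_mode_sketch { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

struct wbc_sketch {
	enum sync_mode_sketch sync_mode;
};

/* Stand-in for a helper that takes the sync mode by value. */
static bool clear_dirty_sketch(enum sync_mode_sketch mode)
{
	return mode == SKETCH_SYNC_ALL;
}

/* The caller holds the control structure as a local object: use '.'. */
static bool caller_with_object(void)
{
	struct wbc_sketch wbc = { .sync_mode = SKETCH_SYNC_ALL };

	return clear_dirty_sketch(wbc.sync_mode);	/* not wbc->sync_mode */
}

/* The caller receives a pointer to the structure: use '->'. */
static bool caller_with_pointer(const struct wbc_sketch *wbc)
{
	return clear_dirty_sketch(wbc->sync_mode);
}
```

Which form applies depends only on how wbc is declared in the function being patched, so each converted call site probably needs to be checked individually.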

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

Hi Pavel,

Thanks for your quick fix. You might have missed another comment on v2
patch 1/2, which is at the bottom.

On 07/01/18 at 10:04pm, Pavel Tatashin wrote:
> +/*
> + * Initialize sparse on a specific node. The node spans [pnum_begin, 
> pnum_end)
> + * And number of present sections in this node is map_count.
> + */
> +void __init sparse_init_nid(int nid, unsigned long pnum_begin,
> +unsigned long pnum_end,
> +unsigned long map_count)
> +{
> + unsigned long pnum, usemap_longs, *usemap, map_index;
> + struct page *map, *map_base;
> +
> + usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
> + usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
> +   usemap_size() *
> +   map_count);
> + if (!usemap) {
> + pr_err("%s: usemap allocation failed", __func__);
> + goto failed;
> + }
> + map_base = sparse_populate_node(pnum_begin, pnum_end,
> + map_count, nid);
> + map_index = 0;
> + for_each_present_section_nr(pnum_begin, pnum) {
> + if (pnum >= pnum_end)
> + break;
> +
> + BUG_ON(map_index == map_count);
> + map = sparse_populate_node_section(map_base, map_index,
> +pnum, nid);
> + if (!map) {

Here, I think it might not be right to jump to 'failed' directly if one
section of the node fails to populate its memmap. I think the original code
only skips the section whose memmap failed to populate, by marking it as
not present with "ms->section_mem_map = 0".

> + pr_err("%s: memory map backing failed. Some memory will 
> not be available.",
> +__func__);
> + pnum_begin = pnum;
> + goto failed;
> + }
> + check_usemap_section_nr(nid, usemap);
> + sparse_init_one_section(__nr_to_section(pnum), pnum, map,
> + usemap);
> + map_index++;
> + usemap += usemap_longs;
> + }
> + return;
> +failed:
> + /* We failed to allocate, mark all the following pnums as not present */
> + for_each_present_section_nr(pnum_begin, pnum) {
> + struct mem_section *ms;
> +
> + if (pnum >= pnum_end)
> + break;
> + ms = __nr_to_section(pnum);
> + ms->section_mem_map = 0;
> + }
> +}
> +
>  /*
>   * Allocate the accumulated non-linear sections, allocate a mem_map
>   * for each and record the physical to section mapping.
> -- 
> 2.18.0
>
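
To illustrate the behaviour suggested in the comment above (skip only the section whose memmap failed, keep initializing the rest), here is a small user-space sketch. The struct and function names are hypothetical simplifications, not the kernel's data structures, and whether this fits the new per-node allocation scheme is for the patch author to judge:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mem_section. */
struct section_sketch {
	bool present;		/* section was marked present at boot */
	bool populate_ok;	/* simulated result of populating its memmap */
	unsigned long mem_map;	/* 0 means "not available" */
};

/*
 * Initialize sections in [begin, end). A section whose memmap cannot be
 * populated is cleared and skipped; later sections are still initialized.
 * Returns the number of sections that ended up usable.
 */
static size_t init_range_skip_failed(struct section_sketch *s,
				     size_t begin, size_t end)
{
	size_t pnum, usable = 0;

	for (pnum = begin; pnum < end; pnum++) {
		if (!s[pnum].present)
			continue;
		if (!s[pnum].populate_ok) {
			s[pnum].mem_map = 0;	/* drop only this section */
			continue;
		}
		s[pnum].mem_map = pnum + 1;	/* any non-zero token */
		usable++;
	}
	return usable;
}
```

With one failing section in the middle of a node, the sections after it remain usable, which is the difference from jumping to 'failed' and clearing everything that follows.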

[PATCH V2] ARM: dts: imx6: correct anatop regulators range

2018-07-01 Thread Anson Huang

According to the i.MX6 datasheet, the typical programming
operating range of LDO_1P1 is 1.0V to 1.2V, and that of
LDO_2P5 is 2.25V to 2.75V. Correct the LDO_1P1 and LDO_2P5
regulator range settings for i.MX6 SoCs accordingly.

Signed-off-by: Anson Huang 
---
changes since V1:
Correct the regulator range according to datasheet's statement.
 arch/arm/boot/dts/imx6sl.dtsi | 8 
 arch/arm/boot/dts/imx6sx.dtsi | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/boot/dts/imx6sl.dtsi b/arch/arm/boot/dts/imx6sl.dtsi
index 81f48116..c64bd90 100644
--- a/arch/arm/boot/dts/imx6sl.dtsi
+++ b/arch/arm/boot/dts/imx6sl.dtsi
@@ -524,8 +524,8 @@
regulator-1p1 {
compatible = "fsl,anatop-regulator";
regulator-name = "vdd1p1";
-   regulator-min-microvolt = <800000>;
-   regulator-max-microvolt = <1375000>;
+   regulator-min-microvolt = <1000000>;
+   regulator-max-microvolt = <1200000>;
regulator-always-on;
anatop-reg-offset = <0x110>;
anatop-vol-bit-shift = <8>;
@@ -554,8 +554,8 @@
regulator-2p5 {
compatible = "fsl,anatop-regulator";
regulator-name = "vdd2p5";
-   regulator-min-microvolt = <2100000>;
-   regulator-max-microvolt = <2850000>;
+   regulator-min-microvolt = <2250000>;
+   regulator-max-microvolt = <2750000>;
regulator-always-on;
anatop-reg-offset = <0x130>;
anatop-vol-bit-shift = <8>;
diff --git a/arch/arm/boot/dts/imx6sx.dtsi b/arch/arm/boot/dts/imx6sx.dtsi
index 7130ab8..596763c 100644
--- a/arch/arm/boot/dts/imx6sx.dtsi
+++ b/arch/arm/boot/dts/imx6sx.dtsi
@@ -592,8 +592,8 @@
regulator-1p1 {
compatible = "fsl,anatop-regulator";
regulator-name = "vdd1p1";
-   regulator-min-microvolt = <800000>;
-   regulator-max-microvolt = <1375000>;
+   regulator-min-microvolt = <1000000>;
+   regulator-max-microvolt = <1200000>;
regulator-always-on;
anatop-reg-offset = <0x110>;
anatop-vol-bit-shift = <8>;
@@ -622,8 +622,8 @@
regulator-2p5 {
compatible = "fsl,anatop-regulator";
regulator-name = "vdd2p5";
-   regulator-min-microvolt = <2100000>;
-   regulator-max-microvolt = <2875000>;
+   regulator-min-microvolt = <2250000>;
+   regulator-max-microvolt = <2750000>;
regulator-always-on;
anatop-reg-offset = <0x130>;
anatop-vol-bit-shift = <8>;
-- 
2.7.4
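
As a quick cross-check of the units: the regulator range properties are expressed in microvolts, while the commit message quotes volts. A tiny sketch of the conversion, working in integer millivolts to avoid floating-point rounding (mv_to_uv is a hypothetical helper, not part of any kernel API):

```c
#include <assert.h>

/* Convert millivolts to the microvolt unit used by the range properties. */
static unsigned long mv_to_uv(unsigned long millivolts)
{
	return millivolts * 1000UL;
}
```

For the ranges named in the commit message, 1000 mV and 1200 mV give 1000000 and 1200000, and 2250 mV and 2750 mV give 2250000 and 2750000.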

[PATCH v3 0/2] sparse_init rewrite

2018-07-01 Thread Pavel Tatashin

Changelog:
v3 - v2
- Fixed two issues found by Baoquan He
v1 - v2
- Addressed comments from Oscar Salvador

In sparse_init() we allocate two large buffers to temporarily hold usemap and
memmap for the whole machine. However, we can avoid doing that if we
change sparse_init() to operate on a per-node basis instead of doing it for
the whole machine beforehand.

As shown by Baoquan
http://lkml.kernel.org/r/20180628062857.29658-1-...@redhat.com

the buffers are large enough to prevent small-memory systems from
booting.

These patches should be applied on top of Baoquan's work, as
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed in that work.

For ease of review, I split this work so that the first patch only adds the new
interfaces, and the second patch enables them and removes the old ones.

Pavel Tatashin (2):
  mm/sparse: add sparse_init_nid()
  mm/sparse: start using sparse_init_nid(), and remove old code

 include/linux/mm.h  |   9 +-
 mm/sparse-vmemmap.c |  44 ---
 mm/sparse.c | 279 +++-
 3 files changed, 125 insertions(+), 207 deletions(-)

-- 
2.18.0

[PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

sparse_init() requires temporarily allocating two large buffers:
usemap_map and map_map. Baoquan He has identified that these buffers are so
large that Linux is not bootable on small memory machines, such as a kdump
boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as
they are scaled to the maximum physical memory size.

Baoquan provided a fix, which reduces the sizes of these buffers, but it
is much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in one
large contiguous block or section by section. This eliminates the need
for temporary buffers.

To simplify bisecting and review, the new interface is enabled, and the
old code removed, in the next patch.
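
As a rough user-space illustration of the allocation strategy described above (one contiguous block per node, falling back to section-by-section allocation), consider the sketch below. It uses malloc-style callbacks in place of the boot-time allocators; alloc_fn, struct node_maps, alloc_node_maps() and always_fail() are hypothetical names introduced only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the fallback path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

struct node_maps {
	void *base;		/* one contiguous block, or NULL */
	void **per_section;	/* filled only when the bulk attempt fails */
	size_t count;
};

/*
 * Try one allocation covering all `count` sections of a node; if that
 * fails, allocate each section's map separately. Returns the number of
 * sections that received backing memory.
 */
static size_t alloc_node_maps(struct node_maps *m, size_t count,
			      size_t section_size, alloc_fn bulk,
			      alloc_fn single)
{
	size_t i, ok = 0;

	m->count = count;
	m->per_section = NULL;
	m->base = bulk(count * section_size);
	if (m->base)
		return count;

	m->per_section = calloc(count, sizeof(*m->per_section));
	if (!m->per_section)
		return 0;
	for (i = 0; i < count; i++) {
		m->per_section[i] = single(section_size);
		if (m->per_section[i])
			ok++;
	}
	return ok;
}

/* Allocator stub that always fails, used to force the fallback path. */
static void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

Neither path needs a machine-wide temporary array: the bookkeeping is at most proportional to the sections of a single node.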

Signed-off-by: Pavel Tatashin 
Reviewed-by: Oscar Salvador 
---
 include/linux/mm.h  |  8 
 mm/sparse-vmemmap.c | 49 
 mm/sparse.c | 91 +
 3 files changed, 148 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..85530fdfb1f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page **map_map,
   unsigned long pnum_end,
   unsigned long map_count,
   int nodeid);
+struct page * sparse_populate_node(unsigned long pnum_begin,
+  unsigned long pnum_end,
+  unsigned long map_count,
+  int nid);
+struct page * sparse_populate_node_section(struct page *map_base,
+  unsigned long map_index,
+  unsigned long pnum,
+  int nid);
 
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index e1a54ba411ec..b3e325962306 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -311,3 +311,52 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
vmemmap_buf_end = NULL;
}
 }
+
+struct page * __init sparse_populate_node(unsigned long pnum_begin,
+ unsigned long pnum_end,
+ unsigned long map_count,
+ int nid)
+{
+   unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+   unsigned long pnum, map_index = 0;
+   void *vmemmap_buf_start;
+
+   size = ALIGN(size, PMD_SIZE) * map_count;
+   vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
+ PMD_SIZE,
+ __pa(MAX_DMA_ADDRESS));
+   if (vmemmap_buf_start) {
+   vmemmap_buf = vmemmap_buf_start;
+   vmemmap_buf_end = vmemmap_buf_start + size;
+   }
+
+   for (pnum = pnum_begin; map_index < map_count; pnum++) {
+   if (!present_section_nr(pnum))
+   continue;
+   if (!sparse_mem_map_populate(pnum, nid, NULL))
+   break;
+   map_index++;
+   BUG_ON(pnum >= pnum_end);
+   }
+
+   if (vmemmap_buf_start) {
+   /* need to free left buf */
+   memblock_free_early(__pa(vmemmap_buf),
+   vmemmap_buf_end - vmemmap_buf);
+   vmemmap_buf = NULL;
+   vmemmap_buf_end = NULL;
+   }
+   return pfn_to_page(section_nr_to_pfn(pnum_begin));
+}
+
+/*
+ * Return map for pnum section. sparse_populate_node() has populated memory map
+ * in this node, we simply do pnum to struct page conversion.
+ */
+struct page * __init sparse_populate_node_section(struct page *map_base,
+ unsigned long map_index,
+ unsigned long pnum,
+ int nid)
+{
+   return pfn_to_page(section_nr_to_pfn(pnum));
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index d18e2697a781..c18d92b8ab9b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
   __func__);
}
 }
+
+static unsigned long section_map_size(void)
+{
+   return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+}
+
+/*
+ * Try to allocate all struct pages for this node, if this fails, we will
+ * be allocating one section at a time in sparse_populate_node_section().
+ */
+struct page * __init sparse_populate_node(unsigned long pnum_begin,
+ unsigned long pnum_end,
+

[PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Pavel Tatashin

Change sparse_init() to only find the pnum ranges that belong to a specific
node and call sparse_init_nid() for that range from sparse_init().

Delete all the code that became obsolete with this change.

Signed-off-by: Pavel Tatashin 
---
 include/linux/mm.h  |   5 -
 mm/sparse-vmemmap.c |  39 
 mm/sparse.c | 220 
 3 files changed, 17 insertions(+), 247 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 85530fdfb1f2..a7438be90658 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2646,11 +2646,6 @@ extern int randomize_va_space;
 const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
-void sparse_mem_maps_populate_node(struct page **map_map,
-  unsigned long pnum_begin,
-  unsigned long pnum_end,
-  unsigned long map_count,
-  int nodeid);
 struct page * sparse_populate_node(unsigned long pnum_begin,
   unsigned long pnum_end,
   unsigned long map_count,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index b3e325962306..4edc877cfe82 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -273,45 +273,6 @@ struct page * __meminit sparse_mem_map_populate(unsigned 
long pnum, int nid,
return map;
 }
 
-void __init sparse_mem_maps_populate_node(struct page **map_map,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count, int nodeid)
-{
-   unsigned long pnum;
-   unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
-   void *vmemmap_buf_start;
-   int nr_consumed_maps = 0;
-
-   size = ALIGN(size, PMD_SIZE);
-   vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
-PMD_SIZE, __pa(MAX_DMA_ADDRESS));
-
-   if (vmemmap_buf_start) {
-   vmemmap_buf = vmemmap_buf_start;
-   vmemmap_buf_end = vmemmap_buf_start + size * map_count;
-   }
-
-   for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-   if (!present_section_nr(pnum))
-   continue;
-
-   map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, 
nodeid, NULL);
-   if (map_map[nr_consumed_maps++])
-   continue;
-   pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
-  __func__);
-   }
-
-   if (vmemmap_buf_start) {
-   /* need to free left buf */
-   memblock_free_early(__pa(vmemmap_buf),
-   vmemmap_buf_end - vmemmap_buf);
-   vmemmap_buf = NULL;
-   vmemmap_buf_end = NULL;
-   }
-}
-
 struct page * __init sparse_populate_node(unsigned long pnum_begin,
  unsigned long pnum_end,
  unsigned long map_count,
diff --git a/mm/sparse.c b/mm/sparse.c
index c18d92b8ab9b..f6da0b07947b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,11 +200,10 @@ static inline int next_present_section_nr(int section_nr)
  (section_nr <= __highest_present_section_nr));\
 section_nr = next_present_section_nr(section_nr))
 
-/*
- * Record how many memory sections are marked as present
- * during system bootup.
- */
-static int __initdata nr_present_sections;
+static inline unsigned long first_present_section_nr(void)
+{
+   return next_present_section_nr(-1);
+}
 
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
@@ -235,7 +234,6 @@ void __init memory_present(int nid, unsigned long start, 
unsigned long end)
ms->section_mem_map = sparse_encode_early_nid(nid) |
SECTION_IS_ONLINE;
section_mark_present(ms);
-   nr_present_sections++;
}
}
 }
@@ -377,34 +375,6 @@ static void __init check_usemap_section_nr(int nid, 
unsigned long *usemap)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static void __init sparse_early_usemaps_alloc_node(void *data,
-unsigned long pnum_begin,
-unsigned long pnum_end,
-unsigned long usemap_count, int nodeid)
-{
-   void *usemap;
-   unsigned long pnum;
-   unsigned long **usemap_map = (unsigned long **)data;
-   int size = usemap_size();
-   int nr_consumed_maps = 0;
-
-   usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
-

Re: [PATCH v3 2/2] iio: adc: Add Spreadtrum SC27XX PMICs ADC support

2018-07-01 Thread Baolin Wang

On 1 July 2018 at 02:00, Jonathan Cameron  wrote:
> On Mon, 25 Jun 2018 09:47:54 +0800
> Baolin Wang  wrote:
>
>> Hi Jonathan,
>>
>> On 24 June 2018 at 21:47, Jonathan Cameron  
>> wrote:
>> > On Sun, 24 Jun 2018 14:30:09 +0100
>> > Jonathan Cameron  wrote:
>> >
>> >> On Sun, 24 Jun 2018 17:13:00 +0800
>> >> Baolin Wang  wrote:
>> >>
>> >> > Hi Jonathan,
>> >> >
>> >> > On 22 June 2018 at 22:13, Jonathan Cameron 
>> >> >  wrote:
>> >> > > On Thu, 21 Jun 2018 11:14:05 +0800
>> >> > > Baolin Wang  wrote:
>> >> > >
>> >> > >> From: Freeman Liu 
>> >> > >>
>> >> > >> The Spreadtrum SC27XX PMICs ADC controller contains 32 channels,
>> >> > >> which is used to sample voltages with 12 bits conversion.
>> >> > >>
>> >> > >> [Baolin Wang did lots of improvements]
>> >> > >>
>> >> > >> Signed-off-by: Freeman Liu 
>> >> > >> Signed-off-by: Baolin Wang 
>> >> > >
>> >> > > One trivial missed bit of cleanup inline.  I'll sort that
>> >> > > when applying if no one else points anything out before I get back to 
>> >> > > my
>> >> > > development machine.
>> >> >
>> >> > Thanks.
>> >>
>> >> Applied to the togreg branch of iio.git and pushed out as testing
>> >> for the autobuilders to play with it.
>> > Sorry, backed out for now.  My togreg tree is non rebasing and
>> > this is dependant on some stuff that only went in during the merge
>> > window.
>> >
>> > I'll send a pull request to GregKH fairly soon and after that do
>> > the merge back into my tree to pick that hwspin lock stuff up.
>> >
>> > So this will be a week or two before I can apply it without issues.
>> >
>> > Give me a poke if I seem to have forgotten about it.
>>
>> OK. Many thanks for your help.
>>
> Applied now with dependencies in place.  Applied to the togreg
> branch of iio.git and pushed out as testing for the autobuilders
> to play with it.

Thanks Jonathan.

---
Baolin Wang
Best Regards

Re: [PATCH v2 3/3] dt-bindings: clock: Modify Actions Soc clock bindings

2018-07-01 Thread Manivannan Sadhasivam

Hi Andreas,

On Sun, Jul 01, 2018 at 07:58:15PM +0200, Andreas Färber wrote:
> Hi,
> 
> Am 01.07.2018 um 19:37 schrieb Manivannan Sadhasivam:
> > On Sun, Jul 01, 2018 at 07:26:20PM +0200, Saravanan Sekar wrote:
> >> Hi Mani
> >>
> >>
> >> On 06/30/18 11:32, Manivannan Sadhasivam wrote:
> >>> Hi Saravanan,
> >>>
> >>> I agree with modifying the existing binding to accomodate other
> >>> SoC's of the same family. But the binding should be
> >>> "actions,owl-cmu.txt" since it reflects the family name.
> >>
> >> Agree, will modify the name
> >>
> >>> Andreas, what do you think?
> 
> I concur that sx00 is insufficient. Older models were called ATM.
> Unfortunately with owl- it then no longer matches the compatible, but I
> wouldn't mind.
> 
> >>> On Thu, Jun 28, 2018 at 09:18:05PM +0200, Saravanan Sekar wrote:
>  Modify clock bindings common Actions Semi Soc family S700/S900.
> 
>  Signed-off-by: Parthiban Nallathambi 
>  Signed-off-by: Saravanan Sekar 
>  ---
>    ...tions,s900-cmu.txt => actions,sx00-cmu.txt} | 18 ++
>    1 file changed, 10 insertions(+), 8 deletions(-)
>    rename Documentation/devicetree/bindings/clock/{actions,s900-cmu.txt 
>  => actions,sx00-cmu.txt} (71%)
> 
>  diff --git 
>  a/Documentation/devicetree/bindings/clock/actions,s900-cmu.txt 
>  b/Documentation/devicetree/bindings/clock/actions,sx00-cmu.txt
>  similarity index 71%
>  rename from Documentation/devicetree/bindings/clock/actions,s900-cmu.txt
>  rename to Documentation/devicetree/bindings/clock/actions,sx00-cmu.txt
>  index 93e4fb827cd6..8dc7edb4d198 100644
>  --- a/Documentation/devicetree/bindings/clock/actions,s900-cmu.txt
>  +++ b/Documentation/devicetree/bindings/clock/actions,sx00-cmu.txt
>  @@ -1,12 +1,14 @@
>  -* Actions S900 Clock Management Unit (CMU)
>  +* Actions S900/S700 Clock Management Unit (CMU)
> >>> Same as above. Should be Actions OWL SoC's Clock Management Unit (CMU).
> >>
> >> sure
> >>
> > 
> > During the review of I2C controller driver, Andreas pointed out that
> > we should use Owl instead of OWL in all places and also Actions should
> > be replaced by Actions Semiconductor.
> 
> Actually I was just asking to include the company name and not just say
> Owl. Whether it's Actions, Actions Semi or Actions Semiconductor is not
> that important to me - that would be for the Actions Semi colleagues to
> comment - which I see are not in CC... Please fix that! You don't need
> mp-cs I think, but the others please.
> 

Okay, cool.

> > So, please change it in relevant
> > places. For this binding, title should be:
> > 
> > Actions Semiconductor Owl SoC's Clock Management Unit (CMU).
> 
> "SoC's" looks weird there, do we have such precedence to add it?
> "Actions Owl Clock Management Unit (CMU)" might do?
> 

Let's keep 'Actions Semi' everywhere unless Actions Semi people say
differently.

> Saravanan, please compare other patch titles and adapt your subjects
> accordingly: "modify bindings" is not very meaningful, since a patch
> always modifies something and the prefix already indicates dt-bindings
> as target. "Soc" is misspelled. The time-saving information to put there
> would be addition of S700.
> 
>  -The Actions S900 clock management unit generates and supplies clock to 
>  various
>  -controllers within the SoC. The clock binding described here is 
>  applicable to
>  -S900 SoC.
>  +The Actions S900/S700 clock management unit generates and supplies 
>  clock to
>  +various controllers within the SoC. The clock binding described here is
>  +applicable to S900/S700 SoC.
> 
> "S900 and S700 SoCs"? (keep the slash above)
> 

Agree.

>    Required Properties:
>  -- compatible: should be "actions,s900-cmu"
>  +- compatible: should be one of this
> >>> Change to: compatible: should be one of the following:
> >>
> >> sure
> >>
> >>> Thanks,
> >>> Mani
> >>>
>  +"actions,s900-cmu"
>  +"actions,s700-cmu"
> 
> Mani, should we order alphabetically? I.e., will S500 go before S900, or
> will people just always append at the end now? Not saying we have to,
> but keeping it consistent across Actions bindings would be desirable.
> 

Yes, we should order alphabetically IMO. That way it looks good!

>    - reg: physical base address of the controller and length of memory 
>  mapped
>  region.
>    - clocks: Reference to the parent clocks ("hosc", "losc")
>  @@ -15,9 +17,9 @@ Required Properties:
>    Each clock is assigned an identifier, and client nodes can use this 
>  identifier
>    to specify the clock which they consume.
>  -All available clocks are defined as preprocessor macros in
>  -dt-bindings/clock/actions,s900-cmu.h header and can be used in device
>  -tree sources.
>  +All available clocks are defined as preprocessor macros in corresponding
>  +dt-bindings/clock/actions,s900-cm

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Pavel Tatashin

>
> Yes, if they are equal at 501, 'continue' to for loop. If nid is not
> equal to nid_begin, we execute sparse_init_nid(), here should it be that
> nid_begin is the current node, nid is next node?

Never mind, I forgot about the continue; I will fix it. Thank you again!

Pavel

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Baoquan He

On 07/01/18 at 09:46pm, Pavel Tatashin wrote:
>  ~~~
> > Here, node id passed to sparse_init_nid() should be 'nid_begin', but not
> > 'nid'. When you found out the current section's 'nid' is diferent than
> > 'nid_begin', handle node 'nid_begin', then start to next node 'nid'.
> 
> Thank you for reviewing this work. Here nid equals to nid_begin:
> 
> See, "if" at 501, and this call is at 505.

Yes, if they are equal at 501, we 'continue' the for loop. If nid is not
equal to nid_begin, we execute sparse_init_nid(). Shouldn't nid_begin be
the current node there, and nid the next node?

> 
> 492 void __init sparse_init(void)
> 493 {
> 494 unsigned long pnum_begin = first_present_section_nr();
> 495 int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
> 496 unsigned long pnum_end, map_count = 1;
> 497
> 498 for_each_present_section_nr(pnum_begin + 1, pnum_end) {
> 499 int nid = sparse_early_nid(__nr_to_section(pnum_end));
> 500
> 501 if (nid == nid_begin) {
> 502 map_count++;
> 503 continue;
> 504 }
> 505 sparse_init_nid(nid, pnum_begin, pnum_end, map_count);
> 506 nid_begin = nid;
> 507 pnum_begin = pnum_end;
> 508 map_count = 1;
> 509 }
> 510 sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
> 511 vmemmap_populate_print_last();
> 512 }
> 
> Thank you,
> Pavel
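
The point under discussion can be checked with a small user-space simulation of the grouping loop. This is only a sketch under simplifying assumptions: sections are modelled as an array of node ids (with -1 for a section that is not present), and struct nid_call and group_sections() are hypothetical names, not kernel code. When the node id changes, the range that has just been closed belongs to nid_begin, so that is the value recorded for the call:

```c
#include <assert.h>
#include <stddef.h>

/* One recorded call: which node was initialized and over which range. */
struct nid_call {
	int nid;
	size_t pnum_begin;
	size_t pnum_end;
	size_t map_count;
};

/*
 * section_nid[i] is the node of section i, or -1 if the section is not
 * present. Groups consecutive present sections by node and records one
 * call per group, passing the node of the finished group (nid_begin).
 * Returns the number of calls written to out[]; the caller must size
 * out[] for one entry per node run.
 */
static size_t group_sections(const int *section_nid, size_t nr,
			     struct nid_call *out)
{
	size_t pnum_begin = 0, pnum_end, map_count = 1, calls = 0;
	int nid_begin;

	while (pnum_begin < nr && section_nid[pnum_begin] < 0)
		pnum_begin++;
	if (pnum_begin == nr)
		return 0;
	nid_begin = section_nid[pnum_begin];

	for (pnum_end = pnum_begin + 1; pnum_end < nr; pnum_end++) {
		int nid = section_nid[pnum_end];

		if (nid < 0)
			continue;
		if (nid == nid_begin) {
			map_count++;
			continue;
		}
		/* The range just closed belongs to nid_begin, not nid. */
		out[calls++] = (struct nid_call){ nid_begin, pnum_begin,
						  pnum_end, map_count };
		nid_begin = nid;
		pnum_begin = pnum_end;
		map_count = 1;
	}
	out[calls++] = (struct nid_call){ nid_begin, pnum_begin, pnum_end,
					  map_count };
	return calls;
}
```

For the node ids {0, 0, -1, 1, 1, 1} this records two calls: node 0 over [0, 3) with two present sections, then node 1 over [3, 6) with three. Recording nid instead of nid_begin inside the loop would attribute the first range to node 1.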

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Pavel Tatashin

 ~~~
> Here, node id passed to sparse_init_nid() should be 'nid_begin', but not
> 'nid'. When you found out the current section's 'nid' is diferent than
> 'nid_begin', handle node 'nid_begin', then start to next node 'nid'.

Thank you for reviewing this work. Here nid equals nid_begin:

See the "if" at 501; this call is at 505.

492 void __init sparse_init(void)
493 {
494 unsigned long pnum_begin = first_present_section_nr();
495 int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
496 unsigned long pnum_end, map_count = 1;
497
498 for_each_present_section_nr(pnum_begin + 1, pnum_end) {
499 int nid = sparse_early_nid(__nr_to_section(pnum_end));
500
501 if (nid == nid_begin) {
502 map_count++;
503 continue;
504 }
505 sparse_init_nid(nid, pnum_begin, pnum_end, map_count);
506 nid_begin = nid;
507 pnum_begin = pnum_end;
508 map_count = 1;
509 }
510 sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
511 vmemmap_populate_print_last();
512 }

Thank you,
Pavel

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Pavel Tatashin

On Sun, Jul 1, 2018 at 9:34 PM Baoquan He  wrote:
>
> Hi Pavel,
>
> On 06/29/18 at 11:09pm, Pavel Tatashin wrote:
> > Change sprase_init() to only find the pnum ranges that belong to a specific
> > node and call sprase_init_nid() for that range from sparse_init().
> >
> > Delete all the code that became obsolete with this change.
>
> > @@ -617,87 +491,24 @@ void __init sparse_init_nid(int nid, unsigned long 
> > pnum_begin,
> >   */
> >  void __init sparse_init(void)
> >  {
> > - unsigned long pnum;
> > - struct page *map;
> > - struct page **map_map;
> > - unsigned long *usemap;
> > - unsigned long **usemap_map;
> > - int size, size2;
> > - int nr_consumed_maps = 0;
> > -
> > - /* see include/linux/mmzone.h 'struct mem_section' definition */
> > - BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
> > + unsigned long pnum_begin = first_present_section_nr();
> > + int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
> > + unsigned long pnum_end, map_count = 1;
> >
> > - /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
> > - set_pageblock_order();
>
> Not very sure if removing set_pageblock_order() calling here is OK. What
> if CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is enabled? usemap_size() depends
> on value of 'pageblock_order'.

Hi Baoquan,

Nice catch, you are right: I incorrectly removed this call and will add
it back in the next version.

Pavel
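
To make the dependency concrete, here is a sketch of how a per-section usemap size could vary with pageblock_order. The formula and the two constants are assumptions chosen for illustration (they are modelled on, but not copied from, the kernel's definitions, and the real values depend on architecture and configuration); usemap_size_sketch() is a hypothetical name:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed illustrative values; real ones are architecture/config dependent. */
#define SKETCH_PFN_SECTION_SHIFT 15	/* pages per section = 1 << 15 */
#define SKETCH_NR_PAGEBLOCK_BITS 4	/* flag bits tracked per pageblock */

/*
 * Bytes needed for one section's pageblock-flags bitmap, rounded up to
 * whole unsigned longs. Requires pageblock_order <= SKETCH_PFN_SECTION_SHIFT.
 */
static size_t usemap_size_sketch(unsigned int pageblock_order)
{
	size_t blocks = (size_t)1 << (SKETCH_PFN_SECTION_SHIFT - pageblock_order);
	size_t bits = blocks * SKETCH_NR_PAGEBLOCK_BITS;
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (bits + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}
```

Under these assumptions, an order of 9 needs 32 bytes per section while an order left at 0 needs 16384 bytes, so computing the size before pageblock_order has been set would give a very different result.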

>
> Thanks
> Baoquan
>
> > + for_each_present_section_nr(pnum_begin + 1, pnum_end) {
> > + int nid = sparse_early_nid(__nr_to_section(pnum_end));
> >
> > - /*
> > -  * map is using big page (aka 2M in x86 64 bit)
> > -  * usemap is less one page (aka 24 bytes)
> > -  * so alloc 2M (with 2M align) and 24 bytes in turn will
> > -  * make next 2M slip to one more 2M later.
> > -  * then in big system, the memory will have a lot of holes...
> > -  * here try to allocate 2M pages continuously.
> > -  *
> > -  * powerpc need to call sparse_init_one_section right after each
> > -  * sparse_early_mem_map_alloc, so allocate usemap_map at first.
> > -  */
> > - size = sizeof(unsigned long *) * nr_present_sections;
> > - usemap_map = memblock_virt_alloc(size, 0);
> > - if (!usemap_map)
> > - panic("can not allocate usemap_map\n");
> > - alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
> > - (void *)usemap_map,
> > - sizeof(usemap_map[0]));
> > -
> > - size2 = sizeof(struct page *) * nr_present_sections;
> > - map_map = memblock_virt_alloc(size2, 0);
> > - if (!map_map)
> > - panic("can not allocate map_map\n");
> > - alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
> > - (void *)map_map,
> > - sizeof(map_map[0]));
> > -
> > - /* The numner of present sections stored in nr_present_sections
> > -  * are kept the same since mem sections are marked as present in
> > -  * memory_present(). In this for loop, we need check which sections
> > -  * failed to allocate memmap or usemap, then clear its
> > -  * ->section_mem_map accordingly. During this process, we need
> > -  * increase 'nr_consumed_maps' whether its allocation of memmap
> > -  * or usemap failed or not, so that after we handle the i-th
> > -  * memory section, can get memmap and usemap of (i+1)-th section
> > -  * correctly. */
> > - for_each_present_section_nr(0, pnum) {
> > - struct mem_section *ms;
> > -
> > - if (nr_consumed_maps >= nr_present_sections) {
> > - pr_err("nr_consumed_maps goes beyond 
> > nr_present_sections\n");
> > - break;
> > - }
> > - ms = __nr_to_section(pnum);
> > - usemap = usemap_map[nr_consumed_maps];
> > - if (!usemap) {
> > - ms->section_mem_map = 0;
> > - nr_consumed_maps++;
> > - continue;
> > - }
> > -
> > - map = map_map[nr_consumed_maps];
> > - if (!map) {
> > - ms->section_mem_map = 0;
> > - nr_consumed_maps++;
> > + if (nid == nid_begin) {
> > + map_count++;
> >   continue;
> >   }
> > -
> > - sparse_init_one_section(__nr_to_section(pnum), pnum, map,
> > - usemap);
> > - nr_consumed_maps++;
> > + sparse_init_nid(nid, pnum_begin, pnum_end, map_count);
> > + nid_begin = nid;
> > + pnum_begin = pnum_end;
> > + map_count = 1;
> >   }
> > -
> > + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
> >   vmemmap_populate_print_last();
> > -
> > - memblock_free_early(__p

Re: [PATCH v2 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Pavel Tatashin

On Sun, Jul 1, 2018 at 9:29 PM Baoquan He  wrote:
>
> On 06/29/18 at 11:09pm, Pavel Tatashin wrote:
> > sparse_init() requires to temporary allocate two large buffers:
> > usemap_map and map_map. Baoquan He has identified that these buffers are so
> > large that Linux is not bootable on small memory machines, such as a kdump
> > boot.
>
> These two temporary buffers are large when CONFIG_X86_5LEVEL is enabled.
> Otherwise it's OK.

Thank you. I will add CONFIG_X86_5LEVEL to the commit log.

Pavel

>
> >
> > Baoquan provided a fix, which reduces these sizes of these buffers, but it
> > is much better to get rid of them entirely.
> >
> > Add a new way to initialize sparse memory: sparse_init_nid(), which only
> > operates within one memory node, and thus allocates memory either in large
> > contiguous block or allocates section by section. This eliminates the need
> > for use of temporary buffers.
> >
> > For simplified bisecting and review, the new interface is going to be
> > enabled as well as old code removed in the next patch.
> >
> > Signed-off-by: Pavel Tatashin 
> > Reviewed-by: Oscar Salvador 
> > ---
> >  include/linux/mm.h  |  8 
> >  mm/sparse-vmemmap.c | 49 
> >  mm/sparse.c | 91 +
> >  3 files changed, 148 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index a0fbb9ffe380..85530fdfb1f2 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page 
> > **map_map,
> >  unsigned long pnum_end,
> >  unsigned long map_count,
> >  int nodeid);
> > +struct page * sparse_populate_node(unsigned long pnum_begin,
> > +unsigned long pnum_end,
> > +unsigned long map_count,
> > +int nid);
> > +struct page * sparse_populate_node_section(struct page *map_base,
> > +unsigned long map_index,
> > +unsigned long pnum,
> > +int nid);
> >
> >  struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
> >   struct vmem_altmap *altmap);
> > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> > index e1a54ba411ec..b3e325962306 100644
> > --- a/mm/sparse-vmemmap.c
> > +++ b/mm/sparse-vmemmap.c
> > @@ -311,3 +311,52 @@ void __init sparse_mem_maps_populate_node(struct page 
> > **map_map,
> >   vmemmap_buf_end = NULL;
> >   }
> >  }
> > +
> > +struct page * __init sparse_populate_node(unsigned long pnum_begin,
> > +   unsigned long pnum_end,
> > +   unsigned long map_count,
> > +   int nid)
> > +{
> > + unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> > + unsigned long pnum, map_index = 0;
> > + void *vmemmap_buf_start;
> > +
> > + size = ALIGN(size, PMD_SIZE) * map_count;
> > + vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
> > +   PMD_SIZE,
> > +   __pa(MAX_DMA_ADDRESS));
> > + if (vmemmap_buf_start) {
> > + vmemmap_buf = vmemmap_buf_start;
> > + vmemmap_buf_end = vmemmap_buf_start + size;
> > + }
> > +
> > + for (pnum = pnum_begin; map_index < map_count; pnum++) {
> > + if (!present_section_nr(pnum))
> > + continue;
> > + if (!sparse_mem_map_populate(pnum, nid, NULL))
> > + break;
> > + map_index++;
> > + BUG_ON(pnum >= pnum_end);
> > + }
> > +
> > + if (vmemmap_buf_start) {
> > + /* need to free left buf */
> > + memblock_free_early(__pa(vmemmap_buf),
> > + vmemmap_buf_end - vmemmap_buf);
> > + vmemmap_buf = NULL;
> > + vmemmap_buf_end = NULL;
> > + }
> > + return pfn_to_page(section_nr_to_pfn(pnum_begin));
> > +}
> > +
> > +/*
> > + * Return map for pnum section. sparse_populate_node() has populated 
> > memory map
> > + * in this node, we simply do pnum to struct page conversion.
> > + */
> > +struct page * __init sparse_populate_node_section(struct page *map_base,
> > +   unsigned long map_index,
> > +   unsigned long pnum,
> > +   int nid)
> > +{
> > + return pfn_to_page(section_nr_to_pfn(pnum));
> > +}
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index d18e2697a781..c18d92b8ab9b 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page 
> >

Re: [PATCH upstream] KASAN: slab-out-of-bounds Read in getname_kernel

2018-07-01 Thread Ian Kent

On Mon, 2018-07-02 at 09:10 +0800, Ian Kent wrote:
> On Mon, 2018-07-02 at 00:04 +0200, tomas wrote:
> > Hi,
> > 
> > I've looked into this issue found by Syzbot and I made a patch:
> > 
> > > https://syzkaller.appspot.com/bug?id=d03abd8b42847f7f69b1d1d7f97208ae425b1163
> 
> Umm ... oops!
> 
> Thanks for looking into this Tomas.
> 
> > 
> > 
> > The autofs subsystem does not check that the "path" parameter is present
> > within the "param" struct passed by the userspace in case the
> > AUTOFS_DEV_IOCTL_OPENMOUNT_CMD command is passed. Indeed, it assumes a
> > path is always provided (though a path is not always present, as per how
> > the struct is defined:
> > > https://github.com/torvalds/linux/blob/master/include/uapi/linux/auto_dev-ioctl.h#L89).
> > Skipping the check provokes an oob read in "strlen", called by
> > "getname_kernel", in turn called by the autofs to assess the length of
> > the non-existing path.
> > 
> > To solve it, modify the "validate_dev_ioctl" function to check also that
> > a path has been provided if the command is AUTOFS_DEV_IOCTL_OPENMOUNT_CMD.
> > 
> > 
> > --- b/fs/autofs/dev-ioctl.c2018-07-01 23:10:16.059728621 +0200
> > +++ a/fs/autofs/dev-ioctl.c2018-07-01 23:10:24.311792133 +0200
> > @@ -136,6 +136,9 @@ static int validate_dev_ioctl(int cmd, s
> >  goto out;
> >  }
> >  }
> > +/* AUTOFS_DEV_IOCTL_OPENMOUNT_CMD without path */
> > +else if(_IOC_NR(cmd) == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD)
> > +return -EINVAL;
> 
> My preference is to put the comment inside the else but ...
> 
> There's another question, should the check be done in
> autofs_dev_ioctl_openmount() in the same way it's checked in other
> ioctls that need a path, such as in autofs_dev_ioctl_requester()
> and autofs_dev_ioctl_ismountpoint()?
> 
> For consistency I'd say it should.
> 
> >  
> >  err = 0;
> >  out:
> > 
> > 
> > Tested and solves the issue on Linus' main git tree.
> > 
> > 

Or perhaps this (not even compile tested) patch would be better?

autofs - fix slab out of bounds read in getname_kernel()

From: Ian Kent 

The autofs subsystem does not check that the "path" parameter is
present for all cases where it is required when it is passed in
via the "param" struct.

In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
ioctl command.

To solve it, modify the validate_dev_ioctl() function to check that a
path has been provided for ioctl commands that require it.
---
 fs/autofs/dev-ioctl.c |   15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
index ea4ca1445ab7..61c63715c3fb 100644
--- a/fs/autofs/dev-ioctl.c
+++ b/fs/autofs/dev-ioctl.c
@@ -135,6 +135,11 @@ static int validate_dev_ioctl(int cmd, struct 
autofs_dev_ioctl *param)
cmd);
goto out;
}
+   } else if (cmd == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ||
+  cmd == AUTOFS_DEV_IOCTL_REQUESTER_CMD ||
+  cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) {
+   err = -EINVAL;
+   goto out;
}
 
err = 0;
@@ -433,10 +438,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
dev_t devid;
int err = -ENOENT;
 
-   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
-   err = -EINVAL;
-   goto out;
-   }
+   /* param->path has already been checked */
 
devid = sbi->sb->s_dev;
 
@@ -521,10 +523,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp,
unsigned int devid, magic;
int err = -ENOENT;
 
-   if (param->size <= AUTOFS_DEV_IOCTL_SIZE) {
-   err = -EINVAL;
-   goto out;
-   }
+   /* param->path has already been checked */
 
name = param->path;
type = param->ismountpoint.in.type;
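
The idea behind both patches, rejecting a path-requiring command when the request carries only the fixed header, can be shown outside the kernel. The following is a minimal userspace sketch, not the autofs code: struct demo_ioctl, demo_validate() and the needs_path flag are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct autofs_dev_ioctl. */
struct demo_ioctl {
	unsigned int size;	/* total request size: header plus path */
	char path[];		/* optional NUL-terminated path */
};

/*
 * Return 0 if the request is well formed, -EINVAL otherwise.
 * needs_path marks commands that cannot work without a path.
 */
static int demo_validate(const struct demo_ioctl *param, int needs_path)
{
	size_t hdr = sizeof(*param);

	if (param->size < hdr)
		return -EINVAL;

	if (param->size > hdr) {
		size_t len = param->size - hdr;

		/* The path must be NUL-terminated inside the buffer. */
		if (memchr(param->path, '\0', len) == NULL)
			return -EINVAL;
	} else if (needs_path) {
		/* Header only: there is no path to hand to strlen(). */
		return -EINVAL;
	}

	return 0;
}
```

Doing the check once in the validation step means the individual handlers never see a request whose path is absent, which is the consistency argument made above.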

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Baoquan He

On 06/29/18 at 11:09pm, Pavel Tatashin wrote:
> Change sparse_init() to only find the pnum ranges that belong to a specific
> node and call sparse_init_nid() for that range from sparse_init().
> 
> Delete all the code that became obsolete with this change.
>  void __init sparse_init(void)
>  {
> - unsigned long pnum;
> - struct page *map;
> - struct page **map_map;
> - unsigned long *usemap;
> - unsigned long **usemap_map;
> - int size, size2;
> - int nr_consumed_maps = 0;
> -
> - /* see include/linux/mmzone.h 'struct mem_section' definition */
> - BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
> + unsigned long pnum_begin = first_present_section_nr();
> + int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
> + unsigned long pnum_end, map_count = 1;
>  
> - /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
> - set_pageblock_order();
> + for_each_present_section_nr(pnum_begin + 1, pnum_end) {
> + int nid = sparse_early_nid(__nr_to_section(pnum_end));
>  
> - /*
> -  * map is using big page (aka 2M in x86 64 bit)
> -  * usemap is less one page (aka 24 bytes)
> -  * so alloc 2M (with 2M align) and 24 bytes in turn will
> -  * make next 2M slip to one more 2M later.
> -  * then in big system, the memory will have a lot of holes...
> -  * here try to allocate 2M pages continuously.
> -  *
> -  * powerpc need to call sparse_init_one_section right after each
> -  * sparse_early_mem_map_alloc, so allocate usemap_map at first.
> -  */
> - size = sizeof(unsigned long *) * nr_present_sections;
> - usemap_map = memblock_virt_alloc(size, 0);
> - if (!usemap_map)
> - panic("can not allocate usemap_map\n");
> - alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
> - (void *)usemap_map,
> - sizeof(usemap_map[0]));
> -
> - size2 = sizeof(struct page *) * nr_present_sections;
> - map_map = memblock_virt_alloc(size2, 0);
> - if (!map_map)
> - panic("can not allocate map_map\n");
> - alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
> - (void *)map_map,
> - sizeof(map_map[0]));
> -
> - /* The numner of present sections stored in nr_present_sections
> -  * are kept the same since mem sections are marked as present in
> -  * memory_present(). In this for loop, we need check which sections
> -  * failed to allocate memmap or usemap, then clear its
> -  * ->section_mem_map accordingly. During this process, we need
> -  * increase 'nr_consumed_maps' whether its allocation of memmap
> -  * or usemap failed or not, so that after we handle the i-th
> -  * memory section, can get memmap and usemap of (i+1)-th section
> -  * correctly. */
> - for_each_present_section_nr(0, pnum) {
> - struct mem_section *ms;
> -
> - if (nr_consumed_maps >= nr_present_sections) {
> - pr_err("nr_consumed_maps goes beyond 
> nr_present_sections\n");
> - break;
> - }
> - ms = __nr_to_section(pnum);
> - usemap = usemap_map[nr_consumed_maps];
> - if (!usemap) {
> - ms->section_mem_map = 0;
> - nr_consumed_maps++;
> - continue;
> - }
> -
> - map = map_map[nr_consumed_maps];
> - if (!map) {
> - ms->section_mem_map = 0;
> - nr_consumed_maps++;
> + if (nid == nid_begin) {
> + map_count++;
>   continue;
>   }
> -
> - sparse_init_one_section(__nr_to_section(pnum), pnum, map,
> - usemap);
> - nr_consumed_maps++;
> + sparse_init_nid(nid, pnum_begin, pnum_end, map_count);
~~~
Here, the node id passed to sparse_init_nid() should be 'nid_begin', not
'nid'. When the current section's 'nid' turns out to be different from
'nid_begin', handle node 'nid_begin' first, then move on to the next node, 'nid'.
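
The point can be illustrated with a minimal userspace sketch of the same run-grouping loop. It is not the kernel code: struct node_range and group_sections_by_node() are hypothetical names, and the input is simply an array holding the node id of each present section in order.

```c
#include <stddef.h>

/* Hypothetical record of one per-node call, used only for illustration. */
struct node_range {
	int nid;		/* node the range belongs to */
	size_t begin;		/* index of the first section in the range */
	size_t end;		/* index one past the last section scanned */
	size_t map_count;	/* number of sections in the range */
};

/*
 * Walk nids[0..n) and emit one range per run of equal node ids.
 * Returns the number of ranges written to out[], which must have
 * room for n entries.
 */
static size_t group_sections_by_node(const int *nids, size_t n,
				     struct node_range *out)
{
	size_t begin = 0, count = 1, nr = 0, i;
	int nid_begin;

	if (n == 0)
		return 0;

	nid_begin = nids[0];
	for (i = 1; i < n; i++) {
		if (nids[i] == nid_begin) {
			count++;
			continue;
		}
		/* Node changed: flush the finished run under nid_begin, not nids[i]. */
		out[nr++] = (struct node_range){ nid_begin, begin, i, count };
		nid_begin = nids[i];
		begin = i;
		count = 1;
	}
	/* Flush the last run. */
	out[nr++] = (struct node_range){ nid_begin, begin, n, count };
	return nr;
}
```

If the flush inside the loop used nids[i] instead of nid_begin, every run except the last would be attributed to the node that follows it.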


> + nid_begin = nid;
> + pnum_begin = pnum_end;
> + map_count = 1;
>   }
> -
> + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
>   vmemmap_populate_print_last();
> -
> - memblock_free_early(__pa(map_map), size2);
> - memblock_free_early(__pa(usemap_map), size);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -- 
> 2.18.0
>

Re: [PATCH] arm: dts: socfpga: denali needs nand_x_clk too

2018-07-01 Thread Masahiro Yamada

Hi Dinh,

2018-06-27 23:55 GMT+09:00 Dinh Nguyen :
> Hi Masahiro,
>
> On 06/26/2018 09:52 PM, Masahiro Yamada wrote:
>> 2018-06-27 3:09 GMT+09:00 Miquel Raynal :
>>> Hi Masahiro,
>>>
>>> On Tue, 26 Jun 2018 11:38:21 +0900, Masahiro Yamada
>>>  wrote:
>>>
 2018-06-25 23:55 GMT+09:00 Boris Brezillon :
> On Mon, 25 Jun 2018 09:50:18 -0500
> Dinh Nguyen  wrote:
>
>> On 06/22/2018 10:58 AM, Richard Weinberger wrote:
>>> Masahiro,
>>>
>>> Am Freitag, 22. Juni 2018, 16:37:21 CEST schrieb Masahiro Yamada:
 Hi Richard,


 2018-06-19 21:07 GMT+09:00 Richard Weinberger :
> The denali NAND flash controller needs at least two clocks to operate,
> nand_clk and nand_x_clk.
> Since 1bb88666775e ("mtd: nand: denali: handle timing parameters by
> setup_data_interface()") nand_x_clk is used to derive timing settings.
>
> Signed-off-by: Richard Weinberger 
> ---
> Strictly speaking denali needs a ecc_clk too, but AFAIK such a clock
> is not present on this SoC.
> But my SoCFPGA knowledge is very limited.
>
> Thanks,
> //richard
> ---
>  arch/arm/boot/dts/socfpga.dtsi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/socfpga.dtsi 
> b/arch/arm/boot/dts/socfpga.dtsi
> index 486d4e7433ed..562f7b375bbd 100644
> --- a/arch/arm/boot/dts/socfpga.dtsi
> +++ b/arch/arm/boot/dts/socfpga.dtsi
> @@ -754,7 +754,8 @@
> reg-names = "nand_data", "denali_reg";
> interrupts = <0x0 0x90 0x4>;
> dma-mask = <0x>;
> -   clocks = <&nand_clk>;
> +   clocks = <&nand_clk>, <&nand_x_clk>;
> +   clock-names = "nand", "nand_x";


 IMHO, this should be

   clocks = <&nand_clk>, <&nand_x_clk>, 
 <&nand_x_clk>;
   clock-names = "nand", "nand_x", "ecc";
>>
>> No, it should be just the nand_x and ecc.
>>
>> There's already a patch to use the nand_x_clk and not the nand_clk.


 Different people try to fix the problem in different ways.

 I think it is due to miscommunication across sub-systems.
>>>
>>> Is the series named
>>>
>>> mtd: rawnand: denali: add new clocks and improve
>>>   setup_data_interface
>>>
>>> still valid?
>>
>> Yes.
>> I believe V4 is valid.
>>
>>
>> Information for Dinh Nguyen:
>>
>> http://patchwork.ozlabs.org/patch/933507/
>> http://patchwork.ozlabs.org/patch/933487/
>> http://patchwork.ozlabs.org/patch/933494/
>> http://patchwork.ozlabs.org/patch/933506/
>>
>>
>> If he is not convinced, I am open to discussion, though.
>
> I wasn't aware of these patches. This patch is staged to go into
> v4.17-rc3 through the arm-soc:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git/commit/arch/arm/boot/dts/socfpga.dtsi?h=fixes&id=4eda9b766b042ea38d84df91581b03f6145a2ab0
>
> I think your patch will handle a case where only 1 clock is passed in,
> so it should be okay right?
>

It should be OK,
but please consider the proper fix for v4.19,
as Boris suggested.



-- 
Best Regards
Masahiro Yamada

Re: [PATCH v2 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-01 Thread Baoquan He

Hi Pavel,

On 06/29/18 at 11:09pm, Pavel Tatashin wrote:
> Change sparse_init() to only find the pnum ranges that belong to a specific
> node and call sparse_init_nid() for that range from sparse_init().
> 
> Delete all the code that became obsolete with this change.

> @@ -617,87 +491,24 @@ void __init sparse_init_nid(int nid, unsigned long 
> pnum_begin,
>   */
>  void __init sparse_init(void)
>  {
> - unsigned long pnum;
> - struct page *map;
> - struct page **map_map;
> - unsigned long *usemap;
> - unsigned long **usemap_map;
> - int size, size2;
> - int nr_consumed_maps = 0;
> -
> - /* see include/linux/mmzone.h 'struct mem_section' definition */
> - BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
> + unsigned long pnum_begin = first_present_section_nr();
> + int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
> + unsigned long pnum_end, map_count = 1;
>  
> - /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
> - set_pageblock_order();

I'm not sure that removing the set_pageblock_order() call here is OK. What
if CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is enabled? usemap_size() depends
on the value of 'pageblock_order'.

Thanks
Baoquan

> + for_each_present_section_nr(pnum_begin + 1, pnum_end) {
> + int nid = sparse_early_nid(__nr_to_section(pnum_end));
>  
> - /*
> -  * map is using big page (aka 2M in x86 64 bit)
> -  * usemap is less one page (aka 24 bytes)
> -  * so alloc 2M (with 2M align) and 24 bytes in turn will
> -  * make next 2M slip to one more 2M later.
> -  * then in big system, the memory will have a lot of holes...
> -  * here try to allocate 2M pages continuously.
> -  *
> -  * powerpc need to call sparse_init_one_section right after each
> -  * sparse_early_mem_map_alloc, so allocate usemap_map at first.
> -  */
> - size = sizeof(unsigned long *) * nr_present_sections;
> - usemap_map = memblock_virt_alloc(size, 0);
> - if (!usemap_map)
> - panic("can not allocate usemap_map\n");
> - alloc_usemap_and_memmap(sparse_early_usemaps_alloc_node,
> - (void *)usemap_map,
> - sizeof(usemap_map[0]));
> -
> - size2 = sizeof(struct page *) * nr_present_sections;
> - map_map = memblock_virt_alloc(size2, 0);
> - if (!map_map)
> - panic("can not allocate map_map\n");
> - alloc_usemap_and_memmap(sparse_early_mem_maps_alloc_node,
> - (void *)map_map,
> - sizeof(map_map[0]));
> -
> - /* The numner of present sections stored in nr_present_sections
> -  * are kept the same since mem sections are marked as present in
> -  * memory_present(). In this for loop, we need check which sections
> -  * failed to allocate memmap or usemap, then clear its
> -  * ->section_mem_map accordingly. During this process, we need
> -  * increase 'nr_consumed_maps' whether its allocation of memmap
> -  * or usemap failed or not, so that after we handle the i-th
> -  * memory section, can get memmap and usemap of (i+1)-th section
> -  * correctly. */
> - for_each_present_section_nr(0, pnum) {
> - struct mem_section *ms;
> -
> - if (nr_consumed_maps >= nr_present_sections) {
> - pr_err("nr_consumed_maps goes beyond 
> nr_present_sections\n");
> - break;
> - }
> - ms = __nr_to_section(pnum);
> - usemap = usemap_map[nr_consumed_maps];
> - if (!usemap) {
> - ms->section_mem_map = 0;
> - nr_consumed_maps++;
> - continue;
> - }
> -
> - map = map_map[nr_consumed_maps];
> - if (!map) {
> - ms->section_mem_map = 0;
> - nr_consumed_maps++;
> + if (nid == nid_begin) {
> + map_count++;
>   continue;
>   }
> -
> - sparse_init_one_section(__nr_to_section(pnum), pnum, map,
> - usemap);
> - nr_consumed_maps++;
> + sparse_init_nid(nid, pnum_begin, pnum_end, map_count);
> + nid_begin = nid;
> + pnum_begin = pnum_end;
> + map_count = 1;
>   }
> -
> + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
>   vmemmap_populate_print_last();
> -
> - memblock_free_early(__pa(map_map), size2);
> - memblock_free_early(__pa(usemap_map), size);
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -- 
> 2.18.0
>

[PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio struct

2018-07-01 Thread Anson Huang

Commit bfd694d5e21c ("mmc: core: Add tunable delay
before detecting card after card is inserted") added
"u32 cd_debounce_delay_ms" at the end of the mmc_gpio
struct, so "char cd_label[0]" no longer works as the string
for the card detect label. With "cat /proc/interrupts",
the devname for the card detect gpio is incorrect, as below:

144:  0  gpio-mxc  22 Edge  ▒
161:  0  gpio-mxc   7 Edge  ▒

Move the cd_label field down to fix this, and drop the
zero from the array size to prevent similar bugs in the future.
The result is correct, as below:

144:  0  gpio-mxc  22 Edge  2198000.mmc cd
161:  0  gpio-mxc   7 Edge  219.mmc cd

Fixes: bfd694d5e21c ("mmc: core: Add tunable delay before detecting card after 
card is inserted")
Signed-off-by: Anson Huang 
Tested-by: Fabio Estevam 
---
changes since V1:
- Add fix tag;
- Drop the zero from the array size, so that gcc reports a compile error
for this kind of issue next time.
 drivers/mmc/core/slot-gpio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c
index ef05e00..2a83368 100644
--- a/drivers/mmc/core/slot-gpio.c
+++ b/drivers/mmc/core/slot-gpio.c
@@ -27,8 +27,8 @@ struct mmc_gpio {
bool override_cd_active_level;
irqreturn_t (*cd_gpio_isr)(int irq, void *dev_id);
char *ro_label;
-   char cd_label[0];
u32 cd_debounce_delay_ms;
+   char cd_label[];
 };
 
 static irqreturn_t mmc_gpio_cd_irqt(int irq, void *dev_id)
-- 
2.7.4

Re: [PATCH v2 1/2] mm/sparse: add sparse_init_nid()

2018-07-01 Thread Baoquan He

On 06/29/18 at 11:09pm, Pavel Tatashin wrote:
> sparse_init() needs to temporarily allocate two large buffers:
> usemap_map and map_map. Baoquan He has identified that these buffers are so
> large that Linux is not bootable on small memory machines, such as a kdump
> boot.

These two temporary buffers are large when CONFIG_X86_5LEVEL is enabled.
Otherwise it's OK.
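
A rough size estimate shows why the paging mode matters. The sketch below assumes each temporary buffer holds one pointer per possible memory section, with the section count derived from the physical address width; temp_buf_bytes() is a hypothetical helper, and the x86-64 values in the note after it are assumptions rather than figures taken from this thread.

```c
#include <stdint.h>

/*
 * Bytes needed for an array with one pointer per possible memory section,
 * where the number of sections is 2^(max_physmem_bits - section_size_bits).
 */
static uint64_t temp_buf_bytes(unsigned int max_physmem_bits,
			       unsigned int section_size_bits,
			       unsigned int ptr_size)
{
	uint64_t nr_sections =
		UINT64_C(1) << (max_physmem_bits - section_size_bits);

	return nr_sections * ptr_size;
}
```

Assuming 128 MiB sections (27 bits) and 8-byte pointers, a 46-bit physical address space gives 4 MiB per buffer, while a 52-bit one gives 256 MiB per buffer, which a small kdump kernel may be unable to allocate.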

> 
> Baoquan provided a fix, which reduces the sizes of these buffers, but it
> is much better to get rid of them entirely.
> 
> Add a new way to initialize sparse memory: sparse_init_nid(), which only
> operates within one memory node, and thus allocates memory either in large
> contiguous block or allocates section by section. This eliminates the need
> for use of temporary buffers.
> 
> For simplified bisecting and review, the new interface is going to be
> enabled as well as old code removed in the next patch.
> 
> Signed-off-by: Pavel Tatashin 
> Reviewed-by: Oscar Salvador 
> ---
>  include/linux/mm.h  |  8 
>  mm/sparse-vmemmap.c | 49 
>  mm/sparse.c | 91 +
>  3 files changed, 148 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a0fbb9ffe380..85530fdfb1f2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page 
> **map_map,
>  unsigned long pnum_end,
>  unsigned long map_count,
>  int nodeid);
> +struct page * sparse_populate_node(unsigned long pnum_begin,
> +unsigned long pnum_end,
> +unsigned long map_count,
> +int nid);
> +struct page * sparse_populate_node_section(struct page *map_base,
> +unsigned long map_index,
> +unsigned long pnum,
> +int nid);
>  
>  struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
>   struct vmem_altmap *altmap);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index e1a54ba411ec..b3e325962306 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -311,3 +311,52 @@ void __init sparse_mem_maps_populate_node(struct page 
> **map_map,
>   vmemmap_buf_end = NULL;
>   }
>  }
> +
> +struct page * __init sparse_populate_node(unsigned long pnum_begin,
> +   unsigned long pnum_end,
> +   unsigned long map_count,
> +   int nid)
> +{
> + unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> + unsigned long pnum, map_index = 0;
> + void *vmemmap_buf_start;
> +
> + size = ALIGN(size, PMD_SIZE) * map_count;
> + vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
> +   PMD_SIZE,
> +   __pa(MAX_DMA_ADDRESS));
> + if (vmemmap_buf_start) {
> + vmemmap_buf = vmemmap_buf_start;
> + vmemmap_buf_end = vmemmap_buf_start + size;
> + }
> +
> + for (pnum = pnum_begin; map_index < map_count; pnum++) {
> + if (!present_section_nr(pnum))
> + continue;
> + if (!sparse_mem_map_populate(pnum, nid, NULL))
> + break;
> + map_index++;
> + BUG_ON(pnum >= pnum_end);
> + }
> +
> + if (vmemmap_buf_start) {
> + /* need to free left buf */
> + memblock_free_early(__pa(vmemmap_buf),
> + vmemmap_buf_end - vmemmap_buf);
> + vmemmap_buf = NULL;
> + vmemmap_buf_end = NULL;
> + }
> + return pfn_to_page(section_nr_to_pfn(pnum_begin));
> +}
> +
> +/*
> + * Return map for pnum section. sparse_populate_node() has populated memory 
> map
> + * in this node, we simply do pnum to struct page conversion.
> + */
> +struct page * __init sparse_populate_node_section(struct page *map_base,
> +   unsigned long map_index,
> +   unsigned long pnum,
> +   int nid)
> +{
> + return pfn_to_page(section_nr_to_pfn(pnum));
> +}
> diff --git a/mm/sparse.c b/mm/sparse.c
> index d18e2697a781..c18d92b8ab9b 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page 
> **map_map,
>  __func__);
>   }
>  }
> +
> +static unsigned long section_map_size(void)
> +{
> + return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
> +}
> +
> +/*
> + * Try to allocate all struct pages for this node, if this fails, we will
> + * be allocating one section at a time in sparse_populate_n

Re: [PATCH] kbuild: check for /sbin/depmod installed

2018-07-01 Thread Masahiro Yamada

2018-07-02 2:12 GMT+09:00 Randy Dunlap :
> From: Randy Dunlap 
>
> Verify that 'depmod' ($DEPMOD) is installed.
> This is a partial revert of 620c231c7a7f (from 2012):
>   ("kbuild: do not check for ancient modutils tools")
>
> Fixes kernel bugzilla #198965:
> https://bugzilla.kernel.org/show_bug.cgi?id=198965
>
> Signed-off-by: Randy Dunlap 
> Cc: Lucas De Marchi 
> Cc: Michal Marek 
> Cc: Chih-Wei Huang 
> Cc: sta...@vger.kernel.org # any kernel since 2012
> ---
>  scripts/depmod.sh |8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- lnx-418-rc2.orig/scripts/depmod.sh
> +++ lnx-418-rc2/scripts/depmod.sh
> @@ -10,10 +10,16 @@ fi
>  DEPMOD=$1
>  KERNELRELEASE=$2
>
> -if ! test -r System.map -a -x "$DEPMOD"; then
> +if ! test -r System.map ; then
> exit 0
>  fi
>
> +if [ -z $(command -v $DEPMOD) ]; then
> +   echo "'make *install' requires $DEPMOD. Please install it." >&2



I think depmod is required by 'make modules_install'

Is there any reason to make this ambiguous like 'make *install' ?




> +   echo "This is probably in the module-init-tools package." >&2
> +   exit 1
> +fi
> +
>  # older versions of depmod require the version string to start with three
>  # numbers, so we cheat with a symlink here
>  depmod_hack_needed=true
>
>



-- 
Best Regards
Masahiro Yamada

RE: [PATCH v1] ARM: dts: imx6sl-evk: keep sw4 always on

2018-07-01 Thread Anson Huang



Anson Huang
Best Regards!


> -Original Message-
> From: Fabio Estevam [mailto:feste...@gmail.com]
> Sent: Monday, July 2, 2018 9:17 AM
> To: Anson Huang 
> Cc: Shawn Guo ; Robin Gong ;
> Mark Rutland ; open list:OPEN FIRMWARE AND
> FLATTENED DEVICE TREE BINDINGS ;
> linux-kernel ; Rob Herring
> ; dl-linux-imx ; Sascha Hauer
> ; Fabio Estevam ;
> moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> 
> Subject: Re: [PATCH v1] ARM: dts: imx6sl-evk: keep sw4 always on
> 
> On Sun, Jul 1, 2018 at 10:09 PM, Anson Huang 
> wrote:
> 
> > On some new i.MX platforms, PFuze switches are used for supplying
> > GPU/VPU or other non-critical modules only, these switches need to be
> > turned off by runtime PM to avoid very high power leakage, like on
> mScale850D.
> 
> Ok, in this case I suggest adding a new property so that the switches can be
> turned off only when the new property is present.
> 
> When this new property is absent, then we keep the current behavior and avoid
> dtb breakage.
> 
> Since MX8M support is not in place yet, this is not urgent, so I will send a 
> revert
> and then you can re-work the patch so that it does not affect the old dtbs.
> 
> Do you agree with such approach?

Sure, I agree for now, as I did NOT want to cause any breakage. Thanks.
 
Anson.

[PATCH] regulator: Revert "regulator: pfuze100: add enable/disable for switch"

2018-07-01 Thread Fabio Estevam

From: Fabio Estevam 

This reverts commit 5fe156f1cab4f340ddb6283c993912be77594016.

Commit 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
causes boot regression on some platforms such as imx6sl-evk and
imx6sll-evk.

After this commit the SW4 regulator will be turned
off and since it supplies the DDR voltage on these boards, a
kernel hang is observed.

Revert it to avoid breaking old dtb's.

Fixes: 5fe156f1cab4 ("regulator: pfuze100: add enable/disable for switch")
Signed-off-by: Fabio Estevam 
---
 drivers/regulator/pfuze100-regulator.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/regulator/pfuze100-regulator.c 
b/drivers/regulator/pfuze100-regulator.c
index 32f9af7..cde6eda 100644
--- a/drivers/regulator/pfuze100-regulator.c
+++ b/drivers/regulator/pfuze100-regulator.c
@@ -163,9 +163,6 @@ static const struct regulator_ops 
pfuze100_fixed_regulator_ops = {
 };
 
 static const struct regulator_ops pfuze100_sw_regulator_ops = {
-   .enable = regulator_enable_regmap,
-   .disable = regulator_disable_regmap,
-   .is_enabled = regulator_is_enabled_regmap,
.list_voltage = regulator_list_voltage_linear,
.set_voltage_sel = regulator_set_voltage_sel_regmap,
.get_voltage_sel = regulator_get_voltage_sel_regmap,
@@ -212,11 +209,6 @@ static const struct regulator_ops 
pfuze100_swb_regulator_ops = {
.uV_step = (step),  \
.vsel_reg = (base) + PFUZE100_VOL_OFFSET,   \
.vsel_mask = 0x3f,  \
-   .enable_reg = (base) + PFUZE100_MODE_OFFSET,\
-   .enable_val = 0xc,  \
-   .disable_val = 0x0, \
-   .enable_mask = 0xf, \
-   .enable_time = 500, \
},  \
.stby_reg = (base) + PFUZE100_STANDBY_OFFSET,   \
.stby_mask = 0x3f,  \
-- 
2.7.4

Re: [PATCH v1] ARM: dts: imx6sl-evk: keep sw4 always on

2018-07-01 Thread Fabio Estevam

On Sun, Jul 1, 2018 at 10:09 PM, Anson Huang  wrote:

> On some new i.MX platforms, PFuze switches are used for supplying GPU/VPU
> or other non-critical modules only, these switches need to be turned off by
> runtime PM to avoid very high power leakage, like on mScale850D.

Ok, in this case I suggest adding a new property so that the switches
can be turned off only when the new property is present.

When this new property is absent, then we keep the current behavior
and avoid dtb breakage.

Since MX8M support is not in place yet, this is not urgent, so I will
send a revert and then you can re-work the patch so that it does not
affect the old dtbs.

Do you agree with such approach?

Mail list logo