date:20171129

Re: [PATCH RFC 2/2] mm, hugetlb: do not rely on overcommit limit during migration

2017-11-29 Thread Michal Hocko

On Wed 29-11-17 11:52:53, Mike Kravetz wrote:
> On 11/29/2017 01:22 AM, Michal Hocko wrote:
> > What about this on top? I haven't tested it yet, though.
> 
> Yes, this would work.
> 
> However, I think a simple modification to your previous free_huge_page
> changes would make this unnecessary.  I was confused in your previous
> patch because you decremented the per-node surplus page count, but not
> the global count.  I think it would have been correct (and made this
> patch unnecessary) if you decremented the global counter there as well.

We cannot really increment the global counter because the overall number of
surplus pages during migration doesn't increase.

> Of course, this patch makes the surplus accounting more explicit.
> 
> If we move forward with this patch, one issue below.
> 
> > ---
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 1b6d7783c717..f5fcd4e355dc 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -119,6 +119,7 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
> > long freed);
> >  bool isolate_huge_page(struct page *page, struct list_head *list);
> >  void putback_active_hugepage(struct page *page);
> > +void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
> >  void free_huge_page(struct page *page);
> >  void hugetlb_fix_reserve_counts(struct inode *inode);
> >  extern struct mutex *hugetlb_fault_mutex_table;
> > @@ -232,6 +233,7 @@ static inline bool isolate_huge_page(struct page *page, struct list_head *list)
> > return false;
> >  }
> >  #define putback_active_hugepage(p) do {} while (0)
> > +#define move_hugetlb_state(old, new, reason)   do {} while (0)
> >  
> >  static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> > unsigned long address, unsigned long end, pgprot_t newprot)
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 037bf0f89463..30601c1c62f3 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -34,6 +34,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "internal.h"
> >  
> >  int hugetlb_max_hstate __read_mostly;
> > @@ -4830,3 +4831,34 @@ void putback_active_hugepage(struct page *page)
> > spin_unlock(&hugetlb_lock);
> > put_page(page);
> >  }
> > +
> > +void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
> > +{
> > +   struct hstate *h = page_hstate(oldpage);
> > +
> > +   hugetlb_cgroup_migrate(oldpage, newpage);
> > +   set_page_owner_migrate_reason(newpage, reason);
> > +
> > +   /*
> > +* transfer temporary state of the new huge page. This is
> > +* reverse to other transitions because the newpage is going to
> > +* be final while the old one will be freed so it takes over
> > +* the temporary status.
> > +*
> > +* Also note that we have to transfer the per-node surplus state
> > +* here as well otherwise the global surplus count will not match
> > +* the per-node's.
> > +*/
> > +   if (PageHugeTemporary(newpage)) {
> > +   int old_nid = page_to_nid(oldpage);
> > +   int new_nid = page_to_nid(newpage);
> > +
> > +   SetPageHugeTemporary(oldpage);
> > +   ClearPageHugeTemporary(newpage);
> > +
> > +   if (h->surplus_huge_pages_node[old_nid]) {
> > +   h->surplus_huge_pages_node[old_nid]--;
> > +   h->surplus_huge_pages_node[new_nid]++;
> > +   }
> 
> You need to take hugetlb_lock before adjusting the surplus counts.

You are right. Actually, I moved the code to hugetlb.c precisely because
I didn't want to take the lock outside of hugetlb proper. I just
forgot to add it here. Thanks for spotting it.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH] dt-bindings: Remove leading 0x from bindings notation

2017-11-29 Thread Mathieu Malaterre

Hi David,

On Thu, Nov 30, 2017 at 12:21 AM, David Daney  wrote:
> On 11/29/2017 12:55 PM, Mathieu Malaterre wrote:
>>
>> Improve the binding example by removing all the leading 0x to fix the
>> following dtc warnings:
>>
>> Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
>
>
> How does it fix the warnings?  You are not changing the .dts files that are
> compiled.

I originally only wanted to fix [...]watchdog/ingenic,jz4740-wdt.txt,
but when I looked through the git log, I found the commit I
refer to in my commit message:

https://github.com/torvalds/linux/commit/48c926cd3414

and I simply followed Rob's suggestion:

https://lkml.org/lkml/2017/11/1/965

> This may also cause the binding documentation to differ from the reality of
> what the actual device trees contain.


It's a chicken-and-egg problem, but you do realize that the Linux master tree
still has the original warning:

$ perl -p -i -e 's/\@0+([0-9a-f])/\@$1/g' `find ./ -type f \( -iname \*.dtsi -o -iname \*.dts \)`
$ git diff | diffstat
[...]
 40 files changed, 160 insertions(+), 160 deletions(-)

And those are real W=1 warnings. Do you want me to resubmit this
as a patch series instead, fixing both the documentation side and the
dts* files?


>
>>
>> Converted using the following command:
>>
>> find Documentation/devicetree/bindings -name "*.txt" -exec sed -i -e
>> 's/([^ ])\@0x([0-9a-f])/$1\@$2/g' {} +
>>
>> This is a follow up to commit 48c926cd3414
>>
>> Signed-off-by: Mathieu Malaterre 
>> ---
>> I've also checked using the original perl command that I did not
>> introduce:
>>
>> Warning (unit_address_format): Node /XXX unit name should not have leading 0s
>>
>>   Documentation/devicetree/bindings/arm/ccn.txt|  2 +-
>>   Documentation/devicetree/bindings/arm/omap/crossbar.txt  |  2 +-
>>   .../devicetree/bindings/arm/tegra/nvidia,tegra20-mc.txt  |  2 +-
>>   Documentation/devicetree/bindings/clock/axi-clkgen.txt   |  2 +-
>>   .../devicetree/bindings/clock/brcm,bcm2835-aux-clock.txt |  2 +-
>>   Documentation/devicetree/bindings/clock/exynos4-clock.txt|  2 +-
>>   Documentation/devicetree/bindings/clock/exynos5250-clock.txt |  2 +-
>>   Documentation/devicetree/bindings/clock/exynos5410-clock.txt |  2 +-
>>   Documentation/devicetree/bindings/clock/exynos5420-clock.txt |  2 +-
>>   Documentation/devicetree/bindings/clock/exynos5440-clock.txt |  2 +-
>>   .../devicetree/bindings/clock/ti-keystone-pllctrl.txt|  2 +-
>>   Documentation/devicetree/bindings/clock/zx296702-clk.txt |  4 ++--
>>   Documentation/devicetree/bindings/crypto/fsl-sec4.txt|  4 ++--
>>   .../devicetree/bindings/devfreq/event/rockchip-dfi.txt   |  2 +-
>>   Documentation/devicetree/bindings/display/atmel,lcdc.txt |  4 ++--
>>   Documentation/devicetree/bindings/dma/qcom_hidma_mgmt.txt|  4 ++--
>>   Documentation/devicetree/bindings/dma/zxdma.txt  |  2 +-
>>   Documentation/devicetree/bindings/gpio/gpio-altera.txt   |  2 +-
>>   Documentation/devicetree/bindings/i2c/i2c-jz4780.txt |  2 +-
>>   Documentation/devicetree/bindings/iio/pressure/hp03.txt  |  2 +-
>>   .../devicetree/bindings/input/touchscreen/bu21013.txt|  2 +-
>>   .../devicetree/bindings/interrupt-controller/arm,gic.txt |  4 ++--
>>   .../bindings/interrupt-controller/img,meta-intc.txt  |  2 +-
>>   .../bindings/interrupt-controller/img,pdc-intc.txt   |  2 +-
>>   .../bindings/interrupt-controller/st,spear3xx-shirq.txt  |  2 +-
>>   Documentation/devicetree/bindings/mailbox/altera-mailbox.txt |  6 +++---
>>   .../devicetree/bindings/mailbox/brcm,iproc-pdc-mbox.txt  |  2 +-
>>   Documentation/devicetree/bindings/media/exynos5-gsc.txt  |  2 +-
>>   Documentation/devicetree/bindings/media/mediatek-vcodec.txt  |  2 +-
>>   Documentation/devicetree/bindings/media/rcar_vin.txt |  2 +-
>>   Documentation/devicetree/bindings/media/samsung-fimc.txt |  2 +-
>>   Documentation/devicetree/bindings/media/sh_mobile_ceu.txt|  2 +-
>>   Documentation/devicetree/bindings/media/video-interfaces.txt | 10 +-
>>   .../devicetree/bindings/memory-controllers/ti/emif.txt   |  2 +-
>>   .../devicetree/bindings/mfd/ti-keystone-devctrl.txt  |  2 +-
>>   Documentation/devicetree/bindings/misc/brcm,kona-smc.txt |  2 +-
>>   Documentation/devicetree/bindings/mmc/brcm,kona-sdhci.txt|  2 +-
>>   Documentation/devicetree/bindings/mmc/brcm,sdhci-iproc.txt   |  2 +-
>>   Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt  |  4 ++--
>>   Documentation/devicetree/bindings/mtd/gpmc-nor.txt   |  6 +++---
>>   Documentation/devicetree/bindings/mtd/mtk-nand.txt   |  2 +-
>>   Documentation/devicetree/bindings/net/altera_tse.txt |  4 ++--
>>   Documentation/devicetree/bindings/net/mdio.txt   |  2 +-
>>   Documentation/devicetree/bindings/net/socfpga-dwmac.txt  |  2 +-
>>

Re: [PATCH] Support TrackStick of Thinkpad L570

2017-11-29 Thread Aaron Ma

Please add the patch version next time.

The patch makes the TrackStick work on the L570.

Tested-by: Aaron Ma 

On 11/29/2017 04:33 PM, Masaki Ota wrote:
> From: Masaki Ota 
> - The issue is that the Thinkpad L570 TrackStick does not work. Because the
> main interface of the Thinkpad L570 device is SMBus, ALPS overlooked the PS/2
> interface firmware setting of the TrackStick: the TrackStick OTP bit is
> disabled.
> - Add code that checks the value at address 0xD7. This value holds device
> number information, so we can identify the device by checking it.
> - To check the 0xD7 value, we need to enter command mode; after reading the
> value we need to exit command mode, and then we have to re-enable the device
> (0xF4 command).
> - The Thinkpad L570 device number is 0x0C or 0x1D. If it matches, enable the
> ALPS_DUALPOINT flag.
> 
> Signed-off-by: Masaki Ota 
> ---
>  drivers/input/mouse/alps.c | 24 +---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
> index 850b00e3ad8e..6f092bdd9fc5 100644
> --- a/drivers/input/mouse/alps.c
> +++ b/drivers/input/mouse/alps.c
> @@ -2541,13 +2541,31 @@ static int alps_update_btn_info_ss4_v2(unsigned char otp[][4],
>  }
>  
>  static int alps_update_dual_info_ss4_v2(unsigned char otp[][4],
> -struct alps_data *priv)
> +struct alps_data *priv,
> + struct psmouse *psmouse)
>  {
>   bool is_dual = false;
> + int reg_val = 0;
> + struct ps2dev *ps2dev = &psmouse->ps2dev;
>  
> - if (IS_SS4PLUS_DEV(priv->dev_id))
> + if (IS_SS4PLUS_DEV(priv->dev_id)) {
>   is_dual = (otp[0][0] >> 4) & 0x01;
>  
> + if (!is_dual) {
> + /* Support the TrackStick on Thinkpad L/E series */
> + if (alps_exit_command_mode(psmouse) == 0 &&
> + alps_enter_command_mode(psmouse) == 0) {
> + reg_val = alps_command_mode_read_reg(psmouse,
> + 0xD7);
> + }
> + alps_exit_command_mode(psmouse);
> + ps2_command(ps2dev, NULL, PSMOUSE_CMD_ENABLE);
> +
> + if (reg_val == 0x0C || reg_val == 0x1D)
> + is_dual = true;
> + }
> + }
> +
>   if (is_dual)
>   priv->flags |= ALPS_DUALPOINT |
>   ALPS_DUALPOINT_WITH_PRESSURE;
> @@ -2570,7 +2588,7 @@ static int alps_set_defaults_ss4_v2(struct psmouse *psmouse,
>  
>   alps_update_btn_info_ss4_v2(otp, priv);
>  
> - alps_update_dual_info_ss4_v2(otp, priv);
> + alps_update_dual_info_ss4_v2(otp, priv, psmouse);
>  
>   return 0;
>  }
>

Re: [PATCH v7 2/4] KVM: X86: Add Paravirt TLB Shootdown

2017-11-29 Thread Wanpeng Li

2017-11-30 14:01 GMT+08:00 Wanpeng Li :
> From: Wanpeng Li 
>
> Remote flushing APIs do a busy wait, which is fine in a bare-metal
> scenario. But within the guest, the vCPUs might have been preempted
> or blocked. In this scenario, the initiator vCPU would end up
> busy-waiting for a long time.
>
> This patch set implements paravirt TLB flushing, making sure that it
> does not wait for vCPUs that are sleeping; instead, all the sleeping
> vCPUs flush the TLB on guest entry.
>
> The best result is achieved when we're overcommitting the host by running
> multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
> vCPUs which are not scheduled and avoids the wait on the main CPU.
>
> Tested on a Xeon Gold 6142 2.6GHz with 2 sockets, 32 cores and 64 threads
> (so 64 pCPUs); each VM has 64 vCPUs.
>
> ebizzy -M
>           vanilla    optimized    boost
> 1VM        46799        48670       4%
> 2VM        23962        42691      78%
> 3VM        16152        37539     132%
>
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Peter Zijlstra 
> Signed-off-by: Wanpeng Li 
> ---
>  Documentation/virtual/kvm/cpuid.txt  |  4 +++
>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>  arch/x86/kernel/kvm.c| 47
>  3 files changed, 53 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..dcab6dc 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
> ||   || before enabling paravirtualized
> ||   || spinlock support.
>  --
> +KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
> +   ||   || before enabling paravirtualized
> +   ||   || tlb flush.
> +--
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
> ||   || per-cpu warps are expected in
> ||   || kvmclock.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 763b692..8fbcc16 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
>  #define KVM_FEATURE_STEAL_TIME 5
>  #define KVM_FEATURE_PV_EOI 6
>  #define KVM_FEATURE_PV_UNHALT  7
> +#define KVM_FEATURE_PV_TLB_FLUSH   9
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>
>  #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
>  #define KVM_VCPU_PREEMPTED  (1 << 0)
> +#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
>
>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>  struct kvm_clock_pairing {
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6610b92..64fb9a4 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -498,6 +498,34 @@ static void __init kvm_apf_trap_init(void)
> update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>
> +static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
> +
> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> +   const struct flush_tlb_info *info)
> +{
> +   u8 state;
> +   int cpu;
> +   struct kvm_steal_time *src;
> +   struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
> +
> +   cpumask_copy(flushmask, cpumask);
> +   /*
> +* We have to call flush only on online vCPUs. And
> +* queue flush_on_enter for pre-empted vCPUs
> +*/
> +   for_each_cpu(cpu, flushmask) {
> +   src = &per_cpu(steal_time, cpu);
> +   state = READ_ONCE(src->preempted);
> +   if ((state & KVM_VCPU_PREEMPTED)) {
> +   if (try_cmpxchg(&src->preempted, &state,
> +   state | KVM_VCPU_SHOULD_FLUSH))
> +   __cpumask_clear_cpu(cpu, flushmask);
> +   }
> +   }
> +
> +   native_flush_tlb_others(flushmask, info);
> +}
> +
>  static void __init kvm_guest_init(void)
>  {
> int i;
> @@ -517,6 +545,9 @@ static void __init kvm_guest_init(void)
> pv_time_ops.steal_clock = kvm_steal_clock;
> }
>
> +   if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH))
> +   pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
> +
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>

Re: [PATCH v2 25/35] nds32: Build infrastructure

2017-11-29 Thread Geert Uytterhoeven

On Thu, Nov 30, 2017 at 6:48 AM, Greentime Hu  wrote:
> 2017-11-30 4:27 GMT+08:00 Arnd Bergmann :
>> On Wed, Nov 29, 2017 at 3:10 PM, Greentime Hu  wrote:
>>> 2017-11-29 19:57 GMT+08:00 Arnd Bergmann :
 On Wed, Nov 29, 2017 at 12:39 PM, Greentime Hu  wrote:
> I think I can use this name "CPU_V3" for all nds32 v3-compatible CPUs.
> It will be implemented like this.
>
> config HWZOL
> bool "hardware zero overhead loop support"
> depends on CPU_D10 || CPU_D15
> default n
> help
>   A Zero-Overhead Loop mechanism is provided to reduce the
>   instruction fetch and execution overhead of loop-control instructions.
>   If you say Y, 3 registers ($LB, $LC, $LE) will be saved as part of the
>   context. You don't need to save these registers if you can make sure
>   your user program doesn't use them.
>
>   If unsure, say N.
>
> config CPU_CACHE_NONALIASING
> bool "Non-aliasing cache"
> depends on !CPU_N10 && !CPU_D10
> default n
> help
>   If this CPU is using VIPT data cache and its cache way size is larger
>   than page size, say N. If it is using PIPT data cache, say Y.
>
>   If unsure, say N.

I still think it will be easier to invert the logic, and have
CPU_CACHE_ALIASING.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v2 1/2] ARM: dts: exynos: Switch to dedicated Odroid-XU3 sound card binding

2017-11-29 Thread Krzysztof Kozlowski

On Thu, Nov 30, 2017 at 8:30 AM, Marek Szyprowski
 wrote:
> Hi Krzysztof,
>
> On 2017-11-29 18:55, Krzysztof Kozlowski wrote:
>>
>> On Mon, Nov 27, 2017 at 7:12 PM, Krzysztof Kozlowski 
>> wrote:
>>>
>>> On Fri, Nov 03, 2017 at 05:54:45PM +0100, Sylwester Nawrocki wrote:

 The new sound card DT binding is used for Odroid XU3 in order
 to properly support the HDMI audio path.
 Clocks configuration is changed so the I2S controller is now the bit
 and the frame clock master with EPLL as the root clock source.

 Signed-off-by: Sylwester Nawrocki 
 ---
   arch/arm/boot/dts/exynos4.dtsi|  1 +
   arch/arm/boot/dts/exynos5420.dtsi |  1 +
   arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 60 ++-
   3 files changed, 40 insertions(+), 22 deletions(-)

>>
>> Unfortunately this patch causes the audio card to disappear on Odroid
>> XU3. "aplay -L" shows nothing and obviously speaker-test fails.
>>
>> Applied on v4.15-rc1... are any dependencies missing?
>>
>> Full boot-logs are here:
>> http://www.krzk.eu/#/builders/1/builds/976
>> (test exits on aplay -L).
>>
>> Should this be dropped?
>
>
> Please add CONFIG_SND_SOC_ODROID=y to your .config. Probably exynos_defconfig
> and multi_v7_defconfig should be updated too.

That would explain it. Can you send a patch for both configs?

Best regards,
Krzysztof

RE: [PATCH 0/4] lockd refcount conversions

2017-11-29 Thread Reshetova, Elena

> Thanks, applying all four for 4.16.--b.

Thank you very much!

Best Regards,
Elena.

> 
> On Wed, Nov 29, 2017 at 01:15:42PM +0200, Elena Reshetova wrote:
> > This series, for the lockd component, replaces atomic_t reference
> > counters with the new refcount_t type and API (see
> > include/linux/refcount.h).
> > By doing this we prevent intentional or accidental
> > underflows or overflows that can lead to use-after-free vulnerabilities.
> >
> > The patches are fully independent and can be cherry-picked separately.
> > If there are no objections to the patches, please merge them via the
> > respective tree.
> >
> > Elena Reshetova (4):
> >   lockd: convert nlm_host.h_count from atomic_t to refcount_t
> >   lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
> >   lockd: convert nlm_lockowner.count from atomic_t to refcount_t
> >   lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
> >
> >  fs/lockd/clntproc.c | 14 +++---
> >  fs/lockd/host.c | 16 
> >  fs/lockd/mon.c  | 14 +++---
> >  fs/lockd/svcproc.c  |  2 +-
> >  include/linux/lockd/lockd.h |  9 +
> >  5 files changed, 28 insertions(+), 27 deletions(-)
> >
> > --
> > 2.7.4
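
The overflow and underflow protection described above comes from saturation. The sketch below is a hypothetical user-space model built on C11 atomics, not the kernel's <linux/refcount.h>: once the count would wrap, it stays pinned at UINT_MAX, so the object leaks instead of being freed while still in use. Exactly how the real API handles these corner cases (and whether it warns) has varied between kernel versions, so treat the details here as assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; NOT the kernel's <linux/refcount.h>. */
#define DEMO_REFCOUNT_SATURATED UINT_MAX

typedef struct {
	atomic_uint refs;
} demo_refcount_t;

static void demo_refcount_set(demo_refcount_t *r, unsigned int n)
{
	assert(n != DEMO_REFCOUNT_SATURATED);
	atomic_store(&r->refs, n);
}

/* Take a reference; the counter never wraps around to 0. */
static void demo_refcount_inc(demo_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);
	unsigned int next;

	do {
		if (old == DEMO_REFCOUNT_SATURATED)
			return; /* pinned: the object leaks instead of being freed early */
		/* Incrementing from 0 hints at a use-after-free: saturate instead. */
		next = (old == 0) ? DEMO_REFCOUNT_SATURATED : old + 1;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, next));
}

/* Drop a reference; returns true only for the final put. */
static bool demo_refcount_dec_and_test(demo_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == DEMO_REFCOUNT_SATURATED)
			return false; /* refuse to underflow or to leave saturation */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

With a plain atomic_t, a third put on an object with two references would wrap to a huge value (or go negative) and a later get could resurrect a freed object; here both mistakes are absorbed by the counter.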

Re: [PATCH 2/2] lockdep: Up MAX_LOCKDEP_CHAINS

2017-11-29 Thread Peter Zijlstra

On Wed, Nov 29, 2017 at 04:41:45PM +0100, Daniel Vetter wrote:
> cross-release ftl
> 
> From Chris:
> 
> "Fwiw, this isn't cross-release but us reloading the module many times,
> creating a whole host of new lockclasses. Even more fun is when the
> module gets a slightly different address and the new lock address hashes
> into an old lock...

Yeah, this is a known issue, just reboot.

> "I did think about a module-hook to revoke the stale lockclasses, but
> that still leaves all the hashed chains.

It's an absolute royal pain to remove all the resources consumed by a
module, and if you manage you then have to deal with fragmented storage
-- that is, we need to keep track of which entries are used.

It's a giant heap of complexity that's just not worth it.


Given all that, I don't see why we should up this. Just don't reload
modules (or better, don't use modules at all).

[PATCH v3] tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented

2017-11-29 Thread Javier Martinez Canillas

According to the TPM Library Specification, a TPM device must do a command
header validation before processing and return a TPM_RC_COMMAND_CODE code
if the command is not implemented.

So user-space will expect to handle that response as an error. But if the
in-kernel resource manager is used (/dev/tpmrm?), an -EINVAL errno code is
returned instead if the command isn't implemented. This confuses userspace
since it doesn't expect that error value.

This also isn't consistent with the behavior when not using TPM spaces and
accessing the TPM directly (/dev/tpm?). In this case, the command is sent
to the TPM even when not implemented and the TPM responds with an error.

Instead of returning an -EINVAL errno code when the tpm_validate_command()
function fails, synthesize a TPM command response so user-space can get a
TPM_RC_COMMAND_CODE as expected when a chip doesn't implement the command.

The TPM only sets 12 of the 32 bits in the TPM_RC response, so the TSS and
TAB specifications define that higher layers in the stack should use some
of the unused 20 bits to indicate which level of the stack the error is
coming from.

Since the TPM_RC_COMMAND_CODE response code is sent by the kernel resource
manager, set the error level to the TAB/RM layer so user-space is aware of
this.
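
The synthesized reply is just a 10-byte TPM header (tag, length, return code). The sketch below builds one in user space to show the byte layout and how the layer number ends up in bits 16 and up of the return code. The DEMO_ names are hypothetical; the numeric values (TPM2_ST_NO_SESSIONS = 0x8001, TPM2_RC_COMMAND_CODE = 0x0143, and a resource-manager layer of 11 shifted left by 16) are assumptions to verify against drivers/char/tpm/tpm.h and the TSS specification, since TSS revisions have numbered the layers differently. This is not the kernel code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Values assumed from the TPM 2.0 / TSS 2.0 specifications; verify them
 * against drivers/char/tpm/tpm.h before relying on them.
 */
#define DEMO_TPM_HEADER_SIZE          10u /* tag(2) + length(4) + code(4) */
#define DEMO_TPM2_ST_NO_SESSIONS      0x8001u
#define DEMO_TPM2_RC_COMMAND_CODE     0x0143u
#define DEMO_TSS2_RC_LAYER_SHIFT      16
#define DEMO_TSS2_RESMGR_TPM_RC_LAYER (11u << DEMO_TSS2_RC_LAYER_SHIFT)

static void demo_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void demo_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Overwrite a rejected command with a header-only response carrying
 * TPM2_RC_COMMAND_CODE tagged with the resource-manager layer.
 * Returns the response length, or 0 if the buffer cannot hold a header.
 */
static size_t demo_synthesize_cc_error(uint8_t *buf, size_t bufsiz)
{
	assert(buf != NULL);
	if (bufsiz < DEMO_TPM_HEADER_SIZE)
		return 0;
	demo_put_be16(buf, DEMO_TPM2_ST_NO_SESSIONS);
	demo_put_be32(buf + 2, DEMO_TPM_HEADER_SIZE);
	demo_put_be32(buf + 6, DEMO_TPM2_RC_COMMAND_CODE |
			       DEMO_TSS2_RESMGR_TPM_RC_LAYER);
	return DEMO_TPM_HEADER_SIZE;
}

/* User space can recover the reporting layer from bits 16..23. */
static unsigned int demo_rc_layer(uint32_t rc)
{
	return (rc >> DEMO_TSS2_RC_LAYER_SHIFT) & 0xffu;
}
```

A non-zero layer tells the caller that the error was produced by the resource manager rather than by the TPM itself, whose own codes leave those bits at 0.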

Suggested-by: Jason Gunthorpe 
Signed-off-by: Javier Martinez Canillas 
Reviewed-by: Philip Tricca 

---

Changes since v2:
- Use TSS2 as prefix instead of TPM2 to match the spec (suggested by Jarkko
  Sakkinen).
- Add Philip's Reviewed-by tag, which he asked for and I missed in the previous version.

Changes since v1:
- Remove the unused macros for the driver and resmgr RC layers (suggested by
  Philip Tricca).
- Use naming conventions from the latest version of the TSS spec (suggested
  by Philip Tricca).

Changes since RFCv2:
- Set the error level to the TAB/RM layer so user-space is aware that the error
  is not coming from the TPM (suggested by Philip Tricca and Jarkko Sakkinen).

Changes since RFCv1:
- Don't pass unvalidated commands to the TPM; instead, return a synthesized
  response with the correct TPM return code (suggested by Jason Gunthorpe).

An example of user-space getting confused by the TPM chardev returning -EINVAL
when sending an unsupported TPM command can be seen in this tpm2-tools issue:

https://github.com/intel/tpm2-tools/issues/621

Best regards,
Javier

 drivers/char/tpm/tpm-interface.c | 28 
 drivers/char/tpm/tpm.h   |  5 +
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1d6729be4cd6..c5da6d3f7058 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -328,7 +328,7 @@ unsigned long tpm_calc_ordinal_duration(struct tpm_chip *chip,
 }
 EXPORT_SYMBOL_GPL(tpm_calc_ordinal_duration);
 
-static bool tpm_validate_command(struct tpm_chip *chip,
+static int tpm_validate_command(struct tpm_chip *chip,
 struct tpm_space *space,
 const u8 *cmd,
 size_t len)
@@ -340,10 +340,10 @@ static bool tpm_validate_command(struct tpm_chip *chip,
unsigned int nr_handles;
 
if (len < TPM_HEADER_SIZE)
-   return false;
+   return -EINVAL;
 
if (!space)
-   return true;
+   return 0;
 
if (chip->flags & TPM_CHIP_FLAG_TPM2 && chip->nr_commands) {
cc = be32_to_cpu(header->ordinal);
@@ -352,7 +352,7 @@ static bool tpm_validate_command(struct tpm_chip *chip,
if (i < 0) {
dev_dbg(&chip->dev, "0x%04X is an invalid command\n",
cc);
-   return false;
+   return -EOPNOTSUPP;
}
 
attrs = chip->cc_attrs_tbl[i];
@@ -362,11 +362,11 @@ static bool tpm_validate_command(struct tpm_chip *chip,
goto err_len;
}
 
-   return true;
+   return 0;
 err_len:
dev_dbg(&chip->dev,
"%s: insufficient command length %zu", __func__, len);
-   return false;
+   return -EINVAL;
 }
 
 /**
@@ -391,8 +391,20 @@ ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
unsigned long stop;
bool need_locality;
 
-   if (!tpm_validate_command(chip, space, buf, bufsiz))
-   return -EINVAL;
+   rc = tpm_validate_command(chip, space, buf, bufsiz);
+   if (rc == -EINVAL)
+   return rc;
+   /*
+* If the command is not implemented by the TPM, synthesize a
+* response with a TPM2_RC_COMMAND_CODE return for user-space.
+*/
+   if (rc == -EOPNOTSUPP) {
+   header->length = cpu_to_be32(sizeof(*header));
+   header->tag = cpu_to_be16(TPM2_ST_NO_SESSIONS);
+   header->return_code = cpu_to_be32(TPM2_RC_COMMAND_CODE

Re: [PATCHv2 0/4] x86: 5-level related changes into decompression code

2017-11-29 Thread Kirill A. Shutemov

On Wed, Nov 29, 2017 at 05:48:51PM +, Borislav Petkov wrote:
> On Wed, Nov 29, 2017 at 08:08:31PM +0300, Kirill A. Shutemov wrote:
> > We're really early in the boot -- startup_64 in decompression code -- and
> > I don't know a way to print a message there. Is there a way?
> > 
> > no_longmode is handled by just hanging the machine. Is that enough for
> > the no_la57 case too?
> 
> Patch pls.

The patch below, on top of patch 2/4 from this patchset, would do the trick.

Please give it a shot.

From 95b5489d1f4ea03c6226d13eb6797825234489d6 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" 
Date: Thu, 30 Nov 2017 10:23:53 +0300
Subject: [PATCH] x86/boot/compressed/64: Print error if 5-level paging is not
 supported

We cannot proceed with booting if the machine doesn't support the paging
mode the kernel was compiled for.

Reporting the error the usual way (via validate_cpu()) is not going to
work. We need to enable the appropriate paging mode before that; otherwise
the kernel would triple-fault during KASLR setup.

This code will go away once we get support for boot-time switching
between paging modes.

Signed-off-by: Kirill A. Shutemov 
---
 arch/x86/boot/compressed/misc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b50c42455e25..5205e848dc33 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -40,6 +40,8 @@
 /* Functions used by the included decompressor code below. */
 void *memmove(void *dest, const void *src, size_t n);
 
+int l5_paging_required(void);
+
 /*
  * This is set up by the setup-routine at boot-time
  */
@@ -362,6 +364,13 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
console_init();
debug_putstr("early console in extract_kernel\n");
 
+   if (IS_ENABLED(CONFIG_X86_5LEVEL) && !l5_paging_required()) {
+   error("The kernel is compiled with 5-level paging enabled, "
+   "but the CPU doesn't support la57\n"
+   "Unable to boot - please use "
+   "a kernel appropriate for your CPU.\n");
+   }
+
free_mem_ptr = heap;/* Heap */
free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
-- 
 Kirill A. Shutemov
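
The condition the new check guards against is a CPU without the LA57 feature, which CPUID reports in leaf 7 (subleaf 0), ECX bit 16. Below is a user-space sketch of such a test using the <cpuid.h> header shipped with GCC and Clang; the demo_ helpers are hypothetical and this is not the decompressor's own l5_paging_required() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define DEMO_LA57_ECX_BIT 16 /* CPUID.(EAX=7,ECX=0):ECX[16] */

/* Pure helper: does this leaf-7 ECX value advertise LA57? */
static bool demo_ecx_has_la57(uint32_t ecx)
{
	return (ecx >> DEMO_LA57_ECX_BIT) & 1u;
}

static bool demo_cpu_has_la57(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7 must be supported before it can be queried. */
	if (__get_cpuid_max(0, NULL) < 7)
		return false;
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return demo_ecx_has_la57(ecx);
#else
	return false; /* not x86: no 5-level paging in this sense */
#endif
}
```

Splitting out the bit test keeps the decision logic checkable on any host, while the CPUID query itself only runs on x86.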

Re: [PATCH v2 1/2] ARM: dts: exynos: Switch to dedicated Odroid-XU3 sound card binding

2017-11-29 Thread Marek Szyprowski


Hi Krzysztof,

On 2017-11-29 18:55, Krzysztof Kozlowski wrote:

On Mon, Nov 27, 2017 at 7:12 PM, Krzysztof Kozlowski  wrote:

On Fri, Nov 03, 2017 at 05:54:45PM +0100, Sylwester Nawrocki wrote:

The new sound card DT binding is used for Odroid XU3 in order
to properly support the HDMI audio path.
Clocks configuration is changed so the I2S controller is now the bit
and the frame clock master with EPLL as the root clock source.

Signed-off-by: Sylwester Nawrocki 
---
  arch/arm/boot/dts/exynos4.dtsi|  1 +
  arch/arm/boot/dts/exynos5420.dtsi |  1 +
  arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 60 ++-
  3 files changed, 40 insertions(+), 22 deletions(-)



Unfortunately this patch causes the audio card to disappear on Odroid
XU3. "aplay -L" shows nothing and obviously speaker-test fails.

Applied on v4.15-rc1... are any dependencies missing?

Full boot-logs are here:
http://www.krzk.eu/#/builders/1/builds/976
(test exits on aplay -L).

Should this be dropped?


Please add CONFIG_SND_SOC_ODROID=y to your .config. Probably exynos_defconfig
and multi_v7_defconfig should be updated too.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v3 0/5] ACPI: DMA ranges management

2017-11-29 Thread Feng Kan

On Thu, Aug 3, 2017 at 5:32 AM, Lorenzo Pieralisi
 wrote:
> This patch series is v3 of a previous posting:
>
> v2->v3:
> - Fixed DMA masks computation
> - Fixed size computation overflow in acpi_dma_get_range()
>
> v1->v2:
> - Reworked acpi_dma_get_range() flow and logs
> - Added IORT named component address limits
> - Renamed acpi_dev_get_resources() helper function
> - Rebased against v4.13-rc3
>
> v2: http://lkml.kernel.org/r/20170731152323.32488-1-lorenzo.pieral...@arm.com
> v1: http://lkml.kernel.org/r/20170720144517.32529-1-lorenzo.pieral...@arm.com
>
> -- Original cover letter --
>
> As reported in:
>
> http://lkml.kernel.org/r/cal85gma_sscwm80tkdkzqee+s1bewzdevdki1kpkmutdrms...@mail.gmail.com
>
> the bus connecting devices to an IOMMU can be narrower than the IOMMU
> input address size, which results in device DMA HW bugs, in particular
> related to IOVA allocation (i.e. chopping of higher address bits owing
> to a mismatch between system bus HW capabilities and the IOMMU).
>
> Fortunately this problem can be solved through an already present but never
> used ACPI 6.2 firmware binding (i.e. the _DMA object) that allows defining
> the DMA window for a specific bus in ACPI, and therefore for all upstream
> devices connected to it.
>
> This small patch series enables _DMA parsing in ACPI core code and
> uses it in ACPI IORT code in order to detect DMA ranges for devices and
> update their data structures to make them work with their related DMA
> addressing restrictions.
>
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Feng Kan 
> Cc: Jon Masters 
> Cc: Robert Moore 
> Cc: Robin Murphy 
> Cc: Zhang Rui 
> Cc: "Rafael J. Wysocki" 
>
> Lorenzo Pieralisi (5):
>   ACPICA: resource_mgr: Allow _DMA method in walk resources
>   ACPI: Make acpi_dev_get_resources() method agnostic
>   ACPI: Introduce DMA ranges parsing
>   ACPI: Make acpi_dma_configure() DMA regions aware
>   ACPI/IORT: Add IORT named component memory address limits
>
>  drivers/acpi/acpica/rsxface.c |  7 ++--
>  drivers/acpi/arm64/iort.c | 57 ++-
>  drivers/acpi/resource.c   | 82 +-
>  drivers/acpi/scan.c   | 91 +++
>  include/acpi/acnames.h|  1 +
>  include/acpi/acpi_bus.h   |  2 +
>  include/linux/acpi.h  |  8 
>  include/linux/acpi_iort.h |  5 ++-
>  8 files changed, 219 insertions(+), 34 deletions(-)
>
> --
> 2.10.0
>
Lorenzo:

A network driver can call pci_set_dma_mask() or similar to override what
this patch sets up, which would result in IOVA allocations beyond the
original _DMA aperture. Should we prevent dma_set_mask() from changing
the mask if one is already set?
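
To make the question concrete: a mask can be derived from the _DMA window, and a later driver request can then only narrow it, never widen it. A sketch of that policy under stated assumptions; DEMO_DMA_BIT_MASK mirrors the kernel's DMA_BIT_MASK() macro, the other names are hypothetical, and whether dma_set_mask() itself should enforce this is exactly what is being asked here.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK() macro. */
#define DEMO_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Floor of log2(v); returns 0 for v == 0. */
static unsigned int demo_ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Smallest DMA_BIT_MASK()-style mask covering [dma_addr, dma_addr + size - 1].
 * Assumes size != 0 and that the window does not wrap past 2^64.
 */
static uint64_t demo_mask_for_window(uint64_t dma_addr, uint64_t size)
{
	uint64_t end;
	unsigned int bits;

	assert(size != 0);
	end = dma_addr + size - 1;
	bits = demo_ilog2_u64(end) + 1;
	return DEMO_DMA_BIT_MASK(bits);
}

/* Hypothetical policy: a driver's request may narrow, never widen, the bus limit. */
static uint64_t demo_effective_mask(uint64_t requested, uint64_t bus_limit)
{
	/* Both are low-bit masks, so AND yields the smaller of the two. */
	return requested & bus_limit;
}
```

Under this policy a driver asking for DMA_BIT_MASK(64) behind a 32-bit _DMA window would still end up with a 32-bit effective mask, so IOVAs stay inside the aperture.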

Re: [PATCH v4 2/2] ASoC: fsl_ssi: add 20-bit sample format for AC'97 and use it for capture

2017-11-29 Thread Nicolin Chen

Hi Maciej,

On Mon, Nov 27, 2017 at 11:34:44PM +0100, Maciej S. Szmigiero wrote:
> There is no problem in using different bit widths in playback and capture
> in AC'97 mode so allow this, too.

> @@ -1557,11 +1558,12 @@ static int fsl_ssi_probe(struct platform_device *pdev)
>  
>   /* Are the RX and the TX clocks locked? */
>   if (!of_find_property(np, "fsl,ssi-asynchronous", NULL)) {
> - if (!fsl_ssi_is_ac97(ssi_private))
> + if (!fsl_ssi_is_ac97(ssi_private)) {
>   ssi_private->cpu_dai_drv.symmetric_rates = 1;
> + ssi_private->cpu_dai_drv.symmetric_samplebits = 1;
> + }
>  
>   ssi_private->cpu_dai_drv.symmetric_channels = 1;
> - ssi_private->cpu_dai_drv.symmetric_samplebits = 1;
>   }

I was actually wondering how AC97 works in synchronous mode while
being able to handle different bit widths. Then I found that the
driver does the corresponding configuration for synchronous mode
only if symmetric_rates is set -- which is unset for AC97 cases. So
the AC97 case (symmetric_rates unset) is probably being treated
as asynchronous mode by the driver -- it'd be good if you could confirm
this for me.

And I am not so sure about the physical pin connections in an AC97
setup, but I started to think that, instead of the change above,
AC97 cases might be expected to have the "fsl,ssi-asynchronous"
property in DT, since it works when the driver sets both the TX and
RX control registers (i.e. asynchronous mode), unlike synchronous
mode, which only sets the TX registers.

Thanks
Nicolin
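
The flag combinations produced by the quoted probe() hunk can be tabulated with a small model. It shows that, without "fsl,ssi-asynchronous", an AC'97 setup keeps symmetric_channels but leaves symmetric_rates and symmetric_samplebits unset, which is the case suspected above of being handled like asynchronous mode. The struct and function are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three symmetric_* fields of the CPU DAI driver. */
struct demo_dai_symmetry {
	bool rates;
	bool channels;
	bool samplebits;
};

/* Mirrors the flag logic of the quoted probe() hunk, after the patch. */
static struct demo_dai_symmetry demo_probe_symmetry(bool has_async_prop,
						     bool is_ac97)
{
	struct demo_dai_symmetry s = { false, false, false };

	/* "Are the RX and the TX clocks locked?" */
	if (!has_async_prop) {
		if (!is_ac97) {
			s.rates = true;
			s.samplebits = true;
		}
		s.channels = true;
	}
	return s;
}
```

If AC'97 boards instead carried "fsl,ssi-asynchronous" in DT, as suggested, all three flags would be unset and the driver would not need the is_ac97 special case here.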

Re: [v2,01/12] hwrng: bcm2835 - Obtain base register via resource

2017-11-29 Thread Herbert Xu

On Wed, Nov 29, 2017 at 09:38:52AM -0800, Florian Fainelli wrote:
>
> Hu, okay, I actually had a v3 prepared that I was going to post
> addressing some of the comments. Should I send an incremental set of
> changes now?

Please send it as an incremental set.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: general protection fault in af_alg_free_areq_sgls

2017-11-29 Thread Herbert Xu

On Wed, Nov 29, 2017 at 11:51:09AM -0800, Eric Biggers wrote:
>
> Herbert, if it's not too late can you fix the subject?  It got split into two
> lines:

Sorry, it's already pushed out with other patches sitting on top
of it.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH] f2fs: avoid false positive of free secs check

2017-11-29 Thread Chao Yu

On 2017/11/30 10:42, Yunlong Song wrote:
> SSR can make hot/warm/cold nodes written together, so why should we account
> them different?

The current segment that is using SSR allocation has only one valid type, so we
cannot write data/nodes of a different type into a current segment that already
has a fixed type, right?

Thanks,

> 
> On 2017/11/29 19:56, Chao Yu wrote:
>> On 2017/11/27 14:54, Yunlong Song wrote:
>>> Sometimes f2fs_gc is called with no target victim (e.g. xfstest
>>> generic/027, ndirty_node:545 ndiry_dent:1 ndirty_imeta:513 rsvd_segs:21
>>> free_segs:27, has_not_enough_free_secs will return true). This patch
>>> first merges pages and then converts into sections.
>> I don't think this could be right, IMO, instead, it would be better to
>> account dirty hot/warm/cold nodes or imeta separately, as actually, they
>> will use different section, but currently, our calculation way is based
>> on that they could be written to same section.
>>
>> Thanks,
>>
>>> Signed-off-by: Yunlong Song 
>>> ---
>>>   fs/f2fs/f2fs.h|  9 -
>>>   fs/f2fs/segment.c | 12 +++-
>>>   fs/f2fs/segment.h | 13 +
>>>   3 files changed, 16 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index ca6b0c9..e89cff7 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1675,15 +1675,6 @@ static inline int get_dirty_pages(struct inode 
>>> *inode)
>>> return atomic_read(&F2FS_I(inode)->dirty_pages);
>>>   }
>>>   
>>> -static inline int get_blocktype_secs(struct f2fs_sb_info *sbi, int 
>>> block_type)
>>> -{
>>> -   unsigned int pages_per_sec = sbi->segs_per_sec * sbi->blocks_per_seg;
>>> -   unsigned int segs = (get_pages(sbi, block_type) + pages_per_sec - 1) >>
>>> -   sbi->log_blocks_per_seg;
>>> -
>>> -   return segs / sbi->segs_per_sec;
>>> -}
>>> -
>>>   static inline block_t valid_user_blocks(struct f2fs_sb_info *sbi)
>>>   {
>>> return sbi->total_valid_block_count;
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index c117e09..603f805 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -171,17 +171,19 @@ static unsigned long __find_rev_next_zero_bit(const 
>>> unsigned long *addr,
>>>   
>>>   bool need_SSR(struct f2fs_sb_info *sbi)
>>>   {
>>> -   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>>> -   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
>>> -   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>>> +   s64 node_pages = get_pages(sbi, F2FS_DIRTY_NODES);
>>> +   s64 dent_pages = get_pages(sbi, F2FS_DIRTY_DENTS);
>>> +   s64 imeta_pages = get_pages(sbi, F2FS_DIRTY_IMETA);
>>>   
>>> if (test_opt(sbi, LFS))
>>> return false;
>>> if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
>>> return true;
>>>   
>>> -   return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
>>> -   SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
>>> +   return free_sections(sbi) <=
>>> +   (PAGE2SEC(sbi, node_pages + imeta_pages) +
>>> +   PAGE2SEC(sbi, 2 * dent_pages) +
>>> +   SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
>>>   }
>>>   
>>>   void register_inmem_page(struct inode *inode, struct page *page)
>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>>> index d1d394c..723d79e 100644
>>> --- a/fs/f2fs/segment.h
>>> +++ b/fs/f2fs/segment.h
>>> @@ -115,6 +115,10 @@
>>>   #define SECTOR_TO_BLOCK(sectors)  \
>>> ((sectors) >> F2FS_LOG_SECTORS_PER_BLOCK)
>>>   
>>> +#define PAGE2SEC(sbi, pages)   \
>>> +   ((((pages) + BLKS_PER_SEC(sbi) - 1) \
>>> +   >> sbi->log_blocks_per_seg) / sbi->segs_per_sec)
>>> +
>>>   /*
>>>* indicate a block allocation direction: RIGHT and LEFT.
>>>* RIGHT means allocating new sections towards the end of volume.
>>> @@ -527,9 +531,9 @@ static inline bool has_curseg_enough_space(struct 
>>> f2fs_sb_info *sbi)
>>>   static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
>>> int freed, int needed)
>>>   {
>>> -   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>>> -   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
>>> -   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>>> +   s64 node_pages = get_pages(sbi, F2FS_DIRTY_NODES);
>>> +   s64 dent_pages = get_pages(sbi, F2FS_DIRTY_DENTS);
>>> +   s64 imeta_pages = get_pages(sbi, F2FS_DIRTY_IMETA);
>>>   
>>> if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>> return false;
>>> @@ -538,7 +542,8 @@ static inline bool has_not_enough_free_secs(struct 
>>> f2fs_sb_info *sbi,
>>> has_curseg_enough_space(sbi))
>>> return false;
>>> return (free_sections(sbi) + freed) <=
>>> -   (node_secs + 2 * dent_secs + imeta_secs +
>>> +   (PAGE2SEC(sbi, node_pages +
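
The arithmetic behind the patch can be checked with the numbers from the xfstest report (ndirty_node 545, dirty dentries 1, ndirty_imeta 513, rsvd_segs 21, free_segs 27). The userspace sketch below assumes 512 blocks per segment (log_blocks_per_seg = 9) and one segment per section, so segment counts equal section counts, and takes freed = needed = 0; it ignores the has_curseg_enough_space() early return. It only shows how rounding each type separately over-counts compared with merging first; it does not settle which accounting is correct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, not read from a real f2fs image. */
struct geom {
	unsigned int log_blocks_per_seg;	/* 9 => 512 blocks per segment */
	unsigned int segs_per_sec;		/* 1 => one segment per section */
};

/* Round pages up to whole sections, as get_blocktype_secs()/PAGE2SEC() do. */
static int64_t pages_to_secs(const struct geom *g, int64_t pages)
{
	int64_t blks_per_sec = ((int64_t)g->segs_per_sec) << g->log_blocks_per_seg;

	return ((pages + blks_per_sec - 1) >> g->log_blocks_per_seg) /
	       (int64_t)g->segs_per_sec;
}

/* Old estimate: each dirty type rounded up on its own. */
static int64_t secs_per_type(const struct geom *g, int64_t node, int64_t dent,
			     int64_t imeta)
{
	return pages_to_secs(g, node) + 2 * pages_to_secs(g, dent) +
	       pages_to_secs(g, imeta);
}

/* Patched estimate: node + imeta pages merged before rounding. */
static int64_t secs_merged(const struct geom *g, int64_t node, int64_t dent,
			   int64_t imeta)
{
	return pages_to_secs(g, node + imeta) + pages_to_secs(g, 2 * dent);
}

/* Shape of the final comparison in has_not_enough_free_secs(). */
static bool not_enough(int64_t free_secs, int freed, int needed,
		       int64_t dirty_secs, int64_t rsvd_secs)
{
	return free_secs + freed <= dirty_secs + rsvd_secs + needed;
}
```

With these inputs the per-type estimate is 2 + 2*1 + 2 = 6 sections, so 27 <= 6 + 21 reports "not enough", while the merged estimate is 3 + 1 = 4 sections and 27 <= 4 + 21 does not.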

Re: [PATCH 0/5] PCI: Add support to the Cadence PCIe controller

2017-11-29 Thread Kishon Vijay Abraham I

Hi,

On Tuesday 28 November 2017 09:20 PM, Lorenzo Pieralisi wrote:
> On Thu, Nov 23, 2017 at 04:01:45PM +0100, Cyrille Pitchen wrote:
>> Hi all,
>>
>> this series of patches adds support to the Cadence PCIe controller.
>> It was tested on a ARM64 platform emulated by a Palladium running both
>> linux-next (next-20171123) and pci-next kernels.
>>
>> The host mode was tested with some PCIe devices connected to the Palladium
>> through a speed-bridge. Some of those devices were a USB host controller
>> and a SATA controller. The PCIe host controller was also tested with a
>> second controller configured in endpoint mode and connected back to back
>> to the first controller.
>>
>> The EndPoint Controller (EPC) driver of this series was tested with the
>> pci-epf-test.c EndPoint Function (EPF) driver and the pcitest userspace
>> program.
>>
>> For linux-next, I applied this series on top of Kishon's patch
>> ("PCI: endpoint: Use EPC's device in dma_alloc_coherent/dma_free_coherent")
>> otherwise dma_alloc_coherent() fails when called by pci_epf_alloc_space().
>>
>> Also, I patched drivers/Makefile rather than drivers/pci/Makefile to make
>> the drivers/pci/cadence/pcie-cadence-ep.o linked after

The reason to patch drivers/Makefile is probably that pcie-cadence-ep has to
be built even when CONFIG_PCI is not enabled. CONFIG_PCI enables host-specific
features, and ENDPOINT shouldn't depend on CONFIG_PCI.
>> drivers/pci/endpoint/*.o objects, otherwise the built-in pci-cadence-ep
>> driver would be probed before the PCI endpoint framework would have been
>> initialized, which results in a kernel crash.
> 
> Nice :( - isn't there a way to improve this (ie probe deferral or
> registering the EPF bus earlier) ?
> 
>> I guess this is the reason why the "pci/dwc" line was also put in
>> drivers/Makefile, right after the "pci/endpoint" line.
> 
> Or probably the other way around - see commit 5e8cb4033807
> 
> @Kishon, thoughts ?

Lorenzo, ordering the Makefile is one way to initialize the EP core before other
drivers. The other way is to give the PCI EP core a different initcall level,
e.g. subsys_initcall?

Thanks
Kishon
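
Why both options work can be seen in a simplified userspace model of built-in initcall ordering: a lower initcall level always runs first, and within one level the link order (hence Makefile order) decides. Levels follow the kernel's order (core 1, postcore 2, arch 3, subsys 4, fs 5, device 6, late 7; module_init() in built-in code lands at device level). The pure and rootfs levels are omitted, and the entry names are only illustrative.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { LVL_CORE = 1, LVL_POSTCORE, LVL_ARCH, LVL_SUBSYS, LVL_FS, LVL_DEVICE, LVL_LATE };

struct initcall {
	const char *name;
	int level;
	size_t link_pos;	/* position in the final link (Makefile order) */
};

/* Level first, then link order. */
static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	if (x->level != y->level)
		return x->level < y->level ? -1 : 1;
	return x->link_pos < y->link_pos ? -1 : (x->link_pos > y->link_pos);
}

/* Sort into execution order; return the position of `name`, or -1. */
static int run_position(struct initcall *calls, size_t n, const char *name)
{
	qsort(calls, n, sizeof(*calls), cmp_initcall);
	for (size_t i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

With both entries at device level, linking the EPC driver first makes it run before the EP core; moving the core to subsys level fixes the order regardless of link position.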

Re: [PATCH v4 7/8] netdev: octeon-ethernet: Add Cavium Octeon III support.

2017-11-29 Thread Souptick Joarder

Hi David, Dan,


On Thu, Nov 30, 2017 at 12:50 AM, David Daney  wrote:
> On 11/29/2017 08:07 AM, Souptick Joarder wrote:
>>
>> On Wed, Nov 29, 2017 at 4:00 PM, Souptick Joarder 
>> wrote:
>>>
>>> On Wed, Nov 29, 2017 at 6:25 AM, David Daney 
>>> wrote:

 From: Carlos Munoz 

 The Cavium OCTEON cn78xx and cn73xx SoCs have network packet I/O
 hardware that is significantly different from previous generations of
 the family.
>>
>>
 diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
 b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
 new file mode 100644
 index ..4dad35fa4270
 --- /dev/null
 +++ b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
 @@ -0,0 +1,2033 @@
 +// SPDX-License-Identifier: GPL-2.0
 +/* Copyright (c) 2017 Cavium, Inc.
 + *
 + * This file is subject to the terms and conditions of the GNU General
 Public
 + * License.  See the file "COPYING" in the main directory of this
 archive
 + * for more details.
 + */
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +
>>
>>
 +static void bgx_port_sgmii_set_link_down(struct bgx_port_priv *priv)
 +{
 +   u64 data;
>>
>>
 +   data = oct_csr_read(BGX_GMP_PCS_MISC_CTL(priv->node, priv->bgx,
 priv->index));
 +   data |= BIT(11);
 +   oct_csr_write(data, BGX_GMP_PCS_MISC_CTL(priv->node, priv->bgx,
 priv->index));
 +   data = oct_csr_read(BGX_GMP_PCS_MISC_CTL(priv->node, priv->bgx,
 priv->index));
>>>
>>>
>>> Any particular reason to read immediately after write ?
>>
>>
>
> Yes, to ensure the write is committed to hardware before the next step.
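
The read-back works because writes to a device may be posted: the CPU continues before the device has seen the value, while a later read from the same device is ordered behind it. The userspace model below (hypothetical names, one pending slot, no real MMIO) only illustrates that property; that OCTEON CSR writes need it is David's statement, not something the model proves.

```c
#include <stdint.h>

/* Hypothetical device with a single posted-write slot. */
struct mock_dev {
	uint64_t hw_reg;	/* value the hardware actually sees */
	uint64_t pending;	/* posted write not yet delivered */
	int has_pending;
};

static void mock_write(struct mock_dev *d, uint64_t val)
{
	d->pending = val;	/* posted: returns before the device sees it */
	d->has_pending = 1;
}

static uint64_t mock_read(struct mock_dev *d)
{
	if (d->has_pending) {	/* a read is ordered behind earlier writes */
		d->hw_reg = d->pending;
		d->has_pending = 0;
	}
	return d->hw_reg;
}

/* Write followed by read-back, the pattern in bgx_port_sgmii_set_link_down(). */
static uint64_t write_and_flush(struct mock_dev *d, uint64_t val)
{
	mock_write(d, val);
	return mock_read(d);
}
```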
>
>>
>>
 +static int bgx_port_sgmii_set_link_speed(struct bgx_port_priv *priv,
 struct port_status status)
 +{
 +   u64 data;
 +   u64 prtx;
 +   u64 miscx;
 +   int timeout;
 +
>>
>>
 +
 +   switch (status.speed) {
 +   case 10:
>>>
>>>
>>> In my opinion, instead of hard coding the value, is it fine to use ENUM ?
>>
>> Similar comments applicable in other places where hard coded values
>> are used.
>>
>
> There is nothing to be gained by interposing an extra layer of abstraction
> in this case.  The code is more clear with the raw numbers in this
> particular case.

   As mentioned by Andrew, the macros defined in uapi/linux/ethtool.h may
be useful here.
   Otherwise it's fine with me :)
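
For illustration, a switch on link speed written with the named constants from uapi/linux/ethtool.h might look like this. The helper is hypothetical and not part of the patch; the values are defined locally only so the sketch builds outside the kernel, where code would include <linux/ethtool.h> instead.

```c
#include <stdbool.h>

/* Same values as SPEED_10/100/1000 in uapi/linux/ethtool.h. */
#ifndef SPEED_10
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#endif

/* Hypothetical helper: is this one of the rates SGMII carries? */
static bool sgmii_speed_supported(int speed)
{
	switch (speed) {
	case SPEED_10:
	case SPEED_100:
	case SPEED_1000:
		return true;
	default:
		return false;
	}
}
```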
>
>
>>
>>
 +static int bgx_port_gser_27882(struct bgx_port_priv *priv)
 +{
 +   u64 data;
 +   u64 addr;
>>>
>>>
 +   int timeout = 200;
 +
 +   //timeout = 200;
>>
>> Better to initialize the timeout value

>
>
> What are you talking about?  It is properly initialized using valid C code.

  I mean, instead of writing

   int timeout;
   timeout = 200;

  write,

   int timeout = 200;

Anyway both are correct and there is nothing wrong in your code.
Please ignore my comment here.

>
>
>>
>>
 +static int bgx_port_qlm_rx_equalization(struct bgx_port_priv *priv, int
 qlm, int lane)
 +{
 +   lmode = oct_csr_read(GSER_LANE_MODE(priv->node, qlm));
 +   lmode &= 0xf;
 +   addr = GSER_LANE_P_MODE_1(priv->node, qlm, lmode);
 +   data = oct_csr_read(addr);
 +   /* Don't complete rx equalization if in VMA manual mode */
 +   if (data & BIT(14))
 +   return 0;
 +
 +   /* Apply rx equalization for speed > 6250 */
 +   if (bgx_port_get_qlm_speed(priv, qlm) < 6250)
 +   return 0;
 +
 +   /* Wait until rx data is valid (CDRLOCK) */
 +   timeout = 500;
>>>
>>>
>>> 500 us is the min required value or it can be further reduced ?
>>
>>
>
>
> 500 uS works well and is shorter than the 2000 uS from the hardware manual.
>
> If you would like to verify shorter timeout values, we could consider
> merging such a patch.  But really, this doesn't matter as it is a very short
> one-off action when the link is brought up.

   Ok.
>
>>
 +static int bgx_port_init_xaui_link(struct bgx_port_priv *priv)
 +{
>>
>>
 +
 +   if (use_ber) {
 +   timeout = 1;
 +   do {
 +   data =
 +
 oct_csr_read(BGX_SPU_BR_STATUS1(priv->node, priv->bgx, priv->index));
 +   if (data & BIT(0))
 +   break;
 +   timeout--;
 +   udelay(1);
 +   } while (timeout);
>>>
>>>
>>> In my opinion, it's better to implement similar kind of loops inside
>>> macros.
>
>
> Ok, duly noted.  I think we are in disagreement with respect to this
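
The suggestion of folding such loops into a macro can be sketched in userspace as follows. The macro is hypothetical and only follows the spirit of readx_poll_timeout_atomic() from <linux/iopoll.h> (which also returns -ETIMEDOUT on timeout); it counts attempts instead of elapsed time and omits the udelay() a driver would do between reads. It uses a GCC/Clang statement expression, as kernel code does.

```c
#include <errno.h>
#include <stdint.h>

/* Read op(addr) into val until cond holds or max_tries reads were made.
 * Evaluates to 0 on success, -ETIMEDOUT otherwise. */
#define POLL_UNTIL(op, addr, val, cond, max_tries)		\
({								\
	int poll_tries_ = (max_tries);				\
	int poll_ret_ = -ETIMEDOUT;				\
	while (poll_tries_-- > 0) {				\
		(val) = op(addr);				\
		if (cond) {					\
			poll_ret_ = 0;				\
			break;					\
		}						\
	}							\
	poll_ret_;						\
})

/* Mock status register: reports ready (bit 0) after a number of reads. */
static int reads_until_ready;

static uint64_t mock_status_read(uint64_t addr)
{
	(void)addr;
	return reads_until_ready-- > 0 ? 0 : 1;
}
```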

Re: [PATCH v2 6/6] ARM: ep93xx: ts72xx: Add support for BK3 board - ts72xx derivative

2017-11-29 Thread Alexander Sverdlin

Hello Lukasz,

On Wed, 29 Nov 2017 23:07:04 +0100
Lukasz Majewski  wrote:

> > > +/*
> > > + * BK3 support code
> > > +
> > > */
> > > +static struct mtd_partition bk3_nand_parts[] = {
> > > + {
> > > + .name   = "System",
> > > + .offset = 0x,  
> > 
> > I see the above and below lines as unaligned
> 
> This is strange I'm using emacs with extension to have coding style
> for kernel.
> 
> Probably tabs get unaligned...

Yes, seems that they are.

[...]

> > 
> > > + .atag_offset= 0x100,
> > > + .map_io = bk3_map_io,  
> > 
> > again, inconsistent alignment...
> 
> Even more. checkpatch.pl did not complained

checkpatch.pl wouldn't complain, as there are basically two styles: some
people do not align the individual assignments in structures at all.
But I was quite confident from the beginning, and have now even applied your
v3 to the code. And indeed it's unaligned... I even checked with emacs. Still
unaligned.

-- 
Alexander Sverdlin.

Re: [f2fs-dev] [PATCH] f2fs: remove a redundant conditional expression

2017-11-29 Thread Chao Yu

On 2017/11/28 20:17, LiFan wrote:
> Avoid checking is_inode repeatedly, and make the logic 
> a little bit clearer.
> 
> Signed-off-by: Fan li 

Reviewed-by: Chao Yu 

Thanks,

Re: [PATCH v18 01/10] idr: add #include <linux/bug.h>

2017-11-29 Thread Michal Hocko

On Wed 29-11-17 16:58:17, Matthew Wilcox wrote:
> On Wed, Nov 29, 2017 at 09:55:17PM +0800, Wei Wang wrote:
> > The <linux/bug.h> was removed from radix-tree.h by the following commit:
> > f5bba9d11a256ad2a1c2f8e7fc6aabe6416b7890.
> > 
> > Since that commit, tools/testing/radix-tree/ couldn't pass compilation
> > due to: tools/testing/radix-tree/idr.c:17: undefined reference to
> > WARN_ON_ONCE. This patch adds the bug.h header to idr.h to solve the
> > issue.
> 
> Thanks; I sent this same patch out yesterday.
> 
> Unfortunately, you didn't cc the author of this breakage, Masahiro Yamada.
> I want to highlight that these kinds of header cleanups are risky,
> and very low reward.  I really don't want to see patches going all over
> the tree randomly touching header files.  If we've got a real problem
> to solve, then sure.  But I want to see a strong justification for any
> more header file cleanups.

I agree. It usually takes an unexpected combination of config options to
uncover some nasty include dependencies. So these patches might break the
build while their additional value is quite questionable.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-11-29 Thread Chao Yu

Hi Hyunchul,

On 2017/11/28 8:23, Hyunchul Lee wrote:
> From: Hyunchul Lee 
> 
> This implements which hint is passed down to block layer
> for datas from the specific segment type.
> 
> segment type           hints
> ------------           -----
> COLD_NODE & COLD_DATA  WRITE_LIFE_EXTREME
> WARM_DATA              WRITE_LIFE_NONE
> HOT_NODE & WARM_NODE   WRITE_LIFE_LONG
> HOT_DATA               WRITE_LIFE_MEDIUM
> META_DATA              WRITE_LIFE_SHORT

I just noticed: if the user does not give a hint via ioctl, it is fine for
f2fs to provide a hint to the lower layer based on its hot/cold separation.
But once the user gives a hint, which may be more accurate than the
filesystem's, the hint converted by f2fs may be wrong.

So what do you think of adding an option to control whether the filesystem
may convert the hint the user gave?

Thanks,
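
A userspace sketch of the mapping table above, combined with the option Chao suggests. The enum values mirror the kernel's enum rw_hint (WRITE_LIFE_NOT_SET = 0 through WRITE_LIFE_EXTREME = 5) but are defined locally; the seg_type enum and the honor_user_hint flag are hypothetical. A user-supplied hint here means one set from userspace, for example with fcntl(F_SET_RW_HINT).

```c
#include <stdbool.h>

/* Local copy of the kernel's rw_hint ordering. */
enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Hypothetical segment-type enum for the sketch. */
enum seg_type { HOT_DATA, WARM_DATA, COLD_DATA, HOT_NODE, WARM_NODE, COLD_NODE, META_DATA };

/* The table from the patch description. */
static enum rw_hint seg_to_hint(enum seg_type t)
{
	switch (t) {
	case COLD_NODE:
	case COLD_DATA:
		return WRITE_LIFE_EXTREME;
	case WARM_DATA:
		return WRITE_LIFE_NONE;
	case HOT_NODE:
	case WARM_NODE:
		return WRITE_LIFE_LONG;
	case HOT_DATA:
		return WRITE_LIFE_MEDIUM;
	case META_DATA:
		return WRITE_LIFE_SHORT;
	}
	return WRITE_LIFE_NOT_SET;
}

/* With the (hypothetical) option set, a user hint is passed through as-is;
 * otherwise f2fs derives the hint from the segment type. */
static enum rw_hint pick_hint(enum seg_type t, enum rw_hint user_hint,
			      bool honor_user_hint)
{
	if (honor_user_hint && user_hint != WRITE_LIFE_NOT_SET)
		return user_hint;
	return seg_to_hint(t);
}
```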

Re: [PATCH] schedule: use unlikely()

2017-11-29 Thread Mikulas Patocka



On Tue, 28 Nov 2017, Greg KH wrote:

> On Mon, Nov 27, 2017 at 07:05:22PM -0500, Mikulas Patocka wrote:
> > 
> > 
> > On Sat, 25 Nov 2017, Greg KH wrote:
> > 
> > > On Mon, Nov 13, 2017 at 02:00:45PM -0500, Mikulas Patocka wrote:
> > > > A small patch for schedule(), so that the code goes straight in the
> > > > common case.
> > > > 
> > > > Signed-off-by: Mikulas Patocka 
> > > 
> > > Was this a measurable difference?  If so, great, please provide the
> > > numbers and how you tested in the changelog.  If it can't be measured,
> > > then it is not worth it to add these markings.
> > 
> > It is much easier to make microoptimizations (such as using likely() and 
> > unlikely()) than to measure their effect.
> > 
> > If a programmer were required to measure performance every time he uses 
> > likely() or unlikely() in his code, he wouldn't use them at all.
> 
> If you can not measure it, you should not use it.  You are forgetting
> about the testing that was done a few years ago that found that some
> huge percentage (80? 75? 90?) of all of these markings were wrong and
> harmful or did absolutely nothing.

The whole kernel has 19878 likely/unlikely tags.

Do you have benchmark proving efficiency for each of them? :-)
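For context, likely() and unlikely() boil down to GCC's `__builtin_expect()`. The sketch below reproduces that plain definition in user space. The kernel's version in include/linux/compiler.h can instead go through the annotated-branch profiler when CONFIG_PROFILE_ANNOTATED_BRANCHES is enabled. The hint only changes code layout and static prediction, never the result, which is why its benefit has to be measured rather than assumed. `sum_or_fail()` is a hypothetical example caller.

```c
#include <stddef.h>

/* Plain form of the kernel macros: tell the compiler which way the
 * branch usually goes, so the common path can be laid out as
 * straight-line fall-through code. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Returns the sum of buf[0..n), or -1 if buf is NULL (assumed rare). */
static long sum_or_fail(const int *buf, size_t n)
{
	if (unlikely(buf == NULL))
		return -1;
	long s = 0;
	for (size_t i = 0; i < n; i++)
		s += buf[i];
	return s;
}
```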

Mikulas

Att! Att!! Att!!! Att!!!!

2017-11-29 Thread Albert H Daniels

Good Day,
I'm Wong Shiu a staff of Wing Hang Bank
here in Hong Kong. Can i TRUST you in transferring-
$13,991,674 USD? If yes do get back to me with my private email: 
wong.shiu...@accountant.com
Best Regards

Re: [PATCHv3] drm: adv7511/33: Fix adv7511_cec_init() failure handling

2017-11-29 Thread Archit Taneja




On 11/23/2017 05:52 AM, John Stultz wrote:

On Tue, Nov 21, 2017 at 12:17 AM, Hans Verkuil  wrote:

If the device tree for a board did not specify a cec clock, then
adv7511_cec_init() would return an error, causing adv7511_probe()
to fail and leaving the board with no HDMI output.

There is no need to have adv7511_probe() fail if the CEC initialization
fails, so just change adv7511_cec_init() to a void function. In addition,
adv7511_cec_init() should just return silently if the cec clock isn't
found and show a message for any other errors.
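The intended control flow can be sketched in user space as follows. `struct fake_adv`, `fake_cec_init()` and `fake_probe()` are hypothetical stand-ins, not the driver's code. The sketch assumes a missing clock is reported as `-ENOENT`, and it does not model probe deferral.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical device state; cec_clk_err models the "cec" clock lookup
 * result: 0 on success or a negative errno. */
struct fake_adv {
	int cec_clk_err;
	int cec_enabled;
};

/* void, as in the patch: CEC is optional, so errors are at most
 * reported, never propagated to probe. */
static void fake_cec_init(struct fake_adv *adv)
{
	adv->cec_enabled = 0;
	if (adv->cec_clk_err == -ENOENT)
		return;		/* no cec clock in the DT: silently skip CEC */
	if (adv->cec_clk_err) {
		fprintf(stderr, "CEC init failed (%d), continuing without CEC\n",
			adv->cec_clk_err);
		return;
	}
	adv->cec_enabled = 1;
}

static int fake_probe(struct fake_adv *adv)
{
	/* ... bring up the HDMI bridge itself here ... */
	fake_cec_init(adv);	/* a CEC failure no longer fails probe */
	return 0;
}
```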

An otherwise correct cleanup patch from Dan Carpenter turned this broken
failure handling into a kernel Oops, so bisection points to commit
7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL") rather
than 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support").

Based on earlier patches from Arnd and John.

Reported-by: Naresh Kamboju 
Cc: Xinliang Liu 
Cc: Dan Carpenter 
Cc: Sean Paul 
Cc: Archit Taneja 
Cc: John Stultz 
Link: https://bugs.linaro.org/show_bug.cgi?id=3345
Link: https://lkft.validation.linaro.org/scheduler/job/48017#L3551
Fixes: 7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL")
Fixes: 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support")
Signed-off-by: Hans Verkuil 
Tested-by: Hans Verkuil 
---
This rework of Arnd and John's patches goes a bit further and just silently
exits if there is no cec clock defined in the dts. I'm sure that's the
reason why the kirin board failed on this. BTW: if the kirin board DOES
support cec, then it would be nice if it can be hooked up in the dts!

Tested with my Dragonboard and Renesas Koelsch board. Also tested what
happens when probing is deferred due to missing cec clock.

John, can you test this again?


Sorry I didn't get back to you yesterday on this!

Seems to be working ok for me!

Tested-by: John Stultz 


Queued to drm-misc-fixes. Thanks for fixing this.

Archit

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [RFC V7 2/2] OPP: Allow "opp-hz" and "opp-microvolt" to contain magic values

2017-11-29 Thread Viresh Kumar

On 29-11-17, 16:50, Stephen Boyd wrote:
> Sorry it still makes zero sense to me. It seems that we're trying
> to make the OPP table parsing generic just for the sake of code
> brevity.

Not just the code but bindings as well to make sure we don't add a new
property (similar to earlier ones) for every platform that wants to
use performance states.

> Is this the goal? From a DT writer perspective it seems
> confusing to say that opp-microvolt is sometimes a microvolt and
> sometimes not a microvolt.

Well, it would still represent the voltage, just not in microvolt units,
as the platform guys decided to hide those values from the kernel and
handle them directly in firmware.

> Why can't the SoC specific genpd
> driver parse something like "qcom,corner" instead out of the
> node?

Sure we can, but that means that a new property will be required for
the next platform.

I did it this way because Kevin (and Rob) suggested NOT adding another
property but reusing the existing ones, as we aren't passing anything
new here; only the units of the property are different. For another
SoC, we may want to hide both the frequency and voltage values from the
kernel and pass firmware-dependent values. Should we add two new
properties for that SoC then?

> BTW, I don't believe I have a use-case where I want to express
> power domain OPP tables.

I do remember that you once said [1] that you may want to pass the
real voltage values as well via DT. And so I thought that you can pass
performance-state (corner) in opp-hz and real voltage values in
opp-microvolt.

> I have many devices that all have
> different frequencies that are all tied into the same power
> domain. This binding makes it look like we can only have one
> frequency per domain which won't work.

No, that isn't the case. Looks like we have some confusion here. Let
me try with a simple example:

foo: foo-power-domain@0900 {
	compatible = "foo,genpd";
	#power-domain-cells = <0>;
	operating-points-v2 = <&domain_opp_table>;
};

cpu0: cpu@0 {
	compatible = "arm,cortex-a53", "arm,armv8";
	...
	operating-points-v2 = <&cpu_opp_table>;
	power-domains = <&foo>;
};

domain_opp_table: domain_opp_table {
	compatible = "operating-points-v2";

	domain_opp_1: opp00 {
		opp-hz = /bits/ 64 <1>; /* These are corners AKA perf states */
	};
	domain_opp_2: opp01 {
		opp-hz = /bits/ 64 <2>;
	};
	domain_opp_3: opp02 {
		opp-hz = /bits/ 64 <3>;
	};
};

cpu_opp_table: cpu_opp_table {
	compatible = "operating-points-v2";
	opp-shared;

	opp00 {
		opp-hz = /bits/ 64 <20800>;
		clock-latency-ns = <50>;
		power-domain-opp = <&domain_opp_1>;
	};
	opp01 {
		opp-hz = /bits/ 64 <43200>;
		clock-latency-ns = <50>;
		power-domain-opp = <&domain_opp_2>;
	};
	opp02 {
		opp-hz = /bits/ 64 <72900>;
		clock-latency-ns = <50>;
		power-domain-opp = <&domain_opp_2>;
	};
	opp03 {
		opp-hz = /bits/ 64 <96000>;
		clock-latency-ns = <50>;
		power-domain-opp = <&domain_opp_3>;
	};
};

The device frequencies are still managed by the device's own OPP table;
each of the device's OPPs simply carries a requirement on an OPP of
another device, which in this case is the power domain.

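The sketch below shows how a frequency-indexed device table like cpu_opp_table resolves to a power-domain performance state, and how several devices sharing one domain could be combined. The frequency values are copied as they appear in the example. `required_state()` and `aggregate_state()` are hypothetical helpers, and taking the maximum requested state is an assumption of this sketch, not a statement of what the OPP core or genpd implements.

```c
#include <stddef.h>

/* One row of a device OPP table: frequency plus the domain
 * performance state ("corner") that this frequency requires. */
struct dev_opp {
	unsigned long long hz;
	unsigned int domain_state;
};

/* Rows mirroring cpu_opp_table in the example above. */
static const struct dev_opp cpu_opps[] = {
	{ 20800, 1 }, { 43200, 2 }, { 72900, 2 }, { 96000, 3 },
};

/* Lowest OPP with hz >= target decides the required domain state;
 * returns 0 if target is above every OPP in the table. */
static unsigned int required_state(const struct dev_opp *t, size_t n,
				   unsigned long long target)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].hz >= target)
			return t[i].domain_state;
	return 0;
}

/* Assumed policy: a shared domain satisfies its most demanding consumer. */
static unsigned int aggregate_state(const unsigned int *reqs, size_t n)
{
	unsigned int max = 0;

	for (size_t i = 0; i < n; i++)
		if (reqs[i] > max)
			max = reqs[i];
	return max;
}
```

Each device keeps its own frequencies. Only the domain requirement is shared, which is the point being made above about many devices in one domain.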
> I want to express that a device with a range of frequencies (or
> really multiple ranges of frequencies) is inside certain physical
> power domains and the frequency of the clks dictates the minimum
> voltage requirement of those power domains.
> 
> For the most complicated case, imagine something like our eMMC
> controller that has two clks (clk1,clk2) that it changes the rate
> of independently and those two clks rely on two different
> regulators (vreg1, vreg2) that supply voltage domains in the SoC
> which the eMMC controller happens to be part of (pd1, pd2). And
> the device is also part of another power domain that we use to
> turn everything off (pd3).
> 
> [ASCII diagram, garbled and cut off in the archive: vreg1 feeds clk1, vreg2 feeds clk2]

Re: [PATCH 0/2] mm: introduce MAP_FIXED_SAFE

2017-11-29 Thread Michal Hocko

On Wed 29-11-17 14:25:36, Kees Cook wrote:
> On Wed, Nov 29, 2017 at 6:42 AM, Michal Hocko  wrote:
> > The first patch introduced MAP_FIXED_SAFE which enforces the given
> > address but unlike MAP_FIXED it fails with ENOMEM if the given range
> > conflicts with an existing one. The flag is introduced as a completely
> 
> I still think this name should be better. "SAFE" doesn't say what it's
> safe from...

It is safe in the sense that it doesn't perform any dangerous address
space operations. mmap is _inherently_ about the address space, so the
context should be fairly clear.

> MAP_FIXED_UNIQUE
> MAP_FIXED_ONCE
> MAP_FIXED_FRESH

Well, I can open a poll for the best name, but none of the ones you are
proposing sounds much better to me. Yeah, naming sucks...
-- 
Michal Hocko
SUSE Labs

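The proposed semantics can be approximated in user space with plain `mmap()`. The requested address is passed only as a hint, without MAP_FIXED, so nothing already mapped there can be clobbered. If the kernel places the mapping elsewhere, the sketch undoes it and fails with ENOMEM, as the proposal describes. `mmap_fixed_noclobber()` is a hypothetical helper, not part of any library.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes exactly at addr, or fail with ENOMEM without touching
 * any mapping that already occupies that range.
 */
static void *mmap_fixed_noclobber(void *addr, size_t len)
{
	/* No MAP_FIXED: addr is only a hint, so existing mappings survive. */
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return MAP_FAILED;
	if (p != addr) {
		/* Kernel chose another spot because addr was taken: undo it. */
		munmap(p, len);
		errno = ENOMEM;
		return MAP_FAILED;
	}
	return p;
}
```

The cost is an extra mmap()/munmap() round trip whenever the range is busy, which a dedicated flag would avoid.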
Re: [PATCH resend] mm/page_alloc: fix comment is __get_free_pages

2017-11-29 Thread Michal Hocko

On Wed 29-11-17 13:41:59, Andrew Morton wrote:
> On Wed, 29 Nov 2017 17:04:46 +0100 Michal Hocko  wrote:
> 
> > On Mon 27-11-17 12:33:41, Michal Hocko wrote:
> > > On Mon 27-11-17 19:09:24, JianKang Chen wrote:
> > > > From: Jiankang Chen 
> > > > 
> > > > __get_free_pages will return an virtual address, 
> > > > but it is not just 32-bit address, for example a 64-bit system. 
> > > > And this comment really confuse new bigenner of mm.
> > > 
> > > s@bigenner@beginner@
> > > 
> > > Anyway, do we really need a bug on for this? Has this actually caught
> > > any wrong usage? VM_BUG_ON tends to be enabled these days AFAIK and
> > > panicking the kernel seems like an over-reaction. If there is a real
> > > risk then why don't we simply mask __GFP_HIGHMEM off when calling
> > > alloc_pages?
> > 
> > I meant this
> > ---
> > >From 000bb422fe07adbfa8cd8ed953b18f48647a45d6 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko 
> > Date: Wed, 29 Nov 2017 17:02:33 +0100
> > Subject: [PATCH] mm: drop VM_BUG_ON from __get_free_pages
> > 
> > There is no real reason to blow up just because the caller doesn't know
> > that __get_free_pages cannot return highmem pages. Simply fix that up
> > silently. Even if we have some confused users such a fixup will not be
> > harmful.
> 
> mm...  So we have a caller which hopes to be getting highmem pages but
> isn't.  Caller then proceeds to pointlessly kmap the page and wonders
> why it isn't getting as much memory as it would like on 32-bit systems,
> etc.

How can he kmap the page when he gets a _virtual_ address?

> I do think we should help ferret out such bogosity.  A WARN_ON_ONCE
> would suffice.

This function has always been about lowmem pages. I seriously doubt we
have anybody confused and asking for a highmem page in the kernel. I
haven't checked that but it would already blow up as VM_BUG_ON tends to
be enabled on many setups.
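The two options discussed here can be modeled in user space. Michal's patch silently clears the highmem bit, since a page returned as a plain virtual address must be lowmem anyway. Andrew suggests also warning once, so that confused callers get noticed without the panic VM_BUG_ON would cause. The flag values and `sanitize_gfp()` are hypothetical; the real code operates on gfp_t and __GFP_HIGHMEM.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits standing in for gfp_t (values illustrative). */
#define SKETCH_GFP_KERNEL  0x01u
#define SKETCH_GFP_HIGHMEM 0x02u

static bool highmem_warned;

/* Clear the highmem bit; optionally warn the first time it was set. */
static unsigned int sanitize_gfp(unsigned int gfp, bool warn_once)
{
	if (gfp & SKETCH_GFP_HIGHMEM) {
		if (warn_once && !highmem_warned) {
			highmem_warned = true;
			fprintf(stderr, "highmem flag passed to a lowmem-only allocator\n");
		}
		gfp &= ~SKETCH_GFP_HIGHMEM;
	}
	return gfp;
}
```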

-- 
Michal Hocko
SUSE Labs

Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules

2017-11-29 Thread Daniel Micay

> And once you disable it by default, and it becomes purely opt-in, that
> means that nothing will change for most cases. Some embedded people
> that do their own thing (ie Android) might change, but normal
> distributions probably won't.
> 
> Yes, Android may be 99% of the users, and yes, the embedded world in
> general needs to be secure, but I'd still like this to be something
> that helps _everybody_.

Android devices won't get much benefit since they ship a tiny set of
modules chosen for the device. The kernels already get very stripped
down to the bare minimum vs. enabling every feature and driver available
and shipping it all by default on a traditional distribution.

Lots of potential module attack surface also gets eliminated by default
via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl
commands, etc. The global seccomp whitelist might be relevant in some
cases too.

Android devices like to build everything into the kernel too, so even if
they weren't using a module this feature wouldn't usually help them. It
would need to work like this existing sysctl:

net.ipv4.tcp_available_congestion_control = cubic reno lp

i.e. whitelists for functionality offered by the modules, not just
whether they can be loaded.
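A whitelist for functionality, as opposed to a load/no-load switch, comes down to checking a requested name against a list of allowed tokens. The sketch uses a space-separated value shaped like "cubic reno lp". `in_whitelist()` is a hypothetical helper for illustration, not a kernel API.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if name appears as a whole space-separated token in
 * list, e.g. "reno" in "cubic reno lp" (but not "ren").
 */
static bool in_whitelist(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	if (n == 0)
		return false;
	while (*p) {
		while (*p == ' ')
			p++;
		const char *start = p;
		while (*p && *p != ' ')
			p++;
		if ((size_t)(p - start) == n && strncmp(start, name, n) == 0)
			return true;
	}
	return false;
}
```

Matching whole tokens avoids prefix accidents such as "ren" matching "reno".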

Re: [PATCH 10/10] input: joystick: riscv has get_cycles

2017-11-29 Thread Dmitry Torokhov

On Wed, Nov 29, 2017 at 05:55:21PM -0800, Olof Johansson wrote:
> Fixes:
> 
> drivers/input/joystick/analog.c:176:2: warning: #warning Precise timer not defined for this architecture. [-Wcpp]
> 
> Signed-off-by: Olof Johansson 
> Cc: Dmitry Torokhov 

Applied, thank you.

> ---
>  drivers/input/joystick/analog.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
> index 3d8ff09..c868a87 100644
> --- a/drivers/input/joystick/analog.c
> +++ b/drivers/input/joystick/analog.c
> @@ -163,7 +163,7 @@ static unsigned int get_time_pit(void)
>  #define GET_TIME(x)  do { x = (unsigned int)rdtsc(); } while (0)
>  #define DELTA(x,y)   ((y)-(x))
>  #define TIME_NAME	"TSC"
> -#elif defined(__alpha__) || defined(CONFIG_MN10300) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_TILE)
> +#elif defined(__alpha__) || defined(CONFIG_MN10300) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_RISCV) || defined(CONFIG_TILE)
>  #define GET_TIME(x)  do { x = get_cycles(); } while (0)
>  #define DELTA(x,y)   ((y)-(x))
>  #define TIME_NAME	"get_cycles"
> -- 
> 2.8.6
> 

-- 
Dmitry
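The patch only adds CONFIG_RISCV to the list of architectures that use get_cycles(). The underlying pattern is a compile-time choice of timer behind the GET_TIME/DELTA/TIME_NAME macros. The user-space sketch below assumes GCC/Clang: it uses `__rdtsc()` from `<x86intrin.h>` on x86 and falls back to `clock_gettime(CLOCK_MONOTONIC)` elsewhere. Neither branch is analog.c's actual code.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Cycle counter, in the spirit of the TSC branch in analog.c. */
#define GET_TIME(x)	do { x = (unsigned int)__rdtsc(); } while (0)
#define TIME_NAME	"TSC"
#else
/* Portable fallback: monotonic nanoseconds, truncated to 32 bits. */
#define GET_TIME(x)							\
	do {								\
		struct timespec ts_;					\
		clock_gettime(CLOCK_MONOTONIC, &ts_);			\
		x = (unsigned int)((unsigned long long)ts_.tv_sec *	\
				   1000000000ull + ts_.tv_nsec);	\
	} while (0)
#define TIME_NAME	"clock_gettime"
#endif

/* Unsigned subtraction stays correct across one counter wraparound. */
#define DELTA(x, y)	((y) - (x))
```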

Re: [PATCH] drm/bridge: Fix lvds-encoder since the panel_bridge rework.

2017-11-29 Thread Archit Taneja




On 11/15/2017 03:29 PM, Lothar Waßmann wrote:

Hi,

On Tue, 14 Nov 2017 11:16:47 -0800 Eric Anholt wrote:

The panel_bridge bridge attaches to the panel's OF node, not the
lvds-encoder's node.  Put in a little no-op bridge of our own so that
our consumers can still find a bridge where they expect.

This also fixes an unintended unregistration and leak of the
panel-bridge on module remove.

Signed-off-by: Eric Anholt 
Fixes: 13dfc0540a57 ("drm/bridge: Refactor out the panel wrapper from the lvds-encoder bridge.")
---

Note: I haven't actually tested this patch!  Hope it helps, though.

  drivers/gpu/drm/bridge/lvds-encoder.c | 48 ++-
  1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/bridge/lvds-encoder.c b/drivers/gpu/drm/bridge/lvds-encoder.c
index 0903ba574f61..75b0d3f6e4de 100644
--- a/drivers/gpu/drm/bridge/lvds-encoder.c
+++ b/drivers/gpu/drm/bridge/lvds-encoder.c
@@ -13,13 +13,37 @@
 
 #include 
 
+struct lvds_encoder {
+	struct drm_bridge bridge;
+	struct drm_bridge *panel_bridge;
+};
+
+static int lvds_encoder_attach(struct drm_bridge *bridge)
+{
+	struct lvds_encoder *lvds_encoder = container_of(bridge,
+							 struct lvds_encoder,
+							 bridge);
+
+	return drm_bridge_attach(bridge->encoder, lvds_encoder->panel_bridge,
+				 bridge);
+}
+
+static struct drm_bridge_funcs funcs = {
+	.attach = lvds_encoder_attach,
+};
+
 static int lvds_encoder_probe(struct platform_device *pdev)
 {
 	struct device_node *port;
 	struct device_node *endpoint;
 	struct device_node *panel_node;
 	struct drm_panel *panel;
-	struct drm_bridge *bridge;
+	struct lvds_encoder *lvds_encoder;
+
+	lvds_encoder = devm_kzalloc(&pdev->dev, sizeof(*lvds_encoder),
+				    GFP_KERNEL);
+	if (!lvds_encoder)
+		return -ENOMEM;
 
 	/* Locate the panel DT node. */
 	port = of_graph_get_port_by_id(pdev->dev.of_node, 1);
@@ -49,20 +73,30 @@ static int lvds_encoder_probe(struct platform_device *pdev)
 		return -EPROBE_DEFER;
 	}
 
-	bridge = drm_panel_bridge_add(panel, DRM_MODE_CONNECTOR_LVDS);
-	if (IS_ERR(bridge))
-		return PTR_ERR(bridge);
+	lvds_encoder->panel_bridge =
+		devm_drm_panel_bridge_add(&pdev->dev,
+					  panel, DRM_MODE_CONNECTOR_LVDS);
+	if (IS_ERR(lvds_encoder->panel_bridge))
+		return PTR_ERR(lvds_encoder->panel_bridge);
+
+	/* The panel_bridge bridge is attached to the panel's of_node,
+	 * but we need a bridge attached to our of_node for our user
+	 * to look up.
+	 */
+	lvds_encoder->bridge.of_node = pdev->dev.of_node;
+	lvds_encoder->bridge.funcs = &funcs;
+	drm_bridge_add(&lvds_encoder->bridge);
 
-	platform_set_drvdata(pdev, bridge);
+	platform_set_drvdata(pdev, lvds_encoder);
 
 	return 0;
 }
 
 static int lvds_encoder_remove(struct platform_device *pdev)
 {
-	struct drm_bridge *bridge = platform_get_drvdata(pdev);
+	struct lvds_encoder *lvds_encoder = platform_get_drvdata(pdev);
 
-	drm_bridge_remove(bridge);
+	drm_bridge_remove(&lvds_encoder->bridge);
 
 	return 0;
 }

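The new lvds_encoder_attach() relies on the container_of() idiom. The callback receives only the embedded `struct drm_bridge`, then steps back to the private wrapper to reach `panel_bridge`. The user-space sketch below uses a simplified container_of without the kernel version's type checking. `struct bridge`, `struct encoder` and `downstream_of()` are hypothetical stand-ins.

```c
#include <stddef.h>

/* Simplified container_of: from a member pointer back to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct bridge { int id; };		/* stand-in for struct drm_bridge */

struct encoder {			/* stand-in for struct lvds_encoder */
	int other_state;
	struct bridge bridge;		/* embedded, registered with the core */
	struct bridge *panel_bridge;	/* downstream bridge to chain to */
};

/* Like lvds_encoder_attach(): recover the wrapper, return its panel bridge. */
static struct bridge *downstream_of(struct bridge *b)
{
	struct encoder *enc = container_of(b, struct encoder, bridge);

	return enc->panel_bridge;
}
```

Embedding the bridge, rather than storing a pointer to it, is what makes this pointer arithmetic valid.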

Tested-by: Lothar Waßmann 


queued to drm-misc-fixes.

I think we should send this patch for the stable kernels (v4.12+) too.

Thanks,
Archit




Lothar Waßmann



--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [ 0.003333] BUG: KASAN: use-after-scope in console_unlock+0x605/0xcc0

2017-11-29 Thread Sergey Senozhatsky

Hi,

On (11/30/17 10:26), Fengguang Wu wrote:
> FYI this happens in mainline kernel v4.15-rc1 .
> It shows up after v4.14 . Bisect is on the way.

hm, printk saw no changes between 4.14 and 4.15


> It occurs in 4 out of 4 boots.
> 
> [0.00] RCU callback double-/use-after-free debug enabled.
> [0.00] RCU CPU stall warnings timeout set to 100 (rcu_cpu_stall_timeout).
> [0.00] Tasks RCU enabled.
> [0.00] NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16
> [0.00] ==
> [0.00] BUG: KASAN: use-after-scope in console_unlock+0x605/0xcc0:
>   atomic_read at arch/x86/include/asm/atomic.h:27
>   (inlined by) static_key_count at include/linux/jump_label.h:191
>   (inlined by) static_key_false at include/linux/jump_label.h:201
>   (inlined by) trace_console_rcuidle at include/trace/events/printk.h:10
>   (inlined by) call_console_drivers at kernel/printk/printk.c:1556
>   (inlined by) console_unlock at kernel/printk/printk.c:2233
> [0.00] Write of size 4 at addr 83607aa0 by task swapper/0
> [0.00]
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc1 #1
> [0.00] Call Trace:
> [0.00]  ? print_address_description+0x4f/0x3c0:
>   print_address_description at mm/kasan/report.c:253
> [0.00]  ? console_unlock+0x605/0xcc0:
>   atomic_read at arch/x86/include/asm/atomic.h:27
>   (inlined by) static_key_count at include/linux/jump_label.h:191
>   (inlined by) static_key_false at include/linux/jump_label.h:201
>   (inlined by) trace_console_rcuidle at include/trace/events/printk.h:10
>   (inlined by) call_console_drivers at kernel/printk/printk.c:1556
>   (inlined by) console_unlock at kernel/printk/printk.c:2233

so KASAN didn't like atomic_read(&key->enabled) from static_key_count()?
"Write of size 4"...


> [0.00]  ? kasan_report+0x304/0x390:
>   kasan_report_error at mm/kasan/report.c:352
>   (inlined by) kasan_report at mm/kasan/report.c:409
> [0.00]  ? console_unlock+0x605/0xcc0:
>   atomic_read at arch/x86/include/asm/atomic.h:27
>   (inlined by) static_key_count at include/linux/jump_label.h:191
>   (inlined by) static_key_false at include/linux/jump_label.h:201
>   (inlined by) trace_console_rcuidle at include/trace/events/printk.h:10
>   (inlined by) call_console_drivers at kernel/printk/printk.c:1556
>   (inlined by) console_unlock at kernel/printk/printk.c:2233
> [0.00]  ? wake_up_klogd+0x180/0x180:
>   console_unlock at kernel/printk/printk.c:2138
> [0.00]  ? _raw_spin_unlock_irqrestore+0xcf/0xf0:
>   __raw_spin_unlock_irqrestore at include/linux/spinlock_api_smp.h:161
>   (inlined by) _raw_spin_unlock_irqrestore at kernel/locking/spinlock.c:191
> [0.00]  ? __down_trylock_console_sem+0xf8/0x120:
>   __down_trylock_console_sem at kernel/printk/printk.c:234
> [0.00]  ? __down_trylock_console_sem+0x106/0x120:
>   __down_trylock_console_sem at kernel/printk/printk.c:235
> [0.00]  ? vprintk_emit+0x63e/0x6f0:
>   vprintk_emit at kernel/printk/printk.c:1757
> [0.00]  ? vprintk_func+0x11e/0x130:
>   vprintk_func at kernel/printk/printk_safe.c:379
> [0.00]  ? printk+0xaf/0xcf:
>   printk at kernel/printk/printk.c:1824
> [0.00]  ? show_regs_print_info+0x40/0x40:
>   printk at kernel/printk/printk.c:1824
> [0.00]  ? __flush_tlb_all+0x1e/0x31:
>   __flush_tlb_global at
>

Re: jsm_tty: Fix a possible null pointer dereference in two functions

2017-11-29 Thread Jiri Slaby

On 11/29/2017, 07:19 PM, SF Markus Elfring wrote:
>>> It's pretty unlikely, but it is an actual defect.
>>
>> No it is not, those variables will never be set to NULL,
>> so this can never be triggered.  Walk up the call chain.
> 
> If the involved software developers are convinced about the validity
> of this pointer:
> 
> What do you think about deleting the following condition check
> instead in the discussed function implementations?
> 
>   if (!ch)
>   return;

ACK

thanks,
-- 
js
suse labs

Re: [PATCH 1/2] f2fs: pass down write hints to block layer for bufferd write

2017-11-29 Thread Chao Yu

On 2017/11/28 8:23, Hyunchul Lee wrote:
> From: Hyunchul Lee 
> 
> This defines which hint is passed down to the block layer
> for data of each specific segment type.
> 
> segment type             hints
> -----------------------  ------------------
> COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
> WARM_DATA                WRITE_LIFE_NONE
> HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
> HOT_DATA                 WRITE_LIFE_MEDIUM
> META_DATA                WRITE_LIFE_SHORT

The correspondence looks good to me.

> 
> Many thanks to Chao Yu and Jaegeuk Kim for comments to
> implement this patch.
> 
> Signed-off-by: Hyunchul Lee 
> ---
>  fs/f2fs/data.c|  1 +
>  fs/f2fs/f2fs.h|  2 ++
>  fs/f2fs/segment.c | 29 +
>  fs/f2fs/super.c   |  2 ++
>  4 files changed, 34 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a7ae418..0919a43 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -437,6 +437,7 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
>   }
>   io->bio = __bio_alloc(sbi, fio->new_blkaddr,
>   BIO_MAX_PAGES, false);
> + io->bio->bi_write_hint = io->write_hint;

Need to assign bi_write_hint for IPU path?

- rewrite_data_page
 - f2fs_submit_page_bio

>   io->fio = *fio;
>   }
>  
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 7bcd148..be3cb0c 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -969,6 +969,7 @@ struct f2fs_bio_info {
>   struct rw_semaphore io_rwsem;   /* blocking op for bio */
>   spinlock_t io_lock; /* serialize DATA/NODE IOs */
>   struct list_head io_list;   /* track fios */
> + enum rw_hint write_hint;

Add missing comment?

>  };
>  
>  #define FDEV(i)  (sbi->devs[i])
> @@ -2674,6 +2675,7 @@ int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
>  int __init create_segment_manager_caches(void);
>  void destroy_segment_manager_caches(void);
>  int rw_hint_to_seg_type(enum rw_hint hint);
> +enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp);
>  
>  /*
>   * checkpoint.c
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index c117e09..0570db7 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2446,6 +2446,35 @@ int rw_hint_to_seg_type(enum rw_hint hint)
>   }
>  }
>  

It would be better to copy the commit log here to document the
correspondence more clearly.

Thanks,

> +enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
> +{
> +	if (page == DATA) {
> +		switch (temp) {
> +		case WARM:
> +			return WRITE_LIFE_NONE;
> +		case COLD:
> +			return WRITE_LIFE_EXTREME;
> +		case HOT:
> +			return WRITE_LIFE_MEDIUM;
> +		default:
> +			return WRITE_LIFE_NOT_SET;
> +		}
> +	} else if (page == NODE) {
> +		switch (temp) {
> +		case WARM:
> +		case HOT:
> +			return WRITE_LIFE_LONG;
> +		case COLD:
> +			return WRITE_LIFE_EXTREME;
> +		default:
> +			return WRITE_LIFE_NOT_SET;
> +		}
> +
> +	} else {
> +		return WRITE_LIFE_SHORT;
> +	}
> +}
> +
>  static int __get_segment_type_2(struct f2fs_io_info *fio)
>  {
>   if (fio->type == DATA)
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index a6c5dd4..51c19a0 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -2508,6 +2508,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>   sbi->write_io[i][j].bio = NULL;
>   spin_lock_init(&sbi->write_io[i][j].io_lock);
>   INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
> + sbi->write_io[i][j].write_hint =
> + io_type_to_rw_hint(i, j);
>   }
>   }
>  
>
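
Chao's request to copy the commit log next to the mapping function could look roughly like the following standalone sketch. The enums are local stand-ins with assumed values (the real definitions live in the kernel and f2fs headers), and the body mirrors the quoted io_type_to_rw_hint() with the correspondence table kept as a comment:

```c
#include <assert.h>

/* Local stand-ins for the kernel enums; values are assumptions. */
enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};
enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };

/*
 * segment type             hint
 * COLD_NODE & COLD_DATA    WRITE_LIFE_EXTREME
 * WARM_DATA                WRITE_LIFE_NONE
 * HOT_NODE & WARM_NODE     WRITE_LIFE_LONG
 * HOT_DATA                 WRITE_LIFE_MEDIUM
 * META_DATA                WRITE_LIFE_SHORT
 */
static enum rw_hint io_type_to_rw_hint(enum page_type page, enum temp_type temp)
{
	if (page == DATA) {
		switch (temp) {
		case WARM:
			return WRITE_LIFE_NONE;
		case COLD:
			return WRITE_LIFE_EXTREME;
		case HOT:
			return WRITE_LIFE_MEDIUM;
		default:
			return WRITE_LIFE_NOT_SET;
		}
	}
	if (page == NODE) {
		switch (temp) {
		case WARM:
		case HOT:
			return WRITE_LIFE_LONG;
		case COLD:
			return WRITE_LIFE_EXTREME;
		default:
			return WRITE_LIFE_NOT_SET;
		}
	}
	/* Everything else (META) is treated as short-lived. */
	return WRITE_LIFE_SHORT;
}
```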

Re: [PATCH] Revert "x86/apic: Remove init_bsp_APIC()"

2017-11-29 Thread Dou Liyang


Hi, Ville

At 11/30/2017 05:25 AM, Ville Syrjälä wrote:
> On Wed, Nov 29, 2017 at 09:15:19AM +0800, Dou Liyang wrote:
>> Hi Ville,
>>
>> At 11/28/2017 10:53 PM, Ville Syrjala wrote:
>>> From: Ville Syrjälä
>>>
>>> This reverts commit b371ae0d4a194b178817b0edfb6a7395c7aec37a.
>>>
>>> Causes my P3 UP machine to hang at boot with "lapic".
>>>
>>> Cc: Dou Liyang
>>> Cc: Thomas Gleixner
>>> Cc: ying...@kernel.org
>>> Cc: b...@redhat.com
>>> Cc: Ingo Molnar
>>> Cc: "H. Peter Anvin"
>>> Signed-off-by: Ville Syrjälä
>>> ---
>>
>> Could you give me the dmesg log of your P3 UP machine?
>
> This is with the revert:


Oops, sorry about my ambiguous description; I knew the revert patch
would work.

Actually, it's an interesting problem. Your machine (a UP system without
an ACPI table or an MP table) puts the interrupt mode into a rare
situation:

  Virtual Wire Mode with no configuration

My test cases also simulate this situation and work well, so I would like
to know why your machine hangs.

So what I really want is the log without the revert. Or could you
describe the symptoms and the cause of the hang directly? :-)

Thanks,
dou

Re: [PATCH v6 2/4] KVM: X86: Add Paravirt TLB Shootdown

2017-11-29 Thread Wanpeng Li

2017-11-30 0:21 GMT+08:00 Radim Krčmář :
> 2017-11-27 20:05-0800, Wanpeng Li:
>> From: Wanpeng Li 
>>
>> Remote flushing APIs do a busy wait, which is fine in a bare-metal
>> scenario. But within the guest, the vCPUs might have been preempted
>> or blocked. In this scenario, the initiator vCPU would end up
>> busy-waiting for a long time.
>>
>> This patch set implements paravirtual TLB flushing that does not
>> wait for vCPUs that are sleeping. Instead, all the sleeping vCPUs
>> flush the TLB on guest entry.
>>
>> The best result is achieved when we're overcommitting the host by running
>> multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
>> vCPUs which are not scheduled and avoids the wait on the main CPU.
>>
>> Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
>> so 64 pCPUs, and each VM is 64 vCPUs.
>>
>> ebizzy -M
>>        vanilla    optimized    boost
>> 1VM    46799      48670          4%
>> 2VM    23962      42691         78%
>> 3VM    16152      37539        132%
>>
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Peter Zijlstra 
>> Signed-off-by: Wanpeng Li 
>> ---
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> @@ -498,6 +498,37 @@ static void __init kvm_apf_trap_init(void)
>>   update_intr_gate(X86_TRAP_PF, async_page_fault);
>>  }
>>
>> +static DEFINE_PER_CPU(cpumask_t, __pv_tlb_mask);
>> +
>> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>> + const struct flush_tlb_info *info)
>> +{
>> + u8 state;
>> + int cpu;
>> + struct kvm_steal_time *src;
>> + cpumask_t *flushmask = &per_cpu(__pv_tlb_mask, smp_processor_id());
>> +
>> + if (unlikely(!flushmask))
>> + return;
>
> I don't see how this can be NULL and if it could, we'd have to call
> native_flush_tlb_others() instead of returning anyway.
>
> Also, Peter mentioned that we're wasting memory (default is 1k per CPU)
> when not running on KVM.  Hyper-V hijacks x86_platform.apic_post_init()
> to achieve late allocation.  smp_ops.smp_prepare_cpus seems slightly
> better for our purposes, but I don't really like either.
>
> Couldn't we use arch_initcall(), or early_initcall() if there are
> complications with allocating after smp_init()?

Will do in v7. In addition, moving "pv_mmu_ops.flush_tlb_others =
kvm_flush_tlb_others" into the arch_initcall() does not work, even if I
disable rodata through grub. So I keep the callback replacement in
kvm_guest_init() and do only the late allocation in arch_initcall().

Regards,
Wanpeng Li
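
The paravirtual flush in the quoted patch skips vCPUs that are not running: a preempted vCPU is only flagged so that it flushes on its next guest entry, and the initiator waits only for the rest. A minimal userspace simulation of that filtering step, assuming C11 atomics in place of the kernel's cmpxchg() and hypothetical flag names VCPU_PREEMPTED and VCPU_SHOULD_FLUSH standing in for the per-vCPU steal-time state; it is a sketch, not the patch's code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits for the shared per-vCPU "preempted" byte. */
#define VCPU_PREEMPTED    (1u << 0)
#define VCPU_SHOULD_FLUSH (1u << 1)

#define NR_VCPUS 64

/* Stand-in for each vCPU's shared steal-time state (zero = running). */
static _Atomic uint8_t vcpu_state[NR_VCPUS];

/*
 * Returns the subset of 'targets' (one bit per vCPU) that still needs a
 * synchronous flush IPI. Preempted vCPUs get VCPU_SHOULD_FLUSH set
 * instead and are dropped from the mask, so the initiator never waits
 * for them.
 */
static uint64_t pv_filter_flush_targets(uint64_t targets)
{
	uint64_t need_ipi = targets;

	for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
		if (!(targets & (UINT64_C(1) << cpu)))
			continue;

		uint8_t state = atomic_load(&vcpu_state[cpu]);

		/* On CAS failure 'state' is refreshed; if the vCPU has been
		 * scheduled in meanwhile, the loop exits and it keeps its IPI. */
		while (state & VCPU_PREEMPTED) {
			if (atomic_compare_exchange_weak(&vcpu_state[cpu], &state,
					(uint8_t)(state | VCPU_SHOULD_FLUSH))) {
				need_ipi &= ~(UINT64_C(1) << cpu);
				break;
			}
		}
	}
	return need_ipi;
}
```

The compare-and-swap matters: a plain store could set the flush flag on a vCPU that was scheduled in between the read and the write, and that vCPU would then miss both the IPI and the deferred flush.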

[PATCH 0/2] ipc: Fix ctl(..IPC_STAT..) bugs

2017-11-29 Thread Philippe Mikoyan

Hi,

Some applications that use System V IPC mechanisms rely on data
structures that are returned by ctl(..IPC_STAT..) system calls.

However, up to now the information in these structures has not been
reliable, for the following reasons:
1) The data structures are filled in non-atomically, which performs better
because the ipc lock is not taken - see:
commit ac0ba20ea6f2 ("ipc,msg: make msgctl_nolock lockless")
commit 16df3674efe3 ("ipc,sem: do not hold ipc lock more than necessary")
commit c97cb9ccab8c ("ipc,shm: make shmctl_nolock lockless")
2) [Shared memory only] Because the kernel uses shm_nattch as a reference
count, as a side effect it has to be incremented twice in do_shmat so that
the segment is not lost before mmap and shm_open.

These issues can lead, for example, to the following unexpected ipc data
structure values:
1) When shmat and shmctl(... IPC_STAT ...) run concurrently:
{... shm_lpid = 0, shm_nattch = 1, ...}
2) If a shared memory segment has just been created and the first process
attaches to it, another process concurrently checking the number of
attaches via shmctl(... IPC_STAT ...) can, at some point in time, get the
following result:

Bug reproducing code can be found here:
https://github.com/Phikimon/sysipc_break
To catch the bug, you have to run and kill the reproducer several
times (one to five times, I guess).

In this patchset I attempt to fix these bugs:
1) By taking locks before filling in data structure fields. Note that the
lock is taken after the permission and security checks in order to
increase performance.
2) By filling in certain data structure fields directly in do_shmat, so
that shm_nattch and the other fields are filled in atomically. The
shm_open call is removed from shm_mmap.

Philippe Mikoyan (2):
  ipc/shm: Fix shm_nattch incorrect value
  ipc: Fix ipc data structures inconsistency

 ipc/msg.c  | 20 +++-
 ipc/sem.c  | 10 
 ipc/shm.c  | 77 +-
 ipc/util.c |  5 +++-
 4 files changed, 69 insertions(+), 43 deletions(-)

--
2.11.0
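
The fields the cover letter discusses (shm_nattch, shm_lpid) are what userspace sees through shmctl(IPC_STAT). A minimal sketch of reading them for a freshly attached private segment, assuming a Linux system with System V shared memory enabled; stat_single_attach() is a hypothetical helper, and it does not reproduce the race itself (the sysipc_break repository linked above does that):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a private segment, attach once, and snapshot it with
 * shmctl(IPC_STAT) while still attached. Returns 0 on success, -1 on
 * error. With no concurrent attachers the snapshot should show
 * shm_nattch == 1 and shm_lpid == getpid().
 */
static int stat_single_attach(struct shmid_ds *out)
{
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id < 0)
		return -1;

	void *addr = shmat(id, NULL, 0);
	if (addr == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}

	int ret = shmctl(id, IPC_STAT, out);

	/* Detach and mark the segment for destruction in every case. */
	shmdt(addr);
	shmctl(id, IPC_RMID, NULL);
	return ret < 0 ? -1 : 0;
}
```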

Re: [PATCH v5] lib: optimize cpumask_next_and()

2017-11-29 Thread Yury Norov

On Wed, Nov 29, 2017 at 10:35:55AM +0100, Clement Courbet wrote:
> > > Note that on Arm (), the new c implementation still outperforms the
> > > old one that uses c+ the asm implementation of `find_next_bit` [3].
> > What is 'c+'? Is it typo?
> 
> I meant "a mix of C and asm" ~(C + asm). Rephrased.
> 
> > If you find generic find_bit() on arm faster than the asm one, we'd
> > definitely drop that piece of asm. I have this check on my
> > long list.
> 
> What's faster for sure is the mix (the improvement in this commit minus the
> possible hit from not using the ASM implementation). I can't tell whether the
> latter is negligible or not (I only have one ARM board to try it out), but
> that's definitely something to try.
> 
> > This is the old version of the test, based on get_cycles. The new one is
> > based on ktime_get and has other minor changes. I think you should rerun
> > the tests so as not to confuse readers. The new version is already in linux-next.
> 
> So I'm not sure whether I should be submitting this against 'linux' or
> 'linux-next'? This patch is against 'linux', so I think it should
> be consistent with the code around it.

Linux-next is the right choice.
https://lwn.net/Articles/289013/


> > > #ifndef find_first_bit
> > > #define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
> > > #endif
> > > #ifndef find_first_zero_bit
> > > #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0)
> > > #endif
> > How is this change related to find_next_and_bit?
> 
> The arm header defines these symbols. Now that we're including
> the generic implementation in the arm headers, we need to guard this to
> avoid the duplicate definition.

So I think it is really worth a separate patch. It's far from obvious
why adding a new function to lib/find_bit.c requires including
asm-generic/bitops/find.h in the arm and unicore32 asm/bitops.h headers
(a bug?), and why doing that forces you to guard find_first_bit and
find_first_zero_bit (another bug?).

Headers are always important. Please explain this, and also CC the arm
and unicore32 communities, linux-arch and Arnd Bergmann.

> > > test_find_next_and_bit_ref
> > I don't understand the purpose of this. It's obvious that
> > test_find_next_and_bit cannot be slower than test_find_next_and_bit_ref
> 
> Fair enough :) That was to back my claim that this commit is worth it.
> I've removed the "_ref" version.
> 
> > For sparse bitmaps it will be like traversing zero-bitmaps. I doubt
> > these numbers will be representative. Do we need this test at all?
> 
> It's just two lines, and it gives an interesting data point. Why not
> keep it?

---

Again. test_find_next_and_bit is trimmed, but it is still based on
get_cycles and uses tabs in printf(). Please fix it.

Yury
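
For readers unfamiliar with the helper under discussion: find_next_and_bit() looks for the next bit that is set in both of two bitmaps, ANDing word by word so that no temporary mask has to be built, which is how the patch under review speeds up cpumask_next_and(). A minimal userspace sketch, assuming arrays of unsigned long and the GCC/Clang builtin __builtin_ctzl(); find_next_and_bit_sketch is a hypothetical name and this is not the lib/find_bit.c implementation:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Index of the first bit >= start that is set in both addr1 and addr2,
 * or nbits if there is none. Both arrays must hold at least
 * ceil(nbits / BITS_PER_LONG) words.
 */
static size_t find_next_and_bit_sketch(const unsigned long *addr1,
				       const unsigned long *addr2,
				       size_t nbits, size_t start)
{
	if (start >= nbits)
		return nbits;

	size_t word = start / BITS_PER_LONG;
	/* AND the first word on the fly and clear the bits below 'start'. */
	unsigned long tmp = (addr1[word] & addr2[word]) &
			    (~0UL << (start % BITS_PER_LONG));

	/* Skip whole words with no common bits. */
	while (!tmp) {
		word++;
		if (word * BITS_PER_LONG >= nbits)
			return nbits;
		tmp = addr1[word] & addr2[word];
	}

	size_t bit = word * BITS_PER_LONG + (size_t)__builtin_ctzl(tmp);
	/* A hit beyond nbits in the last partial word does not count. */
	return bit < nbits ? bit : nbits;
}
```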

Re: [PATCH v2] f2fs: apply write hints to select the type of segment for direct write

2017-11-29 Thread Chao Yu

On 2017/11/28 8:23, Hyunchul Lee wrote:
> From: Hyunchul Lee 
> 
> When blocks are allocated for direct write, select the type of
> segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
> use the inode hint.
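As a minimal sketch of the selection rule described above (not the f2fs code itself): the enum is a hypothetical stand-in loosely modelled on the kernel's WRITE_LIFE_* hints, and dio_segment_hint() is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, loosely modelled on the kernel's WRITE_LIFE_* hints. */
enum write_hint {
	HINT_NOT_SET,
	HINT_SHORT,
	HINT_LONG,
};

/*
 * Hint used to pick the segment type for a direct write: the per-I/O
 * (kiocb) hint, unless the inode is flagged FI_NO_ALLOC, in which case
 * the inode's own hint is used instead.
 */
static enum write_hint dio_segment_hint(bool inode_no_alloc,
					enum write_hint kiocb_hint,
					enum write_hint inode_hint)
{
	return inode_no_alloc ? inode_hint : kiocb_hint;
}
```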
> 
> Signed-off-by: Hyunchul Lee 

Reviewed-by: Chao Yu 

Thanks,

[PATCH 2/2] ipc: Fix ipc data structures inconsistency

2017-11-29 Thread Philippe Mikoyan

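As described in the title, this patch fixes an inconsistent id_ds being
returned when ctl_stat runs concurrently with a function that changes
it, e.g. shmat or msgsnd. The ctl_stat functions now take the object
lock, check ipc_valid_object() (returning -EIDRM if the object is gone)
and copy all fields before unlocking.

For instance, if shmctl(IPC_STAT) runs concurrently with shmat, the
following data structure can be returned:
{... shm_lpid = 0, shm_nattch = 1, ...}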
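The fix follows a common lock, revalidate, copy, unlock pattern. Below is a user-space sketch of it, assuming a pthread mutex in place of the per-object IPC lock and a deleted flag in place of ipc_valid_object(); struct seg, seg_attach() and seg_stat() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space model of an IPC object; not kernel code. */
struct seg {
	pthread_mutex_t lock;	/* plays the role of the object lock */
	bool deleted;		/* plays the role of !ipc_valid_object() */
	int lpid;		/* pid of the last attach */
	int nattch;		/* number of attaches */
};

struct seg_ds {
	int lpid;
	int nattch;
};

/* Writer: updates both fields under the lock, like an attach. */
static void seg_attach(struct seg *s, int pid)
{
	pthread_mutex_lock(&s->lock);
	s->lpid = pid;
	s->nattch++;
	pthread_mutex_unlock(&s->lock);
}

/* Reader: zero the output, lock, revalidate, copy a snapshot, unlock. */
static int seg_stat(struct seg *s, struct seg_ds *out)
{
	memset(out, 0, sizeof(*out));

	pthread_mutex_lock(&s->lock);
	if (s->deleted) {
		pthread_mutex_unlock(&s->lock);
		return -EIDRM;
	}
	out->lpid = s->lpid;
	out->nattch = s->nattch;
	pthread_mutex_unlock(&s->lock);
	return 0;
}
```

Since seg_stat() copies lpid and nattch under the same lock that seg_attach() holds while updating both, a reader cannot see one field updated and the other not, which is the kind of torn snapshot in the example above.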

Signed-off-by: Philippe Mikoyan 
---
 ipc/msg.c  | 20 ++--
 ipc/sem.c  | 10 ++
 ipc/shm.c  | 19 ++-
 ipc/util.c |  5 -
 4 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 06be5a9adfa4..047579b42de4 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -475,9 +475,9 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
 static int msgctl_stat(struct ipc_namespace *ns, int msqid,
 int cmd, struct msqid64_ds *p)
 {
-   int err;
struct msg_queue *msq;
-   int success_return;
+   int id = 0;
+   int err;

memset(p, 0, sizeof(*p));

@@ -488,14 +488,13 @@ static int msgctl_stat(struct ipc_namespace *ns, int msqid,
err = PTR_ERR(msq);
goto out_unlock;
}
-   success_return = msq->q_perm.id;
+   id = msq->q_perm.id;
} else {
msq = msq_obtain_object_check(ns, msqid);
if (IS_ERR(msq)) {
err = PTR_ERR(msq);
goto out_unlock;
}
-   success_return = 0;
}

err = -EACCES;
@@ -506,6 +505,14 @@ static int msgctl_stat(struct ipc_namespace *ns, int msqid,
if (err)
goto out_unlock;

+   ipc_lock_object(&msq->q_perm);
+
+   if (!ipc_valid_object(&msq->q_perm)) {
+   ipc_unlock_object(&msq->q_perm);
+   err = -EIDRM;
+   goto out_unlock;
+   }
+
kernel_to_ipc64_perm(&msq->q_perm, &p->msg_perm);
p->msg_stime  = msq->q_stime;
p->msg_rtime  = msq->q_rtime;
@@ -515,9 +522,10 @@ static int msgctl_stat(struct ipc_namespace *ns, int msqid,
p->msg_qbytes = msq->q_qbytes;
p->msg_lspid  = msq->q_lspid;
p->msg_lrpid  = msq->q_lrpid;
-   rcu_read_unlock();

-   return success_return;
+   ipc_unlock_object(&msq->q_perm);
+   rcu_read_unlock();
+   return id;

 out_unlock:
rcu_read_unlock();
diff --git a/ipc/sem.c b/ipc/sem.c
index f7385bce5fd3..9b6f80d1b3f1 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1211,10 +1211,20 @@ static int semctl_stat(struct ipc_namespace *ns, int semid,
if (err)
goto out_unlock;

+   ipc_lock_object(&sma->sem_perm);
+
+   if (!ipc_valid_object(&sma->sem_perm)) {
+   ipc_unlock_object(&sma->sem_perm);
+   err = -EIDRM;
+   goto out_unlock;
+   }
+
kernel_to_ipc64_perm(&sma->sem_perm, &semid64->sem_perm);
semid64->sem_otime = get_semotime(sma);
semid64->sem_ctime = sma->sem_ctime;
semid64->sem_nsems = sma->sem_nsems;
+
+   ipc_unlock_object(&sma->sem_perm);
rcu_read_unlock();
return id;

diff --git a/ipc/shm.c b/ipc/shm.c
index 565f17925128..8f58faba7429 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -896,9 +896,11 @@ static int shmctl_stat(struct ipc_namespace *ns, int shmid,
int cmd, struct shmid64_ds *tbuf)
 {
struct shmid_kernel *shp;
-   int result;
+   int id = 0;
int err;

+   memset(tbuf, 0, sizeof(*tbuf));
+
rcu_read_lock();
if (cmd == SHM_STAT) {
shp = shm_obtain_object(ns, shmid);
@@ -906,14 +908,13 @@ static int shmctl_stat(struct ipc_namespace *ns, int shmid,
err = PTR_ERR(shp);
goto out_unlock;
}
-   result = shp->shm_perm.id;
+   id = shp->shm_perm.id;
} else {
shp = shm_obtain_object_check(ns, shmid);
if (IS_ERR(shp)) {
err = PTR_ERR(shp);
goto out_unlock;
}
-   result = 0;
}

err = -EACCES;
@@ -924,7 +925,14 @@ static int shmctl_stat(struct ipc_namespace *ns, int shmid,
if (err)
goto out_unlock;

-   memset(tbuf, 0, sizeof(*tbuf));
+   ipc_lock_object(&shp->shm_perm);
+
+   if (!ipc_valid_object(&shp->shm_perm)) {
+   ipc_unlock_object(&shp->shm_perm);
+   err = -EIDRM;
+   goto out_unlock;
+   }
+
kernel_to_ipc64_perm(&shp->shm_perm, &tbuf->shm_perm);
tbuf->shm_segsz = shp->shm_segsz;
tbuf->shm_atime = shp->shm_atim;
@@ -934,8 +942,9 @@ static int shmctl_stat(struct ipc_namespace *ns, int shmid,
tbuf->shm_lpid  = shp->shm_lprid;
tbuf->shm_nattch = shp->shm_nattch;

+   ipc_unlock_object(&shp->shm_perm);
rcu_read_unlock();
-   return result;
+   return id;

 out_unlock:
rcu_read_unlock();
diff --git

[PATCH 1/2] ipc/shm: Fix shm_nattch incorrect value

2017-11-29 Thread Philippe Mikoyan

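This patch fixes do_shmat() incrementing shm_nattch twice: do_shmat()
increments it, shm_mmap() increments it again via __shm_open(), and
do_shmat() only drops the extra reference afterwards.

For example, if a memory segment has just been created and a process
attaches to it, shmctl(IPC_STAT) in a concurrently running process can,
at some point, return a data structure with shm_nattch equal to 2.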
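The double increment can be modelled with a toy counter that records the largest value a concurrent IPC_STAT could observe during one shmat(). This is a simplified sketch, not kernel code: struct toy_seg and the toy_shmat_* functions are hypothetical, and locking, unmapping and error paths are left out.

```c
#include <assert.h>

/* Hypothetical toy model of shm_nattch during a single attach. */
struct toy_seg {
	int nattch;
	int peak;	/* largest value a concurrent reader could observe */
};

static void bump(struct toy_seg *s, int delta)
{
	s->nattch += delta;
	if (s->nattch > s->peak)
		s->peak = s->nattch;
}

/*
 * Old flow: do_shmat() increments, shm_mmap() -> __shm_open() increments
 * again, then do_shmat() drops its own reference at out_nattch.
 */
static void toy_shmat_old(struct toy_seg *s)
{
	bump(s, +1);	/* do_shmat: shm_nattch++ */
	bump(s, +1);	/* shm_mmap -> __shm_open: shm_nattch++ */
	bump(s, -1);	/* do_shmat out_nattch: shm_nattch-- */
}

/* New flow: only do_shmat() increments, and success skips out_nattch. */
static void toy_shmat_new(struct toy_seg *s)
{
	bump(s, +1);
}
```

Both flows end with nattch == 1, but only the old one passes through 2 on the way.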

Signed-off-by: Philippe Mikoyan 
Signed-off-by: Edgar Kaziakhmedov 
---
 ipc/shm.c | 58 +++---
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index badac463e2c8..565f17925128 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -190,33 +190,31 @@ static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
ipc_rmid(&shm_ids(ns), &s->shm_perm);
 }

-
-static int __shm_open(struct vm_area_struct *vma)
-{
-   struct file *file = vma->vm_file;
-   struct shm_file_data *sfd = shm_file_data(file);
-   struct shmid_kernel *shp;
-
-   shp = shm_lock(sfd->ns, sfd->id);
-
-   if (IS_ERR(shp))
-   return PTR_ERR(shp);
-
-   shp->shm_atim = ktime_get_real_seconds();
-   shp->shm_lprid = task_tgid_vnr(current);
-   shp->shm_nattch++;
-   shm_unlock(shp);
-   return 0;
-}
-
 /* This is called by fork, once for every shm attach. */
 static void shm_open(struct vm_area_struct *vma)
 {
-   int err = __shm_open(vma);
+   struct file *file = vma->vm_file;
+   struct shm_file_data *sfd = shm_file_data(file);
+   struct shmid_kernel *shp;
+   int err = 0;
+
+   shp = shm_lock(sfd->ns, sfd->id);
+
+   if (IS_ERR(shp)) {
+   err = PTR_ERR(shp);
+   goto warn;
+   }
+
+   shp->shm_atim = ktime_get_real_seconds();
+   shp->shm_lprid = task_tgid_vnr(current);
+   shp->shm_nattch++;
+   shm_unlock(shp);
+
/*
 * We raced in the idr lookup or with shm_destroy().
 * Either way, the ID is busted.
 */
+warn:
WARN_ON_ONCE(err);
 }

@@ -418,19 +416,10 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
struct shm_file_data *sfd = shm_file_data(file);
int ret;

-   /*
-* In case of remap_file_pages() emulation, the file can represent
-* removed IPC ID: propogate shm_lock() error to caller.
-*/
-   ret = __shm_open(vma);
-   if (ret)
-   return ret;
-
ret = call_mmap(sfd->file, vma);
-   if (ret) {
-   shm_close(vma);
+   if (ret)
return ret;
-   }
+
sfd->vm_ops = vma->vm_ops;
 #ifdef CONFIG_MMU
WARN_ON(!sfd->vm_ops->fault);
@@ -944,6 +933,7 @@ static int shmctl_stat(struct ipc_namespace *ns, int shmid,
tbuf->shm_cpid  = shp->shm_cprid;
tbuf->shm_lpid  = shp->shm_lprid;
tbuf->shm_nattch = shp->shm_nattch;
+
rcu_read_unlock();
return result;

@@ -1351,7 +1341,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,

path = shp->shm_file->f_path;
path_get(&path);
+
+   shp->shm_atim = ktime_get_real_seconds();
+   shp->shm_lprid = task_tgid_vnr(current);
shp->shm_nattch++;
+
size = i_size_read(d_inode(path.dentry));
ipc_unlock_object(&shp->shm_perm);
rcu_read_unlock();
@@ -1411,6 +1405,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,

 out_fput:
fput(file);
+   if (!err)
+   goto out;

 out_nattch:
down_write(&shm_ids(ns).rwsem);
--
2.11.0

[RESENT PATCH] drm/panel: support Innolux P097PFG panel

2017-11-29 Thread Lin Huang

Support the Innolux P097PFG 9.7" 1536x2048 TFT LCD panel. This
refactors the Innolux P079ZCA panel driver so that it can support
multiple panels, and adds support for the P097PFG panel to it.
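The refactor replaces the single default_mode with a per-panel descriptor. How such a descriptor is typically selected can be sketched in user space. In the kernel this would usually go through the driver's of_device_id table and of_device_get_match_data(), but that part of the patch is not shown here, so the compatible strings, struct panel_desc_sketch and panel_lookup() below are assumptions. The sizes and lane counts are taken from the two panel_desc_dsi definitions in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down mirror of the patch's panel_desc_dsi. */
struct panel_desc_sketch {
	unsigned int width_mm;
	unsigned int height_mm;
	unsigned int lanes;
};

static const struct panel_desc_sketch p079zca = { 120, 160, 4 };
static const struct panel_desc_sketch p097pfg = { 147, 196, 8 };

/* Stand-in for an of_device_id table whose .data points at a descriptor. */
static const struct {
	const char *compatible;
	const struct panel_desc_sketch *desc;
} panel_match[] = {
	{ "innolux,p079zca", &p079zca },
	{ "innolux,p097pfg", &p097pfg },
};

/* Return the descriptor for a compatible string, or NULL if unknown. */
static const struct panel_desc_sketch *panel_lookup(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(panel_match) / sizeof(panel_match[0]); i++)
		if (strcmp(panel_match[i].compatible, compatible) == 0)
			return panel_match[i].desc;
	return NULL;
}
```

Keeping the per-panel data in const tables means adding another panel is mostly a matter of adding one descriptor and one table entry.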

Signed-off-by: Lin Huang 

---
 drivers/gpu/drm/panel/panel-innolux-p079zca.c | 178 --
 1 file changed, 136 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-innolux-p079zca.c b/drivers/gpu/drm/panel/panel-innolux-p079zca.c
index 6ba9344..a40798f 100644
--- a/drivers/gpu/drm/panel/panel-innolux-p079zca.c
+++ b/drivers/gpu/drm/panel/panel-innolux-p079zca.c
@@ -20,12 +20,32 @@
 
 #include 
 
+struct panel_desc {
+   const struct drm_display_mode *modes;
+   unsigned int bpc;
+   struct {
+   unsigned int width;
+   unsigned int height;
+   } size;
+};
+
+struct panel_desc_dsi {
+   struct panel_desc desc;
+
+   unsigned long flags;
+   enum mipi_dsi_pixel_format format;
+   unsigned int lanes;
+};
+
 struct innolux_panel {
struct drm_panel base;
struct mipi_dsi_device *link;
+   const struct panel_desc_dsi *dsi_desc;
 
struct backlight_device *backlight;
-   struct regulator *supply;
+   struct regulator *vddi;
+   struct regulator *avdd;
+   struct regulator *avee;
struct gpio_desc *enable_gpio;
 
bool prepared;
@@ -78,9 +98,9 @@ static int innolux_panel_unprepare(struct drm_panel *panel)
/* T8: 80ms - 1000ms */
msleep(80);
 
-   err = regulator_disable(innolux->supply);
-   if (err < 0)
-   return err;
+   regulator_disable(innolux->avee);
+   regulator_disable(innolux->avdd);
+   regulator_disable(innolux->vddi);
 
innolux->prepared = false;
 
@@ -97,10 +117,18 @@ static int innolux_panel_prepare(struct drm_panel *panel)
 
gpiod_set_value_cansleep(innolux->enable_gpio, 0);
 
-   err = regulator_enable(innolux->supply);
+   err = regulator_enable(innolux->vddi);
if (err < 0)
return err;
 
+   err = regulator_enable(innolux->avdd);
+   if (err < 0)
+   goto disable_vddi;
+
+   err = regulator_enable(innolux->avee);
+   if (err < 0)
+   goto disable_avdd;
+
/* T2: 15ms - 1000ms */
usleep_range(15000, 16000);
 
@@ -134,12 +162,13 @@ static int innolux_panel_prepare(struct drm_panel *panel)
return 0;
 
 poweroff:
-   regulator_err = regulator_disable(innolux->supply);
-   if (regulator_err)
-   DRM_DEV_ERROR(panel->dev, "failed to disable regulator: %d\n",
- regulator_err);
-
gpiod_set_value_cansleep(innolux->enable_gpio, 0);
+   regulator_disable(innolux->avee);
+disable_avdd:
+   regulator_disable(innolux->avdd);
+disable_vddi:
+   regulator_disable(innolux->vddi);
+
return err;
 }
 
@@ -164,7 +193,7 @@ static int innolux_panel_enable(struct drm_panel *panel)
return 0;
 }
 
-static const struct drm_display_mode default_mode = {
+static const struct drm_display_mode innolux_p079zca_mode = {
.clock = 56900,
.hdisplay = 768,
.hsync_start = 768 + 40,
@@ -177,15 +206,59 @@ static const struct drm_display_mode default_mode = {
.vrefresh = 60,
 };
 
+static const struct panel_desc_dsi innolux_p079zca_panel_desc = {
+   .desc = {
+   .modes = &innolux_p079zca_mode,
+   .bpc = 8,
+   .size = {
+   .width = 120,
+   .height = 160,
+   },
+   },
+   .flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
+MIPI_DSI_MODE_LPM,
+   .format = MIPI_DSI_FMT_RGB888,
+   .lanes = 4,
+};
+
+static const struct drm_display_mode innolux_p097pfg_mode = {
+   .clock = 22,
+   .hdisplay = 1536,
+   .hsync_start = 1536 + 100,
+   .hsync_end = 1536 + 100 + 24,
+   .htotal = 1536 + 100 + 24 + 100,
+   .vdisplay = 2048,
+   .vsync_start = 2048 + 18,
+   .vsync_end = 2048 + 18 + 2,
+   .vtotal = 2048 + 18 + 2 + 18,
+   .vrefresh = 60,
+};
+
+static const struct panel_desc_dsi innolux_p097pfg_panel_desc = {
+   .desc = {
+   .modes = &innolux_p097pfg_mode,
+   .bpc = 8,
+   .size = {
+   .width = 147,
+   .height = 196,
+   },
+   },
+   .flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
+MIPI_DSI_MODE_LPM,
+   .format = MIPI_DSI_FMT_RGB888,
+   .lanes = 8,
+};
+
 static int innolux_panel_get_modes(struct drm_panel *panel)
 {
struct drm_display_mode *mode;
+   struct innolux_panel *innolux = to_innolux_panel(panel);
+   const struct drm_display_mode *m = innolux->dsi_desc->desc.modes;
 
-   mode = drm_mode_duplicate(panel->drm, &default_mode);
+   mode = drm_mode_duplicate(panel->drm, m);
if (!mode) {

[PATCH] drm/panel: support Innolux P097PFG panel

2017-11-29 Thread Lin Huang

Support the Innolux P097PFG 9.7" 1536x2048 TFT LCD panel. This
refactors the Innolux P079ZCA panel driver so that it can support
multiple panels, and adds support for the P097PFG panel to it.
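As a quick consistency check on a DRM mode, the pixel clock in kHz (the unit of drm_display_mode's .clock field) should be about htotal × vtotal × refresh / 1000. For the P097PFG timings in this patch, assuming the 60 Hz in .vrefresh is the intended rate, the arithmetic works out as follows and can be compared with the .clock value in the posted mode:

```latex
\begin{aligned}
h_{\text{total}} &= 1536 + 100 + 24 + 100 = 1760 \\
v_{\text{total}} &= 2048 + 18 + 2 + 18 = 2086 \\
\text{clock} &\approx \frac{h_{\text{total}} \cdot v_{\text{total}} \cdot 60}{1000}
  = \frac{1760 \cdot 2086 \cdot 60}{1000}\ \text{kHz} \approx 220\,282\ \text{kHz}
\end{aligned}
```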

Change-Id: If342e58a3de2861219b0b1313f402b6cb41ffa29
Signed-off-by: Lin Huang 
---
 drivers/gpu/drm/panel/panel-innolux-p079zca.c | 178 --
 1 file changed, 136 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-innolux-p079zca.c b/drivers/gpu/drm/panel/panel-innolux-p079zca.c
index 6ba9344..a40798f 100644
--- a/drivers/gpu/drm/panel/panel-innolux-p079zca.c
+++ b/drivers/gpu/drm/panel/panel-innolux-p079zca.c
@@ -20,12 +20,32 @@
 
 #include 
 
+struct panel_desc {
+   const struct drm_display_mode *modes;
+   unsigned int bpc;
+   struct {
+   unsigned int width;
+   unsigned int height;
+   } size;
+};
+
+struct panel_desc_dsi {
+   struct panel_desc desc;
+
+   unsigned long flags;
+   enum mipi_dsi_pixel_format format;
+   unsigned int lanes;
+};
+
 struct innolux_panel {
struct drm_panel base;
struct mipi_dsi_device *link;
+   const struct panel_desc_dsi *dsi_desc;
 
struct backlight_device *backlight;
-   struct regulator *supply;
+   struct regulator *vddi;
+   struct regulator *avdd;
+   struct regulator *avee;
struct gpio_desc *enable_gpio;
 
bool prepared;
@@ -78,9 +98,9 @@ static int innolux_panel_unprepare(struct drm_panel *panel)
/* T8: 80ms - 1000ms */
msleep(80);
 
-   err = regulator_disable(innolux->supply);
-   if (err < 0)
-   return err;
+   regulator_disable(innolux->avee);
+   regulator_disable(innolux->avdd);
+   regulator_disable(innolux->vddi);
 
innolux->prepared = false;
 
@@ -97,10 +117,18 @@ static int innolux_panel_prepare(struct drm_panel *panel)
 
gpiod_set_value_cansleep(innolux->enable_gpio, 0);
 
-   err = regulator_enable(innolux->supply);
+   err = regulator_enable(innolux->vddi);
if (err < 0)
return err;
 
+   err = regulator_enable(innolux->avdd);
+   if (err < 0)
+   goto disable_vddi;
+
+   err = regulator_enable(innolux->avee);
+   if (err < 0)
+   goto disable_avdd;
+
/* T2: 15ms - 1000ms */
usleep_range(15000, 16000);
 
@@ -134,12 +162,13 @@ static int innolux_panel_prepare(struct drm_panel *panel)
return 0;
 
 poweroff:
-   regulator_err = regulator_disable(innolux->supply);
-   if (regulator_err)
-   DRM_DEV_ERROR(panel->dev, "failed to disable regulator: %d\n",
- regulator_err);
-
gpiod_set_value_cansleep(innolux->enable_gpio, 0);
+   regulator_disable(innolux->avee);
+disable_avdd:
+   regulator_disable(innolux->avdd);
+disable_vddi:
+   regulator_disable(innolux->vddi);
+
return err;
 }
 
@@ -164,7 +193,7 @@ static int innolux_panel_enable(struct drm_panel *panel)
return 0;
 }
 
-static const struct drm_display_mode default_mode = {
+static const struct drm_display_mode innolux_p079zca_mode = {
.clock = 56900,
.hdisplay = 768,
.hsync_start = 768 + 40,
@@ -177,15 +206,59 @@ static const struct drm_display_mode default_mode = {
.vrefresh = 60,
 };
 
+static const struct panel_desc_dsi innolux_p079zca_panel_desc = {
+   .desc = {
+   .modes = &innolux_p079zca_mode,
+   .bpc = 8,
+   .size = {
+   .width = 120,
+   .height = 160,
+   },
+   },
+   .flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
+MIPI_DSI_MODE_LPM,
+   .format = MIPI_DSI_FMT_RGB888,
+   .lanes = 4,
+};
+
+static const struct drm_display_mode innolux_p097pfg_mode = {
+   .clock = 220000,
+   .hdisplay = 1536,
+   .hsync_start = 1536 + 100,
+   .hsync_end = 1536 + 100 + 24,
+   .htotal = 1536 + 100 + 24 + 100,
+   .vdisplay = 2048,
+   .vsync_start = 2048 + 18,
+   .vsync_end = 2048 + 18 + 2,
+   .vtotal = 2048 + 18 + 2 + 18,
+   .vrefresh = 60,
+};
+
+static const struct panel_desc_dsi innolux_p097pfg_panel_desc = {
+   .desc = {
+   .modes = &innolux_p097pfg_mode,
+   .bpc = 8,
+   .size = {
+   .width = 147,
+   .height = 196,
+   },
+   },
+   .flags = MIPI_DSI_MODE_VIDEO | MIPI_DSI_MODE_VIDEO_SYNC_PULSE |
+MIPI_DSI_MODE_LPM,
+   .format = MIPI_DSI_FMT_RGB888,
+   .lanes = 8,
+};
+
 static int innolux_panel_get_modes(struct drm_panel *panel)
 {
struct drm_display_mode *mode;
+   struct innolux_panel *innolux = to_innolux_panel(panel);
+   const struct drm_display_mode *m = innolux->dsi_desc->desc.modes;
 
-   mode = drm_mode_duplicate(panel->drm, &default_mode);
+   mode =

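The refactoring above moves per-panel data (mode, bpc, physical size, DSI lane count) into a `panel_desc_dsi` descriptor so one driver can serve several panels. Below is a minimal user-space sketch of selecting such a descriptor by compatible string. A driver like this would typically get it from the `.data` field of its `of_device_id` table (e.g. via `of_device_get_match_data()`), but the trimmed-down struct, the compatible strings `innolux,p079zca` / `innolux,p097pfg` and the `find_panel_desc()` helper are assumptions for illustration, not code from this patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down per-panel descriptor; the values below are
 * copied from the two descriptors in the patch. */
struct panel_desc_dsi {
    unsigned int hdisplay;  /* active pixels per line */
    unsigned int bpc;       /* bits per color component */
    unsigned int width_mm;  /* physical size */
    unsigned int height_mm;
    unsigned int lanes;     /* MIPI DSI data lanes */
};

static const struct panel_desc_dsi p079zca_desc = { 768, 8, 120, 160, 4 };
static const struct panel_desc_dsi p097pfg_desc = { 1536, 8, 147, 196, 8 };

/* Modeled on an of_device_id-style table: compatible -> match data.
 * The compatible strings are assumptions. */
struct match_entry {
    const char *compatible;
    const struct panel_desc_dsi *data;
};

static const struct match_entry panel_matches[] = {
    { "innolux,p079zca", &p079zca_desc },
    { "innolux,p097pfg", &p097pfg_desc },
    { NULL, NULL } /* sentinel terminates the table */
};

/* Return the descriptor for a compatible string, or NULL if unknown. */
const struct panel_desc_dsi *find_panel_desc(const char *compatible)
{
    for (const struct match_entry *m = panel_matches; m->compatible; m++)
        if (strcmp(m->compatible, compatible) == 0)
            return m->data;
    return NULL;
}
```

For example, `find_panel_desc("innolux,p097pfg")->lanes` evaluates to 8; probe code would then read mode, bpc, size and DSI settings from the returned descriptor instead of from hard-coded globals.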
Re: [PATCH v3 04/21] fpga: add device feature list support

2017-11-29 Thread Wu Hao

On Tue, Nov 28, 2017 at 10:07:36PM -0800, Moritz Fischer wrote:
> Hi Hao,
> 
> first pass, I didn't get all the way through, yet.

Hi Moritz

Thanks a lot for your review and comments. :)

> 
> On Mon, Nov 27, 2017 at 02:42:11PM +0800, Wu Hao wrote:
> > Device Feature List (DFL) defines a feature list structure that creates
> > a linked list of feature headers within the MMIO space to provide an
> > extensible way of adding features. This patch introduces a kernel module
> > to provide basic infrastructure to support FPGA devices which implement
> > the Device Feature List.
> > 
> > Usually there will be different features and their sub features linked into
> > the DFL. This code provides common APIs for feature enumeration: it creates
> > a container device (FPGA base region), walks through the DFLs and creates
> > platform devices for feature devices (currently it only supports two
> > different feature devices: the FPGA Management Engine (FME), and the Port
> > to which the Accelerator Function Unit (AFU) is connected). In order to
> > enumerate the DFLs, the common APIs require the low-level driver to provide
> > the necessary enumeration information (e.g. the address of each device
> > feature list for the given device) and fill it into the fpga_enum_info data
> > structure. Please refer to the description below of the APIs added for
> > enumeration.
> > 
> > Functions for enumeration information preparation:
> >  *fpga_enum_info_alloc
> >allocate enumeration information data structure.
> > 
> >  *fpga_enum_info_add_dfl
> >add a device feature list to fpga_enum_info data structure.
> > 
> >  *fpga_enum_info_free
> >free fpga_enum_info data structure and related resources.
> > 
> > Functions for feature device enumeration:
> >  *fpga_enumerate_feature_devs
> >enumerate feature devices and return container device.
> > 
> >  *fpga_remove_feature_devs
> >remove feature devices under given container device.
> > 
> > Signed-off-by: Tim Whisonant 
> > Signed-off-by: Enno Luebbers 
> > Signed-off-by: Shiva Rao 
> > Signed-off-by: Christopher Rauer 
> > Signed-off-by: Zhang Yi 
> > Signed-off-by: Xiao Guangrong 
> > Signed-off-by: Wu Hao 
> > 
> > v3: split from another patch.
> > separate dfl enumeration code from original pcie driver.
> > provide common data structures and APIs for enumeration.
> > update device feature list parsing process according to latest hw.
> > add dperf/iperf/hssi sub feature placeholder according to latest hw.
> > remove build_info_add_sub_feature and other small functions.
> > replace *_feature_num function with macro.
> > remove writeq/readq.
> > ---
> >  drivers/fpga/Kconfig|  16 +
> >  drivers/fpga/Makefile   |   3 +
> >  drivers/fpga/fpga-dfl.c | 884 
> >  drivers/fpga/fpga-dfl.h | 365 
> >  4 files changed, 1268 insertions(+)
> >  create mode 100644 drivers/fpga/fpga-dfl.c
> >  create mode 100644 drivers/fpga/fpga-dfl.h
> > 
> > diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> > index f47ef84..01ad31f 100644
> > --- a/drivers/fpga/Kconfig
> > +++ b/drivers/fpga/Kconfig
> > @@ -124,4 +124,20 @@ config OF_FPGA_REGION
> >   Support for loading FPGA images by applying a Device Tree
> >   overlay.
> >  
> > +config FPGA_DFL
> > +   tristate "FPGA Device Feature List (DFL) support"
> > +   select FPGA_BRIDGE
> > +   select FPGA_REGION
> > +   help
> > + Device Feature List (DFL) defines a feature list structure that
> > + creates a link list of feature headers within the MMIO space
> > + to provide an extensible way of adding features for FPGA.
> > + Driver can walk through the feature headers to enumerate feature
> > + devices (e.g FPGA Management Engine, Port and Accelerator
> > + Function Unit) and their private features for target FPGA devices.
> > +
> > + Select this option to enable common support for Field-Programmable
> > + Gate Array (FPGA) solutions which implement Device Feature List.
> > + It provides enumeration APIs, and feature device infrastructure.
> > +
> >  endif # FPGA
> > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > index 3cb276a..447ba2b 100644
> > --- a/drivers/fpga/Makefile
> > +++ b/drivers/fpga/Makefile
> > @@ -27,3 +27,6 @@ obj-$(CONFIG_XILINX_PR_DECOUPLER) += xilinx-pr-decoupler.o
> >  # High Level Interfaces
> >  obj-$(CONFIG_FPGA_REGION)  += fpga-region.o
> >  obj-$(CONFIG_OF_FPGA_REGION)   += of-fpga-region.o
> > +
> > +# FPGA Device Feature List Support
> > +obj-$(CONFIG_FPGA_DFL) += fpga-dfl.o
> > diff --git a/drivers/fpga/fpga-dfl.c b/drivers/fpga/fpga-dfl.c
> > new file mode 100644
> > index 0000000..6609828
> > --- /dev/null
> > +++ b/drivers/fpga/fpga-dfl.c
> > @@ -0,0 +1,884 @@
> > +/*
> > + * Driver for FPGA Device Feature List (DFL) Support
> > + *
> > + * Copyright (C) 2017 Intel Corporation, Inc.
> > + *
> > + * Authors:
> > + *   Kang Luwei 
>
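The enumeration described in the quoted commit message walks a linked list of 64-bit feature headers in MMIO space, following each header's "next" offset until an end-of-list marker. A minimal sketch of that walk over a plain memory buffer follows. The header layout (feature ID in bits 11:0, next-header byte offset in bits 39:16, end-of-list flag in bit 40) and the helper names are assumptions for illustration, not taken from fpga-dfl.h in this patch; a real driver would use readq() on ioremapped MMIO rather than a host array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed feature header layout, for illustration only. */
#define DFH_ID_MASK    0xfffULL                        /* bits 11:0 */
#define DFH_NEXT_SHIFT 16
#define DFH_NEXT_MASK  (0xffffffULL << DFH_NEXT_SHIFT) /* bits 39:16 */
#define DFH_EOL        (1ULL << 40)                    /* end of list */

/* Read a 64-bit header at a byte offset; stands in for readq(base + off). */
static uint64_t dfh_read(const uint8_t *base, size_t off)
{
    uint64_t v;

    memcpy(&v, base + off, sizeof(v));
    return v;
}

/*
 * Walk the list starting at offset 0, storing up to 'max' feature IDs in
 * ids[]. Returns the number of features found, or -1 if a header points
 * outside the 'len'-byte region.
 */
int dfl_walk(const uint8_t *base, size_t len, uint16_t *ids, int max)
{
    size_t off = 0;
    int n = 0;

    for (;;) {
        uint64_t hdr, next;

        if (off + sizeof(uint64_t) > len)
            return -1; /* header would lie outside the region */
        hdr = dfh_read(base, off);
        if (n < max)
            ids[n] = (uint16_t)(hdr & DFH_ID_MASK);
        n++;

        next = (hdr & DFH_NEXT_MASK) >> DFH_NEXT_SHIFT;
        /* Stop at end-of-list, or at a zero offset that would loop forever. */
        if ((hdr & DFH_EOL) || next == 0)
            break;
        off += next;
    }
    return n;
}
```

Bounds-checking every header before reading it keeps a malformed list from walking off the end of the mapped region.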

[PATCH] mm: check pfn_valid first in zero_resv_unavail

2017-11-29 Thread Dave Young

With the latest kernel I get the following bug while testing kdump:

[0.00] BUG: unable to handle kernel paging request at ea00034b1040
[0.00] IP: zero_resv_unavail+0xbd/0x126
[0.00] PGD 37b98067 P4D 37b98067 PUD 37b97067 PMD 0 
[0.00] Oops: 0002 [#1] SMP
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc1+ #316
[0.00] Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET92WW (2.42 ) 03/03/2017
[0.00] task: 81a0e4c0 task.stack: 81a0
[0.00] RIP: 0010:zero_resv_unavail+0xbd/0x126
[0.00] RSP: :81a03d88 EFLAGS: 00010006
[0.00] RAX:  RBX: ea00034b1040 RCX: 0010
[0.00] RDX:  RSI: 0092 RDI: ea00034b1040
[0.00] RBP: 000d2c41 R08: 00c0 R09: 0a0d
[0.00] R10: 0002 R11: 7f01 R12: 81a03d90
[0.00] R13: ea00 R14: 0063 R15: 0062
[0.00] FS:  () GS:81c73000() knlGS:
[0.00] CS:  0010 DS:  ES:  CR0: 80050033
[0.00] CR2: ea00034b1040 CR3: 37609000 CR4: 000606b0
[0.00] Call Trace:
[0.00]  ? free_area_init_nodes+0x640/0x664
[0.00]  ? zone_sizes_init+0x58/0x72
[0.00]  ? setup_arch+0xb50/0xc6c
[0.00]  ? start_kernel+0x64/0x43d
[0.00]  ? secondary_startup_64+0xa5/0xb0
[0.00] Code: c1 e8 0c 48 39 d8 76 27 48 89 de 48 c1 e3 06 48 c7 c7 7a 87 79 81 e8 b0 c0 3e ff 4c 01 eb b9 10 00 00 00 31 c0 48 89 df 49 ff c6  ab eb bc 6a 00 49 c7 c0 f0 93 d1 81 31 d2 83 ce ff 41 54 49
[0.00] RIP: zero_resv_unavail+0xbd/0x126 RSP: 81a03d88
[0.00] CR2: ea00034b1040
[0.00] ---[ end trace f5ba9e8f73c7ee26 ]---

It was introduced by the following commit:
commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
Author: Pavel Tatashin 
Date:   Wed Nov 15 17:36:31 2017 -0800

mm: zero reserved and unavailable struct pages

The cause is that some EFI reserved boot ranges are not reported as RAM in
the E820 map. In my case it is a BGRT buffer:
efi: mem00: [Boot Data  |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0xd2c41000-0xd2c85fff] (0MB)

Booting with "add_efi_memmap" can work around the problem, together with
another fix:
https://lkml.org/lkml/2017/11/30/5

In zero_resv_unavail() it is better to check pfn_valid() before zeroing the
struct page. This fixes the problem, and potentially other similar ones.

Signed-off-by: Dave Young 
---
 mm/page_alloc.c |2 ++
 1 file changed, 2 insertions(+)

--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -6253,6 +6253,8 @@ void __paginginit zero_resv_unavail(void
pgcnt = 0;
for_each_resv_unavail_range(i, &start, &end) {
for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
+   if (!pfn_valid(pfn))
+   continue;
mm_zero_struct_page(pfn_to_page(pfn));
pgcnt++;
}
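The fix skips PFNs that have no backing struct page before zeroing them. The loop shape can be modeled in user space as below. The tiny `memmap` array, the `pfn_has_memmap()` stand-in for pfn_valid() and the `PAGE_SHIFT` of 12 are assumptions for illustration; they do not reflect how the kernel's vmemmap is actually laid out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define NR_PFNS 8

/* Hypothetical 64-byte stand-in for struct page. */
struct fake_page { uint64_t flags; uint64_t data[7]; };

/* Hypothetical memmap: only PFNs marked present have a struct page. */
struct fake_page memmap[NR_PFNS];
bool memmap_present[NR_PFNS];

/* Stand-in for pfn_valid(): is there a struct page for this PFN? */
static bool pfn_has_memmap(uint64_t pfn)
{
    return pfn < NR_PFNS && memmap_present[pfn];
}

/* Zero the struct pages covering [start, end), skipping invalid PFNs.
 * Returns the number of pages zeroed, like pgcnt in the patch. */
uint64_t zero_range(uint64_t start, uint64_t end)
{
    uint64_t pgcnt = 0;

    for (uint64_t pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
        if (!pfn_has_memmap(pfn))
            continue; /* no struct page behind this PFN: skip it */
        memset(&memmap[pfn], 0, sizeof(memmap[pfn]));
        pgcnt++;
    }
    return pgcnt;
}
```

Without the `continue`, a range that is reserved but has no memmap (like the BGRT buffer above) would be dereferenced, which is what the oops shows.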

Re: [PATCH v3 3/3] arm64: dts: meson-axg: add clock DT info for Meson AXG SoC

2017-11-29 Thread Yixun Lan

Hi Stephen

On 11/30/17 03:35, Stephen Boyd wrote:
> On 11/28, Yixun Lan wrote:
>> diff --git a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
>> index b932a784b02a..36a2e98338a8 100644
>> --- a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
>> +++ b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
>> @@ -7,6 +7,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  / {
>>  compatible = "amlogic,meson-axg";
>> @@ -148,6 +149,20 @@
>>  #address-cells = <0>;
>>  };
>>  
>> +hiubus: hiubus@ff63c000 {
> 
> Maybe just call the node "bus@ff63c000"?
> 
Isn't this just a name? What's the benefit of changing it?
Personally, I'd prefer to keep it this way, because it maps better to the
datasheet.

We also have 'aobus' and 'cbus' nodes scattered around there.

>> +compatible = "simple-bus";
>> +reg = <0x0 0xff63c000 0x0 0x1c00>;
>> +#address-cells = <2>;
>> +#size-cells = <2>;
>> +ranges = <0x0 0x0 0x0 0xff63c000 0x0 0x1c00>;
>> +
>> +clkc: clock-controller@0 {
>> +compatible = "amlogic,axg-clkc";
>> +#clock-cells = <1>;
>> +reg = <0x0 0x0 0x0 0x320>;
>> +};
>> +};
>> +
>>  mailbox: mailbox@ff63dc00 {
>>  compatible = "amlogic,meson-gx-mhu", 
>> "amlogic,meson-gxbb-mhu";
>>  reg = <0 0xff63dc00 0 0x400>;
>> -- 
>> 2.15.0
>>
>
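With `ranges = <0x0 0x0 0x0 0xff63c000 0x0 0x1c00>`, child address 0 under the hiubus node maps to CPU address 0xff63c000, so the clock controller's `reg = <0x0 0x0 0x0 0x320>` covers 0xff63c000 to 0xff63c31f, and the 0x1c00-byte window ends at 0xff63dc00, where the mailbox node starts. A minimal sketch of that single-entry translation follows; the struct and function names are hypothetical, and it ignores multi-entry `ranges` and any further translation by the parent bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* One "ranges" entry: child address, parent address, size. With
 * #address-cells = <2> and #size-cells = <2>, each value is two 32-bit
 * cells, combined here into a single 64-bit number. */
struct dt_range {
    uint64_t child;
    uint64_t parent;
    uint64_t size;
};

/* Combine a (high, low) pair of cells into one 64-bit value. */
uint64_t dt_cells64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Translate a child bus address into the parent address space.
 * Returns false if the address is not covered by the range. */
bool dt_translate(const struct dt_range *r, uint64_t child_addr, uint64_t *out)
{
    if (child_addr < r->child || child_addr - r->child >= r->size)
        return false;
    *out = r->parent + (child_addr - r->child);
    return true;
}
```

Because the clkc `reg` is expressed relative to the bus, the same child node could be reused under a bus at a different base by changing only `ranges`.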

[PATCH v7 1/4] KVM: X86: Add vCPU running/preempted state

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

This patch reuses the preempted field in kvm_steal_time to export the
vCPU running/preempted state from the host to the guest. This enables the
guest to send IPIs only to running vCPUs and to set a flag for preempted
vCPUs instead, so that it does not wait for vCPUs that are not running.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/uapi/asm/kvm_para.h | 3 +++
 arch/x86/kernel/kvm.c| 2 +-
 arch/x86/kvm/x86.c   | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 09cc064..763b692 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,9 @@ struct kvm_steal_time {
__u32 pad[11];
 };
 
+#define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
+#define KVM_VCPU_PREEMPTED  (1 << 0)
+
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
__s64 sec;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b40ffbf..6610b92 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -643,7 +643,7 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
 {
struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
 
-   return !!src->preempted;
+   return !!(src->preempted & KVM_VCPU_PREEMPTED);
 }
 PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 990df39..bc501aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2130,7 +2130,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
return;
 
-   vcpu->arch.st.steal.preempted = 0;
+   vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
 
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -2907,7 +2907,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
-   vcpu->arch.st.steal.preempted = 1;
+   vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
 
kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal.preempted,
-- 
2.7.4
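Naming the bits of the `preempted` byte and testing it with a mask, instead of `!!src->preempted`, keeps the guest's check correct once more than one bit is defined (patch 2/4 of this series adds `KVM_VCPU_SHOULD_FLUSH` as bit 1). A small sketch: the three defines are copied from the series, while the two helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_NOT_PREEMPTED (0 << 0)
#define KVM_VCPU_PREEMPTED     (1 << 0)
#define KVM_VCPU_SHOULD_FLUSH  (1 << 1) /* added by patch 2/4 */

/* Old check: any non-zero value counts as "preempted". */
bool is_preempted_old(uint8_t preempted)
{
    return !!preempted;
}

/* New check: only bit 0 means the vCPU is preempted. */
bool is_preempted_new(uint8_t preempted)
{
    return !!(preempted & KVM_VCPU_PREEMPTED);
}
```

A value with only the flush bit set would be misread as "preempted" by the old check but not by the masked one.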

[PATCH v7 2/4] KVM: X86: Add Paravirt TLB Shootdown

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

The remote flushing APIs busy-wait, which is fine in the bare-metal
scenario. Within a guest, however, the target vCPUs might have been
preempted or blocked, in which case the initiating vCPU would end up
busy-waiting for a long time.

This patch set implements paravirtualized TLB flushing that does not wait
for sleeping vCPUs; instead, the sleeping vCPUs flush their TLB on guest
entry.

The best results are achieved when the host is overcommitted by running
multiple vCPUs on each pCPU. In this case, the PV TLB flush avoids touching
vCPUs that are not scheduled and avoids the wait on the initiating CPU.

Tested on a 2-socket Xeon Gold 6142 @ 2.6GHz (32 cores, 64 threads, i.e.
64 pCPUs); each VM has 64 vCPUs.

ebizzy -M
        vanilla    optimized    boost
1VM      46799       48670        4%
2VM      23962       42691       78%
3VM      16152       37539      132%

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/cpuid.txt  |  4 +++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 47 
 3 files changed, 53 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..dcab6dc 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || tlb flush.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 763b692..8fbcc16 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_PV_TLB_FLUSH   9
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -53,6 +54,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
 #define KVM_VCPU_PREEMPTED  (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6610b92..64fb9a4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -498,6 +498,34 @@ static void __init kvm_apf_trap_init(void)
update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
+
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+   const struct flush_tlb_info *info)
+{
+   u8 state;
+   int cpu;
+   struct kvm_steal_time *src;
+   struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
+
+   cpumask_copy(flushmask, cpumask);
+   /*
+* We have to call flush only on online vCPUs. And
+* queue flush_on_enter for pre-empted vCPUs
+*/
+   for_each_cpu(cpu, flushmask) {
+   src = &per_cpu(steal_time, cpu);
+   state = READ_ONCE(src->preempted);
+   if ((state & KVM_VCPU_PREEMPTED)) {
+   if (try_cmpxchg(&src->preempted, &state,
+   state | KVM_VCPU_SHOULD_FLUSH))
+   __cpumask_clear_cpu(cpu, flushmask);
+   }
+   }
+
+   native_flush_tlb_others(flushmask, info);
+}
+
 static void __init kvm_guest_init(void)
 {
int i;
@@ -517,6 +545,9 @@ static void __init kvm_guest_init(void)
pv_time_ops.steal_clock = kvm_steal_clock;
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH))
+   pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
@@ -598,6 +629,22 @@ static __init int activate_jump_labels(void)
 }
 arch_initcall(activate_jump_labels);
 
+static __init int

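The guest side of kvm_flush_tlb_others() above can be modeled in user space with C11 atomics: for each target vCPU marked preempted, atomically add the flush request and drop that vCPU from the IPI mask; the host then consumes the request when the vCPU next enters the guest. In this sketch, `atomic_compare_exchange_strong()` stands in for try_cmpxchg(), a plain bitmask stands in for the cpumask, and `host_vcpu_enter()` is a hypothetical model of the "flush on guest enter" step, not code from this series.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED    (1 << 0)
#define KVM_VCPU_SHOULD_FLUSH (1 << 1)
#define NR_VCPUS 4

/* Per-vCPU "preempted" byte shared between guest and host. */
_Atomic uint8_t preempted[NR_VCPUS];

/*
 * Guest side: return the subset of 'mask' that still needs an IPI.
 * Preempted vCPUs get KVM_VCPU_SHOULD_FLUSH instead of an IPI.
 */
unsigned int pv_flush_mask(unsigned int mask)
{
    unsigned int flushmask = mask;

    for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
        if (!(mask & (1u << cpu)))
            continue;
        uint8_t state = atomic_load(&preempted[cpu]);
        if (state & KVM_VCPU_PREEMPTED) {
            /* Fails if the host changed the byte meanwhile (e.g. the
             * vCPU resumed); the vCPU then keeps its IPI. */
            if (atomic_compare_exchange_strong(&preempted[cpu], &state,
                    (uint8_t)(state | KVM_VCPU_SHOULD_FLUSH)))
                flushmask &= ~(1u << cpu);
        }
    }
    return flushmask;
}

/* Host side (hypothetical model): on guest entry, clear the byte and
 * report whether a deferred TLB flush was requested. */
bool host_vcpu_enter(int cpu)
{
    return atomic_exchange(&preempted[cpu], 0) & KVM_VCPU_SHOULD_FLUSH;
}
```

The compare-and-exchange is what makes skipping the IPI safe: the flush request is recorded only if the vCPU is still marked preempted at that instant.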
[PATCH v7 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush;
later patches will use it to flush only the guest TLB.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  | 14 +++---
 arch/x86/kvm/vmx.c  | 21 +++--
 arch/x86/kvm/x86.c  |  6 +++---
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b97726e..63d34bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -952,7 +952,7 @@ struct kvm_x86_ops {
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-   void (*tlb_flush)(struct kvm_vcpu *vcpu);
+   void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 
void (*run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f3e7f2..14cca8c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -285,7 +285,7 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-static void svm_flush_tlb(struct kvm_vcpu *vcpu);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
@@ -2035,7 +2035,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
 
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 
vcpu->arch.cr4 = cr4;
if (!npt_enabled)
@@ -2385,7 +2385,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 
svm->vmcb->control.nested_cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_NPT);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -2989,7 +2989,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
svm->nested.intercept= nested_vmcb->control.intercept;
 
-   svm_flush_tlb(&svm->vcpu);
+   svm_flush_tlb(&svm->vcpu, true);
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -4785,7 +4785,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
return 0;
 }
 
-static void svm_flush_tlb(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -5076,7 +5076,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 
svm->vmcb->save.cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_CR);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
@@ -5090,7 +5090,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
svm->vmcb->save.cr3 = kvm_read_cr3(vcpu);
mark_dirty(svm->vmcb, VMCB_CR);
 
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static int is_disabled(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b10d203..8c7e816 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4136,9 +4136,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
+   bool invalidate_gpa)
 {
-   if (enable_ept) {
+   if (enable_ept && (invalidate_gpa || !enable_vpid)) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
@@ -4147,15 +4148,15 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
}
 }
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
 }
 
 static void vmx_flush_tlb_ept_only(struct kvm_vcpu *vcpu)
 {
if (enable_ept)
-   vmx_flush_tlb(vcpu);
+   vmx_flush_tlb(vcpu, true);
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
@@

[PATCH v7 4/4] KVM: X86: Add flush_on_enter before guest enter

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

A guest that uses PV TLB flush asks for a flush on enter by setting
KVM_VCPU_SHOULD_FLUSH on a preempted vCPU; in that case, flush the TLB
before entering the guest.
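The host side of this handshake can be modelled in user space. The following is a minimal sketch, not KVM code: it assumes C11 atomics as a stand-in for the kernel's xchg(), and `struct steal_time_model`, `host_vcpu_enter()` and the `guest_tlb_flushes` counter are hypothetical names standing in for `kvm_steal_time.preempted`, the record_steal_time() path and kvm_vcpu_flush_tlb(vcpu, false).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the kvm_para.h hunk of this series. */
#define KVM_VCPU_NOT_PREEMPTED (0 << 0)
#define KVM_VCPU_PREEMPTED     (1 << 0)
#define KVM_VCPU_SHOULD_FLUSH  (1 << 1)

/* Hypothetical stand-in for the guest-shared kvm_steal_time.preempted byte. */
struct steal_time_model {
	_Atomic uint8_t preempted;
};

/* Hypothetical counter standing in for kvm_vcpu_flush_tlb(vcpu, false). */
static int guest_tlb_flushes;

/*
 * Models the record_steal_time() change: atomically reset the state to
 * NOT_PREEMPTED and, if the guest queued a flush while this vCPU was
 * preempted, flush before entering the guest. Returns true if it flushed.
 */
static bool host_vcpu_enter(struct steal_time_model *st)
{
	assert(st != NULL);

	uint8_t old = atomic_exchange(&st->preempted, KVM_VCPU_NOT_PREEMPTED);

	if (old == (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) {
		guest_tlb_flushes++;
		return true;
	}
	return false;
}
```

Because the exchange is unconditional, a queued request is consumed exactly once, and the byte reads NOT_PREEMPTED after every entry whether or not a flush was done.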

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   | 21 ++---
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b943711..8834898 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -601,7 +601,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-(1 << KVM_FEATURE_PV_UNHALT);
+(1 << KVM_FEATURE_PV_UNHALT) |
+(1 << KVM_FEATURE_PV_TLB_FLUSH);
 
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c279530..94c23ae 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2121,6 +2121,12 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
vcpu->arch.pv_time_enabled = false;
 }
 
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+{
+   ++vcpu->stat.tlb_flush;
+   kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
@@ -2130,7 +2136,14 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time
return;
 
-   vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
+	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
+			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) {
+		/*
+		 * Do TLB_FLUSH before entering the guest, its passed
+		 * the stage of request checking
+		 */
+		kvm_vcpu_flush_tlb(vcpu, false);
+	}
 
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -6775,12 +6788,6 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
-{
-   ++vcpu->stat.tlb_flush;
-   kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
-}
-
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
struct page *page = NULL;
-- 
2.7.4

[PATCH v7 2/4] KVM: X86: Add Paravirt TLB Shootdown

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

Remote flushing APIs do a busy wait, which is fine in the bare-metal
scenario. But within a guest, the target vCPUs might have been preempted
or blocked. In this scenario, the initiator vCPU would end up
busy-waiting for a long time.

This patch set implements paravirtualized TLB flushes that do not wait
for vCPUs that are sleeping; instead, all the sleeping vCPUs flush the
TLB on guest enter.

The best result is achieved when the host is overcommitted, running
multiple vCPUs on each pCPU. In this case, PV TLB flush avoids touching
vCPUs that are not scheduled and avoids the wait on the initiating CPU.
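The guest-side filtering step can be sketched in user space as well. This is a simplified model, not the patch itself: it assumes at most 64 vCPUs so that a `uint64_t` can play the role of the cpumask, `preempted_state[]`, `MODEL_NR_CPUS` and `pv_filter_flush_mask()` are hypothetical names, and C11 `atomic_compare_exchange_strong()` stands in for the kernel's try_cmpxchg(). It returns the vCPUs that would still need an IPI-based flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED    (1 << 0)
#define KVM_VCPU_SHOULD_FLUSH (1 << 1)

/* Hypothetical model size: one bit per vCPU in a uint64_t mask. */
#define MODEL_NR_CPUS 64
static_assert(MODEL_NR_CPUS <= 64, "the mask is a uint64_t");

/* Hypothetical per-vCPU copies of kvm_steal_time.preempted. */
static _Atomic uint8_t preempted_state[MODEL_NR_CPUS];

/*
 * Models kvm_flush_tlb_others(): preempted vCPUs that accept the
 * SHOULD_FLUSH flag are removed from the mask, because the host will flush
 * them before their next entry. Returns the vCPUs that still need an IPI.
 */
static uint64_t pv_filter_flush_mask(uint64_t mask)
{
	uint64_t flushmask = mask;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(mask & bit))
			continue;

		uint8_t state = atomic_load(&preempted_state[cpu]);

		if (state & KVM_VCPU_PREEMPTED) {
			/* Fails if the state changed after the load, e.g. the vCPU was scheduled in. */
			if (atomic_compare_exchange_strong(&preempted_state[cpu], &state,
					(uint8_t)(state | KVM_VCPU_SHOULD_FLUSH)))
				flushmask &= ~bit;
		}
	}
	return flushmask;
}
```

A vCPU whose compare-and-exchange fails simply stays in the mask, so the fallback is always the ordinary IPI flush rather than a missed flush.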

Testing on a Xeon Gold 6142 2.6GHz, 2 sockets, 32 cores, 64 threads,
so 64 pCPUs; each VM has 64 vCPUs.

ebizzy -M
        vanilla    optimized    boost
1VM      46799        48670       4%
2VM      23962        42691      78%
3VM      16152        37539     132%
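The boost column appears to be the relative gain of optimized over vanilla, rounded to a whole percent; that reading reproduces all three rows. A small sketch of the arithmetic (`boost_percent()` is a hypothetical helper, not part of the patch):

```c
#include <assert.h>

/*
 * Relative gain of `optimized` over `vanilla`, in percent, rounded to the
 * nearest integer with integer arithmetic (valid for positive inputs).
 */
static long boost_percent(long vanilla, long optimized)
{
	assert(vanilla > 0 && optimized >= 0);
	return (optimized * 100 + vanilla / 2) / vanilla - 100;
}
```

For example, the 2VM row gives (42691 * 100 + 11981) / 23962 - 100 = 178 - 100 = 78.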

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/cpuid.txt  |  4 +++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 47 
 3 files changed, 53 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..dcab6dc 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || tlb flush.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 763b692..8fbcc16 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_PV_TLB_FLUSH   9
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -53,6 +54,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
 #define KVM_VCPU_PREEMPTED  (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6610b92..64fb9a4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -498,6 +498,34 @@ static void __init kvm_apf_trap_init(void)
update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
+
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+   const struct flush_tlb_info *info)
+{
+	u8 state;
+	int cpu;
+	struct kvm_steal_time *src;
+	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
+
+	cpumask_copy(flushmask, cpumask);
+	/*
+	 * We have to call flush only on online vCPUs. And
+	 * queue flush_on_enter for pre-empted vCPUs
+	 */
+	for_each_cpu(cpu, flushmask) {
+		src = &per_cpu(steal_time, cpu);
+		state = READ_ONCE(src->preempted);
+		if ((state & KVM_VCPU_PREEMPTED)) {
+			if (try_cmpxchg(&src->preempted, &state,
+					state | KVM_VCPU_SHOULD_FLUSH))
+				__cpumask_clear_cpu(cpu, flushmask);
+		}
+	}
+
+	native_flush_tlb_others(flushmask, info);
+}
+
 static void __init kvm_guest_init(void)
 {
int i;
@@ -517,6 +545,9 @@ static void __init kvm_guest_init(void)
pv_time_ops.steal_clock = kvm_steal_clock;
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH))
+   pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
@@ -598,6 +629,22 @@ static __init int activate_jump_labels(void)
 }
 arch_initcall(activate_jump_labels);
 
+static __init int kvm_setup_pv_tlb_flush(void)
+{
+   int cpu;
+
+   if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH)) {
+

[PATCH v7 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush

2017-11-29 Thread Wanpeng Li

From: Wanpeng Li 

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush;
later patches will use it to flush only the guest TLB.
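On VMX, the new argument only changes which branch __vmx_flush_tlb() takes. The following is a minimal truth-table sketch of the condition added to that function in this patch; `vmx_flush_choice()` and the enum names are hypothetical, the `VALID_PAGE(root_hpa)` early return is left out, and the non-EPT branch lies outside the quoted hunk (mainline code of this period calls vpid_sync_context() there, but treat that as an assumption).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two branches of __vmx_flush_tlb(). */
enum vmx_flush_kind {
	FLUSH_EPT_CONTEXT, /* ept_sync_context(): also drops guest-physical mappings */
	FLUSH_OTHER_PATH,  /* the else branch, not shown in the hunk */
};

/* Mirrors: if (enable_ept && (invalidate_gpa || !enable_vpid)) */
static enum vmx_flush_kind vmx_flush_choice(bool enable_ept, bool enable_vpid,
					    bool invalidate_gpa)
{
	if (enable_ept && (invalidate_gpa || !enable_vpid))
		return FLUSH_EPT_CONTEXT;
	return FLUSH_OTHER_PATH;
}
```

With both EPT and VPID enabled, a caller passing invalidate_gpa = false (as the flush-on-enter path does) skips the EPT context sync; without VPID the EPT sync is still issued, presumably because no narrower flush is available. All existing callers in this patch pass true, so their behaviour is unchanged.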

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Peter Zijlstra 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  | 14 +++---
 arch/x86/kvm/vmx.c  | 21 +++--
 arch/x86/kvm/x86.c  |  6 +++---
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b97726e..63d34bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -952,7 +952,7 @@ struct kvm_x86_ops {
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-   void (*tlb_flush)(struct kvm_vcpu *vcpu);
+   void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 
void (*run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f3e7f2..14cca8c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -285,7 +285,7 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-static void svm_flush_tlb(struct kvm_vcpu *vcpu);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
@@ -2035,7 +2035,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
 
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 
vcpu->arch.cr4 = cr4;
if (!npt_enabled)
@@ -2385,7 +2385,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 
svm->vmcb->control.nested_cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_NPT);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -2989,7 +2989,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
svm->nested.intercept            = nested_vmcb->control.intercept;
 
-   svm_flush_tlb(&svm->vcpu);
+   svm_flush_tlb(&svm->vcpu, true);
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -4785,7 +4785,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
return 0;
 }
 
-static void svm_flush_tlb(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -5076,7 +5076,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 
svm->vmcb->save.cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_CR);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
@@ -5090,7 +5090,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
svm->vmcb->save.cr3 = kvm_read_cr3(vcpu);
mark_dirty(svm->vmcb, VMCB_CR);
 
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static int is_disabled(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b10d203..8c7e816 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4136,9 +4136,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
+   bool invalidate_gpa)
 {
-   if (enable_ept) {
+   if (enable_ept && (invalidate_gpa || !enable_vpid)) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
@@ -4147,15 +4148,15 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
}
 }
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
 }
 
 static void vmx_flush_tlb_ept_only(struct kvm_vcpu *vcpu)
 {
if (enable_ept)
-   vmx_flush_tlb(vcpu);
+   vmx_flush_tlb(vcpu, true);
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
@@ -4353,7 +4354,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
