[PATCH RESEND] i2c: mediatek: disable zero-length transfers for mt8183

2019-08-21 Thread Hsin-Yi Wang
When doing i2cdetect in quick write mode, we get transfer error
ENOMEM, and i2cdetect shows there is no device at the address.
Quoting from the mt8183 datasheet, the number of transfers to be
transferred in one transaction should be set to a value bigger than 1,
so forbid zero-length transfers and update the reported functionality.

Incorrect return:
localhost ~ # i2cdetect -q -y 0
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

After this patch:
localhost ~ #  i2cdetect -q -y 0
Error: Can't use SMBus Quick Write command on this bus

localhost ~ #  i2cdetect -y 0
Warning: Can't use SMBus Quick Write command, will skip some addresses
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:
10:
20:
30: -- -- -- -- -- -- -- --
40:
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60:
70:

Reported-by: Alexandru M Stan 
Signed-off-by: Hsin-Yi Wang 
---
Previous patch and discussion:
http://patchwork.ozlabs.org/patch/1042684/
---
 drivers/i2c/busses/i2c-mt65xx.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 252edb433fdf..2842ca4b8c3b 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -234,6 +234,10 @@ static const struct i2c_adapter_quirks mt7622_i2c_quirks = {
.max_num_msgs = 255,
 };
 
+static const struct i2c_adapter_quirks mt8183_i2c_quirks = {
+   .flags = I2C_AQ_NO_ZERO_LEN,
+};
+
 static const struct mtk_i2c_compatible mt2712_compat = {
.regs = mt_i2c_regs_v1,
.pmic_i2c = 0,
@@ -298,6 +302,7 @@ static const struct mtk_i2c_compatible mt8173_compat = {
 };
 
 static const struct mtk_i2c_compatible mt8183_compat = {
+   .quirks = &mt8183_i2c_quirks,
.regs = mt_i2c_regs_v2,
.pmic_i2c = 0,
.dcm = 0,
@@ -870,7 +875,11 @@ static irqreturn_t mtk_i2c_irq(int irqno, void *dev_id)
 
 static u32 mtk_i2c_functionality(struct i2c_adapter *adap)
 {
-   return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
+   if (adap->quirks->flags & I2C_AQ_NO_ZERO_LEN)
+   return I2C_FUNC_I2C |
+   (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK);
+   else
+   return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
 }
 
 static const struct i2c_algorithm mtk_i2c_algorithm = {
@@ -933,8 +942,8 @@ static int mtk_i2c_probe(struct platform_device *pdev)
i2c->dev = &pdev->dev;
i2c->adap.dev.parent = &pdev->dev;
i2c->adap.owner = THIS_MODULE;
-   i2c->adap.algo = &mtk_i2c_algorithm;
i2c->adap.quirks = i2c->dev_comp->quirks;
+   i2c->adap.algo = &mtk_i2c_algorithm;
i2c->adap.timeout = 2 * HZ;
i2c->adap.retries = 1;
 
-- 
2.20.1



Re: [PATCH] mm: consolidate pgtable_cache_init() and pgd_cache_init()

2019-08-21 Thread Mike Rapoport
On Wed, Aug 21, 2019 at 06:17:12PM +0200, Marc Gonzalez wrote:
> On 21/08/2019 17:06, Mike Rapoport wrote:
> 
> > Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
> > cache for page table allocations on several architectures that do not use
> > PAGE_SIZE tables for one or more levels of the page table hierarchy.
> > 
> > Most architectures do not implement these functions and use __week default
> 
> s/week/weak  ?

Sure, thanks!

-- 
Sincerely yours,
Mike.



[linux-next][PPC][bisected c7d8b7][gcc 6.4.1] build error at drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471

2019-08-21 Thread Abdul Haleem
Greetings,

Today's linux-next kernel 5.3.0-rc5-next-20190820 failed to build on my
powerpc machine

Build errors:
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c: In function 'amdgpu_exit':
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471:2: error: implicit
declaration of function 'mmu_notifier_synchronize'
[-Werror=implicit-function-declaration]
  mmu_notifier_synchronize();
  ^~~~ 
cc1: some warnings being treated as errors
make[4]: *** [drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o] Error 1
make[3]: *** [drivers/gpu/drm/amd/amdgpu] Error 2

It was introduced with commit c7d8b7 (hmm: use mmu_notifier_get/put for
'struct hmm')

The error disappears when the commit is reverted.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [PATCH v1 06/63] Input: atmel_mxt_ts - output status from T42 Touch Suppression

2019-08-21 Thread Jiada Wang

Hi

On 2019/08/17 2:34, Dmitry Torokhov wrote:

On Fri, Aug 16, 2019 at 05:30:33PM +0900, Jiada Wang wrote:

From: Nick Dyer 

Signed-off-by: Nick Dyer 
Acked-by: Benson Leung 
Acked-by: Yufeng Shen 
(cherry picked from ndyer/linux/for-upstream commit 
ab95b5a30d2c098daaa9f88d9fcfae7eb516)
Signed-off-by: George G. Davis 
Signed-off-by: Jiada Wang 
---
  drivers/input/touchscreen/atmel_mxt_ts.c | 25 
  1 file changed, 25 insertions(+)

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c 
b/drivers/input/touchscreen/atmel_mxt_ts.c
index a75c35c6f9f9..9226ec528adf 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -155,6 +155,9 @@ struct t37_debug {
  #define MXT_RESET_VALUE   0x01
  #define MXT_BACKUP_VALUE  0x55
  
+/* Define for MXT_PROCI_TOUCHSUPPRESSION_T42 */

+#define MXT_T42_MSG_TCHSUP BIT(0)
+
  /* T100 Multiple Touch Touchscreen */
  #define MXT_T100_CTRL 0
  #define MXT_T100_CFG1 1
@@ -323,6 +326,8 @@ struct mxt_data {
u8 T9_reportid_max;
u16 T18_address;
u8 T19_reportid;
+   u8 T42_reportid_min;
+   u8 T42_reportid_max;
u16 T44_address;
u8 T48_reportid;
u8 T100_reportid_min;
@@ -978,6 +983,17 @@ static void mxt_proc_t100_message(struct mxt_data *data, 
u8 *message)
data->update_input = true;
  }
  
+static void mxt_proc_t42_messages(struct mxt_data *data, u8 *msg)

+{
+   struct device *dev = &data->client->dev;
+   u8 status = msg[1];
+
+   if (status & MXT_T42_MSG_TCHSUP)
+   dev_info(dev, "T42 suppress\n");
+   else
+   dev_info(dev, "T42 normal\n");


dev_dbg(). There is no need to flood the logs with this. I'd assume this
is for assisting in bringup. Should there be some more generic way of
monitoring the status?


will replace with dev_dbg() in v2 patchset

thanks,
Jiada


RE: [RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-21 Thread Alastair D'Silva
On Thu, 2019-08-22 at 07:06 +0200, Christophe Leroy wrote:
> 
> Le 22/08/2019 à 02:27, Alastair D'Silva a écrit :
> > On Wed, 2019-08-21 at 22:27 +0200, Christophe Leroy wrote:
> > > Le 20/08/2019 à 06:36, Alastair D'Silva a écrit :
> > > > On Fri, 2019-08-16 at 15:52 +, Christophe Leroy wrote:
> > > 
> > > [...]
> > > 
> > > > Thanks Christophe,
> > > > 
> > > > I'm trying a somewhat different approach that requires less
> > > > knowledge
> > > > of assembler. Handling of CPU_FTR_COHERENT_ICACHE is outside
> > > > this
> > > > function. The code below is not a patch as my tree is a bit
> > > > messy,
> > > > sorry:
> > > 
> > > Can we be 100% sure that GCC won't add any code accessing some
> > > global
> > > data or stack while the Data MMU is OFF ?
> > > 
> > > Christophe
> > > 
> > 
> > +mpe
> > 
> > I'm not sure how we would go about making such a guarantee, but
> > I've
> > tied every variable used to a register and addr is passed in a
> > register, so there is no stack usage, and every call in there only
> > operates on its operands.
> > 
> > The calls to the inline cache helpers (for the PPC32 case) are all
> > constants, so I can't see a reasonable scenario where there would
> > be a
> > function call and reordered to after the DR bit is turned off, but
> > I
> > guess if we want to be paranoid, we could always add an mb() call
> > before the DR bit is manipulated to prevent the compiler from
> > reordering across the section where the data MMU is disabled.
> > 
> > 
> 
> Anyway, I think the benefit of converting that function to C is
> pretty 
> small. flush_dcache_range() and friends were converted to C mainly
> in 
> order to inline them. But this __flush_dcache_icache_phys() is too
> big 
> to be worth inlining, yet small and stable enough to remain in
> assembly 
> for the time being.
> 
I disagree on this point: after converting it to C, using
44x/currituck.defconfig, the compiler definitely will inline it (noting
that there is only 1 caller of it):

0134 :
 134:   94 21 ff f0 stwur1,-16(r1)
 138:   3d 20 00 00 lis r9,0
 13c:   81 29 00 00 lwz r9,0(r9)
 140:   7c 08 02 a6 mflrr0
 144:   38 81 00 0c addir4,r1,12
 148:   90 01 00 14 stw r0,20(r1)
 14c:   91 21 00 0c stw r9,12(r1)
 150:   48 00 00 01 bl  150 
 154:   39 00 00 20 li  r8,32
 158:   39 43 10 00 addir10,r3,4096
 15c:   7c 69 1b 78 mr  r9,r3
 160:   7d 09 03 a6 mtctr   r8
 164:   7c 00 48 6c dcbst   0,r9
 168:   39 29 00 80 addir9,r9,128
 16c:   42 00 ff f8 bdnz164 
 170:   7c 00 04 ac hwsync
 174:   7c 69 1b 78 mr  r9,r3
 178:   7c 00 4f ac icbi0,r9
 17c:   39 29 00 80 addir9,r9,128
 180:   7f 8a 48 40 cmplw   cr7,r10,r9
 184:   40 9e ff f4 bne cr7,178 
 188:   7c 00 04 ac hwsync
 18c:   4c 00 01 2c isync
 190:   80 01 00 14 lwz r0,20(r1)
 194:   38 21 00 10 addir1,r1,16
 198:   7c 08 03 a6 mtlrr0
 19c:   48 00 00 00 b   19c 


> So I suggest you keep it aside your series for now, just move 
> PURGE_PREFETCHED_INS inside it directly as it will be the only
> remaining 
> user of it.
> 
> Christophe

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



[linux-next][PPC][bisected c7d8b7][gcc 6.4.1] build error at drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471

2019-08-21 Thread Abdul Haleem
Greetings,

Today's linux-next kernel 5.3.0-rc5-next-20190820 failed to build on my
powerpc machine

Build errors:
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c: In function 'amdgpu_exit':
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1471:2: error: implicit
declaration of function 'mmu_notifier_synchronize'
[-Werror=implicit-function-declaration]
  mmu_notifier_synchronize();
  ^~~~ 
cc1: some warnings being treated as errors
make[4]: *** [drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o] Error 1
make[3]: *** [drivers/gpu/drm/amd/amdgpu] Error 2

It was introduced with commit c7d8b7 (hmm: use mmu_notifier_get/put for
'struct hmm')

Reverting the commit fixes the issue.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





Re: [PATCH v3 1/2] dt-bindings: phy: intel-emmc-phy: Add YAML schema for LGM eMMC PHY

2019-08-21 Thread Ramuthevar, Vadivel MuruganX

Hi Rob,

On 21/8/2019 9:35 PM, Rob Herring wrote:

On Wed, Aug 21, 2019 at 5:11 AM Ramuthevar,Vadivel MuruganX
 wrote:

From: Ramuthevar Vadivel Murugan 

Add a YAML schema to use the host controller driver with the
eMMC PHY on Intel's Lightning Mountain SoC.

Signed-off-by: Ramuthevar Vadivel Murugan 

---
changes in v3:
   - resolve 'make dt_binding_check' warnings

changes in v2:
   As per Rob Herring review comments, the following updates
  - change GPL-2.0 -> (GPL-2.0-only OR BSD-2-Clause)
  - filename is the compatible string plus .yaml
  - LGM: Lightning Mountain
  - update maintainer
  - add intel,syscon under property list
  - keep one example instead of two
---
  .../bindings/phy/intel,lgm-emmc-phy.yaml   | 59 ++
  1 file changed, 59 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml

diff --git a/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml 
b/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml
new file mode 100644
index ..9342e33d8b02
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/intel,lgm-emmc-phy.yaml
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/phy/intel,lgm-emmc-phy.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Intel Lightning Mountain(LGM) eMMC PHY Device Tree Bindings
+
+maintainers:
+  - Ramuthevar Vadivel Murugan 
+
+properties:
+  "#phy-cells":
+const: 0
+
+  compatible:
+const: intel,lgm-emmc-phy
+
+  reg:
+maxItems: 1
+
+  syscon:

intel,syscon like the example. You must have used 5.2 as on 5.3-rc the
example will fail validation.
Thanks for the review comments. I used 5.3 for validation; after
addressing the below comments, I will once again validate on both
5.2 and 5.3 as well.

+items:

Drop items as there is only 1.

agreed

+  $ref: "/schemas/types.yaml#/definitions/phandle"
+
+  clocks:
+items:
+  - description: e-MMC phy module clock
+
+  clock-names:
+items:
+  - const: emmcclk
+
+  resets:
+maxItems: 1
+
+required:
+  - "#phy-cells"
+  - compatible
+  - reg
+  - clocks
+  - clock-names
+  - resets
+  - ref

Not documented.


Agreed, will update

With Best Regards
Vadivel

+
+additionalProperties: false
+
+examples:
+  - |
+emmc_phy: emmc_phy {
+compatible = "intel,lgm-emmc-phy";
+reg = <0xe002 0x100>;
+intel,syscon = <>;
+clocks = <>;
+clock-names = "emmcclk";
+#phy-cells = <0>;
+};
+
+...
--
2.11.0



Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 06:27 +0200, h...@lst.de wrote:
> On Thu, Aug 22, 2019 at 04:01:24AM +, Atish Patra wrote:
> > The downside of this is that for every !cmask case in true SMP
> > (more
> > common probably) it will execute 2 extra cpumask instructions. As
> > tlbflush path is in performance critical path, I think we should
> > favor
> > more common case (SMP with more than 1 core).
> 
> Actually, looking at both the current mainline code and the code from
> my cleanups tree, I don't think remote_sfence_vma / __sbi_tlb_flush_range
> can ever be called with a NULL cpumask, as we always have a valid mm.
> 

Yes. You are correct.

As both cpumask functions here will crash if the cpumask is NULL, we
should probably leave a harmless comment to warn about the consequence
of the cpumask being NULL.

> So this is a bit of a moot point, and we can drop handling that case
> entirely.  With that we can also use a simple if / else for the local
> cpu only vs remote case. 

Done.

>  Btw, what was the reason you didn't like
> using cpumask_any_but like x86, which should be more efficient than
> cpumask_test_cpu + hweight?

I had it in the v2 patch but removed it, as it can potentially return a
garbage value if the cpumask is empty.

However, we are already checking for an empty cpumask before the local
cpu check. I will replace cpumask_test_cpu + hweight with
cpumask_any_but().
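
For illustration, a minimal sketch of that direction (the helper name
local_flush_tlb_range() and the exact sbi_remote_sfence_vma() calling
convention are illustrative here, not the final patch):

static void __sbi_tlb_flush_range(struct cpumask *cmask,
				  unsigned long start, unsigned long size)
{
	unsigned int cpuid = get_cpu();

	/*
	 * cpumask_any_but() returns >= nr_cpu_ids when no CPU other
	 * than the current one is set in the mask, so the flush can
	 * stay local and the SBI call is avoided.
	 */
	if (cpumask_any_but(cmask, cpuid) >= nr_cpu_ids)
		local_flush_tlb_range(start, size);
	else
		sbi_remote_sfence_vma(cpumask_bits(cmask), start, size);

	put_cpu();
}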

-- 
Regards,
Atish


[PATCH] x86/Hyper-V: Fix build error with CONFIG_HYPERV_TSCPAGE=N

2019-08-21 Thread lantianyu1986
From: Tianyu Lan 

Both the Hyper-V tsc page and the Hyper-V tsc MSR code use the variable
hv_sched_clock_offset for their sched clock callbacks, so define the
variable regardless of the CONFIG_HYPERV_TSCPAGE setting.

Signed-off-by: Tianyu Lan 
---
This patch is based on top of
"git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
timers/core".

 drivers/clocksource/hyperv_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c 
b/drivers/clocksource/hyperv_timer.c
index dad8af198e20..c322ab4d3689 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -22,6 +22,7 @@
 #include 
 
 static struct clock_event_device __percpu *hv_clock_event;
+static u64 hv_sched_clock_offset __ro_after_init;
 
 /*
  * If false, we're using the old mechanism for stimer0 interrupts
@@ -215,7 +216,6 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 #ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
-static u64 hv_sched_clock_offset __ro_after_init;
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-- 
2.14.5



Re: [PATCH v2] fs: fs_parser: avoid NULL param->string to kstrtouint

2019-08-21 Thread Al Viro
On Wed, Aug 21, 2019 at 09:22:49PM -0700, Eric Biggers wrote:
> > > diff --git a/fs/fs_parser.c b/fs/fs_parser.c
> > > index 83b66c9e9a24..7498a44f18c0 100644
> > > --- a/fs/fs_parser.c
> > > +++ b/fs/fs_parser.c
> > > @@ -206,6 +206,9 @@ int fs_parse(struct fs_context *fc,
> > >   case fs_param_is_fd: {
> > >   switch (param->type) {
> > >   case fs_value_is_string:
> > > + if (!result->has_value)
> > > + goto bad_value;
> > > +
> > >   ret = kstrtouint(param->string, 0, &result->uint_32);
> > >   break;
> > >   case fs_value_is_file:
> > > -- 
> > > 2.17.1
> > 
> > Reviewed-by: Eric Biggers 
> > 
> > Al, can you please apply this patch?
> > 
> > - Eric
> 
> Ping.  Al, when are you going to apply this?

Sits in the local queue.  Sorry, got seriously sidetracked into
configfs mess lately, will update for-next tomorrow and push
it out.


Re: [PATCH v1 04/63] Input: atmel_mxt_ts - split large i2c transfers into blocks

2019-08-21 Thread Jiada Wang

Hi Dmitry

On 2019/08/17 2:18, Dmitry Torokhov wrote:

On Fri, Aug 16, 2019 at 05:28:53PM +0900, Jiada Wang wrote:

From: Nick Dyer 

On some firmware variants, the size of the info block exceeds what can
be read in a single transfer.

Signed-off-by: Nick Dyer 
(cherry picked from ndyer/linux/for-upstream commit 
74c4f5277cfa403d43fafc404119dc57a08677db)
[gdavis: Forward port and fix conflicts due to v4.14.51 commit
 960fe000b1d3 ("Input: atmel_mxt_ts - fix the firmware
 update").]
Signed-off-by: George G. Davis 
Signed-off-by: Jiada Wang 
---
  drivers/input/touchscreen/atmel_mxt_ts.c | 27 +---
  1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c 
b/drivers/input/touchscreen/atmel_mxt_ts.c
index 9b165d23e092..2d70ddf71cd9 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -40,7 +40,7 @@
  #define MXT_OBJECT_START  0x07
  #define MXT_OBJECT_SIZE   6
  #define MXT_INFO_CHECKSUM_SIZE3
-#define MXT_MAX_BLOCK_WRITE256
+#define MXT_MAX_BLOCK_WRITE255
  
  /* Object types */

  #define MXT_DEBUG_DIAGNOSTIC_T37  37
@@ -659,6 +659,27 @@ static int __mxt_read_reg(struct i2c_client *client,
return ret;
  }
  
+static int mxt_read_blks(struct mxt_data *data, u16 start, u16 count, u8 *buf)


Can we call this __mxt_read_reg() and the original read reg call
__mxt_read_chunk()?


yes, I will update in v2 patch-set,
so that every call to __mxt_read_reg() in atmel driver,
can have the feature to split large size transfer.

Thanks,
Jiada


+{
+   u16 offset = 0;
+   int error;
+   u16 size;
+
+   while (offset < count) {
+   size = min(MXT_MAX_BLOCK_WRITE, count - offset);
+
+   error = __mxt_read_reg(data->client,
+  start + offset,
+  size, buf + offset);
+   if (error)
+   return error;
+
+   offset += size;
+   }
+
+   return 0;
+}


Thanks.



[PATCH net-next,v4, 4/6] net/mlx5: Add HV VHCA infrastructure

2019-08-21 Thread Haiyang Zhang
From: Eran Ben Elisha 

HV VHCA is a layer which provides PF to VF communication channel based on
HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.

The infrastructure is agent based. Each agent will be responsible for
contiguous buffer blocks in the VHCA config space. This infrastructure
will bind agents to their blocks, and those agents can only read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. The
invalidate callback will be invoked if one of the blocks assigned to
this agent was invalidated. Cleanup will be invoked before the agent is
freed, in order to clean up all of its open resources or deferred work.

Block-0 serves as the control block. All execution commands from the PF
will be written over this block. The VF will ack those by writing to
block-0 as well. Its format is described by the struct
mlx5_hv_vhca_control_block layout.
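
For illustration, a hypothetical agent built on this infrastructure (the
create call matches the API added by this patch; the agent type and the
callback bodies are made-up placeholders):

static void my_agent_control(struct mlx5_hv_vhca_agent *agent,
			     struct mlx5_hv_vhca_control_block *block)
{
	/* The PF wrote a command into block-0 that concerns this agent. */
}

static void my_agent_invalidate(struct mlx5_hv_vhca_agent *agent,
				u64 block_mask)
{
	/* One of the blocks assigned to this agent was invalidated. */
}

static void my_agent_cleanup(struct mlx5_hv_vhca_agent *agent)
{
	/* Release this agent's resources and cancel deferred work. */
}

static int my_agent_register(struct mlx5_hv_vhca *hv_vhca, void *priv)
{
	struct mlx5_hv_vhca_agent *agent;

	/* MLX5_HV_VHCA_AGENT_STATS stands in for a real agent type */
	agent = mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_STATS,
					  my_agent_control,
					  my_agent_invalidate,
					  my_agent_cleanup, priv);
	if (IS_ERR_OR_NULL(agent))
		return -EINVAL;

	return 0;
}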

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Haiyang Zhang 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 253 +
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  | 102 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   7 +
 include/linux/mlx5/driver.h|   2 +
 5 files changed, 365 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fd32a5b..8d443fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -45,7 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o 
eswitch_offloads.o eswitch_offlo
 mlx5_core-$(CONFIG_MLX5_MPFS)  += lib/mpfs.o
 mlx5_core-$(CONFIG_VXLAN)  += lib/vxlan.o
 mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
-mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o lib/hv_vhca.o
 
 #
 # Ipoib netdev
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
new file mode 100644
index 000..84d1d75
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include 
+#include "mlx5_core.h"
+#include "lib/hv.h"
+#include "lib/hv_vhca.h"
+
+struct mlx5_hv_vhca {
+   struct mlx5_core_dev   *dev;
+   struct workqueue_struct*work_queue;
+   struct mlx5_hv_vhca_agent  *agents[MLX5_HV_VHCA_AGENT_MAX];
+   struct mutex agents_lock; /* Protect agents array */
+};
+
+struct mlx5_hv_vhca_work {
+   struct work_struct invalidate_work;
+   struct mlx5_hv_vhca   *hv_vhca;
+   u64 block_mask;
+};
+
+struct mlx5_hv_vhca_data_block {
+   u16 sequence;
+   u16 offset;
+   u8  reserved[4];
+   u64 data[15];
+};
+
+struct mlx5_hv_vhca_agent {
+   enum mlx5_hv_vhca_agent_type type;
+   struct mlx5_hv_vhca *hv_vhca;
+   void *priv;
+   u16  seq;
+   void (*control)(struct mlx5_hv_vhca_agent *agent,
+   struct mlx5_hv_vhca_control_block *block);
+   void (*invalidate)(struct mlx5_hv_vhca_agent *agent,
+  u64 block_mask);
+   void (*cleanup)(struct mlx5_hv_vhca_agent *agent);
+};
+
+struct mlx5_hv_vhca *mlx5_hv_vhca_create(struct mlx5_core_dev *dev)
+{
+   struct mlx5_hv_vhca *hv_vhca = NULL;
+
+   hv_vhca = kzalloc(sizeof(*hv_vhca), GFP_KERNEL);
+   if (!hv_vhca)
+   return ERR_PTR(-ENOMEM);
+
+   hv_vhca->work_queue = create_singlethread_workqueue("mlx5_hv_vhca");
+   if (!hv_vhca->work_queue) {
+   kfree(hv_vhca);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   hv_vhca->dev = dev;
+   mutex_init(&hv_vhca->agents_lock);
+
+   return hv_vhca;
+}
+
+void mlx5_hv_vhca_destroy(struct mlx5_hv_vhca *hv_vhca)
+{
+   if (IS_ERR_OR_NULL(hv_vhca))
+   return;
+
+   destroy_workqueue(hv_vhca->work_queue);
+   kfree(hv_vhca);
+}
+
+static void mlx5_hv_vhca_invalidate_work(struct work_struct *work)
+{
+   struct mlx5_hv_vhca_work *hwork;
+   struct mlx5_hv_vhca *hv_vhca;
+   int i;
+
+   hwork = container_of(work, struct mlx5_hv_vhca_work, invalidate_work);
+   

[PATCH net-next,v4, 5/6] net/mlx5: Add HV VHCA control agent

2019-08-21 Thread Haiyang Zhang
From: Eran Ben Elisha 

The control agent is responsible for the control block (ID 0). It should
update the PF via this block about every capability change. In addition,
upon a block-0 invalidate, it should activate all other supported agents
with data requests from the PF.

Upon agent create/destroy, the invalidate callback of the control agent
is called in order to update the PF driver about this change.

The control agent is an integral part of HV VHCA and will be created
and destroyed as part of the HV VHCA init/cleanup flow.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Haiyang Zhang 
---
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 122 -
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  |   1 +
 2 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
index 84d1d75..4047629 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -109,22 +109,131 @@ void mlx5_hv_vhca_invalidate(void *context, u64 
block_mask)
queue_work(hv_vhca->work_queue, &hwork->invalidate_work);
 }
 
+#define AGENT_MASK(type) (type ? BIT(type - 1) : 0 /* control */)
+
+static void mlx5_hv_vhca_agents_control(struct mlx5_hv_vhca *hv_vhca,
+   struct mlx5_hv_vhca_control_block 
*block)
+{
+   int i;
+
+   for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
+   struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
+
+   if (!agent || !agent->control)
+   continue;
+
+   if (!(AGENT_MASK(agent->type) & block->control))
+   continue;
+
+   agent->control(agent, block);
+   }
+}
+
+static void mlx5_hv_vhca_capabilities(struct mlx5_hv_vhca *hv_vhca,
+ u32 *capabilities)
+{
+   int i;
+
+   for (i = 0; i < MLX5_HV_VHCA_AGENT_MAX; i++) {
+   struct mlx5_hv_vhca_agent *agent = hv_vhca->agents[i];
+
+   if (agent)
+   *capabilities |= AGENT_MASK(agent->type);
+   }
+}
+
+static void
+mlx5_hv_vhca_control_agent_invalidate(struct mlx5_hv_vhca_agent *agent,
+ u64 block_mask)
+{
+   struct mlx5_hv_vhca *hv_vhca = agent->hv_vhca;
+   struct mlx5_core_dev *dev = hv_vhca->dev;
+   struct mlx5_hv_vhca_control_block *block;
+   u32 capabilities = 0;
+   int err;
+
+   block = kzalloc(sizeof(*block), GFP_KERNEL);
+   if (!block)
+   return;
+
+   err = mlx5_hv_read_config(dev, block, sizeof(*block), 0);
+   if (err)
+   goto free_block;
+
+   mlx5_hv_vhca_capabilities(hv_vhca, &capabilities);
+
+   /* In case no capabilities, send empty block in return */
+   if (!capabilities) {
+   memset(block, 0, sizeof(*block));
+   goto write;
+   }
+
+   if (block->capabilities != capabilities)
+   block->capabilities = capabilities;
+
+   if (block->control & ~capabilities)
+   goto free_block;
+
+   mlx5_hv_vhca_agents_control(hv_vhca, block);
+   block->command_ack = block->command;
+
+write:
+   mlx5_hv_write_config(dev, block, sizeof(*block), 0);
+
+free_block:
+   kfree(block);
+}
+
+static struct mlx5_hv_vhca_agent *
+mlx5_hv_vhca_control_agent_create(struct mlx5_hv_vhca *hv_vhca)
+{
+   return mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_CONTROL,
+NULL,
+mlx5_hv_vhca_control_agent_invalidate,
+NULL, NULL);
+}
+
+static void mlx5_hv_vhca_control_agent_destroy(struct mlx5_hv_vhca_agent 
*agent)
+{
+   mlx5_hv_vhca_agent_destroy(agent);
+}
+
 int mlx5_hv_vhca_init(struct mlx5_hv_vhca *hv_vhca)
 {
+   struct mlx5_hv_vhca_agent *agent;
+   int err;
+
if (IS_ERR_OR_NULL(hv_vhca))
return IS_ERR_OR_NULL(hv_vhca);
 
-   return mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
-  mlx5_hv_vhca_invalidate);
+   err = mlx5_hv_register_invalidate(hv_vhca->dev, hv_vhca,
+ mlx5_hv_vhca_invalidate);
+   if (err)
+   return err;
+
+   agent = mlx5_hv_vhca_control_agent_create(hv_vhca);
+   if (IS_ERR_OR_NULL(agent)) {
+   mlx5_hv_unregister_invalidate(hv_vhca->dev);
+   return IS_ERR_OR_NULL(agent);
+   }
+
+   hv_vhca->agents[MLX5_HV_VHCA_AGENT_CONTROL] = agent;
+
+   return 0;
 }
 
 void mlx5_hv_vhca_cleanup(struct mlx5_hv_vhca *hv_vhca)
 {
+   struct mlx5_hv_vhca_agent *agent;
int i;
 
if (IS_ERR_OR_NULL(hv_vhca))
return;
 
+   agent = 

[PATCH net-next,v4, 2/6] PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface

2019-08-21 Thread Haiyang Zhang
This interface driver is a helper driver that allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.

Signed-off-by: Haiyang Zhang 
Signed-off-by: Saeed Mahameed 
---
 MAINTAINERS  |  1 +
 drivers/pci/Kconfig  |  1 +
 drivers/pci/controller/Kconfig   |  7 
 drivers/pci/controller/Makefile  |  1 +
 drivers/pci/controller/pci-hyperv-intf.c | 67 
 drivers/pci/controller/pci-hyperv.c  | 12 --
 include/linux/hyperv.h   | 30 ++
 7 files changed, 108 insertions(+), 11 deletions(-)
 create mode 100644 drivers/pci/controller/pci-hyperv-intf.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a406947..9860853 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7469,6 +7469,7 @@ F:drivers/hid/hid-hyperv.c
 F: drivers/hv/
 F: drivers/input/serio/hyperv-keyboard.c
 F: drivers/pci/controller/pci-hyperv.c
+F: drivers/pci/controller/pci-hyperv-intf.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/uio/uio_hv_generic.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2ab9240..c313de9 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -182,6 +182,7 @@ config PCI_LABEL
 config PCI_HYPERV
 tristate "Hyper-V PCI Frontend"
 depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+   select PCI_HYPERV_INTERFACE
 help
   The PCI device frontend driver allows the kernel to import arbitrary
   PCI devices from a PCI backend to support PCI driver domains.
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index fe9f9f1..70e0782 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -281,5 +281,12 @@ config VMD
  To compile this driver as a module, choose M here: the
  module will be called vmd.
 
+config PCI_HYPERV_INTERFACE
+   tristate "Hyper-V PCI Interface"
+   depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
+   help
+ The Hyper-V PCI Interface is a helper driver that allows other drivers
+ to have a common interface with the Hyper-V PCI frontend driver.
+
 source "drivers/pci/controller/dwc/Kconfig"
 endmenu
diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
index d56a507..a2a22c9 100644
--- a/drivers/pci/controller/Makefile
+++ b/drivers/pci/controller/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
 obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
 obj-$(CONFIG_PCI_FTPCI100) += pci-ftpci100.o
 obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
+obj-$(CONFIG_PCI_HYPERV_INTERFACE) += pci-hyperv-intf.o
 obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
 obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
 obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
diff --git a/drivers/pci/controller/pci-hyperv-intf.c 
b/drivers/pci/controller/pci-hyperv-intf.c
new file mode 100644
index 000..cc96be4
--- /dev/null
+++ b/drivers/pci/controller/pci-hyperv-intf.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) Microsoft Corporation.
+ *
+ * Author:
+ *   Haiyang Zhang 
+ *
+ * This small module is a helper driver that allows other drivers to
+ * have a common interface with the Hyper-V PCI frontend driver.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+
+struct hyperv_pci_block_ops hvpci_block_ops;
+EXPORT_SYMBOL_GPL(hvpci_block_ops);
+
+int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
+   unsigned int block_id, unsigned int *bytes_returned)
+{
+   if (!hvpci_block_ops.read_block)
+   return -EOPNOTSUPP;
+
+   return hvpci_block_ops.read_block(dev, buf, buf_len, block_id,
+ bytes_returned);
+}
+EXPORT_SYMBOL_GPL(hyperv_read_cfg_blk);
+
+int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
+unsigned int block_id)
+{
+   if (!hvpci_block_ops.write_block)
+   return -EOPNOTSUPP;
+
+   return hvpci_block_ops.write_block(dev, buf, len, block_id);
+}
+EXPORT_SYMBOL_GPL(hyperv_write_cfg_blk);
+
+int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
+   void (*block_invalidate)(void *context,
+u64 block_mask))
+{
+   if (!hvpci_block_ops.reg_blk_invalidate)
+   return -EOPNOTSUPP;
+
+   return hvpci_block_ops.reg_blk_invalidate(dev, context,
+ block_invalidate);
+}
+EXPORT_SYMBOL_GPL(hyperv_reg_block_invalidate);
+
+static void __exit exit_hv_pci_intf(void)
+{
+}
+
+static int __init init_hv_pci_intf(void)
+{
+   return 0;
+}
+
+module_init(init_hv_pci_intf);
+module_exit(exit_hv_pci_intf);
+
+MODULE_DESCRIPTION("Hyper-V PCI 

[PATCH net-next,v4, 6/6] net/mlx5e: Add mlx5e HV VHCA stats agent

2019-08-21 Thread Haiyang Zhang
From: Eran Ben Elisha 

The HV VHCA stats agent is responsible for running a periodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated to statistics data
transfer from the VF to the PF.

The reporter fetches the statistics data from all opened channels, fills
it into a buffer and sends it to mlx5_hv_vhca_write_agent.

As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually
sending it over block 1.
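
For illustration, this is roughly how a stats buffer ends up chunked
into 128-byte data blocks (the struct layout is taken from patch 4/6;
the send loop and its name are a simplified sketch):

struct mlx5_hv_vhca_data_block {	/* from patch 4/6 */
	u16 sequence;
	u16 offset;
	u8  reserved[4];
	u64 data[15];	/* 120 bytes of payload per 128-byte block */
};

static int send_agent_data(struct mlx5_hv_vhca *hv_vhca,
			   void *buf, int len, u16 seq)
{
	struct mlx5_hv_vhca_data_block block = {};
	int offset = 0, err;

	while (offset < len) {
		int chunk = min_t(int, len - offset, sizeof(block.data));

		block.sequence = seq;
		block.offset = offset;
		memcpy(block.data, buf + offset, chunk);

		/* block ID 1 lives at byte offset HV_CONFIG_BLOCK_SIZE_MAX */
		err = mlx5_hv_write_config(hv_vhca->dev, &block,
					   sizeof(block),
					   HV_CONFIG_BLOCK_SIZE_MAX);
		if (err)
			return err;

		offset += chunk;
	}

	return 0;
}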

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Haiyang Zhang 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  13 ++
 .../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +
 .../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h |  25 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  |   1 +
 6 files changed, 205 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 8d443fc..f4de9cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -36,6 +36,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o 
en/port_buffer.o
 mlx5_core-$(CONFIG_MLX5_ESWITCH) += en_rep.o en_tc.o en/tc_tun.o 
lib/port_tun.o lag_mp.o \
lib/geneve.o en/tc_tun_vxlan.o 
en/tc_tun_gre.o \
en/tc_tun_geneve.o 
diag/en_tc_tracepoint.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
 
 #
 # Core extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 7316571..4467927 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -54,6 +54,7 @@
 #include "mlx5_core.h"
 #include "en_stats.h"
 #include "en/fs.h"
+#include "lib/hv_vhca.h"
 
 extern const struct net_device_ops mlx5e_netdev_ops;
 struct page_pool;
@@ -782,6 +783,15 @@ struct mlx5e_modify_sq_param {
int rl_index;
 };
 
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+struct mlx5e_hv_vhca_stats_agent {
+   struct mlx5_hv_vhca_agent *agent;
+   struct delayed_work work;
+   u16 delay;
+   void  *buf;
+};
+#endif
+
 struct mlx5e_xsk {
/* UMEMs are stored separately from channels, because we don't want to
 * lose them when channels are recreated. The kernel also stores UMEMs,
@@ -853,6 +863,9 @@ struct mlx5e_priv {
struct devlink_health_reporter *tx_reporter;
struct devlink_health_reporter *rx_reporter;
struct mlx5e_xsk   xsk;
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+   struct mlx5e_hv_vhca_stats_agent stats_agent;
+#endif
 };
 
 struct mlx5e_profile {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
new file mode 100644
index 000..c37b4ac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include "en.h"
+#include "en/hv_vhca_stats.h"
+#include "lib/hv_vhca.h"
+#include "lib/hv.h"
+
+struct mlx5e_hv_vhca_per_ring_stats {
+   u64 rx_packets;
+   u64 rx_bytes;
+   u64 tx_packets;
+   u64 tx_bytes;
+};
+
+static void
+mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
+ struct mlx5e_hv_vhca_per_ring_stats *data)
+{
+   struct mlx5e_channel_stats *stats;
+   int tc;
+
+   stats = >channel_stats[ch];
+   data->rx_packets = stats->rq.packets;
+   data->rx_bytes   = stats->rq.bytes;
+
+   for (tc = 0; tc < priv->max_opened_tc; tc++) {
+   data->tx_packets += stats->sq[tc].packets;
+   data->tx_bytes   += stats->sq[tc].bytes;
+   }
+}
+
+static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, u64 *data,
+int buf_len)
+{
+   int ch, i = 0;
+
+   for (ch = 0; ch < priv->max_nch; ch++) {
+   u64 *buf = data + i;
+
+   if (WARN_ON_ONCE(buf +
+sizeof(struct mlx5e_hv_vhca_per_ring_stats) >
+data + buf_len))
+   return;
+
+   mlx5e_hv_vhca_fill_ring_stats(priv, ch,
+ (struct 
mlx5e_hv_vhca_per_ring_stats *)buf);
+   i += sizeof(struct mlx5e_hv_vhca_per_ring_stats) / 

Re: [RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-21 Thread Christophe Leroy




Le 22/08/2019 à 02:27, Alastair D'Silva a écrit :

On Wed, 2019-08-21 at 22:27 +0200, Christophe Leroy wrote:


Le 20/08/2019 à 06:36, Alastair D'Silva a écrit :

On Fri, 2019-08-16 at 15:52 +, Christophe Leroy wrote:


[...]



Thanks Christophe,

I'm trying a somewhat different approach that requires less
knowledge
of assembler. Handling of CPU_FTR_COHERENT_ICACHE is outside this
function. The code below is not a patch as my tree is a bit messy,
sorry:


Can we be 100% sure that GCC won't add any code accessing some
global
data or stack while the Data MMU is OFF ?

Christophe



+mpe

I'm not sure how we would go about making such a guarantee, but I've
tied every variable used to a register and addr is passed in a
register, so there is no stack usage, and every call in there only
operates on its operands.

The calls to the inline cache helpers (for the PPC32 case) are all
constants, so I can't see a reasonable scenario where there would be a
function call and reordered to after the DR bit is turned off, but I
guess if we want to be paranoid, we could always add an mb() call
before the DR bit is manipulated to prevent the compiler from
reordering across the section where the data MMU is disabled.




Anyway, I think the benefit of converting that function to C is pretty 
small. flush_dcache_range() and friends were converted to C mainly in 
order to inline them. But this __flush_dcache_icache_phys() is too big 
to be worth inlining, yet small and stable enough to remain in assembly 
for the time being.


So I suggest you keep it aside your series for now, just move 
PURGE_PREFETCHED_INS inside it directly as it will be the only remaining 
user of it.


Christophe


[PATCH net-next,v4, 1/6] PCI: hv: Add a paravirtual backchannel in software

2019-08-21 Thread Haiyang Zhang
From: Dexuan Cui 

Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver.  These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.

Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional.  Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.

The usage model for these packets puts the responsibility for reading or
writing on the VF driver.  The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.

If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks.  This invalidation is delivered via a callback
supplied by the VF driver by this driver.

No protocol is implied, except that supplied by the PF and VF drivers.
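
For illustration, a VF driver would use this backchannel roughly as
follows (the prototypes match this series; the block ID, buffer and
function names are made up):

static void my_block_invalidate(void *context, u64 block_mask)
{
	/* The PF invalidated one or more of the first 64 blocks. */
}

static int my_vf_use_backchannel(struct pci_dev *pdev)
{
	u8 buf[HV_CONFIG_BLOCK_SIZE_MAX];
	unsigned int bytes_returned;
	int ret;

	ret = hyperv_reg_block_invalidate(pdev, NULL, my_block_invalidate);
	if (ret)
		return ret;

	/* Read configuration block 1, as published by the PF driver. */
	ret = hyperv_read_cfg_blk(pdev, buf, sizeof(buf), 1, &bytes_returned);
	if (ret)
		return ret;

	/* Write it back; the payload format is up to the PF/VF drivers. */
	return hyperv_write_cfg_blk(pdev, buf, bytes_returned, 1);
}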

Signed-off-by: Jake Oshins 
Signed-off-by: Dexuan Cui 
Cc: Haiyang Zhang 
Cc: K. Y. Srinivasan 
Cc: Stephen Hemminger 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Haiyang Zhang 
---
 drivers/pci/controller/pci-hyperv.c | 302 
 include/linux/hyperv.h  |  15 ++
 2 files changed, 317 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 40b6254..57adeca 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -365,6 +365,39 @@ struct pci_delete_interrupt {
struct tran_int_desc int_desc;
 } __packed;
 
+/*
+ * Note: the VM must pass a valid block id, wslot and bytes_requested.
+ */
+struct pci_read_block {
+   struct pci_message message_type;
+   u32 block_id;
+   union win_slot_encoding wslot;
+   u32 bytes_requested;
+} __packed;
+
+struct pci_read_block_response {
+   struct vmpacket_descriptor hdr;
+   u32 status;
+   u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+/*
+ * Note: the VM must pass a valid block id, wslot and byte_count.
+ */
+struct pci_write_block {
+   struct pci_message message_type;
+   u32 block_id;
+   union win_slot_encoding wslot;
+   u32 byte_count;
+   u8 bytes[HV_CONFIG_BLOCK_SIZE_MAX];
+} __packed;
+
+struct pci_dev_inval_block {
+   struct pci_incoming_message incoming;
+   union win_slot_encoding wslot;
+   u64 block_mask;
+} __packed;
+
 struct pci_dev_incoming {
struct pci_incoming_message incoming;
union win_slot_encoding wslot;
@@ -499,6 +532,9 @@ struct hv_pci_dev {
struct hv_pcibus_device *hbus;
struct work_struct wrk;
 
+   void (*block_invalidate)(void *context, u64 block_mask);
+   void *invalidate_context;
+
/*
 * What would be observed if one wrote 0xFFFFFFFF to a BAR and then
 * read it back, for each of the BAR offsets within config space.
@@ -817,6 +853,256 @@ static int hv_pcifront_write_config(struct pci_bus *bus, 
unsigned int devfn,
.write = hv_pcifront_write_config,
 };
 
+/*
+ * Paravirtual backchannel
+ *
+ * Hyper-V SR-IOV provides a backchannel mechanism in software for
+ * communication between a VF driver and a PF driver.  These
+ * "configuration blocks" are similar in concept to PCI configuration space,
+ * but instead of doing reads and writes in 32-bit chunks through a very slow
+ * path, packets of up to 128 bytes can be sent or received asynchronously.
+ *
+ * Nearly every SR-IOV device contains just such a communications channel in
+ * hardware, so using this one in software is usually optional.  Using the
+ * software channel, however, allows driver implementers to leverage software
+ * tools that fuzz the communications channel looking for vulnerabilities.
+ *
+ * The usage model for these packets puts the responsibility for reading or
+ * writing on the VF driver.  The VF driver sends a read or a write packet,
+ * indicating which "block" is being referred to by number.
+ *
+ * If the PF driver wishes to initiate communication, it can "invalidate" one 
or
+ * more of the first 64 blocks.  This invalidation is delivered via a callback
+ * supplied by the VF driver by this driver.
+ *
+ * No protocol is implied, except that supplied by the PF and VF drivers.
+ */
+
+struct hv_read_config_compl {
+   struct hv_pci_compl comp_pkt;
+   void *buf;
+   unsigned int len;
+   unsigned int bytes_returned;
+};
+
+/**
+ * hv_pci_read_config_compl() - Invoked when a response packet
+ * for a read config block operation arrives.
+ * @context:   Identifies the read config operation
+ * @resp:  The response packet itself
+ * @resp_packet_size:  Size in bytes of the response packet
+ */
+static void hv_pci_read_config_compl(void 

[PATCH net-next,v4, 3/6] net/mlx5: Add wrappers for HyperV PCIe operations

2019-08-21 Thread Haiyang Zhang
From: Eran Ben Elisha 

Add wrapper functions for the HyperV PCIe read / write /
block_invalidate_register operations.  These will be used as
infrastructure in the downstream patches for software communication.

This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Haiyang Zhang 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 64 
 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h | 22 
 3 files changed, 87 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index bcf3655..fd32a5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -45,6 +45,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH)   += eswitch.o 
eswitch_offloads.o eswitch_offlo
 mlx5_core-$(CONFIG_MLX5_MPFS)  += lib/mpfs.o
 mlx5_core-$(CONFIG_VXLAN)  += lib/vxlan.o
 mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
+mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += lib/hv.o
 
 #
 # Ipoib netdev
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
new file mode 100644
index 000..cf08d02
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2018 Mellanox Technologies
+
+#include 
+#include "mlx5_core.h"
+#include "lib/hv.h"
+
+static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
+int offset, bool read)
+{
+   int rc = -EOPNOTSUPP;
+   int bytes_returned;
+   int block_id;
+
+   if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len % HV_CONFIG_BLOCK_SIZE_MAX)
+   return -EINVAL;
+
+   block_id = offset / HV_CONFIG_BLOCK_SIZE_MAX;
+
+   rc = read ?
+hyperv_read_cfg_blk(dev->pdev, buf,
+HV_CONFIG_BLOCK_SIZE_MAX, block_id,
+   &bytes_returned) :
+hyperv_write_cfg_blk(dev->pdev, buf,
+ HV_CONFIG_BLOCK_SIZE_MAX, block_id);
+
+   /* Make sure len bytes were read successfully  */
+   if (read)
+   rc |= !(len == bytes_returned);
+
+   if (rc) {
+   mlx5_core_err(dev, "Failed to %s hv config, err = %d, len = %d, 
offset = %d\n",
+ read ? "read" : "write", rc, len,
+ offset);
+   return rc;
+   }
+
+   return 0;
+}
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+   int offset)
+{
+   return mlx5_hv_config_common(dev, buf, len, offset, true);
+}
+
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+int offset)
+{
+   return mlx5_hv_config_common(dev, buf, len, offset, false);
+}
+
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+   void (*block_invalidate)(void *context,
+u64 block_mask))
+{
+   return hyperv_reg_block_invalidate(dev->pdev, context,
+  block_invalidate);
+}
+
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev)
+{
+   hyperv_reg_block_invalidate(dev->pdev, NULL, NULL);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
new file mode 100644
index 000..f9a4557
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __LIB_HV_H__
+#define __LIB_HV_H__
+
+#if IS_ENABLED(CONFIG_PCI_HYPERV_INTERFACE)
+
+#include 
+#include 
+
+int mlx5_hv_read_config(struct mlx5_core_dev *dev, void *buf, int len,
+   int offset);
+int mlx5_hv_write_config(struct mlx5_core_dev *dev, void *buf, int len,
+int offset);
+int mlx5_hv_register_invalidate(struct mlx5_core_dev *dev, void *context,
+   void (*block_invalidate)(void *context,
+u64 block_mask));
+void mlx5_hv_unregister_invalidate(struct mlx5_core_dev *dev);
+#endif
+
+#endif /* __LIB_HV_H__ */
-- 
1.8.3.1



[PATCH net-next,v4, 0/6] Add software backchannel and mlx5e HV VHCA stats

2019-08-21 Thread Haiyang Zhang
This patch set adds a paravirtual backchannel in software in pci_hyperv,
which is required by the mlx5e driver HV VHCA stats agent.

The stats agent is responsible for running a periodic rx/tx
packets/bytes stats update.

Dexuan Cui (1):
  PCI: hv: Add a paravirtual backchannel in software

Eran Ben Elisha (4):
  net/mlx5: Add wrappers for HyperV PCIe operations
  net/mlx5: Add HV VHCA infrastructure
  net/mlx5: Add HV VHCA control agent
  net/mlx5e: Add mlx5e HV VHCA stats agent

Haiyang Zhang (1):
  PCI: hv: Add a Hyper-V PCI interface driver for software backchannel
interface

 MAINTAINERS|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  13 +
 .../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 162 +
 .../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h |  25 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c   |  64 
 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h   |  22 ++
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c  | 371 +
 .../net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h  | 104 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   7 +
 drivers/pci/Kconfig|   1 +
 drivers/pci/controller/Kconfig |   7 +
 drivers/pci/controller/Makefile|   1 +
 drivers/pci/controller/pci-hyperv-intf.c   |  67 
 drivers/pci/controller/pci-hyperv.c| 308 +
 include/linux/hyperv.h |  29 ++
 include/linux/mlx5/driver.h|   2 +
 18 files changed, 1189 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
 create mode 100644 drivers/pci/controller/pci-hyperv-intf.c

-- 
1.8.3.1



Re: [PATCH -next] cpufreq: qcom-hw: remove set but not used variable 'prev_cc'

2019-08-21 Thread Sibi Sankar

@YueHaibing thanks for the patch.

On 2019-08-22 08:10, Viresh Kumar wrote:

On 21-08-19, 20:14, YueHaibing wrote:
drivers/cpufreq/qcom-cpufreq-hw.c: In function 'qcom_cpufreq_hw_read_lut':
drivers/cpufreq/qcom-cpufreq-hw.c:89:38: warning:
 variable 'prev_cc' set but not used [-Wunused-but-set-variable]

It is not used since commit 3003e75a5045 ("cpufreq:
qcom-hw: Update logic to detect turbo frequency")

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c 
b/drivers/cpufreq/qcom-cpufreq-hw.c

index 3eea197..a9ae2f8 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -86,7 +86,7 @@ static int qcom_cpufreq_hw_read_lut(struct device 
*cpu_dev,

struct cpufreq_policy *policy,
void __iomem *base)
 {
-	u32 data, src, lval, i, core_count, prev_cc = 0, prev_freq = 0, freq;

+   u32 data, src, lval, i, core_count, prev_freq = 0, freq;
u32 volt;
struct cpufreq_frequency_table  *table;

@@ -139,7 +139,6 @@ static int qcom_cpufreq_hw_read_lut(struct device 
*cpu_dev,

break;
}

-   prev_cc = core_count;
prev_freq = freq;
}


@Sibi, you fine with this change ? I will merge it with the original 
patch then.


Yes, the changes seem fine; I missed
removing prev_cc.

--
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH 1/3] x86,mm/pat: Use generic interval trees

2019-08-21 Thread Davidlohr Bueso

On Wed, 21 Aug 2019, Michel Lespinasse wrote:


On Tue, Aug 13, 2019 at 03:46:18PM -0700, Davidlohr Bueso wrote:

o The border cases for overlapping differ -- interval trees are closed,
while memtype intervals are open. We need to maintain semantics such
that conflict detection and getting the lowest match do not change.


Agree on the need to maintain semantics.

As I had commented some time ago, I wish the interval trees used [start,end)
intervals instead of [start,last] - it would be a better fit for basically
all of the current interval tree users.


Yes, after going through all the users of interval trees, I agree that
they all want to use [start,end) intervals.



I'm not sure where to go with this - would it make sense to add a new
interval tree header file that uses [start,end) intervals (with the
thought of eventually converting all current interval tree users to it)
instead of adding one more use of the less-natural [start,last]
interval trees ?


It might be the safest way, although I really hate having another
header file for interval_tree... The following is a diffstat of a
tentative conversion (I'll send the patch separately); I'm not sure
if a single shot conversion would be acceptable, albeit with relevant
maintainer acks.

drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  |  8 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  5 +++--
drivers/gpu/drm/drm_mm.c|  2 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 13 +
drivers/gpu/drm/radeon/radeon_mn.c  | 10 +++---
drivers/gpu/drm/radeon/radeon_vm.c  |  2 +-
drivers/infiniband/hw/hfi1/mmu_rb.c | 12 ++--
drivers/iommu/virtio-iommu.c|  4 ++--
drivers/vhost/vhost.c   |  6 +++---
include/drm/drm_mm.h|  2 +-
include/linux/interval_tree_generic.h   | 28 ++--
mm/interval_tree.c  |  2 +-
mm/rmap.c   |  2 +-
13 files changed, 42 insertions(+), 54 deletions(-)

This gets rid of 'end - 1' trick from the users and converts
cond1 and cond2 checks in interval_tree_generic.h

Note that I think amdgpu_vm.c actually uses fully open intervals.




diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index fa16036fa592..1be4d1856a9b 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -12,7 +12,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 

@@ -34,68 +34,41 @@
  * memtype_lock protects the rbtree.
  */

-static struct rb_root memtype_rbroot = RB_ROOT;
+static struct rb_root_cached memtype_rbroot = RB_ROOT_CACHED;
+
+#define START(node) ((node)->start)
+#define END(node)  ((node)->end)
+INTERVAL_TREE_DEFINE(struct memtype, rb, u64, subtree_max_end,
+START, END, static, memtype_interval)

 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
-   if (node->start >= end || node->end <= start)
+   /*
+* Unlike generic interval trees, the memtype nodes are ]a, b[


I think the memtype nodes are [a, b)  (which one could also write as [a, b[
depending on their local customs - but either way, closed on the start side
and open on the end side) ?


+* therefore we need to adjust the ranges accordingly. Missing
+* an overlap can lead to incorrectly detecting conflicts,
+* for example.
+*/
+   if (node->start + 1 >= end || node->end - 1 <= start)
return 0;

return 1;
 }


All right, now I am *really* confused.

My understanding is as follows:
* the PAT code wants to use [start, end( intervals
* interval trees are defined to use [start, last] intervals with last == end-1


Yes, we're talking about the same thing, but I overcomplicated things by
considering memtype lookups to be different from the nodes in the tree,
which obviously doesn't make sense... it is actually [a,b[ as you mention.



At first, I thought that you were handling that by removing 1 from the
end of the interval, to adjust between the PAT and interval tree
definitions. But, I don't see you doing that anywhere.


This should have been my first approach.



Then, I thought that you were using [start, end( intervals everywhere,
and the interval tree functions memtype_interval_iter_first and
memtype_interval_iter_next would just return too many candidate
matches as as you are passing "end" instead of "last" == end-1 as the
interval endpoint, but then you would filter out the extra intervals
using is_node_overlap(). But, if that is the case, then I don't
understand why you need to redefine is_node_overlap() here.


My original expectation was to actually remove a lot more of pat_rbtree,
including the is_node_overlap() and the filtering. Yes, I think this can
be done if the interval-tree is converted to [a,b[ and we can thus
just iterate the tree seamlessly.
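
For reference, the two overlap tests under discussion differ only in the
comparison operators (a minimal standalone sketch, not from the patch):

/* closed [start, last] intervals, as interval_tree_generic.h has today */
static bool overlap_closed(u64 start1, u64 last1, u64 start2, u64 last2)
{
	return start1 <= last2 && start2 <= last1;
}

/* half-open [start, end) intervals, as the PAT code wants */
static bool overlap_half_open(u64 start1, u64 end1, u64 start2, u64 end2)
{
	return start1 < end2 && start2 < end1;
}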



Could you help me out by defining if the intervals are open or 

[PATCH 1/6] kbuild: remove 'Using ... as source for kernel' message

2019-08-21 Thread Masahiro Yamada
You already know the location of the source tree without this message.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Makefile b/Makefile
index 7e54a821b4b0..a77102e4ee90 100644
--- a/Makefile
+++ b/Makefile
@@ -1118,7 +1118,6 @@ PHONY += prepare archprepare prepare3
 # 1) Check that make has not been executed in the kernel src $(srctree)
 prepare3: include/config/kernel.release
 ifdef building_out_of_srctree
-   @$(kecho) '  Using $(srctree) as source for kernel'
$(Q)if [ -f $(srctree)/.config -o \
 -d $(srctree)/include/config -o \
 -d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
-- 
2.17.1



[PATCH 3/6] kbuild: clarify where to run make mrproper when out-of-tree fails

2019-08-21 Thread Masahiro Yamada
If you try an out-of-tree build with an unclean source tree, Kbuild
suggests running 'make mrproper'. The path to the source tree may be
shown as a relative path; for example, "make O=foo" emits the
following:

  .. is not clean, please run 'make mrproper'
  in the '..' directory.

This is somewhat confusing if you ran "make O=foo" in the source tree.
Using the absolute path will be clearer.

This commit changes the error message as follows:

***
*** The source tree is not clean, please run 'make mrproper'
*** in /absolute/path/to/linux
***

Signed-off-by: Masahiro Yamada 
---

 Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index d9cbbc27d4ba..901fcb8fffbe 100644
--- a/Makefile
+++ b/Makefile
@@ -1121,8 +1121,10 @@ ifdef building_out_of_srctree
$(Q)if [ -f $(srctree)/.config -o \
 -d $(srctree)/include/config -o \
 -d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
-   echo >&2 "  $(srctree) is not clean, please run 'make$(if 
$(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; \
-   echo >&2 "  in the '$(srctree)' directory.";\
+   echo >&2 "***"; \
+   echo >&2 "*** The source tree is not clean, please run 
'make$(if $(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; 
\
+   echo >&2 "*** in $(abs_srctree)";\
+   echo >&2 "***"; \
/bin/false; \
fi;
 endif
-- 
2.17.1



[PATCH 5/6] kbuild: remove prepare3 target

2019-08-21 Thread Masahiro Yamada
Now prepare3 does nothing but depend on include/config/kernel.release.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index ca6851f5ebc9..960df4d35b15 100644
--- a/Makefile
+++ b/Makefile
@@ -1121,11 +1121,9 @@ scripts: scripts_basic scripts_dtc
 # archprepare is used in arch Makefiles and when processed asm symlink,
 # version.h and scripts_basic is processed / created.
 
-PHONY += prepare archprepare prepare3
+PHONY += prepare archprepare
 
-prepare3: include/config/kernel.release
-
-archprepare: archheaders archscripts scripts prepare3 outputmakefile \
+archprepare: archheaders archscripts scripts include/config/kernel.release outputmakefile \
asm-generic $(version_h) $(autoksyms_h) include/generated/utsrelease.h
 
 prepare0: archprepare
@@ -1261,11 +1259,11 @@ endif
 
 ifneq ($(dtstree),)
 
-%.dtb: prepare3 scripts_dtc
+%.dtb: include/config/kernel.release scripts_dtc
$(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
 
 PHONY += dtbs dtbs_install dt_binding_check
-dtbs dtbs_check: prepare3 scripts_dtc
+dtbs dtbs_check: include/config/kernel.release scripts_dtc
$(Q)$(MAKE) $(build)=$(dtstree)
 
 dtbs_check: export CHECK_DTBS=1
-- 
2.17.1



[PATCH 4/6] kbuild: move the clean srctree check to the outputmakefile target

2019-08-21 Thread Masahiro Yamada
With this commit, the error report is shown earlier, even before
running kconfig.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/Makefile b/Makefile
index 901fcb8fffbe..ca6851f5ebc9 100644
--- a/Makefile
+++ b/Makefile
@@ -522,6 +522,7 @@ scripts_basic:
$(Q)rm -f .tmp_quiet_recordmcount
 
 PHONY += outputmakefile
+# Before starting out-of-tree build, make sure the source tree is clean.
 # outputmakefile generates a Makefile in the output directory, if using a
 # separate output directory. This allows convenient use of make in the
 # output directory.
@@ -529,6 +530,15 @@ PHONY += outputmakefile
 # ignore whole output directory
 outputmakefile:
 ifdef building_out_of_srctree
+   $(Q)if [ -f $(srctree)/.config -o \
+-d $(srctree)/include/config -o \
+-d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
+   echo >&2 "***"; \
+   echo >&2 "*** The source tree is not clean, please run 
'make$(if $(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; 
\
+   echo >&2 "*** in $(abs_srctree)";\
+   echo >&2 "***"; \
+   false; \
+   fi
$(Q)ln -fsn $(srctree) source
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree)
$(Q)test -e .gitignore || \
@@ -1113,21 +1123,7 @@ scripts: scripts_basic scripts_dtc
 
 PHONY += prepare archprepare prepare3
 
-# prepare3 is used to check if we are building in a separate output directory,
-# and if so do:
-# 1) Check that make has not been executed in the kernel src $(srctree)
 prepare3: include/config/kernel.release
-ifdef building_out_of_srctree
-   $(Q)if [ -f $(srctree)/.config -o \
--d $(srctree)/include/config -o \
--d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
-   echo >&2 "***"; \
-   echo >&2 "*** The source tree is not clean, please run 
'make$(if $(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; 
\
-   echo >&2 "*** in $(abs_srctree)";\
-   echo >&2 "***"; \
-   /bin/false; \
-   fi;
-endif
 
 archprepare: archheaders archscripts scripts prepare3 outputmakefile \
asm-generic $(version_h) $(autoksyms_h) include/generated/utsrelease.h
-- 
2.17.1



[PATCH 6/6] kbuild: check clean srctree even earlier

2019-08-21 Thread Masahiro Yamada
Move the outputmakefile target to the leftmost position in the prerequisite list
so that this is checked first. There is no guarantee that Make runs the
prerequisites from left to right, but at least the released versions of
GNU Make work like that when the parallel build option is not given.

Of course, when the parallel option -j is given, other targets will be run
simultaneously, but it is nice to show the error as early as possible.
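
For illustration only (not part of the patch), the behaviour is easy to
see with a two-target Makefile:

  # order.mk
  all: a b
  a: ; @sleep 1; echo a
  b: ; @echo b

"make -f order.mk" always prints a then b; "make -f order.mk -j2"
usually prints b first, because nothing orders the two prerequisites.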

Signed-off-by: Masahiro Yamada 
---

 Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 960df4d35b15..089983a8a028 100644
--- a/Makefile
+++ b/Makefile
@@ -581,10 +581,10 @@ ifdef config-build
 include arch/$(SRCARCH)/Makefile
 export KBUILD_DEFCONFIG KBUILD_KCONFIG CC_VERSION_TEXT
 
-config: scripts_basic outputmakefile FORCE
+config: outputmakefile scripts_basic FORCE
$(Q)$(MAKE) $(build)=scripts/kconfig $@
 
-%config: scripts_basic outputmakefile FORCE
+%config: outputmakefile scripts_basic FORCE
$(Q)$(MAKE) $(build)=scripts/kconfig $@
 
 else #!config-build
@@ -1123,7 +1123,7 @@ scripts: scripts_basic scripts_dtc
 
 PHONY += prepare archprepare
 
-archprepare: archheaders archscripts scripts include/config/kernel.release outputmakefile \
+archprepare: outputmakefile archheaders archscripts scripts include/config/kernel.release \
asm-generic $(version_h) $(autoksyms_h) include/generated/utsrelease.h
 
 prepare0: archprepare
-- 
2.17.1



[PATCH 2/6] kbuild: Inform user to pass ARCH= for make mrproper only when necessary

2019-08-21 Thread Masahiro Yamada
Since commit 3a475b2166fd ("kbuild: Inform user to pass ARCH= for make
mrproper"), if you try out-of-tree build with an unclean source tree,
it suggests to run 'make ARCH= mrproper'.

This looks odd when you are not cross-compiling the kernel. Show the
'ARCH=' part only when ARCH= was given from the command line.
If ARCH is the default (native build) or came from the environment,
it should simply suggest 'make mrproper' as before.
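
For reference, this is the $(origin ...) distinction the patch relies
on (illustration only, not part of the patch):

  # origin.mk
  show: ; @echo 'origin of ARCH: $(origin ARCH)'

  #   make -f origin.mk show ARCH=arm   ->  command line
  #   ARCH=arm make -f origin.mk show   ->  environment
  #   make -f origin.mk show            ->  undefined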

Signed-off-by: Masahiro Yamada 
---

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index a77102e4ee90..d9cbbc27d4ba 100644
--- a/Makefile
+++ b/Makefile
@@ -1121,7 +1121,7 @@ ifdef building_out_of_srctree
$(Q)if [ -f $(srctree)/.config -o \
 -d $(srctree)/include/config -o \
 -d $(srctree)/arch/$(SRCARCH)/include/generated ]; then \
-   echo >&2 "  $(srctree) is not clean, please run 'make 
ARCH=$(ARCH) mrproper'"; \
+   echo >&2 "  $(srctree) is not clean, please run 'make$(if 
$(findstring command line, $(origin ARCH)), ARCH=$(ARCH)) mrproper'"; \
echo >&2 "  in the '$(srctree)' directory.";\
/bin/false; \
fi;
-- 
2.17.1



Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread h...@lst.de
On Thu, Aug 22, 2019 at 04:01:24AM +, Atish Patra wrote:
> The downside of this is that for every !cmask case in true SMP (more
> common probably) it will execute 2 extra cpumask instructions. As the
> tlbflush path is a performance-critical path, I think we should favor
> the more common case (SMP with more than 1 core).

Actually, looking at both the current mainline code and the code from my
cleanups tree, I don't think remote_sfence_vma / __sbi_tlb_flush_range
can ever be called with a NULL cpumask, as we always have a valid mm.

So this is a bit of a moot point, and we can drop handling that case
entirely.  With that we can also use a simple if / else for the local
cpu only vs remote case.  Btw, what was the reason you didn't like
using cpumask_any_but like x86, which should be more efficient than
cpumask_test_cpu + hweight?
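
I.e. something along these lines (untested sketch):

	struct cpumask hmask;
	unsigned int cpu = get_cpu();

	/*
	 * cpumask_any_but() returns >= nr_cpu_ids when @cpu is the only
	 * CPU set in @cmask, so one call replaces the
	 * cpumask_test_cpu() + cpumask_weight() pair.
	 */
	if (cpumask_any_but(cmask, cpu) >= nr_cpu_ids) {
		local_flush_tlb_all();
	} else {
		riscv_cpuid_to_hartid_mask(cmask, &hmask);
		sbi_remote_sfence_vma(hmask.bits, start, size);
	}

	put_cpu();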


Re: [PATCH v3 3/3] RISC-V: Do not invoke SBI call if cpumask is empty

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 03:51 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:44PM -0700, Atish Patra wrote:
> > SBI calls are expensive. If cpumask is empty, there is no need to
> > trap via SBI as no remote tlb flushing is required.
> > 
> > Signed-off-by: Atish Patra 
> > ---
> >  arch/riscv/mm/tlbflush.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 9f58b3790baa..2bd3c418d769 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -21,6 +21,9 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> > goto issue_sfence;
> > }
> >  
> > +   if (cpumask_empty(cmask))
> > +   goto done;
> 
> I think this can even be done before the get_cpu to optimize it a
> little
> further.

Yeah. I can just return directly in this case and call get_cpu after
this. Thanks for the suggestion.


-- 
Regards,
Atish


Re: [PATCH v2] fs: fs_parser: avoid NULL param->string to kstrtouint

2019-08-21 Thread Eric Biggers
[trimmed Cc list a bit]

On Thu, Aug 15, 2019 at 07:46:56PM -0700, Eric Biggers wrote:
> On Sat, Jul 20, 2019 at 07:29:49AM +0800, Yin Fengwei wrote:
> > syzbot reported general protection fault in kstrtouint:
> > https://lkml.org/lkml/2019/7/18/328
> > 
> > From the log, if the mount option is something like:
> >fd,
> > 
> > The default parameter (which has NULL param->string) will be
> > passed to vfs_parse_fs_param. Finally, this NULL param->string
> > is passed to kstrtouint and trigger NULL pointer access.
> > 
> > Reported-by: syzbot+398343b7c1b1b9892...@syzkaller.appspotmail.com
> > Fixes: 71cbb7570a9a ("vfs: Move the subtype parameter into fuse")
> > 
> > Signed-off-by: Yin Fengwei 
> > ---
> > ChangeLog:
> >  v1 -> v2:
> >- Fix typo in v1
> >- Remove braces {} from single statement blocks
> > 
> >  fs/fs_parser.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/fs_parser.c b/fs/fs_parser.c
> > index 83b66c9e9a24..7498a44f18c0 100644
> > --- a/fs/fs_parser.c
> > +++ b/fs/fs_parser.c
> > @@ -206,6 +206,9 @@ int fs_parse(struct fs_context *fc,
> > case fs_param_is_fd: {
> > switch (param->type) {
> > case fs_value_is_string:
> > +   if (!result->has_value)
> > +   goto bad_value;
> > +
> > ret = kstrtouint(param->string, 0, >uint_32);
> > break;
> > case fs_value_is_file:
> > -- 
> > 2.17.1
> 
> Reviewed-by: Eric Biggers 
> 
> Al, can you please apply this patch?
> 
> - Eric

Ping.  Al, when are you going to apply this?

- Eric


RE: [PATCH net-next,v3, 0/6] Add software backchannel and mlx5e HV VHCA stats

2019-08-21 Thread Haiyang Zhang



> -Original Message-
> From: linux-hyperv-ow...@vger.kernel.org On Behalf Of David Miller
> Sent: Wednesday, August 21, 2019 9:09 PM
> To: Haiyang Zhang 
> Cc: sas...@kernel.org; sae...@mellanox.com; l...@kernel.org;
> era...@mellanox.com; lorenzo.pieral...@arm.com; bhelg...@google.com;
> linux-...@vger.kernel.org; linux-hyp...@vger.kernel.org;
> net...@vger.kernel.org; KY Srinivasan ; Stephen
> Hemminger ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next,v3, 0/6] Add software backchannel and mlx5e
> HV VHCA stats
> 
> From: Haiyang Zhang 
> Date: Wed, 21 Aug 2019 00:23:19 +
> 
> > This patch set adds paravirtual backchannel in software in pci_hyperv,
> > which is required by the mlx5e driver HV VHCA stats agent.
> >
> > The stats agent is responsible for running a periodic rx/tx
> > packets/bytes stats update.
> 
> These patches don't apply cleanly to net-next, probably due to some recent
> mlx5 driver changes.
> 
> Please respin.

I will do.
Thanks,

- Haiyang


Re: [PATCH net-next,v3, 0/6] Add software backchannel and mlx5e HV VHCA stats

2019-08-21 Thread David Miller
From: Haiyang Zhang 
Date: Wed, 21 Aug 2019 00:23:19 +

> This patch set adds paravirtual backchannel in software in pci_hyperv,
> which is required by the mlx5e driver HV VHCA stats agent.
> 
> The stats agent is responsible for running a periodic rx/tx packets/bytes
> stats update.

These patches don't apply cleanly to net-next, probably due to some recent
mlx5 driver changes.

Please respin.


Re: [PATCH] arm: skip nomap memblocks while finding the lowmem/highmem boundary

2019-08-21 Thread Chester Lin
On Thu, Aug 22, 2019 at 11:45:34AM +0800, Chester Lin wrote:
> adjust_lowmem_bounds() checks every memblock in order to find the boundary
> between lowmem and highmem. However some memblocks could be marked as NOMAP,
> so they are not used by the kernel, and they should be skipped while
> calculating the boundary.
> 
> Signed-off-by: Chester Lin 
> ---
>  arch/arm/mm/mmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 426d9085396b..b86dba44d828 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1181,6 +1181,9 @@ void __init adjust_lowmem_bounds(void)
>   phys_addr_t block_start = reg->base;
>   phys_addr_t block_end = reg->base + reg->size;
>  
> + if (memblock_is_nomap(reg))
> + continue;
> +
>   if (reg->base < vmalloc_limit) {
>   if (block_end > lowmem_limit)
>   /*
> -- 
> 2.22.0
>

Hi Russell, Mike and Ard,

Per the discussion in the thread "[PATH] efi/arm: fix allocation failure ...",
(https://lkml.org/lkml/2019/8/21/163), I presume that the change to disregard
NOMAP memblocks in adjust_lowmem_bounds() should be separated as a single patch.

Please let me know if you have any suggestions, thank you.




Re: [PATCH] selftests: net: add missing NFT_FWD_NETDEV to config

2019-08-21 Thread David Miller
From: Anders Roxell 
Date: Tue, 20 Aug 2019 15:41:02 +0200

> When running xfrm_policy.sh we see the following
> 
>  # sysctl cannot stat /proc/sys/net/ipv4/conf/eth1/forwarding No such file or directory
>  cannot: stat_/proc/sys/net/ipv4/conf/eth1/forwarding #

I don't understand how a netfilter config options is going to make that
generic ipv4 protocol per-device sysctl appear.

If it's unrelated to your change, don't include it in the commit message
as it is confusing.

Thank you.


Re: [PATCH v3 2/3] RISC-V: Issue a tlb page flush if possible

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 03:50 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:43PM -0700, Atish Patra wrote:
> > +   if (size <= PAGE_SIZE && size != -1)
> > +   local_flush_tlb_page(start);
> > +   else
> > +   local_flush_tlb_all();
> 
> As Andreas pointed out (unsigned long)-1 is actually larger than
> PAGE_SIZE, so we don't need the extra check.

Ahh yes. Sorry I missed his comment in the earlier email. Fixed it.
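
For the record, the point is trivial to check in isolation (standalone
illustration, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		unsigned long size = -1;	/* converts to ULONG_MAX */

		/* never true for any sane PAGE_SIZE, so "size != -1" is redundant */
		printf("%d\n", size <= 4096UL);	/* prints 0 */
		return 0;
	}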

-- 
Regards,
Atish


Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 03:46 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> > In RISC-V, tlb flush happens via SBI which is expensive. If the
> > local
> > cpu is the only cpu in cpumask, there is no need to invoke a SBI
> > call.
> > 
> > Just do a local flush and return.
> > 
> > Signed-off-by: Atish Patra 
> > ---
> >  arch/riscv/mm/tlbflush.c | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index df93b26f1b9d..36430ee3bed9 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -2,6 +2,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  void flush_tlb_all(void)
> > @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> > unsigned long size)
> >  {
> > struct cpumask hmask;
> > +   unsigned int cpuid = get_cpu();
> >  
> > +   if (!cmask) {
> > +   riscv_cpuid_to_hartid_mask(cpu_online_mask, );
> > +   goto issue_sfence;
> > +   }
> > +
> > +   if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> > 1) {
> > +   local_flush_tlb_all();
> > +   goto done;
> > +   }
> 
> I think a single core on an SMP kernel is a valid enough use case
> given how little distros still have UP kernels.  So maybe this should
> rather be:
> 
>   if (!cmask)
>   cmask = cpu_online_mask;
> 
>   if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> 1) {
>   local_flush_tlb_all();
>   } else {
>   riscv_cpuid_to_hartid_mask(cmask, );
>   sbi_remote_sfence_vma(hmask.bits, start, size);
>   }

The downside of this is that for every !cmask case in true SMP (more
common probably) it will execute 2 extra cpumask instructions. As the
tlbflush path is a performance-critical path, I think we should favor
the more common case (SMP with more than 1 core).

Thoughts ?

-- 
Regards,
Atish


[PATCH] kbuild: get rid of $(realpath ...) from scripts/mkmakefile

2019-08-21 Thread Masahiro Yamada
Both relative path and absolute path have pros and cons. For example,
we can move the source and objtree around together by using the
relative path to the source tree.

Do not force the absolute path to the source tree. If you prefer the
absolute path, you can specify KBUILD_ABS_SRCTREE=1.

Signed-off-by: Masahiro Yamada 
---

 scripts/mkmakefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/mkmakefile b/scripts/mkmakefile
index 4d0faebb1719..1cb174751429 100755
--- a/scripts/mkmakefile
+++ b/scripts/mkmakefile
@@ -12,6 +12,6 @@ if [ "${quiet}" != "silent_" ]; then
 fi
 
 cat << EOF > Makefile
-# Automatically generated by $(realpath $0): don't edit
-include $(realpath $1/Makefile)
+# Automatically generated by $0: don't edit
+include $1/Makefile
 EOF
-- 
2.17.1



Re: [PATCH net-next v3 0/4] Improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec

2019-08-21 Thread David Miller
From: Hubert Feurstein 
Date: Tue, 20 Aug 2019 10:48:29 +0200

> From: Hubert Feurstein 
> 
> Changelog:
>  v3: mv88e6xxx_smi_indirect_write: forward ptp_sts only on the last write
>  Copied Miroslav Lichvar because of PTP offset compensation patch
>  v2: Added patch for PTP offset compensation
>  Removed mdiobus_write_sts as there was no user
>  Removed ptp_sts_supported-boolean and introduced flags instead
> 
> With this patchset the phc2sys synchronisation precision improved to
> +/-555ns on an IMX6DL with an MV88E6220 switch attached.
> 
> This patchset takes into account the comments from the following discussions:
> - https://lkml.org/lkml/2019/8/2/1364
> - https://lkml.org/lkml/2019/8/5/169
> 
> Patch 01 adds the required infrastructure in the MDIO layer.
> Patch 02 adds additional PTP offset compensation.
> Patch 03 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx 
> driver.
> Patch 04 adds support for the PTP system timestamping in the imx-fec driver.

It looks like there is still some active discussion about these changes and
there will likely be another spin.


Re: [PATCH] rcu: don't include in rcutiny.h

2019-08-21 Thread Christoph Hellwig
On Wed, Aug 21, 2019 at 08:02:00PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 22, 2019 at 10:53:43AM +0900, Christoph Hellwig wrote:
> > The kbuild reported a built failure due to a header loop when RCUTINY is
> > enabled with my pending riscv-nommu port.  Switch rcutiny.h to only
> > include the minimal required header to get HZ instead.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Queued for review and testing, thank you!
> 
> Do you need this in v5.4?  My normal workflow would put it into v5.5.

I hope the riscv-nommu code gets merged for 5.4, so if we could queue
it up for that I'd appreciate it.


[PATCH] arm: skip nomap memblocks while finding the lowmem/highmem boundary

2019-08-21 Thread Chester Lin
adjust_lowmem_bounds() checks every memblock in order to find the boundary
between lowmem and highmem. However some memblocks could be marked as NOMAP,
so they are not used by the kernel, and they should be skipped while
calculating the boundary.

Signed-off-by: Chester Lin 
---
 arch/arm/mm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 426d9085396b..b86dba44d828 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1181,6 +1181,9 @@ void __init adjust_lowmem_bounds(void)
phys_addr_t block_start = reg->base;
phys_addr_t block_end = reg->base + reg->size;
 
+   if (memblock_is_nomap(reg))
+   continue;
+
if (reg->base < vmalloc_limit) {
if (block_end > lowmem_limit)
/*
-- 
2.22.0



Re: [PATCH] x86/apic: Update virtual irq base for DT/OF based system as well

2019-08-21 Thread Tanwar, Rahul



Hi Thomas,

On 22/8/2019 12:47 AM, Andy Shevchenko wrote:

For DT we can actually avoid that completely. See below.

For ACPI not unfortunately as the stupid GSI mapping is hard coded.

The below works better for my case, so, if you are going with that
Tested-by: Andy Shevchenko 


8<-
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2438,7 +2438,13 @@ unsigned int arch_dynirq_lower_bound(unsigned int from)
 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
 */
-   return ioapic_initialized ? ioapic_dynirq_base : gsi_top;
+   if (!ioapic_initialized)
+   return gsi_top;
+   /*
+* For DT enabled machines ioapic_dynirq_base is irrelevant and not
+* updated. So simply return @from if ioapic_dynirq_base == 0.
+*/
+   return ioapic_dynirq_base ? : from;
  }
  
  #ifdef CONFIG_X86_32



I have also tested the above and it works fine. In fact, my first patch
to resolve it during internal review was exactly on similar lines. So if
you are going to add the above then I will stop following up on this
topic further. Thanks.


Regards,

Rahul



Re: [PATCH 2/4] memremap: remove the dev field in struct dev_pagemap

2019-08-21 Thread Dan Williams
On Wed, Aug 21, 2019 at 4:51 PM Jason Gunthorpe  wrote:
>
> On Wed, Aug 21, 2019 at 01:24:20PM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 20, 2019 at 07:58:22PM -0700, Dan Williams wrote:
> > > On Tue, Aug 20, 2019 at 6:27 AM Jason Gunthorpe  wrote:
> > > >
> > > > On Mon, Aug 19, 2019 at 06:44:02PM -0700, Dan Williams wrote:
> > > > > On Sun, Aug 18, 2019 at 2:12 AM Christoph Hellwig  wrote:
> > > > > >
> > > > > > The dev field in struct dev_pagemap is only used to print dev_name 
> > > > > > in
> > > > > > two places, which are at best nice to have.  Just remove the field
> > > > > > and thus the name in those two messages.
> > > > > >
> > > > > > Signed-off-by: Christoph Hellwig 
> > > > > > Reviewed-by: Ira Weiny 
> > > > >
> > > > > Needs the below as well.
> > > > >
> > > > > /me goes to check if he ever merged the fix to make the unit test
> > > > > stuff get built by default with COMPILE_TEST [1]. Argh! Nope, didn't
> > > > > submit it for 5.3-rc1, sorry for the thrash.
> > > > >
> > > > > You can otherwise add:
> > > > >
> > > > > Reviewed-by: Dan Williams 
> > > > >
> > > > > [1]: 
> > > > > https://lore.kernel.org/lkml/156097224232.1086847.9463861924683372741.st...@dwillia2-desk3.amr.corp.intel.com/
> > > >
> > > > Can you get this merged? Do you want it to go with this series?
> > >
> > > Yeah, makes some sense to let you merge it so that you can get
> > > kbuild-robot reports about any follow-on memremap_pages() work that
> > > may trip up the build. Otherwise let me know and I'll get it queued
> > > with the other v5.4 libnvdimm pending bits.
> >
> > Done, I used it already to test build the last series from CH..
>
> It failed 0-day, I'm guessing some missing kconfig stuff
>
> For now I dropped it, but, if you send a v2 I can forward it toward
> 0-day again!

The system works!

Sorry for that thrash, I'll track it down.


Re: [PATCH v1 03/63] Input: atmel_mxt_ts - only read messages in mxt_acquire_irq() when necessary

2019-08-21 Thread Jiada Wang

Hi

On 2019/08/22 2:54, Dmitry Torokhov wrote:

On Wed, Aug 21, 2019 at 10:26:31PM +0900, Jiada Wang wrote:

Hi Dmitry

On 2019/08/17 2:16, Dmitry Torokhov wrote:

On Fri, Aug 16, 2019 at 05:28:52PM +0900, Jiada Wang wrote:

From: Nick Dyer 

The workaround of reading all messages until an invalid is received is a
way of forcing the CHG line high, which means that when using
edge-triggered interrupts the interrupt can be acquired.

With level-triggered interrupts the workaround is unnecessary.

Also, most recent maXTouch chips have a feature called RETRIGEN which, when
enabled, reasserts the interrupt line every cycle if there are messages
waiting. This also makes the workaround unnecessary.

Note: the RETRIGEN feature is only in some firmware versions/chips, it's
not valid simply to enable the bit.


Instead of trying to work around of misconfiguration for IRQ/firmware,
can we simply error out of probe if we see a level interrupt with
!RETRIGEN firmware?


I think for old firmwares, which doesn't support RETRIGEN feature, this
workaround is needed, otherwise we will break all old firmwares, which
configured with edge-triggered IRQ


Do you know if there are any? I know Chrome OS firmware have RETRIGEN
activated and they are pretty old (original Pixel is from 2013). But if
we indeed have devices with edge interrupt and old not firmware that
does not retrigger, I guess we'll have to keep it...



Honestly I don't know of firmwares/chips which don't support the RETRIGEN
feature.

But Dyer originally authored this patch in 2012, so I assume "old"
firmware/chips here means those from before 2012.



Thanks,
Jiada


Thanks.



Re: [PATCH v3 2/3] drivers: hv: vmbus: add test attributes to debugfs

2019-08-21 Thread Branden Bonaby
> > +What:   /sys/kernel/debug/hyperv//fuzz_test_state
> > +Date:   August 2019
> > +KernelVersion:  5.3
> > +Contact:Branden Bonaby 
> > +Description:Fuzz testing status of a vmbus device, whether it's in an ON
> > +state or an OFF state
> 
> Document what values are actually returned?  
> 
> > +Users:  Debugging tools
> > +
> > +What:   /sys/kernel/debug/hyperv//delay/fuzz_test_buffer_interrupt_delay
> > +Date:   August 2019
> > +KernelVersion:  5.3
> > +Contact:Branden Bonaby 
> > +Description:Fuzz testing buffer delay value between 0 - 1000
> 
> It would be helpful to document the units -- I think this is 0 to 1000
> microseconds.

You're right, that makes sense; I'll add that information in. Also
to confirm, it is microseconds like you said.

> > +static int hv_debugfs_delay_set(void *data, u64 val)
> > +{
> > +   if (val >= 1 && val <= 1000)
> > +   *(u32 *)data = val;
> > +   /*Best to not use else statement here since we want
> > +* the delay to remain the same if val > 1000
> > +*/
> 
> The standard multi-line comment style would be:
> 
>   /*
>* Best to not use else statement here since we want
>* the delay to remain the same if val > 1000
>*/
>

will change

> > +   else if (val <= 0)
> > +   *(u32 *)data = 0;
> 
> You could consider returning an error for an invalid
> value (< 0, or > 1000).
> 

It's subtle, but it does make sense and shows anyone reading at a
glance that the only acceptable values in the function are 0 to 1000.
I'll add that in.
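
I.e. something like this (sketch of what I plan to send, still to be
tested):

static int hv_debugfs_delay_set(void *data, u64 val)
{
	/* val is a u64, so only the upper bound needs checking */
	if (val > 1000)
		return -EINVAL;

	*(u32 *)data = val;
	return 0;
}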



Re: [PATCH v4 06/10] modpost: Add modinfo flag to livepatch modules

2019-08-21 Thread Masahiro Yamada
Hi Joe,

On Tue, Aug 20, 2019 at 1:02 AM Joe Lawrence  wrote:
>
> On 8/19/19 3:31 AM, Miroslav Benes wrote:
> > On Mon, 19 Aug 2019, Masahiro Yamada wrote:
> >
> >>
> >> I can review this series from the build system point of view,
> >> but I am not familiar enough with live-patching itself.
> >>
> >> Some possibilities:
> >>
> >> [1] Merge this series thru the live-patch tree after the
> >>  kbuild base patches land.
> >>  This requires one extra development cycle (targeting for 5.5-rc1)
> >>  but I think this is the official way if you do not rush into it.
> >
> > I'd prefer this option. There is no real rush and I think we can wait one
> > extra development cycle.
>
> Agreed.  I'm in no hurry and was only curious about the kbuild changes
> that this patchset is now dependent on -- how to note them for other
> reviewers or anyone wishing to test.
>
> > Joe, could you submit one more revision with all the recent changes (once
> > kbuild improvements settle down), please? We should take a look at the
> > whole thing one more time. What do you think?
> >
>
> Definitely, yes.  I occasionally force a push to:
> https://github.com/joe-lawrence/linux/tree/klp-convert-v5-expanded
>
> as I've been updating and collecting feedback from v4.  Once updates
> settle, I'll send out a new v5 set.
>
> -- Joe

When you send v5, please squash the following clean-up too:



diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 89eaef0d3efc..9e77246d84e3 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -47,7 +47,7 @@ targets += $(modules) $(modules:.ko=.mod.o)
 # Live Patch
 # ---

-$(modules-klp:.ko=.tmp.ko): %.tmp.ko: %.o %.mod.o Symbols.list FORCE
+%.tmp.ko: %.o %.mod.o Symbols.list FORCE
+$(call if_changed,ld_ko_o)

 quiet_cmd_klp_convert = KLP $@




Thanks.


-- 
Best Regards
Masahiro Yamada


[PATCH v3] vfio_pci: Restore original state on release

2019-08-21 Thread hexin
vfio_pci_enable() saves the device's initial configuration information
with the intent that it is restored in vfio_pci_disable().  However,
the commit referenced in Fixes: below replaced the call to
__pci_reset_function_locked(), which is not wrapped in a state save
and restore, with pci_try_reset_function(), which overwrites the
restored device state with the current state before applying it to the
device.  Reinstate use of __pci_reset_function_locked() to return to
the desired behavior.

Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface")
Signed-off-by: hexin 
Signed-off-by: Liu Qi 
Signed-off-by: Zhang Yu 
---
v2->v3:
- change commit log 
v1->v2:
- add fixes tag
- add comment to warn 

[1] 
https://lore.kernel.org/linux-pci/1565926427-21675-1-git-send-email-hexi...@baidu.com
[2] 
https://lore.kernel.org/linux-pci/1566042663-16694-1-git-send-email-hexi...@baidu.com

 drivers/vfio/pci/vfio_pci.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 703948c..0220616 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -438,11 +438,20 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
 
/*
-* Try to reset the device.  The success of this is dependent on
-* being able to lock the device, which is not always possible.
+* Try to get the locks ourselves to prevent a deadlock. The
+* success of this is dependent on being able to lock the device,
+* which is not always possible.
+* We can not use the "try" reset interface here, which will
+* overwrite the previously restored configuration information.
 */
-   if (vdev->reset_works && !pci_try_reset_function(pdev))
-   vdev->needs_reset = false;
+   if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
+   if (device_trylock(>dev)) {
+   if (!__pci_reset_function_locked(pdev))
+   vdev->needs_reset = false;
+   device_unlock(>dev);
+   }
+   pci_cfg_access_unlock(pdev);
+   }
 
pci_restore_state(pdev);
 out:
-- 
1.8.3.1



Re: [PATCH net] net: dsa: bcm_sf2: Do not configure PHYLINK on CPU port

2019-08-21 Thread David Miller
From: Florian Fainelli 
Date: Wed, 21 Aug 2019 17:07:46 -0700

> The SF2 binding does not specify that the CPU port should have
> properties mandatory for successfully instantiating a PHYLINK object. As
> such, there will be missing properties (including fixed-link) and when
> attempting to validate and later configure link modes, we will have an
> incorrect set of parameters (interface, speed, duplex).
> 
> Simply prevent the CPU port from being configured through PHYLINK since
> bcm_sf2_imp_setup() takes care of that already.
> 
> Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.


Re: KMSAN: uninit-value in rtm_new_nexthop

2019-08-21 Thread David Ahern
On 8/21/19 6:38 PM, syzbot wrote:
> ==
> BUG: KMSAN: uninit-value in rtm_to_nh_config net/ipv4/nexthop.c:1317
> [inline]
> BUG: KMSAN: uninit-value in rtm_new_nexthop+0x447/0x98e0
> net/ipv4/nexthop.c:1474

I believe this is fixed in net by commit:

Author: David Ahern 
Date:   Mon Aug 12 13:07:07 2019 -0700

netlink: Fix nlmsg_parse as a wrapper for strict message parsing



Re: [PATCH v2] ARM: UNWINDER_FRAME_POINTER implementation for Clang

2019-08-21 Thread Nick Desaulniers
On Wed, Aug 21, 2019 at 10:46 AM Nathan Huckleberry  wrote:
>
> The stackframe setup when compiled with clang is different.
> Since the stack unwinder expects the gcc stackframe setup it
> fails to print backtraces. This patch adds support for the
> clang stackframe setup.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/35
> Cc: clang-built-li...@googlegroups.com
> Suggested-by: Tri Vo 
> Signed-off-by: Nathan Huckleberry 
> ---
> Changes from v1->v2
> * Fix indentation in various files
> * Swap spaces for tabs
> * Rename Ldsi to Lopcode
> * Remove unused Ldsi entry
>
>  arch/arm/Kconfig.debug |   2 +-
>  arch/arm/Makefile  |   5 +-
>  arch/arm/lib/Makefile  |   8 +-
>  arch/arm/lib/backtrace-clang.S | 229 +
>  4 files changed, 241 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm/lib/backtrace-clang.S
>
> diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> index 85710e078afb..b9c674ec19e0 100644
> --- a/arch/arm/Kconfig.debug
> +++ b/arch/arm/Kconfig.debug
> @@ -56,7 +56,7 @@ choice
>
>  config UNWINDER_FRAME_POINTER
> bool "Frame pointer unwinder"
> -   depends on !THUMB2_KERNEL && !CC_IS_CLANG
> +   depends on !THUMB2_KERNEL
> select ARCH_WANT_FRAME_POINTERS
> select FRAME_POINTER
> help
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index c3624ca6c0bc..6f251c201db0 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -36,7 +36,10 @@ KBUILD_CFLAGS+= $(call cc-option,-mno-unaligned-access)
>  endif
>
>  ifeq ($(CONFIG_FRAME_POINTER),y)
> -KBUILD_CFLAGS  +=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
> +KBUILD_CFLAGS  +=-fno-omit-frame-pointer
> +ifeq ($(CONFIG_CC_IS_GCC),y)
> +KBUILD_CFLAGS += -mapcs -mno-sched-prolog
> +endif
>  endif
>
>  ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index b25c54585048..6d2ba454f25b 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -5,7 +5,7 @@
>  # Copyright (C) 1995-2000 Russell King
>  #
>
> -lib-y  := backtrace.o changebit.o csumipv6.o csumpartial.o   \
> +lib-y  := changebit.o csumipv6.o csumpartial.o   \
>csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
>delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
>memmove.o memset.o setbit.o\
> @@ -19,6 +19,12 @@ lib-y:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  mmu-y  := clear_user.o copy_page.o getuser.o putuser.o   \
>copy_from_user.o copy_to_user.o
>
> +ifdef CONFIG_CC_IS_CLANG
> +  lib-y+= backtrace-clang.o
> +else
> +  lib-y+= backtrace.o
> +endif
> +
>  # using lib_ here won't override already available weak symbols
>  obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o
>
> diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
> new file mode 100644
> index ..6f2a8a57d0fb
> --- /dev/null
> +++ b/arch/arm/lib/backtrace-clang.S
> @@ -0,0 +1,229 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + *  linux/arch/arm/lib/backtrace-clang.S
> + *
> + *  Copyright (C) 2019 Nathan Huckleberry
> + *
> + */
> +#include 
> +#include 
> +#include 
> +   .text
> +
> +/* fp is 0 or stack frame */
> +
> +#define frame  r4
> +#define sv_fp  r5
> +#define sv_pc  r6
> +#define mask   r7
> +#define sv_lr  r8
> +
> +ENTRY(c_backtrace)
> +
> +#if !defined(CONFIG_FRAME_POINTER) || !defined(CONFIG_PRINTK)
> +   ret lr
> +ENDPROC(c_backtrace)
> +#else
> +
> +
> +/*
> + * Clang does not store pc or sp in function prologues
> + * so we don't know exactly where the function
> + * starts.

To quickly re-wrap text (if you're using vim) such as with comments like these:
shift+v (VISUAL LINE MODE)
j or k to highlight lines
gq (to rewrap)
You may need `set cc=80` (not sure).

> + *
> + * We can treat the current frame's lr as the saved pc and the
> + * preceding frame's lr as the current frame's lr,
> + * but we can't trace the most recent call.
> + * Inserting a false stack frame allows us to reference the
> + * function called last in the stacktrace.
> + *
> + * If the call instruction was a bl we can look at the callers
> + * branch instruction to calculate the saved pc.
> + * We can recover the pc in most cases, but in cases such as
> + * calling function pointers we cannot. In this
> + * case, default to using the lr. This will be
> + * some address in the function, but will not
> + * be the function start.
> + *
> + * Unfortunately due to the stack frame layout we can't dump
> + *  r0 - r3, but these are less frequently saved.

I guess if they were spilled, but I'm ok with this; I'd rather have a
working unwinder than a disabled config.  The printing is a debug
feature that's nice to have, but the main focus should be 

[PATCH] gpio: Move gpiochip_lock/unlock_as_irq to gpio/driver.h

2019-08-21 Thread YueHaibing
If CONFIG_GPIOLIB is not set, gpiochip_lock/unlock_as_irq will
conflict like this:

In file included from sound/soc/codecs/wm5100.c:18:0:
./include/linux/gpio.h:224:19: error: static declaration of 
gpiochip_lock_as_irq follows non-static declaration
 static inline int gpiochip_lock_as_irq(struct gpio_chip *chip,
   ^~~~
In file included from sound/soc/codecs/wm5100.c:17:0:
./include/linux/gpio/driver.h:494:5: note: previous declaration of 
gpiochip_lock_as_irq was here
 int gpiochip_lock_as_irq(struct gpio_chip *chip, unsigned int offset);
 ^~~~
In file included from sound/soc/codecs/wm5100.c:18:0:
./include/linux/gpio.h:231:20: error: static declaration of 
gpiochip_unlock_as_irq follows non-static declaration
 static inline void gpiochip_unlock_as_irq(struct gpio_chip *chip,
^~
In file included from sound/soc/codecs/wm5100.c:17:0:
./include/linux/gpio/driver.h:495:6: note: previous declaration of 
gpiochip_unlock_as_irq was here
 void gpiochip_unlock_as_irq(struct gpio_chip *chip, unsigned int offset);
 ^~

Move them to gpio/driver.h and use CONFIG_GPIOLIB guard this.

Reported-by: Hulk Robot 
Fixes: d74be6dfea1b ("gpio: remove gpiod_lock/unlock_as_irq()")
Signed-off-by: YueHaibing 
---
 include/linux/gpio.h| 13 -
 include/linux/gpio/driver.h | 19 ---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/include/linux/gpio.h b/include/linux/gpio.h
index f757a58..2157717 100644
--- a/include/linux/gpio.h
+++ b/include/linux/gpio.h
@@ -221,19 +221,6 @@ static inline int gpio_to_irq(unsigned gpio)
return -EINVAL;
 }
 
-static inline int gpiochip_lock_as_irq(struct gpio_chip *chip,
-  unsigned int offset)
-{
-   WARN_ON(1);
-   return -EINVAL;
-}
-
-static inline void gpiochip_unlock_as_irq(struct gpio_chip *chip,
- unsigned int offset)
-{
-   WARN_ON(1);
-}
-
 static inline int irq_to_gpio(unsigned irq)
 {
/* irq can never have been returned from gpio_to_irq() */
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 4d2d7b7..1778106 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -490,9 +490,6 @@ extern int devm_gpiochip_add_data(struct device *dev, 
struct gpio_chip *chip,
 extern struct gpio_chip *gpiochip_find(void *data,
  int (*match)(struct gpio_chip *chip, void *data));
 
-/* lock/unlock as IRQ */
-int gpiochip_lock_as_irq(struct gpio_chip *chip, unsigned int offset);
-void gpiochip_unlock_as_irq(struct gpio_chip *chip, unsigned int offset);
 bool gpiochip_line_is_irq(struct gpio_chip *chip, unsigned int offset);
 int gpiochip_reqres_irq(struct gpio_chip *chip, unsigned int offset);
 void gpiochip_relres_irq(struct gpio_chip *chip, unsigned int offset);
@@ -710,6 +707,10 @@ void devprop_gpiochip_set_names(struct gpio_chip *chip,
 
 struct gpio_chip *gpiod_to_chip(const struct gpio_desc *desc);
 
+/* lock/unlock as IRQ */
+int gpiochip_lock_as_irq(struct gpio_chip *chip, unsigned int offset);
+void gpiochip_unlock_as_irq(struct gpio_chip *chip, unsigned int offset);
+
 #else /* CONFIG_GPIOLIB */
 
 static inline struct gpio_chip *gpiod_to_chip(const struct gpio_desc *desc)
@@ -719,6 +720,18 @@ static inline struct gpio_chip *gpiod_to_chip(const struct 
gpio_desc *desc)
return ERR_PTR(-ENODEV);
 }
 
+static inline int gpiochip_lock_as_irq(struct gpio_chip *chip,
+  unsigned int offset)
+{
+   WARN_ON(1);
+   return -EINVAL;
+}
+
+static inline void gpiochip_unlock_as_irq(struct gpio_chip *chip,
+ unsigned int offset)
+{
+   WARN_ON(1);
+}
 #endif /* CONFIG_GPIOLIB */
 
 #endif /* __LINUX_GPIO_DRIVER_H */
-- 
2.7.4




Re: [PATCH v3 3/3] tools: hv: add vmbus testing tool

2019-08-21 Thread Branden Bonaby
On Thu, Aug 22, 2019 at 01:36:09AM +, Harry Zhang wrote:
> Tool function issues: please validate args errors for '-p' and '--path',
> in or following validate_args_path().
> 
> Comments of functionality:
> - it's confusing when fuzz_testing is all OFF and the user runs 'python3
> /home/lisa/vmbus_testing -p
> /sys/kernel/debug/hyperv/000d3a6e-4548-000d-3a6e-4548000d3a6e delay -d 0 0 -D'
> which will enable all delay testing state ('Y' in state files), even though I
> used the "-D", "--dis_all" param.
> - if we have subparsers of "disable-all" for the testing tool, then 
> probably we don't need the mutually_exclusive_group under subparsers of 
> "delay"
> - the path argument (-p) could be an argument for subparsers of "delay" 
> and "view" only.
> 
> Regards,
> Harry
>

So I made the choice to keep disabling the state and disabling delay
testing separate, because once we start adding other testing options
you wouldn't want to inadvertently disable all testing, especially
if you were doing more than one type of test at a time.
So with your configuration

'python3 /home/lisa/vmbus_testing -p 
/sys/kernel/debug/hyperv/000d3a6e-4548-000d-3a6e-4548000d3a6e delay -d 0 0 -D '

this would stop all delay testing on all the devices but wouldn't change
their test state to OFF 'N'. So that's why I have the option -s --state to
change the state to OFF with a -s 0. Then to disable all types of testing
and change the state to OFF, that's where the 'disable-all' subparser comes in,
with:

'python3 /home/lisa/vmbus_testing disable-all'

For that last point I don't understand what you mean; are you saying it would be
better to have something like this, using delay as an example?

'python3 /home/lisa/vmbus_testing delay -p 
/sys/kernel/debug/hyperv/000d3a6e-4548-000d-3a6e-4548000d3a6e'

If that's what you mean, I figured it was better to make -p accessible
to all test types, so I made it a part of the main parser. This would allow
us to just have it there once instead of having to make a -p for every
subparser (see the sketch below).
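
Roughly like this, as a stripped-down sketch (argument names shortened):

import argparse

parser = argparse.ArgumentParser(prog="vmbus_testing")
# one global -p/--path that every subcommand inherits
parser.add_argument("-p", "--path", help="debugfs path of a vmbus device")

sub = parser.add_subparsers(dest="command")

delay = sub.add_parser("delay")
delay.add_argument("-d", "--delay_time", nargs=2, type=int)

sub.add_parser("disable-all")

args = parser.parse_args()
print(args.command, args.path)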

Also maybe I need to change the examples and the help text,
because with the -D option for delay you wouldn't actually need to put in
the path, as

'python3 /home/lisa/vmbus_testing delay -d 0 0 -D '

would suffice to stop delay testing on all devices; -E would enable
it for all devices and change the state to On 'Y' if it wasn't already.

let me know your thoughts

branden bonaby


RE: [tip:timers/core 34/34] drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' undeclared; did you mean 'sched_clock_register'?

2019-08-21 Thread Tianyu Lan
Thanks for reporting. I will send out fix patch.

-Original Message-
From: kbuild test robot  
Sent: Thursday, August 22, 2019 10:25 AM
To: Tianyu Lan 
Cc: kbuild-...@01.org; linux-kernel@vger.kernel.org; tipbu...@zytor.com; Thomas 
Gleixner ; Michael Kelley 
Subject: [tip:timers/core 34/34] drivers//clocksource/hyperv_timer.c:264:35: 
error: 'hv_sched_clock_offset' undeclared; did you mean 'sched_clock_register'?

tree:   
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fkernel.googlesource.com%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftip%2Ftip.gitdata=02%7C01%7CTianyu.Lan%40microsoft.com%7Cfa01680d45d1424cbbc308d726a82122%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637020378361213701sdata=56fY4vgmkc4Nk3ZqCZhRyaA%2BmfKSd%2Fp9eYXZUahw5uo%3Dreserved=0
 timers/core
head:   b74e1d61dbc614ff35ef3ad9267c61ed06b09051
commit: b74e1d61dbc614ff35ef3ad9267c61ed06b09051 [34/34] clocksource/hyperv: 
Add Hyper-V specific sched clock function
config: i386-randconfig-g002-201933 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
git checkout b74e1d61dbc614ff35ef3ad9267c61ed06b09051
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' 
>> undeclared (first use in this function); did you mean 'sched_clock_register'?
 return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
  ^
  sched_clock_register
   drivers//clocksource/hyperv_timer.c:264:35: note: each undeclared identifier 
is reported only once for each function it appears in
   drivers//clocksource/hyperv_timer.c: In function 'hv_init_clocksource':
   drivers//clocksource/hyperv_timer.c:334:2: error: 'hv_sched_clock_offset' 
undeclared (first use in this function); did you mean 'sched_clock_register'?
 hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
 ^
 sched_clock_register
   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:265:1: warning: control reaches end of 
>> non-void function [-Wreturn-type]
}
^

vim +264 drivers//clocksource/hyperv_timer.c

   261  
   262  static u64 read_hv_sched_clock_msr(void)
   263  {
 > 264  return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 > 265  }
   266  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.01.org%2Fpipermail%2Fkbuild-alldata=02%7C01%7CTianyu.Lan%40microsoft.com%7Cfa01680d45d1424cbbc308d726a82122%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637020378361213701sdata=bmZV%2B2uKHUlwngubxhE2bZlfOqCRNYVDCXOs%2FWcy3f8%3Dreserved=0
   Intel Corporation


Re: [PATCH V6 2/2] genirq/affinity: Spread vectors on node according to nr_cpu ratio

2019-08-21 Thread Ming Lei
On Mon, Aug 19, 2019 at 04:02:21PM +0200, Thomas Gleixner wrote:
> On Mon, 19 Aug 2019, Ming Lei wrote:
> > On Mon, Aug 19, 2019 at 03:13:58PM +0200, Thomas Gleixner wrote:
> > > On Mon, 19 Aug 2019, Ming Lei wrote:
> > > 
> > > > Cc: Jon Derrick 
> > > > Cc: Jens Axboe 
> > > > Reported-by: Jon Derrick 
> > > > Reviewed-by: Jon Derrick 
> > > > Reviewed-by: Keith Busch 
> > > 
> > > This version is sufficiently different from the previous one, so I do not
> > > consider the reviewed-by tags still being valid and meaningful. Don't
> > > include them unless you just do cosmetic changes.
> > 
> > Fine.
> > 
> > However, the V6-only change isn't big; it just addresses the
> > uninitialized warning, and the change is only done in the function
> > irq_build_affinity_masks().
> 
> They are not trivial either:
> 
>  affinity.c |   28 +---
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -339,7 +339,7 @@ static int irq_build_affinity_masks(unsi
>   unsigned int firstvec,
>   struct irq_affinity_desc *masks)
>  {
> - unsigned int curvec = startvec, nr_present, nr_others;
> + unsigned int curvec = startvec, nr_present = 0, nr_others = 0;
>   cpumask_var_t *node_to_cpumask;
>   cpumask_var_t nmsk, npresmsk;
>   int ret = -ENOMEM;
> @@ -354,19 +354,17 @@ static int irq_build_affinity_masks(unsi
>   if (!node_to_cpumask)
>   goto fail_npresmsk;
>  
> - ret = 0;
>   /* Stabilize the cpumasks */
>   get_online_cpus();
>   build_node_to_cpumask(node_to_cpumask);
>  
>   /* Spread on present CPUs starting from affd->pre_vectors */
> - nr_present = __irq_build_affinity_masks(curvec, numvecs,
> - firstvec, node_to_cpumask,
> - cpu_present_mask, nmsk, masks);
> - if (nr_present < 0) {
> - ret = nr_present;
> + ret = __irq_build_affinity_masks(curvec, numvecs, firstvec,
> +  node_to_cpumask, cpu_present_mask,
> +  nmsk, masks);
> + if (ret < 0)
>   goto fail_build_affinity;
> - }
> + nr_present = ret;
>  
>   /*
>* Spread on non present CPUs starting from the next vector to be
> @@ -379,16 +377,16 @@ static int irq_build_affinity_masks(unsi
>   else
>   curvec = firstvec + nr_present;
>   cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
> - nr_others = __irq_build_affinity_masks(curvec, numvecs,
> -firstvec, node_to_cpumask,
> -npresmsk, nmsk, masks);
> - if (nr_others < 0)
> - ret = nr_others;
> + ret = __irq_build_affinity_masks(curvec, numvecs, firstvec,
> +  node_to_cpumask, npresmsk, nmsk,
> +  masks);
> + if (ret >= 0)
> + nr_others = ret;
>  
>   fail_build_affinity:
>   put_online_cpus();
>  
> - if (min(nr_present, nr_others) >= 0)
> + if (ret >= 0)
>   WARN_ON(nr_present + nr_others < numvecs);
>  
>   free_node_to_cpumask(node_to_cpumask);
> @@ -398,7 +396,7 @@ static int irq_build_affinity_masks(unsi
>  
>   fail_nmsk:
>   free_cpumask_var(nmsk);
> - return ret;
> + return ret < 0 ? ret : 0;
>  }
>  
>  static void default_calc_sets(struct irq_affinity *affd, unsigned int 
> affvecs)

Hi Keith & Jon,

Could you review the above V6 extra change so that we can move on?

BTW, imbalanced NUMA nodes can be made easily by passing 'possible_cpus=N'.


Thanks, 
Ming


Re: BUG: MAX_STACK_TRACE_ENTRIES too low in tipc_topsrv_exit_net

2019-08-21 Thread Eric Biggers
On Mon, Aug 19, 2019 at 05:22:07AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:5181b473 net: phy: realtek: add NBase-T PHY auto-detection
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=156b731c60
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=5f97459a05652f579f6c
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+5f97459a05652f579...@syzkaller.appspotmail.com
> 
> BUG: MAX_STACK_TRACE_ENTRIES too low!
> turning off the locking correctness validator.
> CPU: 0 PID: 2581 Comm: kworker/u4:4 Not tainted 5.3.0-rc3+ #132
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: netns cleanup_net
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
>  save_trace kernel/locking/lockdep.c:473 [inline]
>  save_trace.isra.0.cold+0x14/0x19 kernel/locking/lockdep.c:458
>  mark_lock+0x3db/0x11e0 kernel/locking/lockdep.c:3583
>  mark_usage kernel/locking/lockdep.c:3517 [inline]
>  __lock_acquire+0x538/0x4c30 kernel/locking/lockdep.c:3834
>  lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4412
>  flush_workqueue+0x126/0x14b0 kernel/workqueue.c:2774
>  drain_workqueue+0x1b4/0x470 kernel/workqueue.c:2939
>  destroy_workqueue+0x21/0x6c0 kernel/workqueue.c:4320
>  tipc_topsrv_work_stop net/tipc/topsrv.c:636 [inline]
>  tipc_topsrv_stop net/tipc/topsrv.c:694 [inline]
>  tipc_topsrv_exit_net+0x3fe/0x5d8 net/tipc/topsrv.c:706
>  ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:172
>  cleanup_net+0x4e2/0xa70 net/core/net_namespace.c:594
>  process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
>  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
>  kthread+0x361/0x430 kernel/kthread.c:255
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> kobject: 'rx-0' (0e2c91cd): kobject_cleanup, parent 2003fefb
> kobject: 'rx-0' (0e2c91cd): auto cleanup 'remove' event
> kobject: 'rx-0' (0e2c91cd): kobject_uevent_env
> kobject: 'rx-0' (0e2c91cd): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'rx-0' (0e2c91cd): auto cleanup kobject_del
> kobject: 'rx-0' (0e2c91cd): calling ktype release
> kobject: 'rx-0': free name
> kobject: 'tx-0' (58b6f726): kobject_cleanup, parent 2003fefb
> kobject: 'tx-0' (58b6f726): auto cleanup 'remove' event
> kobject: 'tx-0' (58b6f726): kobject_uevent_env
> kobject: 'tx-0' (58b6f726): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'tx-0' (58b6f726): auto cleanup kobject_del
> kobject: 'tx-0' (58b6f726): calling ktype release
> kobject: 'tx-0': free name
> kobject: 'queues' (2003fefb): kobject_cleanup, parent
> 9c061a32
> kobject: 'queues' (2003fefb): calling ktype release
> kobject: 'queues' (2003fefb): kset_release
> kobject: 'queues': free name
> kobject: 'ip6gre0' (18a24d65): kobject_uevent_env
> kobject: 'ip6gre0' (18a24d65): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'rx-0' (940b22b0): kobject_cleanup, parent 05a1fc3a
> kobject: 'rx-0' (940b22b0): auto cleanup 'remove' event
> kobject: 'rx-0' (940b22b0): kobject_uevent_env
> kobject: 'rx-0' (940b22b0): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'rx-0' (940b22b0): auto cleanup kobject_del
> kobject: 'rx-0' (940b22b0): calling ktype release
> kobject: 'rx-0': free name
> kobject: 'tx-0' (278e85e2): kobject_cleanup, parent 05a1fc3a
> kobject: 'tx-0' (278e85e2): auto cleanup 'remove' event
> kobject: 'tx-0' (278e85e2): kobject_uevent_env
> kobject: 'tx-0' (278e85e2): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'tx-0' (278e85e2): auto cleanup kobject_del
> kobject: 'tx-0' (278e85e2): calling ktype release
> kobject: 'tx-0': free name
> kobject: 'queues' (05a1fc3a): kobject_cleanup, parent
> 9c061a32
> kobject: 'queues' (05a1fc3a): calling ktype release
> kobject: 'queues' (05a1fc3a): kset_release
> kobject: 'queues': free name
> kobject: 'ip6gre0' (c78b955b): kobject_uevent_env
> kobject: 'ip6gre0' (c78b955b): kobject_uevent_env: uevent_suppress
> caused the event to drop!
> kobject: 'rx-0' (0fa7c1d1): kobject_cleanup, parent d264d5b4
> kobject: 'rx-0' (0fa7c1d1): auto cleanup 'remove' event
> kobject: 'rx-0' (0fa7c1d1): kobject_uevent_env
> kobject: 'rx-0' (0fa7c1d1): kobject_uevent_env: 

[PATCH V2] csky: Fixup 610 vipt cache flush mechanism

2019-08-21 Thread guoren
From: Guo Ren 

The 610 has a VIPT aliasing issue, so we need to implement the cache flush
APIs mentioned in cachetlb.rst to avoid data corruption.

Here is the list of modified apis in the patch:

 - flush_kernel_dcache_page  (new add)
 - flush_dcache_mmap_lock(new add)
 - flush_dcache_mmap_unlock  (new add)
 - flush_kernel_vmap_range   (new add)
 - invalidate_kernel_vmap_range  (new add)
 - flush_anon_page   (new add)
 - flush_cache_range (new add)
 - flush_cache_vmap  (flush all)
 - flush_cache_vunmap(flush all)
 - flush_cache_mm(only dcache flush)
 - flush_icache_page (just nop)
 - copy_from_user_page   (remove no need flush)
 - copy_to_user_page (remove no need flush)

Changes in V2:
 - Fixup compile error with xa_lock*(&mapping->i_pages)

Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Christoph Hellwig 
---
 arch/csky/abiv1/cacheflush.c | 20 ++
 arch/csky/abiv1/inc/abi/cacheflush.h | 41 +---
 2 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
index fee99fc..9f1fe80 100644
--- a/arch/csky/abiv1/cacheflush.c
+++ b/arch/csky/abiv1/cacheflush.c
@@ -54,3 +54,23 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
icache_inv_all();
}
 }
+
+void flush_kernel_dcache_page(struct page *page)
+{
+   struct address_space *mapping;
+
+   mapping = page_mapping_file(page);
+
+   if (!mapping || mapping_mapped(mapping))
+   dcache_wbinv_all();
+}
+EXPORT_SYMBOL(flush_kernel_dcache_page);
+
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
+   unsigned long end)
+{
+   dcache_wbinv_all();
+
+   if (vma->vm_flags & VM_EXEC)
+   icache_inv_all();
+}
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index fce5604..79ef9e8 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -4,26 +4,49 @@
 #ifndef __ABI_CSKY_CACHEFLUSH_H
 #define __ABI_CSKY_CACHEFLUSH_H
 
-#include 
+#include 
 #include 
 #include 
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
 
-#define flush_cache_mm(mm) cache_wbinv_all()
+#define flush_cache_mm(mm) dcache_wbinv_all()
 #define flush_cache_page(vma, page, pfn)   cache_wbinv_all()
 #define flush_cache_dup_mm(mm) cache_wbinv_all()
 
+#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
+extern void flush_kernel_dcache_page(struct page *);
+
+#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+
+static inline void flush_kernel_vmap_range(void *addr, int size)
+{
+   dcache_wbinv_all();
+}
+static inline void invalidate_kernel_vmap_range(void *addr, int size)
+{
+   dcache_wbinv_all();
+}
+
+#define ARCH_HAS_FLUSH_ANON_PAGE
+static inline void flush_anon_page(struct vm_area_struct *vma,
+struct page *page, unsigned long vmaddr)
+{
+   if (PageAnon(page))
+   cache_wbinv_all();
+}
+
 /*
  * if (current_mm != vma->mm) cache_wbinv_range(start, end) will be broken.
  * Use cache_wbinv_all() here and need to be improved in future.
  */
-#define flush_cache_range(vma, start, end) cache_wbinv_all()
-#define flush_cache_vmap(start, end)   cache_wbinv_range(start, end)
-#define flush_cache_vunmap(start, end) cache_wbinv_range(start, end)
+extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, 
unsigned long end);
+#define flush_cache_vmap(start, end)   cache_wbinv_all()
+#define flush_cache_vunmap(start, end) cache_wbinv_all()
 
-#define flush_icache_page(vma, page)   cache_wbinv_all()
+#define flush_icache_page(vma, page)   do {} while (0)
 #define flush_icache_range(start, end) cache_wbinv_range(start, end)
 
 #define flush_icache_user_range(vma,page,addr,len) \
@@ -31,19 +54,13 @@ extern void flush_dcache_page(struct page *);
 
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 do { \
-   cache_wbinv_all(); \
memcpy(dst, src, len); \
-   cache_wbinv_all(); \
 } while (0)
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 do { \
-   cache_wbinv_all(); \
memcpy(dst, src, len); \
cache_wbinv_all(); \
 } while (0)
 
-#define flush_dcache_mmap_lock(mapping)do {} while (0)
-#define flush_dcache_mmap_unlock(mapping)  do {} while (0)
-
 #endif /* __ABI_CSKY_CACHEFLUSH_H */
-- 
2.7.4



Re: [PATCH] rcu: don't include in rcutiny.h

2019-08-21 Thread Paul E. McKenney
On Thu, Aug 22, 2019 at 10:53:43AM +0900, Christoph Hellwig wrote:
> The kbuild robot reported a build failure due to a header loop when RCUTINY
> is enabled with my pending riscv-nommu port.  Switch rcutiny.h to only
> include the minimal required header to get HZ instead.
> 
> Signed-off-by: Christoph Hellwig 

Queued for review and testing, thank you!

Do you need this in v5.4?  My normal workflow would put it into v5.5.

Thanx, Paul

> ---
>  include/linux/rcutiny.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 8e727f57d814..9bf1dfe7781f 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -12,7 +12,7 @@
>  #ifndef __LINUX_TINY_H
>  #define __LINUX_TINY_H
>  
> -#include 
> +#include  /* for HZ */
>  
>  /* Never flag non-existent other CPUs! */
>  static inline bool rcu_eqs_special_set(int cpu) { return false; }
> -- 
> 2.20.1
> 


[PATCH V4 1/4] dt-bindings: watchdog: Add i.MX7ULP bindings

2019-08-21 Thread Anson Huang
Add the watchdog bindings for Freescale i.MX7ULP.

Signed-off-by: Anson Huang 
---
No changes.
---
 .../bindings/watchdog/fsl-imx7ulp-wdt.txt  | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/watchdog/fsl-imx7ulp-wdt.txt

diff --git a/Documentation/devicetree/bindings/watchdog/fsl-imx7ulp-wdt.txt 
b/Documentation/devicetree/bindings/watchdog/fsl-imx7ulp-wdt.txt
new file mode 100644
index 000..d83fc5c
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/fsl-imx7ulp-wdt.txt
@@ -0,0 +1,22 @@
+* Freescale i.MX7ULP Watchdog Timer (WDT) Controller
+
+Required properties:
+- compatible : Should be "fsl,imx7ulp-wdt"
+- reg : Should contain WDT registers location and length
+- interrupts : Should contain WDT interrupt
+- clocks: Should contain a phandle pointing to the gated peripheral clock.
+
+Optional properties:
+- timeout-sec : Contains the watchdog timeout in seconds
+
+Examples:
+
+wdog1: wdog@403d {
+   compatible = "fsl,imx7ulp-wdt";
+   reg = <0x403d 0x1>;
+   interrupts = ;
+   clocks = < IMX7ULP_CLK_WDG1>;
+   assigned-clocks = < IMX7ULP_CLK_WDG1>;
+   assigned-clocks-parents = < IMX7ULP_CLK_FIRC_BUS_CLK>;
+   timeout-sec = <40>;
+};
-- 
2.7.4



[PATCH V4 2/4] watchdog: Add i.MX7ULP watchdog support

2019-08-21 Thread Anson Huang
The i.MX7ULP Watchdog Timer (WDOG) module is an independent timer
that is available for system use.
It provides a safety feature to ensure that software is executing
as planned and that the CPU is not stuck in an infinite loop or
executing unintended code. If the WDOG module is not serviced
(refreshed) within a certain period, it resets the MCU.

Add driver support for i.MX7ULP watchdog.
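
User space services this watchdog through the standard Linux watchdog
chardev interface. A minimal sketch (the device path and timeout below
are assumptions, not part of this patch):

/* Illustration only: periodically kick the watchdog. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

int main(void)
{
	int timeout = 40;
	int fd = open("/dev/watchdog0", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Routed by the watchdog core to the driver's ->set_timeout(). */
	ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
	for (;;) {
		/* Routed to the driver's ->ping(), i.e. imx7ulp_wdt_ping(). */
		ioctl(fd, WDIOC_KEEPALIVE, 0);
		sleep(timeout / 2);
	}
}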

Signed-off-by: Anson Huang 
---
Changes since V3:
- pass clk directly for reset action to avoid dereference from 
structure;
- use a constant instead of a variable for the wdog clock rate, as it is 
fixed at 1 kHz.
---
 drivers/watchdog/Kconfig   |  13 +++
 drivers/watchdog/Makefile  |   1 +
 drivers/watchdog/imx7ulp_wdt.c | 243 +
 3 files changed, 257 insertions(+)
 create mode 100644 drivers/watchdog/imx7ulp_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index a8f5c81..d68e5b5 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -724,6 +724,19 @@ config IMX_SC_WDT
  To compile this driver as a module, choose M here: the
  module will be called imx_sc_wdt.
 
+config IMX7ULP_WDT
+   tristate "IMX7ULP Watchdog"
+   depends on ARCH_MXC || COMPILE_TEST
+   select WATCHDOG_CORE
+   help
+ This is the driver for the hardware watchdog on the Freescale
+ IMX7ULP and later processors. If you have one of these
+ processors and wish to have watchdog support enabled,
+ say Y, otherwise say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called imx7ulp_wdt.
+
 config UX500_WATCHDOG
tristate "ST-Ericsson Ux500 watchdog"
depends on MFD_DB8500_PRCMU
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index b5a0aed..2ee352b 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_TS4800_WATCHDOG) += ts4800_wdt.o
 obj-$(CONFIG_TS72XX_WATCHDOG) += ts72xx_wdt.o
 obj-$(CONFIG_IMX2_WDT) += imx2_wdt.o
 obj-$(CONFIG_IMX_SC_WDT) += imx_sc_wdt.o
+obj-$(CONFIG_IMX7ULP_WDT) += imx7ulp_wdt.o
 obj-$(CONFIG_UX500_WATCHDOG) += ux500_wdt.o
 obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o
 obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o
diff --git a/drivers/watchdog/imx7ulp_wdt.c b/drivers/watchdog/imx7ulp_wdt.c
new file mode 100644
index 000..5ce5102
--- /dev/null
+++ b/drivers/watchdog/imx7ulp_wdt.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 NXP.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define WDOG_CS			0x0
+#define WDOG_CS_CMD32EN		BIT(13)
+#define WDOG_CS_ULK		BIT(11)
+#define WDOG_CS_RCS		BIT(10)
+#define WDOG_CS_EN		BIT(7)
+#define WDOG_CS_UPDATE		BIT(5)
+
+#define WDOG_CNT   0x4
+#define WDOG_TOVAL 0x8
+
+#define REFRESH_SEQ0   0xA602
+#define REFRESH_SEQ1   0xB480
+#define REFRESH((REFRESH_SEQ1 << 16) | REFRESH_SEQ0)
+
+#define UNLOCK_SEQ00xC520
+#define UNLOCK_SEQ10xD928
+#define UNLOCK ((UNLOCK_SEQ1 << 16) | UNLOCK_SEQ0)
+
+#define DEFAULT_TIMEOUT		60
+#define MAX_TIMEOUT		128
+#define WDOG_CLOCK_RATE		1000
+
+static bool nowayout = WATCHDOG_NOWAYOUT;
+module_param(nowayout, bool, );
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+
+struct imx7ulp_wdt_device {
+   struct notifier_block restart_handler;
+   struct watchdog_device wdd;
+   void __iomem *base;
+   struct clk *clk;
+};
+
+static inline void imx7ulp_wdt_enable(void __iomem *base, bool enable)
+{
+   u32 val = readl(base + WDOG_CS);
+
+   writel(UNLOCK, base + WDOG_CNT);
+   if (enable)
+   writel(val | WDOG_CS_EN, base + WDOG_CS);
+   else
+   writel(val & ~WDOG_CS_EN, base + WDOG_CS);
+}
+
+static inline bool imx7ulp_wdt_is_enabled(void __iomem *base)
+{
+   u32 val = readl(base + WDOG_CS);
+
+   return val & WDOG_CS_EN;
+}
+
+static int imx7ulp_wdt_ping(struct watchdog_device *wdog)
+{
+   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+   writel(REFRESH, wdt->base + WDOG_CNT);
+
+   return 0;
+}
+
+static int imx7ulp_wdt_start(struct watchdog_device *wdog)
+{
+   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+   imx7ulp_wdt_enable(wdt->base, true);
+
+   return 0;
+}
+
+static int imx7ulp_wdt_stop(struct watchdog_device *wdog)
+{
+   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+
+   imx7ulp_wdt_enable(wdt->base, false);
+
+   return 0;
+}
+
+static int imx7ulp_wdt_set_timeout(struct watchdog_device *wdog,
+  unsigned int timeout)
+{
+   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
+  

[PATCH V4 4/4] ARM: dts: imx7ulp: Add wdog1 node

2019-08-21 Thread Anson Huang
Add wdog1 node to support watchdog driver.

Signed-off-by: Anson Huang 
---
No changes.
---
 arch/arm/boot/dts/imx7ulp.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/imx7ulp.dtsi b/arch/arm/boot/dts/imx7ulp.dtsi
index 6859a3a..1fdb5a35 100644
--- a/arch/arm/boot/dts/imx7ulp.dtsi
+++ b/arch/arm/boot/dts/imx7ulp.dtsi
@@ -264,6 +264,16 @@
#clock-cells = <1>;
};
 
+   wdog1: wdog@403d {
+   compatible = "fsl,imx7ulp-wdt";
+   reg = <0x403d 0x1>;
+   interrupts = ;
+   clocks = < IMX7ULP_CLK_WDG1>;
+   assigned-clocks = < IMX7ULP_CLK_WDG1>;
+   assigned-clocks-parents = < 
IMX7ULP_CLK_FIRC_BUS_CLK>;
+   timeout-sec = <40>;
+   };
+
pcc2: clock-controller@403f {
compatible = "fsl,imx7ulp-pcc2";
reg = <0x403f 0x1>;
-- 
2.7.4



[PATCH V4 3/4] ARM: imx_v6_v7_defconfig: Enable CONFIG_IMX7ULP_WDT by default

2019-08-21 Thread Anson Huang
Select CONFIG_IMX7ULP_WDT by default to support i.MX7ULP watchdog.

Signed-off-by: Anson Huang 
---
No changes.
---
 arch/arm/configs/imx_v6_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index 9bfffbe..be2a694 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -236,6 +236,7 @@ CONFIG_DA9062_WATCHDOG=y
 CONFIG_DA9063_WATCHDOG=m
 CONFIG_RN5T618_WATCHDOG=y
 CONFIG_IMX2_WDT=y
+CONFIG_IMX7ULP_WDT=y
 CONFIG_MFD_DA9052_I2C=y
 CONFIG_MFD_DA9062=y
 CONFIG_MFD_DA9063=y
-- 
2.7.4



RE: [PATCH V3 2/4] watchdog: Add i.MX7ULP watchdog support

2019-08-21 Thread Anson Huang
Hi, Guenter

> On Tue, Aug 20, 2019 at 10:07:56PM -0400, Anson Huang wrote:
> > The i.MX7ULP Watchdog Timer (WDOG) module is an independent timer
> that
> > is available for system use.
> > It provides a safety feature to ensure that software is executing as
> > planned and that the CPU is not stuck in an infinite loop or executing
> > unintended code. If the WDOG module is not serviced
> > (refreshed) within a certain period, it resets the MCU.
> >
> > Add driver support for i.MX7ULP watchdog.
> >
> > Signed-off-by: Anson Huang 
> > ---
> > Changes since V2:
> > - add devm_add_action_or_reset to disable clk for remove action.
> > ---
> >  drivers/watchdog/Kconfig   |  13 +++
> >  drivers/watchdog/Makefile  |   1 +
> >  drivers/watchdog/imx7ulp_wdt.c | 246
> > +
> >  3 files changed, 260 insertions(+)
> >  create mode 100644 drivers/watchdog/imx7ulp_wdt.c
> >
> > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig index
> > a8f5c81..d68e5b5 100644
> > --- a/drivers/watchdog/Kconfig
> > +++ b/drivers/watchdog/Kconfig
> > @@ -724,6 +724,19 @@ config IMX_SC_WDT
> >   To compile this driver as a module, choose M here: the
> >   module will be called imx_sc_wdt.
> >
> > +config IMX7ULP_WDT
> > +   tristate "IMX7ULP Watchdog"
> > +   depends on ARCH_MXC || COMPILE_TEST
> > +   select WATCHDOG_CORE
> > +   help
> > + This is the driver for the hardware watchdog on the Freescale
> > + IMX7ULP and later processors. If you have one of these
> > + processors and wish to have watchdog support enabled,
> > + say Y, otherwise say N.
> > +
> > + To compile this driver as a module, choose M here: the
> > + module will be called imx7ulp_wdt.
> > +
> >  config UX500_WATCHDOG
> > tristate "ST-Ericsson Ux500 watchdog"
> > depends on MFD_DB8500_PRCMU
> > diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
> > index b5a0aed..2ee352b 100644
> > --- a/drivers/watchdog/Makefile
> > +++ b/drivers/watchdog/Makefile
> > @@ -67,6 +67,7 @@ obj-$(CONFIG_TS4800_WATCHDOG) += ts4800_wdt.o
> >  obj-$(CONFIG_TS72XX_WATCHDOG) += ts72xx_wdt.o
> >  obj-$(CONFIG_IMX2_WDT) += imx2_wdt.o
> >  obj-$(CONFIG_IMX_SC_WDT) += imx_sc_wdt.o
> > +obj-$(CONFIG_IMX7ULP_WDT) += imx7ulp_wdt.o
> >  obj-$(CONFIG_UX500_WATCHDOG) += ux500_wdt.o
> >  obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o
> >  obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o diff --git
> > a/drivers/watchdog/imx7ulp_wdt.c b/drivers/watchdog/imx7ulp_wdt.c
> new
> > file mode 100644 index 000..5d37957
> > --- /dev/null
> > +++ b/drivers/watchdog/imx7ulp_wdt.c
> > @@ -0,0 +1,246 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright 2019 NXP.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define WDOG_CS			0x0
> > +#define WDOG_CS_CMD32EN		BIT(13)
> > +#define WDOG_CS_ULK		BIT(11)
> > +#define WDOG_CS_RCS		BIT(10)
> > +#define WDOG_CS_EN		BIT(7)
> > +#define WDOG_CS_UPDATE		BIT(5)
> > +
> > +#define WDOG_CNT   0x4
> > +#define WDOG_TOVAL 0x8
> > +
> > +#define REFRESH_SEQ0   0xA602
> > +#define REFRESH_SEQ1   0xB480
> > +#define REFRESH((REFRESH_SEQ1 << 16) | REFRESH_SEQ0)
> > +
> > +#define UNLOCK_SEQ00xC520
> > +#define UNLOCK_SEQ10xD928
> > +#define UNLOCK ((UNLOCK_SEQ1 << 16) | UNLOCK_SEQ0)
> > +
> > +#define DEFAULT_TIMEOUT		60
> > +#define MAX_TIMEOUT		128
> > +
> > +static bool nowayout = WATCHDOG_NOWAYOUT;
> module_param(nowayout,
> > +bool, ); MODULE_PARM_DESC(nowayout, "Watchdog cannot be
> stopped
> > +once started (default="
> > +__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
> > +
> > +struct imx7ulp_wdt_device {
> > +   struct notifier_block restart_handler;
> > +   struct watchdog_device wdd;
> > +   void __iomem *base;
> > +   struct clk *clk;
> > +   int rate;
> > +};
> > +
> > +static inline void imx7ulp_wdt_enable(void __iomem *base, bool
> > +enable) {
> > +   u32 val = readl(base + WDOG_CS);
> > +
> > +   writel(UNLOCK, base + WDOG_CNT);
> > +   if (enable)
> > +   writel(val | WDOG_CS_EN, base + WDOG_CS);
> > +   else
> > +   writel(val & ~WDOG_CS_EN, base + WDOG_CS); }
> > +
> > +static inline bool imx7ulp_wdt_is_enabled(void __iomem *base) {
> > +   u32 val = readl(base + WDOG_CS);
> > +
> > +   return val & WDOG_CS_EN;
> > +}
> > +
> > +static int imx7ulp_wdt_ping(struct watchdog_device *wdog) {
> > +   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
> > +
> > +   writel(REFRESH, wdt->base + WDOG_CNT);
> > +
> > +   return 0;
> > +}
> > +
> > +static int imx7ulp_wdt_start(struct watchdog_device *wdog) {
> > +   struct imx7ulp_wdt_device *wdt = watchdog_get_drvdata(wdog);
> > +
> > +   imx7ulp_wdt_enable(wdt->base, true);
> > +
> > +   return 0;
> > +}
> > +
> > 

Re: [PATCH] cpufreq: Print driver name if cpufreq_suspend() fails

2019-08-21 Thread Viresh Kumar
On 21-08-19, 16:16, Florian Fainelli wrote:
> Instead of printing the policy, which is incidentally a kernel pointer
> and thus of limited interest, print the name of the cpufreq driver that
> failed to suspend, which is more useful for debugging.
> 
> Fixes: 2f0aea936360 ("cpufreq: suspend governors on system suspend/hibernate")

I will drop this tag as this isn't a bug really.

> Signed-off-by: Florian Fainelli 
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c28ebf2810f1..330d789f81fc 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1807,8 +1807,8 @@ void cpufreq_suspend(void)
>   }
>  
>   if (cpufreq_driver->suspend && cpufreq_driver->suspend(policy))
> - pr_err("%s: Failed to suspend driver: %p\n", __func__,
> - policy);
> + pr_err("%s: Failed to suspend driver: %s\n", __func__,
> + cpufreq_driver->name);
>   }
>  
>  suspend:

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH -next] cpufreq: qcom-hw: remove set but not used variable 'prev_cc'

2019-08-21 Thread Viresh Kumar
On 21-08-19, 20:14, YueHaibing wrote:
> drivers/cpufreq/qcom-cpufreq-hw.c: In function qcom_cpufreq_hw_read_lut:
> drivers/cpufreq/qcom-cpufreq-hw.c:89:38: warning:
>  variable prev_cc set but not used [-Wunused-but-set-variable]
> 
> It is not used since commit 3003e75a5045 ("cpufreq:
> qcom-hw: Update logic to detect turbo frequency")
> 
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c 
> b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 3eea197..a9ae2f8 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -86,7 +86,7 @@ static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
>   struct cpufreq_policy *policy,
>   void __iomem *base)
>  {
> - u32 data, src, lval, i, core_count, prev_cc = 0, prev_freq = 0, freq;
> + u32 data, src, lval, i, core_count, prev_freq = 0, freq;
>   u32 volt;
>   struct cpufreq_frequency_table  *table;
>  
> @@ -139,7 +139,6 @@ static int qcom_cpufreq_hw_read_lut(struct device 
> *cpu_dev,
>   break;
>   }
>  
> - prev_cc = core_count;
>   prev_freq = freq;
>   }

@Sibi, are you fine with this change? I will merge it with the original patch then.

-- 
viresh


Re: [PATCH v2] KVM: LAPIC: Periodically revaluate to get conservative lapic_timer_advance_ns

2019-08-21 Thread Wanpeng Li
ping,
On Thu, 15 Aug 2019 at 12:03, Wanpeng Li  wrote:
>
> From: Wanpeng Li 
>
> Even for realtime CPUs, cache line bounces, frequency scaling, the presence
> of higher-priority RT tasks, etc. can still cause different responses. These
> interferences should be taken into account: periodically re-evaluate whether
> or not the lapic_timer_advance_ns value is still the best, do nothing if it
> is, otherwise recalculate. Set lapic_timer_advance_ns to the minimal
> conservative value among all the estimated values.
>
> Testing on Skylake server, cat vcpu*/lapic_timer_advance_ns, before patch:
> 1628
> 4161
> 4321
> 3236
> ...
>
> Testing on Skylake server, cat vcpu*/lapic_timer_advance_ns, after patch:
> 1553
> 1499
> 1509
> 1489
> ...
>
> Testing on Haswell desktop, cat vcpu*/lapic_timer_advance_ns, before patch:
> 4617
> 3641
> 4102
> 4577
> ...
> Testing on Haswell desktop, cat vcpu*/lapic_timer_advance_ns, after patch:
> 2775
> 2892
> 2764
> 2775
> ...
>
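A compact sketch of the adjustment policy described above (an editor's
illustration; the state layout, step divisor and convergence threshold
are assumptions, not taken from the patch):

/* Step the advance toward the observed error; once converged, pin it to
 * the most conservative (smallest) estimate seen so far. A periodic
 * timer would clear 'done' to restart the whole adjustment. */
#include <stdbool.h>
#include <stdint.h>

struct advance_state {
	uint32_t advance_ns;	/* current estimate */
	uint32_t min_ns;	/* smallest converged estimate so far */
	bool done;		/* converged in this window */
};

/* delta_ns < 0: the timer fired early; delta_ns > 0: it fired late. */
static void adjust(struct advance_state *s, int64_t delta_ns)
{
	if (delta_ns < 0)
		s->advance_ns -= (uint32_t)(-delta_ns) / 8;	/* advanced too much */
	else
		s->advance_ns += (uint32_t)delta_ns / 8;	/* advanced too little */

	if (delta_ns > -100 && delta_ns < 100) {	/* close enough */
		s->done = true;
		if (s->advance_ns < s->min_ns)
			s->min_ns = s->advance_ns;
		s->advance_ns = s->min_ns;	/* pin to the conservative minimum */
	}
}
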
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Signed-off-by: Wanpeng Li 
> ---
>  arch/x86/kvm/lapic.c | 34 --
>  arch/x86/kvm/lapic.h |  2 ++
>  2 files changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index df5cd07..8487d9c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -69,6 +69,7 @@
>  #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
>  /* step-by-step approximation to mitigate fluctuation */
>  #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
> +#define LAPIC_TIMER_ADVANCE_RECALC_PERIOD (600 * HZ)
>
>  static inline int apic_test_vector(int vec, void *bitmap)
>  {
> @@ -1480,10 +1481,21 @@ static inline void __wait_lapic_expire(struct 
> kvm_vcpu *vcpu, u64 guest_cycles)
>  static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
>   s64 advance_expire_delta)
>  {
> -   struct kvm_lapic *apic = vcpu->arch.apic;
> -   u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
> +   struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;
> +   u32 timer_advance_ns = ktimer->timer_advance_ns;
> u64 ns;
>
> +   /* periodic revaluate */
> +   if (unlikely(ktimer->timer_advance_adjust_done)) {
> +   ktimer->recalc_timer_advance_ns = jiffies +
> +   LAPIC_TIMER_ADVANCE_RECALC_PERIOD;
> +   if (abs(advance_expire_delta) > 
> LAPIC_TIMER_ADVANCE_ADJUST_DONE) {
> +   timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
> +   ktimer->timer_advance_adjust_done = false;
> +   } else
> +   return;
> +   }
> +
> /* too early */
> if (advance_expire_delta < 0) {
> ns = -advance_expire_delta * 100ULL;
> @@ -1499,12 +1511,18 @@ static inline void adjust_lapic_timer_advance(struct 
> kvm_vcpu *vcpu,
> }
>
> if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
> -   apic->lapic_timer.timer_advance_adjust_done = true;
> +   ktimer->timer_advance_adjust_done = true;
> if (unlikely(timer_advance_ns > 5000)) {
> timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
> -   apic->lapic_timer.timer_advance_adjust_done = false;
> +   ktimer->timer_advance_adjust_done = false;
> +   }
> +   ktimer->timer_advance_ns = timer_advance_ns;
> +
> +   if (ktimer->timer_advance_adjust_done) {
> +   if (ktimer->min_timer_advance_ns > timer_advance_ns)
> +   ktimer->min_timer_advance_ns = timer_advance_ns;
> +   ktimer->timer_advance_ns = ktimer->min_timer_advance_ns;
> }
> -   apic->lapic_timer.timer_advance_ns = timer_advance_ns;
>  }
>
>  static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
> @@ -1523,7 +1541,8 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu 
> *vcpu)
> if (guest_tsc < tsc_deadline)
> __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
>
> -   if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
> +   if (unlikely(!apic->lapic_timer.timer_advance_adjust_done) ||
> +   time_before(apic->lapic_timer.recalc_timer_advance_ns, 
> jiffies))
> adjust_lapic_timer_advance(vcpu, 
> apic->lapic_timer.advance_expire_delta);
>  }
>
> @@ -2301,9 +2320,12 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int 
> timer_advance_ns)
> if (timer_advance_ns == -1) {
> apic->lapic_timer.timer_advance_ns = 
> LAPIC_TIMER_ADVANCE_ADJUST_INIT;
> apic->lapic_timer.timer_advance_adjust_done = false;
> +   apic->lapic_timer.recalc_timer_advance_ns = jiffies;
> +   apic->lapic_timer.min_timer_advance_ns = UINT_MAX;
> } else {
> apic->lapic_timer.timer_advance_ns = timer_advance_ns;
> apic->lapic_timer.timer_advance_adjust_done = true;
> +   

Re: [PATCH v8 04/14] media: rkisp1: add Rockchip MIPI Synopsys DPHY driver

2019-08-21 Thread Laurent Pinchart
Hi Helen,

On Wed, Aug 21, 2019 at 06:46:15PM -0300, Helen Koike wrote:
> On 8/15/19 2:54 PM, Laurent Pinchart wrote:
> > On Wed, Aug 07, 2019 at 10:37:55AM -0300, Helen Koike wrote:
> >> On 8/7/19 10:05 AM, Sakari Ailus wrote:
> >>> On Tue, Jul 30, 2019 at 03:42:46PM -0300, Helen Koike wrote:
>  From: Jacob Chen 
> 
>  This commit adds a subdev driver for Rockchip MIPI Synopsys DPHY driver
> 
>  Signed-off-by: Jacob Chen 
>  Signed-off-by: Shunqian Zheng 
>  Signed-off-by: Tomasz Figa 
>  [migrate to phy framework]
>  Signed-off-by: Ezequiel Garcia 
>  [update for upstream]
>  Signed-off-by: Helen Koike 
> 
>  ---
> 
>  Changes in v8:
>  - Remove boiler plate license text
> 
>  Changes in v7:
>  - Migrate dphy specific code from
>  drivers/media/platform/rockchip/isp1/mipi_dphy_sy.c
>  to drivers/phy/rockchip/phy-rockchip-dphy.c
>  - Drop support for rk3288
>  - Drop support for dphy txrx
>  - code styling and checkpatch fixes
> 
>   drivers/phy/rockchip/Kconfig |   8 +
>   drivers/phy/rockchip/Makefile|   1 +
>   drivers/phy/rockchip/phy-rockchip-dphy.c | 408 +++
>   3 files changed, 417 insertions(+)
>   create mode 100644 drivers/phy/rockchip/phy-rockchip-dphy.c
> 
>  diff --git a/drivers/phy/rockchip/Kconfig b/drivers/phy/rockchip/Kconfig
>  index c454c90cd99e..afd072f135e6 100644
>  --- a/drivers/phy/rockchip/Kconfig
>  +++ b/drivers/phy/rockchip/Kconfig
>  @@ -9,6 +9,14 @@ config PHY_ROCKCHIP_DP
>   help
> Enable this to support the Rockchip Display Port PHY.
>   
>  +config PHY_ROCKCHIP_DPHY
>  +tristate "Rockchip MIPI Synopsys DPHY driver"
> > 
> > How much of this PHY is Rockchip-specific ? Would it make sense to turn
> > it into a Synopsys DPHY driver, with some Rockchip glue ? I suppose this
> > could always be done later, if needed (and I also suppose there's no
> > existing driver in drivers/phy/ that support the same Synopsys IP).
> > 
>  +depends on ARCH_ROCKCHIP && OF
> >>>
> >>> How about (...) || COMPILE_TEST ?
> >>>
>  +select GENERIC_PHY_MIPI_DPHY
>  +select GENERIC_PHY
>  +help
>  +  Enable this to support the Rockchip MIPI Synopsys DPHY.
>  +
>   config PHY_ROCKCHIP_EMMC
>   tristate "Rockchip EMMC PHY Driver"
>   depends on ARCH_ROCKCHIP && OF
>  diff --git a/drivers/phy/rockchip/Makefile 
>  b/drivers/phy/rockchip/Makefile
>  index fd21cbaf40dd..f62e9010bcaf 100644
>  --- a/drivers/phy/rockchip/Makefile
>  +++ b/drivers/phy/rockchip/Makefile
>  @@ -1,5 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_PHY_ROCKCHIP_DP)   += phy-rockchip-dp.o
>  +obj-$(CONFIG_PHY_ROCKCHIP_DPHY) += phy-rockchip-dphy.o
>   obj-$(CONFIG_PHY_ROCKCHIP_EMMC) += phy-rockchip-emmc.o
>   obj-$(CONFIG_PHY_ROCKCHIP_INNO_HDMI)+= phy-rockchip-inno-hdmi.o
>   obj-$(CONFIG_PHY_ROCKCHIP_INNO_USB2)+= phy-rockchip-inno-usb2.o
>  diff --git a/drivers/phy/rockchip/phy-rockchip-dphy.c 
>  b/drivers/phy/rockchip/phy-rockchip-dphy.c
>  new file mode 100644
>  index ..3a29976c2dff
>  --- /dev/null
>  +++ b/drivers/phy/rockchip/phy-rockchip-dphy.c
>  @@ -0,0 +1,408 @@
>  +// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>  +/*
>  + * Rockchip MIPI Synopsys DPHY driver
>  + *
>  + * Based on:
>  + *
>  + * Copyright (C) 2016 FuZhou Rockchip Co., Ltd.
>  + * Author: Yakir Yang 
>  + */
>  +
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +#include 
>  +
>  +#define RK3399_GRF_SOC_CON9 0x6224
>  +#define RK3399_GRF_SOC_CON21	0x6254
>  +#define RK3399_GRF_SOC_CON22	0x6258
>  +#define RK3399_GRF_SOC_CON23	0x625c
>  +#define RK3399_GRF_SOC_CON24	0x6260
>  +#define RK3399_GRF_SOC_CON25	0x6264
>  +#define RK3399_GRF_SOC_STATUS1  0xe2a4
>  +
>  +#define CLOCK_LANE_HS_RX_CONTROL	0x34
>  +#define LANE0_HS_RX_CONTROL 0x44
>  +#define LANE1_HS_RX_CONTROL 0x54
>  +#define LANE2_HS_RX_CONTROL 0x84
>  +#define LANE3_HS_RX_CONTROL 0x94
>  +#define HS_RX_DATA_LANES_THS_SETTLE_CONTROL 0x75
>  +
>  +#define MAX_DPHY_CLK 8
>  +
>  +#define PHY_TESTEN_ADDR (0x1 << 16)
>  +#define PHY_TESTEN_DATA (0x0 << 16)
>  +#define PHY_TESTCLK (0x1 << 1)
>  +#define PHY_TESTCLR (0x1 << 0)
> > 
> > Maybe s/0x// for the previous four lines ?
> > 
>  +#define 

Re: [PATCH] ARM: UNWINDER_FRAME_POINTER implementation for Clang

2019-08-21 Thread Nick Desaulniers
On Wed, Aug 21, 2019 at 10:43 AM Nathan Huckleberry  wrote:
>
> On Tue, Aug 20, 2019 at 2:39 PM Nick Desaulniers
>  wrote:
> >
> > On Tue, Aug 20, 2019 at 12:44 PM Nathan Huckleberry  
> > wrote:
...snip...
> > > +tstr1, #0x10   @ 26 or 32-bit mode?
> > > +moveq  mask, #0xfc03
> >
> > Should we be using different masks for ARM vs THUMB as per the
> > reference implementation?
> The change that introduces the arm/thumb code looked like it came from
> a script run over all ARM code in the kernel. Neither this code nor the
> reference solution is compatible with arm, so there's no need for the
> change.

Looks like you're referring to commit 8b592783a2e8 ("Thumb-2:
Implement the unified arch/arm/lib functions").

Currently, arch/arm/Kconfig.debug has:
  57 config UNWINDER_FRAME_POINTER
  58   bool "Frame pointer unwinder"
  59   depends on !THUMB2_KERNEL && !CC_IS_CLANG

So it looks like UNWINDER_FRAME_POINTER and THUMB2_KERNEL are mutually
exclusive.  Probably could send a patch cleaning that up. (ie.
removing the different masks; essentially removing the hunk from
arch/arm/lib/backtrace.S from 8b592783a2e8).

> > > +for_each_frame:tst frame, mask @ Check for 
> > > address exceptions
> > > +   bne no_frame
> > > +
> > > +/*
> > > + * sv_fp is the stack frame with the locals for the current considered
> > > + * function.
> > > + * sv_pc is the saved lr frame the frame above. This is a pointer to a
> > > + * code address within the current considered function, but
> > > + * it is not the function start. This value gets updated to be
> > > + * the function start later if it is possible.
> > > + */
> > > +1001:  ldr sv_pc, [frame, #4]  @ get saved 'pc'
> > > +1002:  ldr sv_fp, [frame, #0]  @ get saved fp
> >
> > The reference implementation applies the mask to sv_pc and sv_fp.  I
> > assume we want to, too?
> The mask is already applied to both. See for_each_frame:

ah, under the finished_setup label.
-- 
Thanks,
~Nick Desaulniers


Re: [PATCH V5 4/4] mmc: host: sdhci-pci: Add Genesys Logic GL975x support

2019-08-21 Thread Ben Chuang
Sorry to resend the email because of non-plain text issues.

On Wed, Aug 21, 2019 at 8:30 PM Adrian Hunter  wrote:
>
> On 20/08/19 5:07 AM, Ben Chuang wrote:
> > From: Ben Chuang 
> >
> > Add support for the GL9750 and GL9755 chipsets.
> >
> > The patches enable v4 mode and wait 5ms after set 1.8V signal enable for
> > GL9750/GL9755. It fixed the value of SDHCI_MAX_CURRENT register and uses
> > the vendor tuning flow for GL9750.
> >
> > Signed-off-by: Ben Chuang 
> > Co-developed-by: Michael K Johnson 
> > Signed-off-by: Michael K Johnson 
> > ---
> >  drivers/mmc/host/Makefile |   2 +-
> >  drivers/mmc/host/sdhci-pci-core.c |   2 +
> >  drivers/mmc/host/sdhci-pci-gli.c  | 381 ++
> >  drivers/mmc/host/sdhci-pci.h  |   5 +
> >  4 files changed, 389 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/mmc/host/sdhci-pci-gli.c
> >
> > diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> > index 73578718f119..661445415090 100644
> > --- a/drivers/mmc/host/Makefile
> > +++ b/drivers/mmc/host/Makefile
> > @@ -13,7 +13,7 @@ obj-$(CONFIG_MMC_MXS)   += mxs-mmc.o
> >  obj-$(CONFIG_MMC_SDHCI)  += sdhci.o
> >  obj-$(CONFIG_MMC_SDHCI_PCI)  += sdhci-pci.o
> >  sdhci-pci-y  += sdhci-pci-core.o sdhci-pci-o2micro.o 
> > sdhci-pci-arasan.o \
> > -sdhci-pci-dwc-mshc.o
> > +sdhci-pci-dwc-mshc.o sdhci-pci-gli.o
> >  obj-$(subst m,y,$(CONFIG_MMC_SDHCI_PCI)) += sdhci-pci-data.o
> >  obj-$(CONFIG_MMC_SDHCI_ACPI) += sdhci-acpi.o
> >  obj-$(CONFIG_MMC_SDHCI_PXAV3)+= sdhci-pxav3.o
> > diff --git a/drivers/mmc/host/sdhci-pci-core.c 
> > b/drivers/mmc/host/sdhci-pci-core.c
> > index 4154ee11b47d..e5835fbf73bc 100644
> > --- a/drivers/mmc/host/sdhci-pci-core.c
> > +++ b/drivers/mmc/host/sdhci-pci-core.c
> > @@ -1682,6 +1682,8 @@ static const struct pci_device_id pci_ids[] = {
> >   SDHCI_PCI_DEVICE(O2, SEABIRD1, o2),
> >   SDHCI_PCI_DEVICE(ARASAN, PHY_EMMC, arasan),
> >   SDHCI_PCI_DEVICE(SYNOPSYS, DWC_MSHC, snps),
> > + SDHCI_PCI_DEVICE(GLI, 9750, gl9750),
> > + SDHCI_PCI_DEVICE(GLI, 9755, gl9755),
> >   SDHCI_PCI_DEVICE_CLASS(AMD, SYSTEM_SDHCI, PCI_CLASS_MASK, amd),
> >   /* Generic SD host controller */
> >   {PCI_DEVICE_CLASS(SYSTEM_SDHCI, PCI_CLASS_MASK)},
> > diff --git a/drivers/mmc/host/sdhci-pci-gli.c 
> > b/drivers/mmc/host/sdhci-pci-gli.c
> > new file mode 100644
> > index ..99abb7830e62
> > --- /dev/null
> > +++ b/drivers/mmc/host/sdhci-pci-gli.c
> > @@ -0,0 +1,381 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2019 Genesys Logic, Inc.
> > + *
> > + * Authors: Ben Chuang 
> > + *
> > + * Version: v0.9.0 (2019-08-08)
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include "sdhci.h"
> > +#include "sdhci-pci.h"
> > +
> > +/*  Genesys Logic extra registers */
> > +#define SDHCI_GLI_9750_WT 0x800
> > +#define SDHCI_GLI_9750_DRIVING0x860
> > +#define SDHCI_GLI_9750_PLL0x864
> > +#define SDHCI_GLI_9750_SW_CTRL0x874
> > +#define SDHCI_GLI_9750_MISC   0x878
> > +
> > +#define SDHCI_GLI_9750_TUNING_CONTROL	0x540
> > +#define SDHCI_GLI_9750_TUNING_PARAMETERS 0x544
> > +
> > +#define GLI_MAX_TUNING_LOOP 40
> > +
> > +/* Genesys Logic chipset */
> > +static void gli_set_9750(struct sdhci_host *host)
> > +{
> > + u32 wt_value = 0;
> > + u32 driving_value = 0;
> > + u32 pll_value = 0;
> > + u32 sw_ctrl_value = 0;
> > + u32 misc_value = 0;
> > + u32 parameter_value = 0;
> > + u32 control_value = 0;
> > +
> > + u16 ctrl2 = 0;
> > +
> > + wt_value = sdhci_readl(host, SDHCI_GLI_9750_WT);
> > + if ((wt_value & 0x1) == 0) {
> > + wt_value |= 0x1;
> > + sdhci_writel(host, wt_value, SDHCI_GLI_9750_WT);
> > + }
> > +
> > + driving_value = sdhci_readl(host, SDHCI_GLI_9750_DRIVING);
> > + pll_value = sdhci_readl(host, SDHCI_GLI_9750_PLL);
> > + sw_ctrl_value = sdhci_readl(host, SDHCI_GLI_9750_SW_CTRL);
> > + misc_value = sdhci_readl(host, SDHCI_GLI_9750_MISC);
> > + parameter_value = sdhci_readl(host, SDHCI_GLI_9750_TUNING_PARAMETERS);
> > + control_value = sdhci_readl(host, SDHCI_GLI_9750_TUNING_CONTROL);
> > +
> > + driving_value &= ~(0x0C000FFF);
> > + driving_value |= 0x0C000FFF;
> > + sdhci_writel(host, driving_value, SDHCI_GLI_9750_DRIVING);
> > +
> > + sw_ctrl_value |= 0xc0;
> > + sdhci_writel(host, sw_ctrl_value, SDHCI_GLI_9750_SW_CTRL);
> > +
> > + // reset the tuning flow after reinit and before starting tuning
>
> For consistent style, please use C-style comments /* */ rather than //

Thanks. I will use C-style comments instead of //.

>
> > + pll_value |= 0x80; // bit23-1
>
> Please define bit fields and look at using GENMASK(), BIT(), FIELD_GET(),
> FIELD_PREP()
>

Got it. Thanks. I will define bit 
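
A minimal sketch of the suggested style (the field names and positions
below are illustrative assumptions, not taken from the GL9750
datasheet):

/* Hypothetical bit fields in the suggested style; assumes the driver's
 * "sdhci.h" context and the SDHCI_GLI_9750_PLL register from the patch. */
#include <linux/bitfield.h>
#include <linux/bits.h>
#include "sdhci.h"

#define SDHCI_GLI_9750_PLL_DIR		BIT(23)
#define SDHCI_GLI_9750_PLL_LDIV		GENMASK(9, 0)

static void gli_set_pll_example(struct sdhci_host *host)
{
	u32 pll = sdhci_readl(host, SDHCI_GLI_9750_PLL);

	pll &= ~SDHCI_GLI_9750_PLL_LDIV;
	pll |= FIELD_PREP(SDHCI_GLI_9750_PLL_LDIV, 0x7d);
	pll |= SDHCI_GLI_9750_PLL_DIR;
	sdhci_writel(host, pll, SDHCI_GLI_9750_PLL);
}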

[tip:timers/core 34/34] drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' undeclared; did you mean 'sched_clock_register'?

2019-08-21 Thread kbuild test robot
tree:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/tip/tip.git 
timers/core
head:   b74e1d61dbc614ff35ef3ad9267c61ed06b09051
commit: b74e1d61dbc614ff35ef3ad9267c61ed06b09051 [34/34] clocksource/hyperv: 
Add Hyper-V specific sched clock function
config: i386-randconfig-g002-201933 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
git checkout b74e1d61dbc614ff35ef3ad9267c61ed06b09051
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' 
>> undeclared (first use in this function); did you mean 'sched_clock_register'?
 return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
  ^
  sched_clock_register
   drivers//clocksource/hyperv_timer.c:264:35: note: each undeclared identifier 
is reported only once for each function it appears in
   drivers//clocksource/hyperv_timer.c: In function 'hv_init_clocksource':
   drivers//clocksource/hyperv_timer.c:334:2: error: 'hv_sched_clock_offset' 
undeclared (first use in this function); did you mean 'sched_clock_register'?
 hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
 ^
 sched_clock_register
   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:265:1: warning: control reaches end of 
>> non-void function [-Wreturn-type]
}
^

vim +264 drivers//clocksource/hyperv_timer.c

   261  
   262  static u64 read_hv_sched_clock_msr(void)
   263  {
 > 264  return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 > 265  }
   266  
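
A plausible fix (an editor's assumption, not taken from this report) is to
declare the offset at file scope in hyperv_timer.c, outside any
config-conditional block, so both the TSC-page and MSR sched-clock paths
can see it:

/* Hypothetical placement; the real patch may differ. */
static u64 hv_sched_clock_offset __ro_after_init;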

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 0/3] fix interrupt swamp in NVMe

2019-08-21 Thread Ming Lei
On Thu, Aug 22, 2019 at 10:00 AM Keith Busch  wrote:
>
> On Wed, Aug 21, 2019 at 7:34 PM Ming Lei  wrote:
> > On Wed, Aug 21, 2019 at 04:27:00PM +, Long Li wrote:
> > > Here is the command to benchmark it:
> > >
> > > fio --bs=4k --ioengine=libaio --iodepth=128 
> > > --filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev/nvme9n1
> > >  --direct=1 --runtime=120 --numjobs=80 --rw=randread --name=test 
> > > --group_reporting --gtod_reduce=1
> > >
> >
> > I can reproduce the issue on one machine(96 cores) with 4 NVMes(32 queues), 
> > so
> > each queue is served on 3 CPUs.
> >
> > IOPS drops > 20% when 'use_threaded_interrupts' is enabled. From fio log, 
> > CPU
> > context switch is increased a lot.
>
> Interestingly use_threaded_interrupts shows a marginal improvement on
> my machine with the same fio profile. It was only 5 NVMes, but they've
> one queue per-cpu on 112 cores.

I have not investigated it yet.

BTW, my fio test is only done on the single hw queue via 'taskset -c
$cpu_list_of_the_queue',
without applying the threaded interrupt affinity patch. NVMe is Optane.

The same issue can be reproduced after I force to use 1:1 mapping via passing
'possible_cpus=32' kernel cmd line.

Maybe it is related to kernel options, so I have attached the one I used;
basically it is
a subset of the RHEL8 kernel.

Thanks,
Ming Lei


config.tar.gz
Description: application/gzip


[PATCH 04/15] sched,fair: move runnable_load_avg to cfs_rq

2019-08-21 Thread Rik van Riel
Since only the root cfs_rq runnable_load_avg field is used any more,
we can move the field from struct sched_avg, which every sched_entity
has one of, directly into the struct cfs_rq, of which we have way fewer.

No functional changes.

Suggested-by: Dietmar Eggemann 
Signed-off-by: Rik van Riel 
---
 include/linux/sched.h | 1 -
 kernel/sched/debug.c  | 3 +--
 kernel/sched/fair.c   | 8 
 kernel/sched/sched.h  | 1 +
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f5bb6948e40c..84a6cc6f5c47 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -394,7 +394,6 @@ struct sched_avg {
u32 util_sum;
u32 period_contrib;
unsigned long   load_avg;
-   unsigned long   runnable_load_avg;
unsigned long   util_avg;
struct util_est util_est;
 } cacheline_aligned;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index cefc1b171c0b..6e7c8ff210a8 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -539,7 +539,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct 
cfs_rq *cfs_rq)
SEQ_printf(m, "  .%-30s: %lu\n", "load_avg",
cfs_rq->avg.load_avg);
SEQ_printf(m, "  .%-30s: %lu\n", "runnable_load_avg",
-   cfs_rq->avg.runnable_load_avg);
+   cfs_rq->runnable_load_avg);
SEQ_printf(m, "  .%-30s: %lu\n", "util_avg",
cfs_rq->avg.util_avg);
SEQ_printf(m, "  .%-30s: %u\n", "util_est_enqueued",
@@ -960,7 +960,6 @@ void proc_sched_show_task(struct task_struct *p, struct 
pid_namespace *ns,
P(se.avg.load_sum);
P(se.avg.util_sum);
P(se.avg.load_avg);
-   P(se.avg.runnable_load_avg);
P(se.avg.util_avg);
P(se.enqueued_h_load);
P(se.avg.last_update_time);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 30afeda1620f..a48d0dbfc232 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2768,7 +2768,7 @@ enqueue_runnable_load_avg(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
struct cfs_rq *root_cfs_rq = &cfs_rq->rq->cfs;
se->enqueued_h_load = task_se_h_load(se);
 
-   root_cfs_rq->avg.runnable_load_avg += se->enqueued_h_load;
+   root_cfs_rq->runnable_load_avg += se->enqueued_h_load;
}
 }
 
@@ -2777,7 +2777,7 @@ dequeue_runnable_load_avg(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
 {
if (entity_is_task(se)) {
struct cfs_rq *root_cfs_rq = &cfs_rq->rq->cfs;
-   sub_positive(&root_cfs_rq->avg.runnable_load_avg,
+   sub_positive(&root_cfs_rq->runnable_load_avg,
se->enqueued_h_load);
}
 }
@@ -2795,7 +2795,7 @@ update_runnable_load_avg(struct sched_entity *se)
 
new_h_load = task_se_h_load(se);
delta = new_h_load - se->enqueued_h_load;
-   root_cfs_rq->avg.runnable_load_avg += delta;
+   root_cfs_rq->runnable_load_avg += delta;
se->enqueued_h_load = new_h_load;
 }
 
@@ -3561,7 +3561,7 @@ static void remove_entity_load_avg(struct sched_entity 
*se)
 
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
 {
-   return cfs_rq->avg.runnable_load_avg;
+   return cfs_rq->runnable_load_avg;
 }
 
 static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5be14cee61f9..32978a8de8ce 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -516,6 +516,7 @@ struct cfs_rq {
 * CFS load tracking
 */
struct sched_avg	avg;
+   unsigned long   runnable_load_avg;
 #ifndef CONFIG_64BIT
u64 load_last_update_time_copy;
 #endif
-- 
2.20.1



[PATCH 14/15] sched,fair: ramp up task_se_h_weight quickly

2019-08-21 Thread Rik van Riel
The code in update_cfs_group / calc_group_shares has some logic to
quickly ramp up the load when a task has just started running in a
cgroup, in order to get sane values for the cgroup se->load.weight.

This code adds a similar hack to task_se_h_weight.

However, THIS CODE IS WRONG, since it does not do things hierarchically.

I am wondering a few things here:
1) Should I have something similar to the logic in calc_group_shares
   in update_cfs_rq_h_load?
2) If so, should I also use that fast-ramp-up value for task_h_load,
   to prevent the load balancer from thinking it is moving zero weight
   tasks around?
3) If update_cfs_rq_h_load is the wrong place, where should I be
   calculating a hierarchical group weight value, instead?

Not-yet-signed-off-by: Rik van Riel 
Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d6c881c5c4d5..3df5d60b245f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7672,6 +7672,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 
 static unsigned long task_se_h_weight(struct sched_entity *se)
 {
+   unsigned long group_load;
struct cfs_rq *cfs_rq;
 
if (!task_se_in_cgroup(se))
@@ -7680,8 +7681,12 @@ static unsigned long task_se_h_weight(struct 
sched_entity *se)
cfs_rq = group_cfs_rq_of_parent(se);
update_cfs_rq_h_load(cfs_rq);
 
+   /* Ramp up quickly to keep h_weight sane. */
+   group_load = max(scale_load_down(se->parent->load.weight),
+   cfs_rq->h_load);
+
/* Reduce the load.weight by the h_load of the group the task is in. */
-   return (cfs_rq->h_load * se->load.weight) >> SCHED_FIXEDPOINT_SHIFT;
+   return (group_load * se->load.weight) >> SCHED_FIXEDPOINT_SHIFT;
 }
 
 static unsigned long task_se_h_load(struct sched_entity *se)
-- 
2.20.1



[PATCH 06/15] sched,cfs: use explicit cfs_rq of parent se helper

2019-08-21 Thread Rik van Riel
Use an explicit "cfs_rq of parent sched_entity" helper in a few
strategic places, where cfs_rq_of(se) may no longer point at the
right runqueue once we flatten the hierarchical cgroup runqueues.

No functional change.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04b216234265..31a26737a873 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -276,6 +276,15 @@ static inline struct cfs_rq *group_cfs_rq(struct 
sched_entity *grp)
return grp->my_q;
 }
 
+/* runqueue owned by the parent entity; the root cfs_rq for a top level se */
+static inline struct cfs_rq *group_cfs_rq_of_parent(struct sched_entity *se)
+{
+   if (se->parent)
+   return group_cfs_rq(se->parent);
+
+   return cfs_rq_of(se);
+}
+
 static inline bool list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 {
struct rq *rq = rq_of(cfs_rq);
@@ -3319,7 +3328,7 @@ static inline int propagate_entity_load_avg(struct 
sched_entity *se)
 
gcfs_rq->propagate = 0;
 
-   cfs_rq = cfs_rq_of(se);
+   cfs_rq = group_cfs_rq_of_parent(se);
 
add_tg_cfs_propagate(cfs_rq, gcfs_rq->prop_runnable_sum);
 
@@ -7796,7 +7805,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 
WRITE_ONCE(cfs_rq->h_load_next, NULL);
for_each_sched_entity(se) {
-   cfs_rq = cfs_rq_of(se);
+   cfs_rq = group_cfs_rq_of_parent(se);
WRITE_ONCE(cfs_rq->h_load_next, se);
if (cfs_rq->last_h_load_update == now)
break;
@@ -7819,7 +7828,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 
 static unsigned long task_se_h_load(struct sched_entity *se)
 {
-   struct cfs_rq *cfs_rq = cfs_rq_of(se);
+   struct cfs_rq *cfs_rq = group_cfs_rq_of_parent(se);
 
update_cfs_rq_h_load(cfs_rq);
return div64_ul(se->avg.load_avg * cfs_rq->h_load,
@@ -10166,7 +10175,7 @@ static void task_tick_fair(struct rq *rq, struct 
task_struct *curr, int queued)
struct sched_entity *se = &curr->se;
 
for_each_sched_entity(se) {
-   cfs_rq = cfs_rq_of(se);
+   cfs_rq = group_cfs_rq_of_parent(se);
entity_tick(cfs_rq, se, queued);
}
 
-- 
2.20.1



[v3 1/2] dt/bindings: clk: Add YAML schemas for LS1028A Display Clock bindings

2019-08-21 Thread Wen He
LS1028A has a clock domain PXLCLK0 used to provide pixel clocks to the
Display output interface. Add a YAML schema for this.

Signed-off-by: Wen He 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/clock/fsl,plldig.yaml | 43 +++
 1 file changed, 43 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/fsl,plldig.yaml

diff --git a/Documentation/devicetree/bindings/clock/fsl,plldig.yaml 
b/Documentation/devicetree/bindings/clock/fsl,plldig.yaml
new file mode 100644
index ..32274e94aafc
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/fsl,plldig.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/bindings/clock/fsl,plldig.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP QorIQ Layerscape LS1028A Display PIXEL Clock Binding
+
+maintainers:
+  - Wen He 
+
+description: |
+  NXP LS1028A has a clock domain PXLCLK0 used for the Display output
+  interface in the display core, as implemented in the TSMC CLN28HPM PLL,
+  which generates and offers pixel clocks to the Display.
+
+properties:
+  compatible:
+const: fsl,ls1028a-plldig
+
+  reg:
+maxItems: 1
+
+  '#clock-cells':
+const: 0
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - '#clock-cells'
+
+examples:
+  # Display PIXEL Clock node:
+  - |
+dpclk: clock-display@f1f {
+compatible = "fsl,ls1028a-plldig";
+reg = <0x0 0xf1f 0x0 0x>;
+#clock-cells = <0>;
+clocks = <_27m>;
+};
+
+...
-- 
2.17.1



[v3 2/2] clk: ls1028a: Add clock driver for Display output interface

2019-08-21 Thread Wen He
Add a clock driver for the QorIQ LS1028A Display output interfaces (LCD,
DPHY), as implemented in the TSMC CLN28HPM PLL. This PLL supports
programmable integer division across the display output pixel clock range
of 27-594 MHz.

Signed-off-by: Wen He 
---
change in v3:
- remove the OF dependency
- use clk_parent_data instead of parent_name

 drivers/clk/Kconfig  |  10 ++
 drivers/clk/Makefile |   1 +
 drivers/clk/clk-plldig.c | 283 +++
 3 files changed, 294 insertions(+)
 create mode 100644 drivers/clk/clk-plldig.c

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 801fa1cd0321..ab05f342af04 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -223,6 +223,16 @@ config CLK_QORIQ
  This adds the clock driver support for Freescale QorIQ platforms
  using common clock framework.
 
+config CLK_LS1028A_PLLDIG
+bool "Clock driver for LS1028A Display output"
+depends on ARCH_LAYERSCAPE || COMPILE_TEST
+default ARCH_LAYERSCAPE
+help
+  This driver supports the Display output interface (LCD, DPHY) pixel
+  clocks of the QorIQ Layerscape LS1028A, as implemented in the TSMC
+  CLN28HPM PLL. Not all features of the PLL are currently supported by
+  the driver. By default, the PLL is configured in bypass mode.
+
 config COMMON_CLK_XGENE
bool "Clock driver for APM XGene SoC"
default ARCH_XGENE
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index 0cad76021297..c8e22a764c4d 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_COMMON_CLK_OXNAS)+= clk-oxnas.o
 obj-$(CONFIG_COMMON_CLK_PALMAS)+= clk-palmas.o
 obj-$(CONFIG_COMMON_CLK_PWM)   += clk-pwm.o
 obj-$(CONFIG_CLK_QORIQ)+= clk-qoriq.o
+obj-$(CONFIG_CLK_LS1028A_PLLDIG)   += clk-plldig.o
 obj-$(CONFIG_COMMON_CLK_RK808) += clk-rk808.o
 obj-$(CONFIG_COMMON_CLK_HI655X)+= clk-hi655x.o
 obj-$(CONFIG_COMMON_CLK_S2MPS11)   += clk-s2mps11.o
diff --git a/drivers/clk/clk-plldig.c b/drivers/clk/clk-plldig.c
new file mode 100644
index ..c5ce80a46fd4
--- /dev/null
+++ b/drivers/clk/clk-plldig.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright 2019 NXP
+
+/*
+ * Clock driver for LS1028A Display output interfaces(LCD, DPHY).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* PLLDIG register offsets and bit masks */
+#define PLLDIG_REG_PLLSR	0x24
+#define PLLDIG_REG_PLLDV	0x28
+#define PLLDIG_REG_PLLFM	0x2c
+#define PLLDIG_REG_PLLFD	0x30
+#define PLLDIG_REG_PLLCAL1	0x38
+#define PLLDIG_REG_PLLCAL2	0x3c
+#define PLLDIG_DEFAULE_MULT	0x2c
+#define PLLDIG_LOCK_MASK	BIT(2)
+#define PLLDIG_SSCGBYP_ENABLE   BIT(30)
+#define PLLDIG_FDEN BIT(30)
+#define PLLDIG_DTHRCTL  (0x3 << 16)
+
+/* macro to get/set values into register */
+#define PLLDIG_GET_MULT(x)  (((x) & ~(0xff00)) << 0)
+#define PLLDIG_GET_RFDPHI1(x)   ((u32)(x) >> 25)
+#define PLLDIG_SET_RFDPHI1(x)   ((u32)(x) << 25)
+
+struct clk_plldig {
+   struct clk_hw hw;
+   void __iomem *regs;
+   struct device *dev;
+};
+#define to_clk_plldig(_hw) container_of(_hw, struct clk_plldig, hw)
+#define LOCK_TIMEOUT_US		USEC_PER_MSEC
+
+static int plldig_enable(struct clk_hw *hw)
+{
+   struct clk_plldig *data = to_clk_plldig(hw);
+   u32 val;
+
+   val = readl(data->regs + PLLDIG_REG_PLLFM);
+   /*
+* Use Bypass mode with PLL off by default, the frequency overshoot
+* detector output was disable. SSCG Bypass mode should be enable.
+*/
+   val |= PLLDIG_SSCGBYP_ENABLE;
+   writel(val, data->regs + PLLDIG_REG_PLLFM);
+
+   val = readl(data->regs + PLLDIG_REG_PLLFD);
+   /* Disable dither and Sigma delta modulation in bypass mode */
+   val |= (PLLDIG_FDEN | PLLDIG_DTHRCTL);
+   writel(val, data->regs + PLLDIG_REG_PLLFD);
+
+   return 0;
+}
+
+static void plldig_disable(struct clk_hw *hw)
+{
+   struct clk_plldig *data = to_clk_plldig(hw);
+   u32 val;
+
+   val = readl(data->regs + PLLDIG_REG_PLLFM);
+
+   val &= ~PLLDIG_SSCGBYP_ENABLE;
+   writel(val, data->regs + PLLDIG_REG_PLLFM);
+}
+
+static int plldig_is_enabled(struct clk_hw *hw)
+{
+   struct clk_plldig *data = to_clk_plldig(hw);
+
+   return (readl(data->regs + PLLDIG_REG_PLLFM) & PLLDIG_SSCGBYP_ENABLE);
+}
+
+/*
+ * Clock configuration relationship between the PHI1 frequency(fpll_phi) and
+ * the output frequency of the PLL is determined by the PLLDV, according to
+ * the following equation:
+ * pxlclk = fpll_phi / RFDPHI1 = (pll_ref x PLLDV[MFD]) / PLLDV[RFDPHI1].
+ */
+static bool plldig_is_valid_range(unsigned long 

[PATCH 12/15] sched,fair: flatten update_curr functionality

2019-08-21 Thread Rik van Riel
Make it clear that update_curr now only works on tasks.

There is no need for task_tick_fair to call it on every sched entity up
the hierarchy, so move the call out of entity_tick.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fa8e88731821..5cfa3dbeba49 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -872,10 +872,11 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq, int 
force)
 static void update_curr(struct cfs_rq *cfs_rq)
 {
struct sched_entity *curr = cfs_rq->curr;
+   struct task_struct *curtask;
u64 now = rq_clock_task(rq_of(cfs_rq));
u64 delta_exec;
 
-   if (unlikely(!curr))
+   if (unlikely(!curr) || !entity_is_task(curr))
return;
 
delta_exec = now - curr->exec_start;
@@ -893,13 +894,10 @@ static void update_curr(struct cfs_rq *cfs_rq)
curr->vruntime += calc_delta_fair(delta_exec, curr);
update_min_vruntime(cfs_rq);
 
-   if (entity_is_task(curr)) {
-   struct task_struct *curtask = task_of(curr);
-
-   trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
-   cgroup_account_cputime(curtask, delta_exec);
-   account_group_exec_runtime(curtask, delta_exec);
-   }
+   curtask = task_of(curr);
+   trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
+   cgroup_account_cputime(curtask, delta_exec);
+   account_group_exec_runtime(curtask, delta_exec);
 
account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
@@ -4196,11 +4194,6 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, 
struct sched_entity *prev)
 static void
 entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 {
-   /*
-* Update run-time statistics of the 'current'.
-*/
-   update_curr(cfs_rq);
-
/*
 * Ensure that runnable average is periodically updated.
 */
@@ -10032,6 +10025,11 @@ static void task_tick_fair(struct rq *rq, struct 
task_struct *curr, int queued)
struct cfs_rq *cfs_rq;
struct sched_entity *se = &curr->se;
 
+   /*
+* Update run-time statistics of the 'current'.
+*/
+   update_curr(&rq->cfs);
+
for_each_sched_entity(se) {
cfs_rq = group_cfs_rq_of_parent(se);
entity_tick(cfs_rq, se, queued);
-- 
2.20.1



[PATCH 10/15] sched,fair: add helper functions for flattened runqueue

2019-08-21 Thread Rik van Riel
Add helper functions to make the flattened runqueue patch a little smaller.

The task_se_h_weight function is similar to task_se_h_load, but scales the
task weight by the group weight, without taking the task's duty cycle into
account.

The task_se_in_cgroup helper is functionally identical to parent_entity,
but directly calling a function with that name obscures what the other
code is trying to use it for, and would make the code harder to understand.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b0d95f2e3a8..29bfa7379dec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -243,6 +243,7 @@ static u64 __calc_delta(u64 delta_exec, unsigned long 
weight, struct load_weight
 
 const struct sched_class fair_sched_class;
 static unsigned long task_se_h_load(struct sched_entity *se);
+static unsigned long task_se_h_weight(struct sched_entity *se);
 
 /**
  * CFS operations on generic schedulable entities:
@@ -431,6 +432,12 @@ static inline struct sched_entity *parent_entity(struct 
sched_entity *se)
return se->parent;
 }
 
+/* Is this (task) sched_entity in a non-root cgroup? */
+static inline bool task_se_in_cgroup(struct sched_entity *se)
+{
+   return parent_entity(se);
+}
+
 static void
 find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 {
@@ -513,6 +520,11 @@ static inline struct sched_entity *parent_entity(struct 
sched_entity *se)
return NULL;
 }
 
+static inline bool task_se_in_cgroup(struct sched_entity *se)
+{
+   return false;
+}
+
 static inline void
 find_matching_se(struct sched_entity **se, struct sched_entity **pse)
 {
@@ -7837,6 +7849,20 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
}
 }
 
+static unsigned long task_se_h_weight(struct sched_entity *se)
+{
+   struct cfs_rq *cfs_rq;
+
+   if (!task_se_in_cgroup(se))
+   return se->load.weight;
+
+   cfs_rq = group_cfs_rq_of_parent(se);
+   update_cfs_rq_h_load(cfs_rq);
+
+   /* Reduce the load.weight by the h_load of the group the task is in. */
+   return (cfs_rq->h_load * se->load.weight) >> SCHED_FIXEDPOINT_SHIFT;
+}
+
 static unsigned long task_se_h_load(struct sched_entity *se)
 {
struct cfs_rq *cfs_rq = group_cfs_rq_of_parent(se);
@@ -7873,6 +7899,11 @@ static unsigned long task_se_h_load(struct sched_entity 
*se)
 {
return se->avg.load_avg;
 }
+
+static unsigned long task_se_h_weight(struct sched_entity *se)
+{
+   return se->load.weight;
+}
 #endif
 
 /** Helpers for find_busiest_group /
-- 
2.20.1



[PATCH 15/15] sched,fair: scale vdiff in wakeup_preempt_entity

2019-08-21 Thread Rik van Riel
When a task wakes back up after having gone to sleep, place_entity
will limit the vruntime difference between min_vruntime and the
woken up task to half of sysctl_sched_latency.

The code in wakeup_preempt_entity calculates how much vruntime a
time slice for the woken up task represents, in wakeup_gran.

It then assumes that all the vruntime used since the task went to
sleep was used by the currently running task (which has its vruntime
scaled by calc_delta_fair, as well).

However, that assumption is not necessarily true, and the vruntime
may have advanced at different rates, pushed ahead by different tasks
on the CPU. This becomes more visible when the CPU controller is enabled.

This leads to the symptom that a high priority woken up task is likely to
preempt whatever is running, even if the currently running task is of equal
or higher priority than the woken up task!

Scaling the vdiff down if the currently running task is also high priority
solves that symptom.

This is not the correct thing to do if all of the vruntime was accumulated
by the current task, or by other tasks at a similar priority, and was
already scaled by that priority. However, I do not have any better ideas
on how to tackle the "task X got preempted by task Y of the same priority"
issue that system administrators try to resolve by setting the
sched_wakeup_granularity sysctl variable to a value larger than half of
sysctl_sched_latency...
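
As a toy model of the clamp (standalone code; calc_delta_fair is
approximated here as delta * NICE_0_LOAD / weight, ignoring the kernel's
inverse-weight fixed point):

#include <stdio.h>

#define NICE_0_LOAD 1024UL

/* approximate calc_delta_fair: scale vdiff by the current task's weight */
static unsigned long scale_vdiff(unsigned long vdiff, unsigned long weight)
{
        return vdiff * NICE_0_LOAD / weight;
}

int main(void)
{
        unsigned long vdiff = 12000000; /* 12ms of vruntime difference */
        unsigned long scaled = scale_vdiff(vdiff, 9548); /* nice -10 curr */

        /* the patch takes the minimum of the raw and scaled values */
        if (scaled < vdiff)
                vdiff = scaled;
        printf("clamped vdiff: %lu\n", vdiff); /* ~1.29ms */
        return 0;
}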

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3df5d60b245f..ef7629bdf41d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6774,6 +6774,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct 
sched_entity *se)
if (vdiff <= 0)
return -1;
 
+   vdiff = min((u64)vdiff, calc_delta_fair(vdiff, curr));
gran = wakeup_gran(se);
if (vdiff > gran)
return 1;
-- 
2.20.1



[PATCH 05/15] sched,fair: remove cfs_rqs from leaf_cfs_rq_list bottom up

2019-08-21 Thread Rik van Riel
Reducing the overhead of the CPU controller is achieved by not walking
all the sched_entities every time a task is enqueued or dequeued.

One of the things being checked every single time is whether the cfs_rq
is on the rq->leaf_cfs_rq_list.

By only removing a cfs_rq from the list once it no longer has children
on the list, we can avoid walking the sched_entity hierarchy if the bottom
cfs_rq is on the list, once the runqueues have been flattened.
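
For reference, a compilable userspace analogue of the "is the previous
list entry my child" test (hypothetical minimal types; container_of
behaves like the kernel macro):

#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head { struct list_head *prev, *next; };
struct tg { struct tg *parent; };
struct cfs_rq { struct tg *tg; struct list_head leaf_cfs_rq_list; };

/* children are always inserted immediately before their parent */
static bool child_on_list(struct cfs_rq *cfs_rq)
{
        struct list_head *prev = cfs_rq->leaf_cfs_rq_list.prev;
        struct cfs_rq *prev_rq =
                container_of(prev, struct cfs_rq, leaf_cfs_rq_list);

        return prev_rq->tg->parent == cfs_rq->tg;
}

int main(void)
{
        struct tg parent_tg = { .parent = NULL };
        struct tg child_tg = { .parent = &parent_tg };
        struct cfs_rq parent_rq = { .tg = &parent_tg };
        struct cfs_rq child_rq = { .tg = &child_tg };

        /* list order: child immediately before parent */
        parent_rq.leaf_cfs_rq_list.prev = &child_rq.leaf_cfs_rq_list;

        return !child_on_list(&parent_rq); /* exits 0: child found */
}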

Signed-off-by: Rik van Riel 
Suggested-by: Vincent Guittot 
---
 kernel/sched/fair.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a48d0dbfc232..04b216234265 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -369,6 +369,39 @@ static inline void assert_list_leaf_cfs_rq(struct rq *rq)
SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
 }
 
+/*
+ * Because list_add_leaf_cfs_rq always places a child cfs_rq on the list
+ * immediately before a parent cfs_rq, and cfs_rqs are removed from the list
+ * bottom-up, we only have to test whether the cfs_rq before us on the list
+ * is our child.
+ */
+static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
+{
+   struct cfs_rq *prev_cfs_rq;
+   struct list_head *prev;
+
+   prev = cfs_rq->leaf_cfs_rq_list.prev;
+   prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);
+
+   return (prev_cfs_rq->tg->parent == cfs_rq->tg);
+}  
+
+/*
+ * Remove a cfs_rq from the list if it has no children on the list.
+ * The scheduler iterates over the list regularly; if conditions for
+ * removal are still true, we'll get to this cfs_rq in the future.
+ */
+static inline void list_del_leaf_cfs_rq_bottom(struct cfs_rq *cfs_rq)
+{
+   if (!cfs_rq->on_list)
+   return;
+
+   if (child_cfs_rq_on_list(cfs_rq))
+   return;
+
+   list_del_leaf_cfs_rq(cfs_rq);
+}
+
 /* Iterate thr' all leaf cfs_rq's on a runqueue */
 #define for_each_leaf_cfs_rq_safe(rq, cfs_rq, pos) \
list_for_each_entry_safe(cfs_rq, pos, &rq->leaf_cfs_rq_list,\
@@ -7723,7 +7756,7 @@ static void update_blocked_averages(int cpu)
 * decayed cfs_rqs linger on the list.
 */
if (cfs_rq_is_decayed(cfs_rq))
-   list_del_leaf_cfs_rq(cfs_rq);
+   list_del_leaf_cfs_rq_bottom(cfs_rq);
 
/* Don't need periodic decay once load/util_avg are null */
if (cfs_rq_has_blocked(cfs_rq))
-- 
2.20.1



[PATCH 13/15] sched,fair: propagate sum_exec_runtime up the hierarchy

2019-08-21 Thread Rik van Riel
Now that enqueue_task_fair and dequeue_task_fair no longer iterate up
the hierarchy all the time, a method to lazily propagate sum_exec_runtime
up the hierarchy is necessary.

Once a tick, propagate the newly accumulated exec_runtime up the hierarchy,
and feed it into CFS bandwidth control.

Remove the pointless call to account_cfs_rq_runtime from update_curr,
which is always called with a root cfs_rq.
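
A standalone sketch of the lazy delta propagation pattern (simplified,
hypothetical types; a single parent level stands in for the hierarchy):

#include <stdio.h>

struct entity {
        unsigned long long sum_exec_runtime;
        unsigned long long propagated_exec_runtime;
        struct entity *parent;
};

/* once a tick: push only the newly accumulated runtime upward */
static void propagate(struct entity *se)
{
        unsigned long long diff =
                se->sum_exec_runtime - se->propagated_exec_runtime;

        if (se->parent)
                se->parent->sum_exec_runtime += diff;
        se->propagated_exec_runtime = se->sum_exec_runtime;
}

int main(void)
{
        struct entity group = { 0, 0, NULL };
        struct entity task = { 0, 0, &group };

        task.sum_exec_runtime = 5000000;   /* 5ms since the last tick */
        propagate(&task);
        task.sum_exec_runtime += 3000000;  /* 3ms more by the next tick */
        propagate(&task);
        printf("%llu\n", group.sum_exec_runtime); /* prints 8000000 */
        return 0;
}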

Signed-off-by: Rik van Riel 
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   |  1 +
 kernel/sched/fair.c   | 22 --
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 901c710363e7..bdca15b3afe7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -454,6 +454,7 @@ struct sched_entity {
int depth;
unsigned long   enqueued_h_load;
unsigned long   enqueued_h_weight;
+   u64 propagated_exec_runtime;
struct load_weight  h_load;
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbd96900f715..9915d20e84a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2137,6 +2137,7 @@ static void __sched_fork(unsigned long clone_flags, 
struct task_struct *p)
INIT_LIST_HEAD(>se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
+   p->se.propagated_exec_runtime   = 0;
p->se.cfs_rq= NULL;
 #endif
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5cfa3dbeba49..d6c881c5c4d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -898,8 +898,6 @@ static void update_curr(struct cfs_rq *cfs_rq)
trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
cgroup_account_cputime(curtask, delta_exec);
account_group_exec_runtime(curtask, delta_exec);
-
-   account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
 
 static void update_curr_fair(struct rq *rq)
@@ -3412,6 +3410,20 @@ static inline bool skip_blocked_update(struct 
sched_entity *se)
return true;
 }
 
+static void propagate_exec_runtime(struct cfs_rq *cfs_rq,
+   struct sched_entity *se)
+{
+   struct sched_entity *parent = se->parent;
+   u64 diff = se->sum_exec_runtime - se->propagated_exec_runtime;
+
+   if (parent) {
+   parent->sum_exec_runtime += diff;
+   account_cfs_rq_runtime(cfs_rq, diff);
+   }
+
+   se->propagated_exec_runtime = se->sum_exec_runtime;
+}
+
 #else /* CONFIG_FAIR_GROUP_SCHED */
 
 static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {}
@@ -3423,6 +3435,11 @@ static inline int propagate_entity_load_avg(struct 
sched_entity *se)
 
 static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long 
runnable_sum) {}
 
+static void propagate_exec_runtime(struct cfs_rq *cfs_rq,
+   struct sched_entity *se)
+{
+}
+
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 /**
@@ -10157,6 +10174,7 @@ static void propagate_entity_cfs_rq(struct sched_entity 
*se, int flags)
if (!(flags & DO_ATTACH))
break;
 
+   propagate_exec_runtime(cfs_rq, se);
update_cfs_group(se);
}
 }
-- 
2.20.1



[PATCH 08/15] sched,fair: simplify timeslice length code

2019-08-21 Thread Rik van Riel
The idea behind __sched_period makes sense, but the results do not always.

When a CPU has one high priority task and a large number of low priority
tasks, __sched_period will return a value larger than sysctl_sched_latency,
and the one high priority task may end up getting a timeslice all for itself
that is also much larger than sysctl_sched_latency.

The low priority tasks will have their time slices rounded up to
sysctl_sched_min_granularity, resulting in an even larger scheduling
latency than targeted by __sched_period.

Simplify the code by ripping out __sched_period and always taking
fractions of sysctl_sched_latency.

If a high priority task ends up getting a "too small" time slice compared
to low priority tasks, the vruntime scaling ensures that it will simply
get scheduled more frequently than low priority tasks.
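
To make the effect concrete, a standalone sketch assuming commonly
scaled defaults (24ms sysctl_sched_latency, 3ms min granularity,
sched_nr_latency of 8; actual values vary with CPU count):

#include <stdio.h>

#define LATENCY    24000000ULL /* 24ms, assumed sysctl_sched_latency */
#define MIN_GRAN    3000000ULL /* 3ms, assumed min granularity */
#define NR_LATENCY  8

/* the old __sched_period behaviour being removed */
static unsigned long long old_period(unsigned long nr_running)
{
        return nr_running > NR_LATENCY ? nr_running * MIN_GRAN : LATENCY;
}

int main(void)
{
        /* 20 runnable tasks: the old period stretched to 60ms, while
         * the new code always slices up the 24ms latency target */
        printf("old: %llums, new: %llums\n",
               old_period(20) / 1000000, LATENCY / 1000000);
        return 0;
}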

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8f8c85c6da9b..74ee22c59d13 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -691,22 +691,6 @@ static inline u64 calc_delta_fair(u64 delta, struct 
sched_entity *se)
return delta;
 }
 
-/*
- * The idea is to set a period in which each task runs once.
- *
- * When there are too many tasks (sched_nr_latency) we have to stretch
- * this period because otherwise the slices get too small.
- *
- * p = (nr <= nl) ? l : l*nr/nl
- */
-static u64 __sched_period(unsigned long nr_running)
-{
-   if (unlikely(nr_running > sched_nr_latency))
-   return nr_running * sysctl_sched_min_granularity;
-   else
-   return sysctl_sched_latency;
-}
-
 /*
  * We calculate the wall-time slice from the period by taking a part
  * proportional to the weight.
@@ -715,7 +699,7 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-   u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+   u64 slice = sysctl_sched_latency;
 
for_each_sched_entity(se) {
struct load_weight *load;
-- 
2.20.1



[PATCH 07/15] sched,cfs: fix zero length timeslice calculation

2019-08-21 Thread Rik van Riel
The way the time slice length is currently calculated, not only do high
priority tasks get longer time slices than low priority tasks, but due
to fixed point math, low priority tasks could end up with a zero length
time slice. This can lead to cache thrashing and other inefficiencies.

Cap the minimum time slice length to sysctl_sched_min_granularity.

Tasks that end up getting a time slice length too long for their relative
priority will simply end up having their vruntime advanced much faster than
other tasks, resulting in them receiving time slices less frequently.
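
A standalone illustration of the truncation (made-up weights; in the
kernel the scaling happens through __calc_delta at each cgroup level):

#include <stdio.h>

int main(void)
{
        unsigned long long slice = 24000000; /* 24ms, illustrative */
        unsigned long weight = 3;            /* WEIGHT_IDLEPRIO */
        unsigned long long total = 500 * 1024 + 3; /* 500 nice-0 tasks */
        int level;

        /* integer division truncates; nesting drives the slice to 0 */
        for (level = 0; level < 3; level++) {
                slice = slice * weight / total;
                printf("level %d: %lluns\n", level, slice);
        }
        return 0;
}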

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 31a26737a873..8f8c85c6da9b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -732,6 +732,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
}
slice = __calc_delta(slice, se->load.weight, load);
}
+
+   /*
+* To avoid cache thrashing, run at least sysctl_sched_min_granularity.
+* The vruntime of a low priority task advances faster; those tasks
+* will simply get time slices less frequently.
+*/
+   slice = max_t(u64, slice, sysctl_sched_min_granularity);
return slice;
 }
 
-- 
2.20.1



[PATCH 03/15] sched,fair: redefine runnable_load_avg as the sum of task_h_load

2019-08-21 Thread Rik van Riel
The runnable_load magic is used to quickly propagate information about
runnable tasks up the hierarchy of runqueues. The runnable_load_avg is
mostly used for the load balancing code, which only examines the value at
the root cfs_rq.

Redefine the root cfs_rq runnable_load_avg to be the sum of task_h_loads
of the runnable tasks. This works because the hierarchical runnable_load of
a task is already equal to the task_se_h_load today. This provides enough
information to the load balancer.

The runnable_load_avg of the cgroup cfs_rqs does not appear to be
used for anything, so don't bother calculating those.

This removes one of the things that the code currently traverses the
cgroup hierarchy for, and getting rid of it brings us one step closer
to a flat runqueue for the CPU controller.
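
As a rough sketch of the task_h_load arithmetic the root
runnable_load_avg is now built from (standalone code with made-up
numbers, not the kernel's types):

#include <stdio.h>

/* a task's hierarchical load: its own load_avg scaled by the fraction
 * its cfs_rq contributes at the root (h_load) */
static unsigned long task_se_h_load(unsigned long load_avg,
                                    unsigned long h_load,
                                    unsigned long cfs_rq_load_avg)
{
        return (unsigned long)((unsigned long long)load_avg * h_load /
                               (cfs_rq_load_avg + 1));
}

int main(void)
{
        /* task load_avg 512, on a cfs_rq with h_load 300 and
         * total load_avg 1024 */
        printf("%lu\n", task_se_h_load(512, 300, 1024)); /* prints 149 */
        return 0;
}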

Signed-off-by: Rik van Riel 
---
 include/linux/sched.h |   3 +-
 kernel/sched/core.c   |   2 -
 kernel/sched/debug.c  |   1 +
 kernel/sched/fair.c   | 125 +-
 kernel/sched/pelt.c   |  64 ++---
 kernel/sched/sched.h  |   6 --
 6 files changed, 56 insertions(+), 145 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..f5bb6948e40c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -391,7 +391,6 @@ struct util_est {
 struct sched_avg {
u64 last_update_time;
u64 load_sum;
-   u64 runnable_load_sum;
u32 util_sum;
u32 period_contrib;
unsigned long   load_avg;
@@ -439,7 +438,6 @@ struct sched_statistics {
 struct sched_entity {
/* For load-balancing: */
struct load_weight  load;
-   unsigned long   runnable_weight;
struct rb_node  run_node;
struct list_headgroup_node;
unsigned inton_rq;
@@ -455,6 +453,7 @@ struct sched_entity {
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
int depth;
+   unsigned long   enqueued_h_load;
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
struct cfs_rq   *cfs_rq;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 874c427742a9..fbd96900f715 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -744,7 +744,6 @@ static void set_load_weight(struct task_struct *p, bool 
update_load)
if (task_has_idle_policy(p)) {
load->weight = scale_load(WEIGHT_IDLEPRIO);
load->inv_weight = WMULT_IDLEPRIO;
-   p->se.runnable_weight = load->weight;
return;
}
 
@@ -757,7 +756,6 @@ static void set_load_weight(struct task_struct *p, bool 
update_load)
} else {
load->weight = scale_load(sched_prio_to_weight[prio]);
load->inv_weight = sched_prio_to_wmult[prio];
-   p->se.runnable_weight = load->weight;
}
 }
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f6beaca97a09..cefc1b171c0b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -962,6 +962,7 @@ void proc_sched_show_task(struct task_struct *p, struct 
pid_namespace *ns,
P(se.avg.load_avg);
P(se.avg.runnable_load_avg);
P(se.avg.util_avg);
+   P(se.enqueued_h_load);
P(se.avg.last_update_time);
P(se.avg.util_est.ewma);
P(se.avg.util_est.enqueued);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eadf9a96b3e1..30afeda1620f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -723,9 +723,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 * nothing has been attached to the task group yet.
 */
if (entity_is_task(se))
-   sa->runnable_load_avg = sa->load_avg = scale_load_down(se->load.weight);
-
-   se->runnable_weight = se->load.weight;
+   sa->load_avg = scale_load_down(se->load.weight);
 
/* when this task enqueue'ed, it will contribute to its cfs_rq's 
load_avg */
 }
@@ -2766,20 +2764,39 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
 static inline void
 enqueue_runnable_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-   cfs_rq->runnable_weight += se->runnable_weight;
+   if (entity_is_task(se)) {
+   struct cfs_rq *root_cfs_rq = &cfs_rq->rq->cfs;
+   se->enqueued_h_load = task_se_h_load(se);
 
-   cfs_rq->avg.runnable_load_avg += se->avg.runnable_load_avg;
-   cfs_rq->avg.runnable_load_sum += se_runnable(se) * se->avg.runnable_load_sum;
+   root_cfs_rq->avg.runnable_load_avg += se->enqueued_h_load;
+   }
 }
 
 static inline void
 dequeue_runnable_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-  

[PATCH 02/15] sched: change /proc/sched_debug fields

2019-08-21 Thread Rik van Riel
Remove some fields from /proc/sched_debug that are removed from
sched_entity in a subsequent patch, and add h_load, which comes in
very handy to debug CPU controller weight distribution.

Signed-off-by: Rik van Riel 
Reviewed-by: Josef Bacik 
---
 kernel/sched/debug.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 14c6a8716ba1..f6beaca97a09 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -416,11 +416,9 @@ static void print_cfs_group_stats(struct seq_file *m, int 
cpu, struct task_group
}
 
P(se->load.weight);
-   P(se->runnable_weight);
 #ifdef CONFIG_SMP
P(se->avg.load_avg);
P(se->avg.util_avg);
-   P(se->avg.runnable_load_avg);
 #endif
 
 #undef PN_SCHEDSTAT
@@ -538,7 +536,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct 
cfs_rq *cfs_rq)
SEQ_printf(m, "  .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
 #ifdef CONFIG_SMP
-   SEQ_printf(m, "  .%-30s: %ld\n", "runnable_weight", 
cfs_rq->runnable_weight);
SEQ_printf(m, "  .%-30s: %lu\n", "load_avg",
cfs_rq->avg.load_avg);
SEQ_printf(m, "  .%-30s: %lu\n", "runnable_load_avg",
@@ -547,17 +544,15 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct 
cfs_rq *cfs_rq)
cfs_rq->avg.util_avg);
SEQ_printf(m, "  .%-30s: %u\n", "util_est_enqueued",
cfs_rq->avg.util_est.enqueued);
-   SEQ_printf(m, "  .%-30s: %ld\n", "removed.load_avg",
-   cfs_rq->removed.load_avg);
SEQ_printf(m, "  .%-30s: %ld\n", "removed.util_avg",
cfs_rq->removed.util_avg);
-   SEQ_printf(m, "  .%-30s: %ld\n", "removed.runnable_sum",
-   cfs_rq->removed.runnable_sum);
 #ifdef CONFIG_FAIR_GROUP_SCHED
SEQ_printf(m, "  .%-30s: %lu\n", "tg_load_avg_contrib",
cfs_rq->tg_load_avg_contrib);
SEQ_printf(m, "  .%-30s: %ld\n", "tg_load_avg",
atomic_long_read(&cfs_rq->tg->load_avg));
+   SEQ_printf(m, "  .%-30s: %lu\n", "h_load",
+   cfs_rq->h_load);
 #endif
 #endif
 #ifdef CONFIG_CFS_BANDWIDTH
@@ -961,10 +956,8 @@ void proc_sched_show_task(struct task_struct *p, struct 
pid_namespace *ns,
   "nr_involuntary_switches", (long long)p->nivcsw);
 
P(se.load.weight);
-   P(se.runnable_weight);
 #ifdef CONFIG_SMP
P(se.avg.load_sum);
-   P(se.avg.runnable_load_sum);
P(se.avg.util_sum);
P(se.avg.load_avg);
P(se.avg.runnable_load_avg);
-- 
2.20.1



[PATCH 11/15] sched,fair: flatten hierarchical runqueues

2019-08-21 Thread Rik van Riel
Flatten the hierarchical runqueues into just the per CPU rq.cfs runqueue.

Iteration of the sched_entity hierarchy is rate limited to once per jiffy
per sched_entity, which is a smaller change than it seems, because load
average adjustments were already rate limited to once per jiffy before this
patch series.

This patch breaks CONFIG_CFS_BANDWIDTH. The plan for that is to park tasks
from throttled cgroups onto their cgroup runqueues, and slowly (using the
GENTLE_FAIR_SLEEPERS) wake them back up, in vruntime order, once the cgroup
gets unthrottled, to prevent thundering herd issues.

Signed-off-by: Rik van Riel 

Header from folded patch 'fix-attach-detach_enticy_cfs_rq.patch~':

Subject: sched,fair: fix attach/detach_entity_cfs_rq

While attach_entity_cfs_rq and detach_entity_cfs_rq should iterate over
the hierarchy, they do not need to do so twice.

Passing flags into propagate_entity_cfs_rq allows us to reuse that same
loop from other functions.

Signed-off-by: Rik van Riel 


Header from folded patch 'enqueue-order.patch':

Subject: sched,fair: better ordering at enqueue_task_fair time

In order to get useful numbers for the task's hierarchical weight,
task priority, etc., things need to be done in a certain order at task
enqueue time.

Specifically:
1) static load/weight to "local" cfs_rq
2) propagate load/weight up the tree
3) add runnable load avg to root cfs_rq

The reason is that each step depends on the things done by the
step beforehand, and we can end up with nonsense numbers if we
do not do things right.
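
As a toy illustration of why the weights must be settled before the
task's contribution at the root is computed (standalone code, reduced
to pure weight ratios; the kernel uses load_weight with inverse
weights instead):

#include <stdio.h>

struct grp { unsigned long weight; struct grp *parent; };

/* a task's effective weight is diluted by every ancestor group */
static unsigned long effective_weight(unsigned long w, struct grp *g)
{
        for (; g; g = g->parent)
                w = w * g->weight / 1024; /* 1024 ~ NICE_0_LOAD */
        return w;
}

int main(void)
{
        struct grp top = { 512, NULL };
        struct grp leaf = { 256, &top };

        /* nice-0 task (1024): 1024 * 256/1024 * 512/1024 = 128 */
        printf("%lu\n", effective_weight(1024, &leaf));
        return 0;
}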

Also, make sure that we walk all the way up the hierarchy at
enqueue_task_fair time in order to get the benefit from the ramp-up
logic in update_cfs_group.

Signed-off-by: Rik van Riel 
Suggested-by: Peter Zijlstra 
---
 include/linux/sched.h |   2 +
 kernel/sched/fair.c   | 502 ++
 kernel/sched/pelt.c   |   6 +-
 kernel/sched/pelt.h   |   2 +-
 kernel/sched/sched.h  |   2 +-
 5 files changed, 170 insertions(+), 344 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 84a6cc6f5c47..901c710363e7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -453,6 +453,8 @@ struct sched_entity {
 #ifdef CONFIG_FAIR_GROUP_SCHED
int depth;
unsigned long   enqueued_h_load;
+   unsigned long   enqueued_h_weight;
+   struct load_weight  h_load;
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
struct cfs_rq   *cfs_rq;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 29bfa7379dec..fa8e88731821 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -470,6 +470,19 @@ find_matching_se(struct sched_entity **se, struct 
sched_entity **pse)
}
 }
 
+/* Add the cgroup cfs_rqs to the list, for update_blocked_averages */
+static void enqueue_entity_cfs_rqs(struct sched_entity *se)
+{
+   SCHED_WARN_ON(!entity_is_task(se));
+
+   for_each_sched_entity(se) {
+   struct cfs_rq *cfs_rq = group_cfs_rq_of_parent(se);
+
+   if (list_add_leaf_cfs_rq(cfs_rq))
+   break;
+   }
+}
+
 #else  /* !CONFIG_FAIR_GROUP_SCHED */
 
 static inline struct task_struct *task_of(struct sched_entity *se)
@@ -697,8 +710,14 @@ int sched_proc_update_handler(struct ctl_table *table, int 
write,
  */
 static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
 {
-   if (unlikely(se->load.weight != NICE_0_LOAD))
+   if (task_se_in_cgroup(se)) {
+   unsigned long h_weight = task_se_h_weight(se);
+   if (h_weight != se->h_load.weight)
+   update_load_set(>h_load, h_weight);
+   delta = __calc_delta(delta, NICE_0_LOAD, >h_load);
+   } else if (unlikely(se->load.weight != NICE_0_LOAD)) {
delta = __calc_delta(delta, NICE_0_LOAD, >load);
+   }
 
return delta;
 }
@@ -712,22 +731,16 @@ static inline u64 calc_delta_fair(u64 delta, struct 
sched_entity *se)
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
u64 slice = sysctl_sched_latency;
+   struct load_weight *load = &cfs_rq->load;
+   struct load_weight lw;
 
-   for_each_sched_entity(se) {
-   struct load_weight *load;
-   struct load_weight lw;
+   if (unlikely(!se->on_rq)) {
+   lw = cfs_rq->load;
 
-   cfs_rq = cfs_rq_of(se);
-   load = &cfs_rq->load;
-
-   if (unlikely(!se->on_rq)) {
-   lw = cfs_rq->load;
-
-   update_load_add(&lw, se->load.weight);
-   load = &lw;
-   }
-   slice = __calc_delta(slice, se->load.weight, load);
+   update_load_add(&lw, task_se_h_weight(se));
+   load = &lw;
}
+   slice = __calc_delta(slice, task_se_h_weight(se), 

[PATCH 01/15] sched: introduce task_se_h_load helper

2019-08-21 Thread Rik van Riel
Sometimes the hierarchical load of a sched_entity needs to be calculated.
Rename task_h_load to task_se_h_load, and directly pass a sched_entity to
that function.

Move the function declaration up above where it will be used later.

No functional changes.

Signed-off-by: Rik van Riel 
Reviewed-by: Josef Bacik 
---
 kernel/sched/fair.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f5e528..eadf9a96b3e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -242,6 +242,7 @@ static u64 __calc_delta(u64 delta_exec, unsigned long 
weight, struct load_weight
 
 
 const struct sched_class fair_sched_class;
+static unsigned long task_se_h_load(struct sched_entity *se);
 
 /**
  * CFS operations on generic schedulable entities:
@@ -706,7 +707,6 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
 #ifdef CONFIG_SMP
 
 static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
-static unsigned long task_h_load(struct task_struct *p);
 static unsigned long capacity_of(int cpu);
 
 /* Give new sched_entity start runnable values to heavy its load in infant 
time */
@@ -1668,7 +1668,7 @@ static void task_numa_compare(struct task_numa_env *env,
/*
 * In the overloaded case, try and keep the load balanced.
 */
-   load = task_h_load(env->p) - task_h_load(cur);
+   load = task_se_h_load(&env->p->se) - task_se_h_load(&cur->se);
if (!load)
goto assign;
 
@@ -1706,7 +1706,7 @@ static void task_numa_find_cpu(struct task_numa_env *env,
bool maymove = false;
int cpu;
 
-   load = task_h_load(env->p);
+   load = task_se_h_load(&env->p->se);
dst_load = env->dst_stats.load + load;
src_load = env->src_stats.load - load;
 
@@ -3389,7 +3389,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq 
*cfs_rq, long runnable_sum
  * avg. The immediate corollary is that all (fair) tasks must be attached, see
  * post_init_entity_util_avg().
  *
- * cfs_rq->avg is used for task_h_load() and update_cfs_share() for example.
+ * cfs_rq->avg is used for task_se_h_load() and update_cfs_share() for example.
  *
  * Returns true if the load decayed or we removed load.
  *
@@ -3522,7 +3522,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, 
struct sched_entity *s
 
/*
 * Track task load average for carrying it to new CPU after migrated, 
and
-* track group sched_entity load average for task_h_load calc in 
migration
+* track group sched_entity load average for task_se_h_load calc in 
migration
 */
if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
__update_load_avg_se(now, cfs_rq, se);
@@ -3751,7 +3751,7 @@ static inline void update_misfit_status(struct 
task_struct *p, struct rq *rq)
return;
}
 
-   rq->misfit_task_load = task_h_load(p);
+   rq->misfit_task_load = task_se_h_load(&p->se);
 }
 
 #else /* CONFIG_SMP */
@@ -5739,7 +5739,7 @@ wake_affine_weight(struct sched_domain *sd, struct 
task_struct *p,
this_eff_load = target_load(this_cpu, sd->wake_idx);
 
if (sync) {
-   unsigned long current_load = task_h_load(current);
+   unsigned long current_load = task_se_h_load(&current->se);
 
if (current_load > this_eff_load)
return this_cpu;
@@ -5747,7 +5747,7 @@ wake_affine_weight(struct sched_domain *sd, struct 
task_struct *p,
this_eff_load -= current_load;
}
 
-   task_load = task_h_load(p);
+   task_load = task_se_h_load(&p->se);
 
this_eff_load += task_load;
if (sched_feat(WA_BIAS))
@@ -7600,7 +7600,7 @@ static int detach_tasks(struct lb_env *env)
if (!can_migrate_task(p, env))
goto next;
 
-   load = task_h_load(p);
+   load = task_se_h_load(&p->se);
 
if (sched_feat(LB_MIN) && load < 16 && 
!env->sd->nr_balance_failed)
goto next;
@@ -7833,12 +7833,12 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
}
 }
 
-static unsigned long task_h_load(struct task_struct *p)
+static unsigned long task_se_h_load(struct sched_entity *se)
 {
-   struct cfs_rq *cfs_rq = task_cfs_rq(p);
+   struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
update_cfs_rq_h_load(cfs_rq);
-   return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
+   return div64_ul(se->avg.load_avg * cfs_rq->h_load,
cfs_rq_load_avg(cfs_rq) + 1);
 }
 #else
@@ -7865,9 +7865,9 @@ static inline void update_blocked_averages(int cpu)
rq_unlock_irqrestore(rq, &rf);
 }
 
-static unsigned long task_h_load(struct task_struct *p)
+static unsigned long task_se_h_load(struct sched_entity *se)
 {
-   return 

[PATCH RFC v4 0/15] sched,fair: flatten CPU controller runqueues

2019-08-21 Thread Rik van Riel
The current implementation of the CPU controller uses hierarchical
runqueues, where on wakeup a task is enqueued on its group's runqueue,
the group is enqueued on the runqueue of the group above it, etc.

This adds a fairly large amount of overhead for workloads that
do a lot of wakeups per second, especially given that the default systemd
hierarchy is 2 or 3 levels deep.

This patch series is an attempt at reducing that overhead, by placing
all the tasks on the same runqueue, and scaling the task priority by
the priority of the group, which is calculated periodically.

My main TODO items for the next period of time are likely going to
be testing, testing, and testing. I hope to find and flush out any
corner case I can find, and make sure performance does not regress
with any workloads, and hopefully improves some.

Other TODO items:
- More code cleanups.
- Remove some more now unused code.
- Reimplement CONFIG_CFS_BANDWIDTH.

Plan for the CONFIG_CFS_BANDWIDTH reimplementation:
- When a cgroup gets throttled, mark the cgroup and its children
  as throttled.
- When pick_next_entity finds a task that is on a throttled cgroup,
  stash it on the cgroup runqueue (which is not used for runnable
  tasks any more). Leave the vruntime unchanged, and adjust that
  runqueue's vruntime to be that of the left-most task.
- When a cgroup gets unthrottled, and has tasks on it, place it on
  a vruntime ordered heap separate from the main runqueue.
- Have pick_next_task_fair grab one task off that heap every time it
  is called, as long as the min vruntime of that heap is lower than the
  vruntime of the CPU's cfs_rq (or the CPU has no other runnable tasks),
  as sketched below.
- Place that selected task on the CPU's cfs_rq, renormalizing its
  vruntime with the GENTLE_FAIR_SLEEPERS logic. That should help
  interleave the already runnable tasks with the recently unthrottled
  group, and prevent thundering herd issues.
- If the group gets throttled again before all of its tasks have had a chance
  to run, vruntime sorting ensures all the tasks in the throttled cgroup
  get a chance to run over time.
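
A toy sketch of the pick rule in that plan (standalone, hypothetical
structures; real code would compare against the cfs_rq min_vruntime):

#include <stdio.h>

struct source {
        const char *name;
        unsigned long long min_vruntime;
        int nr;
};

/* prefer the unthrottled-groups heap when its min vruntime is lower,
 * or when the cfs_rq has nothing else runnable */
static const char *pick(struct source *cfs, struct source *heap)
{
        if (heap->nr && (!cfs->nr || heap->min_vruntime < cfs->min_vruntime))
                return heap->name;
        return cfs->name;
}

int main(void)
{
        struct source cfs  = { "cfs_rq task", 2000, 3 };
        struct source heap = { "unthrottled task", 1500, 2 };

        printf("%s\n", pick(&cfs, &heap)); /* unthrottled task */
        return 0;
}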

Changes from v3:
- replace max_h_load with another hacky idea to ramp up the
  task_se_h_weight; I believe this new idea is wrong as well, but
  it will hopefully inspire a better solution (thanks to Peter Zijlstra)
- fix the ordering inside enqueue_task_fair to get task weights set up right
  (thanks to Peter Zijlstra)
- change wakeup_preempt_entity to reduce the number of task preemptions,
  hopefully resulting in behavior closer to what people configure in sysctl
- various other small cleanups and fixes

Changes from v2:
- fixed the web server performance regression, in a way vaguely similar
  to what Josef Bacik suggested (blame me for the implementation)
- removed some code duplication so the diffstat is redder than before
- propagate sum_exec_runtime up the tree, in preparation for CFS_BANDWIDTH
- small cleanups left and right

Changes from v1:
- use task_se_h_weight instead of task_se_h_load in calc_delta_fair
  and sched_slice, this seems to improve performance a little, but
  I still have some remaining regression to chase with our web server
  workload
- implement a number of the changes suggested by Dietmar Eggemann
  (still holding out for a better name for group_cfs_rq_of_parent)

This series applies on top of 5.2

 include/linux/sched.h |7 
 kernel/sched/core.c   |3 
 kernel/sched/debug.c  |   15 
 kernel/sched/fair.c   |  803 +-
 kernel/sched/pelt.c   |   68 +---
 kernel/sched/pelt.h   |2 
 kernel/sched/sched.h  |9 
 7 files changed, 372 insertions(+), 535 deletions(-)




[PATCH 09/15] sched,fair: refactor enqueue/dequeue_entity

2019-08-21 Thread Rik van Riel
Refactor enqueue_entity, dequeue_entity, and update_load_avg in order
to split out the things that, with a flat runqueue, we still want to
happen at every level in the cgroup hierarchy from the things we only
need to happen once.

No functional changes.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 64 +
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74ee22c59d13..7b0d95f2e3a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3502,7 +3502,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, 
struct sched_entity *s
 #define DO_ATTACH  0x4
 
 /* Update task and its cfs_rq load average */
-static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se, int flags)
+static inline bool update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se, int flags)
 {
u64 now = cfs_rq_clock_pelt(cfs_rq);
int decayed;
@@ -3531,6 +3531,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, 
struct sched_entity *s
 
} else if (decayed && (flags & UPDATE_TG))
update_tg_load_avg(cfs_rq, 0);
+
+   return decayed;
 }
 
 #ifndef CONFIG_64BIT
@@ -3747,9 +3749,10 @@ static inline void update_misfit_status(struct 
task_struct *p, struct rq *rq)
 #define SKIP_AGE_LOAD  0x0
 #define DO_ATTACH  0x0
 
-static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se, int not_used1)
+static inline bool update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se, int not_used1)
 {
cfs_rq_util_change(cfs_rq, 0);
+   return false;
 }
 
 static inline void remove_entity_load_avg(struct sched_entity *se) {}
@@ -3872,6 +3875,24 @@ static inline void check_schedstat_required(void)
  * CPU and an up-to-date min_vruntime on the destination CPU.
  */
 
+static bool
+enqueue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int 
flags)
+{
+   /*
+* When enqueuing a sched_entity, we must:
+*   - Update loads to have both entity and cfs_rq synced with now.
+*   - Add its load to cfs_rq->runnable_avg
+*   - For group_entity, update its weight to reflect the new share of
+* its group cfs_rq
+*   - Add its new weight to cfs_rq->load.weight
+*/
+   if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH))
+   return false;
+
+   update_cfs_group(se);
+   return true;
+}
+
 static void
 enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
@@ -3896,16 +3917,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct 
sched_entity *se, int flags)
if (renorm && !curr)
se->vruntime += cfs_rq->min_vruntime;
 
-   /*
-* When enqueuing a sched_entity, we must:
-*   - Update loads to have both entity and cfs_rq synced with now.
-*   - Add its load to cfs_rq->runnable_avg
-*   - For group_entity, update its weight to reflect the new share of
-* its group cfs_rq
-*   - Add its new weight to cfs_rq->load.weight
-*/
-   update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
-   update_cfs_group(se);
enqueue_runnable_load_avg(cfs_rq, se);
account_entity_enqueue(cfs_rq, se);
 
@@ -3972,14 +3983,9 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct 
sched_entity *se)
 
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 
-static void
-dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+static bool
+dequeue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int 
flags)
 {
-   /*
-* Update run-time statistics of the 'current'.
-*/
-   update_curr(cfs_rq);
-
/*
 * When dequeuing a sched_entity, we must:
 *   - Update loads to have both entity and cfs_rq synced with now.
@@ -3988,7 +3994,21 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct 
sched_entity *se, int flags)
 *   - For group entity, update its weight to reflect the new share
 * of its group cfs_rq.
 */
-   update_load_avg(cfs_rq, se, UPDATE_TG);
+   if (!update_load_avg(cfs_rq, se, UPDATE_TG))
+   return false;
+   update_cfs_group(se);
+
+   return true;
+}
+
+static void
+dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+{
+   /*
+* Update run-time statistics of the 'current'.
+*/
+   update_curr(cfs_rq);
+
dequeue_runnable_load_avg(cfs_rq, se);
 
update_stats_dequeue(cfs_rq, se, flags);
@@ -4012,8 +4032,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity 
*se, int flags)
/* return excess runtime on last dequeue */
return_cfs_rq_runtime(cfs_rq);
 
-   update_cfs_group(se);
-
/*
 * Now advance min_vruntime if @se was the entity holding it back,
 * except when: 

Re: [PATCH 0/3] fix interrupt swamp in NVMe

2019-08-21 Thread Keith Busch
On Wed, Aug 21, 2019 at 7:34 PM Ming Lei  wrote:
> On Wed, Aug 21, 2019 at 04:27:00PM +, Long Li wrote:
> > Here is the command to benchmark it:
> >
> > fio --bs=4k --ioengine=libaio --iodepth=128 
> > --filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev/nvme9n1
> >  --direct=1 --runtime=120 --numjobs=80 --rw=randread --name=test 
> > --group_reporting --gtod_reduce=1
> >
>
> I can reproduce the issue on one machine(96 cores) with 4 NVMes(32 queues), so
> each queue is served on 3 CPUs.
>
> IOPS drops > 20% when 'use_threaded_interrupts' is enabled. From fio log, CPU
> context switch is increased a lot.

Interestingly, use_threaded_interrupts shows a marginal improvement on
my machine with the same fio profile. It was only 5 NVMes, but they have
one queue per CPU on 112 cores.


[PATCH] rcu: don't include <linux/ktime.h> in rcutiny.h

2019-08-21 Thread Christoph Hellwig
The kbuild bot reported a build failure due to a header loop when RCUTINY is
enabled with my pending riscv-nommu port.  Switch rcutiny.h to only
include the minimal required header to get HZ instead.

Signed-off-by: Christoph Hellwig 
---
 include/linux/rcutiny.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 8e727f57d814..9bf1dfe7781f 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -12,7 +12,7 @@
 #ifndef __LINUX_TINY_H
 #define __LINUX_TINY_H
 
-#include <linux/ktime.h>
+#include <asm/param.h> /* for HZ */
 
 /* Never flag non-existent other CPUs! */
 static inline bool rcu_eqs_special_set(int cpu) { return false; }
-- 
2.20.1



Re: devm_memremap_pages() triggers a kasan_add_zero_shadow() warning

2019-08-21 Thread Qian Cai



> On Aug 21, 2019, at 9:31 PM, Baoquan He  wrote:
> 
> On 08/21/19 at 05:12pm, Qian Cai wrote:
>>>> Does disabling CONFIG_RANDOMIZE_BASE help? Maybe that workaround has
>>>> regressed. Effectively we need to find what is causing the kernel to
>>>> sometimes be placed in the middle of a custom reserved memmap= range.
>>> 
>>> Yes, disabling KASLR works good so far. Assuming the workaround, i.e.,
>>> f28442497b5c
>>> (“x86/boot: Fix KASLR and memmap= collision”) is correct.
>>> 
>>> The only other commit that might regress it from my research so far is,
>>> 
>>> d52e7d5a952c ("x86/KASLR: Parse all 'memmap=' boot option entries”)
>>> 
>> 
>> It turns out that the origin commit f28442497b5c (“x86/boot: Fix KASLR and
>> memmap= collision”) has a bug that is unable to handle "memmap=" in
>> CONFIG_CMDLINE instead of a parameter in bootloader because when it (as well 
>> as
>> the commit d52e7d5a952c) calls get_cmd_line_ptr() in order to run
>> mem_avoid_memmap(), "boot_params" has no knowledge of CONFIG_CMDLINE. Only 
>> later
>> in setup_arch(), the kernel will deal with parameters over there.
> 
> Yes, we didn't consider CONFIG_CMDLINE during boot compressing stage. It
> should be a generic issue since other parameters from CONFIG_CMDLINE could
> be ignored too, not only KASLR handling. Would you like to cast a patch
> to fix it? Or I can fix it later, maybe next week.

I think you have more experience than me in this area, so if you have time
to fix it, that would be nice.



Re: [PATCH v3 3/3] RISC-V: Do not invoke SBI call if cpumask is empty

2019-08-21 Thread Christoph Hellwig
On Wed, Aug 21, 2019 at 05:46:44PM -0700, Atish Patra wrote:
> SBI calls are expensive. If cpumask is empty, there is no need to
> trap via SBI as no remote tlb flushing is required.
> 
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/mm/tlbflush.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index 9f58b3790baa..2bd3c418d769 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -21,6 +21,9 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, 
> unsigned long start,
>   goto issue_sfence;
>   }
>  
> + if (cpumask_empty(cmask))
> + goto done;

I think this can even be done before the get_cpu to optimize it a little
further.


Re: [PATCH v3 2/3] RISC-V: Issue a tlb page flush if possible

2019-08-21 Thread Christoph Hellwig
On Wed, Aug 21, 2019 at 05:46:43PM -0700, Atish Patra wrote:
> + if (size <= PAGE_SIZE && size != -1)
> + local_flush_tlb_page(start);
> + else
> + local_flush_tlb_all();

As Andreas pointed out (unsigned long)-1 is actually larger than
PAGE_SIZE, so we don't need the extra check.


Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Christoph Hellwig
On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> In RISC-V, tlb flush happens via SBI which is expensive. If the local
> cpu is the only cpu in cpumask, there is no need to invoke a SBI call.
> 
> Just do a local flush and return.
> 
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/mm/tlbflush.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index df93b26f1b9d..36430ee3bed9 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -2,6 +2,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  void flush_tlb_all(void)
> @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, 
> unsigned long start,
>   unsigned long size)
>  {
>   struct cpumask hmask;
> + unsigned int cpuid = get_cpu();
>  
> + if (!cmask) {
> + riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
> + goto issue_sfence;
> + }
> +
> + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
> + local_flush_tlb_all();
> + goto done;
> + }

I think a single core on an SMP kernel is a valid enough use case given
how few distros still ship UP kernels.  So maybe this should rather be:

if (!cmask)
cmask = cpu_online_mask;

if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
local_flush_tlb_all();
} else {
riscv_cpuid_to_hartid_mask(cmask, &hmask);
sbi_remote_sfence_vma(hmask.bits, start, size);
}


Re: [RFC PATCH 3/3] perf report: add --spe options for arm-spe

2019-08-21 Thread Tan Xiaojun
On 2019/8/21 20:38, James Clark wrote:
> Hi,
> 
> I also had a look at this and had a question about the --spe option.
> It seems that whatever options I give it, the output is the same:
> 
>   perf report 
> And
>   perf report --spe=t
> 
> Both give the same result:
> 
>   # Samples: 4  of event 'llc-miss'
>   # Event count (approx.): 4
>   #
>   # Children  Self  Command  Shared Object  Symbol
> 
>   #     ...  .  
> ..
>   #
>   ...
>   # Samples: 0  of event 'tlb-miss'
>   # Event count (approx.): 0
>   #
>   # Children  Self  Command  Shared Object  Symbol
>   #     ...  .  ..
>   #
> 
>   # Samples: 83  of event 'branch-miss'
>   # Event count (approx.): 83
>   #
>   # Children  Self  Command  Shared Object  Symbol
>
>   #     ...  .  
> .
>   #
>   ...
> 
> I would have expected it to not include the branch and LLC sections for the 
> second
> command with --spe=t.
> 

Hi,

Sorry, this should be a bug in my code.

> And that leads me to another point. Does it make sense to have this option as 
> a post
> processing step? SPE already has support for filtering events at collection 
> time with
> the PMSFCR_EL1 register.
> 
> Should we try to make the interface more like PEBS, where you specify which 
> events you
> are interested in doing precise tracing on like this?
> 
>   perf record -e branch-misses:pp
> 
> And then perf could use the modifier to configure SPE so that it only records 
> branch
> misses? The benefits of this would be keeping the user interface for precise 
> tracing
> similar between platforms.
> 

Good suggestion. And I need to spend some time thinking about how to implement 
it.

Thank you for your reply.
Xiaojun.

> Thanks
> James
> 
> On 02/08/2019 10:40, Tan Xiaojun wrote:
>> The previous patch added support in "perf report" for some arm-spe
>> events(llc-miss, tlb-miss, branch-miss). This patch adds their help
>> instructions.
>>
>> Signed-off-by: Tan Xiaojun 
>> ---
>>  tools/perf/Documentation/perf-report.txt | 9 +
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-report.txt 
>> b/tools/perf/Documentation/perf-report.txt
>> index 987261d..d998d4b 100644
>> --- a/tools/perf/Documentation/perf-report.txt
>> +++ b/tools/perf/Documentation/perf-report.txt
>> @@ -445,6 +445,15 @@ include::itrace.txt[]
>>  
>>  To disable decoding entirely, use --no-itrace.
>>  
>> +--spe::
>> +Options for decoding arm-spe tracing data. The options are:
>> +
>> +l   synthesize llc miss events
>> +t   synthesize tlb miss events
>> +b   synthesize branch miss events
>> +
>> +The default is all events i.e. the same as --spe=ltb
>> +
>>  --full-source-path::
>>  Show the full path for source files for srcline output.
>>  
>>




RE: [PATCH v3 3/3] tools: hv: add vmbus testing tool

2019-08-21 Thread Harry Zhang
Tool function issues: please validate argument errors for '-p' and '--path', in
or following validate_args_path().

Comments on functionality:
-   it's confusing that when fuzz testing is all OFF, running ' python3 
/home/lisa/vmbus_testing -p 
/sys/kernel/debug/hyperv/000d3a6e-4548-000d-3a6e-4548000d3a6e delay -d 0 0 -D ' 
still enables all delay testing state ('Y' in the state files), even though 
I used the "-D", "--dis_all" param.
-   if we have a "disable-all" subparser for the testing tool, then 
probably we don't need the mutually_exclusive_group under the "delay" subparser
-   the path argument (-p) could be an argument for the "delay" 
and "view" subparsers only.

Regards,
Harry

-Original Message-
From: linux-hyperv-ow...@vger.kernel.org  
On Behalf Of Branden Bonaby
Sent: Tuesday, August 20, 2019 4:40 PM
To: KY Srinivasan ; Haiyang Zhang ; 
Stephen Hemminger ; sas...@kernel.org
Cc: brandonbonaby94 ; linux-hyp...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/3] tools: hv: add vmbus testing tool

This is a userspace tool to drive the testing. Currently it supports 
introducing user specified delay in the host to guest communication path on a 
per-channel basis.

Signed-off-by: Branden Bonaby 
---
Changes in v3:
 - Align python tool to match Linux coding style.

Changes in v2:
 - Move testing location to new location in debugfs.

 tools/hv/vmbus_testing | 342 +
 1 file changed, 342 insertions(+)
 create mode 100644 tools/hv/vmbus_testing

diff --git a/tools/hv/vmbus_testing b/tools/hv/vmbus_testing
new file mode 100644
index ..0f249f6ee698
--- /dev/null
+++ b/tools/hv/vmbus_testing
@@ -0,0 +1,342 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Program to allow users to fuzz test Hyper-V drivers
+# by interfacing with Hyper-V debugfs directories
+# author: Branden Bonaby
+
+import os
+import cmd
+import argparse
+from collections import defaultdict
+from argparse import RawDescriptionHelpFormatter
+
+# debugfs paths for vmbus must exist (same as in lsvmbus) 
+debugfs_sys_path = "/sys/kernel/debug/hyperv"
+if not os.path.isdir(debugfs_sys_path):
+print("{} doesn't exist/check permissions".format(debugfs_sys_path))
+exit(-1)
+# Do not change unless, you change the debugfs attributes
+# in "/sys/kernel/debug/hyperv//". All fuzz testing
+# attributes will start with "fuzz_test".
+pathlen = len(debugfs_sys_path)
+fuzz_state_location = "fuzz_test_state"
+fuzz_states = {
+0 : "Disable",
+1 : "Enable"
+}
+
+fuzz_methods = {
+1 : "Delay_testing"
+}
+
+fuzz_delay_types = {
+1 : "fuzz_test_buffer_interrupt_delay",
+2 : "fuzz_test_message_delay"
+}
+
+def parse_args():
+parser = argparse.ArgumentParser(description = "vmbus_testing "
+"[-s] [0|1] [-q] [-p] \n""vmbus_testing [-s]"
+" [0|1] [-q][-p]  delay [-d] [val][val] 
[-E|-D]\n"
+"vmbus_testing [-q] disable-all\n"
+"vmbus_testing [-q] view [-v|-V]\n"
+"vmbus_testing --version",
+epilog = "Current testing options {}".format(fuzz_methods),
+prog = 'vmbus_testing',
+formatter_class = RawDescriptionHelpFormatter)
+subparsers = parser.add_subparsers(dest = "action")
+parser.add_argument("--version", action = "version",
+version = '%(prog)s 1.0')
+parser.add_argument("-q","--quiet", action = "store_true",
+help = "silence none important test messages")
+parser.add_argument("-s","--state", metavar = "", type = int,
+choices = range(0, 2),
+help = "Turn testing ON or OFF for a single device."
+" The value (1) will turn testing ON. The value"
+" of (0) will turn testing OFF with the default set"
+" to (0).")
+parser.add_argument("-p","--path", metavar = "",
+help = "Refers to the debugfs path to a vmbus device."
+" If the path is not a valid path to a vmbus device,"
+" the program will exit. The path must be the"
+" absolute path; use the lsvmbus command to find"
+" the path.")
+parser_delay = subparsers.add_parser("delay",
+help = "Delay buffer/message reads in microseconds.",
+description = "vmbus_testing -s [0|1] [-q] -p "
+" delay -d "
+"[buffer-delay-value] [message-delay-value]\n"
+"vmbus_testing [-q] delay [buffer-delay-value] "
+"[message-delay-value] -E\n"
+"vmbus_testing [-q] delay [buffer-delay-value] "
+

Re: [PATCH 0/3] fix interrupt swamp in NVMe

2019-08-21 Thread Ming Lei
On Wed, Aug 21, 2019 at 04:27:00PM +, Long Li wrote:
> >>>Subject: Re: [PATCH 0/3] fix interrupt swamp in NVMe
> >>>
> >>>On Wed, Aug 21, 2019 at 07:47:44AM +, Long Li wrote:
>  >>>Subject: Re: [PATCH 0/3] fix interrupt swamp in NVMe
>  >>>
>  >>>On 20/08/2019 09:25, Ming Lei wrote:
>   On Tue, Aug 20, 2019 at 2:14 PM  wrote:
>  >
>  > From: Long Li 
>  >
>  > This patch set tries to fix interrupt swamp in NVMe devices.
>  >
>  > On large systems with many CPUs, a number of CPUs may share
> >>>one
>  >>>NVMe
>  > hardware queue. It may have this situation where several CPUs
>  > are issuing I/Os, and all the I/Os are returned on the CPU where
>  > the
>  >>>hardware queue is bound to.
>  > This may result in that CPU swamped by interrupts and stay in
>  > interrupt mode for extended time while other CPUs continue to
>  > issue I/O. This can trigger Watchdog and RCU timeout, and make
>  > the system
>  >>>unresponsive.
>  >
>  > This patch set addresses this by enforcing scheduling and
>  > throttling I/O when CPU is starved in this situation.
>  >
>  > Long Li (3):
>  >   sched: define a function to report the number of context switches
> >>>on a
>  > CPU
>  >   sched: export idle_cpu()
>  >   nvme: complete request in work queue on CPU with flooded
>  > interrupts
>  >
>  >  drivers/nvme/host/core.c | 57
>  > +++-
>  >  drivers/nvme/host/nvme.h |  1 +
>  >  include/linux/sched.h|  2 ++
>  >  kernel/sched/core.c  |  7 +
>  >  4 files changed, 66 insertions(+), 1 deletion(-)
>  
>   Another simpler solution may be to complete request in threaded
>   interrupt handler for this case. Meantime allow scheduler to run
>   the interrupt thread handler on CPUs specified by the irq
>   affinity mask, which was discussed by the following link:
>  
>  
>  >>>https://lore.kernel.org/lkml/e0e9478e-62a5-ca24-3b12-58f7d056383e@huawei.com/
>  
>   Could you try the above solution and see if the lockup can be
> >>>avoided?
>   John Garry
>   should have workable patch.
>  >>>
>  >>>Yeah, so we experimented with changing the interrupt handling in
>  >>>the SCSI driver I maintain to use a threaded handler IRQ handler
>  >>>plus patch below, and saw a significant throughput boost:
>  >>>
>  >>>--->8
>  >>>
>  >>>Subject: [PATCH] genirq: Add support to allow thread to use hard
>  >>>irq affinity
>  >>>
>  >>>Currently the cpu allowed mask for the threaded part of a threaded
>  >>>irq handler will be set to the effective affinity of the hard irq.
>  >>>
>  >>>Typically the effective affinity of the hard irq will be for a
>  >>>single cpu. As such, the threaded handler would always run on the
> >>>same cpu as the hard irq.
>  >>>
>  >>>We have seen scenarios in high data-rate throughput testing that
>  >>>the cpu handling the interrupt can be totally saturated handling
>  >>>both the hard interrupt and threaded handler parts, limiting
> >>>throughput.
>  >>>
>  >>>Add IRQF_IRQ_AFFINITY flag to allow the driver requesting the
>  >>>threaded interrupt to decide on the policy of which cpu the threaded
> >>>handler may run.
>  >>>
>  >>>Signed-off-by: John Garry 
> 
>  Thanks for pointing me to this patch. This fixed the interrupt swamp and
> >>>make the system stable.
> 
>  However I'm seeing reduced performance when using threaded
> >>>interrupts.
> 
>  Here are the test results on a system with 80 CPUs and 10 NVMe disks
>  (32 hardware queues for each disk) Benchmark tool is FIO, I/O pattern:
>  4k random reads on all NVMe disks, with queue depth = 64, num of jobs
>  = 80, direct=1
> 
>  With threaded interrupts: 1320k IOPS
>  With just interrupts: 3720k IOPS
>  With just interrupts and my patch: 3700k IOPS
> >>>
> >>>This gap looks too big wrt. threaded interrupts vs. interrupts.
> >>>
> 
>  At the peak IOPS, the overall CPU usage is at around 98-99%. I think the
> >>>cost of doing wake up and context switch for NVMe threaded IRQ handler
> >>>takes some CPU away.
> 
> >>>
> >>>In theory, it shouldn't be so because most of times the thread should be
> >>>running on CPUs of this hctx, and the 

Re: devm_memremap_pages() triggers a kasan_add_zero_shadow() warning

2019-08-21 Thread Baoquan He
On 08/21/19 at 05:12pm, Qian Cai wrote:
> > > Does disabling CONFIG_RANDOMIZE_BASE help? Maybe that workaround has
> > > regressed. Effectively we need to find what is causing the kernel to
> > > sometimes be placed in the middle of a custom reserved memmap= range.
> > 
> > Yes, disabling KASLR works good so far. Assuming the workaround, i.e.,
> > f28442497b5c
> > (“x86/boot: Fix KASLR and memmap= collision”) is correct.
> > 
> > The only other commit that might regress it from my research so far is,
> > 
> > d52e7d5a952c ("x86/KASLR: Parse all 'memmap=' boot option entries”)
> > 
> 
> It turns out that the origin commit f28442497b5c (“x86/boot: Fix KASLR and
> memmap= collision”) has a bug that is unable to handle "memmap=" in
> CONFIG_CMDLINE instead of a parameter in bootloader because when it (as well 
> as
> the commit d52e7d5a952c) calls get_cmd_line_ptr() in order to run
> mem_avoid_memmap(), "boot_params" has no knowledge of CONFIG_CMDLINE. Only 
> later
> in setup_arch(), the kernel will deal with parameters over there.

Yes, we didn't consider CONFIG_CMDLINE during boot compressing stage. It
should be a generic issue since other parameters from CONFIG_CMDLINE could
be ignored too, not only KASLR handling. Would you like to cast a patch
to fix it? Or I can fix it later, maybe next week.

Thanks
Baoquan


RE: [PATCH v2 net-next] net: fec: add C45 MDIO read/write support

2019-08-21 Thread Andy Duan
From: Marco Hartman Sent: Wednesday, August 21, 2019 7:44 PM
> IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
> staged access model in order to increase the address space.
> 
> This patch adds support for C45 MDIO read and write accesses, which are
> used whenever the MII_ADDR_C45 flag in the regnum argument is set.
> In case it is not set, C22 accesses are used as before.
> 
> Signed-off-by: Marco Hartmann 

Acked-by: Fugang Duan 
> ---
> Changes in v2:
> - use bool variable is_c45
> - add missing goto statements
> ---
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 70
> ---
>  1 file changed, 64 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index c01d3ec3e9af..cb3ce27fb27a 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -208,8 +208,11 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet
> MAC address");
> 
>  /* FEC MII MMFR bits definition */
>  #define FEC_MMFR_ST  (1 << 30)
> +#define FEC_MMFR_ST_C45  (0)
>  #define FEC_MMFR_OP_READ (2 << 28)
> +#define FEC_MMFR_OP_READ_C45 (3 << 28)
>  #define FEC_MMFR_OP_WRITE(1 << 28)
> +#define FEC_MMFR_OP_ADDR_WRITE   (0)
>  #define FEC_MMFR_PA(v)   ((v & 0x1f) << 23)
>  #define FEC_MMFR_RA(v)   ((v & 0x1f) << 18)
>  #define FEC_MMFR_TA  (2 << 16)
> @@ -1767,7 +1770,8 @@ static int fec_enet_mdio_read(struct mii_bus *bus,
> int mii_id, int regnum)
>   struct fec_enet_private *fep = bus->priv;
>   struct device *dev = >pdev->dev;
>   unsigned long time_left;
> - int ret = 0;
> + int ret = 0, frame_start, frame_addr, frame_op;
> + bool is_c45 = !!(regnum & MII_ADDR_C45);
> 
>   ret = pm_runtime_get_sync(dev);
>   if (ret < 0)
> @@ -1775,9 +1779,37 @@ static int fec_enet_mdio_read(struct mii_bus
> *bus, int mii_id, int regnum)
> 
>   reinit_completion(>mdio_done);
> 
> + if (is_c45) {
> + frame_start = FEC_MMFR_ST_C45;
> +
> + /* write address */
> + frame_addr = (regnum >> 16);
> + writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
> +FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
> +FEC_MMFR_TA | (regnum & 0x),
> +fep->hwp + FEC_MII_DATA);
> +
> + /* wait for end of transfer */
> + time_left = wait_for_completion_timeout(>mdio_done,
> + usecs_to_jiffies(FEC_MII_TIMEOUT));
> + if (time_left == 0) {
> + netdev_err(fep->netdev, "MDIO address write timeout\n");
> + ret = -ETIMEDOUT;
> + goto out;
> + }
> +
> + frame_op = FEC_MMFR_OP_READ_C45;
> +
> + } else {
> + /* C22 read */
> + frame_op = FEC_MMFR_OP_READ;
> + frame_start = FEC_MMFR_ST;
> + frame_addr = regnum;
> + }
> +
>   /* start a read op */
> - writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + writel(frame_start | frame_op |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
>   FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
> 
>   /* wait for end of transfer */
> @@ -1804,7 +1836,8 @@ static int fec_enet_mdio_write(struct mii_bus *bus,
> int mii_id, int regnum,
>   struct fec_enet_private *fep = bus->priv;
>   struct device *dev = >pdev->dev;
>   unsigned long time_left;
> - int ret;
> + int ret, frame_start, frame_addr;
> + bool is_c45 = !!(regnum & MII_ADDR_C45);
> 
>   ret = pm_runtime_get_sync(dev);
>   if (ret < 0)
> @@ -1814,9 +1847,33 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> 
>   reinit_completion(&fep->mdio_done);
> 
> + if (is_c45) {
> + frame_start = FEC_MMFR_ST_C45;
> +
> + /* write address */
> + frame_addr = (regnum >> 16);
> + writel(frame_start | FEC_MMFR_OP_ADDR_WRITE |
> +FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(frame_addr) |
> +FEC_MMFR_TA | (regnum & 0xFFFF),
> +fep->hwp + FEC_MII_DATA);
> +
> + /* wait for end of transfer */
> + time_left = wait_for_completion_timeout(&fep->mdio_done,
> + usecs_to_jiffies(FEC_MII_TIMEOUT));
> + if (time_left == 0) {
> + netdev_err(fep->netdev, "MDIO address write timeout\n");
> + ret = -ETIMEDOUT;
> + goto out;
> + }
> + } else {
> + /* C22 write */
> + frame_start = FEC_MMFR_ST;
> + frame_addr = regnum;
> + }
> +
>   /* start a write op */
> - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> - 

Re: [PATCH v4] kasan: add memory corruption identification for software tag-based mode

2019-08-21 Thread Walter Wu
On Wed, 2019-08-21 at 20:52 +0300, Andrey Ryabinin wrote:
> 
> On 8/20/19 8:37 AM, Walter Wu wrote:
> > On Tue, 2019-08-06 at 13:43 +0800, Walter Wu wrote:
> >> This patch adds memory corruption identification to the bug report for
> >> software tag-based mode: the report shows whether it is a "use-after-free"
> >> or "out-of-bound" error instead of a generic "invalid-access" error. This
> >> will make it easier for programmers to see the memory corruption problem.
> >>
> >> We extend the slab to store the five most recent free pointer tags and
> >> free backtraces, so we can check whether the tagged address is in the
> >> slab record and make a good guess whether the object is more likely
> >> "use-after-free" or "out-of-bound". Therefore every slab memory
> >> corruption can be identified as one or the other.
> >>
> >> == Changes
> >> Change since v1:
> >> - add feature option CONFIG_KASAN_SW_TAGS_IDENTIFY.
> >> - change QUARANTINE_FRACTION to reduce quarantine size.
> >> - change the qlist order in order to find the newest object in quarantine
> >> - reduce the number of calling kmalloc() from 2 to 1 time.
> >> - remove global variable to use argument to pass it.
> >> - correct the qobject cache->size accounting to the size in bytes of qlist_head.
> >> - only use kasan_cache_shrink() to shrink memory.
> >>
> >> Change since v2:
> >> - remove the memory-shrinking function kasan_cache_shrink()
> >> - modify the description of the CONFIG_KASAN_SW_TAGS_IDENTIFY
> >> - optimize the quarantine_find_object() and qobject_free()
> >> - fix the duplicating function name 3 times in the header.
> >> - modify the function name set_track() to kasan_set_track()
> >>
> >> Change since v3:
> >> - change tag-based quarantine to extend slab to identify memory corruption
> > 
> > Hi,Andrey,
> > 
> > Would you review the patch,please?
> 
> 
> I didn't notice anything fundamentally wrong, but I found some
> questionable implementation choices that make the code look weirder
> than necessary and harder to understand. So I ended up cleaning it
> up; see the diff below. I'll send v5 with that diff folded.
> 

Thanks for your review and suggestions.

Walter



Re: [patch V2 00/38] posix-cpu-timers: Cleanup and consolidation

2019-08-21 Thread Christoph Hellwig
On Thu, Aug 22, 2019 at 03:02:04AM +0200, Frederic Weisbecker wrote:
> > which repeats every time I fetch.  I can't think of anything particular
> > on my side that would cause this.
> 
> Yeah I had to run "git remote prune tip" and fetch again.
> 
> > Apparently there was an old remote branch tip/WIP.timers, and git
> > refuses to create a new branch under that path: since refs form a
> > file-like hierarchy, an existing ref tip/WIP.timers blocks any new
> > ref named tip/WIP.timers/<something>.

Thanks, that seems to have fixed the issue for me.

