My Dear in the lord

2020-07-10 Thread Mrs. Mina A. Brunel
My Dear in the Lord,


My name is Mrs. Mina A. Brunel. I am a Norwegian citizen living in
Burkina Faso. I was married to Mr. Brunel Patrice, a politician who
owned a small gold company in Burkina Faso; he died of leprosy and
radesyge in February 2010. During his lifetime he deposited the
sum of €8.5 million (eight million, five hundred thousand euros)
in a bank in Ouagadougou, the capital city of Burkina Faso in West
Africa. The money came from the sale of his company and from the
death benefits and entitlements paid by his company.

I am sending you this message with heavy tears in my eyes and great
sorrow in my heart, praying that it will reach you in good health,
because I am not in good health myself. I sleep every night without
knowing whether I will be alive to see the next day. I have suffered
from cancer for a long time and am now also partially suffering from
leprosy, which has made it difficult for me to move around. I was
married to my late husband for more than six years without having a
child, and my doctor has confided that I have little chance to live.
Not knowing when the cup of death will come, I decided to contact you
to claim the fund, since I have no relations; I grew up in an orphanage.

I have decided to donate this money to support motherless babies, the
less privileged, and widows, and to churches to build the house of God,
because I am dying, having been diagnosed with cancer about three years
ago. I have decided to donate what I inherited from my late husband to
you for the good work of Almighty God; I will be going in for an
operation soon.

Now I want you to stand as my next of kin to claim the funds for
charity purposes. If this money remains unclaimed after my death, the
bank executives or the government will take it as an unclaimed fund
and perhaps use it for selfish and worthless ventures. I need a very
honest person who can claim this money and use it for charity work:
for orphanages and widows, and to build schools and churches for the
less privileged, to be named after my late husband and me.

I need your urgent answer to know whether you will be able to execute
this project, and I will then give you more information on how the fund
can be transferred to your bank account or online banking.

Thanks
Mrs. Mina A. Brunel


[PATCH] CREDITS: Replace HTTP links with HTTPS ones

2020-07-10 Thread Alexander A. Klimov
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
For each line:
  If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
  If both the HTTP and HTTPS versions
  return 200 OK and serve the same content:
Replace HTTP with HTTPS.
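
For illustration only (not part of this patch), a rough user-space C
sketch of the per-line link matching; POSIX ERE lacks \b, \w and
non-capturing groups, so the pattern is approximated:

#include <regex.h>
#include <stdio.h>

int main(void)
{
    const char *line = "W: http://www.example.org/~user/";
    regex_t re;
    regmatch_t m;

    /* Approximation of \bhttp://[^# \t\r\n]*(?:\w|/) in POSIX ERE. */
    if (regcomp(&re, "http://[^# \t\r\n]*[A-Za-z0-9_/]", REG_EXTENDED))
        return 1;

    if (regexec(&re, line, 1, &m, 0) == 0)
        printf("link: %.*s\n", (int)(m.rm_eo - m.rm_so), line + m.rm_so);

    regfree(&re);
    return 0;
}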

Signed-off-by: Alexander A. Klimov 
---
 This is not a retraction of my wish to be in CREDITS.
 Just performing the requested changes.
 
 CREDITS | 80 -
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/CREDITS b/CREDITS
index 0787b5872906..e5267acb98e0 100644
--- a/CREDITS
+++ b/CREDITS
@@ -34,7 +34,7 @@ S: Romania
 
 N: Mark Adler
 E: mad...@alumni.caltech.edu
-W: http://alumnus.caltech.edu/~madler/
+W: https://alumnus.caltech.edu/~madler/
 D: zlib decompression
 
 N: Monalisa Agrawal
@@ -62,7 +62,7 @@ S: United Kingdom
 
 N: Werner Almesberger
 E: wer...@almesberger.net
-W: http://www.almesberger.net/
+W: https://www.almesberger.net/
 D: dosfs, LILO, some fd features, ATM, various other hacks here and there
 S: Buenos Aires
 S: Argentina
@@ -96,7 +96,7 @@ S: USA
 
 N: Erik Andersen
 E: ander...@codepoet.org
-W: http://www.codepoet.org/
+W: https://www.codepoet.org/
 P: 1024D/30D39057 1BC4 2742 E885 E4DE 9301  0C82 5F9B 643E 30D3 9057
 D: Maintainer of ide-cd and Uniform CD-ROM driver, 
 D: ATAPI CD-Changer support, Major 2.1.x CD-ROM update.
@@ -114,7 +114,7 @@ S: Canada K2P 0X3
 
 N: H. Peter Anvin
 E: h...@zytor.com
-W: http://www.zytor.com/~hpa/
+W: https://www.zytor.com/~hpa/
 P: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
 D: Author of the SYSLINUX boot loader, maintainer of the linux.* news
 D: hierarchy and the Linux Device List; various kernel hacks
@@ -124,7 +124,7 @@ S: USA
 
 N: Andrea Arcangeli
 E: and...@suse.de
-W: http://www.kernel.org/pub/linux/kernel/people/andrea/
+W: https://www.kernel.org/pub/linux/kernel/people/andrea/
 P: 1024D/68B9CB43 13D9 8355 295F 4823 7C49  C012 DFA1 686E 68B9 CB43
 P: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC  C0 4B 81 1D 8C 15 C8 E5
 D: Parport hacker
@@ -339,7 +339,7 @@ S: Haifa, Israel
 
 N: Johannes Berg
 E: johan...@sipsolutions.net
-W: http://johannes.sipsolutions.net/
+W: https://johannes.sipsolutions.net/
 P: 4096R/7BF9099A C0EB C440 F6DA 091C 884D  8532 E0F3 73F3 7BF9 099A
 D: powerpc & 802.11 hacker
 
@@ -376,7 +376,7 @@ D: Original author of the Linux networking code
 
 N: Anton Blanchard
 E: an...@samba.org
-W: http://samba.org/~anton/
+W: https://samba.org/~anton/
 P: 1024/8462A731 4C 55 86 34 44 59 A7 99  2B 97 88 4A 88 9A 0D 97
 D: sun4 port, Sparc hacker
 
@@ -483,7 +483,7 @@ D: Intel Wireless WiMAX Connection 2400 SDIO driver
 
 N: Derrick J. Brashear
 E: sha...@dementia.org
-W: http://www.dementia.org/~shadow
+W: https://www.dementia.org/~shadow
 P: 512/71EC9367 C5 29 0F BC 83 51 B9 F0  BC 05 89 A0 4F 1F 30 05
 D: Author of Sparc CS4231 audio driver, random Sparc work
 S: 403 Gilmore Avenue
@@ -509,7 +509,7 @@ S: Sweden
 
 N: Paul Bristow
 E: p...@paulbristow.net
-W: http://paulbristow.net/linux/idefloppy.html
+W: https://paulbristow.net/linux/idefloppy.html
 D: Maintainer of IDE/ATAPI floppy driver
 
 N: Stefano Brivio
@@ -518,7 +518,7 @@ D: Broadcom B43 driver
 
 N: Dominik Brodowski
 E: li...@brodo.de
-W: http://www.brodo.de/
+W: https://www.brodo.de/
 P: 1024D/725B37C6  190F 3E77 9C89 3B6D BECD  46EE 67C3 0308 725B 37C6
 D: parts of CPUFreq code, ACPI bugfixes, PCMCIA rewrite, cpufrequtils
 S: Tuebingen, Germany
@@ -865,7 +865,7 @@ D: Promise DC4030VL caching HD controller drivers
 
 N: Todd J. Derr
 E: t...@fore.com
-W: http://www.wordsmith.org/~tjd
+W: https://www.wordsmith.org/~tjd
 D: Random console hacks and other miscellaneous stuff
 S: 3000 FORE Drive
 S: Warrendale, Pennsylvania 15086
@@ -894,8 +894,8 @@ S: USA
 
 N: Matt Domsch
 E: matt_dom...@dell.com
-W: http://www.dell.com/linux
-W: http://domsch.com/linux
+W: https://www.dell.com/linux
+W: https://domsch.com/linux
 D: Linux/IA-64
 D: Dell PowerEdge server, SCSI layer, misc drivers, and other patches
 S: Dell Inc.
@@ -992,7 +992,7 @@ S: USA
 
 N: Randy Dunlap
 E: rdun...@infradead.org
-W: http://www.infradead.org/~rdunlap/
+W: https://www.infradead.org/~rdunlap/
 D: Linux-USB subsystem, USB core/UHCI/printer/storage drivers
 D: x86 SMP, ACPI, bootflag hacking
 D: documentation, builds
@@ -1063,7 +1063,7 @@ S: Sweden
 
 N: Pekka Enberg
 E: penb...@cs.helsinki.fi
-W: http://www.cs.helsinki.fi/u/penberg/
+W: https://www.cs.helsinki.fi/u/penberg/
 D: Various kernel hacks, fixes, and cleanups.
 D: Slab allocators
 S: Finland
@@ -1157,7 +1157,7 @@ S: Germany
 
 N: Jeremy Fitzhardinge
 E: jer...@goop.org
-W: http://www.goop.org/~jeremy
+W: https://www.goop.org/~jeremy
 D: author of userfs filesystem
 D: Improved mmap and munmap handling
 D: General mm minor tidyups
@@ -1460,7 

Re: [PATCH] CREDITS: replace HTTP links with HTTPS ones and add myself

2020-07-10 Thread Alexander A. Klimov




On 10.07.20 at 23:46, Jonathan Corbet wrote:

On Fri, 10 Jul 2020 21:43:42 +0200
"Alexander A. Klimov"  wrote:


Regarding the links:

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
   If not .svg:
 For each line:
   If doesn't contain `\bxmlns\b`:
 For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
   If both the HTTP and HTTPS versions
   return 200 OK and serve the same content:
 Replace HTTP with HTTPS.

Regarding the addition of myself:


A couple of things here...


Rationale:
* 93431e0607e5


This is ... not particularly self-explanatory.  Is that meant to be a
commit reference?  If so, you would want to use the normal format.


* the replaced links in this patch


If you are going to do something like make an addition to the file, you
need to do that separately from a cleanup patch.
But somebody has to say this: I don't think we have any sort of laid-down
policy for what it takes to be mentioned in CREDITS, but I don't think that

I have absolutely no problem with that.
But IMAO you *should* have such a policy.
At least for people who'd *have* a problem with that.


your work thus far clears whatever bar we might set.  We don't immortalize
every person who submits some cleanup patches, or this file would be a long
one indeed.  If you would like to be remembered for your kernel work, I
would respectfully suggest that you move beyond mechanical cleanups into
higher-level work.

One other little thing that jumped out at me:


  N: Alan Cox
-W: http://www.linux.org.uk/diary/
+W: https://www.linux.org.uk/diary/
  D: Linux Networking (0.99.10->2.0.29)
  D: Original Appletalk, AX.25, and IPX code
  D: 3c501 hacker


That link just redirects to linux.com, which is probably not what Alan had
in mind.  Replacing the link with one into the wayback machine (or perhaps
just removing it entirely) would seem like a more useful change than adding
HTTPS to a link that clearly does not reach the intended destination.

Thanks,

jon



WARNING in submit_bio_checks

2020-07-10 Thread syzbot
Hello,

syzbot found the following crash on:

HEAD commit:    9e50b94b Add linux-next specific files for 20200703
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=112aaa1f10
kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=4c50ac32e5b10e4133e1
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=fb6d10
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1218fa1f10

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4c50ac32e5b10e413...@syzkaller.appspotmail.com

------------[ cut here ]------------
Trying to write to read-only block-device nullb0 (partno 0)
WARNING: CPU: 0 PID: 6821 at block/blk-core.c:857 bio_check_ro 
block/blk-core.c:857 [inline]
WARNING: CPU: 0 PID: 6821 at block/blk-core.c:857 
submit_bio_checks+0x1aba/0x1f70 block/blk-core.c:985
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 6821 Comm: syz-executor914 Not tainted 
5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 panic+0x2e3/0x75c kernel/panic.c:231
 __warn.cold+0x20/0x45 kernel/panic.c:600
 report_bug+0x1bd/0x210 lib/bug.c:198
 handle_bug+0x38/0x90 arch/x86/kernel/traps.c:235
 exc_invalid_op+0x13/0x40 arch/x86/kernel/traps.c:255
 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:563
RIP: 0010:bio_check_ro block/blk-core.c:857 [inline]
RIP: 0010:submit_bio_checks+0x1aba/0x1f70 block/blk-core.c:985
Code: 04 00 00 45 8b a4 24 a4 05 00 00 48 8d 74 24 68 48 89 ef e8 b8 21 fe ff 
48 c7 c7 e0 ce 91 88 48 89 c6 44 89 e2 e8 08 df c0 fd <0f> 0b 48 b8 00 00 00 00 
00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02
RSP: 0018:c90001277338 EFLAGS: 00010286
RAX:  RBX: 8880a0cb2240 RCX: 
RDX: 8880a8ebc180 RSI: 815d7d27 RDI: f5200024ee59
RBP: 8880a03101c0 R08: 0001 R09: 8880ae6318e7
R10:  R11:  R12: 
R13: 8880a03101c8 R14:  R15: 8880a03101e8
 submit_bio_noacct+0x89/0x12d0 block/blk-core.c:1198
 submit_bio+0x263/0x5b0 block/blk-core.c:1283
 submit_bh_wbc+0x685/0x8e0 fs/buffer.c:3105
 __block_write_full_page+0x837/0x12e0 fs/buffer.c:1848
 block_write_full_page+0x214/0x270 fs/buffer.c:3034
 __writepage+0x60/0x170 mm/page-writeback.c:2311
 write_cache_pages+0x736/0x11b0 mm/page-writeback.c:2246
 generic_writepages mm/page-writeback.c:2337 [inline]
 generic_writepages+0xe2/0x150 mm/page-writeback.c:2326
 do_writepages+0xec/0x290 mm/page-writeback.c:2352
 __filemap_fdatawrite_range+0x2a1/0x380 mm/filemap.c:422
 filemap_write_and_wait_range mm/filemap.c:655 [inline]
 filemap_write_and_wait_range+0xe1/0x1c0 mm/filemap.c:649
 filemap_write_and_wait include/linux/fs.h:2629 [inline]
 __sync_blockdev fs/block_dev.c:480 [inline]
 sync_blockdev fs/block_dev.c:489 [inline]
 __blkdev_put+0x69a/0x890 fs/block_dev.c:1863
 blkdev_close+0x8c/0xb0 fs/block_dev.c:1947
 __fput+0x33c/0x880 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:135
 exit_task_work include/linux/task_work.h:25 [inline]
 do_exit+0xb72/0x2a40 kernel/exit.c:806
 do_group_exit+0x125/0x310 kernel/exit.c:904
 __do_sys_exit_group kernel/exit.c:915 [inline]
 __se_sys_exit_group kernel/exit.c:913 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:913
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:367
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43ee48
Code: Bad RIP value.
RSP: 002b:7ffdd4c8f808 EFLAGS: 0246 ORIG_RAX: 00e7
RAX: ffda RBX:  RCX: 0043ee48
RDX:  RSI: 003c RDI: 
RBP: 004be648 R08: 00e7 R09: ffd0
R10: 0003 R11: 0246 R12: 0001
R13: 006d0180 R14:  R15: 
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Re: [PATCH] Replace HTTP links with HTTPS ones: USB MASS STORAGE DRIVER

2020-07-10 Thread Alexander A. Klimov




On 10.07.20 at 23:35, Jonathan Corbet wrote:

On Fri, 10 Jul 2020 21:36:03 +0200
"Alexander A. Klimov"  wrote:


2) Apropos "series" and "as whole"... I stumbled over
 `git log --oneline |grep -Fwe treewide`
 and am wondering:
 *Shouldn't all of these patches even begin with "treewide: "?*
 E.g.: "treewide: Replace HTTP links with HTTPS ones: GCC PLUGINS"


No, this isn't something that needs to be done across the tree all at
once.  Keep going through the appropriate maintainers as you have, but do

If we do treewide only if needed... why is this treewide:

git log --oneline | grep -Fwe 'treewide: Replace GPLv2 boilerplate/reference with SPDX'



please try to adjust your subject lines to match what they do.

jon



drivers/gpu/drm/drm_gem_shmem_helper.c:260:17: error: implicit declaration of function 'pgprot_writecombine'; did you mean

2020-07-10 Thread kernel test robot
Hi Hans,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   aa0c9086b40c17a7ad94425b3b70dd1fdd7497bf
commit: e4f86e43716443e934d705952902d40de0fa9a05 drm: Add Grain Media GM12U320 
driver v2
date:   12 months ago
config: m68k-randconfig-r002-20200710 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout e4f86e43716443e934d705952902d40de0fa9a05
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from include/linux/file.h:9,
from include/linux/dma-buf.h:27,
from drivers/gpu/drm/drm_gem_shmem_helper.c:6:
   include/linux/scatterlist.h: In function 'sg_set_buf':
   arch/m68k/include/asm/page_no.h:33:50: warning: ordered comparison of 
pointer with null pointer [-Wextra]
  33 | #define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void 
*)PAGE_OFFSET) && \
 |  ^~
   include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
  78 | # define unlikely(x) __builtin_expect(!!(x), 0)
 |  ^
   include/linux/scatterlist.h:143:2: note: in expansion of macro 'BUG_ON'
 143 |  BUG_ON(!virt_addr_valid(buf));
 |  ^~
   include/linux/scatterlist.h:143:10: note: in expansion of macro 
'virt_addr_valid'
 143 |  BUG_ON(!virt_addr_valid(buf));
 |  ^~~
   drivers/gpu/drm/drm_gem_shmem_helper.c: In function 
'drm_gem_shmem_vmap_locked':
>> drivers/gpu/drm/drm_gem_shmem_helper.c:260:17: error: implicit declaration 
>> of function 'pgprot_writecombine'; did you mean 'dma_free_writecombine'? 
>> [-Werror=implicit-function-declaration]
 260 | VM_MAP, pgprot_writecombine(PAGE_KERNEL));
 | ^~~
 | dma_free_writecombine
   drivers/gpu/drm/drm_gem_shmem_helper.c:260:17: error: incompatible type for 
argument 4 of 'vmap'
 260 | VM_MAP, pgprot_writecombine(PAGE_KERNEL));
 | ^~~~
 | |
 | int
   In file included from include/asm-generic/io.h:887,
from arch/m68k/include/asm/io.h:11,
from arch/m68k/include/asm/pgtable_no.h:14,
from arch/m68k/include/asm/pgtable.h:3,
from include/linux/mm.h:99,
from include/linux/scatterlist.h:8,
from include/linux/dma-buf.h:29,
from drivers/gpu/drm/drm_gem_shmem_helper.c:6:
   include/linux/vmalloc.h:109:14: note: expected 'pgprot_t' {aka 'struct <anonymous>'} but argument is of type 'int'
 109 | extern void *vmap(struct page **pages, unsigned int count,
 |  ^~~~
   cc1: some warnings being treated as errors

vim +260 drivers/gpu/drm/drm_gem_shmem_helper.c

2194a63a818db7 Noralf Trønnes  2019-03-12  243  
2194a63a818db7 Noralf Trønnes  2019-03-12  244  static void *drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem)
2194a63a818db7 Noralf Trønnes  2019-03-12  245  {
2194a63a818db7 Noralf Trønnes  2019-03-12  246  	struct drm_gem_object *obj = &shmem->base;
2194a63a818db7 Noralf Trønnes  2019-03-12  247  	int ret;
2194a63a818db7 Noralf Trønnes  2019-03-12  248  
2194a63a818db7 Noralf Trønnes  2019-03-12  249  	if (shmem->vmap_use_count++ > 0)
2194a63a818db7 Noralf Trønnes  2019-03-12  250  		return shmem->vaddr;
2194a63a818db7 Noralf Trønnes  2019-03-12  251  
2194a63a818db7 Noralf Trønnes  2019-03-12  252  	ret = drm_gem_shmem_get_pages(shmem);
2194a63a818db7 Noralf Trønnes  2019-03-12  253  	if (ret)
2194a63a818db7 Noralf Trønnes  2019-03-12  254  		goto err_zero_use;
2194a63a818db7 Noralf Trønnes  2019-03-12  255  
2194a63a818db7 Noralf Trønnes  2019-03-12  256  	if (obj->import_attach)
2194a63a818db7 Noralf Trønnes  2019-03-12  257  		shmem->vaddr = dma_buf_vmap(obj->import_attach->dmabuf);
2194a63a818db7 Noralf Trønnes  2019-03-12  258  	else
be7d9f05c53e6f Boris Brezillon 2019-05-29  259  		shmem->vaddr = vmap(shmem->pages, obj->size >> PAGE_SHIFT,
be7d9f05c53e6f Boris Brezillon 2019-05-29 @260  				    VM_MAP, pgprot_writecombine(PAGE_KERNEL));
2194a63a818db7 Noralf Trønnes  2019-03-12  261  
2194a63a818db7 Noralf Trønnes  2019-03-12  2

[PATCH v2] panic: prevent panic_timeout * 1000 from overflow

2020-07-10 Thread Changming
From: Changming Liu 

Since panic_timeout is an int passed in through sysctl,
the loop bound panic_timeout * 1000 can overflow and
result in a zero-delay panic when panic_timeout is greater
than INT_MAX/1000.

Fix this by rewriting the bound as i / 1000 < panic_timeout,
and, so that i / 1000 can always reach panic_timeout, change i
to long long, which strictly has more bits than int.

Signed-off-by: Changming Liu 
---
Changes in v2:
- change the loop in panic, instead of change the sysctl

 kernel/panic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index e2157ca387c8..941848aac0ee 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -178,7 +178,8 @@ void panic(const char *fmt, ...)
 {
static char buf[1024];
va_list args;
-   long i, i_next = 0, len;
+   long long i;
+   long i_next = 0, len;
int state = 0;
int old_cpu, this_cpu;
bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
@@ -315,7 +316,7 @@ void panic(const char *fmt, ...)
 */
pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
 
-   for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
+   for (i = 0; i / 1000 < panic_timeout; i += PANIC_TIMER_STEP) {
touch_nmi_watchdog();
if (i >= i_next) {
i += panic_blink(state ^= 1);
-- 
2.17.1



Re: [PATCH v6 13/17] static_call: Add static_call_cond()

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 07:08:25PM -0400, Steven Rostedt wrote:
> On Fri, 10 Jul 2020 15:38:44 +0200
> Peter Zijlstra  wrote:
> 
> > +static void __static_call_transform(void *insn, enum insn_type type, void 
> > *func)
> >  {
> > -   const void *code = text_gen_insn(opcode, insn, func);
> > +   int size = CALL_INSN_SIZE;
> > +   const void *code;
> >  
> > -   if (WARN_ONCE(*(u8 *)insn != opcode,
> > - "unexpected static call insn opcode 0x%x at %pS\n",
> > - opcode, insn))
> 
> I would still feel better if we did some sort of sanity check before
> just writing to the text. Confirm this is a jmp, call, ret or nop?

I'll see if I can come up with something, but I'm not sure we keep
enough state to be able to reconstruct what should be there.


Re: [PATCH v6 15/17] static_call: Allow early init

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 09:14:26PM -0400, Steven Rostedt wrote:
> On Fri, 10 Jul 2020 15:38:46 +0200
> Peter Zijlstra  wrote:
> 
> > In order to use static_call() to wire up x86_pmu, we need to
> > initialize earlier; copy some of the tricks from jump_label to enable
> > this.
> > 
> > Primarily we overload key->next to store a sites pointer when there
> > are no modules, this avoids having to use kmalloc() to initialize the
> > sites and allows us to run much earlier.
> > 
> 
> I'm confused. What was the need to have key->next store site pointers
> in order to move it up earlier?

The critical part was to not need an allocation.


Re: [PATCH v6 14/17] static_call: Handle tail-calls

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 08:23:19PM -0400, Steven Rostedt wrote:
> On Fri, 10 Jul 2020 15:38:45 +0200
> Peter Zijlstra  wrote:
> > @@ -1639,6 +1647,10 @@ static int decode_sections(struct objtoo
> > if (ret)
> > return ret;
> >  
> > +   ret = read_static_call_tramps(file);
> > +   if (ret)
> > +   return ret;
> > +
> > ret = add_jump_destinations(file);
> > if (ret)
> > return ret;
> > @@ -1671,10 +1683,6 @@ static int decode_sections(struct objtoo
> > if (ret)
> > return ret;
> >  
> > -   ret = read_static_call_tramps(file);
> > -   if (ret)
> > -   return ret;
> 
> Hmm, what's the reason for moving this above? Should we have a comment
> here if there's importance that read_static_call_trampoline() is done
> earlier?

I suppose comments are something objtool lacks more of.

The reason is that add_jump_destination() is the thing that does
tail-call detection, and if it wants to add static-call sites, it needs
to know about the trampolines.


Re: [PATCH v6 14/17] static_call: Handle tail-calls

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 08:23:19PM -0400, Steven Rostedt wrote:
> On Fri, 10 Jul 2020 15:38:45 +0200
> Peter Zijlstra  wrote:
> 
> > GCC can turn our static_call(name)(args...) into a tail call, in which
> > case we get a JMP.d32 into the trampoline (which then does a further
> > tail-call).
> > 
> > Teach objtool to recognise and mark these in .static_call_sites and
> > adjust the code patching to deal with this.
> > 
> 
> Hmm, were you able to trigger crashes before this patch?

No, just a bunch of tail-calls that didn't get patched and would still
point to the trampoline.
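
As a plain user-space illustration (not kernel code) of the tail-call
emission being discussed:

#include <stdio.h>

__attribute__((noinline)) static int do_target(int x)
{
    return x + 1;
}

static int caller(int x)
{
    /* With -O2, gcc typically emits "jmp do_target" here instead of
     * "call do_target; ret" -- a JMP.d32 at the call site. */
    return do_target(x);
}

int main(void)
{
    printf("%d\n", caller(41));
    return 0;
}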



splat and freeze (2 instances)

2020-07-10 Thread Ilkka Prusi



Hi,

I have two splats followed by freezes; the first one was saved in the
logs, but for the second I only have what I could gather from the screen
of the frozen machine. The first is on 5.7.7 and the second on 5.8.0-rc4+.


Logs from the second one could not be saved, but part of it was captured
with a phone camera (dmesg -w).


Computer:

- AMD Ryzen 7 2700, [AMD] 400 Series Chipset

- efi: EFI v2.60 by American Megatrends
- efi: ACPI 2.0=0xd13f2000 ACPI=0xd13f2000 SMBIOS=0xdb647000 SMBIOS 
3.0=0xdb646000 ESRT=0xd7bdd918 MEMATTR=0xd7c3e018

- SMBIOS 3.1.1 present.
- DMI: System manufacturer System Product Name/TUF B450-PLUS GAMING, 
BIOS 2008 12/06/2019

- gcc (Debian 9.3.0-14) 9.3.0

First one:

Linux version 5.7.7 (gcc version 9.3.0 (Debian 9.3.0-14), GNU ld (GNU 
Binutils for Debian) 2.34) #2 SMP PREEMPT Fri Jul 3 10:16:05 EEST 2020


[16835.276319][    C3] rcu: INFO: rcu_preempt self-detected stall on CPU
[16835.276331][    C3] rcu:  3-: (5250 ticks this GP) 
idle=526/1/0x4002 softirq=1880877/1880877 fqs=2299

[16835.276338][    C3]   (t=5250 jiffies g=3603393 q=18733)
[16835.276342][    C3] NMI backtrace for cpu 3
[16835.276347][    C3] CPU: 3 PID: 26434 Comm: CJobMgr::m_Work Tainted: 
G    E 5.7.7 #2
[16835.276351][    C3] Hardware name: System manufacturer System Product 
Name/TUF B450-PLUS GAMING, BIOS 2008 12/06/2019

[16835.276353][    C3] Call Trace:
[16835.276358][    C3]  <IRQ>
[16835.276367][    C3]  dump_stack+0x66/0x90
[16835.276373][    C3]  nmi_cpu_backtrace.cold+0x14/0x52
[16835.276378][    C3]  ? lapic_can_unplug_cpu.cold+0x40/0x40
[16835.276382][    C3]  nmi_trigger_cpumask_backtrace+0xfc/0x121
[16835.276388][    C3]  rcu_dump_cpu_stacks+0xa1/0xcf
[16835.276393][    C3]  rcu_sched_clock_irq.cold+0xab/0x16d
[16835.276397][    C3]  update_process_times+0x24/0x50
[16835.276402][    C3]  tick_sched_timer+0x5a/0x170
[16835.276405][    C3]  ? tick_switch_to_oneshot.cold+0x6f/0x6f
[16835.276409][    C3]  __hrtimer_run_queues+0xf6/0x2c0
[16835.276414][    C3]  hrtimer_interrupt+0x118/0x240
[16835.276421][    C3]  smp_apic_timer_interrupt+0x88/0x190
[16835.276425][    C3]  apic_timer_interrupt+0xf/0x20
[16835.276428][    C3]  </IRQ>
[16835.276433][    C3] RIP: 0010:native_queued_spin_lock_slowpath+0x6a/0x200
[16835.276438][    C3] Code: 73 f0 0f ba 2b 08 0f 92 c0 0f b6 c0 c1 e0 
08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 75 4d 85 c0 74 0e 8b 03 84 c0 
74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 
5d c3 8b
[16835.276441][    C3] RSP: 0018:c90003d57cb0 EFLAGS: 0206 
ORIG_RAX: ff13
[16835.276445][    C3] RAX: 0105 RBX: 8887fabf6200 RCX: 

[16835.276447][    C3] RDX:  RSI:  RDI: 
8887fabf6200
[16835.276449][    C3] RBP: c90003d57d08 R08:  R09: 

[16835.276451][    C3] R10:  R11:  R12: 

[16835.276454][    C3] R13:  R14: 8887fabf6200 R15: 
db007e1c

[16835.276461][    C3]  _raw_spin_lock+0x2c/0x30
[16835.276465][    C3]  futex_wait+0x102/0x220
[16835.276470][    C3]  ? hrtimer_init_sleeper+0xa0/0xa0
[16835.276476][    C3]  do_futex+0x15a/0x8b0
[16835.276481][    C3]  __ia32_sys_futex_time32+0x13a/0x168
[16835.276488][    C3]  do_fast_syscall_32+0x94/0x280
[16835.276492][    C3] entry_SYSCALL_compat_after_hwframe+0x45/0x4d

And decode_stacktrace.sh gives me the following:

$ cat ~/crashdata/577/rcustall | ./scripts/decode_stacktrace.sh vmlinux . /lib/modules/5.7.7/

[16835.276319][    C3] rcu: INFO: rcu_preempt self-detected stall on CPU
[16835.276331][    C3] rcu:  3-: (5250 ticks this GP) 
idle=526/1/0x4002 softirq=1880877/1880877 fqs=2299

[16835.276338][    C3]   (t=5250 jiffies g=3603393 q=18733)
[16835.276342][    C3] NMI backtrace for cpu 3
[16835.276347][    C3] CPU: 3 PID: 26434 Comm: CJobMgr::m_Work Tainted: 
G    E 5.7.7 #2
[16835.276351][    C3] Hardware name: System manufacturer System Product 
Name/TUF B450-PLUS GAMING, BIOS 2008 12/06/2019

[16835.276353][    C3] Call Trace:
[16835.276358][    C3]  <IRQ>
[16835.276367][ C3] dump_stack (/usr/src/linux-5.7.7/lib/dump_stack.c:120)
[16835.276373][ C3] nmi_cpu_backtrace.cold 
(/usr/src/linux-5.7.7/./include/linux/cpumask.h:350 
/usr/src/linux-5.7.7/lib/nmi_backtrace.c:103)
[16835.276378][ C3] ? lapic_can_unplug_cpu.cold 
(/usr/src/linux-5.7.7/arch/x86/kernel/apic/hw_nmi.c:32)
[16835.276382][ C3] nmi_trigger_cpumask_backtrace 
(/usr/src/linux-5.7.7/lib/nmi_backtrace.c:62)
[16835.276388][ C3] rcu_dump_cpu_stacks 
(/usr/src/linux-5.7.7/kernel/rcu/tree_stall.h:252 (discriminator 5))
[16835.276393][ C3] rcu_sched_clock_irq.cold 
(/usr/src/linux-5.7.7/kernel/rcu/tree_stall.h:477 
/usr/src/linux-5.7.7/kernel/rcu/tree_stall.h:549 
/usr/src/linux-5.7.7/kernel/rcu/tree.c:3225 
/usr/src/linux-5.7.7/kernel/rcu/tree.c:2296)
[16835.276397][ C3] update_process_times 

Re: [PATCH] arm64: dts: qcom: sc7180: Add missing properties for Wifi node

2020-07-10 Thread Sibi Sankar

On 2020-07-11 09:07, Rakesh Pillai wrote:

-Original Message-
From: Doug Anderson 
Sent: Friday, July 10, 2020 1:36 AM
To: Rakesh Pillai 
Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
; Evan Green ;
Andy Gross ; Bjorn Andersson
; Rob Herring ; linux-
arm-msm ; LKML ; Sibi Sankar 
Subject: Re: [PATCH] arm64: dts: qcom: sc7180: Add missing properties 
for

Wifi node

Hi,

On Thu, Jul 9, 2020 at 2:18 AM Rakesh Pillai  
wrote:

>
> The wlan firmware memory is statically mapped in
> the Trusted Firmware, hence the wlan driver does
> not need to map/unmap this region dynamically.
>
> Hence add the property to indicate that the wlan driver
> need not map/unmap the firmware memory region
> dynamically.
>
> Also add the chain1 voltage supply for wlan.
>
> Signed-off-by: Rakesh Pillai 
> ---
> This patch is created on top of the change by
> Douglas Anderson.
> https://lkml.org/lkml/2020/6/25/817
>
> Also the dt-bindings for the chain1 voltage supply
> is added by the below patch series:
> https://patchwork.kernel.org/project/linux-wireless/list/?series=309137
> ---
>  arch/arm64/boot/dts/qcom/sc7180-idp.dts | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sc7180-idp.dts
b/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> index 472f7f4..4c64bc1 100644
> --- a/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> +++ b/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> @@ -391,10 +391,12 @@
>
>  &wifi {
> status = "okay";
> +   qcom,msa-fixed-perm;


Should we include the ^^ property
in the base dts? I don't
foresee any platform skipping
the qcom,msa-fixed-perm property.



At one point in time I thought +Sibi said that this wouldn't be needed
once the firmware was fixed.  ...afterwards you said that it was
needed for SSR (subsystem reset).  Would be good to get confirmation
from Sibi that this matches his understanding.


Yes, all of ^^ happened, and yes,
we now need qcom,msa-fixed-perm
since the wlan_fw_mem permission
assignment is now handled in ATF.



Hi Doug,

This is now needed as the firmware memory mapping was moved to Trusted 
firmware.

This region is now statically mapped to avoid access from the driver.




> vdd-0.8-cx-mx-supply = <&vreg_l9a_0p6>;
> vdd-1.8-xo-supply = <&vreg_l1c_1p8>;
> vdd-1.3-rfa-supply = <&vreg_l2c_1p3>;
> vdd-3.3-ch0-supply = <&vreg_l10c_3p3>;
> +   vdd-3.3-ch1-supply = <&vreg_l11c_3p3>;
> wifi-firmware {
> iommus = <&apps_smmu 0xc2 0x1>;
> };

Other than the one question this looks good to me.

-Doug


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH v2] riscv: Allow building with kcov coverage

2020-07-10 Thread Palmer Dabbelt

On Fri, 26 Jun 2020 05:40:56 PDT (-0700), tklau...@distanz.ch wrote:

Add ARCH_HAS_KCOV and HAVE_GCC_PLUGINS to the riscv Kconfig.
Also disable instrumentation of some early boot code and vdso.

Boot-tested on QEMU's riscv64 virt machine.


Thanks.  This is on for-next (with the ack).  I'm boot testing a config with

   CONFIG_KCOV=y
   CONFIG_KCOV_ENABLE_COMPARISONS=y

but LMK if there's something more interesting to test.  I don't see anything
coverage-related in the boot log...
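
(In case it helps: coverage isn't reported at boot; KCOV is exercised
per-thread from user space. A condensed sketch adapted from
Documentation/dev-tools/kcov.rst:)

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE     _IO('c', 100)
#define KCOV_DISABLE    _IO('c', 101)
#define COVER_SIZE      (64 << 10)
#define KCOV_TRACE_PC   0

int main(void)
{
    unsigned long *cover, n, i;
    int fd = open("/sys/kernel/debug/kcov", O_RDWR);

    if (fd == -1)
        return 1;
    /* Size the per-thread coverage buffer (in entries, not bytes). */
    if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
        return 1;
    cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (cover == MAP_FAILED)
        return 1;
    if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC))
        return 1;
    __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);

    read(fd, NULL, 0);  /* any syscall; its kernel path is traced */

    n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
    for (i = 0; i < n; i++)
        printf("0x%lx\n", cover[i + 1]);
    if (ioctl(fd, KCOV_DISABLE, 0))
        return 1;
    return 0;
}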


Cc: Björn Töpel 
Cc: Dmitry Vyukov 
Signed-off-by: Tobias Klauser 
---
 arch/riscv/Kconfig  | 2 ++
 arch/riscv/boot/Makefile| 2 ++
 arch/riscv/kernel/vdso/Makefile | 1 +
 arch/riscv/mm/Makefile  | 2 ++
 4 files changed, 7 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 089293e4ad46..a7d7f8184f15 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -19,6 +19,7 @@ config RISCV
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
+   select ARCH_HAS_KCOV
select ARCH_HAS_MMIOWB
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP
@@ -57,6 +58,7 @@ config RISCV
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_EBPF_JIT if MMU
select HAVE_FUTEX_CMPXCHG if FUTEX
+   select HAVE_GCC_PLUGINS
select HAVE_GENERIC_VDSO if MMU && 64BIT
select HAVE_PCI
select HAVE_PERF_EVENTS
diff --git a/arch/riscv/boot/Makefile b/arch/riscv/boot/Makefile
index 3530c59b3ea7..c59fca695f9d 100644
--- a/arch/riscv/boot/Makefile
+++ b/arch/riscv/boot/Makefile
@@ -14,6 +14,8 @@
 # Based on the ia64 and arm64 boot/Makefile.
 #

+KCOV_INSTRUMENT := n
+
 OBJCOPYFLAGS_Image :=-O binary -R .note -R .note.gnu.build-id -R .comment -S

 targets := Image loader
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 29cf052f6541..4b0d3bcc44e5 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -29,6 +29,7 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)

 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
+KCOV_INSTRUMENT := n

 # Force dependency
 $(obj)/vdso.o: $(obj)/vdso.so
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 363ef01c30b1..c0185e556ca5 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -5,6 +5,8 @@ ifdef CONFIG_FTRACE
 CFLAGS_REMOVE_init.o = -pg
 endif

+KCOV_INSTRUMENT_init.o := n
+
 obj-y += init.o
 obj-y += extable.o
 obj-$(CONFIG_MMU) += fault.o pageattr.o


Re: [GIT PULL] SMB3 Fixes

2020-07-10 Thread pr-tracker-bot
The pull request you sent on Fri, 10 Jul 2020 19:53:30 -0500:

> git://git.samba.org/sfrench/cifs-2.6.git tags/5.8-rc4-smb3-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5ab39e08ff1558ed80d93f5c5217338f19369a40

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] CodingStyle: Inclusive Terminology

2020-07-10 Thread pr-tracker-bot
The pull request you sent on Fri, 10 Jul 2020 17:17:15 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux 
> tags/inclusive-terminology

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/49decddd39e5f6132ccd7d9fdc3d7c470b0061bb

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] libnvdimm fix for v5.8-rc5

2020-07-10 Thread pr-tracker-bot
The pull request you sent on Fri, 10 Jul 2020 18:26:27 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 
> tags/libnvdimm-fix-v5.8-rc5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1df0d8960499e58963fd6c8ac75e544f2b417b29

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT] Networking

2020-07-10 Thread pr-tracker-bot
The pull request you sent on Fri, 10 Jul 2020 16:58:15 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5a764898afec0bc097003e8c3e727792289f76d6

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH] fuse_writepages_fill() optimization to avoid WARN_ON in tree_insert

2020-07-10 Thread Miklos Szeredi
On Thu, Jun 25, 2020 at 11:02 AM Vasily Averin  wrote:
>
> In the current implementation fuse_writepages_fill() tries to share code:
> for a new wpa it calls tree_insert() with num_pages = 0,
> then switches to the common code using the unmodified num_pages,
> and increments it at the very end.
>
> However, this triggers WARN_ON(!wpa->ia.ap.num_pages) in tree_insert():
>  WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
>  RIP: 0010:tree_insert+0xab/0xc0 [fuse]
>  Call Trace:
>   fuse_writepages_fill+0x5da/0x6a0 [fuse]
>   write_cache_pages+0x171/0x470
>   fuse_writepages+0x8a/0x100 [fuse]
>   do_writepages+0x43/0xe0
>
> This patch re-works fuse_writepages_fill() to call tree_insert()
> with num_pages = 1, avoiding the subsequent increment and
> an extra spin_lock(&fi->lock) for a newly added wpa.

Looks good.  However, I don't like the way fuse_writepage_in_flight()
is silently changed to insert page into the rb_tree.  Also the
insertion can be merged with the search for in-flight and be done
unconditionally to simplify the logic.  See attached patch.

Thanks,
Miklos
---
 fs/fuse/file.c |   62 +++--
 1 file changed, 30 insertions(+), 32 deletions(-)

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1674,7 +1674,8 @@ __acquires(fi->lock)
 	}
 }
 
-static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
+static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
+		struct fuse_writepage_args *wpa)
 {
 	pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT;
 	pgoff_t idx_to = idx_from + wpa->ia.ap.num_pages - 1;
@@ -1697,11 +1698,17 @@ static void tree_insert(struct rb_root *
 		else if (idx_to < curr_index)
 			p = &(*p)->rb_left;
 		else
-			return (void) WARN_ON(true);
+			return curr;
 	}
 
 	rb_link_node(&wpa->writepages_entry, parent, p);
 	rb_insert_color(&wpa->writepages_entry, root);
+	return NULL;
+}
+
+static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
+{
+	WARN_ON(fuse_insert_writeback(root, wpa));
 }
 
 static void fuse_writepage_end(struct fuse_conn *fc, struct fuse_args *args,
@@ -1953,14 +1960,14 @@ static void fuse_writepages_send(struct
 }
 
 /*
- * First recheck under fi->lock if the offending offset is still under
- * writeback.  If yes, then iterate auxiliary write requests, to see if there's
+ * Check under fi->lock if the page is under writeback, and insert it onto the
+ * rb_tree if not. Otherwise iterate auxiliary write requests, to see if there's
  * one already added for a page at this offset.  If there's none, then insert
  * this new request onto the auxiliary list, otherwise reuse the existing one by
- * copying the new page contents over to the old temporary page.
+ * swapping the new temp page with the old one.
  */
-static bool fuse_writepage_in_flight(struct fuse_writepage_args *new_wpa,
- struct page *page)
+static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
+			   struct page *page)
 {
 	struct fuse_inode *fi = get_fuse_inode(new_wpa->inode);
 	struct fuse_writepage_args *tmp;
@@ -1968,17 +1975,15 @@ static bool fuse_writepage_in_flight(str
 	struct fuse_args_pages *new_ap = &new_wpa->ia.ap;
 
 	WARN_ON(new_ap->num_pages != 0);
+	new_ap->num_pages = 1;
 
 	spin_lock(&fi->lock);
-	rb_erase(&new_wpa->writepages_entry, &fi->writepages);
-	old_wpa = fuse_find_writeback(fi, page->index, page->index);
+	old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa);
 	if (!old_wpa) {
-		tree_insert(&fi->writepages, new_wpa);
 		spin_unlock(&fi->lock);
-		return false;
+		return true;
 	}
 
-	new_ap->num_pages = 1;
 	for (tmp = old_wpa->next; tmp; tmp = tmp->next) {
 		pgoff_t curr_index;
 
@@ -2007,7 +2012,7 @@ static bool fuse_writepage_in_flight(str
 		fuse_writepage_free(new_wpa);
 	}
 
-	return true;
+	return false;
 }
 
 static int fuse_writepages_fill(struct page *page,
@@ -2086,12 +2091,6 @@ static int fuse_writepages_fill(struct p
 		ap->args.end = fuse_writepage_end;
 		ap->num_pages = 0;
 		wpa->inode = inode;
-
-		spin_lock(&fi->lock);
-		tree_insert(&fi->writepages, wpa);
-		spin_unlock(&fi->lock);
-
-		data->wpa = wpa;
 	}
 	set_page_writeback(page);
 
@@ -2099,26 +2098,25 @@ static int fuse_writepages_fill(struct p
 	ap->pages[ap->num_pages] = tmp_page;
 	ap->descs[ap->num_pages].offset = 0;
 	ap->descs[ap->num_pages].length = PAGE_SIZE;
+	data->orig_pages[ap->num_pages] = page;
 
 	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
 	inc_node_page_state(tmp_page, NR_WRITEBACK_TEMP);
 
 	err = 0;
-	if (is_writeback && fuse_writepage_in_flight(wpa, page)) {
+	if (data->wpa) {
+		/*
+		 * Protected by fi->lock against concurrent access by
+		 * fuse_page_is_writeback().
+		 */
+		spin_lock(&fi->lock);
+		ap->num_pages++;
+		spin_unlock(&fi->lock);
+	} else if (fuse_writepage_add(wpa, page)) {
+		data->wpa = wpa;
+	} else {
 		end_page_writeback(page);
-		data->wpa = NULL;
-		goto out_unlock;
 	}
-	data->orig_pages[ap->num_pages] = page;
-
-	/*
-	 * Protected by 

[PATCH 3/3] arm64: Use the new generic copy_oldmem_page()

2020-07-10 Thread Palmer Dabbelt
From: Palmer Dabbelt 

This is exactly the same as the arm64 code, which I just moved into lib/
to avoid copying it into the RISC-V port.  This builds with defconfig.

Signed-off-by: Palmer Dabbelt 
---
 arch/arm64/Kconfig |  1 +
 arch/arm64/kernel/crash_dump.c | 39 --
 2 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 66dc41fd49f2..55b27d56b163 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1109,6 +1109,7 @@ comment "Support for PE file signature verification 
disabled"
 
 config CRASH_DUMP
bool "Build kdump crash kernel"
+   select GENERIC_LIB_COPY_OLDMEM_PAGE
help
  Generate crash dump after being started by kexec. This should
  be normally only set in special crash dump kernels which are
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
index e6e284265f19..197b92c249ba 100644
--- a/arch/arm64/kernel/crash_dump.c
+++ b/arch/arm64/kernel/crash_dump.c
@@ -13,45 +13,6 @@
 #include 
 #include 
 
-/**
- * copy_oldmem_page() - copy one page from old kernel memory
- * @pfn: page frame number to be copied
- * @buf: buffer where the copied page is placed
- * @csize: number of bytes to copy
- * @offset: offset in bytes into the page
- * @userbuf: if set, @buf is in a user address space
- *
- * This function copies one page from old kernel memory into buffer pointed by
- * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
- * copied or negative error in case of failure.
- */
-ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
-size_t csize, unsigned long offset,
-int userbuf)
-{
-   void *vaddr;
-
-   if (!csize)
-   return 0;
-
-   vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
-   if (!vaddr)
-   return -ENOMEM;
-
-   if (userbuf) {
-   if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
-   memunmap(vaddr);
-   return -EFAULT;
-   }
-   } else {
-   memcpy(buf, vaddr + offset, csize);
-   }
-
-   memunmap(vaddr);
-
-   return csize;
-}
-
 /**
  * elfcorehdr_read - read from ELF core header
  * @buf: buffer where the data is placed
-- 
2.27.0.383.g050319c2ae-goog



[PATCH 1/3] lib: Add a generic copy_oldmem_page()

2020-07-10 Thread Palmer Dabbelt
From: Palmer Dabbelt 

A version of this that is identical to the arm64 version was recently
copied into the RISC-V port while adding kexec() support.  Instead I'd
like to put a shared copy in lib/ and use it from the various ports.

Signed-off-by: Palmer Dabbelt 
---
 lib/Kconfig|  3 +++
 lib/Makefile   |  2 ++
 lib/copy_oldmem_page.c | 51 ++
 3 files changed, 56 insertions(+)
 create mode 100644 lib/copy_oldmem_page.c

diff --git a/lib/Kconfig b/lib/Kconfig
index df3f3da95990..25544abc9547 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -676,3 +676,6 @@ config GENERIC_LIB_CMPDI2
 
 config GENERIC_LIB_UCMPDI2
bool
+
+config GENERIC_LIB_COPY_OLDMEM_PAGE
+   bool
diff --git a/lib/Makefile b/lib/Makefile
index b1c42c10073b..30d57d8b32b1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -318,3 +318,5 @@ obj-$(CONFIG_OBJAGG) += objagg.o
 # KUnit tests
 obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
 obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
+
+obj-$(CONFIG_GENERIC_LIB_COPY_OLDMEM_PAGE) += copy_oldmem_page.o
diff --git a/lib/copy_oldmem_page.c b/lib/copy_oldmem_page.c
new file mode 100644
index ..f0090027218a
--- /dev/null
+++ b/lib/copy_oldmem_page.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Originally part of arch/arm64/kernel/crash_dump.c
+ * Copyright (C) 2017 Linaro Limited
+ * Author: AKASHI Takahiro 
+ */
+
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+size_t csize, unsigned long offset,
+int userbuf)
+{
+   void *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+   memunmap(vaddr);
+   return -EFAULT;
+   }
+   } else {
+   memcpy(buf, vaddr + offset, csize);
+   }
+
+   memunmap(vaddr);
+
+   return csize;
+}
-- 
2.27.0.383.g050319c2ae-goog



[PATCH 2/3] arm: Use the new generic copy_oldmem_page()

2020-07-10 Thread Palmer Dabbelt
From: Palmer Dabbelt 

This is exactly the same as the arm64 code, which I just moved into lib/
to avoid copying it into the RISC-V port.  This builds with defconfig and
with CRASH_DUMP=y.

Signed-off-by: Palmer Dabbelt 
---
 arch/arm/Kconfig |  1 +
 arch/arm/kernel/Makefile |  1 -
 arch/arm/kernel/crash_dump.c | 54 
 3 files changed, 1 insertion(+), 55 deletions(-)
 delete mode 100644 arch/arm/kernel/crash_dump.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2ac74904a3ce..dfbeb14e9673 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1933,6 +1933,7 @@ config ATAGS_PROC
 
 config CRASH_DUMP
bool "Build kdump crash kernel (EXPERIMENTAL)"
+   select GENERIC_LIB_COPY_OLDMEM_PAGE
help
  Generate crash dump after being started by kexec. This should
  be normally only set in special crash dump kernels which are
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..b5310a90dfe4 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -65,7 +65,6 @@ obj-$(CONFIG_KGDB)+= kgdb.o patch.o
 obj-$(CONFIG_ARM_UNWIND)   += unwind.o
 obj-$(CONFIG_HAVE_TCM) += tcm.o
 obj-$(CONFIG_OF)   += devtree.o
-obj-$(CONFIG_CRASH_DUMP)   += crash_dump.o
 obj-$(CONFIG_SWP_EMULATE)  += swp_emulate.o
 CFLAGS_swp_emulate.o   := -Wa,-march=armv7-a
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
diff --git a/arch/arm/kernel/crash_dump.c b/arch/arm/kernel/crash_dump.c
deleted file mode 100644
index 53cb92435392..
--- a/arch/arm/kernel/crash_dump.c
+++ /dev/null
@@ -1,54 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * arch/arm/kernel/crash_dump.c
- *
- * Copyright (C) 2010 Nokia Corporation.
- * Author: Mika Westerberg
- *
- * This code is taken from arch/x86/kernel/crash_dump_64.c
- *   Created by: Hariprasad Nellitheertha (h...@in.ibm.com)
- *   Copyright (C) IBM Corporation, 2004. All rights reserved
- */
-
-#include 
-#include 
-#include 
-#include 
-
-/**
- * copy_oldmem_page() - copy one page from old kernel memory
- * @pfn: page frame number to be copied
- * @buf: buffer where the copied page is placed
- * @csize: number of bytes to copy
- * @offset: offset in bytes into the page
- * @userbuf: if set, @buf is int he user address space
- *
- * This function copies one page from old kernel memory into buffer pointed by
- * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
- * copied or negative error in case of failure.
- */
-ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
-size_t csize, unsigned long offset,
-int userbuf)
-{
-   void *vaddr;
-
-   if (!csize)
-   return 0;
-
-   vaddr = ioremap(__pfn_to_phys(pfn), PAGE_SIZE);
-   if (!vaddr)
-   return -ENOMEM;
-
-   if (userbuf) {
-   if (copy_to_user(buf, vaddr + offset, csize)) {
-   iounmap(vaddr);
-   return -EFAULT;
-   }
-   } else {
-   memcpy(buf, vaddr + offset, csize);
-   }
-
-   iounmap(vaddr);
-   return csize;
-}
-- 
2.27.0.383.g050319c2ae-goog



Add and use a generic copy_oldmem_page()

2020-07-10 Thread Palmer Dabbelt
While adding support for kexec, Nick recently copied the arm64
copy_oldmem_page() into the RISC-V port.  Since this is shared verbatim with
arm and arm64 already, I'd like to add a generic version and so we can use it
instead.  I haven't converted over the MIPS, PPC, or SH ports: while I think we
could figure out how to share a version, they're not exactly the same right
now.  S/390 and x86 are definitely meaningfully different.

Unless there are any objections I'll include the first patch along with the
RISC-V kexec support, which I'm hoping to have for 5.9.  The code, based on
5.8-rc4, is at
ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git -b
copy_oldmem_page .




Re: [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries

2020-07-10 Thread Andrii Nakryiko
On Fri, Jul 10, 2020 at 6:30 PM Song Liu  wrote:
>
> Calling get_perf_callchain() on perf_events from PEBS entries may cause
> unwinder errors. To fix this issue, the callchain is fetched early. Such
> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
>
> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
> also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
> these perf_events. Unfortunately, bpf verifier cannot tell whether the
> program will be attached to perf_event with PEBS entries. Therefore,
> block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
>
> Signed-off-by: Song Liu 
> ---

Perhaps it's a stupid question, but why can't bpf_get_stack/bpf_get_stackid
figure out automatically that they are called from a
__PERF_SAMPLE_CALLCHAIN_EARLY perf event and use a different callchain,
if necessary?

It is quite suboptimal from a user experience point of view to require
two different BPF helpers depending on PEBS or non-PEBS perf events.

[...]


RE: [PATCH] arm64: dts: qcom: sc7180: Add missing properties for Wifi node

2020-07-10 Thread Rakesh Pillai



> -Original Message-
> From: Doug Anderson 
> Sent: Friday, July 10, 2020 1:36 AM
> To: Rakesh Pillai 
> Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> ; Evan Green ;
> Andy Gross ; Bjorn Andersson
> ; Rob Herring ; linux-
> arm-msm ; LKML  ker...@vger.kernel.org>; Sibi Sankar 
> Subject: Re: [PATCH] arm64: dts: qcom: sc7180: Add missing properties for
> Wifi node
> 
> Hi,
> 
> On Thu, Jul 9, 2020 at 2:18 AM Rakesh Pillai  wrote:
> >
> > The wlan firmware memory is statically mapped in
> > the Trusted Firmware, hence the wlan driver does
> > not need to map/unmap this region dynamically.
> >
> > Hence add the property to indicate that the wlan driver
> > need not map/unmap the firmware memory region
> > dynamically.
> >
> > Also add the chain1 voltage supply for wlan.
> >
> > Signed-off-by: Rakesh Pillai 
> > ---
> > This patch is created on top of the change by
> > Douglas Anderson.
> > https://lkml.org/lkml/2020/6/25/817
> >
> > Also the dt-bindings for the chain1 voltage supply
> > is added by the below patch series:
> > https://patchwork.kernel.org/project/linux-wireless/list/?series=309137
> > ---
> >  arch/arm64/boot/dts/qcom/sc7180-idp.dts | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> b/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> > index 472f7f4..4c64bc1 100644
> > --- a/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> > +++ b/arch/arm64/boot/dts/qcom/sc7180-idp.dts
> > @@ -391,10 +391,12 @@
> >
> > &wifi {
> > status = "okay";
> > +   qcom,msa-fixed-perm;
> 
> At one point in time I thought +Sibi said that this wouldn't be needed
> once the firmware was fixed.  ...afterwards you said that it was
> needed for SSR (subsystem reset).  Would be good to get confirmation
> from Sibi that this matches his understanding.

Hi Doug,

This is now needed as the firmware memory mapping was moved to Trusted firmware.
This region is now statically mapped to avoid access from the driver.

> 
> 
> > vdd-0.8-cx-mx-supply = <&vreg_l9a_0p6>;
> > vdd-1.8-xo-supply = <&vreg_l1c_1p8>;
> > vdd-1.3-rfa-supply = <&vreg_l2c_1p3>;
> > vdd-3.3-ch0-supply = <&vreg_l10c_3p3>;
> > +   vdd-3.3-ch1-supply = <&vreg_l11c_3p3>;
> > wifi-firmware {
> > iommus = <&apps_smmu 0xc2 0x1>;
> > };
> 
> Other than the one question this looks good to me.
> 
> -Doug



Re: [PATCH -next] device-dax: make dev_dax_kmem_probe() static

2020-07-10 Thread Ira Weiny
On Tue, Jul 07, 2020 at 07:23:40PM +0800, Wei Yongjun wrote:
> sparse report warning as follows:
> 
> drivers/dax/kmem.c:22:5: warning:
>  symbol 'dev_dax_kmem_probe' was not declared. Should it be static?
> 
> dev_dax_kmem_probe() is not used outside of kmem.c, so mark
> it static.
> 
> Signed-off-by: Wei Yongjun 

Seems ok,

Reviewed-by: Ira Weiny 

> ---
>  drivers/dax/kmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 275aa5f87399..87e271668170 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -19,7 +19,7 @@ static const char *kmem_name;
>  /* Set if any memory will remain added when the driver will be unloaded. */
>  static bool any_hotremove_failed;
>  
> -int dev_dax_kmem_probe(struct device *dev)
> +static int dev_dax_kmem_probe(struct device *dev)
>  {
>   struct dev_dax *dev_dax = to_dev_dax(dev);
>   struct resource *res = &dev_dax->region->res;
> 


Re: sound/soc/codecs/zl38060.c:614:34: warning: unused variable 'zl38_dt_ids'

2020-07-10 Thread Nathan Chancellor
On Fri, Jul 10, 2020 at 01:24:59PM +0100, Mark Brown wrote:
> On Thu, Jul 09, 2020 at 07:41:00PM -0700, Nathan Chancellor wrote:
> 
> > When CONFIG_SND_SOC_ZL38060 is y, MODULE_DEVICE_TABLE expands to nothing
> > so zl38_dt_ids will be unused. This is a pretty common construct in the
> > kernel and the only way I can think of to resolve this through the code
> > is by adding __used annotations to all of these variables, which I think
> > is overkill for this.
> 
> > Personally, I think this warning should be downgraded to W=2, thoughts?
> 
> We've had that warning available for ever, we shouldn't need to disable
> it now.  I had thought there was supposed to be magic which caused
> of_match_ptr() to make things look referenced when !OF but don't seem to
> actually see any sign of it.  The other thing is to just have ifdefs
> around the table.

While it has been available, it's been hidden behind W=1, which is now
default on for 0day.

Sure, you could hide it behind an ifdef for either CONFIG_OF or MODULE
(since you could build this as a module with CONFIG_OF disabled).

I just figured this would be something frowned upon but if that is how
you would prefer it to be fixed, then I have no objections to it.

Cheers,
Nathan
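
For reference, a rough sketch of the ifdef variant Mark suggests; this
is abridged and hypothetical, not the actual driver code (the names and
compatible string are illustrative):

/* Compile the table only when it can be referenced, so W=1 builds
 * without CONFIG_OF see no unused variable. */
static int zl38_probe(struct spi_device *spi);

#ifdef CONFIG_OF
static const struct of_device_id zl38_dt_ids[] = {
    { .compatible = "mscc,zl38060" },
    { }
};
MODULE_DEVICE_TABLE(of, zl38_dt_ids);
#endif

static struct spi_driver zl38_driver = {
    .driver = {
        .name = "zl38060",
        /* of_match_ptr() expands to NULL when !CONFIG_OF, so the
         * guarded table is never referenced in that build. */
        .of_match_table = of_match_ptr(zl38_dt_ids),
    },
    .probe = zl38_probe,
};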


Re: [PATCH] mm: vmscan: consistent update to pgrefill

2020-07-10 Thread Shakeel Butt
On Fri, Jul 10, 2020 at 7:32 PM Roman Gushchin  wrote:
>
> On Fri, Jul 10, 2020 at 06:14:59PM -0700, Shakeel Butt wrote:
> > The vmstat pgrefill is useful together with pgscan and pgsteal stats to
> > measure the reclaim efficiency. However vmstat's pgrefill is not updated
> > consistently at system level. It gets updated for both global and memcg
> > reclaim however pgscan and pgsteal are updated for only global reclaim.
> > So, update pgrefill only for global reclaim. If someone is interested in
> > the stats representing both system level as well as memcg level reclaim,
> > then consult the root memcg's memory.stat instead of /proc/vmstat.
> >
> > Signed-off-by: Shakeel Butt 
>
> So you went into the opposite direction from the "previous version"
> ( https://lkml.org/lkml/2020/5/7/1464 ) ?
>

Yes because we already had those stats in the root memcg and exposing
root memcg's memory.stat resolved the issue.

> Acked-by: Roman Gushchin 

Thanks a lot.
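
(The patch itself isn't quoted in this thread; as a rough, hypothetical
sketch of the change described, in shrink_active_list() in mm/vmscan.c,
it would look something like the following -- not the actual diff:)

-	__count_vm_events(PGREFILL, nr_scanned);
+	if (!cgroup_reclaim(sc))
+		__count_vm_events(PGREFILL, nr_scanned);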


Re: [PATCH] Restore gcc check in mips asm/unroll.h

2020-07-10 Thread Nathan Chancellor
On Fri, Jul 10, 2020 at 03:31:00PM -0700, Linus Torvalds wrote:
> On Fri, Jul 10, 2020 at 11:43 AM Nick Desaulniers
>  wrote:
> >
> > What I'd really like to see as a policy in the kernel going forward in
> > that ANY new commit that adds some hack or workaround for a specific
> > compiler version add a comment about which toolchain version was
> > problematic, that way when we drop support for that version years
> > later, we can drop whatever hacks and technical debt we've accumulated
> > to support that older version.
> 
> The problem is that at the time we find and fix things, it's often
> _very_ unclear which compiler versions are affected.
> 
> We also have the situation that a lot of distro compilers aren't
> necessarily completely "clean" versions, particularly for the
> "enterprise" ones that get stuck on some old version and then fix up
> their breakage by backporting fixes.

Indeed. I would say this is less common for most distributions with
clang, where they tend to stick closer to tip of tree, but it can still
happen. I guess there is not a really good solution for this but we
could just have a policy that as soon as you move away from the upstream
version, you are on your own.

> When it's some particular version of a compiler that supports a
> particular feature, that tends to be much more straightforward. But
> we've had bugs where it was very unclear when exactly the bug was
> fixed (if it was fixed at all by the time we do the workaround).

As for putting a seal of approval on a minimum supported version of
LLVM/clang, I have my reservations. 0day keeps uncovering various issues
with its builds and clang's release model is different from GCC's,
so if we ever come across a compiler bug in an older version of clang,
we have basically no hope for getting it fixed. GCC supports older
series through bug fix releases for quite some time (GCC 7 was supported
for two and a half years), whereas with clang, they only see one
servicing release before the next major release (for example, clang
9.0.1 before clang 10.0.0) so it makes getting compiler fixes into the
hands of users much more difficult. I am trying to rectify that with
clang 10 though, where I have been testing that release against a bunch
of different configs both in tree and out of tree:
https://github.com/nathanchance/llvm-kernel-testing

However, I think at this point, we can say clang itself is in a good
position as of clang 9, certainly clang 10. I am less confident in
placing a minimum version on the LLVM tools such as lld though. For arm,
arm64, and x86_64, we are in fairly good shape as of clang 10 but I
think there is probably some more work/polishing to be done there; for
other architectures, it is worse. I suppose we would have to consider
the support model: under what cases is it acceptable to bump the minimum
required version versus inserting a bad compiler hack? As someone who is
not super familiar with the relationship between GCC and the kernel, it
appears to me that the general attitude towards compiler bugs has been
to work around them in the kernel while hoping that they get fixed at some
point in GCC. We have been pretty aggressive about fixing the compiler
instead of inserting a workaround, which I feel like is the better
solution, but it makes supporting multiple versions of the compiler more
difficult (versus just saying use the latest). It is something that
needs to be discussed and agreed upon sooner rather than later though,
especially as we grow more and more polished.

There were some other thoughts that I had on our issue tracker here, if
anyone cares for them:

https://github.com/ClangBuiltLinux/linux/issues/941

Sorry for the brain dump and cheers,
Nathan


Re: [PATCH] mm: vmscan: consistent update to pgrefill

2020-07-10 Thread Roman Gushchin
On Fri, Jul 10, 2020 at 06:14:59PM -0700, Shakeel Butt wrote:
> The vmstat pgrefill is useful together with pgscan and pgsteal stats to
> measure the reclaim efficiency. However vmstat's pgrefill is not updated
> consistently at system level. It gets updated for both global and memcg
> reclaim however pgscan and pgsteal are updated for only global reclaim.
> So, update pgrefill only for global reclaim. If someone is interested in
> the stats representing both system level as well as memcg level reclaim,
> then consult the root memcg's memory.stat instead of /proc/vmstat.
> 
> Signed-off-by: Shakeel Butt 

So you went in the opposite direction from the "previous version"
( https://lkml.org/lkml/2020/5/7/1464 ) ?

Acked-by: Roman Gushchin 

Thanks!


[PATCH v2] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-10 Thread Nayna Jain
The device-tree property to check secure and trusted boot state is
different for guests (pseries) compared to bare metal (powernv).

This patch updates the existing is_ppc_secureboot_enabled() and
is_ppc_trustedboot_enabled() functions to add support for pseries.

Signed-off-by: Nayna Jain 
Reviewed-by: Daniel Axtens 
---
v2:
* included Michael Ellerman's feedback.
* added Daniel Axtens's Reviewed-by.

 arch/powerpc/kernel/secure_boot.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/secure_boot.c 
b/arch/powerpc/kernel/secure_boot.c
index 4b982324d368..efb325cbd42f 100644
--- a/arch/powerpc/kernel/secure_boot.c
+++ b/arch/powerpc/kernel/secure_boot.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct device_node *get_ppc_fw_sb_node(void)
 {
@@ -23,12 +24,21 @@ bool is_ppc_secureboot_enabled(void)
 {
struct device_node *node;
bool enabled = false;
+   u32 secureboot;
 
node = get_ppc_fw_sb_node();
enabled = of_property_read_bool(node, "os-secureboot-enforcing");
-
of_node_put(node);
 
+   if (enabled)
+   goto out;
+
+   if (!of_property_read_u32(of_root, "ibm,secure-boot", )) {
+   if (secureboot)
+   enabled = (secureboot > 1) ? true : false;
+   }
+
+out:
pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
 
return enabled;
@@ -38,12 +48,21 @@ bool is_ppc_trustedboot_enabled(void)
 {
struct device_node *node;
bool enabled = false;
+   u32 trustedboot;
 
node = get_ppc_fw_sb_node();
enabled = of_property_read_bool(node, "trusted-enabled");
-
of_node_put(node);
 
+   if (enabled)
+   goto out;
+
+   if (!of_property_read_u32(of_root, "ibm,trusted-boot", )) {
+   if (trustedboot)
+   enabled = (trustedboot > 0) ? true : false;
+   }
+
+out:
pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
 
return enabled;
-- 
2.26.2



Re: [PATCH] mips: Remove compiler check in unroll macro

2020-07-10 Thread Nathan Chancellor
On Fri, Jul 10, 2020 at 03:43:43PM -0700, Linus Torvalds wrote:
> On Fri, Jul 10, 2020 at 3:34 PM Nathan Chancellor
>  wrote:
> >
> > Clang 8 was chosen as a minimum version for this check because there
> > were some improvements around __builtin_constant_p in that release. In
> > reality, MIPS was not even buildable until clang 9 so that check was not
> > technically necessary. Just remove all compiler checks and just assume
> > that we have a working compiler.
> 
> Thanks, that looks much nicer.
> 
> Applied.
> 
> I think we could probably remove the (unrelated) clang-8 check in the
> arm side too, but I guess I'll let arm/clang people worry about it.
> 
> Linus

Yes, we probably should. I'll comment more on that in the other thread.

Thanks for picking up the patch quickly!

Cheers,
Nathan


[PATCH v4] ARM: dts: vfxxx: Add node for CAAM

2020-07-10 Thread Chris Healy
From: Andrey Smirnov  

Add node for CAAM device in NXP Vybrid SoC.

Signed-off-by: Andrey Smirnov 
Signed-off-by: Chris Healy 
Reviewed-by: Fabio Estevam 
---
v4:
- really add reviewed by from Fabio Estevam
v3:
- put version information in the correct place
- add reviewed by from Fabio Estevam
v2:
- fixup commit to show that this patch is from Andrey Smirnov

 arch/arm/boot/dts/vfxxx.dtsi | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/boot/dts/vfxxx.dtsi b/arch/arm/boot/dts/vfxxx.dtsi
index 2d547e7b21ad..0fe03aa0367f 100644
--- a/arch/arm/boot/dts/vfxxx.dtsi
+++ b/arch/arm/boot/dts/vfxxx.dtsi
@@ -729,6 +729,28 @@
dma-names = "rx","tx";
status = "disabled";
};
+
+   crypto: crypto@400f0000 {
+   compatible = "fsl,sec-v4.0";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x400f0000 0x9000>;
+   ranges = <0 0x400f0000 0x9000>;
+   clocks = <&clks VF610_CLK_CAAM>;
+   clock-names = "ipg";
+
+   sec_jr0: jr0@1000 {
+   compatible = "fsl,sec-v4.0-job-ring";
+   reg = <0x1000 0x1000>;
+   interrupts = <102 IRQ_TYPE_LEVEL_HIGH>;
+   };
+
+   sec_jr1: jr1@2000 {
+   compatible = "fsl,sec-v4.0-job-ring";
+   reg = <0x2000 0x1000>;
+   interrupts = <102 IRQ_TYPE_LEVEL_HIGH>;
+   };
+   };
};
};
 };
-- 
2.21.3



Re: [PATCH -next] : add stub for of_get_next_parent() to fix qcom build error

2020-07-10 Thread Stephen Rothwell
Hi Randy,

On Fri, 10 Jul 2020 16:40:03 -0700 Randy Dunlap  wrote:
>
> Are linux-next hashes/tags stable?

That depends on the maintainer of the tree I fetch ... In this case the
qcom tree.
-- 
Cheers,
Stephen Rothwell




Re: [PATCH] hid-input: Fix devices that return multiple bytes in battery report

2020-07-10 Thread Darren Hart
On Fri, Jul 10, 2020 at 8:19 AM Grant Likely  wrote:
>
> Some devices, particularly the 3DConnexion Spacemouse wireless 3D
> controllers, return more than just the battery capacity in the battery
> report. The Spacemouse devices return an additional byte with a device
> specific field. However, hidinput_query_battery_capacity() only
> requests a 2 byte transfer.
>
> When a spacemouse is connected via USB (direct wire, no wireless dongle)
> and it returns a 3 byte report instead of the assumed 2 byte battery
> report the larger transfer confuses and frightens the USB subsystem
> which chooses to ignore the transfer, then after 2 seconds assumes the
> device has stopped responding and resets it. This can be reproduced
> easily by using a wired connection with a wireless spacemouse. The
> Spacemouse will enter a loop of resetting every 2 seconds which can be
> observed in dmesg.
>
> This patch solves the problem by increasing the transfer request to 4
> bytes instead of 2. The fix isn't particularly elegant, but it is simple
> and safe to backport to stable kernels. A further patch will follow to
> more elegantly handle battery reports that contain additional data.
>

Applied and tested on 5.8.0-rc4+ (aa0c9086b40c) with a 3Dconnexion
SpaceMouse Wireless (tested connected via USB). Observed the same
behavior Grant reports before the patch. After the patch, the device stays
connected successfully.

Tested-by: Darren Hart 

Thanks Grant!

> Signed-off-by: Grant Likely 
> Cc: Darren Hart 
> Cc: Jiri Kosina 
> Cc: Benjamin Tissoires 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/hid/hid-input.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c
> index dea9cc65bf80..e8641ce677e4 100644
> --- a/drivers/hid/hid-input.c
> +++ b/drivers/hid/hid-input.c
> @@ -350,13 +350,13 @@ static int hidinput_query_battery_capacity(struct 
> hid_device *dev)
> u8 *buf;
> int ret;
>
> -   buf = kmalloc(2, GFP_KERNEL);
> +   buf = kmalloc(4, GFP_KERNEL);
> if (!buf)
> return -ENOMEM;
>
> -   ret = hid_hw_raw_request(dev, dev->battery_report_id, buf, 2,
> +   ret = hid_hw_raw_request(dev, dev->battery_report_id, buf, 4,
>  dev->battery_report_type, 
> HID_REQ_GET_REPORT);
> -   if (ret != 2) {
> +   if (ret < 2) {
> kfree(buf);
> return -ENODATA;
> }
> --
> 2.20.1
>


Re: [PATCH] mm: vmscan: consistent update to pgrefill

2020-07-10 Thread Yafang Shao
On Sat, Jul 11, 2020 at 9:15 AM Shakeel Butt  wrote:
>
> The vmstat pgrefill is useful together with pgscan and pgsteal stats to
> measure the reclaim efficiency. However vmstat's pgrefill is not updated
> consistently at system level. It gets updated for both global and memcg
> reclaim however pgscan and pgsteal are updated for only global reclaim.
> So, update pgrefill only for global reclaim. If someone is interested in
> the stats representing both system level as well as memcg level reclaim,
> then consult the root memcg's memory.stat instead of /proc/vmstat.
>
> Signed-off-by: Shakeel Butt 

Acked-by: Yafang Shao 

> ---
>  mm/vmscan.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5215840ee217..4167b0cc1784 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2030,7 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
>
> __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
>
> -   __count_vm_events(PGREFILL, nr_scanned);
> +   if (!cgroup_reclaim(sc))
> +   __count_vm_events(PGREFILL, nr_scanned);
> __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
>
> spin_unlock_irq(&pgdat->lru_lock);
> --
> 2.27.0.383.g050319c2ae-goog
>


-- 
Thanks
Yafang


[PATCH v3] ARM: dts: vfxxx: Add node for CAAM

2020-07-10 Thread Chris Healy
From: Andrey Smirnov  

Add node for CAAM device in NXP Vybrid SoC.

Signed-off-by: Andrey Smirnov 
Signed-off-by: Chris Healy 
---
v3:
- put version information in the correct place
- add reviewed by from Fabio Estevam
v2:
- fixup commit to show that this patch is from Andrey Smirnov

 arch/arm/boot/dts/vfxxx.dtsi | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/boot/dts/vfxxx.dtsi b/arch/arm/boot/dts/vfxxx.dtsi
index 2d547e7b21ad..0fe03aa0367f 100644
--- a/arch/arm/boot/dts/vfxxx.dtsi
+++ b/arch/arm/boot/dts/vfxxx.dtsi
@@ -729,6 +729,28 @@
dma-names = "rx","tx";
status = "disabled";
};
+
+   crypto: crypto@400f0000 {
+   compatible = "fsl,sec-v4.0";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x400f0000 0x9000>;
+   ranges = <0 0x400f0000 0x9000>;
+   clocks = <&clks VF610_CLK_CAAM>;
+   clock-names = "ipg";
+
+   sec_jr0: jr0@1000 {
+   compatible = "fsl,sec-v4.0-job-ring";
+   reg = <0x1000 0x1000>;
+   interrupts = <102 IRQ_TYPE_LEVEL_HIGH>;
+   };
+
+   sec_jr1: jr1@2000 {
+   compatible = "fsl,sec-v4.0-job-ring";
+   reg = <0x2000 0x1000>;
+   interrupts = <102 IRQ_TYPE_LEVEL_HIGH>;
+   };
+   };
};
};
 };
-- 
2.21.3



Re: [PATCH v2 2/2] riscv: Enable context tracking

2020-07-10 Thread Greentime Hu
Palmer Dabbelt  wrote on Saturday, 2020-07-11, at 1:30 AM:
>
> On Wed, 24 Jun 2020 02:03:16 PDT (-0700), greentime...@sifive.com wrote:
> > This patch implements and enables context tracking for riscv (which is a
> > prerequisite for CONFIG_NO_HZ_FULL support)
> >
> > It adds a check of the previous state in the entry path that all exceptions
> > and interrupts go to, and calls context_tracking_user_exit() if the trap
> > comes from user space. It also calls context_tracking_user_enter() if it
> > will return to user space before restore_all.
> >
> > This patch is tested with the dynticks-testing testcase in
> > qemu-system-riscv64 virt machine and Unleashed board.
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
> >
> > We can see the log here. The tick got mostly stopped during the execution
> > of the user loop.
> >
> > _-=> irqs-off
> >/ _=> need-resched
> >   | / _---=> hardirq/softirq
> >   || / _--=> preempt-depth
> >   ||| / delay
> >  TASK-PID   CPU#  TIMESTAMP  FUNCTION
> > | |   |      | |
> >-0 [001] d..2   604.183512: sched_switch: prev_comm=swapper/1 
> > prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=taskset next_pid=273 
> > next_prio=120
> > user_loop-273   [001] d.h1   604.184788: hrtimer_expire_entry: 
> > hrtimer=2eda5fab function=tick_sched_timer now=604176096300
> > user_loop-273   [001] d.s2   604.184897: workqueue_queue_work: work 
> > struct=383402c2 function=vmstat_update workqueue=f36d35d4 
> > req_cpu=1 cpu=1
> > user_loop-273   [001] dns2   604.185039: tick_stop: success=0 
> > dependency=SCHED
> > user_loop-273   [001] dn.1   604.185103: tick_stop: success=0 
> > dependency=SCHED
> > user_loop-273   [001] d..2   604.185154: sched_switch: prev_comm=taskset 
> > prev_pid=273 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 
> > next_pid=46 next_prio=120
> > <...>-46[001]    604.185194: workqueue_execute_start: work 
> > struct 383402c2: function vmstat_update
> > <...>-46[001] d..2   604.185266: sched_switch: 
> > prev_comm=kworker/1:1 prev_pid=46 prev_prio=120 prev_state=I ==> 
> > next_comm=taskset next_pid=273 next_prio=120
> > user_loop-273   [001] d.h1   604.188812: hrtimer_expire_entry: 
> > hrtimer=2eda5fab function=tick_sched_timer now=604180133400
> > user_loop-273   [001] d..1   604.189050: tick_stop: success=1 
> > dependency=NONE
> > user_loop-273   [001] d..2   614.251386: sched_switch: prev_comm=user_loop 
> > prev_pid=273 prev_prio=120 prev_state=X ==> next_comm=swapper/1 next_pid=0 
> > next_prio=120
> >-0 [001] d..2   614.315391: sched_switch: prev_comm=swapper/1 
> > prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=taskset next_pid=276 
> > next_prio=120
> >
> > Signed-off-by: Greentime Hu 
> > ---
> >  arch/riscv/Kconfig|  1 +
> >  arch/riscv/kernel/entry.S | 23 +++
> >  2 files changed, 24 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 128192e14ff2..17520e11815b 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -52,6 +52,7 @@ config RISCV
> >   select HAVE_ARCH_SECCOMP_FILTER
> >   select HAVE_ARCH_TRACEHOOK
> >   select HAVE_ASM_MODVERSIONS
> > + select HAVE_CONTEXT_TRACKING
> >   select HAVE_COPY_THREAD_TLS
> >   select HAVE_DMA_CONTIGUOUS if MMU
> >   select HAVE_EBPF_JIT if MMU
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index cae7e6d4c7ef..6ed579fc1073 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -97,6 +97,14 @@ _save_context:
> >   la gp, __global_pointer$
> >  .option pop
> >
> > +#ifdef CONFIG_CONTEXT_TRACKING
> > + /* If previous state is in user mode, call 
> > context_tracking_user_exit. */
> > + andi a0, s1, SR_SPP
>
> I've changed that to SR_PP, as I don't see any reason why this should depend 
> on
> MMU.
>
> I think this is correct: we're using scratch==0 elsewhere to detect recursive
> traps, but we've blown that away by this point so it's not an option.  I don't
> know of any reason why PP wouldn't be accurate.

Hi Palmer,

Thank you. That makes sense to me.

>
> > + bnez a0, skip_context_tracking
> > + call context_tracking_user_exit
> > +
> > +skip_context_tracking:
> > +#endif
> >   la ra, ret_from_exception
> >   /*
> >* MSB of cause differentiates between
> > @@ -137,6 +145,17 @@ _save_context:
> >   tail do_trap_unknown
> >
> >  handle_syscall:
> > +#ifdef CONFIG_CONTEXT_TRACKING
> > + /* Recover a0 - a7 for system calls */
> > + REG_L x10, PT_A0(sp)
> > + REG_L x11, PT_A1(sp)
> > + REG_L x12, PT_A2(sp)
> > + REG_L x13, PT_A3(sp)
> > + REG_L x14, PT_A4(sp)
> > + REG_L x15, PT_A5(sp)
> > + REG_L x16, PT_A6(sp)
> > + REG_L x17, PT_A7(sp)

Re: [PATCH] stmmac: pci: Add support for LS7A bridge chip

2020-07-10 Thread Jiaxun Yang




On 2020/7/11 9:35, Jiaxun Yang wrote:



On 2020/7/10 16:51, Zhi Li wrote:

Add gmac platform data to support LS7A bridge chip.

Co-developed-by: Hongbin Li 
Signed-off-by: Hongbin Li 
Signed-off-by: Zhi Li 
---
  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 22 ++++++++++++++++++++++
  1 file changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c

index 272cb47..dab2a40 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -138,6 +138,24 @@ static const struct stmmac_pci_info 
snps_gmac5_pci_info = {

  .setup = snps_gmac5_default_data,
  };
  +static int loongson_default_data(struct pci_dev *pdev, struct plat_stmmacenet_data *plat)

+{
+    common_default_data(plat);
+
+    plat->bus_id = pci_dev_id(pdev);
+    plat->phy_addr = 0;
+    plat->interface = PHY_INTERFACE_MODE_GMII;
+
+    plat->dma_cfg->pbl = 32;
+    plat->dma_cfg->pblx8 = true;
+
+    return 0;
+}
+
+static struct stmmac_pci_info loongson_pci_info = {
+    .setup = loongson_default_data,
+};
+
  /**
   * stmmac_pci_probe
   *
@@ -204,6 +222,8 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
  res.addr = pcim_iomap_table(pdev)[i];
  res.wol_irq = pdev->irq;
  res.irq = pdev->irq;
+    if (pdev->vendor == PCI_VENDOR_ID_LOONGSON)
+    res.lpi_irq = pdev->irq + 1;


This can never work.
We're allocating IRQs by irq_domain, not ID.
Please describe the IRQ in the DeviceTree, and *DO NOT* send out untested
patches.

FYI: Here is my solution for the GMAC [1][2]; I was too busy to upstream it.
We're using a totally different structure than Loongson's out-of-tree kernel,
especially in IRQ management.

Please don't simply copy-and-paste code from your company's internal kernel.

Please try to understand how the upstream kernel works and test your patches
with the upstream kernel.

[1]: 
https://github.com/FlyGoat/linux/commit/9d6584c186a8007f14dc8bb2524e48a2fd7d689a
[2]: 
https://github.com/FlyGoat/linux/commit/558a256acfeb022e132113e7952a9df3df375302




Thanks.


    return stmmac_dvr_probe(&pdev->dev, plat, &res);
  }
@@ -273,11 +293,13 @@ static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, 
stmmac_pci_suspend, stmmac_pci_resume);

    #define PCI_DEVICE_ID_STMMAC_STMMAC    0x1108
  #define PCI_DEVICE_ID_SYNOPSYS_GMAC5_ID    0x7102
+#define PCI_DEVICE_ID_LOONGSON_GMAC    0x7a03
    static const struct pci_device_id stmmac_id_table[] = {
  { PCI_DEVICE_DATA(STMMAC, STMMAC, &stmmac_pci_info) },
  { PCI_DEVICE_DATA(STMICRO, MAC, &stmmac_pci_info) },
  { PCI_DEVICE_DATA(SYNOPSYS, GMAC5_ID, &snps_gmac5_pci_info) },
+    { PCI_DEVICE_DATA(LOONGSON, GMAC, &loongson_pci_info) },
  {}
  };

- Jiaxun


Re: [PATCH] stmmac: pci: Add support for LS7A bridge chip

2020-07-10 Thread Jiaxun Yang




On 2020/7/10 16:51, Zhi Li wrote:

Add gmac platform data to support LS7A bridge chip.

Co-developed-by: Hongbin Li 
Signed-off-by: Hongbin Li 
Signed-off-by: Zhi Li 
---
  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 272cb47..dab2a40 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -138,6 +138,24 @@ static const struct stmmac_pci_info snps_gmac5_pci_info = {
.setup = snps_gmac5_default_data,
  };
  
+static int loongson_default_data(struct pci_dev *pdev, struct plat_stmmacenet_data *plat)

+{
+   common_default_data(plat);
+
+   plat->bus_id = pci_dev_id(pdev);
+   plat->phy_addr = 0;
+   plat->interface = PHY_INTERFACE_MODE_GMII;
+
+   plat->dma_cfg->pbl = 32;
+   plat->dma_cfg->pblx8 = true;
+
+   return 0;
+}
+
+static struct stmmac_pci_info loongson_pci_info = {
+   .setup = loongson_default_data,
+};
+
  /**
   * stmmac_pci_probe
   *
@@ -204,6 +222,8 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
res.addr = pcim_iomap_table(pdev)[i];
res.wol_irq = pdev->irq;
res.irq = pdev->irq;
+   if (pdev->vendor == PCI_VENDOR_ID_LOONGSON)
+   res.lpi_irq = pdev->irq + 1;


This can never work.
We're allocating IRQs by irq_domain, not ID.
Please describe the IRQ in the DeviceTree, and *DO NOT* send out untested
patches.

Thanks.

  
  	return stmmac_dvr_probe(&pdev->dev, plat, &res);

  }
@@ -273,11 +293,13 @@ static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, 
stmmac_pci_suspend, stmmac_pci_resume);
  
  #define PCI_DEVICE_ID_STMMAC_STMMAC		0x1108

  #define PCI_DEVICE_ID_SYNOPSYS_GMAC5_ID   0x7102
+#define PCI_DEVICE_ID_LOONGSON_GMAC	0x7a03
  
  static const struct pci_device_id stmmac_id_table[] = {

	{ PCI_DEVICE_DATA(STMMAC, STMMAC, &stmmac_pci_info) },
	{ PCI_DEVICE_DATA(STMICRO, MAC, &stmmac_pci_info) },
	{ PCI_DEVICE_DATA(SYNOPSYS, GMAC5_ID, &snps_gmac5_pci_info) },
+   { PCI_DEVICE_DATA(LOONGSON, GMAC, &loongson_pci_info) },
{}
  };
  

- Jiaxun


Re: [linux-sunxi] [PATCH 01/16] ASoC: sun4i-i2s: Add support for H6 I2S

2020-07-10 Thread Samuel Holland
Jernej,

On 7/10/20 2:22 PM, Jernej Škrabec wrote:
>> From the description in the manual, this looks off by one. The number of
>> BCLKs per LRCK is LRCK_PERIOD + 1.
> 
> Are you sure? Macro SUN8I_I2S_FMT0_LRCK_PERIOD() is defined as follows:
> 
> #define SUN8I_I2S_FMT0_LRCK_PERIOD(period)((period - 1) << 8)
> 
> which already lowers the value by 1.

No, sorry, I had missed the subtraction happening in the macro. So there's no
problem here.

Thanks,
Samuel
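
[Editor's note: for illustration, the arithmetic as resolved above; only the
macro itself is taken from the thread, the rest is a sketch.]

#define SUN8I_I2S_FMT0_LRCK_PERIOD(period)	((period - 1) << 8)

/* Requesting 32 BCLKs per LRCK writes 31 into the register field;
 * the hardware then counts LRCK_PERIOD + 1 = 32 BCLKs, matching the manual. */
u32 val = SUN8I_I2S_FMT0_LRCK_PERIOD(32);	/* == 31 << 8 */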


Re: [RFC PATCH 14/16] irq: Add support for core-wide protection of IRQ and softirq

2020-07-10 Thread Aubrey Li
On Fri, Jul 10, 2020 at 9:36 PM Vineeth Remanan Pillai
 wrote:
>
> Hi Aubrey,
>
> On Fri, Jul 10, 2020 at 8:19 AM Li, Aubrey  wrote:
> >
> > Hi Joel/Vineeth,
> > [...]
> > The problem is gone when we reverted this patch. We are running multiple
> > uperf threads(equal to cpu number) in a cgroup with coresched enabled.
> > This is 100% reproducible on our side.
> >
> > Just wonder if anything already known before we dig into it.
> >
> Thanks for reporting this. We haven't seen any lockups like this
> in our testing yet.

This is reproducible on a bare metal machine. We tried to reproduce it
on an 8-CPU KVM VM but failed.

> Could you please add more information on how to reproduce this?
> Was it a simple uperf run without any options or was it running any
> specific kind of network test?

I put our scripts at here:
https://github.com/aubreyli/uperf

>
> We shall also try to reproduce this and investigate.

I'll try to see if I can narrow down the test case and grab some logs
next week.

Thanks,
-Aubrey


Re: [PATCH v2 6/6] riscv: Add KPROBES_ON_FTRACE supported

2020-07-10 Thread Guo Ren
Thx Masami,

On Fri, Jul 10, 2020 at 9:50 PM Masami Hiramatsu  wrote:
>
> Hi Guo,
>
> On Thu,  9 Jul 2020 02:19:14 +
> guo...@kernel.org wrote:
>
> > +/* Ftrace callback handler for kprobes -- called under preempt disabled */
> > +void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> > +struct ftrace_ops *ops, struct pt_regs *regs)
> > +{
> > + struct kprobe *p;
> > + struct kprobe_ctlblk *kcb;
> > +
> > + p = get_kprobe((kprobe_opcode_t *)ip);
> > + if (unlikely(!p) || kprobe_disabled(p))
> > + return;
> > +
> > + kcb = get_kprobe_ctlblk();
> > + if (kprobe_running()) {
> > + kprobes_inc_nmissed_count(p);
> > + } else {
> > + /*
> > +  * The regs->epc hasn't been saved by SAVE_ALL in mcount-dyn.S
> > +  * So no need to resume it, just for kprobe handler.
> > +  */
> > + instruction_pointer_set(regs, ip);
> > + __this_cpu_write(current_kprobe, p);
> > + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> > + if (!p->pre_handler || !p->pre_handler(p, regs)) {
> > + /*
> > +  * Emulate singlestep (and also recover regs->pc)
> > +  * as if there is a nop
> > +  */
> > + instruction_pointer_set(regs,
> > + (unsigned long)p->addr + MCOUNT_INSN_SIZE);
> > + if (unlikely(p->post_handler)) {
> > + kcb->kprobe_status = KPROBE_HIT_SSDONE;
> > + p->post_handler(p, regs, 0);
> > + }
>
> Hmm, don't you need restoring the previous instruction pointer here?
Look at the riscv mcount-dyn.S SAVE_ALL function; the sp frame layout is:
---
| return address |
---
| frame pointer  |
---
| pt_regs x1-x31 |
---
It's not a complete pt_regs for the handler, so modifying regs->ip is no use.

> If you don't support modifying the instruction pointer in the handler,
We can modify ip like this if necessary:
*(unsigned long *)((unsigned long)regs + sizeof(struct pt_regs) + 8) = xxx;

> it must not be compatible with kprobes.
Why? Can you show the related code? Thank you very much.

>
> Now BPF function override and function error injection depend on
> this behavior, so could you consider supporting it in the "ftrace"
> implementation at first? (And if it is enabled, you can enable the
> livepatch on RISCV too)
Great message!

But can you show me the code where BPF and error injection use this behavior? Thx

I'll try to fix it up :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


[PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries

2020-07-10 Thread Song Liu
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.

Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
these perf_events. Unfortunately, the BPF verifier cannot tell whether the
program will be attached to a perf_event with PEBS entries. Therefore,
block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).

Signed-off-by: Song Liu 
---
 include/linux/filter.h |  3 ++-
 kernel/bpf/verifier.c  |  3 +++
 kernel/events/core.c   | 10 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2593777236037..fb34dc40f039b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -534,7 +534,8 @@ struct bpf_prog {
is_func:1,  /* program is a bpf function */
kprobe_override:1, /* Do we override a kprobe? 
*/
has_callchain_buf:1, /* callchain buffer 
allocated? */
-   enforce_expected_attach_type:1; /* Enforce 
expected_attach_type checking at attach time */
+   enforce_expected_attach_type:1, /* Enforce 
expected_attach_type checking at attach time */
+   call_get_perf_callchain:1; /* Do we call 
helpers that use get_perf_callchain()? */
enum bpf_prog_type  type;   /* Type of BPF program */
enum bpf_attach_typeexpected_attach_type; /* For some prog types */
u32 len;/* Number of filter blocks */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b608185e1ffd5..1e11b0f6fba31 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4884,6 +4884,9 @@ static int check_helper_call(struct bpf_verifier_env 
*env, int func_id, int insn
env->prog->has_callchain_buf = true;
}
 
+   if (func_id == BPF_FUNC_get_stackid || func_id == BPF_FUNC_get_stack)
+   env->prog->call_get_perf_callchain = true;
+
if (changes_data)
clear_all_pkt_pointers(env);
return 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 856d98c36f562..f2f575a286bb4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9544,6 +9544,16 @@ static int perf_event_set_bpf_handler(struct perf_event 
*event, u32 prog_fd)
if (IS_ERR(prog))
return PTR_ERR(prog);
 
+   if ((event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY) &&
+   prog->call_get_perf_callchain) {
+   /*
+* The perf_event get_perf_callchain() early, the attached
+* BPF program shouldn't call get_perf_callchain() again.
+*/
+   bpf_prog_put(prog);
+   return -EINVAL;
+   }
+
event->prog = prog;
event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
-- 
2.24.1



[PATCH bpf-next 4/5] selftests/bpf: add get_stackid_cannot_attach

2020-07-10 Thread Song Liu
This test confirms that a BPF program that calls bpf_get_stackid() cannot
attach to a perf_event with PEBS entries.

Signed-off-by: Song Liu 
---
 .../prog_tests/get_stackid_cannot_attach.c| 57 +++
 1 file changed, 57 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c 
b/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
new file mode 100644
index 0..ae943c502b62b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <test_progs.h>
+#include "test_stacktrace_build_id.skel.h"
+
+void test_get_stackid_cannot_attach(void)
+{
+   struct perf_event_attr attr = {
+   /* .type = PERF_TYPE_SOFTWARE, */
+   .type = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .precise_ip = 2,
+   .sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK |
+   PERF_SAMPLE_CALLCHAIN,
+   .branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+   PERF_SAMPLE_BRANCH_NO_FLAGS |
+   PERF_SAMPLE_BRANCH_NO_CYCLES |
+   PERF_SAMPLE_BRANCH_CALL_STACK,
+   .sample_period = 5000,
+   .size = sizeof(struct perf_event_attr),
+   };
+   struct test_stacktrace_build_id *skel;
+   __u32 duration = 0;
+   int pmu_fd, err;
+
+   skel = test_stacktrace_build_id__open();
+   if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
+   return;
+
+   /* override program type */
+   bpf_program__set_perf_event(skel->progs.oncpu);
+
+   err = test_stacktrace_build_id__load(skel);
+   if (CHECK(err, "skel_load", "skeleton load failed: %d\n", err))
+   goto cleanup;
+
+   pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+0 /* cpu 0 */, -1 /* group id */,
+0 /* flags */);
+   if (pmu_fd < 0 && errno == ENOENT) {
+   printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+   test__skip();
+   goto cleanup;
+   }
+   if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n",
+ pmu_fd, errno))
+   goto cleanup;
+
+   skel->links.oncpu = bpf_program__attach_perf_event(skel->progs.oncpu,
+  pmu_fd);
+   CHECK(!IS_ERR(skel->links.oncpu), "attach_perf_event",
+ "should have failed\n");
+   close(pmu_fd);
+
+cleanup:
+   test_stacktrace_build_id__destroy(skel);
+}
-- 
2.24.1



[PATCH bpf-next 5/5] selftests/bpf: add callchain_stackid

2020-07-10 Thread Song Liu
This tests the new helper function bpf_get_callchain_stackid(), which is the
alternative to bpf_get_stackid() for perf_events with PEBS entries.

Signed-off-by: Song Liu 
---
 .../bpf/prog_tests/callchain_stackid.c| 61 +++
 .../selftests/bpf/progs/callchain_stackid.c   | 37 +++
 2 files changed, 98 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
 create mode 100644 tools/testing/selftests/bpf/progs/callchain_stackid.c

diff --git a/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c 
b/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
new file mode 100644
index 0..ebe6251324a1a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <test_progs.h>
+#include "callchain_stackid.skel.h"
+
+void test_callchain_stackid(void)
+{
+   struct perf_event_attr attr = {
+   /* .type = PERF_TYPE_SOFTWARE, */
+   .type = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .precise_ip = 2,
+   .sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK |
+   PERF_SAMPLE_CALLCHAIN,
+   .branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+   PERF_SAMPLE_BRANCH_NO_FLAGS |
+   PERF_SAMPLE_BRANCH_NO_CYCLES |
+   PERF_SAMPLE_BRANCH_CALL_STACK,
+   .sample_period = 5000,
+   .size = sizeof(struct perf_event_attr),
+   };
+   struct callchain_stackid *skel;
+   __u32 duration = 0;
+   int pmu_fd, err;
+
+   skel = callchain_stackid__open();
+
+   if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
+   return;
+
+   /* override program type */
+   bpf_program__set_perf_event(skel->progs.oncpu);
+
+   err = callchain_stackid__load(skel);
+   if (CHECK(err, "skel_load", "skeleton load failed: %d\n", err))
+   goto cleanup;
+
+   pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+0 /* cpu 0 */, -1 /* group id */,
+0 /* flags */);
+   if (pmu_fd < 0) {
+   printf("%s:SKIP:cpu doesn't support the event\n", __func__);
+   test__skip();
+   goto cleanup;
+   }
+
+   skel->links.oncpu = bpf_program__attach_perf_event(skel->progs.oncpu,
+  pmu_fd);
+   if (CHECK(IS_ERR(skel->links.oncpu), "attach_perf_event",
+ "err %ld\n", PTR_ERR(skel->links.oncpu))) {
+   close(pmu_fd);
+   goto cleanup;
+   }
+   usleep(50);
+
+   CHECK(skel->data->total_val == 1, "get_callchain_stack", "failed\n");
+   close(pmu_fd);
+
+cleanup:
+   callchain_stackid__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/callchain_stackid.c 
b/tools/testing/selftests/bpf/progs/callchain_stackid.c
new file mode 100644
index 0..aab2c736a0a45
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/callchain_stackid.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+#ifndef PERF_MAX_STACK_DEPTH
+#define PERF_MAX_STACK_DEPTH 127
+#endif
+
+#ifndef BPF_F_USER_STACK
+#define BPF_F_USER_STACK   (1ULL << 8)
+#endif
+
+typedef __u64 stack_trace_t[PERF_MAX_STACK_DEPTH];
+struct {
+   __uint(type, BPF_MAP_TYPE_STACK_TRACE);
+   __uint(max_entries, 16384);
+   __uint(key_size, sizeof(__u32));
+   __uint(value_size, sizeof(stack_trace_t));
+} stackmap SEC(".maps");
+
+long total_val = 1;
+
+SEC("perf_event")
+int oncpu(struct bpf_perf_event_data *ctx)
+{
+   long val;
+
+   val = bpf_get_callchain_stackid(ctx->callchain, &stackmap, 0);
+
+   if (val > 0)
+   total_val += val;
+
+   return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
-- 
2.24.1



[PATCH bpf-next 2/5] bpf: add callchain to bpf_perf_event_data

2020-07-10 Thread Song Liu
If the callchain is available, the BPF program can use bpf_probe_read_kernel()
to fetch the callchain, or use it in a BPF helper.
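
[Editor's note: as a rough illustration of the first option, a perf_event
program could read the entry count through the pointer this patch adds. This
is a sketch with the usual libbpf includes omitted; the program name is made
up.]

SEC("perf_event")
int read_callchain_nr(struct bpf_perf_event_data *ctx)
{
	__u64 nr = 0;

	/* ctx->callchain is the field exposed by this patch */
	if (ctx->callchain)
		bpf_probe_read_kernel(&nr, sizeof(nr), &ctx->callchain->nr);
	return 0;
}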

Signed-off-by: Song Liu 
---
 include/linux/perf_event.h|  5 -
 include/linux/trace_events.h  |  5 +
 include/uapi/linux/bpf_perf_event.h   |  7 ++
 kernel/bpf/btf.c  |  5 +
 kernel/trace/bpf_trace.c  | 27 +++
 tools/include/uapi/linux/bpf_perf_event.h |  8 +++
 6 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 00ab5efa38334..3a68c999f50d1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,11 +59,6 @@ struct perf_guest_info_callbacks {
 #include 
 #include 
 
-struct perf_callchain_entry {
-   __u64   nr;
-   __u64   ip[]; /* 
/proc/sys/kernel/perf_event_max_stack */
-};
-
 struct perf_callchain_entry_ctx {
struct perf_callchain_entry *entry;
u32 max_stack;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5c69433540494..8e1e88f40eef9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -631,6 +631,7 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
u32 *fd_type, const char **buf,
u64 *probe_offset, u64 *probe_addr);
+int bpf_trace_init_btf_ids(struct btf *btf);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void 
*ctx)
 {
@@ -672,6 +673,10 @@ static inline int bpf_get_perf_event_info(const struct 
perf_event *event,
 {
return -EOPNOTSUPP;
 }
+int bpf_trace_init_btf_ids(struct btf *btf)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 enum {
diff --git a/include/uapi/linux/bpf_perf_event.h 
b/include/uapi/linux/bpf_perf_event.h
index eb1b9d21250c6..40f4df80ab4fa 100644
--- a/include/uapi/linux/bpf_perf_event.h
+++ b/include/uapi/linux/bpf_perf_event.h
@@ -9,11 +9,18 @@
 #define _UAPI__LINUX_BPF_PERF_EVENT_H__
 
 #include 
+#include 
+
+struct perf_callchain_entry {
+   __u64   nr;
+   __u64   ip[]; /* 
/proc/sys/kernel/perf_event_max_stack */
+};
 
 struct bpf_perf_event_data {
bpf_user_pt_regs_t regs;
__u64 sample_period;
__u64 addr;
+   __bpf_md_ptr(struct perf_callchain_entry *, callchain);
 };
 
 #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4c3007f428b16..cb122e14dba38 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* BTF (BPF Type Format) is the meta data format which describes
@@ -3673,6 +3674,10 @@ struct btf *btf_parse_vmlinux(void)
if (err < 0)
goto errout;
 
+   err = bpf_trace_init_btf_ids(btf);
+   if (err < 0)
+   goto errout;
+
bpf_struct_ops_init(btf, log);
init_btf_sock_ids(btf);
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e0b7775039ab9..c014846c2723c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,6 +32,20 @@ struct bpf_trace_module {
 static LIST_HEAD(bpf_trace_modules);
 static DEFINE_MUTEX(bpf_module_mutex);
 
+static u32 perf_callchain_entry_btf_id;
+
+int bpf_trace_init_btf_ids(struct btf *btf)
+{
+   s32 type_id;
+
+   type_id = btf_find_by_name_kind(btf, "perf_callchain_entry",
+   BTF_KIND_STRUCT);
+   if (type_id < 0)
+   return -EINVAL;
+   perf_callchain_entry_btf_id = type_id;
+   return 0;
+}
+
 static struct bpf_raw_event_map *bpf_get_raw_tracepoint_module(const char 
*name)
 {
struct bpf_raw_event_map *btp, *ret = NULL;
@@ -1650,6 +1665,10 @@ static bool pe_prog_is_valid_access(int off, int size, 
enum bpf_access_type type
if (!bpf_ctx_narrow_access_ok(off, size, size_u64))
return false;
break;
+   case bpf_ctx_range(struct bpf_perf_event_data, callchain):
+   info->reg_type = PTR_TO_BTF_ID;
+   info->btf_id = perf_callchain_entry_btf_id;
+   break;
default:
if (size != sizeof(long))
return false;
@@ -1682,6 +1701,14 @@ static u32 pe_prog_convert_ctx_access(enum 
bpf_access_type type,
  bpf_target_off(struct perf_sample_data, 
addr, 8,
 target_size));
break;
+   case offsetof(struct bpf_perf_event_data, callchain):
+   *insn++ = 

[PATCH bpf-next 3/5] bpf: introduce bpf_get_callchain_stackid

2020-07-10 Thread Song Liu
This helper is only used by BPF programs attached to a perf_event. If the
perf_event has PEBS entries, calling get_perf_callchain from a BPF program
may cause unwinder errors. bpf_get_callchain_stackid serves as an alternative
to bpf_get_stackid for these BPF programs.

Signed-off-by: Song Liu 
---
 include/linux/bpf.h|  1 +
 include/uapi/linux/bpf.h   | 43 +++
 kernel/bpf/stackmap.c  | 63 ++
 kernel/bpf/verifier.c  |  4 ++-
 kernel/trace/bpf_trace.c   |  2 ++
 scripts/bpf_helpers_doc.py |  2 ++
 tools/include/uapi/linux/bpf.h | 43 +++
 7 files changed, 142 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0cd7f6884c5cd..45cf12acb0e26 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1628,6 +1628,7 @@ extern const struct bpf_func_proto 
bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_get_task_stack_proto;
+extern const struct bpf_func_proto bpf_get_callchain_stackid_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 548a749aebb3e..a808accfbd457 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3319,6 +3319,48 @@ union bpf_attr {
  * A non-negative value equal to or less than *size* on success,
  * or a negative error in case of failure.
  *
+ * long bpf_get_callchain_stackid(struct perf_callchain_entry *callchain, 
struct bpf_map *map, u64 flags)
+ * Description
+ * Walk a user or a kernel stack and return its id. To achieve
+ * this, the helper needs *callchain*, which is a pointer to a
+ * valid perf_callchain_entry, and a pointer to a *map* of type
+ * **BPF_MAP_TYPE_STACK_TRACE**.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * a combination of the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_FAST_STACK_CMP**
+ * Compare stacks by hash only.
+ * **BPF_F_REUSE_STACKID**
+ * If two different stacks hash into the same *stackid*,
+ * discard the old one.
+ *
+ * The stack id retrieved is a 32 bit long integer handle which
+ * can be further combined with other data (including other stack
+ * ids) and used as a key into maps. This can be useful for
+ * generating a variety of graphs (such as flame graphs or off-cpu
+ * graphs).
+ *
+ * For walking a stack, this helper is an improvement over
+ * **bpf_probe_read**\ (), which can be used with unrolled loops
+ * but is not efficient and consumes a lot of eBPF instructions.
+ * Instead, **bpf_get_callchain_stackid**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=
+ * Return
+ * The positive or null stack id on success, or a negative error
+ * in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3463,6 +3505,7 @@ union bpf_attr {
FN(skc_to_tcp_request_sock),\
FN(skc_to_udp6_sock),   \
FN(get_task_stack), \
+   FN(get_callchain_stackid),  \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a6c361ed7937b..28acc610f7f94 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -386,11 +386,10 @@ get_callchain_entry_for_task(struct task_struct *task, 
u32 init_nr)
 #endif
 }
 
-BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
-  u64, flags)
+static long __bpf_get_stackid(struct bpf_map *map, struct perf_callchain_entry 
*trace,
+ u64 flags)
 {
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, 
map);
-   struct perf_callchain_entry *trace;
struct stack_map_bucket 

[PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS

2020-07-10 Thread Song Liu
Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the perf subsystem fetches the callchain
early, and such perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
Similar issue exists when BPF program calls get_perf_callchain() via
helper functions. For more information about this issue, please refer to
discussions in [1].

This set provides a solution for this problem.

1/5 blocks ioctl(PERF_EVENT_IOC_SET_BPF) attaching BPF program that calls
get_perf_callchain() to perf events with PEBS entries.
2/5 exposes callchain fetched by perf subsystem to BPF program.
3/5 introduces bpf_get_callchain_stackid(), which is alternative to
bpf_get_stackid() for perf_event with PEBS.
4/5 adds selftests for 1/5.
5/5 adds selftests for 2/5 and 3/5.

[1] https://lore.kernel.org/lkml/ed7b9430-6489-4260-b3c5-9cfa2e3aa...@fb.com/

Song Liu (5):
  bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  bpf: add callchain to bpf_perf_event_data
  bpf: introduce bpf_get_callchain_stackid
  selftests/bpf: add get_stackid_cannot_attach
  selftests/bpf: add callchain_stackid

 include/linux/bpf.h   |  1 +
 include/linux/filter.h|  3 +-
 include/linux/perf_event.h|  5 --
 include/linux/trace_events.h  |  5 ++
 include/uapi/linux/bpf.h  | 43 +
 include/uapi/linux/bpf_perf_event.h   |  7 +++
 kernel/bpf/btf.c  |  5 ++
 kernel/bpf/stackmap.c | 63 ++-
 kernel/bpf/verifier.c |  7 ++-
 kernel/events/core.c  | 10 +++
 kernel/trace/bpf_trace.c  | 29 +
 scripts/bpf_helpers_doc.py|  2 +
 tools/include/uapi/linux/bpf.h| 43 +
 tools/include/uapi/linux/bpf_perf_event.h |  8 +++
 .../bpf/prog_tests/callchain_stackid.c| 61 ++
 .../prog_tests/get_stackid_cannot_attach.c| 57 +
 .../selftests/bpf/progs/callchain_stackid.c   | 37 +++
 17 files changed, 364 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
 create mode 100644 
tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/callchain_stackid.c

--
2.24.1


[GIT PULL] libnvdimm fix for v5.8-rc5

2020-07-10 Thread Dan Williams
Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-fix-v5.8-rc5

...to receive a one line fix for a regression from some of the 'keys'
subsystem reworks that landed in -rc1. I had been holding off to see
if anything else percolated up, but nothing has.

Please pull, thanks.

---

The following changes since commit 48778464bb7d346b47157d21ffde2af6b2d39110:

  Linux 5.8-rc2 (2020-06-21 15:45:29 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-fix-v5.8-rc5

for you to fetch changes up to 813357fead4adee73f7eca6bbe0e69dfcf514dc6:

  libnvdimm/security: Fix key lookup permissions (2020-07-08 17:08:01 -0700)


libnvdimm fix for v5.8-rc5

Fix key ring search permissions to address a regression from -rc1.


Dan Williams (1):
  libnvdimm/security: Fix key lookup permissions

 drivers/nvdimm/security.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Re: [PATCH v6 15/17] static_call: Allow early init

2020-07-10 Thread Steven Rostedt
On Fri, 10 Jul 2020 15:38:46 +0200
Peter Zijlstra  wrote:

> In order to use static_call() to wire up x86_pmu, we need to
> initialize earlier; copy some of the tricks from jump_label to enable
> this.
> 
> Primarily we overload key->next to store a sites pointer when there
> are no modules, this avoids having to use kmalloc() to initialize the
> sites and allows us to run much earlier.
> 

I'm confused. What was the need to have key->next store site pointers
in order to move it up earlier?

-- Steve


> (arguably, this is much much earlier than needed for perf, but it
> might allow other uses.)
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/kernel/setup.c   |2 +
>  arch/x86/kernel/static_call.c |8 +-
>  include/linux/static_call.h   |   15 +--
>  kernel/static_call.c  |   55 
> +++---
>  4 files changed, 74 insertions(+), 6 deletions(-)
> 
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -848,6 +849,7 @@ void __init setup_arch(char **cmdline_p)
>   early_cpu_init();
>   arch_init_ideal_nops();
>   jump_label_init();
> + static_call_init();
>   early_ioremap_init();
>  
>   setup_olpc_ofw_pgd();
> --- a/arch/x86/kernel/static_call.c
> +++ b/arch/x86/kernel/static_call.c
> @@ -11,7 +11,7 @@ enum insn_type {
>   RET = 3,  /* tramp / site cond-tail-call */
>  };
>  
> -static void __static_call_transform(void *insn, enum insn_type type, void 
> *func)
> +static void __ref __static_call_transform(void *insn, enum insn_type type, 
> void *func)
>  {
>   int size = CALL_INSN_SIZE;
>   const void *code;
> @@ -33,11 +33,17 @@ static void __static_call_transform(void
>   code = text_gen_insn(RET_INSN_OPCODE, insn, func);
>   size = RET_INSN_SIZE;
>   break;
> +
> + default: /* GCC is a moron -- it figures @code can be uninitialized 
> below */
> + BUG();
>   }
>  
>   if (memcmp(insn, code, size) == 0)
>   return;
>  
> + if (unlikely(system_state == SYSTEM_BOOTING))
> + return text_poke_early(insn, code, size);
> +
>   text_poke_bp(insn, code, size, NULL);
>  }
>  
> --- a/include/linux/static_call.h
> +++ b/include/linux/static_call.h
> @@ -99,6 +99,8 @@ extern void arch_static_call_transform(v
>  
>  #ifdef CONFIG_HAVE_STATIC_CALL_INLINE
>  
> +extern void __init static_call_init(void);
> +
>  struct static_call_mod {
>   struct static_call_mod *next;
>   struct module *mod; /* for vmlinux, mod == NULL */
> @@ -107,7 +109,12 @@ struct static_call_mod {
>  
>  struct static_call_key {
>   void *func;
> - struct static_call_mod *mods;
> + union {
> + /* bit 0: 0 = mods, 1 = sites */
> + unsigned long type;
> + struct static_call_mod *mods;
> + struct static_call_site *sites;
> + };
>  };
>  
>  extern void __static_call_update(struct static_call_key *key, void *tramp, 
> void *func);
> @@ -118,7 +125,7 @@ extern int static_call_text_reserved(voi
>   DECLARE_STATIC_CALL(name, _func);   \
>   struct static_call_key STATIC_CALL_KEY(name) = {\
>   .func = _func,  \
> - .mods = NULL,   \
> + .type = 1,  \
>   };  \
>   ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func)
>  
> @@ -143,6 +150,8 @@ extern int static_call_text_reserved(voi
>  
>  #elif defined(CONFIG_HAVE_STATIC_CALL)
>  
> +static inline void static_call_init(void) { }
> +
>  struct static_call_key {
>   void *func;
>  };
> @@ -188,6 +197,8 @@ static inline int static_call_text_reser
>  
>  #else /* Generic implementation */
>  
> +static inline void static_call_init(void) { }
> +
>  struct static_call_key {
>   void *func;
>  };
> --- a/kernel/static_call.c
> +++ b/kernel/static_call.c
> @@ -94,10 +94,31 @@ static inline void static_call_sort_entr
>static_call_site_cmp, static_call_site_swap);
>  }
>  
> +static inline bool static_call_key_has_mods(struct static_call_key *key)
> +{
> + return !(key->type & 1);
> +}
> +
> +static inline struct static_call_mod *static_call_key_next(struct 
> static_call_key *key)
> +{
> + if (!static_call_key_has_mods(key))
> + return NULL;
> +
> + return key->mods;
> +}
> +
> +static inline struct static_call_site *static_call_key_sites(struct 
> static_call_key *key)
> +{
> + if (static_call_key_has_mods(key))
> + return NULL;
> +
> + return (struct static_call_site *)(key->type & ~1);
> +}
> +
>  void __static_call_update(struct static_call_key *key, void *tramp, void 
> *func)
>  {
>   struct 

[PATCH] mm: vmscan: consistent update to pgrefill

2020-07-10 Thread Shakeel Butt
The vmstat pgrefill is useful together with pgscan and pgsteal stats to
measure the reclaim efficiency. However vmstat's pgrefill is not updated
consistently at system level. It gets updated for both global and memcg
reclaim however pgscan and pgsteal are updated for only global reclaim.
So, update pgrefill only for global reclaim. If someone is interested in
the stats representing both system level as well as memcg level reclaim,
then consult the root memcg's memory.stat instead of /proc/vmstat.

Signed-off-by: Shakeel Butt 
---
 mm/vmscan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5215840ee217..4167b0cc1784 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2030,7 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
 
-   __count_vm_events(PGREFILL, nr_scanned);
+   if (!cgroup_reclaim(sc))
+   __count_vm_events(PGREFILL, nr_scanned);
__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
 
spin_unlock_irq(&pgdat->lru_lock);
-- 
2.27.0.383.g050319c2ae-goog



[PATCH] drm: sun4i: hdmi: Fix inverted HPD result

2020-07-10 Thread Chen-Yu Tsai
From: Chen-Yu Tsai 

When the extra HPD polling in sun4i_hdmi was removed, the result of
HPD was accidentally inverted.

Fix this by inverting the check.

Fixes: bda8eaa6dee7 ("drm: sun4i: hdmi: Remove extra HPD polling")
Signed-off-by: Chen-Yu Tsai 
---

Sorry for the screw-up.

---
 drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c 
b/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c
index 557cbe5ab35f..2f2c9f0a1071 100644
--- a/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c
+++ b/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c
@@ -260,7 +260,7 @@ sun4i_hdmi_connector_detect(struct drm_connector 
*connector, bool force)
unsigned long reg;
 
reg = readl(hdmi->base + SUN4I_HDMI_HPD_REG);
-   if (reg & SUN4I_HDMI_HPD_HIGH) {
+   if (!(reg & SUN4I_HDMI_HPD_HIGH)) {
cec_phys_addr_invalidate(hdmi->cec_adap);
return connector_status_disconnected;
}
-- 
2.27.0



Re: [PATCH v16 00/22] per memcg lru_lock

2020-07-10 Thread Alex Shi
Hi Hugh,

I believe I owe you a 'Tested-by' for the previous version.
Would you like to give the new version a try and add a Reviewed-by or
Tested-by if it looks fine?

Thanks
Alex 


[PATCH v16 00/22] per memcg lru_lock

2020-07-10 Thread Alex Shi
This new version is based on v5.8-rc4. It adds 2 more patches:
'mm/thp: remove code path which never got into'
'mm/thp: add tail pages into lru anyway in split_huge_page()'
and modifies 'mm/mlock: reorder isolation sequence during munlock'.

Currently there is one lru_lock per node, pgdat->lru_lock, which guards the
lru lists, even though the lru lists moved into memcg long ago. Still using
a per-node lru_lock is clearly unscalable: pages in different memcgs have to
compete with each other for a single lru_lock. This patchset replaces the
per-node lru lock with a per-lruvec/memcg lru_lock to guard the lru lists,
making them scalable across memcgs and gaining performance.

Currently lru_lock still guards both the lru list and the page's lru bit;
that's fine. But if we want to take a page's specific lruvec lock, we need
to pin down the page's lruvec/memcg while locking. Simply taking the lruvec
lock first can be undermined by a concurrent memcg charge/migration of the
page. To fix this, we clear the page's lru bit first and use that as the
pinning action that blocks memcg changes. That's the reason for the new
atomic function TestClearPageLRU. Isolating a page now needs both actions:
TestClearPageLRU and holding the lru_lock.

The typical user of this is isolate_migratepages_block() in compaction.c:
we have to take the lru bit before the lru lock, which serializes page
isolation against memcg page charge/migration, since the latter changes the
page's lruvec and hence the lru_lock that applies to it.
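
[Editor's note: a sketch of the isolation order just described; the function
names follow the patches below, but exact details in the series may differ.]

/* Clearing the lru bit first pins the page's memcg/lruvec binding,
 * so the lruvec lock taken afterwards is guaranteed to be the right one. */
if (TestClearPageLRU(page)) {
	struct lruvec *lruvec;

	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
	spin_lock_irq(&lruvec->lru_lock);
	del_page_from_lru_list(page, lruvec, page_lru(page));
	spin_unlock_irq(&lruvec->lru_lock);
}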

The above solution was suggested by Johannes Weiner and builds on his new
memcg charge path, resulting in this patchset. (Hugh Dickins tested it and
contributed much code, from the compaction fix to general code polish;
thanks a lot!)

The patchset includes 3 parts:
1, some code cleanup and minimal optimization as preparation.
2, use TestClearPageLRU as the precondition for page isolation.
3, replace the per-node lru_lock with a per-memcg, per-node lru_lock.

Following Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2-socket * 26-core * HT box with a modified case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
With this patchset, readtwice performance increased by about 80%
with concurrent containers.

Thanks to Hugh Dickins and Konstantin Khlebnikov: they both brought up this
idea 8 years ago. Thanks also to everyone who gave comments: Daniel Jordan,
Mel Gorman, Shakeel Butt, Matthew Wilcox etc.

Thanks for the testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang. Hugh Dickins also shared his kbuild-swap case. Thanks!

Alex Shi (20):
  mm/vmscan: remove unnecessary lruvec adding
  mm/page_idle: no unlikely double check for idle page counting
  mm/compaction: correct the comments of compact_defer_shift
  mm/compaction: rename compact_deferred as compact_should_defer
  mm/thp: move lru_add_page_tail func to huge_memory.c
  mm/thp: clean up lru_add_page_tail
  mm/thp: remove code path which never got into
  mm/thp: narrow lru locking
  mm/memcg: add debug checking in lock_page_memcg
  mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn
  mm/lru: move lru_lock holding in func lru_note_cost_page
  mm/lru: move lock into lru_note_cost
  mm/lru: introduce TestClearPageLRU
  mm/thp: add tail pages into lru anyway in split_huge_page()
  mm/compaction: do page isolation first in compaction
  mm/mlock: reorder isolation sequence during munlock
  mm/swap: serialize memcg changes during pagevec_lru_move_fn
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/lru: introduce the relock_page_lruvec function
  mm/pgdat: remove pgdat lru_lock

Hugh Dickins (2):
  mm/vmscan: use relock for move_pages_to_lru
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +-
 Documentation/admin-guide/cgroup-v1/memory.rst |  21 +--
 Documentation/trace/events-kmem.rst|   2 +-
 Documentation/vm/unevictable-lru.rst   |  22 +--
 include/linux/compaction.h |   4 +-
 include/linux/memcontrol.h |  98 +++
 include/linux/mm_types.h   |   2 +-
 include/linux/mmzone.h |   6 +-
 include/linux/page-flags.h |   1 +
 include/linux/swap.h   |   4 +-
 include/trace/events/compaction.h  |   2 +-
 mm/compaction.c| 113 
 mm/filemap.c   |   4 +-
 mm/huge_memory.c   |  47 +++--
 mm/memcontrol.c|  71 +++-
 mm/memory.c|   3 -
 mm/mlock.c |  93 +-
 mm/mmzone.c|   1 +
 mm/page_alloc.c|   1 -
 mm/page_idle.c |   8 -
 mm/rmap.c  |   4 +-
 

[PATCH v16 16/22] mm/mlock: reorder isolation sequence during munlock

2020-07-10 Thread Alex Shi
This patch reorders the isolation steps during munlock, moving the lru lock
to guard each page and unfolding the __munlock_isolate_lru_page func, as
preparation for the lru lock change.

__split_huge_page_refcount doesn't exist anymore, but we still have to guard
PageMlocked and PageLRU for the tail page in __split_huge_page_tail.

[l...@intel.com: found a sleeping function bug ... at mm/rmap.c]
Signed-off-by: Alex Shi 
Cc: Kirill A. Shutemov 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/mlock.c | 93 ++
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 228ba5a8e0a5..0bdde88b4438 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page)
 }
 
 /*
- * Isolate a page from LRU with optional get_page() pin.
- * Assumes lru_lock already held and page already pinned.
- */
-static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
-{
-   if (TestClearPageLRU(page)) {
-   struct lruvec *lruvec;
-
-   lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (getpage)
-   get_page(page);
-   del_page_from_lru_list(page, lruvec, page_lru(page));
-   return true;
-   }
-
-   return false;
-}
-
-/*
  * Finish munlock after successful page isolation
  *
  * Page must be locked. This is a wrapper for try_to_munlock()
@@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page)
 unsigned int munlock_vma_page(struct page *page)
 {
int nr_pages;
+   bool clearlru = false;
pg_data_t *pgdat = page_pgdat(page);
 
/* For try_to_munlock() and to serialize with page migration */
@@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page)
VM_BUG_ON_PAGE(PageTail(page), page);
 
/*
-* Serialize with any parallel __split_huge_page_refcount() which
+* Serialize split tail pages in __split_huge_page_tail() which
 * might otherwise copy PageMlocked to part of the tail pages before
 * we clear it in the head page. It also stabilizes hpage_nr_pages().
 */
+   get_page(page);
+   clearlru = TestClearPageLRU(page);
spin_lock_irq(&pgdat->lru_lock);
 
if (!TestClearPageMlocked(page)) {
-   /* Potentially, PTE-mapped THP: do not skip the rest PTEs */
-   nr_pages = 1;
-   goto unlock_out;
+   if (clearlru)
+   SetPageLRU(page);
+   /*
+* Potentially, PTE-mapped THP: do not skip the rest PTEs
+* Reuse lock as memory barrier for release_pages racing.
+*/
+   spin_unlock_irq(&pgdat->lru_lock);
+   put_page(page);
+   return 0;
}
 
nr_pages = hpage_nr_pages(page);
__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
-   if (__munlock_isolate_lru_page(page, true)) {
+   if (clearlru) {
+   struct lruvec *lruvec;
+
+   lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+   del_page_from_lru_list(page, lruvec, page_lru(page));
spin_unlock_irq(&pgdat->lru_lock);
__munlock_isolated_page(page);
-   goto out;
+   } else {
+   spin_unlock_irq(&pgdat->lru_lock);
+   put_page(page);
+   __munlock_isolation_failed(page);
}
-   __munlock_isolation_failed(page);
-
-unlock_out:
-   spin_unlock_irq(&pgdat->lru_lock);
 
-out:
return nr_pages - 1;
 }
 
@@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, 
struct zone *zone)
pagevec_init(_putback);
 
/* Phase 1: page isolation */
-   spin_lock_irq(&zone->zone_pgdat->lru_lock);
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];
+   struct lruvec *lruvec;
+   bool clearlru;
 
-   if (TestClearPageMlocked(page)) {
-   /*
-* We already have pin from follow_page_mask()
-* so we can spare the get_page() here.
-*/
-   if (__munlock_isolate_lru_page(page, false))
-   continue;
-   else
-   __munlock_isolation_failed(page);
-   } else {
+   clearlru = TestClearPageLRU(page);
+   spin_lock_irq(&zone->zone_pgdat->lru_lock);
+
+   if (!TestClearPageMlocked(page)) {
delta_munlocked++;
+   if (clearlru)
+   SetPageLRU(page);
+   goto putback;
+   }
+
+   if (!clearlru) {
+   

[PATCH v16 22/22] mm/lru: revise the comments of lru_lock

2020-07-10 Thread Alex Shi
From: Hugh Dickins 

Since we changed pgdat->lru_lock to lruvec->lru_lock, it's time to
fix the now-incorrect comments in the code. Also fix some zone->lru_lock
comment errors from ancient times, etc.

Signed-off-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: Andrey Ryabinin 
Cc: Jann Horn 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: cgro...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +++
 Documentation/admin-guide/cgroup-v1/memory.rst | 21 +
 Documentation/trace/events-kmem.rst|  2 +-
 Documentation/vm/unevictable-lru.rst   | 22 --
 include/linux/mm_types.h   |  2 +-
 include/linux/mmzone.h |  2 +-
 mm/filemap.c   |  4 ++--
 mm/memcontrol.c|  2 +-
 mm/rmap.c  |  4 ++--
 mm/vmscan.c| 12 
 10 files changed, 36 insertions(+), 50 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst 
b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 3f7115e07b5d..0b9f91589d3d 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -133,18 +133,9 @@ Under below explanation, we assume 
CONFIG_MEM_RES_CTRL_SWAP=y.
 
 8. LRU
 ==
-Each memcg has its own private LRU. Now, its handling is under global
-   VM's control (means that it's handled under global pgdat->lru_lock).
-   Almost all routines around memcg's LRU is called by global LRU's
-   list management functions under pgdat->lru_lock.
-
-   A special function is mem_cgroup_isolate_pages(). This scans
-   memcg's private LRU and call __isolate_lru_page() to extract a page
-   from LRU.
-
-   (By __isolate_lru_page(), the page is removed from both of global and
-   private LRU.)
-
+   Each memcg has its own vector of LRUs (inactive anon, active anon,
+   inactive file, active file, unevictable) of pages from each node,
+   each LRU handled under a single lru_lock for that memcg and node.
 
 9. Typical Tests.
 =
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst 
b/Documentation/admin-guide/cgroup-v1/memory.rst
index 12757e63b26c..24450696579f 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -285,20 +285,17 @@ When oom event notifier is registered, event will be 
delivered.
 2.6 Locking
 ---
 
-   lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   the i_pages lock.
+Lock order is as follows:
 
-   Other lock order is following:
+  Page lock (PG_locked bit of page->flags)
+mm->page_table_lock or split pte_lock
+  lock_page_memcg (memcg->move_lock)
+mapping->i_pages lock
+  lruvec->lru_lock.
 
-   PG_locked.
- mm->page_table_lock
- pgdat->lru_lock
-  lock_page_cgroup.
-
-  In many cases, just lock_page_cgroup() is called.
-
-  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
-  pgdat->lru_lock, it has no lock of its own.
+Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
+lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+isolating a page from its LRU under lruvec->lru_lock.
 
 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 ---
diff --git a/Documentation/trace/events-kmem.rst 
b/Documentation/trace/events-kmem.rst
index 555484110e36..68fa75247488 100644
--- a/Documentation/trace/events-kmem.rst
+++ b/Documentation/trace/events-kmem.rst
@@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched 
is triggered.
 Broadly speaking, pages are taken off the LRU lock in bulk and
 freed in batch with a page list. Significant amounts of activity here could
 indicate that the system is under memory pressure and can also indicate
-contention on the zone->lru_lock.
+contention on the lruvec->lru_lock.
 
 4. Per-CPU Allocator Activity
 =
diff --git a/Documentation/vm/unevictable-lru.rst 
b/Documentation/vm/unevictable-lru.rst
index 17d0861b0f1d..0e1490524f53 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -33,7 +33,7 @@ reclaim in Linux.  The problems have been observed at 
customer sites on large
 memory x86_64 systems.
 
 To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
-main memory will have over 32 million 4k pages in a single zone.  When a large
+main memory will have over 32 million 4k pages in a single node.  When a large
 fraction of these pages are not evictable for any reason [see below], vmscan
 will spend a lot 

[PATCH v16 02/22] mm/page_idle: no unlikely double check for idle page counting

2020-07-10 Thread Alex Shi
As the function's comments mention, a few missed isolated pages can be
tolerated. So why not go further and drop the unlikely double check? That
won't cause more idle pages, but it reduces lock contention.

This is also a preparation for the later new page isolation feature.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/page_idle.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/mm/page_idle.c b/mm/page_idle.c
index 057c61df12db..5fdd753e151a 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -32,19 +32,11 @@
 static struct page *page_idle_get_page(unsigned long pfn)
 {
struct page *page = pfn_to_online_page(pfn);
-   pg_data_t *pgdat;
 
if (!page || !PageLRU(page) ||
!get_page_unless_zero(page))
return NULL;
 
-   pgdat = page_pgdat(page);
-   spin_lock_irq(&pgdat->lru_lock);
-   if (unlikely(!PageLRU(page))) {
-   put_page(page);
-   page = NULL;
-   }
-   spin_unlock_irq(&pgdat->lru_lock);
return page;
 }
 
-- 
1.8.3.1



[PATCH v16 08/22] mm/thp: narrow lru locking

2020-07-10 Thread Alex Shi
lru_lock and the page cache xa_lock have no dependency in the current
sequence, so there is no need to take them together. Let's narrow the lru
locking, but keep the local_irq_disable to block interrupt re-entry and
statistics updates.

Hugh Dickins pointed out: split_huge_page_to_list() was already silly to be
using the _irqsave variant: it has just been taking sleeping locks, so
would already be broken if entered with interrupts enabled.
So we can save passing the flags argument down to __split_huge_page().

Signed-off-by: Alex Shi 
Signed-off-by: Wei Yang 
Reviewed-by: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Kirill A. Shutemov 
Cc: Andrea Arcangeli 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1fb4147ff854..d866b6e43434 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2423,7 +2423,7 @@ static void __split_huge_page_tail(struct page *head, int 
tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-   pgoff_t end, unsigned long flags)
+ pgoff_t end)
 {
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
@@ -2432,8 +2432,6 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
unsigned long offset = 0;
int i;
 
-   lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
 
@@ -2445,6 +2443,11 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
xa_lock(&swap_cache->i_pages);
}
 
+   /* prevent PageLRU to go away from under us, and freeze lru stats */
+   spin_lock(&pgdat->lru_lock);
+
+   lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2464,6 +2467,8 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
}
 
ClearPageCompound(head);
+   spin_unlock(&pgdat->lru_lock);
+   /* Caller disabled irqs, so they are still disabled here */
 
split_page_owner(head, HPAGE_PMD_ORDER);
 
@@ -2481,8 +2486,7 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
page_ref_add(head, 2);
xa_unlock(&head->mapping->i_pages);
}
-
-   spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+   local_irq_enable();
 
remap_page(head);
 
@@ -2621,12 +2625,10 @@ bool can_split_huge_page(struct page *page, int 
*pextra_pins)
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
struct page *head = compound_head(page);
-   struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
-   unsigned long flags;
pgoff_t end;
 
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2687,9 +2689,8 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-   /* prevent PageLRU to go away from under us, and freeze lru stats */
-   spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+   /* block interrupt reentry in xa_lock and spinlock */
+   local_irq_disable();
if (mapping) {
XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2719,7 +2720,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
__dec_node_page_state(head, NR_FILE_THPS);
}
 
-   __split_huge_page(page, list, end, flags);
+   __split_huge_page(page, list, end);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
 
@@ -2738,7 +2739,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
spin_unlock(&ds_queue->split_queue_lock);
 fail:  if (mapping)
xa_unlock(&mapping->i_pages);
-   spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+   local_irq_enable();
remap_page(head);
ret = -EBUSY;
}
-- 
1.8.3.1



[PATCH v16 03/22] mm/compaction: correct the comments of compact_defer_shift

2020-07-10 Thread Alex Shi
There is no compact_defer_limit; it is compact_defer_shift that is in
use. Also add an explanation of compact_order_failed.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h | 1 +
 mm/compaction.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f6f884970511..14c668b7e793 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -512,6 +512,7 @@ struct zone {
 * On compaction failure, 1<<compact_defer_shift compactions

[PATCH v16 01/22] mm/vmscan: remove unnecessary lruvec adding

2020-07-10 Thread Alex Shi
We don't have to add a freeable page into the lru and then remove it again.
This change saves a couple of actions and makes the moving clearer.

The SetPageLRU needs to be kept here for list integrity.
Otherwise:
 #0 move_pages_to_lru               #1 release_pages
                                    if (put_page_testzero())
 if !put_page_testzero
                                       !PageLRU //skip lru_lock
                                       list_add(&page->lru,)
 list_add(&page->lru,) //corrupt

[a...@linux-foundation.org: coding style fixes]
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Tejun Heo 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmscan.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749d239c62b2..ddb29d813d77 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1856,26 +1856,29 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
while (!list_empty(list)) {
page = lru_to_page(list);
VM_BUG_ON_PAGE(PageLRU(page), page);
+   list_del(&page->lru);
if (unlikely(!page_evictable(page))) {
-   list_del(&page->lru);
spin_unlock_irq(&pgdat->lru_lock);
putback_lru_page(page);
spin_lock_irq(&pgdat->lru_lock);
continue;
}
-   lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
+   /*
+* The SetPageLRU needs to be kept here for list integrity.
+* Otherwise:
+*   #0 move_pages_to_lru               #1 release_pages
+*                                      if (put_page_testzero())
+*   if !put_page_testzero
+*                                         !PageLRU //skip lru_lock
+*                                         list_add(&page->lru,)
+*   list_add(&page->lru,) //corrupt
+*/
SetPageLRU(page);
-   lru = page_lru(page);
 
-   nr_pages = hpage_nr_pages(page);
-   update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
-   list_move(&page->lru, &lruvec->lists[lru]);
-
-   if (put_page_testzero(page)) {
+   if (unlikely(put_page_testzero(page))) {
__ClearPageLRU(page);
__ClearPageActive(page);
-   del_page_from_lru_list(page, lruvec, lru);
 
if (unlikely(PageCompound(page))) {
spin_unlock_irq(&pgdat->lru_lock);
@@ -1883,11 +1886,19 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
spin_lock_irq(&pgdat->lru_lock);
} else
list_add(&page->lru, &pages_to_free);
-   } else {
-   nr_moved += nr_pages;
-   if (PageActive(page))
-   workingset_age_nonresident(lruvec, nr_pages);
+
+   continue;
}
+
+   lruvec = mem_cgroup_page_lruvec(page, pgdat);
+   lru = page_lru(page);
+   nr_pages = hpage_nr_pages(page);
+
+   update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
+   list_add(&page->lru, &lruvec->lists[lru]);
+   nr_moved += nr_pages;
+   if (PageActive(page))
+   workingset_age_nonresident(lruvec, nr_pages);
}
 
/*
-- 
1.8.3.1



[PATCH v16 04/22] mm/compaction: rename compact_deferred as compact_should_defer

2020-07-10 Thread Alex Shi
compaction_deferred() is a check that suggests deferral; the actual deferring
is done in defer_compaction(), not here. So it's better to rename it to avoid
confusion.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mike Kravetz 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/compaction.h| 4 ++--
 include/trace/events/compaction.h | 2 +-
 mm/compaction.c   | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 6fa0eea3f530..be9ed7437a38 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -100,7 +100,7 @@ extern enum compact_result compaction_suitable(struct zone 
*zone, int order,
unsigned int alloc_flags, int highest_zoneidx);
 
 extern void defer_compaction(struct zone *zone, int order);
-extern bool compaction_deferred(struct zone *zone, int order);
+extern bool compaction_should_defer(struct zone *zone, int order);
 extern void compaction_defer_reset(struct zone *zone, int order,
bool alloc_success);
 extern bool compaction_restarting(struct zone *zone, int order);
@@ -199,7 +199,7 @@ static inline void defer_compaction(struct zone *zone, int 
order)
 {
 }
 
-static inline bool compaction_deferred(struct zone *zone, int order)
+static inline bool compaction_should_defer(struct zone *zone, int order)
 {
return true;
 }
diff --git a/include/trace/events/compaction.h 
b/include/trace/events/compaction.h
index 54e5bf081171..33633c71df04 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -274,7 +274,7 @@
1UL << __entry->defer_shift)
 );
 
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
+DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_should_defer,
 
TP_PROTO(struct zone *zone, int order),
 
diff --git a/mm/compaction.c b/mm/compaction.c
index cd1ef9e5e638..f14780fc296a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -154,7 +154,7 @@ void defer_compaction(struct zone *zone, int order)
 }
 
 /* Returns true if compaction should be skipped this time */
-bool compaction_deferred(struct zone *zone, int order)
+bool compaction_should_defer(struct zone *zone, int order)
 {
unsigned long defer_limit = 1UL << zone->compact_defer_shift;
 
@@ -168,7 +168,7 @@ bool compaction_deferred(struct zone *zone, int order)
if (zone->compact_considered >= defer_limit)
return false;
 
-   trace_mm_compaction_deferred(zone, order);
+   trace_mm_compaction_should_defer(zone, order);
 
return true;
 }
@@ -2377,7 +2377,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, 
unsigned int order,
enum compact_result status;
 
if (prio > MIN_COMPACT_PRIORITY
-   && compaction_deferred(zone, order)) {
+   && compaction_should_defer(zone, order)) {
rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
continue;
}
@@ -2561,7 +2561,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (!populated_zone(zone))
continue;
 
-   if (compaction_deferred(zone, cc.order))
+   if (compaction_should_defer(zone, cc.order))
continue;
 
if (compaction_suitable(zone, cc.order, 0, zoneid) !=
-- 
1.8.3.1



[PATCH v16 17/22] mm/swap: serialize memcg changes during pagevec_lru_move_fn

2020-07-10 Thread Alex Shi
Hugh Dickins found a memcg change bug in the original version:
if we want to change pgdat->lru_lock to memcg's lruvec lock, we have
to serialize mem_cgroup_move_account during pagevec_lru_move_fn. The
possible bad scenario would look like:

cpu 0                                   cpu 1
lruvec = mem_cgroup_page_lruvec()
                                        if (!isolate_lru_page())
                                                mem_cgroup_move_account

spin_lock_irqsave(&lruvec->lru_lock)    <== wrong lock.

So we need the ClearPageLRU to block isolate_lru_page(), and then serialize
the memcg change here.

Reported-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/mm/swap.c b/mm/swap.c
index 5092fe9c8c47..8488b9b25730 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -221,8 +221,14 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
spin_lock_irqsave(&pgdat->lru_lock, flags);
}
 
+   /* block memcg migration during page moving between lru */
+   if (!TestClearPageLRU(page))
+   continue;
+
lruvec = mem_cgroup_page_lruvec(page, pgdat);
(*move_fn)(page, lruvec);
+
+   SetPageLRU(page);
}
if (pgdat)
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
@@ -976,7 +982,29 @@ static void __pagevec_lru_add_fn(struct page *page, struct 
lruvec *lruvec)
  */
 void __pagevec_lru_add(struct pagevec *pvec)
 {
-   pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn);
+   int i;
+   struct pglist_data *pgdat = NULL;
+   struct lruvec *lruvec;
+   unsigned long flags = 0;
+
+   for (i = 0; i < pagevec_count(pvec); i++) {
+   struct page *page = pvec->pages[i];
+   struct pglist_data *pagepgdat = page_pgdat(page);
+
+   if (pagepgdat != pgdat) {
+   if (pgdat)
+   spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+   pgdat = pagepgdat;
+   spin_lock_irqsave(&pgdat->lru_lock, flags);
+   }
+
+   lruvec = mem_cgroup_page_lruvec(page, pgdat);
+   __pagevec_lru_add_fn(page, lruvec);
+   }
+   if (pgdat)
+   spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+   release_pages(pvec->pages, pvec->nr);
+   pagevec_reinit(pvec);
 }
 
 /**
-- 
1.8.3.1



[PATCH v16 12/22] mm/lru: move lock into lru_note_cost

2020-07-10 Thread Alex Shi
This patch moves the lru_lock into lru_note_cost. It's a bit ugly and may
cost more locking, but it's necessary for the later change from the per-pgdat
lru_lock to the per-memcg lru_lock.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c   | 5 +++--
 mm/vmscan.c | 4 +---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index b88ca630db70..f645965fde0e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -269,7 +269,9 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
 {
do {
unsigned long lrusize;
+   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+   spin_lock_irq(&pgdat->lru_lock);
/* Record cost event */
if (file)
lruvec->file_cost += nr_pages;
@@ -293,15 +295,14 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
lruvec->file_cost /= 2;
lruvec->anon_cost /= 2;
}
+   spin_unlock_irq(&pgdat->lru_lock);
} while ((lruvec = parent_lruvec(lruvec)));
 }
 
 void lru_note_cost_page(struct page *page)
 {
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
  page_is_file_lru(page), hpage_nr_pages(page));
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ddb29d813d77..c1c4259b4de5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1976,19 +1976,17 @@ static int current_may_throttle(void)
&stat, false);
 
spin_lock_irq(&pgdat->lru_lock);
-
move_pages_to_lru(lruvec, _list);
 
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-   lru_note_cost(lruvec, file, stat.nr_pageout);
item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
if (!cgroup_reclaim(sc))
__count_vm_events(item, nr_reclaimed);
__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
-
spin_unlock_irq(&pgdat->lru_lock);
 
+   lru_note_cost(lruvec, file, stat.nr_pageout);
mem_cgroup_uncharge_list(&page_list);
free_unref_page_list(&page_list);
 
-- 
1.8.3.1



[PATCH v16 14/22] mm/thp: add tail pages into lru anyway in split_huge_page()

2020-07-10 Thread Alex Shi
split_huge_page() must start with PageLRU(head), but the lru bit may be
cleared by isolate_lru_page; the head is anyway still on the lru list, since
we still hold the lru_lock.

Signed-off-by: Alex Shi 
Cc: Kirill A. Shutemov 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: Mika Penttilä 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d866b6e43434..4fe7b92c9330 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2348,15 +2348,18 @@ static void lru_add_page_tail(struct page *head, struct 
page *page_tail,
VM_BUG_ON_PAGE(PageLRU(page_tail), head);
lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
 
-   if (!list)
-   SetPageLRU(page_tail);
-
-   if (likely(PageLRU(head)))
-   list_add_tail(&page_tail->lru, &head->lru);
-   else if (list) {
+   if (list) {
/* page reclaim is reclaiming a huge page */
get_page(page_tail);
list_add_tail(&page_tail->lru, list);
+   } else {
+   /*
+* Split starts from PageLRU(head); the lru bit may be cleared
+* by isolate_lru_page, but the head is still on the lru list,
+* since we hold the lru_lock.
+*/
+   SetPageLRU(page_tail);
+   list_add_tail(&page_tail->lru, &head->lru);
}
 }
 
-- 
1.8.3.1



[PATCH v16 06/22] mm/thp: clean up lru_add_page_tail

2020-07-10 Thread Alex Shi
Since the first parameter is only used for the head page, it's better to make
that explicit.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9e050b13f597..b18f21da4dac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2340,19 +2340,19 @@ static void remap_page(struct page *page)
}
 }
 
-static void lru_add_page_tail(struct page *page, struct page *page_tail,
+static void lru_add_page_tail(struct page *head, struct page *page_tail,
struct lruvec *lruvec, struct list_head *list)
 {
-   VM_BUG_ON_PAGE(!PageHead(page), page);
-   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+   VM_BUG_ON_PAGE(!PageHead(head), head);
+   VM_BUG_ON_PAGE(PageCompound(page_tail), head);
+   VM_BUG_ON_PAGE(PageLRU(page_tail), head);
lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
 
if (!list)
SetPageLRU(page_tail);
 
-   if (likely(PageLRU(page)))
-   list_add_tail(&page_tail->lru, &page->lru);
+   if (likely(PageLRU(head)))
+   list_add_tail(&page_tail->lru, &head->lru);
else if (list) {
/* page reclaim is reclaiming a huge page */
get_page(page_tail);
-- 
1.8.3.1



[PATCH v16 09/22] mm/memcg: add debug checking in lock_page_memcg

2020-07-10 Thread Alex Shi
Add a debug check in lock_page_memcg, so that we get a warning
if anything is wrong there.

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Andrew Morton 
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 19622328e4b5..fde47272b13c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1983,6 +1983,12 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
if (unlikely(!memcg))
return NULL;
 
+#ifdef CONFIG_PROVE_LOCKING
+   local_irq_save(flags);
+   might_lock(&memcg->move_lock);
+   local_irq_restore(flags);
+#endif
+
if (atomic_read(&memcg->moving_account) <= 0)
return memcg;
 
-- 
1.8.3.1



[PATCH v16 11/22] mm/lru: move lru_lock holding in func lru_note_cost_page

2020-07-10 Thread Alex Shi
It's a cleanup patch without functional changes.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memory.c | 3 ---
 mm/swap.c   | 2 ++
 mm/swap_state.c | 2 --
 mm/workingset.c | 2 --
 4 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87cdc1ff..dafc5585517e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3150,10 +3150,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 * XXX: Move to lru_cache_add() when it
 * supports new vs putback
 */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
-
lru_cache_add(page);
swap_readpage(page, true);
}
diff --git a/mm/swap.c b/mm/swap.c
index dc8b02cdddcb..b88ca630db70 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -298,8 +298,10 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
 
 void lru_note_cost_page(struct page *page)
 {
+   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
  page_is_file_lru(page), hpage_nr_pages(page));
+   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 05889e8e3c97..080be52db6a8 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -440,9 +440,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
}
 
/* XXX: Move to lru_cache_add() when it supports new vs putback */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 
/* Caller will initiate read into locked page */
SetPageWorkingset(page);
diff --git a/mm/workingset.c b/mm/workingset.c
index 50b7937bab32..337d5b9ad132 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -372,9 +372,7 @@ void workingset_refault(struct page *page, void *shadow)
if (workingset) {
SetPageWorkingset(page);
/* XXX: Move to lru_cache_add() when it supports new vs putback 
*/
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
}
 out:
-- 
1.8.3.1



[PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock

2020-07-10 Thread Alex Shi
This patch moves the per-node lru_lock into the lruvec, thus bringing one
lru_lock per memcg per node. So on a large machine, each memcg no longer
has to suffer from per-node pgdat->lru_lock competition; each can go
fast with its own lru_lock.

After moving the memcg charge before lru inserting, page isolation can
serialize the page's memcg, so the per-memcg lruvec lock is stable and can
replace the per-node lru lock.

According to Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2-socket * 26-core * HT box with a modified case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this and later patches, readtwice performance increases by about
80% with concurrent containers.

Also add a debug func in the locking, which may give some clues if
something gets out of hand.

Hugh Dickins helped on patch polish, thanks!

Reported-by: kernel test robot 
Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Yang Shi 
Cc: Matthew Wilcox 
Cc: Konstantin Khlebnikov 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: cgro...@vger.kernel.org
---
 include/linux/memcontrol.h |  98 
 include/linux/mmzone.h |   2 +
 mm/compaction.c|  67 +++---
 mm/huge_memory.c   |  11 ++---
 mm/memcontrol.c|  63 +++-
 mm/mlock.c |  32 +++
 mm/mmzone.c|   1 +
 mm/swap.c  | 100 +
 mm/vmscan.c|  70 +--
 9 files changed, 310 insertions(+), 134 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e77197a62809..6e670f991b42 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -411,6 +411,19 @@ static inline struct lruvec *mem_cgroup_lruvec(struct 
mem_cgroup *memcg,
 
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
 
+struct lruvec *lock_page_lruvec(struct page *page);
+struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+   unsigned long *flags);
+
+#ifdef CONFIG_DEBUG_VM
+void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page);
+#else
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
+#endif
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -892,6 +905,31 @@ static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
 
+static inline struct lruvec *lock_page_lruvec(struct page *page)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock(&pgdat->__lruvec.lru_lock);
+   return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock_irq(&pgdat->__lruvec.lru_lock);
+   return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+   unsigned long *flagsp)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp);
+   return &pgdat->__lruvec;
+}
+
 static inline struct mem_cgroup *
 mem_cgroup_iter(struct mem_cgroup *root,
struct mem_cgroup *prev,
@@ -1126,6 +1164,10 @@ static inline void count_memcg_page_event(struct page 
*page,
 void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
 #endif /* CONFIG_MEMCG */
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
@@ -1255,6 +1297,62 @@ static inline struct lruvec *parent_lruvec(struct lruvec 
*lruvec)
return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
 }
 
+static inline void unlock_page_lruvec(struct lruvec *lruvec)
+{
+   spin_unlock(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+{
+   spin_unlock_irq(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+   unsigned long flags)
+{
+   spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
+   struct lruvec *locked_lruvec)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+   bool locked;
+
+   rcu_read_lock();
+   locked = mem_cgroup_page_lruvec(page, pgdat) == locked_lruvec;
+   rcu_read_unlock();
+
+   if (locked)
+   return locked_lruvec;
+
+   if (locked_lruvec)
+   unlock_page_lruvec_irq(locked_lruvec);
+
+   return 

[PATCH v16 21/22] mm/pgdat: remove pgdat lru_lock

2020-07-10 Thread Alex Shi
Now that pgdat.lru_lock has been replaced by the lruvec lock, it is not used anymore.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Konstantin Khlebnikov 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 include/linux/mmzone.h | 1 -
 mm/page_alloc.c| 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 36c1680efd90..8d7318ce5f62 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -735,7 +735,6 @@ struct deferred_split {
 
/* Write-intensive fields used by page reclaim */
ZONE_PADDING(_pad1_)
-   spinlock_t  lru_lock;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87ce294..4d7df42b32d6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6721,7 +6721,6 @@ static void __meminit pgdat_init_internals(struct 
pglist_data *pgdat)
init_waitqueue_head(&pgdat->pfmemalloc_wait);
 
pgdat_page_ext_init(pgdat);
-   spin_lock_init(&pgdat->lru_lock);
lruvec_init(&pgdat->__lruvec);
 }
 
-- 
1.8.3.1



[PATCH v16 19/22] mm/lru: introduce the relock_page_lruvec function

2020-07-10 Thread Alex Shi
Use this new function to replace the repeated open-coded sequence; no functional change.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Andrey Ryabinin 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Konstantin Khlebnikov 
Cc: Hugh Dickins 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
---
 mm/mlock.c  |  9 +
 mm/swap.c   | 33 +++--
 mm/vmscan.c |  8 +---
 3 files changed, 9 insertions(+), 41 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index cb23a0c2cfbf..4f40fc091cf9 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -289,17 +289,10 @@ static void __munlock_pagevec(struct pagevec *pvec, 
struct zone *zone)
/* Phase 1: page isolation */
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
bool clearlru;
 
clearlru = TestClearPageLRU(page);
-
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irq(lruvec);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
 
if (!TestClearPageMlocked(page)) {
delta_munlocked++;
diff --git a/mm/swap.c b/mm/swap.c
index 129c532357a4..9fb906fbaed5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -209,19 +209,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 
for (i = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
-
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (lruvec != new_lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irqrestore(lruvec, flags);
-   lruvec = lock_page_lruvec_irqsave(page, &flags);
-   }
 
/* block memcg migration during page moving between lru */
if (!TestClearPageLRU(page))
continue;
 
+   lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
(*move_fn)(page, lruvec);
 
SetPageLRU(page);
@@ -866,17 +859,12 @@ void release_pages(struct page **pages, int nr)
}
 
if (PageLRU(page)) {
-   struct lruvec *new_lruvec;
-
-   new_lruvec = mem_cgroup_page_lruvec(page,
-   page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irqrestore(lruvec,
-   flags);
+   struct lruvec *pre_lruvec = lruvec;
+
+   lruvec = relock_page_lruvec_irqsave(page, lruvec,
+   &flags);
+   if (pre_lruvec != lruvec)
lock_batch = 0;
-   lruvec = lock_page_lruvec_irqsave(page, &flags);
-   }
 
__ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, 
page_off_lru(page));
@@ -982,15 +970,8 @@ void __pagevec_lru_add(struct pagevec *pvec)
 
for (i = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
-
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (lruvec != new_lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irqrestore(lruvec, flags);
-   lruvec = lock_page_lruvec_irqsave(page, &flags);
-   }
 
+   lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
__pagevec_lru_add_fn(page, lruvec);
}
if (lruvec)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 168c1659e430..bdb53a678e7e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4292,15 +4292,9 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 
for (i = 0; i < pvec->nr; i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
 
pgscanned++;
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (lruvec != new_lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irq(lruvec);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
 
if (!PageLRU(page) || !PageUnevictable(page))
  

[PATCH v16 13/22] mm/lru: introduce TestClearPageLRU

2020-07-10 Thread Alex Shi
Combine the PageLRU check and ClearPageLRU into one function, the newly
introduced TestClearPageLRU. This function will be used as the page isolation
precondition, to prevent other isolations happening somewhere else.
There may then be non-PageLRU pages on an lru list, so remove the BUG()
checking accordingly.

Hugh Dickins pointed out that __page_cache_release and release_pages
have no need to do an atomic bit clear, since there is no user of the page at
that moment; and there is no need for get_page() before the lru bit clear in
isolate_lru_page, since it '(1) Must be called with an elevated refcount on
the page'.

As Andrew Morton mentioned, this change will dirty the cacheline for a page
that isn't on the LRU. But the loss should be acceptable, per Rong Chen's
report:
https://lkml.org/lkml/2020/3/4/173

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Andrew Morton 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/page-flags.h |  1 +
 mm/mlock.c |  3 +--
 mm/swap.c  |  6 ++
 mm/vmscan.c| 26 +++---
 4 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6be1aa559b1e..9554ed1387dc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -326,6 +326,7 @@ static inline void page_init_poison(struct page *page, 
size_t size)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+   TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/mm/mlock.c b/mm/mlock.c
index f8736136fad7..228ba5a8e0a5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -108,13 +108,12 @@ void mlock_vma_page(struct page *page)
  */
 static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
 {
-   if (PageLRU(page)) {
+   if (TestClearPageLRU(page)) {
struct lruvec *lruvec;
 
lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
if (getpage)
get_page(page);
-   ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_lru(page));
return true;
}
diff --git a/mm/swap.c b/mm/swap.c
index f645965fde0e..5092fe9c8c47 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -83,10 +83,9 @@ static void __page_cache_release(struct page *page)
struct lruvec *lruvec;
unsigned long flags;
 
+   __ClearPageLRU(page);
spin_lock_irqsave(&pgdat->lru_lock, flags);
lruvec = mem_cgroup_page_lruvec(page, pgdat);
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
-   __ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_off_lru(page));
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
}
@@ -878,9 +877,8 @@ void release_pages(struct page **pages, int nr)
spin_lock_irqsave(&locked_pgdat->lru_lock, 
flags);
}
 
-   lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
__ClearPageLRU(page);
+   lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
del_page_from_lru_list(page, lruvec, 
page_off_lru(page));
}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c1c4259b4de5..18986fefd49b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1548,16 +1548,16 @@ int __isolate_lru_page(struct page *page, 
isolate_mode_t mode)
 {
int ret = -EINVAL;
 
-   /* Only take pages on the LRU. */
-   if (!PageLRU(page))
-   return ret;
-
/* Compaction should not handle unevictable pages but CMA can do so */
if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
return ret;
 
ret = -EBUSY;
 
+   /* Only take pages on the LRU. */
+   if (!PageLRU(page))
+   return ret;
+
/*
 * To minimise LRU disruption, the caller can indicate that it only
 * wants to isolate pages it will be able to operate on without
@@ -1671,8 +1671,6 @@ static unsigned long isolate_lru_pages(unsigned long 
nr_to_scan,
page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);
 
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
-
nr_pages = compound_nr(page);
total_scan += nr_pages;
 
@@ -1769,21 +1767,19 @@ int isolate_lru_page(struct page *page)
VM_BUG_ON_PAGE(!page_count(page), page);
WARN_RATELIMIT(PageTail(page), "trying to isolate tail 

[PATCH v16 07/22] mm/thp: remove code path which never got into

2020-07-10 Thread Alex Shi
split_huge_page() is never called on a page which isn't on an lru list, so
this code path never got a chance to run and should not run: it would add
tail pages to an lru list whose head page isn't there.

Although the bug was never triggered, it had better be removed for code
correctness.

BTW, it would look better to have a BUG() or some warning in the wrong
path, but that path will be changed by the incoming new page isolation
func, so just leave it for now.

Signed-off-by: Alex Shi 
Cc: Kirill A. Shutemov 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b18f21da4dac..1fb4147ff854 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2357,16 +2357,6 @@ static void lru_add_page_tail(struct page *head, struct 
page *page_tail,
/* page reclaim is reclaiming a huge page */
get_page(page_tail);
list_add_tail(&page_tail->lru, list);
-   } else {
-   /*
-* Head page has not yet been counted, as an hpage,
-* so we must account for each subpage individually.
-*
-* Put page_tail on the list at the correct position
-* so they all end up in order.
-*/
-   add_page_to_lru_list_tail(page_tail, lruvec,
- page_lru(page_tail));
}
 }
 
-- 
1.8.3.1



[PATCH v16 05/22] mm/thp: move lru_add_page_tail func to huge_memory.c

2020-07-10 Thread Alex Shi
The func is only used in huge_memory.c; defining it in another file under a
CONFIG_TRANSPARENT_HUGEPAGE macro restriction just looks weird.

Let's move it into the THP code. And make it static, as Hugh Dickins suggested.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/swap.h |  2 --
 mm/huge_memory.c | 30 ++
 mm/swap.c| 33 -
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5b3216ba39a9..2c29399b29a0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,8 +339,6 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
  unsigned int nr_pages);
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
-extern void lru_add_page_tail(struct page *page, struct page *page_tail,
-struct lruvec *lruvec, struct list_head *head);
 extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84bee7e29..9e050b13f597 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2340,6 +2340,36 @@ static void remap_page(struct page *page)
}
 }
 
+static void lru_add_page_tail(struct page *page, struct page *page_tail,
+   struct lruvec *lruvec, struct list_head *list)
+{
+   VM_BUG_ON_PAGE(!PageHead(page), page);
+   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
+   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+   lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
+
+   if (!list)
+   SetPageLRU(page_tail);
+
+   if (likely(PageLRU(page)))
+   list_add_tail(&page_tail->lru, &page->lru);
+   else if (list) {
+   /* page reclaim is reclaiming a huge page */
+   get_page(page_tail);
+   list_add_tail(&page_tail->lru, list);
+   } else {
+   /*
+* Head page has not yet been counted, as an hpage,
+* so we must account for each subpage individually.
+*
+* Put page_tail on the list at the correct position
+* so they all end up in order.
+*/
+   add_page_to_lru_list_tail(page_tail, lruvec,
+ page_lru(page_tail));
+   }
+}
+
 static void __split_huge_page_tail(struct page *head, int tail,
struct lruvec *lruvec, struct list_head *list)
 {
diff --git a/mm/swap.c b/mm/swap.c
index a82efc33411f..7701d855873d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -933,39 +933,6 @@ void __pagevec_release(struct pagevec *pvec)
 }
 EXPORT_SYMBOL(__pagevec_release);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/* used by __split_huge_page_refcount() */
-void lru_add_page_tail(struct page *page, struct page *page_tail,
-  struct lruvec *lruvec, struct list_head *list)
-{
-   VM_BUG_ON_PAGE(!PageHead(page), page);
-   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
-   lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
-
-   if (!list)
-   SetPageLRU(page_tail);
-
-   if (likely(PageLRU(page)))
-   list_add_tail(&page_tail->lru, &page->lru);
-   else if (list) {
-   /* page reclaim is reclaiming a huge page */
-   get_page(page_tail);
-   list_add_tail(&page_tail->lru, list);
-   } else {
-   /*
-* Head page has not yet been counted, as an hpage,
-* so we must account for each subpage individually.
-*
-* Put page_tail on the list at the correct position
-* so they all end up in order.
-*/
-   add_page_to_lru_list_tail(page_tail, lruvec,
- page_lru(page_tail));
-   }
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 void *arg)
 {
-- 
1.8.3.1



[PATCH v16 10/22] mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn

2020-07-10 Thread Alex Shi
Fold the PGROTATED event collection into the pagevec_move_tail_fn callback,
like the other callbacks used with pagevec_lru_move_fn do. Now all usages of
pagevec_lru_move_fn are the same, and there is no need for the 3rd parameter.

This simplifies the calling.

[l...@intel.com: found a build issue in the original patch, thanks]
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c | 66 +++
 1 file changed, 24 insertions(+), 42 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 7701d855873d..dc8b02cdddcb 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -204,8 +204,7 @@ int get_kernel_page(unsigned long start, int write, struct 
page **pages)
 EXPORT_SYMBOL_GPL(get_kernel_page);
 
 static void pagevec_lru_move_fn(struct pagevec *pvec,
-   void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
-   void *arg)
+   void (*move_fn)(struct page *page, struct lruvec *lruvec))
 {
int i;
struct pglist_data *pgdat = NULL;
@@ -224,7 +223,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
}
 
lruvec = mem_cgroup_page_lruvec(page, pgdat);
-   (*move_fn)(page, lruvec, arg);
+   (*move_fn)(page, lruvec);
}
if (pgdat)
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
@@ -232,35 +231,23 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
pagevec_reinit(pvec);
 }
 
-static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
-void *arg)
+static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 {
-   int *pgmoved = arg;
-
if (PageLRU(page) && !PageUnevictable(page)) {
del_page_from_lru_list(page, lruvec, page_lru(page));
ClearPageActive(page);
add_page_to_lru_list_tail(page, lruvec, page_lru(page));
-   (*pgmoved) += hpage_nr_pages(page);
+   __count_vm_events(PGROTATED, hpage_nr_pages(page));
}
 }
 
 /*
- * pagevec_move_tail() must be called with IRQ disabled.
- * Otherwise this may cause nasty races.
- */
-static void pagevec_move_tail(struct pagevec *pvec)
-{
-   int pgmoved = 0;
-
-   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved);
-   __count_vm_events(PGROTATED, pgmoved);
-}
-
-/*
  * Writeback is about to end against a page which has been marked for immediate
  * reclaim.  If it still appears to be reclaimable, move it to the tail of the
  * inactive list.
+ *
+ * pagevec_move_tail_fn() must be called with IRQ disabled.
+ * Otherwise this may cause nasty races.
  */
 void rotate_reclaimable_page(struct page *page)
 {
@@ -273,7 +260,7 @@ void rotate_reclaimable_page(struct page *page)
local_lock_irqsave(&lru_rotate.lock, flags);
pvec = this_cpu_ptr(&lru_rotate.pvec);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_move_tail(pvec);
+   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
 }
@@ -315,8 +302,7 @@ void lru_note_cost_page(struct page *page)
  page_is_file_lru(page), hpage_nr_pages(page));
 }
 
-static void __activate_page(struct page *page, struct lruvec *lruvec,
-   void *arg)
+static void __activate_page(struct page *page, struct lruvec *lruvec)
 {
if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
int lru = page_lru_base_type(page);
@@ -340,7 +326,7 @@ static void activate_page_drain(int cpu)
struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu);
 
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, __activate_page, NULL);
+   pagevec_lru_move_fn(pvec, __activate_page);
 }
 
 static bool need_activate_page_drain(int cpu)
@@ -358,7 +344,7 @@ void activate_page(struct page *page)
pvec = this_cpu_ptr(&lru_pvecs.activate_page);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, __activate_page, NULL);
+   pagevec_lru_move_fn(pvec, __activate_page);
local_unlock(&lru_pvecs.lock);
}
 }
@@ -374,7 +360,7 @@ void activate_page(struct page *page)
 
page = compound_head(page);
spin_lock_irq(&pgdat->lru_lock);
-   __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
+   __activate_page(page, mem_cgroup_page_lruvec(page, pgdat));
spin_unlock_irq(&pgdat->lru_lock);
 }
 #endif
@@ -526,8 +512,7 @@ void lru_cache_add_active_or_unevictable(struct page *page,
  * be write it out by flusher threads as this is much more effective
  * than the single-page writeout from reclaim.
  */
-static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
-   

[PATCH v16 20/22] mm/vmscan: use relock for move_pages_to_lru

2020-07-10 Thread Alex Shi
From: Hugh Dickins 

Use the relock function to replace the open-coded relocking, and try to save
a few lock/unlock cycles.

Signed-off-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: Andrey Ryabinin 
Cc: Jann Horn 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: cgro...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 mm/vmscan.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bdb53a678e7e..078a1640ec60 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1854,15 +1854,15 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
enum lru_list lru;
 
while (!list_empty(list)) {
-   struct lruvec *new_lruvec = NULL;
-
page = lru_to_page(list);
VM_BUG_ON_PAGE(PageLRU(page), page);
	list_del(&page->lru);
if (unlikely(!page_evictable(page))) {
-   spin_unlock_irq(&lruvec->lru_lock);
+   if (lruvec) {
+   spin_unlock_irq(&lruvec->lru_lock);
+   lruvec = NULL;
+   }
	putback_lru_page(page);
-   spin_lock_irq(&lruvec->lru_lock);
continue;
}
 
@@ -1876,12 +1876,7 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
 *list_add(&page->lru,)
 * list_add(&page->lru,) //corrupt
 */
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   spin_unlock_irq(&lruvec->lru_lock);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
SetPageLRU(page);
 
if (unlikely(put_page_testzero(page))) {
@@ -1890,8 +1885,8 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
 
if (unlikely(PageCompound(page))) {
	spin_unlock_irq(&lruvec->lru_lock);
+   lruvec = NULL;
destroy_compound_page(page);
-   spin_lock_irq(&lruvec->lru_lock);
} else
	list_add(&page->lru, &pages_to_free);
 
-- 
1.8.3.1



[PATCH v16 15/22] mm/compaction: do page isolation first in compaction

2020-07-10 Thread Alex Shi
Johannes Weiner has suggested:
"So here is a crazy idea that may be worth exploring:

Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
linked list.

Can we make PageLRU atomic and use it to stabilize the lru_lock
instead, and then use the lru_lock only serialize list operations?
..."

Yes, this patch does so in __isolate_lru_page, the core page isolation
function on the compaction and shrinking paths.
With this patch, compaction only deals with pages that have PageLRU set
and are now isolated, skipping freshly allocated pages that carry no LRU
bit yet. The isolation is also exclusive against the other isolation
users: memcg move_account, page migration and THP split_huge_page.

As a side effect, PageLRU may be cleared during the shrink_inactive_list
path because of a concurrent isolation. If so, we can skip that page.
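
Boiled down, the new isolation order in isolate_migratepages_block()
(a simplified sketch of the hunks below) is:

	/* Take a reference first, then atomically claim the LRU bit;
	 * only the thread that clears PageLRU may isolate the page,
	 * which excludes all the other isolation users.
	 */
	if (unlikely(!get_page_unless_zero(page)))
		goto isolate_fail;		/* page is being freed */

	if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
		goto isolate_fail_put;		/* isolate mode disallows it */

	if (!TestClearPageLRU(page))
		goto isolate_fail_put;		/* lost the race: no LRU bit */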

Hugh Dickins fixed the following bugs in this patch's early version:

Fix lots of crashes under compaction load: isolate_migratepages_block()
must clean up appropriately when rejecting a page, setting PageLRU again
if it had been cleared; and a put_page() after get_page_unless_zero()
cannot safely be done while holding locked_lruvec - it may turn out to
be the final put_page(), which will take an lruvec lock when PageLRU.
And move __isolate_lru_page_prepare back after get_page_unless_zero to
make trylock_page() safe:
trylock_page() is not safe to use at this time: its setting PG_locked
can race with the page being freed or allocated ("Bad page"), and can
also erase flags being set by one of those "sole owners" of a freshly
allocated page who use non-atomic __SetPageFlag().

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/swap.h |  2 +-
 mm/compaction.c  | 42 +-
 mm/vmscan.c  | 38 ++
 3 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2c29399b29a0..6d23d3beeff7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -358,7 +358,7 @@ extern void lru_cache_add_active_or_unevictable(struct page 
*page,
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *mask);
-extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
+extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
  unsigned long nr_pages,
  gfp_t gfp_mask,
diff --git a/mm/compaction.c b/mm/compaction.c
index f14780fc296a..2da2933fe56b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -869,6 +869,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
low_pfn = end_pfn;
+   page = NULL;
goto isolate_abort;
}
valid_page = page;
@@ -950,6 +951,21 @@ static bool too_many_isolated(pg_data_t *pgdat)
if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
goto isolate_fail;
 
+   /*
+* Be careful not to clear PageLRU until after we're
+* sure the page is not being freed elsewhere -- the
+* page release code relies on it.
+*/
+   if (unlikely(!get_page_unless_zero(page)))
+   goto isolate_fail;
+
+   if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
+   goto isolate_fail_put;
+
+   /* Try isolate the page */
+   if (!TestClearPageLRU(page))
+   goto isolate_fail_put;
+
/* If we already hold the lock, we can skip some rechecking */
if (!locked) {
	locked = compact_lock_irqsave(&pgdat->lru_lock,
@@ -962,10 +978,6 @@ static bool too_many_isolated(pg_data_t *pgdat)
goto isolate_abort;
}
 
-   /* Recheck PageLRU and PageCompound under lock */
-   if (!PageLRU(page))
-   goto isolate_fail;
-
/*
 * Page become compound since the non-locked check,
 * and it's on LRU. It can only be a THP so the order
@@ -973,16 +985,13 @@ static bool too_many_isolated(pg_data_t *pgdat)
 */
if (unlikely(PageCompound(page) 

[GIT PULL] SMB3 Fixes

2020-07-10 Thread Steve French
Please pull the following changes since commit
dcb7fd82c75ee2d6e6f9d8cc71c52519ed52e258:

  Linux 5.8-rc4 (2020-07-05 16:20:22 -0700)

are available in the Git repository at:

  git://git.samba.org/sfrench/cifs-2.6.git tags/5.8-rc4-smb3-fixes

for you to fetch changes up to a8dab63ea623610bb258d93649e30330dd1b7c8b:

  cifs: update internal module version number (2020-07-09 10:07:09 -0500)


Four cifs/smb3 fixes: the three marked for stable fix problems found
recently with change notification, including a reference count leak.

Regression test results:
http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/2/builds/367

Ronnie Sahlberg (1):
  cifs: fix reference leak for tlink

Steve French (3):
  smb3: fix access denied on change notify request to some servers
  smb3: fix unneeded error message on change notify
  cifs: update internal module version number

yangerkun (1):
  cifs: remove the retry in cifs_poxis_lock_set

 fs/cifs/cifsfs.h   |  2 +-
 fs/cifs/file.c | 19 ++-
 fs/cifs/ioctl.c|  9 -
 fs/cifs/smb2misc.c |  8 ++--
 fs/cifs/smb2ops.c  |  2 +-
 5 files changed, 22 insertions(+), 18 deletions(-)

--
Thanks,

Steve


Re: [PATCH 11/12] device-dax: Add dis-contiguous resource support

2020-07-10 Thread Dan Williams
On Tue, May 12, 2020 at 7:37 AM Joao Martins  wrote:
>
> On 3/23/20 11:55 PM, Dan Williams wrote:
> > @@ -561,13 +580,26 @@ static int __alloc_dev_dax_range(struct dev_dax 
> > *dev_dax, u64 start,
> >   if (start == U64_MAX)
> >   return -EINVAL;
> >
> > + ranges = krealloc(dev_dax->ranges, sizeof(*ranges)
> > + * (dev_dax->nr_range + 1), GFP_KERNEL);
> > + if (!ranges)
> > + return -ENOMEM;
> > +
> >   alloc = __request_region(res, start, size, dev_name(dev), 0);
> > - if (!alloc)
> > + if (!alloc) {
> > + kfree(ranges);
> >   return -ENOMEM;
> > + }
>
> Noticed this yesterday while looking at alloc_dev_dax_range().
>
> Is it correct to free @ranges here on __request_region failure?
>
> IIUC krealloc() would free dev_dax->ranges if it succeeds, leaving us without
> any valid ranges if __request_region failure case indeed frees @ranges. These
> @ranges are being used afterwards when we delete the interface and free the
> assigned regions. Perhaps we should remove the kfree() above and set
> dev_dax->ranges instead before __request_region; or alternatively change the
> call order between krealloc and __request_region? FWIW, krealloc checks if the
> object being reallocated already meets the requested size, so perhaps there's 
> no
> harm with going with the former.

Yeah, the kfree is bogus. The array can just wait until the device is
destroyed to be freed -- but only if there is an existing allocation. If
this is a new allocation then nothing else will do the kfree.
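
In other words, a sketch of the fix (assuming dev_dax->ranges is
assigned right after a successful krealloc(), so the grown array is
owned by the device from then on and freed at teardown):

	ranges = krealloc(dev_dax->ranges,
			sizeof(*ranges) * (dev_dax->nr_range + 1), GFP_KERNEL);
	if (!ranges)
		return -ENOMEM;
	/* own the (possibly moved) array now, before anything else can fail */
	dev_dax->ranges = ranges;

	alloc = __request_region(res, start, size, dev_name(dev), 0);
	if (!alloc)
		return -ENOMEM;		/* note: no kfree() of ranges here */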


[PATCH v2 7/8] drm/msm/dpu: add SM8150 to hw catalog

2020-07-10 Thread Jonathan Marek
This brings up basic video mode functionality for the SM8150 DPU. Command
mode and dual mixer/intf configurations are not working; future patches
will address this. Scaler functionality and multiple planes are also
untested.

Signed-off-by: Jonathan Marek 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 148 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h   |   2 +
 2 files changed, 150 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index 1d19c377b096..20f869bbd574 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -92,6 +92,23 @@ static const struct dpu_caps sc7180_dpu_caps = {
.pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
 };
 
+static const struct dpu_caps sm8150_dpu_caps = {
+   .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
+   .max_mixer_blendstages = 0xb,
+   .max_linewidth = 4096,
+   .qseed_type = DPU_SSPP_SCALER_QSEED3,
+   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
+   .ubwc_version = DPU_HW_UBWC_VER_30,
+   .has_src_split = true,
+   .has_dim_layer = true,
+   .has_idle_pc = true,
+   .has_3d_merge = true,
+   .max_linewidth = 4096,
+   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
+   .max_hdeci_exp = MAX_HORZ_DECIMATION,
+   .max_vdeci_exp = MAX_VERT_DECIMATION,
+};
+
 static const struct dpu_mdp_cfg sdm845_mdp[] = {
{
.name = "top_0", .id = MDP_TOP,
@@ -183,6 +200,39 @@ static const struct dpu_ctl_cfg sc7180_ctl[] = {
},
 };
 
+static const struct dpu_ctl_cfg sm8150_ctl[] = {
+   {
+   .name = "ctl_0", .id = CTL_0,
+   .base = 0x1000, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG) | BIT(DPU_CTL_SPLIT_DISPLAY)
+   },
+   {
+   .name = "ctl_1", .id = CTL_1,
+   .base = 0x1200, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG) | BIT(DPU_CTL_SPLIT_DISPLAY)
+   },
+   {
+   .name = "ctl_2", .id = CTL_2,
+   .base = 0x1400, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG)
+   },
+   {
+   .name = "ctl_3", .id = CTL_3,
+   .base = 0x1600, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG)
+   },
+   {
+   .name = "ctl_4", .id = CTL_4,
+   .base = 0x1800, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG)
+   },
+   {
+   .name = "ctl_5", .id = CTL_5,
+   .base = 0x1a00, .len = 0x1e0,
+   .features = BIT(DPU_CTL_ACTIVE_CFG)
+   },
+};
+
 /*
  * SSPP sub blocks config
  */
@@ -338,6 +388,23 @@ static const struct dpu_lm_cfg sc7180_lm[] = {
		&sc7180_lm_sblk, PINGPONG_1, LM_0, 0),
 };
 
+/* SM8150 */
+
+static const struct dpu_lm_cfg sm8150_lm[] = {
+   LM_BLK("lm_0", LM_0, 0x44000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_0, LM_1, 0),
+   LM_BLK("lm_1", LM_1, 0x45000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_1, LM_0, 0),
+   LM_BLK("lm_2", LM_2, 0x46000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_2, LM_3, 0),
+   LM_BLK("lm_3", LM_3, 0x47000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_3, LM_2, 0),
+   LM_BLK("lm_4", LM_4, 0x48000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_4, LM_5, 0),
+   LM_BLK("lm_5", LM_5, 0x49000, MIXER_SDM845_MASK,
+   _lm_sblk, PINGPONG_5, LM_4, 0),
+};
+
 /*
  * DSPP sub blocks config
  */
@@ -357,6 +424,7 @@ static const struct dpu_dspp_sub_blks sc7180_dspp_sblk = {
 static const struct dpu_dspp_cfg sc7180_dspp[] = {
DSPP_BLK("dspp_0", DSPP_0, 0x54000),
 };
+
 /*
  * PINGPONG sub blocks config
  */
@@ -399,6 +467,15 @@ static struct dpu_pingpong_cfg sc7180_pp[] = {
PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800),
 };
 
+static const struct dpu_pingpong_cfg sm8150_pp[] = {
+   PP_BLK_TE("pingpong_0", PINGPONG_0, 0x7),
+   PP_BLK_TE("pingpong_1", PINGPONG_1, 0x70800),
+   PP_BLK("pingpong_2", PINGPONG_2, 0x71000),
+   PP_BLK("pingpong_3", PINGPONG_3, 0x71800),
+   PP_BLK("pingpong_4", PINGPONG_4, 0x72000),
+   PP_BLK("pingpong_5", PINGPONG_5, 0x72800),
+};
+
 /*
  * INTF sub blocks config
  */
@@ -424,6 +501,13 @@ static const struct dpu_intf_cfg sc7180_intf[] = {
INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, INTF_SC7180_MASK),
 };
 
+static const struct dpu_intf_cfg sm8150_intf[] = {
+   INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, INTF_SC7180_MASK),
+  

[PATCH v2 8/8] drm/msm/dpu: add SM8250 to hw catalog

2020-07-10 Thread Jonathan Marek
This brings up basic video mode functionality for the SM8250 DPU. Command
mode and dual mixer/intf configurations are not working; future patches
will address this. Scaler functionality and multiple planes are also
untested.

Signed-off-by: Jonathan Marek 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 106 ++
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|   3 +
 2 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index 20f869bbd574..17e9223e5a2e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -109,6 +109,21 @@ static const struct dpu_caps sm8150_dpu_caps = {
.max_vdeci_exp = MAX_VERT_DECIMATION,
 };
 
+static const struct dpu_caps sm8250_dpu_caps = {
+   .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
+   .max_mixer_blendstages = 0xb,
+   .max_linewidth = 4096,
+   .qseed_type = DPU_SSPP_SCALER_QSEED3, /* TODO: qseed3 lite */
+   .smart_dma_rev = DPU_SSPP_SMART_DMA_V2, /* TODO: v2.5 */
+   .ubwc_version = DPU_HW_UBWC_VER_40,
+   .has_src_split = true,
+   .has_dim_layer = true,
+   .has_idle_pc = true,
+   .has_3d_merge = true,
+   .max_linewidth = 4096,
+   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
+};
+
 static const struct dpu_mdp_cfg sdm845_mdp[] = {
{
.name = "top_0", .id = MDP_TOP,
@@ -151,6 +166,33 @@ static const struct dpu_mdp_cfg sc7180_mdp[] = {
},
 };
 
+static const struct dpu_mdp_cfg sm8250_mdp[] = {
+   {
+   .name = "top_0", .id = MDP_TOP,
+   .base = 0x0, .len = 0x45C,
+   .features = 0,
+   .highest_bank_bit = 0x3, /* TODO: 2 for LP_DDR4 */
+   .clk_ctrls[DPU_CLK_CTRL_VIG0] = {
+   .reg_off = 0x2AC, .bit_off = 0},
+   .clk_ctrls[DPU_CLK_CTRL_VIG1] = {
+   .reg_off = 0x2B4, .bit_off = 0},
+   .clk_ctrls[DPU_CLK_CTRL_VIG2] = {
+   .reg_off = 0x2BC, .bit_off = 0},
+   .clk_ctrls[DPU_CLK_CTRL_VIG3] = {
+   .reg_off = 0x2C4, .bit_off = 0},
+   .clk_ctrls[DPU_CLK_CTRL_DMA0] = {
+   .reg_off = 0x2AC, .bit_off = 8},
+   .clk_ctrls[DPU_CLK_CTRL_DMA1] = {
+   .reg_off = 0x2B4, .bit_off = 8},
+   .clk_ctrls[DPU_CLK_CTRL_CURSOR0] = {
+   .reg_off = 0x2BC, .bit_off = 8},
+   .clk_ctrls[DPU_CLK_CTRL_CURSOR1] = {
+   .reg_off = 0x2C4, .bit_off = 8},
+   .clk_ctrls[DPU_CLK_CTRL_REG_DMA] = {
+   .reg_off = 0x2BC, .bit_off = 20},
+   },
+};
+
 /*
  * CTL sub blocks config
  */
@@ -542,6 +584,14 @@ static const struct dpu_reg_dma_cfg sm8150_regdma = {
.base = 0x0, .version = 0x00010001, .trigger_sel_off = 0x119c
 };
 
+static const struct dpu_reg_dma_cfg sm8250_regdma = {
+   .base = 0x0,
+   .version = 0x00010002,
+   .trigger_sel_off = 0x119c,
+   .xin_id = 7,
+   .clk_ctrl = DPU_CLK_CTRL_REG_DMA,
+};
+
 /*
  * PERF data config
  */
@@ -679,6 +729,31 @@ static const struct dpu_perf_cfg sm8150_perf_data = {
},
 };
 
+static const struct dpu_perf_cfg sm8250_perf_data = {
+   .max_bw_low = 1370,
+   .max_bw_high = 1660,
+   .min_core_ib = 480,
+   .min_llcc_ib = 0,
+   .min_dram_ib = 80,
+   .danger_lut_tbl = {0xf, 0x, 0x0},
+   .qos_lut_tbl = {
+   {.nentry = ARRAY_SIZE(sc7180_qos_linear),
+   .entries = sc7180_qos_linear
+   },
+   {.nentry = ARRAY_SIZE(sc7180_qos_macrotile),
+   .entries = sc7180_qos_macrotile
+   },
+   {.nentry = ARRAY_SIZE(sc7180_qos_nrt),
+   .entries = sc7180_qos_nrt
+   },
+   /* TODO: macrotile-qseed is different from macrotile */
+   },
+   .cdp_cfg = {
+   {.rd_enable = 1, .wr_enable = 1},
+   {.rd_enable = 1, .wr_enable = 0}
+   },
+};
+
 /*
  * Hardware catalog init
  */
@@ -772,11 +847,42 @@ static void sm8150_cfg_init(struct dpu_mdss_cfg *dpu_cfg)
};
 }
 
+/*
+ * sm8250_cfg_init(): populate sm8250 dpu sub-blocks reg offsets
+ * and instance counts.
+ */
+static void sm8250_cfg_init(struct dpu_mdss_cfg *dpu_cfg)
+{
+   *dpu_cfg = (struct dpu_mdss_cfg){
+   .caps = _dpu_caps,
+   .mdp_count = ARRAY_SIZE(sm8250_mdp),
+   .mdp = sm8250_mdp,
+   .ctl_count = ARRAY_SIZE(sm8150_ctl),
+   .ctl = sm8150_ctl,
+   /* 

[PATCH v2 6/8] drm/msm/dpu: intf timing path for displayport

2020-07-10 Thread Jonathan Marek
Calculate the correct timings for displayport, from downstream driver.

Signed-off-by: Jonathan Marek 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 64f556d693dd..6f0f54588124 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -107,11 +107,6 @@ static void dpu_hw_intf_setup_timing_engine(struct 
dpu_hw_intf *ctx,
display_v_end = ((vsync_period - p->v_front_porch) * hsync_period) +
p->hsync_skew - 1;
 
-   if (ctx->cap->type == INTF_EDP || ctx->cap->type == INTF_DP) {
-   display_v_start += p->hsync_pulse_width + p->h_back_porch;
-   display_v_end -= p->h_front_porch;
-   }
-
hsync_start_x = p->h_back_porch + p->hsync_pulse_width;
hsync_end_x = hsync_period - p->h_front_porch - 1;
 
@@ -144,10 +139,25 @@ static void dpu_hw_intf_setup_timing_engine(struct 
dpu_hw_intf *ctx,
hsync_ctl = (hsync_period << 16) | p->hsync_pulse_width;
display_hctl = (hsync_end_x << 16) | hsync_start_x;
 
+   if (ctx->cap->type == INTF_EDP || ctx->cap->type == INTF_DP) {
+   active_h_start = hsync_start_x;
+   active_h_end = active_h_start + p->xres - 1;
+   active_v_start = display_v_start;
+   active_v_end = active_v_start + (p->yres * hsync_period) - 1;
+
+   display_v_start += p->hsync_pulse_width + p->h_back_porch;
+
+   active_hctl = (active_h_end << 16) | active_h_start;
+   display_hctl = active_hctl;
+   }
+
den_polarity = 0;
if (ctx->cap->type == INTF_HDMI) {
hsync_polarity = p->yres >= 720 ? 0 : 1;
vsync_polarity = p->yres >= 720 ? 0 : 1;
+   } else if (ctx->cap->type == INTF_DP) {
+   hsync_polarity = p->hsync_polarity;
+   vsync_polarity = p->vsync_polarity;
} else {
hsync_polarity = 0;
vsync_polarity = 0;
-- 
2.26.1



[PATCH v2 2/8] drm/msm/dpu: update UBWC config for sm8150 and sm8250

2020-07-10 Thread Jonathan Marek
Update the UBWC registers to the right values for sm8150 and sm8250.

This removes the broken dpu_hw_reset_ubwc, which doesn't work because the
"force blk offset to zero to access beginning of register region" hack is
copied from downstream, where the mapped region starts 0x1000 below what
the upstream driver uses.

Also simplifies the overly complicated change that was introduced in
e4f9bbe9f8beab9a1ce4 to work around dpu_hw_reset_ubwc being broken.

Signed-off-by: Jonathan Marek 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   |  8 --
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|  8 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c   | 16 +++-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c| 18 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.h|  7 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c  | 75 ++-
 6 files changed, 42 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 148c6d71e6c1..46df0ff75b85 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -1115,7 +1115,6 @@ static void _dpu_encoder_virt_enable_helper(struct 
drm_encoder *drm_enc)
 {
struct dpu_encoder_virt *dpu_enc = NULL;
struct msm_drm_private *priv;
-   struct dpu_kms *dpu_kms;
int i;
 
if (!drm_enc || !drm_enc->dev) {
@@ -1124,7 +1123,6 @@ static void _dpu_encoder_virt_enable_helper(struct 
drm_encoder *drm_enc)
}
 
priv = drm_enc->dev->dev_private;
-   dpu_kms = to_dpu_kms(priv->kms);
 
dpu_enc = to_dpu_encoder_virt(drm_enc);
if (!dpu_enc || !dpu_enc->cur_master) {
@@ -1132,12 +1130,6 @@ static void _dpu_encoder_virt_enable_helper(struct 
drm_encoder *drm_enc)
return;
}
 
-   if (dpu_enc->cur_master->hw_mdptop &&
-   dpu_enc->cur_master->hw_mdptop->ops.reset_ubwc)
-   dpu_enc->cur_master->hw_mdptop->ops.reset_ubwc(
-   dpu_enc->cur_master->hw_mdptop,
-   dpu_kms->catalog);
-
	_dpu_encoder_update_vsync_source(dpu_enc, &dpu_enc->disp_info);
 
if (dpu_enc->disp_info.intf_type == DRM_MODE_ENCODER_DSI) {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index f7de43838c69..63512753b369 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -37,7 +37,9 @@
 #define DPU_HW_VER_400 DPU_HW_VER(4, 0, 0) /* sdm845 v1.0 */
 #define DPU_HW_VER_401 DPU_HW_VER(4, 0, 1) /* sdm845 v2.0 */
 #define DPU_HW_VER_410 DPU_HW_VER(4, 1, 0) /* sdm670 v1.0 */
-#define DPU_HW_VER_500 DPU_HW_VER(5, 0, 0) /* sdm855 v1.0 */
+#define DPU_HW_VER_500 DPU_HW_VER(5, 0, 0) /* sm8150 v1.0 */
+#define DPU_HW_VER_501 DPU_HW_VER(5, 0, 1) /* sm8150 v2.0 */
+#define DPU_HW_VER_600 DPU_HW_VER(6, 0, 0) /* sm8250 */
 #define DPU_HW_VER_620 DPU_HW_VER(6, 2, 0) /* sc7180 v1.0 */
 
 
@@ -65,10 +67,9 @@ enum {
DPU_HW_UBWC_VER_10 = 0x100,
DPU_HW_UBWC_VER_20 = 0x200,
DPU_HW_UBWC_VER_30 = 0x300,
+   DPU_HW_UBWC_VER_40 = 0x400,
 };
 
-#define IS_UBWC_20_SUPPORTED(rev)   ((rev) >= DPU_HW_UBWC_VER_20)
-
 /**
  * MDP TOP BLOCK features
  * @DPU_MDP_PANIC_PER_PIPE Panic configuration needs to be be done per pipe
@@ -447,7 +448,6 @@ struct dpu_clk_ctrl_reg {
 struct dpu_mdp_cfg {
DPU_HW_BLK_INFO;
u32 highest_bank_bit;
-   u32 ubwc_static;
u32 ubwc_swizzle;
struct dpu_clk_ctrl_reg clk_ctrls[DPU_CLK_CTRL_MAX];
 };
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
index 82c5dbfdabc7..c940b69435e1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c
@@ -303,11 +303,25 @@ static void dpu_hw_sspp_setup_format(struct dpu_hw_pipe 
*ctx,
DPU_REG_WRITE(c, SSPP_FETCH_CONFIG,
DPU_FETCH_CONFIG_RESET_VALUE |
ctx->mdp->highest_bank_bit << 18);
-   if (IS_UBWC_20_SUPPORTED(ctx->catalog->caps->ubwc_version)) {
+   switch (ctx->catalog->caps->ubwc_version) {
+   case DPU_HW_UBWC_VER_10:
+   /* TODO: UBWC v1 case */
+   break;
+   case DPU_HW_UBWC_VER_20:
fast_clear = fmt->alpha_enable ? BIT(31) : 0;
DPU_REG_WRITE(c, SSPP_UBWC_STATIC_CTRL,
fast_clear | (ctx->mdp->ubwc_swizzle) |
(ctx->mdp->highest_bank_bit << 4));
+   break;
+   case DPU_HW_UBWC_VER_30:
+   DPU_REG_WRITE(c, SSPP_UBWC_STATIC_CTRL,
+   BIT(30) | (ctx->mdp->ubwc_swizzle) |
+   

[PATCH v2 0/8] Initial SM8150 and SM8250 DPU bringup

2020-07-10 Thread Jonathan Marek
These patches bring up SM8150 and SM8250 with basic functionality.

Tested with displayport output (single mixer, video mode case).

v2: rebased

Jonathan Marek (8):
  drm/msm/dpu: use right setup_blend_config for sm8150 and sm8250
  drm/msm/dpu: update UBWC config for sm8150 and sm8250
  drm/msm/dpu: move some sspp caps to dpu_caps
  drm/msm/dpu: don't use INTF_INPUT_CTRL feature on sdm845
  drm/msm/dpu: set missing flush bits for INTF_2 and INTF_3
  drm/msm/dpu: intf timing path for displayport
  drm/msm/dpu: add SM8150 to hw catalog
  drm/msm/dpu: add SM8250 to hw catalog

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   |   8 -
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 288 +-
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h|  48 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c|  20 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c   |  29 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c |   5 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_mdss.h   |   2 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c   |  16 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.c|  18 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_top.h|   7 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c  |  75 ++---
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c |   6 +-
 12 files changed, 364 insertions(+), 158 deletions(-)

-- 
2.26.1



[PATCH v2 1/8] drm/msm/dpu: use right setup_blend_config for sm8150 and sm8250

2020-07-10 Thread Jonathan Marek
All DPU versions starting from 4.0 use the sdm845 version, so check for
that instead of checking each version individually. This chooses the right
function for sm8150 and sm8250.

Signed-off-by: Jonathan Marek 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c
index 37becd43bd54..4b8baf71423f 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_lm.c
@@ -152,14 +152,13 @@ static void _setup_mixer_ops(const struct dpu_mdss_cfg *m,
unsigned long features)
 {
ops->setup_mixer_out = dpu_hw_lm_setup_out;
-   if (IS_SDM845_TARGET(m->hwversion) || IS_SDM670_TARGET(m->hwversion)
-   || IS_SC7180_TARGET(m->hwversion))
+   if (m->hwversion >= DPU_HW_VER_400)
ops->setup_blend_config = dpu_hw_lm_setup_blend_config_sdm845;
else
ops->setup_blend_config = dpu_hw_lm_setup_blend_config;
ops->setup_alpha_out = dpu_hw_lm_setup_color3;
ops->setup_border_color = dpu_hw_lm_setup_border_color;
-};
+}
 
 static struct dpu_hw_blk_ops dpu_hw_ops;
 
-- 
2.26.1



[PATCH v2 5/8] drm/msm/dpu: set missing flush bits for INTF_2 and INTF_3

2020-07-10 Thread Jonathan Marek
This fixes flushing of INTF_2 and INTF_3 on SM8150 and SM8250 hardware.

Signed-off-by: Jonathan Marek 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
index 613ae8f0cfcd..758c355b4fd8 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
@@ -245,30 +245,14 @@ static int dpu_hw_ctl_get_bitmask_intf(struct dpu_hw_ctl 
*ctx,
 static int dpu_hw_ctl_get_bitmask_intf_v1(struct dpu_hw_ctl *ctx,
u32 *flushbits, enum dpu_intf intf)
 {
-   switch (intf) {
-   case INTF_0:
-   case INTF_1:
-   *flushbits |= BIT(31);
-   break;
-   default:
-   return 0;
-   }
+   *flushbits |= BIT(31);
return 0;
 }
 
 static int dpu_hw_ctl_active_get_bitmask_intf(struct dpu_hw_ctl *ctx,
u32 *flushbits, enum dpu_intf intf)
 {
-   switch (intf) {
-   case INTF_0:
-   *flushbits |= BIT(0);
-   break;
-   case INTF_1:
-   *flushbits |= BIT(1);
-   break;
-   default:
-   return 0;
-   }
+   *flushbits |= BIT(intf - INTF_0);
return 0;
 }
 
-- 
2.26.1



[PATCH v2 4/8] drm/msm/dpu: don't use INTF_INPUT_CTRL feature on sdm845

2020-07-10 Thread Jonathan Marek
The INTF_INPUT_CTRL feature is not available on sdm845, so don't set it.

This also adds separate feature bits for INTF (based on downstream)
instead of reusing a CTL feature bit for it, and removes the unnecessary
NULL check in the added bind_pingpong_blk function.

Fixes: 73bfb790ac786ca55fa2786a06f59 ("msm:disp:dpu1: setup display datapath 
for SC7180 target")

Signed-off-by: Jonathan Marek 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 20 +++
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h| 13 
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c   |  9 ++---
 3 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index f4ccbe56a09e..1d19c377b096 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -43,6 +43,10 @@
 
 #define DSPP_SC7180_MASK BIT(DPU_DSPP_PCC)
 
+#define INTF_SDM845_MASK (0)
+
+#define INTF_SC7180_MASK BIT(DPU_INTF_INPUT_CTRL) | BIT(DPU_INTF_TE)
+
 #define DEFAULT_PIXEL_RAM_SIZE (50 * 1024)
 #define DEFAULT_DPU_LINE_WIDTH 2048
 #define DEFAULT_DPU_OUTPUT_LINE_WIDTH  2560
@@ -398,26 +402,26 @@ static struct dpu_pingpong_cfg sc7180_pp[] = {
 /*
  * INTF sub blocks config
  */
-#define INTF_BLK(_name, _id, _base, _type, _ctrl_id) \
+#define INTF_BLK(_name, _id, _base, _type, _ctrl_id, _features) \
{\
.name = _name, .id = _id, \
.base = _base, .len = 0x280, \
-   .features = BIT(DPU_CTL_ACTIVE_CFG), \
+   .features = _features, \
.type = _type, \
.controller_id = _ctrl_id, \
.prog_fetch_lines_worst_case = 24 \
}
 
 static const struct dpu_intf_cfg sdm845_intf[] = {
-   INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0),
-   INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0),
-   INTF_BLK("intf_2", INTF_2, 0x6B000, INTF_DSI, 1),
-   INTF_BLK("intf_3", INTF_3, 0x6B800, INTF_DP, 1),
+   INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, INTF_SDM845_MASK),
+   INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, INTF_SDM845_MASK),
+   INTF_BLK("intf_2", INTF_2, 0x6B000, INTF_DSI, 1, INTF_SDM845_MASK),
+   INTF_BLK("intf_3", INTF_3, 0x6B800, INTF_DP, 1, INTF_SDM845_MASK),
 };
 
 static const struct dpu_intf_cfg sc7180_intf[] = {
-   INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0),
-   INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0),
+   INTF_BLK("intf_0", INTF_0, 0x6A000, INTF_DP, 0, INTF_SC7180_MASK),
+   INTF_BLK("intf_1", INTF_1, 0x6A800, INTF_DSI, 0, INTF_SC7180_MASK),
 };
 
 /*
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index a6221fdc02d2..e9458c85e20c 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -186,6 +186,19 @@ enum {
DPU_CTL_MAX
 };
 
+/**
+ * INTF sub-blocks
+ * @DPU_INTF_INPUT_CTRL Supports the setting of pp block from which
+ *  pixel data arrives to this INTF
+ * @DPU_INTF_TE INTF block has TE configuration support
+ * @DPU_INTF_MAX
+ */
+enum {
+   DPU_INTF_INPUT_CTRL = 0x1,
+   DPU_INTF_TE,
+   DPU_INTF_MAX
+};
+
 /**
  * VBIF sub-blocks and features
  * @DPU_VBIF_QOS_OTLIMVBIF supports OT Limit
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index efe9a5719c6b..64f556d693dd 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -225,14 +225,9 @@ static void dpu_hw_intf_bind_pingpong_blk(
bool enable,
const enum dpu_pingpong pp)
 {
-   struct dpu_hw_blk_reg_map *c;
+   struct dpu_hw_blk_reg_map *c = &intf->hw;
u32 mux_cfg;
 
-   if (!intf)
-   return;
-
-   c = &intf->hw;
-
mux_cfg = DPU_REG_READ(c, INTF_MUX);
mux_cfg &= ~0xf;
 
@@ -280,7 +275,7 @@ static void _setup_intf_ops(struct dpu_hw_intf_ops *ops,
ops->get_status = dpu_hw_intf_get_status;
ops->enable_timing = dpu_hw_intf_enable_timing_engine;
ops->get_line_count = dpu_hw_intf_get_line_count;
-   if (cap & BIT(DPU_CTL_ACTIVE_CFG))
+   if (cap & BIT(DPU_INTF_INPUT_CTRL))
ops->bind_pingpong_blk = dpu_hw_intf_bind_pingpong_blk;
 }
 
-- 
2.26.1



[PATCH v2 3/8] drm/msm/dpu: move some sspp caps to dpu_caps

2020-07-10 Thread Jonathan Marek
This isn't something that ever changes between planes, so move it to the
dpu_caps struct. Making this change will allow more re-use in the
"SSPP sub blocks config" part of the catalog, in particular when adding
support for SM8150 and SM8250 which have different max_linewidth.

This also sets max_hdeci_exp/max_vdeci_exp to 0 for sc7180, as decimation
is not supported on the newest DPU versions. (note that decimation is not
implemented, so this changes nothing)

Signed-off-by: Jonathan Marek 
---
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c| 14 +--
 .../gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h| 24 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c |  6 ++---
 3 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
index 29d4fde3172b..f4ccbe56a09e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
@@ -70,6 +70,10 @@ static const struct dpu_caps sdm845_dpu_caps = {
.has_dim_layer = true,
.has_idle_pc = true,
.has_3d_merge = true,
+   .max_linewidth = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
+   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
+   .max_hdeci_exp = MAX_HORZ_DECIMATION,
+   .max_vdeci_exp = MAX_VERT_DECIMATION,
 };
 
 static const struct dpu_caps sc7180_dpu_caps = {
@@ -80,6 +84,8 @@ static const struct dpu_caps sc7180_dpu_caps = {
.ubwc_version = DPU_HW_UBWC_VER_20,
.has_dim_layer = true,
.has_idle_pc = true,
+   .max_linewidth = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
+   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
 };
 
 static const struct dpu_mdp_cfg sdm845_mdp[] = {
@@ -178,16 +184,9 @@ static const struct dpu_ctl_cfg sc7180_ctl[] = {
  */
 
 /* SSPP common configuration */
-static const struct dpu_sspp_blks_common sdm845_sspp_common = {
-   .maxlinewidth = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
-   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
-   .maxhdeciexp = MAX_HORZ_DECIMATION,
-   .maxvdeciexp = MAX_VERT_DECIMATION,
-};
 
 #define _VIG_SBLK(num, sdma_pri, qseed_ver) \
{ \
-   .common = &sdm845_sspp_common, \
.maxdwnscale = MAX_DOWNSCALE_RATIO, \
.maxupscale = MAX_UPSCALE_RATIO, \
.smart_dma_priority = sdma_pri, \
@@ -207,7 +206,6 @@ static const struct dpu_sspp_blks_common sdm845_sspp_common 
= {
 
 #define _DMA_SBLK(num, sdma_pri) \
{ \
-   .common = &sdm845_sspp_common, \
.maxdwnscale = SSPP_UNITY_SCALE, \
.maxupscale = SSPP_UNITY_SCALE, \
.smart_dma_priority = sdma_pri, \
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
index 63512753b369..a6221fdc02d2 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
@@ -301,6 +301,10 @@ struct dpu_qos_lut_tbl {
  * @has_dim_layer  dim layer feature status
  * @has_idle_pcindicate if idle power collapse feature is supported
  * @has_3d_merge   indicate if 3D merge is supported
+ * @max_linewidth  max linewidth for sspp
+ * @pixel_ram_size size of latency hiding and de-tiling buffer in bytes
+ * @max_hdeci_exp  max horizontal decimation supported (max is 2^value)
+ * @max_vdeci_exp  max vertical decimation supported (max is 2^value)
  */
 struct dpu_caps {
u32 max_mixer_width;
@@ -312,22 +316,11 @@ struct dpu_caps {
bool has_dim_layer;
bool has_idle_pc;
bool has_3d_merge;
-};
-
-/**
- * struct dpu_sspp_blks_common : SSPP sub-blocks common configuration
- * @maxwidth: max pixelwidth supported by this pipe
- * @pixel_ram_size: size of latency hiding and de-tiling buffer in bytes
- * @maxhdeciexp: max horizontal decimation supported by this pipe
- * (max is 2^value)
- * @maxvdeciexp: max vertical decimation supported by this pipe
- * (max is 2^value)
- */
-struct dpu_sspp_blks_common {
-   u32 maxlinewidth;
+   /* SSPP limits */
+   u32 max_linewidth;
u32 pixel_ram_size;
-   u32 maxhdeciexp;
-   u32 maxvdeciexp;
+   u32 max_hdeci_exp;
+   u32 max_vdeci_exp;
 };
 
 /**
@@ -353,7 +346,6 @@ struct dpu_sspp_blks_common {
  * @virt_num_formats: Number of supported formats for virtual planes
  */
 struct dpu_sspp_sub_blks {
-   const struct dpu_sspp_blks_common *common;
u32 creq_vblank;
u32 danger_vblank;
u32 maxdwnscale;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
index 3b9c33e694bf..33f6c56f01ed 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
@@ -153,7 +153,7 @@ static int _dpu_plane_calc_fill_level(struct drm_plane 
*plane,
 
pdpu = to_dpu_plane(plane);
pstate = 

Re: [PATCH 11/12] device-dax: Add dis-contiguous resource support

2020-07-10 Thread Dan Williams
On Mon, Apr 6, 2020 at 1:22 PM Dan Williams  wrote:
>
> On Mon, Apr 6, 2020 at 3:46 AM Joao Martins  wrote:
> >
> > On 3/23/20 11:55 PM, Dan Williams wrote:
> >
> > [...]
> >
> > >  static ssize_t dev_dax_resize(struct dax_region *dax_region,
> > >   struct dev_dax *dev_dax, resource_size_t size)
> > >  {
> > >   resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
> > > - resource_size_t dev_size = range_len(&dev_dax->range);
> > > + resource_size_t dev_size = dev_dax_size(dev_dax);
> > >   struct resource *region_res = &dax_region->res;
> > >   struct device *dev = &dev_dax->dev;
> > > - const char *name = dev_name(dev);
> > >   struct resource *res, *first;
> > > + resource_size_t alloc = 0;
> > > + int rc;
> > >
> > >   if (dev->driver)
> > >   return -EBUSY;
> > > @@ -684,38 +766,47 @@ static ssize_t dev_dax_resize(struct dax_region 
> > > *dax_region,
> > >* allocating a new resource.
> > >*/
> > >   first = region_res->child;
> > > - if (!first)
> > > - return __alloc_dev_dax_range(dev_dax, dax_region->res.start,
> > > - to_alloc);
> > > - for (res = first; to_alloc && res; res = res->sibling) {
> > > +retry:
> > > + rc = -ENOSPC;
> > > + for (res = first; res; res = res->sibling) {
> > >   struct resource *next = res->sibling;
> > > - resource_size_t free;
> > >
> > >   /* space at the beginning of the region */
> > > - free = 0;
> > > - if (res == first && res->start > dax_region->res.start)
> > > - free = res->start - dax_region->res.start;
> > > - if (free >= to_alloc && dev_size == 0)
> > > - return __alloc_dev_dax_range(dev_dax,
> > > - dax_region->res.start, to_alloc);
> > > -
> > > - free = 0;
> > > + if (res == first && res->start > dax_region->res.start) {
> > > + alloc = min(res->start - dax_region->res.start,
> > > + to_alloc);
> > > + rc = __alloc_dev_dax_range(dev_dax,
> > > + dax_region->res.start, alloc);
> >
> > You might be missing:
> >
> > first = region_res->child;
> >
> > (...) right after returning from __alloc_dev_dax_range(). Alternatively, 
> > perhaps
> > even moving the 'retry' label to right before the @first initialization.
> >
> > In the case that you pick space from the beginning, the child resource of 
> > the
> > dax region will point to first occupied region, and that changes after you 
> > pick
> > this space. So, IIUC, you want to adjust where you start searching free 
> > space
> > otherwise you end up wrongly picking that same space twice.
> >
> > If it helps, the bug can be reproduced in this unit test below, see
> > daxctl_test3() test:
>
> It definitely will, thanks. I'll be circling back to this now that
> I've settled my tree for the v5.7 window.

s/v5.7/v5.9/ what's a couple of kernel release cycles between friends?
I went ahead and moved the retry loop above the assignment of first as
you suggested.
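
I.e., roughly (a sketch, not the final patch):

	/* Re-evaluate the region's first child on every pass, since a
	 * successful allocation at the region start changes it.
	 */
retry:
	rc = -ENOSPC;
	first = region_res->child;
	for (res = first; res; res = res->sibling) {
		/* ... find and claim free space as before ... */
	}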


Re: [PATCH 11/12] device-dax: Add dis-contiguous resource support

2020-07-10 Thread Dan Williams
On Tue, Mar 24, 2020 at 9:12 AM Joao Martins  wrote:
>
> On 3/23/20 11:55 PM, Dan Williams wrote:
> >  static ssize_t dev_dax_resize(struct dax_region *dax_region,
> >   struct dev_dax *dev_dax, resource_size_t size)
> >  {
> >   resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
> > - resource_size_t dev_size = range_len(&dev_dax->range);
> > + resource_size_t dev_size = dev_dax_size(dev_dax);
> >   struct resource *region_res = &dax_region->res;
> >   struct device *dev = &dev_dax->dev;
> > - const char *name = dev_name(dev);
> >   struct resource *res, *first;
> > + resource_size_t alloc = 0;
> > + int rc;
> >
> >   if (dev->driver)
> >   return -EBUSY;
> > @@ -684,38 +766,47 @@ static ssize_t dev_dax_resize(struct dax_region 
> > *dax_region,
> >* allocating a new resource.
> >*/
> >   first = region_res->child;
> > - if (!first)
> > - return __alloc_dev_dax_range(dev_dax, dax_region->res.start,
> > - to_alloc);
>
> You probably want to retain the condition above?
>
> Otherwise it removes the ability to create new devices or resizing it , once 
> we
> have zero-ed the last one.
>
> > - for (res = first; to_alloc && res; res = res->sibling) {
> > +retry:
> > + rc = -ENOSPC;
> > + for (res = first; res; res = res->sibling) {
> >   struct resource *next = res->sibling;
> > - resource_size_t free;
> >
> >   /* space at the beginning of the region */
> > - free = 0;
> > - if (res == first && res->start > dax_region->res.start)
> > - free = res->start - dax_region->res.start;
> > - if (free >= to_alloc && dev_size == 0)
> > - return __alloc_dev_dax_range(dev_dax,
> > - dax_region->res.start, to_alloc);
> > -
> > - free = 0;
> > + if (res == first && res->start > dax_region->res.start) {
> > + alloc = min(res->start - dax_region->res.start,
> > + to_alloc);
> > + rc = __alloc_dev_dax_range(dev_dax,
> > + dax_region->res.start, alloc);
> > + break;
> > + }
> > +
> > + alloc = 0;
> >   /* space between allocations */
> >   if (next && next->start > res->end + 1)
> > - free = next->start - res->end + 1;
> > + alloc = min(next->start - (res->end + 1), to_alloc);
> >
> >   /* space at the end of the region */
> > - if (free < to_alloc && !next && res->end < region_res->end)
> > - free = region_res->end - res->end;
> > -
> > - if (free >= to_alloc && strcmp(name, res->name) == 0)
> > - return __adjust_dev_dax_range(dev_dax, res,
> > - resource_size(res) + to_alloc);
> > - else if (free >= to_alloc && dev_size == 0)
> > - return __alloc_dev_dax_range(dev_dax, res->end + 1,
> > - to_alloc);
> > + if (!alloc && !next && res->end < region_res->end)
> > + alloc = min(region_res->end - res->end, to_alloc);
> > +
> > + if (!alloc)
> > + continue;
> > +
> > + if (adjust_ok(dev_dax, res)) {
> > + rc = __adjust_dev_dax_range(dev_dax, res,
> > + resource_size(res) + alloc);
> > + break;
> > + }
> > + rc = __alloc_dev_dax_range(dev_dax, res->end + 1,
> > + alloc);
>
> I am wondering if we should switch to:
>
> if (adjust_ok(...))
> rc = __adjust_dev_dax_range(...);
> else
> rc = __alloc_dev_dax_range(...);
>
> And then a debug print at the end depicting whether and how did we grabbed
> space? Something like:
>
> dev_dbg(_dax->dev, "%s(%d) %d", action, location, rc);
>
> Assuming we set @location to its values when we allocate space at the end,
> beginning or middle; and @action to whether we adjusted up/down or allocated 
> new
> range.
>
> Essentially, something similar to namespaces scan_allocate() just to help
> troubleshoot?

I went ahead and just added "alloc", "extend", "shrink", and "delete"
debug prints in the right places.
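
For illustration only (names assumed, not the exact patch), such a
print can be as simple as:

	dev_dbg(dev, "%s: %pa => %d\n", action, &alloc, rc);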


Re: 5.7 regression: Lots of PCIe AER errors and suspend failure without pcie=noaer

2020-07-10 Thread Robert Hancock
On Fri, Jul 10, 2020 at 6:23 PM Robert Hancock  wrote:
>
> Noticed a problem on my desktop with an Asus PRIME H270-PRO
> motherboard after Fedora 32 upgraded to the 5.7 kernel (now on 5.7.8):
> periodically there are PCIe AER errors getting spewed in dmesg that
> weren't happening before, and this also seems to cause suspend to
> fail - the system just wakes back up again right away, I am assuming
> due to some AER errors interrupting the process. 5.6 kernels didn't
> have this problem. Setting "pcie=noaer" on the kernel command line
> works around the issue, but I'm not sure what would have changed to
> trigger this to occur?

Correction: the workaround option is "pci=noaer".


Re: [PATCH stable v4.9 v2] arm64: entry: Place an SB sequence following an ERET instruction

2020-07-10 Thread Sasha Levin

On Thu, Jul 09, 2020 at 12:50:23PM -0700, Florian Fainelli wrote:

From: Will Deacon 

commit 679db70801da9fda91d26caf13bf5b5ccc74e8e8 upstream

Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.

This patch emits an SB sequence after each ERET so that speculation is
held up on exception return.
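
Schematically, every exception-return site becomes (on cores without the
ARMv8.5 SB instruction, the sb macro degrades to a dsb nsh; isb pair):

	eret
	sb			// speculation barrier: nothing after the
				// eret can execute speculatively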

Signed-off-by: Will Deacon 
[florian: Adjust hyp-entry.S to account for the label
added change to hyp/entry.S]
Signed-off-by: Florian Fainelli 


I've queued it up, thanks!

--
Thanks,
Sasha


Re: [PATCH 1/2] clk: Export clk_register_composite

2020-07-10 Thread Stephen Boyd
Quoting Wendell Lin (2020-07-01 00:26:21)
> clk_register_composite() will be used in mediatek's
> clock kernel module, so export it to GPL modules.
> 
> Signed-off-by: Wendell Lin 
> ---
>  drivers/clk/clk-composite.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
> index 7376f57..fb5cb4a 100644
> --- a/drivers/clk/clk-composite.c
> +++ b/drivers/clk/clk-composite.c
> @@ -360,6 +360,7 @@ struct clk *clk_register_composite(struct device *dev, 
> const char *name,
> return ERR_CAST(hw);
> return hw->clk;
>  }
> +EXPORT_SYMBOL(clk_register_composite);

Should be EXPORT_SYMBOL_GPL()
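
i.e. the export line would read:

	EXPORT_SYMBOL_GPL(clk_register_composite);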


Re: [PATCH v6 14/17] static_call: Handle tail-calls

2020-07-10 Thread Steven Rostedt
On Fri, 10 Jul 2020 15:38:45 +0200
Peter Zijlstra  wrote:

> GCC can turn our static_call(name)(args...) into a tail call, in which
> case we get a JMP.d32 into the trampoline (which then does a further
> tail-call).
> 
> Teach objtool to recognise and mark these in .static_call_sites and
> adjust the code patching to deal with this.
> 

Hmm, were you able to trigger crashes before this patch?

> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/kernel/static_call.c   |   21 ++---
>  include/linux/static_call.h |4 ++--
>  include/linux/static_call_types.h   |7 +++
>  kernel/static_call.c|   21 +
>  tools/include/linux/static_call_types.h |7 +++
>  tools/objtool/check.c   |   18 +-
>  6 files changed, 60 insertions(+), 18 deletions(-)
> 
> --- a/arch/x86/kernel/static_call.c
> +++ b/arch/x86/kernel/static_call.c
> @@ -41,15 +41,30 @@ static void __static_call_transform(void
>   text_poke_bp(insn, code, size, NULL);
>  }
>  
> -void arch_static_call_transform(void *site, void *tramp, void *func)
> +static inline enum insn_type __sc_insn(bool null, bool tail)
> +{
> + /*
> +  * Encode the following table without branches:
> +  *
> +  *  tailnullinsn
> +  *  -+---+--
> +  *0  |   0   |  CALL
> +  *0  |   1   |  NOP
> +  *1  |   0   |  JMP
> +  *1  |   1   |  RET
> +  */
> + return 2*tail + null;
> +}
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func, bool 
> tail)
>  {
>   mutex_lock(_mutex);
>  
>   if (tramp)
> - __static_call_transform(tramp, func ? JMP : RET, func);
> + __static_call_transform(tramp, __sc_insn(!func, true), func);
>  
>   if (IS_ENABLED(CONFIG_HAVE_STATIC_CALL_INLINE) && site)
> - __static_call_transform(site, func ? CALL : NOP, func);
> + __static_call_transform(site, __sc_insn(!func, tail), func);
>  
>   mutex_unlock(_mutex);
>  }
> --- a/include/linux/static_call.h
> +++ b/include/linux/static_call.h
> @@ -103,7 +103,7 @@
>  /*
>   * Either @site or @tramp can be NULL.
>   */
> -extern void arch_static_call_transform(void *site, void *tramp, void *func);
> +extern void arch_static_call_transform(void *site, void *tramp, void *func, 
> bool tail);
>  
>  #define STATIC_CALL_TRAMP_ADDR(name) _CALL_TRAMP(name)
>  
> @@ -206,7 +206,7 @@ void __static_call_update(struct static_
>  {
>   cpus_read_lock();
>   WRITE_ONCE(key->func, func);
> - arch_static_call_transform(NULL, tramp, func);
> + arch_static_call_transform(NULL, tramp, func, false);
>   cpus_read_unlock();
>  }
>  
> --- a/include/linux/static_call_types.h
> +++ b/include/linux/static_call_types.h
> @@ -17,6 +17,13 @@
>  #define STATIC_CALL_TRAMP_STR(name)  __stringify(STATIC_CALL_TRAMP(name))
>  
>  /*
> + * Flags in the low bits of static_call_site::key.
> + */
> +#define STATIC_CALL_SITE_TAIL 1UL/* tail call */
> +#define STATIC_CALL_SITE_INIT 2UL/* init section */
> +#define STATIC_CALL_SITE_FLAGS 3UL
> +
> +/*
>   * The static call site table needs to be created by external tooling 
> (objtool
>   * or a compiler plugin).
>   */
> --- a/kernel/static_call.c
> +++ b/kernel/static_call.c
> @@ -15,8 +15,6 @@ extern struct static_call_site __start_s
>  
>  static bool static_call_initialized;
>  
> -#define STATIC_CALL_INIT 1UL
> -
>  /* mutex to protect key modules/sites */
>  static DEFINE_MUTEX(static_call_mutex);
>  
> @@ -39,18 +37,23 @@ static inline void *static_call_addr(str
>  static inline struct static_call_key *static_call_key(const struct 
> static_call_site *site)
>  {
>   return (struct static_call_key *)
> - (((long)site->key + (long)&site->key) & ~STATIC_CALL_INIT);
> + (((long)site->key + (long)&site->key) &
> ~STATIC_CALL_SITE_FLAGS);
>  }
>  
>  /* These assume the key is word-aligned. */
>  static inline bool static_call_is_init(struct static_call_site *site)
>  {
> - return ((long)site->key + (long)&site->key) & STATIC_CALL_INIT;
> + return ((long)site->key + (long)&site->key) & STATIC_CALL_SITE_INIT;
> +}
> +
> +static inline bool static_call_is_tail(struct static_call_site *site)
> +{
> + return ((long)site->key + (long)&site->key) & STATIC_CALL_SITE_TAIL;
>  }
>  
>  static inline void static_call_set_init(struct static_call_site *site)
>  {
> - site->key = ((long)static_call_key(site) | STATIC_CALL_INIT) -
> + site->key = ((long)static_call_key(site) | STATIC_CALL_SITE_INIT) -
> (long)&site->key;
>  }
>  
> @@ -104,7 +107,7 @@ void __static_call_update(struct static_
>  
>   key->func = func;
>  
> - arch_static_call_transform(NULL, tramp, func);
> + arch_static_call_transform(NULL, tramp, func, false);
>  
>   /*
>* If uninitialized, we'll not update the callsites, but they still
> @@ 

Re: [PATCH V1] mmc: sdhci-msm: Set IO pins in low power state during suspend

2020-07-10 Thread Matthias Kaehlcke
Hi,

On Fri, Jul 10, 2020 at 04:28:36PM +0530, Veerabhadrarao Badiganti wrote:
> Hi Mathias,
> 
> On 7/10/2020 6:22 AM, Matthias Kaehlcke wrote:
> > Hi,
> > 
> > On Wed, Jul 08, 2020 at 06:41:20PM +0530, Veerabhadrarao Badiganti wrote:
> > > Configure SDHC IO pins with low power configuration when the driver
> > > is in suspend state.
> > > 
> > > Signed-off-by: Veerabhadrarao Badiganti 
> > > ---
> > >   drivers/mmc/host/sdhci-msm.c | 17 +
> > >   1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
> > > index 392d41d57a6e..efd2bae1430c 100644
> > > --- a/drivers/mmc/host/sdhci-msm.c
> > > +++ b/drivers/mmc/host/sdhci-msm.c
> > > @@ -15,6 +15,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include "sdhci-pltfm.h"
> > >   #include "cqhci.h"
> > > @@ -1352,6 +1353,19 @@ static void sdhci_msm_set_uhs_signaling(struct 
> > > sdhci_host *host,
> > >   sdhci_msm_hs400(host, &mmc->ios);
> > >   }
> > > +static int sdhci_msm_set_pincfg(struct sdhci_msm_host *msm_host, bool 
> > > level)
> > > +{
> > > + struct platform_device *pdev = msm_host->pdev;
> > > + int ret;
> > > +
> > > + if (level)
> > > + ret = pinctrl_pm_select_default_state(&pdev->dev);
> > > + else
> > > + ret = pinctrl_pm_select_sleep_state(&pdev->dev);
> > > +
> > > + return ret;
> > > +}
> > > +
> > >   static int sdhci_msm_set_vmmc(struct mmc_host *mmc)
> > >   {
> > >   if (IS_ERR(mmc->supply.vmmc))
> > > @@ -1596,6 +1610,9 @@ static void sdhci_msm_handle_pwr_irq(struct 
> > > sdhci_host *host, int irq)
> > >   ret = sdhci_msm_set_vqmmc(msm_host, mmc,
> > >   pwr_state & REQ_BUS_ON);
> > >   if (!ret)
> > > + ret = sdhci_msm_set_pincfg(msm_host,
> > > + pwr_state & REQ_BUS_ON);
> > > + if (!ret)
> > >   irq_ack |= CORE_PWRCTL_BUS_SUCCESS;
> > >   else
> > >   irq_ack |= CORE_PWRCTL_BUS_FAIL;
> > I happened to have a debug patch in my tree which logs when regulators
> > are enabled/disabled, with this patch I see the SD card regulator
> > toggling constantly after returning from the first system suspend.
> > 
> > I added more logs:
> > 
> > [ 1156.085819] DBG: sdhci_msm_set_pincfg: level = 0 (ret: 0)
> > [ 1156.248936] DBG: sdhci_msm_set_pincfg: level = 1 (ret: 0)
> > [ 1156.301989] DBG: sdhci_msm_set_pincfg: level = 0 (ret: 0)
> > [ 1156.462383] DBG: sdhci_msm_set_pincfg: level = 1 (ret: 0)
> > [ 1156.525988] DBG: sdhci_msm_set_pincfg: level = 0 (ret: 0)
> > [ 1156.670372] DBG: sdhci_msm_set_pincfg: level = 1 (ret: 0)
> > [ 1156.717935] DBG: sdhci_msm_set_pincfg: level = 0 (ret: 0)
> > [ 1156.878122] DBG: sdhci_msm_set_pincfg: level = 1 (ret: 0)
> > [ 1156.928134] DBG: sdhci_msm_set_pincfg: level = 0 (ret: 0)
> > 
> > This is on an SC7180 platform. It doesn't run an upstream kernel though,
> > but v5.4 with plenty of upstream patches.
> I have verified this on a couple of sc7180 targets (on a Chrome platform
> with the Chrome kernel).
> But I didn't see any issue. It's working as expected.

Did you test system suspend too? At least in the Chrome OS kernel tree system
suspend is not supported yet in the main branch, you'd need a pile of 30+
extra patches to get it to work. This is expected to change soon though :)

> Let me know if you are observing this issue consistently on multiple
> boards; I will share a debug patch with you to check it further.

I currently have only one board with the SD card slot populated, I might
get another one next week.

The toggling occurs only when no SD card is inserted.


[GIT PULL] CodingStyle: Inclusive Terminology

2020-07-10 Thread Dan Williams
Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux
tags/inclusive-terminology

...to receive a coding-style update for v5.9. The discussion has
tapered off as well as the incoming ack, review, and sign-off tags. I
did not see a reason to wait for the next merge window.

Thank you.

---

The following changes since commit 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68:

  Linux 5.8-rc3 (2020-06-28 15:00:24 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux
tags/inclusive-terminology

for you to fetch changes up to a5f526ecb075a08c4a082355020166c7fe13ae27:

  CodingStyle: Inclusive Terminology (2020-07-03 23:54:35 -0700)


Extend coding-style with inclusive-terminology recommendations.


Dan Williams (1):
  CodingStyle: Inclusive Terminology

 Documentation/process/coding-style.rst | 20 
 1 file changed, 20 insertions(+)

---

Acked-by: Randy Dunlap 
Acked-by: Dave Airlie 
Acked-by: SeongJae Park 
Acked-by: Christian Brauner 
Acked-by: James Bottomley 
Acked-by: Daniel Vetter 
Acked-by: Andy Lutomirski 
Acked-by: Laura Abbott 
Acked-by: Gustavo A. R. Silva 
Reviewed-by: Matthias Brugger 
Reviewed-by: Mark Brown 
Signed-off-by: Stephen Hemminger 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Shuah Khan 
Signed-off-by: Dan Carpenter 
Signed-off-by: Kees Cook 
Signed-off-by: Olof Johansson 
Signed-off-by: Jonathan Corbet 
Signed-off-by: Chris Mason 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Dan Williams 


perf: util/syscalltbl.c:43:38: error: ‘SYSCALLTBL_ARM64_MAX_ID’ undeclared here (not in a function)

2020-07-10 Thread Gordan Bobic
I hit this FTBFS earlier today while trying to build the LTS 4.19.132
kernel from source:

https://lore.kernel.org/patchwork/patch/960281/

The "quick hack" patch mentioned at the bottom of the thread gets it
to compile. Is this the correct solution, or is there a better fix? As
it stands, tools/perf doesn't seem to compile on 4.19.132.


[GIT] Networking

2020-07-10 Thread David Miller

It's been about two weeks since the last batch of fixes, and it
shows as we clock in here at 146 non-merge commits:

1) Restore previous behavior of CAP_SYS_ADMIN wrt. loading networking
   BPF programs, from Maciej Żenczykowski.

2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu Mariappan.

3) Slay memory leak in nl80211 bss color attribute parsing code, from
   Luca Coelho.

4) Get route from skb properly in ip_route_use_hint(), from Miaohe
   Lin.

5) Don't allow anything other than ARPHRD_ETHER in llc code, from
   Eric Dumazet.

6) xsk code dips too deeply into DMA mapping implementation internals.
   Add dma_need_sync and use it.  From Christoph Hellwig

7) Enforce power-of-2 for BPF ringbuf sizes.  From Andrii Nakryiko.

8) Check for disallowed attributes when loading flow dissector BPF
   programs.  From Lorenz Bauer.

9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
   Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it.  From Alex Elder.

11) Resolve several issues in TCP MD5 signature support.  Missing
    memory barriers, bogus options emitted when using syncookies,
    and failure to allow md5 key changes in established states.
    All from Eric Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast
    code, from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation.  Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols.  From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

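To illustrate item 6: dma_need_sync() reports whether dma_sync_*()
calls are actually required for a given mapping, so a driver can ask
once at map time and skip the sync paths entirely when they would be
no-ops.  A minimal sketch of that pattern (not the xsk code itself;
buf_pool and the function names are made up for illustration):

  #include <linux/dma-mapping.h>

  struct buf_pool {
          struct device *dev;
          dma_addr_t dma;
          bool need_sync;         /* cached dma_need_sync() answer */
  };

  static int pool_map(struct buf_pool *p, void *cpu_addr, size_t len)
  {
          p->dma = dma_map_single(p->dev, cpu_addr, len, DMA_FROM_DEVICE);
          if (dma_mapping_error(p->dev, p->dma))
                  return -ENOMEM;
          /* Ask once; the answer is stable for this mapping. */
          p->need_sync = dma_need_sync(p->dev, p->dma);
          return 0;
  }

  static void pool_recv(struct buf_pool *p, size_t len)
  {
          if (p->need_sync)
                  dma_sync_single_for_cpu(p->dev, p->dma, len,
                                          DMA_FROM_DEVICE);
          /* ... hand the buffer up the stack ... */
  }

And for item 7, the property being enforced is the classic bit test; a
sketch of such a size check (not the verbatim kernel code):

  static bool ringbuf_size_ok(unsigned int size)
  {
          /* non-zero and a power of 2 */
          return size && (size & (size - 1)) == 0;
  }
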
Please pull, thanks a lot!


The following changes since commit 4a21185cda0fbb860580eeeb4f1a70a9cda332a4:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2020-06-25 18:27:40 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 

for you to fetch changes up to 1195c7cebb95081d809f81a27b21829573cbd4a8:

  Merge branch 'mlxsw-Various-fixes' (2020-07-10 14:33:34 -0700)


AceLan Kao (1):
  net: usb: qmi_wwan: add support for Quectel EG95 LTE modem

Alex Elder (6):
  net: ipa: always check for stopped channel
  net: ipa: no checksum offload for SDM845 LAN RX
  net: ipa: introduce ipa_cmd_tag_process()
  net: ipa: fix QMI structure definition bugs
  net: ipa: declare struct types in "ipa_gsi.h"
  net: ipa: include declarations in "ipa_gsi.c"

Alexander Lobakin (1):
  net: qed: fix buffer overflow on ethtool -d

Alexei Starovoitov (3):
  Merge branch 'fix-sockmap'
  Merge branch 'bpf-multi-prog-prep'
  Merge branch 'fix-sockmap-flow_dissector-uapi'

Andre Edich (2):
  smsc95xx: check return value of smsc95xx_reset
  smsc95xx: avoid memory leak in smsc95xx_bind

Andrii Nakryiko (3):
  libbpf: Forward-declare bpf_stats_type for systems with outdated UAPI headers
  libbpf: Fix CO-RE relocs against .text section
  bpf: Enforce BPF ringbuf size to be the power of 2

Aya Levin (3):
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix 50G per lane indication

Carl Huang (1):
  net: qrtr: free flow in __qrtr_node_release

Christoph Hellwig (4):
  dma-mapping: Add a new dma_need_sync API
  xsk: Replace the cheap_dma flag with a dma_need_sync flag
  xsk: Remove a double pool->dev assignment in xp_dma_map
  xsk: Use dma_need_sync instead of reimplenting it

Christoph Paasch (1):
  tcp: make sure listeners don't initialize congestion-control state

Claudiu Manoil (1):
  enetc: Fix tx rings bitmap iteration range, irq handling

Codrin Ciubotariu (1):
  net: dsa: microchip: set the correct number of ports

Cong Wang (6):
  net: get rid of lockdep_set_class_and_subclass()
  net: explain the lockdep annotations for dev_uc_unsync()
  genetlink: get rid of family->attrbuf
  cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
  net_sched: fix a memory leak in atm_tc_init()
  cgroup: Fix sock_cgroup_data on big-endian.

Dan Carpenter (1):
  net: qrtr: Fix an out of bounds read qrtr_endpoint_post()

David Ahern (2):
  ipv6: fib6_select_path can not use out path for nexthop objects
  ipv6: Fix use of 

Re: Linux kernel in-tree Rust support

2020-07-10 Thread Linus Torvalds
On Fri, Jul 10, 2020 at 3:59 PM Josh Triplett  wrote:
>
> As I recall, Greg's biggest condition for initial introduction of this
> was to do the same kind of "turn this Kconfig option on and turn an
> option under it off" trick that LTO uses, so that neither "make
> allnoconfig" nor "make allyesconfig" would require Rust until we've had
> plenty of time to experiment with it.

No, please make it a "is rust available" automatic config option. The
exact same way we already do the compiler versions and check for
various availability of compiler flags at config time.

See init/Kconfig for things like

  config LD_IS_LLD
  def_bool $(success,$(LD) -v | head -n 1 | grep -q LLD)

and the rust support should be similar. Something like

  config RUST_IS_AVAILABLE
  def_bool $(success,$(RUST) ..sometest..)

because I _don't_ want us to be in the situation where any new rust
support isn't even build-tested by default.
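
For illustration only, a probe in that style could be as simple as
checking that the toolchain runs at all (the $(RUSTC) variable and the
exact test are placeholders, not existing kernel infrastructure):

  config RUST_IS_AVAILABLE
  def_bool $(success,$(RUSTC) --version | head -n 1 | grep -q rustc)

with the Rust code then depending on RUST_IS_AVAILABLE, so that
allyesconfig picks it up automatically wherever a working toolchain is
installed.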

Quite the reverse. I'd want the first rust driver (or whatever) to be
introduced in such a simple format that failures will be obvious and
simple.

The _worst_ situation to be in is that a (small) group of people start
testing their very special situation, and do bad and crazy things
because "nobody else cares, it's hidden".

No, thank you.

 Linus


Re: [PATCH v2 2/2] clk: scmi: Fix min and max rate when registering clocks with discrete rates

2020-07-10 Thread Stephen Boyd
Quoting Sudeep Holla (2020-07-09 01:17:05)
> Currently we are not initializing the scmi clock with discrete rates
> correctly. We fetch the min_rate and max_rate value only for clocks with
> ranges and ignore the ones with discrete rates. This will lead to wrong
> initialization of the rate range when a clock supports discrete rates.
> 
> Fix this by using the first and the last rate in the sorted list of the
> discrete clock rates while registering the clock.
> 
> Link: https://lore.kernel.org/r/20200708110725.18017-2-sudeep.ho...@arm.com
> Fixes: 6d6a1d82eaef7 ("clk: add support for clocks provided by SCMI")
> Reported-by: Dien Pham 
> Signed-off-by: Sudeep Holla 
> ---
>  drivers/clk/clk-scmi.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> Hi Stephen,
> 
> If you are fine, I can take this via ARM SoC along with the change in
> firmware driver. However it is also fine if you want to merge this
> independently as there is no strict dependency. Let me know either way.

I don't mind either way. If you want to send it in along with the
firmware change then that's fine.

Reviewed-by: Stephen Boyd 
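
For readers following along, the fix described in the changelog boils
down to taking the range from the two ends of the sorted discrete-rate
list before registering the clock.  A sketch of the idea (the helper,
its parameters and the names follow the commit message, not
necessarily the exact driver structures):

  #include <linux/clk-provider.h>

  static void scmi_clk_set_range(struct clk_hw *hw, bool rate_discrete,
                                 const u64 *rates, size_t num_rates,
                                 u64 range_min, u64 range_max)
  {
          unsigned long min_rate, max_rate;

          if (rate_discrete && num_rates) {
                  /* the list is sorted, so its ends give the range */
                  min_rate = rates[0];
                  max_rate = rates[num_rates - 1];
          } else {
                  min_rate = range_min;
                  max_rate = range_max;
          }
          clk_hw_set_rate_range(hw, min_rate, max_rate);
  }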


  1   2   3   4   5   6   7   8   9   10   >