date:20190930

Re: [PATCH v3 1/2] x86: Don't let pgprot_modify() change the page encryption bit

2019-09-30 Thread Thomas Hellstrom

Hi,

On 9/18/19 7:57 PM, Dave Hansen wrote:
> On 9/17/19 6:01 AM, Thomas Hellström (VMware) wrote:
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index b5e49e6bac63..8267dd426b15 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -123,7 +123,7 @@
>>   */
>>  #define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
>>   _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
>> - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
>> + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC)
>>  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
> My only nit with what remains is that it expands the infestation of
> things that look like a simple macro but are not.
>
> I'm debating whether we want to go fix that now, though.
>
Any chance for an ack on this? It's really a small change that, as we've
found out, fixes an existing problem.

Thanks,

Thomas
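
For readers following along, here is a minimal userspace sketch of why adding a bit to _PAGE_CHG_MASK makes pgprot_modify() preserve it. The preserve/add split mirrors how pgprot_modify() is written on x86, but the bit positions, the SKETCH_ names, and the explicit chg_mask parameter are assumptions made for illustration; in the kernel _PAGE_ENC is derived from sme_me_mask at boot and is not a compile-time constant.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pgprotval_t;

/* Illustrative bit positions only (assumptions, not the kernel's values). */
#define SKETCH_PAGE_RW   (UINT64_C(1) << 1)
#define SKETCH_PAGE_PWT  (UINT64_C(1) << 3)
#define SKETCH_PAGE_PCD  (UINT64_C(1) << 4)
#define SKETCH_PAGE_ENC  (UINT64_C(1) << 47)	/* hypothetical encryption bit */

/* Reduced masks: "old" is before the patch, "new" is with _PAGE_ENC added. */
#define SKETCH_CHG_MASK_OLD (SKETCH_PAGE_PCD | SKETCH_PAGE_PWT)
#define SKETCH_CHG_MASK_NEW (SKETCH_CHG_MASK_OLD | SKETCH_PAGE_ENC)

/* Bits inside chg_mask are taken from oldprot, all others from newprot. */
pgprotval_t sketch_pgprot_modify(pgprotval_t oldprot, pgprotval_t newprot,
				 pgprotval_t chg_mask)
{
	pgprotval_t preservebits = oldprot & chg_mask;
	pgprotval_t addbits = newprot & ~chg_mask;

	return preservebits | addbits;
}

/* Self-check: a caching bit from oldprot survives under either mask. */
void sketch_selfcheck(void)
{
	pgprotval_t out = sketch_pgprot_modify(SKETCH_PAGE_PCD, SKETCH_PAGE_RW,
					       SKETCH_CHG_MASK_OLD);

	assert(out == (SKETCH_PAGE_PCD | SKETCH_PAGE_RW));
}
```

With the old mask, an encrypted mapping passed through this function takes its encryption bit from newprot, so the bit is lost whenever newprot does not carry it; with the new mask the bit is carried over from oldprot.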

Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

2019-09-30 Thread Michal Hocko

On Mon 30-09-19 13:28:17, Michal Hocko wrote:
[...]
> Do not get me wrong, but we have quite a long history of fine tuning
> for THP by adding kludges here and there and they usually turn out to
> break something else. I really want to get to understand the underlying
> problem and base a solution on it rather than "__GFP_THISNODE can cause
> overreclaim so pick up a break out condition and hope for the best".

I didn't really get to any testing earlier but I was really suspecting
that hugetlb will be the first one affected because it uses
__GFP_RETRY_MAYFAIL - aka it really wants to succeed as much as possible
because this is a direct admin request to preallocate a specific number
of huge pages. The patch 3 doesn't really make any difference for those
requests.

Here is a very simple test I've done. Single NUMA node with 1G of
memory. Populate the memory with clean page cache, which is easy both
to compact and to reclaim, and then try to allocate 512M worth of
hugetlb pages.
root@test1:~# cat hugetlb_test.sh
#!/bin/sh

set -x
echo 0 > /proc/sys/vm/nr_hugepages
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
TS=$(date +%s)
cp /proc/vmstat vmstat.$TS.before
echo 256 > /proc/sys/vm/nr_hugepages
cat /proc/sys/vm/nr_hugepages
cp /proc/vmstat vmstat.$TS.after

The results for 2 consecutive runs on clean 5.3
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
+ date +%s
+ TS=1569905284
+ cp /proc/vmstat vmstat.1569905284.before
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
+ cp /proc/vmstat vmstat.1569905284.after
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
+ date +%s
+ TS=1569905311
+ cp /proc/vmstat vmstat.1569905311.before
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
256
+ cp /proc/vmstat vmstat.1569905311.after

so we get all the requested huge pages to the pool.

Now with first 3 patches of this series applied (the last one doesn't
make any difference for hugetlb allocations).

root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
+ date +%s
+ TS=1569905516
+ cp /proc/vmstat vmstat.1569905516.before
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
11
+ cp /proc/vmstat vmstat.1569905516.after
root@test1:~# sh hugetlb_test.sh
+ echo 0
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
+ date +%s
+ TS=1569905541
+ cp /proc/vmstat vmstat.1569905541.before
+ echo 256
+ cat /proc/sys/vm/nr_hugepages
12
+ cp /proc/vmstat vmstat.1569905541.after

so we do not get more than 12 huge pages, which is really poor. Although
hugetlb pages tend to be allocated early after the boot they are still
an explicit admin request and a success rate below 5% is really
bad. If anything the __GFP_RETRY_MAYFAIL needs to be reflected in the
code.
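
As a quick check of the numbers above, the following sketch works through the arithmetic. It assumes a 2 MiB huge page size, which is what 256 pages for 512M implies; the function names are made up for illustration.

```c
#include <assert.h>

/* Percentage of requested huge pages that actually reached the pool. */
double hugepage_success_pct(unsigned long allocated, unsigned long requested)
{
	assert(requested > 0);
	return 100.0 * (double)allocated / (double)requested;
}

/* Pool size in MiB for a page count and a huge page size given in KiB. */
unsigned long hugepage_pool_mib(unsigned long pages, unsigned long hugepage_kib)
{
	return pages * hugepage_kib / 1024;
}
```

For the runs above, hugepage_pool_mib(256, 2048) is the 512M that was requested, and hugepage_success_pct(12, 256) is 4.6875, i.e. the "less than 5%" figure.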

I can provide vmstat files if anybody is interested.

Then I have tried another test for THP. It is essentially the same
thing. Populate the page cache to simulate the quite common case of memory
being used for the cache and then populate a 512M anonymous area with
MADV_HUGEPAGE set on it:
$ cat thp_test.sh
#!/bin/sh

set -x
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
TS=$(date +%s)
cp /proc/vmstat vmstat.$TS.before
./mem_eater nowait 500M
cp /proc/vmstat vmstat.$TS.after

mem_eater is essentially a dummy app that does mmap, madvise, and then
touches the area page by page.
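
The mem_eater source is not part of this mail, so the following is only a sketch of the three steps named above (mmap, madvise, touch page by page). The function name eat_memory is hypothetical, and the "nowait" argument handling and the smaps dump seen in the output below are not reproduced.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an anonymous region, ask for THP backing, and touch every base page.
 * Returns the number of pages touched, or -1 if mmap() fails. */
long eat_memory(size_t bytes)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *area;
	long touched = 0;
	size_t off;

	assert(page > 0);

	area = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

	/* Advisory only: kernels built without THP reject this, so a
	 * failure is reported but does not abort the run. */
	if (madvise(area, bytes, MADV_HUGEPAGE) != 0)
		perror("madvise(MADV_HUGEPAGE)");

	for (off = 0; off < bytes; off += (size_t)page) {
		area[off] = 1;	/* the write fault populates the page */
		touched++;
	}

	munmap(area, bytes);
	return touched;
}
```

Calling eat_memory((size_t)500 << 20) would correspond to the 500M run; whether the faults are served by huge pages is what the AnonHugePages line in smaps shows.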

Again clean 5.3 kernel

root@test1:~# sh thp_test.sh 
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 20.8575 s, 51.5 MB/s
+ date +%s
+ TS=1569906274
+ cp /proc/vmstat vmstat.1569906274.before
+ ./mem_eater nowait 500M
7f55e8282000-7f5607682000 rw-p  00:00 0 
Size: 512000 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB
Rss:  512000 kB
Pss:  512000 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean: 0 kB
Private_Dirty:512000 kB
Referenced:   260616 kB
Anonymous:512000 kB
LazyFree:  0 kB
AnonHugePages:509952 kB
ShmemPmdMapped:0 kB
Shared_Hugetlb:0 kB
Private_Hugetlb:   0 kB
Swap:  0 kB
SwapPss:   0 kB
Locked:0 kB
THPeligible:1
+ cp /proc/vmstat vmstat.1569906274.after

root@test1:~# sh thp_test.sh
+ echo 3
+ echo 1
+ dd if=/mnt/data/file-1G of=/dev/null bs=4096
262144+0

Re: [PATCH v2 0/6] media: cedrus: h264: Support multi-slice frames

2019-09-30 Thread Jernej Škrabec

On Tuesday, 1 October 2019 at 00:43:34 CEST, Hans Verkuil wrote:
> On 10/1/19 12:27 AM, Jernej Škrabec wrote:
> > On Monday, 30 September 2019 at 10:10:48 CEST, Hans Verkuil wrote:
> >> On 9/29/19 10:00 PM, Jernej Skrabec wrote:
> >>> This series adds support for decoding multi-slice H264 frames along with
> >>> support for V4L2_DEC_CMD_FLUSH and V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF.
> >>> 
> >>> Code was tested by modified ffmpeg, which can be found here:
> >>> https://github.com/jernejsk/FFmpeg, branch mainline-test
> >>> It has to be configured with at least following options:
> >>> --enable-v4l2-request --enable-libudev --enable-libdrm
> >>> 
> >>> Samples used for testing:
> >>> http://jernej.libreelec.tv/videos/h264/BA1_FT_C.mp4
> >>> http://jernej.libreelec.tv/videos/h264/h264.mp4
> >>> 
> >>> Command line used for testing:
> >>> ffmpeg -hwaccel drm -hwaccel_device /dev/dri/card0 -i h264.mp4 -pix_fmt
> >>> bgra -f fbdev /dev/fb0
> >>> 
> >>> Please note that V4L2_DEC_CMD_FLUSH was not tested because I'm
> >>> not sure how. ffmpeg follows exactly which slice is last in frame
> >>> and sets hold flag accordingly. Improper usage of hold flag would
> >>> corrupt ffmpeg assumptions and it would probably crash. Any ideas
> >>> how to test this are welcome!
> >> 
> >> It can be tested partially at least: if ffmpeg tells you it is the last
> >> slice, then queue the slice with the HOLD flag set, then call FLUSH
> >> afterwards. This should clear the HOLD flag again. In this case the queued
> >> buffer is probably not yet processed, so the flag is cleared before the
> >> decode job starts.
> >> 
> >> You can also try to add a sleep before calling FLUSH to see what happens
> >> if the last queued buffer is already decoded.
> > 
> > Ok, I tried following code:
> > https://github.com/jernejsk/FFmpeg/blob/flush_test/libavcodec/v4l2_request.c#L220-L233
> > 
> > But results are not good. It seems that out_vb in flush command is always
> > NULL and so it always marks capture buffer as done, which leads to kernel
> > warnings.
> > 
> > dmesg output with debugging messages is here: http://ix.io/1Ks8
> > 
> > Currently I'm not sure what is going on. Shouldn't the output buffer be
> > queued and waiting for MEDIA_REQUEST_IOC_QUEUE, which is executed after the
> > flush command, as can be seen from the ffmpeg code linked above?
> 
> Argh, I forgot about the fact that this uses requests.
> 
> The FLUSH should happen *after* the MEDIA_REQUEST_IOC_QUEUE ioctl. Otherwise
> it has no effect. As long as the request hasn't been queued, the buffer is
> also not queued to the driver, so out_vb will indeed be NULL.

It's better, but still not working. Currently ffmpeg sometimes reports
messages like these: https://pastebin.com/raw/9tVVtc20
This is the dmesg output: http://ix.io/1L1L

It seems to me like a race condition. Sometimes flush works as intended and
sometimes it influences the next frame.

Best regards,
Jernej

> 
> Sorry for the confusion.
> 
> Regards,
> 
>   Hans
> 
> > Best regards,
> > Jernej
> > 
> >> Regards,
> >> 
> >>Hans
> >>
> >>> Thanks to Jonas for adjusting ffmpeg.
> >>> 
> >>> Please let me know what you think.
> >>> 
> >>> Best regards,
> >>> Jernej
> >>> 
> >>> Changes from v1:
> >>> - added Rb tags
> >>> - updated V4L2_DEC_CMD_FLUSH documentation
> >>> - updated first slice detection in Cedrus
> >>> - hold capture buffer flag is set according to source format
> >>> - added v4l m2m stateless_(try_)decoder_cmd ioctl helpers
> >>> 
> >>> Hans Verkuil (2):
> >>>   vb2: add V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF
> >>>   videodev2.h: add V4L2_DEC_CMD_FLUSH
> >>> 
> >>> Jernej Skrabec (4):
> >>>   media: v4l2-mem2mem: add stateless_(try_)decoder_cmd ioctl helpers
> >>>   media: cedrus: Detect first slice of a frame
> >>>   media: cedrus: h264: Support multiple slices per frame
> >>>   media: cedrus: Add support for holding capture buffer
> >>>  
> >>>  Documentation/media/uapi/v4l/buffer.rst   | 13 ++
> >>>  .../media/uapi/v4l/vidioc-decoder-cmd.rst | 10 +++-
> >>>  .../media/uapi/v4l/vidioc-reqbufs.rst |  6 +++
> >>>  .../media/videodev2.h.rst.exceptions  |  1 +
> >>>  .../media/common/videobuf2/videobuf2-v4l2.c   |  8 +++-
> >>>  drivers/media/v4l2-core/v4l2-mem2mem.c| 35 ++
> >>>  drivers/staging/media/sunxi/cedrus/cedrus.h   |  1 +
> >>>  .../staging/media/sunxi/cedrus/cedrus_dec.c   | 11 +
> >>>  .../staging/media/sunxi/cedrus/cedrus_h264.c  | 11 -
> >>>  .../staging/media/sunxi/cedrus/cedrus_hw.c|  8 ++--
> >>>  .../staging/media/sunxi/cedrus/cedrus_video.c | 14 ++
> >>>  include/media/v4l2-mem2mem.h  | 46 +++
> >>>  include/media/videobuf2-core.h|  3 ++
> >>>  include/media/videobuf2-v4l2.h|  5 ++
> >>>  include/uapi/linux/videodev2.h| 14 --
> >>>  15 files changed, 175 insertions(+), 11 deletions(-)

drivers/gpu/drm/exynos/exynos_drm_dsi.c:1796:2-9: line 1796 is redundant because platform_get_irq() already prints an error (fwd)

2019-09-30 Thread Julia Lawall




-- Forwarded message --
Date: Tue, 1 Oct 2019 10:47:40 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: drivers/gpu/drm/exynos/exynos_drm_dsi.c:1796:2-9: line 1796 is
redundant because platform_get_irq() already prints an error

CC: kbuild-...@01.org
CC: linux-kernel@vger.kernel.org
TO: Sam Ravnborg 
CC: Inki Dae 

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c
commit: 156bdac99061b4013c8e47799c6e574f7f84e9f4 drm/exynos: trigger build of all modules
date:   3 months ago
:: branch date: 9 hours ago
:: commit date: 3 months ago

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 
Reported-by: Julia Lawall 

>> drivers/gpu/drm/exynos/exynos_drm_dsi.c:1796:2-9: line 1796 is redundant because platform_get_irq() already prints an error

# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=156bdac99061b4013c8e47799c6e574f7f84e9f4
git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git remote update linus
git checkout 156bdac99061b4013c8e47799c6e574f7f84e9f4
vim +1796 drivers/gpu/drm/exynos/exynos_drm_dsi.c

f37cd5e8098441 Inki Dae         2014-05-09  1722
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1723  static int exynos_dsi_probe(struct platform_device *pdev)
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1724  {
2900c69c52079a Andrzej Hajda    2014-10-07  1725  	struct device *dev = &pdev->dev;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1726  	struct resource *res;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1727  	struct exynos_dsi *dsi;
0ff03fd164a4f2 Hyungwon Hwang   2015-06-12  1728  	int ret, i;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1729
2900c69c52079a Andrzej Hajda    2014-10-07  1730  	dsi = devm_kzalloc(dev, sizeof(*dsi), GFP_KERNEL);
2900c69c52079a Andrzej Hajda    2014-10-07  1731  	if (!dsi)
2900c69c52079a Andrzej Hajda    2014-10-07  1732  		return -ENOMEM;
2900c69c52079a Andrzej Hajda    2014-10-07  1733
e17ddecc3aa519 YoungJun Cho     2014-07-22  1734  	/* To be checked as invalid one */
e17ddecc3aa519 YoungJun Cho     2014-07-22  1735  	dsi->te_gpio = -ENOENT;
e17ddecc3aa519 YoungJun Cho     2014-07-22  1736
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1737  	init_completion(&dsi->completed);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1738  	spin_lock_init(&dsi->transfer_lock);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1739  	INIT_LIST_HEAD(&dsi->transfer_list);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1740
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1741  	dsi->dsi_host.ops = &exynos_dsi_ops;
e2d2a1e0a26472 Andrzej Hajda    2014-10-07  1742  	dsi->dsi_host.dev = dev;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1743
e2d2a1e0a26472 Andrzej Hajda    2014-10-07  1744  	dsi->dev = dev;
2154ac9229c10f Marek Szyprowski 2016-04-19  1745  	dsi->driver_data = of_device_get_match_data(dev);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1746
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1747  	ret = exynos_dsi_parse_dt(dsi);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1748  	if (ret)
8665040850e3cb Andrzej Hajda    2015-06-11  1749  		return ret;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1750
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1751  	dsi->supplies[0].supply = "vddcore";
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1752  	dsi->supplies[1].supply = "vddio";
e2d2a1e0a26472 Andrzej Hajda    2014-10-07  1753  	ret = devm_regulator_bulk_get(dev, ARRAY_SIZE(dsi->supplies),
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1754  				      dsi->supplies);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1755  	if (ret) {
e2d2a1e0a26472 Andrzej Hajda    2014-10-07  1756  		dev_info(dev, "failed to get regulators: %d\n", ret);
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1757  		return -EPROBE_DEFER;
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1758  	}
7eb8f069be8a03 Andrzej Hajda    2014-04-04  1759
a86854d0c599b3 Kees Cook        2018-06-12  1760  	dsi->clks = devm_kcalloc(dev,
a86854d0c599b3 Kees Cook        2018-06-12  1761  			dsi->driver_data->num_clks, sizeof(*dsi->clks),
0ff03fd164a4f2 Hyungwon Hwang   2015-06-12  1762  			GFP_KERNEL);
e6f988a4585762 Hyungwon Hwang   2015-06-12  1763  	if (!dsi->clks)
e6f988a4585762 Hyungwon Hwang   2015-06-12  1764  		return -ENOMEM;
e6f988a4585762 Hyungwon Hwang   2015-06-12  1765
0ff03fd164a4f2 Hyungwon Hwang   2015-06-12  1766  	for (i = 0; i < dsi->driver_data->num_clks; i++) {
0ff03fd164a4f2 Hyungwon Hwang   2015-06-12  1767  		dsi->clks[i] = devm_clk_get(dev, clk_names[i]);
0ff03fd164a4f2 Hyungwon Hwang   2015-06-12  1768  		if (IS_ERR(dsi->clks[i])) {
0ff03fd164a4f2

Re: [PATCH v3 0/2] mmc: sdhci-milbeaut: add Milbeaut SD driver

2019-09-30 Thread orito.takao


Hello

Does anyone have any comments on this?

> The following patches add driver for SD Host controller on
> Socionext's Milbeaut M10V platforms.
> 
> The SD Host controller on Milbeaut consists of two controller parts.
> One is the core controller, F_SDH30, which is similar to the sdhci-fujitsu
> controller.
> The other is a bridge controller, which is not compatible with the
> sdhci-fujitsu controller and is specific to the Milbeaut series.
> 
> It has several parts:
>  - reset control
>  - clock enable / select for SDR50/25/12
>  - hold control of DATA/CMD line
>  - select characteristics for WP/CD/LED line
>  - Re-tuning control for mode3
>  - Capability setting
>    Timeout Clock / Base Clock / Timer Count for Re-Tuning /
>    Debounce period
> These require special procedures at reset, at clock enable/change, or
> for further tuning of the clock.
> 
> Takao Orito (2):
>   dt-bindings: mmc: add DT bindings for Milbeaut SD controller
>   mmc: sdhci-milbeaut: add Milbeaut SD controller driver
> 
>  .../devicetree/bindings/mmc/sdhci-milbeaut.txt |  30 ++
>  drivers/mmc/host/Kconfig   |  11 +
>  drivers/mmc/host/Makefile  |   1 +
>  drivers/mmc/host/sdhci-milbeaut.c  | 362 +
>  drivers/mmc/host/sdhci_f_sdh30.c   |  26 +-
>  drivers/mmc/host/sdhci_f_sdh30.h   |  32 ++
>  6 files changed, 437 insertions(+), 25 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/mmc/sdhci-milbeaut.txt
>  create mode 100644 drivers/mmc/host/sdhci-milbeaut.c
>  create mode 100644 drivers/mmc/host/sdhci_f_sdh30.h
> 
> -- 
> 1.9.1
> 

Thanks
Orito

-
Takao Orito
Socionext Inc.
E-mail:orito.ta...@socionext.com
Tel:+81-80-9815-1460
-

RE: [PATCH] x86/PAT: priority the PAT warn to error to highlight the developer

2019-09-30 Thread Zhang, Jun

Please see my comments.

Thanks,
Jun

-Original Message-
From: Borislav Petkov  
Sent: Monday, September 30, 2019 8:03 PM
To: Zhang, Jun 
Cc: dave.han...@linux.intel.com; l...@kernel.org; pet...@infradead.org; 
t...@linutronix.de; mi...@redhat.com; h...@zytor.com; He, Bo ; 
x...@kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/PAT: priority the PAT warn to error to highlight the 
developer

On Sun, Sep 29, 2019 at 03:20:31PM +0800, jun.zh...@intel.com wrote:
> From: zhang jun 
> 
> Documentation/x86/pat.txt says:
> set_memory_uc() or set_memory_wc() must use together with 
> set_memory_wb()

I had to open that file to see what it actually says - btw, the filename is 
pat.rst now - and you're very heavily paraphrasing what is there. So try again 
explaining what the requirement is.
[ZJ] The following text comes from pat.txt in kernel version 4.19:
Drivers wanting to export some pages to userspace do it by using mmap
interface and a combination of
1) pgprot_noncached()
2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()

With PAT support, a new API pgprot_writecombine is being added. So, drivers can
continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in memtype
list in order to ensure no conflicting mapping.

Note that this set of APIs only works with IO (non RAM) regions. If driver
wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.

> if break the PAT attribute, there are tons of warning like:
> [   45.846872] x86/PAT: NDK MediaCodec_:3753 map pfn RAM range req

That's some android NDK thing, it seems: "The Android NDK is a toolset that 
lets you implement parts of your app in native code,... " lemme guess, they 
have a kernel module?
[ZJ] No, "NDK MediaCodec_" is an Android Codec 2.0 process. It wants to use WC
memory.

> write-combining for [mem 0x1e7a8-0x1e7a87fff], got write-back and 
> in the extremely case, we see kernel panic unexpected like:
> list_del corruption. prev->next should be 88806dbe69c0, but was 
> 888036f048c0

This is not really helpful. You need to explain what exactly you're doing - not 
shortening the error messages.
[ZJ] Android Codec 2.0 wants to use WC memory, which it allocates through ion.
So we enabled drivers/staging/android/ion, which works well except on x86,
where set_memory_wc() has to be called.
As it stands there are tons of warnings, followed by the list_del corruption.
With this patch (https://www.lkml.org/lkml/2019/9/29/25) the list crash
disappears.
The error messages follow.
<4>[49967.389732] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req 
write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<4>[49967.389747] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x1091f4000-0x1091f7fff], got write-back
<4>[49967.390622] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x10909-0x109090fff], got write-back
<4>[49967.390687] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x1091f8000-0x1091fbfff], got write-back
<4>[49967.390855] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req 
write-combining for [mem 0x1091f4000-0x1091f4fff], got write-back
<4>[49967.391405] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x109098000-0x109098fff], got write-back
<4>[49967.391454] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req 
write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<4>[49967.391474] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x1091f4000-0x1091f7fff], got write-back
<4>[49967.392641] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x14eb68000-0x14eb68fff], got write-back
<4>[49967.392708] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req 
write-combining for [mem 0x1091f8000-0x1091fbfff], got write-back
<4>[49967.393001] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req 
write-combining for [mem 0x1091f4000-0x1091f4fff], got write-back
<4>[49967.394066] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req 
write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<6>[50045.677129] binder: 3390:3390 transaction failed 29189/-22, size 88-0 
line 3131
<3>[50046.153621] list_del corruption. prev->next should be 89598004c960, 
but was 895ad46e4590
<4>[50046.163464] invalid opcode:  [#1] PREEMPT SMP NOPTI
<4>[50046.169297] CPU: 1 PID: 18792 Comm: Binder:3390_1B Tainted: G U O 
 4.19.68-PKT-190905T163945Z-00031-g9de920e66b4e #1
<4>[50046.182213] RIP: 0010:__list_del_entry_valid+0x78/0x90
<4>[50046.187947] Code: e8 00 a1 c1 ff 0f 0b 48 89 fe 48 c7 c7 60 1a b6 a5 e8 
ef a0 c1 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 98 1a b6 a5 e8 db a0 c1 ff <0f> 0b

Re: What populates /proc/partitions ?

2019-09-30 Thread David F.

It has something to do with devtmpfs.  If I could set the
GENHD_FL_HIDDEN flag on it, that would solve the problem on those
systems that say they have a floppy that doesn't really exist.  Is there
something built in to allow that, or is it up to the driver itself?

On Mon, Sep 30, 2019 at 8:38 PM David F.  wrote:
>
> Well, it's not straightforward.  No direct calls, it must be somehow
> when kmod is used to load the module.  The only difference I see in
> the udevadm output is the old system has attribute differences
> capability new==11, old==1, event_poll_msec new=2000, old=-1.  I
> figured if I could set the "hidden" attribute to 1 then it looks like
> /proc/partitions won't list it (already "removable" attribute), but
> udev doesn't seem to allow changing the attributes, only referencing
> them, unless I'm missing something?
>
> On Mon, Sep 30, 2019 at 5:13 PM David F.  wrote:
> >
> > Thanks for the replies.   I'll see if I can figure this out.   I know
> > with the same kernel and older udev version in use that it didn't add
> > it, but with the new udev (eudev) it does (one big difference is the
> > new one requires and uses devtmpfs and the old one didn't).
> >
> > I tried making the floppy a module but it still loads on vmware player
> > and the physical test system I'm using that doesn't have one but
> > reports it as existing (vmware doesn't hang, just adds fd0 read errors
> > to log, but physical system does hang while fdisk -l, mdadm --scan
> > runs, etc..).
> >
> > As far as the log, debugging udev output, it's close to the same, the
> > message log (busybox) not much in there to say what's up.   I even
> > tried the old .rules that were being used with the old udev version,
> > but made no difference.
> >
> > On Mon, Sep 30, 2019 at 4:49 PM Randy Dunlap  wrote:
> > >
> > > On 9/30/19 3:47 PM, David F. wrote:
> > > > Hi,
> > > >
> > > > I want to find out why fd0 is being added to /proc/partitions and stop
> > > > that for my build.  I've searched "/proc/partitions" and "partitions",
> > > > not finding anything that matters.
> > >
> > > /proc/partitions is produced on demand by causing a read of it.
> > > That is done by these functions (pointers) in block/genhd.c:
> > >
> > > > static const struct seq_operations partitions_op = {
> > > > 	.start  = show_partition_start,
> > > > 	.next   = disk_seqf_next,
> > > > 	.stop   = disk_seqf_stop,
> > > > 	.show   = show_partition
> > > > };
> > >
> > > > in particular, show_partition().  In turn, that function uses data that was
> > > > produced upon block device discovery, also in block/genhd.c.
> > > > See functions disk_get_part(), disk_part_iter_init(), disk_part_iter_next(),
> > > > disk_part_iter_exit(), __device_add_disk(), and get_gendisk().
> > >
> > > > > If udev is doing it, what function is it calling so I can search on that?
> > >
> > > I don't know about that.  I guess in the kernel it is about "uevents".
> > > E.g., in block/genhd.c, there are some calls to kobject_uevent() or 
> > > variants
> > > of it.
> > >
> > > > TIA!!
> > >
> > > There should be something in your boot log about "fd" or "fd0" or floppy.
> > > eh?
> > >
> > > --
> > > ~Randy
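
To check from userspace whether a device such as fd0 shows up, the lines that show_partition() emits can be parsed directly. The sketch below assumes the usual "major minor #blocks name" column layout of /proc/partitions; the struct and function names are made up for illustration and are not kernel or udev APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct partition_entry {
	unsigned int major;
	unsigned int minor;
	unsigned long long blocks;	/* size in 1 KiB blocks */
	char name[32];
};

/* Parse one data line of /proc/partitions. Returns 1 on success and 0 for
 * the header line, blank lines, or anything else that does not match. */
int parse_partition_line(const char *line, struct partition_entry *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, "%u %u %llu %31s",
		      &out->major, &out->minor, &out->blocks, out->name) == 4;
}

/* Return 1 if the line is a data line describing the named device. */
int partition_line_names(const char *line, const char *name)
{
	struct partition_entry entry;

	assert(name != NULL);
	return parse_partition_line(line, &entry) &&
	       strcmp(entry.name, name) == 0;
}
```

Reading /proc/partitions line by line with fgets() and passing each line to partition_line_names(line, "fd0") would show whether the floppy is listed on a given boot.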

Re: [PATCH v2 0/3] Add support for SBI v0.2

2019-09-30 Thread Alan Kao

On Fri, Sep 27, 2019 at 10:57:45PM +, Atish Patra wrote:
> On Fri, 2019-09-27 at 15:19 -0700, Christoph Hellwig wrote:
> > On Thu, Sep 26, 2019 at 05:09:12PM -0700, Atish Patra wrote:
> > > The Supervisor Binary Interface(SBI) specification[1] now defines a
> > > base extension that provides extendability to add future extensions
> > > while maintaining backward compatibility with previous versions.
> > > The new version is defined as 0.2 and older version is marked as
> > > 0.1.
> > > 
> > > This series adds support v0.2 and a unified calling convention
> > > implementation between 0.1 and 0.2. It also adds minimal SBI
> > > functions
> > > from 0.2 as well to keep the series lean. 
> > 
> > So before we do this game can be please make sure we have a clean 0.2
> > environment that never uses the legacy extensions as discussed
> > before?
> > Without that all this work is rather futile.
> > 
> 
> As per our discussion offline, here are things need to be done to
> achieve that.
> 
> 1. Replace timer, sfence and ipi with better alternative APIs
>   - sbi_set_timer will be same but with new calling convention
>   - send_ipi and sfence_* apis can be modified in such a way that
>   - we don't have to use unprivileged load anymore
>   - Make it scalable
> 
> 2. Drop clear_ipi, console, and shutdown in 0.2.
> 
> We will have a new kernel config (LEGACY_SBI) that can be manually
> enabled if older firmware need to be used. By default, LEGACY_SBI will
> be disabled and kernel with new SBI will be built. We will have to set
> a flag day in a year or so when we can remove the LEGACY_SBI
> completely.
> 
> Let us know if it is not an acceptable approach to anybody.
> I will post a RFC patch with new alternate v0.2 APIs sometime next
> week.
>

Will this legacy option be compatible with bbl, say version 1.0.0 or
any earlier one?

> > ___
> > linux-riscv mailing list
> > linux-ri...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> -- 
> Regards,
> Atish

Re: [PATCH -next] bfa: Make restart_bfa static

2019-09-30 Thread Martin K. Petersen



YueHaibing,

> Fix sparse warning:
>
> drivers/scsi/bfa/bfad.c:1491:1: warning:
>  symbol 'restart_bfa' was not declared. Should it be static?

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 0/2] scsi: ufs: Add driver for TI wrapper for Cadence UFS IP

2019-09-30 Thread Martin K. Petersen



Vignesh,

> This series add DT bindings and driver for TI wrapper for Cadence UFS
> IP that is present on TI's J721e SoC

Will need some reviews from DT and ufs folks respectively before I can
queue this up.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v2] scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()

2019-09-30 Thread Martin K. Petersen



Daniel,

> Commit 88263208dd23 ("scsi: qla2xxx: Complain if sp->done() is not
> called from the completion path") introduced the WARN_ON_ONCE in
> qla2x00_status_cont_entry(). The assumption was that there is only one
> status continuations element. According to the firmware documentation
> it is possible that multiple status continuations are emitted by the
> firmware.

Applied to 5.4/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: What populates /proc/partitions ?

2019-09-30 Thread David F.

Well, it's not straightforward.  No direct calls, it must be somehow
when kmod is used to load the module.  The only difference I see in
the udevadm output is the old system has attribute differences
capability new==11, old==1, event_poll_msec new=2000, old=-1.  I
figured if I could set the "hidden" attribute to 1 then it looks like
/proc/partitions won't list it (already "removable" attribute), but
udev doesn't seem to allow changing the attributes, only referencing
them, unless I'm missing something?

On Mon, Sep 30, 2019 at 5:13 PM David F.  wrote:
>
> Thanks for the replies.   I'll see if I can figure this out.   I know
> with the same kernel and older udev version in use that it didn't add
> it, but with the new udev (eudev) it does (one big difference is the
> new one requires and uses devtmpfs and the old one didn't).
>
> I tried making the floppy a module but it still loads on vmware player
> and the physical test system I'm using that doesn't have one but
> reports it as existing (vmware doesn't hang, just adds fd0 read errors
> to log, but physical system does hang while fdisk -l, mdadm --scan
> runs, etc..).
>
> As far as the log, debugging udev output, it's close to the same, the
> message log (busybox) not much in there to say what's up.   I even
> tried the old .rules that were being used with the old udev version,
> but made no difference.
>
> On Mon, Sep 30, 2019 at 4:49 PM Randy Dunlap  wrote:
> >
> > On 9/30/19 3:47 PM, David F. wrote:
> > > Hi,
> > >
> > > I want to find out why fd0 is being added to /proc/partitions and stop
> > > that for my build.  I've searched "/proc/partitions" and "partitions",
> > > not finding anything that matters.
> >
> > /proc/partitions is produced on demand by causing a read of it.
> > That is done by these functions (pointers) in block/genhd.c:
> >
> > static const struct seq_operations partitions_op = {
> > 	.start  = show_partition_start,
> > 	.next   = disk_seqf_next,
> > 	.stop   = disk_seqf_stop,
> > 	.show   = show_partition
> > };
> >
> > in particular, show_partition().  In turn, that function uses data that was
> > produced upon block device discovery, also in block/genhd.c.
> > See functions disk_get_part(), disk_part_iter_init(), disk_part_iter_next(),
> > disk_part_iter_exit(), __device_add_disk(), and get_gendisk().
> >
> > > If udev is doing it, what function is it calling so I can search on that?
> >
> > I don't know about that.  I guess in the kernel it is about "uevents".
> > E.g., in block/genhd.c, there are some calls to kobject_uevent() or variants
> > of it.
> >
> > > TIA!!
> >
> > There should be something in your boot log about "fd" or "fd0" or floppy.
> > eh?
> >
> > --
> > ~Randy

Re: [PATCH] scsi: libcxgbi: remove unused function to stop warning

2019-09-30 Thread Martin K. Petersen



Austin,

> Since commit fc8d0590d914 ("libcxgbi: Add ipv6 api to driver") was
> introduced, there are no calls to csk_print_port() or csk_print_ip().

Applied to 5.5/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

KMSAN: uninit-value in adu_disconnect

2019-09-30 Thread syzbot


Hello,

syzbot found the following crash on:

HEAD commit:    014077b5 DO-NOT-SUBMIT: usb-fuzzer: main usb gadget fuzzer..
git tree:       https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=108b582660
kernel config:  https://syzkaller.appspot.com/x/.config?x=f03c659d0830ab8d
dashboard link: https://syzkaller.appspot.com/bug?extid=224d4aba0201decca39c
compiler:       clang version 9.0.0 (/home/glider/llvm/clang 80fee25776c2fb61e74c1ecb1a523375c2500b69)

syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15abd5c160
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=168e3a3a60

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+224d4aba0201decca...@syzkaller.appspotmail.com

usb 1-1: USB disconnect, device number 23
==
BUG: KMSAN: uninit-value in adu_disconnect+0x302/0x360 drivers/usb/misc/adutux.c:774
CPU: 0 PID: 3372 Comm: kworker/0:2 Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x191/0x1f0 lib/dump_stack.c:113
 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
 adu_disconnect+0x302/0x360 drivers/usb/misc/adutux.c:774
 usb_unbind_interface+0x3a2/0xdd0 drivers/usb/core/driver.c:423
 __device_release_driver drivers/base/dd.c:1120 [inline]
 device_release_driver_internal+0x911/0xd20 drivers/base/dd.c:1151
 device_release_driver+0x4b/0x60 drivers/base/dd.c:1174
 bus_remove_device+0x4bf/0x670 drivers/base/bus.c:556
 device_del+0xcd5/0x1d10 drivers/base/core.c:2339
 usb_disable_device+0x567/0x1150 drivers/usb/core/message.c:1241
 usb_disconnect+0x51e/0xd60 drivers/usb/core/hub.c:2199
 hub_port_connect drivers/usb/core/hub.c:4949 [inline]
 hub_port_connect_change drivers/usb/core/hub.c:5213 [inline]
 port_event drivers/usb/core/hub.c:5359 [inline]
 hub_event+0x3fd0/0x72f0 drivers/usb/core/hub.c:5441
 process_one_work+0x1572/0x1ef0 kernel/workqueue.c:2269
 worker_thread+0x111b/0x2460 kernel/workqueue.c:2415
 kthread+0x4b5/0x4f0 kernel/kthread.c:256
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:355

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:189 [inline]
 kmsan_internal_poison_shadow+0x58/0xb0 mm/kmsan/kmsan.c:148
 kmsan_slab_free+0x8d/0x100 mm/kmsan/kmsan_hooks.c:195
 slab_free_freelist_hook mm/slub.c:1472 [inline]
 slab_free mm/slub.c:3038 [inline]
 kfree+0x4c1/0x2db0 mm/slub.c:3980
 adu_delete drivers/usb/misc/adutux.c:151 [inline]
 adu_release+0x95f/0xa50 drivers/usb/misc/adutux.c:332
 __fput+0x4c9/0xba0 fs/file_table.c:280
 fput+0x37/0x40 fs/file_table.c:313
 task_work_run+0x22e/0x2a0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop arch/x86/entry/common.c:163 [inline]
 prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:194
 syscall_return_slowpath+0x90/0x610 arch/x86/entry/common.c:274
 do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:300
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
==


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

[PATCH v2] perf script python: integrate page reclaim analyze script

2019-09-30 Thread Yafang Shao

This patch introduces a new perf script, page-reclaim, which reports
page reclaim details. Possible uses of this script include:
- identifying latency spikes caused by direct reclaim
- determining whether a latency spike is related to pageout
- determining why page reclaim was requested, i.e. whether it is because of
  memory fragmentation
- measuring page reclaim efficiency
etc.
In the future we may also enhance it to analyze memcg reclaim.

The script is used as follows:
# Record, one of the following
$ perf record -e 'vmscan:mm_vmscan_*' ./workload
$ perf script record page-reclaim

# Report
$ perf script report page-reclaim

# Report per process latency
$ perf script report page-reclaim -- -p

# Report per process latency details. At what time and how long it
# stalls at each time.
$ perf script report page-reclaim -- -v

An example from an mmtests run:
$ perf script report page-reclaim
Direct reclaims: 4924
Direct latency (ms)total max avg min
  177823.2116378.977  36.114   0.051
Direct file reclaimed 22920
Direct file scanned 28306
Direct file sync write I/O 0
Direct file async write I/O 0
Direct anon reclaimed 212567
Direct anon scanned 1446854
Direct anon sync write I/O 0
Direct anon async write I/O 278325
Direct order          0     1     3
                   4870    23    31
Wake kswapd requests 716
Wake order            0     1
                    715     1

Kswapd reclaims: 9
Kswapd latency (ms)      total       max       avg     min
                     86353.046 42128.816  9594.783 120.736
Kswapd file reclaimed 366461
Kswapd file scanned 369554
Kswapd file sync write I/O 0
Kswapd file async write I/O 0
Kswapd anon reclaimed 362594
Kswapd anon scanned 693938
Kswapd anon sync write I/O 0
Kswapd anon async write I/O 330663
Kswapd order          0     1     3
                      3     1     5
Kswapd re-wakes 705

$ perf script report page-reclaim -- -p
# Besides the basic output above, it also summarizes per-task
# latency
Per process latency (ms):
 pid[comm]            total       max       avg     min
   1[systemd]       276.764   248.933     21.29   0.293
 163[kswapd0]     86353.046 42128.816  9594.783 120.736
7241[bash]        12787.749   859.091    94.028   0.163
1592[master]         81.604    70.811    27.201   2.906
1595[pickup]        496.162   374.168   165.387  14.478
1098[auditd]          19.32     19.32     19.32   19.32
1120[irqbalance]   5232.331  1386.352   158.555   0.169
7236[usemem]      79649.041763.281       24.921   0.051
1605[sshd]          1344.41   645.125    34.472    0.16
7238[bash]          1158.92  1023.307   231.784   0.067
7239[bash]        15100.776   993.447    82.069   0.145
...

$ perf script report page-reclaim -- -v
# Besides the basic output, it will also show per-task latency details
Per process latency (ms):
 pid[comm] total max avg min
   timestamp  latency(ns)
   1[systemd]276.764 248.933   21.29   0.293
   3406860552338: 16819800
   3406877381650: 5532855
   3407458799399: 929517
   3407459796042: 916682
   3407460763220: 418989
   3407461250236: 332355
   3407461637534: 401731
   3407462092234: 449219
   3407462605855: 292857
   3407462952343: 372700
   3407463364947: 414880
   3407463829547: 949162
   3407464813883: 248933444
 163[kswapd0]  86353.046   42128.816   9594.783 120.736
   3357637025977: 1026962745
   3358915619888: 41268642175
   3400239664127: 42128816204
   3443784780373: 679641989
   3444847948969: 120735792
   3445001978784: 342713657
   3445835850664: 316851589
   3446865035476: 247457873
   3449355401352: 221223878
  ...
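
As a rough illustration of how the total/max/avg/min columns relate to the per-event latencies listed above, here is a small C sketch. It is not taken from page-reclaim.py (which is a Python script); lat_stats, lat_summarize and systemd_example are hypothetical names, and the sample values are the 13 systemd entries from the -v output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary record; all latencies are kept in nanoseconds. */
struct lat_stats {
	uint64_t total_ns;
	uint64_t max_ns;
	uint64_t min_ns;
	size_t count;
};

/* Fold a list of per-event latencies into total/max/min/count. */
struct lat_stats lat_summarize(const uint64_t *ns, size_t n)
{
	struct lat_stats s = { 0, 0, 0, n };

	for (size_t i = 0; i < n; i++) {
		s.total_ns += ns[i];
		if (ns[i] > s.max_ns)
			s.max_ns = ns[i];
		if (i == 0 || ns[i] < s.min_ns)
			s.min_ns = ns[i];
	}
	return s;
}

/* Convert nanoseconds to milliseconds for display. */
double lat_ms(uint64_t ns)
{
	return (double)ns / 1e6;
}

/* Average latency in milliseconds; 0 when there are no samples. */
double lat_avg_ms(const struct lat_stats *s)
{
	return s->count ? lat_ms(s->total_ns) / (double)s->count : 0.0;
}

/* The 13 systemd samples from the -v output above. */
struct lat_stats systemd_example(void)
{
	static const uint64_t ns[] = {
		16819800, 5532855, 929517, 916682, 418989, 332355, 401731,
		449219, 292857, 372700, 414880, 949162, 248933444,
	};

	return lat_summarize(ns, sizeof(ns) / sizeof(ns[0]));
}
```

For these samples the total is 276764191 ns (about 276.764 ms), the maximum 248933444 ns and the minimum 292857 ns, which agrees with the systemd row of the summary table; the average is the total divided by the 13 events, about 21.29 ms.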

This script must be kept in sync with the following vmscan tracepoints:
mm_vmscan_direct_reclaim_begin
mm_vmscan_direct_reclaim_end
mm_vmscan_kswapd_wake
mm_vmscan_kswapd_sleep
mm_vmscan_wakeup_kswapd
mm_vmscan_lru_shrink_inactive
mm_vmscan_writepage

Signed-off-by: Yafang Shao 
Cc: Tony Jones 
Cc: Mel Gorman 
---
v1->v2: verified it with python2.7 and python3.6.
add vmscan tracepoints comments in this script

---
 tools/perf/scripts/python/bin/page-reclaim-record |   2 +
 tools/perf/scripts/python/bin/page-reclaim-report |   4 +
 tools/perf/scripts/python/page-reclaim.py | 393 ++
 3 files changed,

Re: [PATCH] kasan: fix the missing underflow in memmove and memcpy with CONFIG_KASAN_GENERIC=y

2019-09-30 Thread Walter Wu

On Tue, 2019-10-01 at 05:01 +0200, Dmitry Vyukov wrote:
> On Tue, Oct 1, 2019 at 4:36 AM Walter Wu  wrote:
> >
> > On Mon, 2019-09-30 at 10:57 +0200, Marc Gonzalez wrote:
> > > On 30/09/2019 06:36, Walter Wu wrote:
> > >
> > > >  bool check_memory_region(unsigned long addr, size_t size, bool write,
> > > > unsigned long ret_ip)
> > > >  {
> > > > +   if (long(size) < 0) {
> > > > +   kasan_report_invalid_size(src, dest, len, _RET_IP_);
> > > > +   return false;
> > > > +   }
> > > > +
> > > > return check_memory_region_inline(addr, size, write, ret_ip);
> > > >  }
> > >
> > > Is it expected that memcpy/memmove may sometimes (incorrectly) be passed
> > > a negative value? (It would indeed turn up as a "large" size_t)
> > >
> > > IMO, casting to long is suspicious.
> > >
> > > There seem to be some two implicit assumptions.
> > >
> > > 1) size >= ULONG_MAX/2 is invalid input
> > > 2) casting a size >= ULONG_MAX/2 to long yields a negative value
> > >
> > > 1) seems reasonable because we can't copy more than half of memory to
> > > the other half of memory. I suppose the constraint could be even tighter,
> > > but it's not clear where to draw the line, especially when considering
> > > 32b vs 64b arches.
> > >
> > > 2) is implementation-defined, and gcc works "as expected" (clang too
> > > probably) https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
> > >
> > > A comment might be warranted to explain the rationale.
> > > Regards.
> >
> > Thanks for your suggestion.
> > Yes, the issue is a negative value being passed to memcpy/memmove/memset.
> > Our current idea relies on assumption 1 and only considers 64-bit arches,
> > because KASAN only supports 64-bit. In fact, we can't really use that much
> > memory on a 64-bit arch, so assumption 1 makes sense.
> 
> Note there are arm KASAN patches floating around, so we should not
> make assumptions about 64-bit arch.
I think the arm KASAN patch has not been merged into mainline because the
shadow memory needs so much virtual address space; the kernel only has
1GB or 2GB of virtual memory on a 32-bit arch, so the issue is hard to
solve. It may need some trade-off.

> 
> But there seems to be a number of such casts already:
> 
It seems that everyone makes the same assumption.

> $ find -name "*.c" -exec egrep "\(long\).* < 0" {} \; -print
> } else if ((long) delta < 0) {
> ./kernel/time/timer.c
> if ((long)state < 0)
> ./drivers/thermal/thermal_sysfs.c
> if ((long)delay < 0)
> ./drivers/infiniband/core/addr.c
> if ((long)tmo < 0)
> ./drivers/net/wireless/st/cw1200/pm.c
> if (pos < 0 || (long) pos != pos || (ssize_t) count < 0)
> ./sound/core/info.c
> if ((long)hwrpb->sys_type < 0) {
> ./arch/alpha/kernel/setup.c
> if ((long)m->driver_data < 0)
> ./arch/x86/kernel/apic/apic.c
> if ((long) size < 0L)
> if ((long)addr < 0L) {
> ./arch/sparc/mm/init_64.c
> if ((long)lpid < 0)
> ./arch/powerpc/kvm/book3s_hv.c
> if ((long)regs->regs[insn.mm_i_format.rs] < 0)
> if ((long)regs->regs[insn.i_format.rs] < 0) {
> if ((long)regs->regs[insn.i_format.rs] < 0) {
> ./arch/mips/kernel/branch.c
> if ((long)arch->gprs[insn.i_format.rs] < 0)
> if ((long)arch->gprs[insn.i_format.rs] < 0)
> ./arch/mips/kvm/emulate.c
> if ((long)regs->regs[insn.i_format.rs] < 0)
> ./arch/mips/math-emu/cp1emu.c
> if ((int32_t)(long)prom_vec < 0) {
> ./arch/mips/sibyte/common/cfe.c
> if (msgsz > ns->msg_ctlmax || (long) msgsz < 0 || msqid < 0)
> if (msqid < 0 || (long) bufsz < 0)
> ./ipc/msg.c
> if ((long)x < 0)
> ./mm/page-writeback.c
> if ((long)(next - val) < 0) {
> ./mm/memcontrol.c

Re: [PATCH] scsi: smartpqi: clean up an indentation issue

2019-09-30 Thread Martin K. Petersen



Colin,

> There are some statements that are indented too deeply, remove the
> extraneous tabs and rejoin split lines.

Applied (additional hunk) to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: csiostor: clean up indentation issue

2019-09-30 Thread Martin K. Petersen



Colin,

> The goto statement is indented incorrectly, remove the extraneous tab.

Applied to 5.5/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

KASAN: use-after-free Read in batadv_iv_ogm_queue_add

2019-09-30 Thread syzbot


Hello,

syzbot found the following crash on:

HEAD commit:    faeacb6d net: tap: clean up an indentation issue
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=1241cbd360
kernel config:  https://syzkaller.appspot.com/x/.config?x=6c210ff0b9a35071
dashboard link: https://syzkaller.appspot.com/bug?extid=0cc629f19ccb8534935b
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0cc629f19ccb85349...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in memcpy include/linux/string.h:359 [inline]
BUG: KASAN: use-after-free in batadv_iv_ogm_aggregate_new net/batman-adv/bat_iv_ogm.c:544 [inline]
BUG: KASAN: use-after-free in batadv_iv_ogm_queue_add+0x31d/0x1120 net/batman-adv/bat_iv_ogm.c:640
Read of size 24 at addr 888099112740 by task kworker/u4:2/2025

CPU: 1 PID: 2025 Comm: kworker/u4:2 Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
 __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
 kasan_report+0x12/0x17 mm/kasan/common.c:618
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192
 memcpy+0x24/0x50 mm/kasan/common.c:122
 memcpy include/linux/string.h:359 [inline]
 batadv_iv_ogm_aggregate_new net/batman-adv/bat_iv_ogm.c:544 [inline]
 batadv_iv_ogm_queue_add+0x31d/0x1120 net/batman-adv/bat_iv_ogm.c:640
 batadv_iv_ogm_schedule+0x783/0xe50 net/batman-adv/bat_iv_ogm.c:797
 batadv_iv_send_outstanding_bat_ogm_packet+0x580/0x730 net/batman-adv/bat_iv_ogm.c:1675
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 9774:
 save_stack+0x23/0x90 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_kmalloc mm/kasan/common.c:493 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:466
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:507
 kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3550
 kmalloc include/linux/slab.h:552 [inline]
 batadv_iv_ogm_iface_enable+0x123/0x320 net/batman-adv/bat_iv_ogm.c:201
 batadv_hardif_enable_interface+0x276/0x950 net/batman-adv/hard-interface.c:761
 batadv_softif_slave_add+0x8f/0x100 net/batman-adv/soft-interface.c:892
 do_set_master net/core/rtnetlink.c:2369 [inline]
 do_set_master+0x1ca/0x230 net/core/rtnetlink.c:2343
 do_setlink+0xa85/0x3520 net/core/rtnetlink.c:2504
 rtnl_setlink+0x273/0x3d0 net/core/rtnetlink.c:2763
 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:657
 sock_write_iter+0x27c/0x3e0 net/socket.c:989
 call_write_iter include/linux/fs.h:1880 [inline]
 do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:951
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
 do_writev+0x15b/0x330 fs/read_write.c:1058
 __do_sys_writev fs/read_write.c:1131 [inline]
 __se_sys_writev fs/read_write.c:1128 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7:
 save_stack+0x23/0x90 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:455
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:463
 __cache_free mm/slab.c:3425 [inline]
 kfree+0x10a/0x2c0 mm/slab.c:3756
 batadv_iv_ogm_iface_disable+0x39/0x80 net/batman-adv/bat_iv_ogm.c:220
 batadv_hardif_disable_interface.cold+0x4b4/0x87b net/batman-adv/hard-interface.c:875
 batadv_softif_destroy_netlink+0xa9/0x130 net/batman-adv/soft-interface.c:1146
 default_device_exit_batch+0x25c/0x410 net/core/dev.c:9830
 ops_exit_list.isra.0+0xfc/0x150 net/core/net_namespace.c:175
 cleanup_net+0x4e2/0xa60 net/core/net_namespace.c:594
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at 888099112740
 which belongs to the cache

Re: [PATCH v2] scsi: core: Log SCSI command age with errors

2019-09-30 Thread Martin K. Petersen



Milan,

> Couple of users had requested to print the SCSI command age along with
> command failure errors. This is a small change, but allows users to
> get more important information about the command that was failed, it
> would help the users in debugging the command failures:

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v2] scsi: qedf: Add port_id getter

2019-09-30 Thread Martin K. Petersen



Daniel,

> Add qedf_get_host_port_id() to the transport template.
>
> The fc_transport_template initializes the port_id member to the
> default value of -1. The new getter ensures that the sysfs entry shows
> the current value and not the default one, e.g by using 'lsscsi -H -t'

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: Problem sharing interrupts between gpioa and uart0 on Broadcom Hurricane 2 (iProc)

2019-09-30 Thread Chris Packham

Hi Florian,

On Mon, 2019-09-30 at 19:54 -0700, Florian Fainelli wrote:
> 
> On 9/30/2019 7:33 PM, Chris Packham wrote:
> > Hi,
> > 
> > We have a platform using the BCM53344 integrated switch/CPU. This is
> > part of the Hurricane 2 (technically Wolfhound) family of devices.
> > 
> > Currently we're using pieces of Broadcom's "iProc" SDK based on an out
> > of date kernel and we'd very much like to be running as close to
> > upstream as possible. The fact that the Ubiquiti UniFi Switch 8 is
> > upstream gives me some hope.
> 
> FYI, I could not get enough information from the iProc SDK to port (or
> not) the clock driver, so if nothing else, that is an area that may
> require immediate work (though sometimes fixed-clocks would do just fine).

Setting a fixed clock seems to work for me. At least for now.

> 
> > 
> > My current problem is the fact that the uart0 interrupt is shared with
> > the Chip Common A gpio block. When I have an interrupt node on the
> > gpio in the device tree I get an init exit at startup. If I remove the
> > interrupt node the system will boot (except I don't get cascaded
> > interrupts from the GPIOs).
> > 
> > Looking at the pinctrl-nsp-gpio.c it looks as though I might be able to
> > make this work if I can convince the gpio code to return IRQ_HANDLED or
> > IRQ_NONE but I'm struggling against the fact that pinctrl-iproc-gpio.c
> > defers its interrupt handling to the gpio core.
> 
> Not sure I follow you here, what part is being handed to gpiolib? The
> top interrupt handler under nsp_gpio_irq_handler() will try to do
> exactly as you described. In fact, there are other iProc designs where
> "gpio-a" and another interrupt, arch/arm/boot/dts/bcm-nsp.dtsi is one
> such example and I never had problems with that part of NSP.
> 

nsp_gpio_probe() creates the irq domain directly and
nsp_gpio_irq_handler() directly deals with sharing by returning
IRQ_HANDLED or IRQ_NONE depending on whether it has a bit set.

iproc_gpio_probe(), on the other hand, sets iproc_gpio_irq_handler() as the
parent_handler and defers to gpiolib to deal with the irq domain etc.

I'm currently assuming this is why I can't have uart0 and gpio
interrupts. But of course I could be completely wrong.

> > 
> > Is there any way I can get the gpio core to deal with the shared
> > interrupt?
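
To make the IRQ_HANDLED/IRQ_NONE convention mentioned above concrete, here is a minimal userspace model of a shared interrupt line. It does not use the kernel's IRQ API and is not code from either pinctrl driver: demo_dev, demo_handler, demo_dispatch_shared and the DEMO_IRQ_* values are hypothetical stand-ins, with the 'pending' field playing the role of an interrupt status register.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's irqreturn_t values. */
enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/* Hypothetical device: 'pending' models an interrupt status register. */
struct demo_dev {
	uint32_t pending;
	unsigned int serviced;
};

/* A handler on a shared line must decline interrupts that are not its own. */
enum demo_irqreturn demo_handler(struct demo_dev *dev)
{
	if (!dev->pending)
		return DEMO_IRQ_NONE;

	dev->serviced++;
	dev->pending = 0;	/* acknowledge the interrupt */
	return DEMO_IRQ_HANDLED;
}

/* Model of a shared line: every registered handler is asked in turn. */
unsigned int demo_dispatch_shared(struct demo_dev **devs, size_t n)
{
	unsigned int handled = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_handler(devs[i]) == DEMO_IRQ_HANDLED)
			handled++;
	return handled;
}

/* Example: only the UART has work pending, so the GPIO block declines. */
unsigned int demo_uart_only(void)
{
	struct demo_dev gpio = { 0, 0 }, uart = { 0x1, 0 };
	struct demo_dev *line[] = { &gpio, &uart };

	return demo_dispatch_shared(line, 2);
}
```

The point of the model is only that each participant on the line inspects its own status and reports whether the interrupt was its own; a handler installed as a chained parent handler has no return value through which to decline.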

Re: [PATCH] kasan: fix the missing underflow in memmove and memcpy with CONFIG_KASAN_GENERIC=y

2019-09-30 Thread Dmitry Vyukov

On Tue, Oct 1, 2019 at 4:36 AM Walter Wu  wrote:
>
> On Mon, 2019-09-30 at 10:57 +0200, Marc Gonzalez wrote:
> > On 30/09/2019 06:36, Walter Wu wrote:
> >
> > >  bool check_memory_region(unsigned long addr, size_t size, bool write,
> > > unsigned long ret_ip)
> > >  {
> > > +   if (long(size) < 0) {
> > > +   kasan_report_invalid_size(src, dest, len, _RET_IP_);
> > > +   return false;
> > > +   }
> > > +
> > > return check_memory_region_inline(addr, size, write, ret_ip);
> > >  }
> >
> > Is it expected that memcpy/memmove may sometimes (incorrectly) be passed
> > a negative value? (It would indeed turn up as a "large" size_t)
> >
> > IMO, casting to long is suspicious.
> >
> > There seem to be some two implicit assumptions.
> >
> > 1) size >= ULONG_MAX/2 is invalid input
> > 2) casting a size >= ULONG_MAX/2 to long yields a negative value
> >
> > 1) seems reasonable because we can't copy more than half of memory to
> > the other half of memory. I suppose the constraint could be even tighter,
> > but it's not clear where to draw the line, especially when considering
> > 32b vs 64b arches.
> >
> > 2) is implementation-defined, and gcc works "as expected" (clang too
> > probably) https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
> >
> > A comment might be warranted to explain the rationale.
> > Regards.
>
> Thanks for your suggestion.
> Yes, the issue is a negative value being passed to memcpy/memmove/memset.
> Our current idea relies on assumption 1 and only considers 64-bit arches,
> because KASAN only supports 64-bit. In fact, we can't really use that much
> memory on a 64-bit arch, so assumption 1 makes sense.

Note there are arm KASAN patches floating around, so we should not
make assumptions about 64-bit arch.

But there seems to be a number of such casts already:

$ find -name "*.c" -exec egrep "\(long\).* < 0" {} \; -print
} else if ((long) delta < 0) {
./kernel/time/timer.c
if ((long)state < 0)
./drivers/thermal/thermal_sysfs.c
if ((long)delay < 0)
./drivers/infiniband/core/addr.c
if ((long)tmo < 0)
./drivers/net/wireless/st/cw1200/pm.c
if (pos < 0 || (long) pos != pos || (ssize_t) count < 0)
./sound/core/info.c
if ((long)hwrpb->sys_type < 0) {
./arch/alpha/kernel/setup.c
if ((long)m->driver_data < 0)
./arch/x86/kernel/apic/apic.c
if ((long) size < 0L)
if ((long)addr < 0L) {
./arch/sparc/mm/init_64.c
if ((long)lpid < 0)
./arch/powerpc/kvm/book3s_hv.c
if ((long)regs->regs[insn.mm_i_format.rs] < 0)
if ((long)regs->regs[insn.i_format.rs] < 0) {
if ((long)regs->regs[insn.i_format.rs] < 0) {
./arch/mips/kernel/branch.c
if ((long)arch->gprs[insn.i_format.rs] < 0)
if ((long)arch->gprs[insn.i_format.rs] < 0)
./arch/mips/kvm/emulate.c
if ((long)regs->regs[insn.i_format.rs] < 0)
./arch/mips/math-emu/cp1emu.c
if ((int32_t)(long)prom_vec < 0) {
./arch/mips/sibyte/common/cfe.c
if (msgsz > ns->msg_ctlmax || (long) msgsz < 0 || msqid < 0)
if (msqid < 0 || (long) bufsz < 0)
./ipc/msg.c
if ((long)x < 0)
./mm/page-writeback.c
if ((long)(next - val) < 0) {
./mm/memcontrol.c

Re: [PATCH][next] scsi: hisi_sas: fix spelling mistake "digial" -> "digital"

2019-09-30 Thread Martin K. Petersen



Colin,

> There is a spelling mistake in literal string. Fix it.

Applied to 5.5/scsi-queue. Thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: bfa: release allocated memory in case of error

2019-09-30 Thread Martin K. Petersen



Navid,

> In bfad_im_get_stats if bfa_port_get_stats fails, allocated memory
> needs to be released.

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: Problem sharing interrupts between gpioa and uart0 on Broadcom Hurricane 2 (iProc)

2019-09-30 Thread Florian Fainelli

On 9/30/2019 7:33 PM, Chris Packham wrote:
> Hi,
> 
> We have a platform using the BCM53344 integrated switch/CPU. This is
> part of the Hurricane 2 (technically Wolfhound) family of devices.
> 
> Currently we're using pieces of Broadcom's "iProc" SDK based on an out
> of date kernel and we'd very much like to be running as close to
> upstream as possible. The fact that the Ubiquiti UniFi Switch 8 is
> upstream gives me some hope.

FYI, I could not get enough information from the iProc SDK to port (or
not) the clock driver, so if nothing else, that is an area that may
require immediate work (though sometimes fixed-clocks would do just fine).

> 
> My current problem is the fact that the uart0 interrupt is shared with
> the Chip Common A gpio block. When I have an interrupt node on the
> gpio in the device tree I get an init exit at startup. If I remove the
> interrupt node the system will boot (except I don't get cascaded
> interrupts from the GPIOs).
> 
> Looking at the pinctrl-nsp-gpio.c it looks as though I might be able to
> make this work if I can convince the gpio code to return IRQ_HANDLED or
> IRQ_NONE but I'm struggling against the fact that pinctrl-iproc-gpio.c
> defers its interrupt handling to the gpio core.

Not sure I follow you here, what part is being handed to gpiolib? The
top interrupt handler under nsp_gpio_irq_handler() will try to do
exactly as you described. In fact, there are other iProc designs where
"gpio-a" and another interrupt, arch/arm/boot/dts/bcm-nsp.dtsi is one
such example and I never had problems with that part of NSP.

> 
> Is there any way I can get the gpio core to deal with the shared
> interrupt?
-- 
Florian

Re: [PATCH] PCI: Add Loongson vendor ID and device IDs

2019-09-30 Thread Tiezhu Yang


On 09/30/2019 10:02 PM, Andrew Murray wrote:
> On Mon, Sep 30, 2019 at 12:55:20PM +0800, Tiezhu Yang wrote:
>> Add the Loongson vendor ID and device IDs to pci_ids.h
>> to be used in the future.
>>
>> The Loongson IDs can be found at the following link:
>> https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/pci.ids
>>
>> Co-developed-by: Lu Zeng 
>> Signed-off-by: Lu Zeng 
>> Signed-off-by: Tiezhu Yang 
>> ---
>>   include/linux/pci_ids.h | 19 +++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
>> index 21a5724..119639d 100644
>> --- a/include/linux/pci_ids.h
>> +++ b/include/linux/pci_ids.h
>> @@ -3111,4 +3111,23 @@
>>
>>   #define PCI_VENDOR_ID_NCUBE                0x10ff
>>
>> +#define PCI_VENDOR_ID_LOONGSON             0x0014
>> +#define PCI_DEVICE_ID_LOONGSON_HT          0x7a00
>> +#define PCI_DEVICE_ID_LOONGSON_APB         0x7a02
>> +#define PCI_DEVICE_ID_LOONGSON_GMAC        0x7a03
>> +#define PCI_DEVICE_ID_LOONGSON_OTG         0x7a04
>> +#define PCI_DEVICE_ID_LOONGSON_GPU_2K1000  0x7a05
>> +#define PCI_DEVICE_ID_LOONGSON_DC          0x7a06
>> +#define PCI_DEVICE_ID_LOONGSON_HDA         0x7a07
>> +#define PCI_DEVICE_ID_LOONGSON_SATA        0x7a08
>> +#define PCI_DEVICE_ID_LOONGSON_PCIE_X1     0x7a09
>> +#define PCI_DEVICE_ID_LOONGSON_SPI         0x7a0b
>> +#define PCI_DEVICE_ID_LOONGSON_LPC         0x7a0c
>> +#define PCI_DEVICE_ID_LOONGSON_DMA         0x7a0f
>> +#define PCI_DEVICE_ID_LOONGSON_EHCI        0x7a14
>> +#define PCI_DEVICE_ID_LOONGSON_GPU_7A1000  0x7a15
>> +#define PCI_DEVICE_ID_LOONGSON_PCIE_X4     0x7a19
>> +#define PCI_DEVICE_ID_LOONGSON_OHCI        0x7a24
>> +#define PCI_DEVICE_ID_LOONGSON_PCIE_X8     0x7a29
>
> Hi Tiezhu,
>
> Thanks for the patch - however it is preferred to provide new PCI definitions
> along with the drivers that use them. They don't provide any useful value
> without drivers that use them.

Hi Andrew,

Thanks for your reply. This is the first step of the Loongson kernel team.
We will submit the other related driver patches individually, step by step,
in the near future; small patches like this one make an easily understood
change that can be verified by reviewers.

Thanks,

Tiezhu Yang

> Thanks,
>
> Andrew Murray
>
>> +
>>   #endif /* _LINUX_PCI_IDS_H */
>> --
>> 2.1.0

Re: [PATCH] phy-brcm-usb: Use devm_platform_ioremap_resource() in brcm_usb_phy_probe()

2019-09-30 Thread Florian Fainelli




On 9/26/2019 9:08 AM, Markus Elfring wrote:
> From: Markus Elfring 
> Date: Thu, 26 Sep 2019 18:00:14 +0200
> 
> Simplify this function implementation a bit by using
> a known wrapper function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH] scsi: fnic: make array dev_cmd_err static const, makes object smaller

2019-09-30 Thread Martin K. Petersen



Colin,

> Don't populate the array dev_cmd_err on the stack but instead make it
> static const. Makes the object code smaller by 80 bytes.

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

2019-09-30 Thread kbuild test robot

Hi Like,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[cannot apply to v5.4-rc1 next-20190930]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Like-Xu/perf-core-Provide-a-kernel-internal-interface-to-recalibrate-event-period/20191001-081543
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 7.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   kernel/events/core.o: In function `_perf_ioctl':
>> core.c:(.text+0x7068): undefined reference to `__get_user_unknown'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH] scsi: ufs: make array setup_attrs static const, makes object smaller

2019-09-30 Thread Martin K. Petersen



Colin,

> Don't populate the array setup_attrs on the stack but instead make it
> static const. Makes the object code smaller by 180 bytes.

Applied to 5.5/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: ips: make array 'options' static const, makes object smaller

2019-09-30 Thread Martin K. Petersen



Colin,

> Don't populate the array 'options' on the stack but instead make it
> static const. Makes the object code smaller by 143 bytes.

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: mvsas: remove redundant assignment to variable rc

2019-09-30 Thread Martin K. Petersen



Colin,

> The variable rc is being initialized with a value that is never read
> and is being re-assigned a little later on. The assignment is
> redundant and hence can be removed.

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: qla2xxx: remove redundant assignment to pointer host

2019-09-30 Thread Martin K. Petersen



Colin,

> The pointer host is being initialized with a value that is never read
> and is being re-assigned a little later on. The assignment is
> redundant and hence can be removed.

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH -next] scsi: smartpqi: remove set but not used variable 'ctrl_info'

2019-09-30 Thread Martin K. Petersen



YueHaibing,

> Fixes gcc '-Wunused-but-set-variable' warning:
>
> drivers/scsi/smartpqi/smartpqi_init.c: In function 'pqi_driver_version_show':
> drivers/scsi/smartpqi/smartpqi_init.c:6164:24: warning:
>  variable 'ctrl_info' set but not used [-Wunused-but-set-variable]

Applied to 5.5/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] kasan: fix the missing underflow in memmove and memcpy with CONFIG_KASAN_GENERIC=y

2019-09-30 Thread Walter Wu

On Mon, 2019-09-30 at 10:57 +0200, Marc Gonzalez wrote:
> On 30/09/2019 06:36, Walter Wu wrote:
> 
> >  bool check_memory_region(unsigned long addr, size_t size, bool write,
> > unsigned long ret_ip)
> >  {
> > +   if (long(size) < 0) {
> > +   kasan_report_invalid_size(src, dest, len, _RET_IP_);
> > +   return false;
> > +   }
> > +
> > return check_memory_region_inline(addr, size, write, ret_ip);
> >  }
> 
> Is it expected that memcpy/memmove may sometimes (incorrectly) be passed
> a negative value? (It would indeed turn up as a "large" size_t)
> 
> IMO, casting to long is suspicious.
> 
> There seem to be two implicit assumptions.
> 
> 1) size >= ULONG_MAX/2 is invalid input
> 2) casting a size >= ULONG_MAX/2 to long yields a negative value
> 
> 1) seems reasonable because we can't copy more than half of memory to
> the other half of memory. I suppose the constraint could be even tighter,
> but it's not clear where to draw the line, especially when considering
> 32b vs 64b arches.
> 
> 2) is implementation-defined, and gcc works "as expected" (clang too
> probably) https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
> 
> A comment might be warranted to explain the rationale.
> Regards.

Thanks for your suggestion.
Yes, the issue is that a negative value is passed to memcpy/memmove/memset.
Our current idea relies on assumption 1 and only considers 64b arches,
because KASAN only supports 64b. In practice we can't use that much
memory on a 64b arch, so assumption 1 makes sense.
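
For illustration, assumption 1 can be written as an explicit range check
instead of a cast to long, which sidesteps the implementation-defined
conversion in assumption 2. This is only a minimal userspace sketch, not
the KASAN patch; size_is_implausible() is a hypothetical name and the
SIZE_MAX / 2 threshold is an assumption taken from the discussion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: treat any size in the upper half of the size_t
 * range as invalid input. A negative length that was converted to
 * size_t by the caller always lands in that upper half.
 */
static bool size_is_implausible(size_t size)
{
	return size > SIZE_MAX / 2;
}
```

A caller would reject the request (and report it) when the helper returns
true, before looking at any shadow memory.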

Re: [PATCH][smartpqi-next] scsi: smartpqi: clean up indentation of a statement

2019-09-30 Thread Martin K. Petersen



Colin,

> There is a statement that is indented one level too deeply, remove the
> tab, re-join broken line and remove some empty lines.

Applied to 5.5/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v4 4/4] perf_event_open: switch to copy_struct_from_user()

2019-09-30 Thread Christian Brauner

On Tue, Oct 01, 2019 at 11:10:55AM +1000, Aleksa Sarai wrote:
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls.
> 
> Reviewed-by: Kees Cook 
> Signed-off-by: Aleksa Sarai 

Reviewed-by: Christian Brauner

Problem sharing interrupts between gpioa and uart0 on Broadcom Hurricane 2 (iProc)

2019-09-30 Thread Chris Packham

Hi,

We have a platform using the BCM53344 integrated switch/CPU. This is
part of the Hurricane 2 (technically Wolfhound) family of devices.

Currently we're using pieces of Broadcom's "iProc" SDK based on an out
of date kernel and we'd very much like to be running as close to
upstream as possible. The fact that the Ubiquiti UniFi Switch 8 is
upstream gives me some hope.

My current problem is the fact that the uart0 interrupt is shared with
the Chip Common A gpio block. When I have an interrupt node on the
gpio in the device tree I get an init exit at startup. If I remove the
interrupt node the system will boot (except I don't get cascaded
interrupts from the GPIOs).

Looking at pinctrl-nsp-gpio.c, it looks as though I might be able to
make this work if I can convince the gpio code to return IRQ_HANDLED or
IRQ_NONE, but I'm struggling against the fact that
pinctrl-iproc-gpio.c defers its interrupt handling to the gpio core.

Is there any way I can get the gpio core to deal with the shared
interrupt?

Thanks,
Chris

Re: [PATCH v4 2/4] clone3: switch to copy_struct_from_user()

2019-09-30 Thread Christian Brauner

On Tue, Oct 01, 2019 at 11:10:53AM +1000, Aleksa Sarai wrote:
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls. Additionally, explicitly
> define CLONE_ARGS_SIZE_VER0 to match the other users of the
> struct-extension pattern.
> 
> Reviewed-by: Kees Cook 
> Signed-off-by: Aleksa Sarai 

Reviewed-by: Christian Brauner

Re: [PATCH v4 3/4] sched_setattr: switch to copy_struct_from_user()

2019-09-30 Thread Christian Brauner

On Tue, Oct 01, 2019 at 11:10:54AM +1000, Aleksa Sarai wrote:
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls. Ideally we could also
> unify sched_getattr(2)-style syscalls as well, but unfortunately the
> correct semantics for such syscalls are much less clear (see [1] for
> more detail). In future we could come up with a more sane idea for how
> the syscall interface should look.
> 
> [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
>  robustify sched_read_attr() ABI logic and code")
> 
> Reviewed-by: Kees Cook 
> Signed-off-by: Aleksa Sarai 

Reviewed-by: Christian Brauner

Re: [PATCH v4 1/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Christian Brauner

On Mon, Sep 30, 2019 at 06:58:39PM -0700, Kees Cook wrote:
> On Tue, Oct 01, 2019 at 11:10:52AM +1000, Aleksa Sarai wrote:
> > A common pattern for syscall extensions is increasing the size of a
> > struct passed from userspace, such that the zero-value of the new fields
> > result in the old kernel behaviour (allowing for a mix of userspace and
> > kernel vintages to operate on one another in most cases).
> > 
> > While this interface exists for communication in both directions, only
> > one interface is straightforward to have reasonable semantics for
> > (userspace passing a struct to the kernel). For kernel returns to
> > userspace, what the correct semantics are (whether there should be an
> > error if userspace is unaware of a new extension) is very
> > syscall-dependent and thus probably cannot be unified between syscalls
> > (a good example of this problem is [1]).
> > 
> > Previously there was no common lib/ function that implemented
> > the necessary extension-checking semantics (and different syscalls
> > implemented them slightly differently or incompletely[2]). Future
> > patches replace common uses of this pattern to make use of
> > copy_struct_from_user().
> > 
> > > Also add in-kernel selftests that ensure that alignment and various
> > > byte patterns are all handled identically to memchr_inv() usage.
> > 
> > [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
> >  robustify sched_read_attr() ABI logic and code")
> > 
> > > [2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do
> >  similar checks to copy_struct_from_user() while rt_sigprocmask(2)
> >  always rejects differently-sized struct arguments.
> > 
> > Suggested-by: Rasmus Villemoes 
> > Signed-off-by: Aleksa Sarai 
> > ---
> >  include/linux/bitops.h  |   7 +++
> >  include/linux/uaccess.h |  70 +
> >  lib/strnlen_user.c  |   8 +--
> >  lib/test_user_copy.c| 136 ++--
> >  lib/usercopy.c  |  55 
> >  5 files changed, 263 insertions(+), 13 deletions(-)
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index cf074bce3eb3..c94a9ff9f082 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -4,6 +4,13 @@
> >  #include 
> >  #include 
> >  
> > +/* Set bits in the first 'n' bytes when loaded from memory */
> > +#ifdef __LITTLE_ENDIAN
> > +#  define aligned_byte_mask(n) ((1UL << 8*(n))-1)
> > +#else
> > +#  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> > +#endif
> > +
> >  #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
> >  #define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
> >  
> > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > index 70bbdc38dc37..8abbc713f7fb 100644
> > --- a/include/linux/uaccess.h
> > +++ b/include/linux/uaccess.h
> > @@ -231,6 +231,76 @@ __copy_from_user_inatomic_nocache(void *to, const void 
> > __user *from,
> >  
> >  #endif /* ARCH_HAS_NOCACHE_UACCESS */
> >  
> > +extern int check_zeroed_user(const void __user *from, size_t size);
> > +
> > +/**
> > + * copy_struct_from_user: copy a struct from userspace
> > + * @dst:   Destination address, in kernel space. This buffer must be @ksize
> > + * bytes long.
> > + * @ksize: Size of @dst struct.
> > + * @src:   Source address, in userspace.
> > + * @usize: (Alleged) size of @src struct.
> > + *
> > + * Copies a struct from userspace to kernel space, in a way that guarantees
> > + * backwards-compatibility for struct syscall arguments (as long as future
> > > + * struct extensions are made such that all new fields are *appended* to the
> > + * old struct, and zeroed-out new fields have the same meaning as the old
> > + * struct).
> > + *
> > > + * @ksize is just sizeof(*dst), and @usize should've been passed by userspace.
> > > + * The recommended usage is something like the following:
> > > + *
> > > + *   SYSCALL_DEFINE2(foobar, const struct foo __user *, uarg, size_t, usize)
> > > + *   {
> > > + *      int err;
> > > + *      struct foo karg = {};
> > > + *
> > > + *      if (usize > PAGE_SIZE)
> > > + *        return -E2BIG;
> > > + *      if (usize < FOO_SIZE_VER0)
> > > + *        return -EINVAL;
> > > + *
> > > + *      err = copy_struct_from_user(&karg, sizeof(karg), uarg, usize);
> > > + *      if (err)
> > > + *        return err;
> > > + *
> > > + *      // ...
> > > + *   }
> > + *
> > + * There are three cases to consider:
> > + *  * If @usize == @ksize, then it's copied verbatim.
> > > + *  * If @usize < @ksize, then the userspace has passed an old struct to a
> > > + *    newer kernel. The rest of the trailing bytes in @dst (@ksize - @usize)
> > > + *    are to be zero-filled.
> > > + *  * If @usize > @ksize, then the userspace has passed a new struct to an
> > > + *    older kernel. The trailing bytes unknown to the kernel (@usize - @ksize)
> > > + *    are checked to ensure they are zeroed, otherwise

Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

2019-09-30 Thread kbuild test robot

Hi Like,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[cannot apply to v5.4-rc1 next-20190930]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Like-Xu/perf-core-Provide-a-kernel-internal-interface-to-recalibrate-event-period/20191001-081543
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: nds32-allnoconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   nds32le-linux-ld: init/do_mounts.o: in function `perf_event_period':
>> do_mounts.c:(.text+0xc): multiple definition of `perf_event_period'; 
>> init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: init/noinitramfs.o: in function `perf_event_period':
   noinitramfs.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: arch/nds32/kernel/sys_nds32.o: in function 
`perf_event_period':
   sys_nds32.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: arch/nds32/kernel/syscall_table.o: in function 
`perf_event_period':
   syscall_table.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: arch/nds32/mm/fault.o: in function `perf_event_period':
   fault.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/fork.o: in function `perf_event_period':
   fork.c:(.text+0x374): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/exec_domain.o: in function `perf_event_period':
   exec_domain.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/cpu.o: in function `perf_event_period':
   cpu.c:(.text+0xd4): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/exit.o: in function `perf_event_period':
   exit.c:(.text+0xf30): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/sysctl.o: in function `perf_event_period':
   sysctl.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/sysctl_binary.o: in function `perf_event_period':
   sysctl_binary.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/capability.o: in function `perf_event_period':
   capability.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/ptrace.o: in function `perf_event_period':
   ptrace.c:(.text+0x3f4): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/signal.o: in function `perf_event_period':
   signal.c:(.text+0x580): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/sys.o: in function `perf_event_period':
   sys.c:(.text+0x70c): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/umh.o: in function `perf_event_period':
   umh.c:(.text+0x4c8): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/pid.o: in function `perf_event_period':
   pid.c:(.text+0x0): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/nsproxy.o: in function `perf_event_period':
   nsproxy.c:(.text+0x14c): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/reboot.o: in function `perf_event_period':
   reboot.c:(.text+0x6c): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0): first defined here
   nds32le-linux-ld: kernel/sched/core.o: in function `perf_event_period':
   core.c:(.text+0x338): multiple definition of `perf_event_period'; 
init/main.o:main.c:(.text+0x0):

include/linux/spinlock.h:393:9: sparse: sparse: context imbalance in 'linflex_console_write' - unexpected unlock

2019-09-30 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c
commit: 09864c1cdf5c537bd01bff45181406e422ea988c tty: serial: Add linflexuart 
driver for S32V234
date:   4 weeks ago
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-37-gd466a02-dirty
git checkout 09864c1cdf5c537bd01bff45181406e422ea988c
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

>> include/linux/spinlock.h:393:9: sparse: sparse: context imbalance in 'linflex_console_write' - unexpected unlock

vim +/linflex_console_write +393 include/linux/spinlock.h

c2f21ce2e31286 Thomas Gleixner 2009-12-02  390  
3490565b633c70 Denys Vlasenko  2015-07-13  391  static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
c2f21ce2e31286 Thomas Gleixner 2009-12-02  392  {
c2f21ce2e31286 Thomas Gleixner 2009-12-02 @393  	raw_spin_unlock_irqrestore(&lock->rlock, flags);
c2f21ce2e31286 Thomas Gleixner 2009-12-02  394  }
c2f21ce2e31286 Thomas Gleixner 2009-12-02  395  

:: The code at line 393 was first introduced by commit
:: c2f21ce2e31286a0a32f8da0a7856e9ca1122ef3 locking: Implement new 
raw_spinlock

:: TO: Thomas Gleixner 
:: CC: Thomas Gleixner 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: x86/random: Speculation to the rescue

2019-09-30 Thread hgntkwis

Why not get entropy from the white noise that can be obtained from any
attached ADC? Audio cards, some SBCs, and microcontrollers all have
ADCs.
That said, I'm not familiar with when the kernel first needs entropy,
nor am I an expert in the field.


Thanks




Re: [PATCH 3/4] perf inject --jit: Remove //anon mmap events

2019-09-30 Thread Andi Kleen

On Mon, Sep 30, 2019 at 08:49:00PM +, Steve MacLean wrote:
> SNIP
> 
> > I can't apply this one:
> 
> > patching file builtin-inject.c
> > Hunk #1 FAILED at 263.
> > 1 out of 1 hunk FAILED -- saving rejects to file builtin-inject.c.rej 
> 
> I assume this is because I based my patches on the wrong tip.
> 
> > patching file util/jitdump.c
> > patch:  malformed patch at line 236: btree, node);
> 
> This doesn't make sense to me.  The patch doesn't try to inject near line 
> 236. There aren't 236 lines in the e-mail

Most likely your mail client wrapped the lines.

See Documentation/process/email-clients.rst

-Andi

Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user

2019-09-30 Thread Andi Kleen

> > I think it's useful. Makes it easy to do kernel/user break downs.
> > perf record should support the same.
> 
> Don't we have this already with:
> 
> [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k 
> -a -- sleep 1

This only works for simple cases. Try it for --topdown or multiple -M metrics.

-Andi

Re: [PATCH v15 1/4] soc: mediatek: cmdq: define the instruction struct

2019-09-30 Thread CK Hu

Hi, Bibby:

On Fri, 2019-09-27 at 19:42 +0800, Bibby Hsieh wrote:
> Define an instruction structure for gce driver to append command.
> This structure can make the client's code more readable.
> 
> Signed-off-by: Bibby Hsieh 
> Reviewed-by: CK Hu 

You've modified this patch in this version, so you should drop this
'Reviewed-by' tag.

> Reviewed-by: Houlong Wei 
> ---
>  drivers/soc/mediatek/mtk-cmdq-helper.c   | 106 +--
>  include/linux/mailbox/mtk-cmdq-mailbox.h |  10 +++
>  2 files changed, 90 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
> b/drivers/soc/mediatek/mtk-cmdq-helper.c
> index 7aa0517ff2f3..7af327b98d25 100644
> --- a/drivers/soc/mediatek/mtk-cmdq-helper.c
> +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
> @@ -9,12 +9,24 @@
>  #include 
>  #include 
>  
> -#define CMDQ_ARG_A_WRITE_MASK0x
>  #define CMDQ_WRITE_ENABLE_MASK   BIT(0)
>  #define CMDQ_EOC_IRQ_EN  BIT(0)
>  #define CMDQ_EOC_CMD ((u64)((CMDQ_CODE_EOC << CMDQ_OP_CODE_SHIFT)) \
>   << 32 | CMDQ_EOC_IRQ_EN)
>  
> +struct cmdq_instruction {
> + union {
> + u32 value;
> + u32 mask;
> + };
> + union {
> + u16 offset;
> + u16 event;
> + };
> + u8 subsys;
> + u8 op;
> +};
> +
>  static void cmdq_client_timeout(struct timer_list *t)
>  {
>   struct cmdq_client *client = from_timer(client, t, timer);
> @@ -110,10 +122,10 @@ void cmdq_pkt_destroy(struct cmdq_pkt *pkt)
>  }
>  EXPORT_SYMBOL(cmdq_pkt_destroy);
>  
> -static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, enum cmdq_code code,
> -u32 arg_a, u32 arg_b)
> +static int cmdq_pkt_append_command(struct cmdq_pkt *pkt,
> +struct cmdq_instruction *inst)
>  {
> - u64 *cmd_ptr;
> + struct cmdq_instruction *cmd_ptr;
>  
>   if (unlikely(pkt->cmd_buf_size + CMDQ_INST_SIZE > pkt->buf_size)) {
>   /*
> @@ -129,8 +141,9 @@ static int cmdq_pkt_append_command(struct cmdq_pkt *pkt, 
> enum cmdq_code code,
>   __func__, (u32)pkt->buf_size);
>   return -ENOMEM;
>   }
> +
>   cmd_ptr = pkt->va_base + pkt->cmd_buf_size;
> - (*cmd_ptr) = (u64)((code << CMDQ_OP_CODE_SHIFT) | arg_a) << 32 | arg_b;
> + *cmd_ptr = *inst;
>   pkt->cmd_buf_size += CMDQ_INST_SIZE;
>  
>   return 0;
> @@ -138,24 +151,42 @@ static int cmdq_pkt_append_command(struct cmdq_pkt 
> *pkt, enum cmdq_code code,
>  
>  int cmdq_pkt_write(struct cmdq_pkt *pkt, u8 subsys, u16 offset, u32 value)
>  {
> - u32 arg_a = (offset & CMDQ_ARG_A_WRITE_MASK) |
> - (subsys << CMDQ_SUBSYS_SHIFT);
> + struct cmdq_instruction *inst = kzalloc(sizeof(*inst), GFP_KERNEL);

Frequent allocation and freeing increases CPU load. A simpler way is

struct cmdq_instruction inst = { 0 };

cmdq_pkt_append_command(pkt, &inst);
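
To make the suggestion concrete, here is a minimal userspace sketch of
the stack-allocated variant. It is not the driver code: struct
pkt_sketch, append_command_sketch(), pkt_write_sketch() and the opcode
value are hypothetical stand-ins, and only the layout of struct
cmdq_instruction is taken from the patch above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMDQ_INST_SIZE		8
#define SKETCH_CODE_WRITE	0x04	/* placeholder opcode, assumed for illustration */

/* Same field layout as the struct in the patch, with userspace types. */
struct cmdq_instruction {
	union {
		uint32_t value;
		uint32_t mask;
	};
	union {
		uint16_t offset;
		uint16_t event;
	};
	uint8_t subsys;
	uint8_t op;
};

_Static_assert(sizeof(struct cmdq_instruction) == CMDQ_INST_SIZE,
	       "one instruction must occupy exactly CMDQ_INST_SIZE bytes");

/* Hypothetical stand-in for struct cmdq_pkt: a small fixed buffer. */
struct pkt_sketch {
	uint8_t buf[4 * CMDQ_INST_SIZE];
	size_t cmd_buf_size;
};

/* Copy one instruction to the end of the buffer, or fail if it is full. */
static int append_command_sketch(struct pkt_sketch *pkt,
				 const struct cmdq_instruction *inst)
{
	if (pkt->cmd_buf_size + CMDQ_INST_SIZE > sizeof(pkt->buf))
		return -ENOMEM;

	memcpy(pkt->buf + pkt->cmd_buf_size, inst, CMDQ_INST_SIZE);
	pkt->cmd_buf_size += CMDQ_INST_SIZE;
	return 0;
}

/* The instruction lives on the stack, so nothing is allocated or freed. */
static int pkt_write_sketch(struct pkt_sketch *pkt, uint8_t subsys,
			    uint16_t offset, uint32_t value)
{
	struct cmdq_instruction inst = { 0 };

	inst.op = SKETCH_CODE_WRITE;
	inst.value = value;
	inst.offset = offset;
	inst.subsys = subsys;

	return append_command_sketch(pkt, &inst);
}
```

Because the instruction is copied into the packet buffer by value, its
lifetime only needs to cover the call, which is why a local variable is
enough here.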


> + int err = 0;

No need to assign initial value.

> +
> + if (!inst)
> + return -ENOMEM;
> +
> + inst->op = CMDQ_CODE_WRITE;
> + inst->value = value;
> + inst->offset = offset;
> + inst->subsys = subsys;
>  
> - return cmdq_pkt_append_command(pkt, CMDQ_CODE_WRITE, arg_a, value);
> + err = cmdq_pkt_append_command(pkt, inst);
> + kfree(inst);
> +
> + return err;
>  }
>  EXPORT_SYMBOL(cmdq_pkt_write);
>  

[snip]

>  
>  static int cmdq_pkt_finalize(struct cmdq_pkt *pkt)
>  {
> - int err;
> + struct cmdq_instruction *inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> + int err = 0;
> +
> + if (!inst)
> + return -ENOMEM;
>  
>   /* insert EOC and generate IRQ for each command iteration */
> - err = cmdq_pkt_append_command(pkt, CMDQ_CODE_EOC, 0, CMDQ_EOC_IRQ_EN);
> + inst->op = CMDQ_CODE_EOC;
> + inst->value = CMDQ_EOC_IRQ_EN;
> + err = cmdq_pkt_append_command(pkt, inst);
>  
>   /* JUMP to end */
> - err |= cmdq_pkt_append_command(pkt, CMDQ_CODE_JUMP, 0, CMDQ_JUMP_PASS);
> + inst->op = CMDQ_CODE_JUMP;
> + inst->value = CMDQ_JUMP_PASS;
> + err |= cmdq_pkt_append_command(pkt, inst);

OR-ing the err values looks strange. If you OR err 0x1 with err 0x10, you
get the new err 0x11. How do you know that err 0x11 is the combination
of 0x1 and 0x10?

This bug seems to exist in the previous patch too, so I would like you
to fix it first and then apply this patch.
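
The same problem shows up with negative errno values. As a rough
userspace illustration (finalize_or() and finalize_first_error() are
hypothetical names, and the two arguments stand in for the return values
of the two append calls): with the Linux values ENOMEM = 12 and
EINVAL = 22, -ENOMEM | -EINVAL evaluates to -2 on a two's-complement
machine, which reads as -ENOENT and matches neither original error.

```c
#include <errno.h>

/* The pattern under review: both results are merged with a bitwise OR. */
static int finalize_or(int first, int second)
{
	int err = first;

	err |= second;
	return err;
}

/* One possible alternative: stop at, and return, the first failure. */
static int finalize_first_error(int first, int second)
{
	if (first < 0)
		return first;

	return second;
}
```

Returning the first failure keeps the value meaningful to the caller; in
real code the second step would simply not run once the first has
failed.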

Regards,
CK


> + kfree(inst);
>  
>   return err;
>  }
> diff --git a/include/linux/mailbox/mtk-cmdq-mailbox.h 
> b/include/linux/mailbox/mtk-cmdq-mailbox.h
> index e6f54ef6698b..678760548791 100644
> --- a/include/linux/mailbox/mtk-cmdq-mailbox.h
> +++ b/include/linux/mailbox/mtk-cmdq-mailbox.h
> @@ -20,6 +20,16 @@
>  #define CMDQ_WFE_WAIT                BIT(15)
>  #define CMDQ_WFE_WAIT_VALUE  0x1
>  
> +/*
> + * WFE arg_b
> + * bit 0-11: wait value
> + * bit 15: 1 - wait, 0 - no wait
> + * bit 16-27:

[PATCH] tty: n_hdlc: fix build on SPARC

2019-09-30 Thread Randy Dunlap

From: Randy Dunlap 

Fix tty driver build on SPARC by not using __exitdata.
It appears that SPARC does not support section .exit.data.

Fixes these build errors:

`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: 
defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: 
defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: 
defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o
`.exit.data' referenced in section `.exit.text' of drivers/tty/n_hdlc.o: 
defined in discarded section `.exit.data' of drivers/tty/n_hdlc.o

Reported-by: kbuild test robot 
Fixes: 063246641d4a ("format-security: move static strings to const")
Signed-off-by: Randy Dunlap 
Cc: Kees Cook 
Cc: Greg Kroah-Hartman 
Cc: "David S. Miller" 
Cc: Andrew Morton 
---
 drivers/tty/n_hdlc.c |5 +
 1 file changed, 5 insertions(+)

--- mmotm-2019-0925-1810.orig/drivers/tty/n_hdlc.c
+++ mmotm-2019-0925-1810/drivers/tty/n_hdlc.c
@@ -968,6 +968,11 @@ static int __init n_hdlc_init(void)

 }  /* end of init_module() */
 
+#ifdef CONFIG_SPARC
+#undef __exitdata
+#define __exitdata
+#endif
+
 static const char hdlc_unregister_ok[] __exitdata =
KERN_INFO "N_HDLC: line discipline unregistered\n";
 static const char hdlc_unregister_fail[] __exitdata =

linux-next: Tree for Oct 1

2019-09-30 Thread Stephen Rothwell

Hi all,

Changes since 20190930:

New trees: erofs-fixes, erofs, kunit

My fixes tree contains:

  04e6dac68d9b ("powerpc/64s/radix: fix for "tidy up TLB flushing code" and 
!CONFIG_PPC_RADIX_MMU")

Non-merge commits (relative to Linus' tree): 455
 594 files changed, 15071 insertions(+), 4081 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 313 trees (counting Linus' and 78 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (54ecb8f7028c Linux 5.4-rc1)
Merging fixes/master (04e6dac68d9b powerpc/64s/radix: fix for "tidy up TLB 
flushing code" and !CONFIG_PPC_RADIX_MMU)
Merging kbuild-current/fixes (f97c81dc6ca5 Merge tag 'armsoc-late' of 
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc)
Merging arc-current/for-curr (41277ba7eb4e ARC: mm: tlb flush optim: elide 
redundant uTLB invalidates for MMUv3)
Merging arm-current/fixes (5b3efa4f1479 ARM: 8901/1: add a criteria for 
pfn_valid of arm)
Merging arm-soc-fixes/arm/fixes (a304c0a60252 arm64/ARM: configs: Change 
CONFIG_REMOTEPROC from m to y)
Merging arm64-fixes/for-next/fixes (799c85105233 arm64: Fix reference to docs 
for ARM64_TAGGED_ADDR_ABI)
Merging m68k-current/for-linus (0f1979b402df m68k: Remove ioremap_fullcache())
Merging powerpc-fixes/fixes (253c892193ab powerpc/eeh: Fix eeh 
eeh_debugfs_break_device() with SRIOV devices)
Merging s390-fixes/fixes (d45331b00ddb Linux 5.3-rc4)
Merging sparc/master (038029c03e21 sparc: remove unneeded uapi/asm/statfs.h)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (0e141f757b2c erspan: remove the incorrect mtu limit for 
erspan)
Merging bpf/master (55d554f5d140 tools: bpf: Use !building_out_of_srctree to 
determine srctree)
Merging ipsec/master (00b368502d18 xen-netfront: do not assume sk_buff_head 
list is empty in error handling)
Merging netfilter/master (02dc96ef6c25 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net)
Merging ipvs/master (58e8b37069ff Merge branch 'net-phy-dp83867-add-some-fixes')
Merging wireless-drivers/master (54ecb8f7028c Linux 5.4-rc1)
Merging mac80211/master (f794dc2304d8 sctp: fix the missing put_user when 
dumping transport thresholds)
Merging rdma-fixes/for-rc (531a64e4c35b RDMA/siw: Fix IPv6 addr_list locking)
Merging sound-current/for-linus (f41f900568d9 ALSA: usb-audio: Add DSD support 
for EVGA NU Audio)
Merging sound-asoc-fixes/for-linus (84b66885fdcf Merge branch 'asoc-5.4' into 
asoc-linus)
Merging regmap-fixes/for-linus (0161b8716465 Merge branch 'regmap-5.3' into 
regmap-linus)
Merging regulator-fixes/for-linus (f9a60abc26d9 Merge branch 'regulator-5.4' 
into regulator-linus)
Merging spi-fixes/for-linus (60b76d1c3b0a Merge branch 'spi-5.4' into spi-linus)
Merging pci-current/for-linus (5184d449600f Merge tag 'microblaze-v5.4-rc1' of 
git://git.monstr.eu/linux-2.6-microblaze)
Merging driver-core.current/driver-core-linus (54ecb8f7028c Linux 5.4-rc1)
Merging tty.current/tty-linus (54ecb8f7028c Linux 5.4-rc1)
Merging usb.current/usb-linus (54ecb8f7028c Linux 5.4-rc1)
Merging usb-gadget-fixes/fixes (4a56a478a525 usb: gadget: mass_storage: Fix 
races between fsg_disable and fsg_set_alt)
Merging usb-serial-fixes/usb-linus (d1abaeb3be7b Linux 5.3-rc5)
Merging usb-chipidea-fixes/

Re: [PATCH v4 1/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Kees Cook

On Tue, Oct 01, 2019 at 11:10:52AM +1000, Aleksa Sarai wrote:
> A common pattern for syscall extensions is increasing the size of a
> struct passed from userspace, such that the zero-value of the new fields
> result in the old kernel behaviour (allowing for a mix of userspace and
> kernel vintages to operate on one another in most cases).
> 
> While this interface exists for communication in both directions, only
> one interface is straightforward to have reasonable semantics for
> (userspace passing a struct to the kernel). For kernel returns to
> userspace, what the correct semantics are (whether there should be an
> error if userspace is unaware of a new extension) is very
> syscall-dependent and thus probably cannot be unified between syscalls
> (a good example of this problem is [1]).
> 
> Previously there was no common lib/ function that implemented
> the necessary extension-checking semantics (and different syscalls
> implemented them slightly differently or incompletely[2]). Future
> patches replace common uses of this pattern to make use of
> copy_struct_from_user().
> 
> Also add in-kernel selftests that ensure that alignment and various
> byte patterns are all handled identically to memchr_inv() usage.
> 
> [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
>  robustify sched_read_attr() ABI logic and code")
> 
> [2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do
>  similar checks to copy_struct_from_user() while rt_sigprocmask(2)
>  always rejects differently-sized struct arguments.
> 
> Suggested-by: Rasmus Villemoes 
> Signed-off-by: Aleksa Sarai 
> ---
>  include/linux/bitops.h  |   7 +++
>  include/linux/uaccess.h |  70 +
>  lib/strnlen_user.c  |   8 +--
>  lib/test_user_copy.c| 136 ++--
>  lib/usercopy.c  |  55 
>  5 files changed, 263 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index cf074bce3eb3..c94a9ff9f082 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -4,6 +4,13 @@
>  #include 
>  #include 
>  
> +/* Set bits in the first 'n' bytes when loaded from memory */
> +#ifdef __LITTLE_ENDIAN
> +#  define aligned_byte_mask(n) ((1UL << 8*(n))-1)
> +#else
> +#  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> +#endif
> +
>  #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
>  #define BITS_TO_LONGS(nr)DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
>  
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 70bbdc38dc37..8abbc713f7fb 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -231,6 +231,76 @@ __copy_from_user_inatomic_nocache(void *to, const void 
> __user *from,
>  
>  #endif   /* ARCH_HAS_NOCACHE_UACCESS */
>  
> +extern int check_zeroed_user(const void __user *from, size_t size);
> +
> +/**
> + * copy_struct_from_user: copy a struct from userspace
> + * @dst:   Destination address, in kernel space. This buffer must be @ksize
> + * bytes long.
> + * @ksize: Size of @dst struct.
> + * @src:   Source address, in userspace.
> + * @usize: (Alleged) size of @src struct.
> + *
> + * Copies a struct from userspace to kernel space, in a way that guarantees
> + * backwards-compatibility for struct syscall arguments (as long as future
> + * struct extensions are made such that all new fields are *appended* to the
> + * old struct, and zeroed-out new fields have the same meaning as the old
> + * struct).
> + *
> + * @ksize is just sizeof(*dst), and @usize should've been passed by userspace.
> + * The recommended usage is something like the following:
> + *
> + *   SYSCALL_DEFINE2(foobar, const struct foo __user *, uarg, size_t, usize)
> + *   {
> + *      int err;
> + *      struct foo karg = {};
> + *
> + *      if (usize > PAGE_SIZE)
> + *        return -E2BIG;
> + *      if (usize < FOO_SIZE_VER0)
> + *        return -EINVAL;
> + *
> + *      err = copy_struct_from_user(&karg, sizeof(karg), uarg, usize);
> + *      if (err)
> + *        return err;
> + *
> + *      // ...
> + *   }
> + *
> + * There are three cases to consider:
> + *  * If @usize == @ksize, then it's copied verbatim.
> + *  * If @usize < @ksize, then the userspace has passed an old struct to a
> + *    newer kernel. The rest of the trailing bytes in @dst (@ksize - @usize)
> + *    are to be zero-filled.
> + *  * If @usize > @ksize, then the userspace has passed a new struct to an
> + *    older kernel. The trailing bytes unknown to the kernel (@usize - @ksize)
> + *    are checked to ensure they are zeroed, otherwise -E2BIG is returned.
> + *
> + * Returns (in all cases, some data may have been copied):
> + *  * -E2BIG:  (@usize > @ksize) and there are non-zero trailing bytes in @src.
> + *  * -EFAULT: access to userspace failed.
> + */
> +static __always_inline
> +int copy_struct_from_user(void *dst,

Re: [PATCH v1 1/2] dt-bindings: spi: Add support for cadence-qspi IP Intel LGM SoC

2019-09-30 Thread Ramuthevar, Vadivel MuruganX


Hi Rob,

   Thank you for the review comments.

On 1/10/2019 6:36 AM, Rob Herring wrote:

On Mon, Sep 16, 2019 at 03:38:42PM +0800, Ramuthevar,Vadivel MuruganX wrote:

From: Ramuthevar Vadivel Murugan 

On Intel Lightning Mountain (LGM) SoCs, the QSPI controller supports
QSPI-NAND flash. This patch introduces the device tree binding
documentation for the Cadence QSPI controller and SPI-NAND flash.

Signed-off-by: Ramuthevar Vadivel Murugan 

---
  .../devicetree/bindings/spi/cadence,qspi-nand.yaml | 84 ++
  1 file changed, 84 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/spi/cadence,qspi-nand.yaml

diff --git a/Documentation/devicetree/bindings/spi/cadence,qspi-nand.yaml b/Documentation/devicetree/bindings/spi/cadence,qspi-nand.yaml
new file mode 100644
index ..9aae4c1459cc
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/cadence,qspi-nand.yaml
@@ -0,0 +1,84 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/spi/cadence,qspi-nand.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Cadence QSPI Flash Controller on Intel's SoC
+
+maintainers:
+  - Ramuthevar Vadivel Murugan 
+
+allOf:
+  - $ref: "spi-controller.yaml#"
+
+description: |
+  The Cadence QSPI is a controller optimized for communication with SPI
+  FLASH memories, without DMA support on Intel's SoC.
+
+properties:
+  compatible:
+const: cadence,lgm-qspi

Vendor here should be 'intel'. Perhaps the binding should be shared too
like the driver.

Plus the vendor prefix for Cadence is cdns.

Agreed, will update.

+
+  reg:
+maxItems: 1
+
+  fifo-depth:
+maxItems: 1
+

This is vendor specific, so needs a vendor prefix, type, and
description.

agreed!

+  fifo-width:
+maxItems: 1

Same


+
+  qspi-phyaddr:
+maxItems: 1

Same


+
+  qspi-phymask:
+maxItems: 1

Same

will update all the above.

+
+  clocks:
+maxItems: 2

Need to define what each clock is when there is more than 1.

Sure, will update.

+
+  clock-names:
+maxItems: 2

Need to define the strings.

Noted, will update.

+
+  resets:
+maxItems: 1
+
+  reset-names:
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - fifo-depth
+  - fifo-width
+  - qspi-phyaddr
+  - qspi-phymask
+  - clocks
+  - clock-names
+  - resets
+  - reset-names
+
+examples:
+  - |
+qspi@ec00 {

spi@...


The controller is QSPI, so I have named the node accordingly.

With Best Regards
Vadivel Murugan R

+  compatible = "cadence,qspi-nand";
+  reg = <0xec00 0x100>;
+  fifo-depth = <128>;
+  fifo-width = <4>;
+  qspi-phyaddr = <0xf400>;
+  qspi-phymask = <0x>;
+  clocks = < LGM_CLK_QSPI>, < LGM_GCLK_QSPI>;
+  clock-names = "freq", "qspi";
+  resets = < 0x10 1>;
+  reset-names = "qspi";
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  flash: flash@1 {
+  compatible = "spi-nand";
+  reg = <1>;
+  spi-max-frequency = <1000>;
+  };
+};
+
--
2.11.0

[PATCH v4 3/4] sched_setattr: switch to copy_struct_from_user()

2019-09-30 Thread Aleksa Sarai

The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls. Ideally we could also
unify sched_getattr(2)-style syscalls as well, but unfortunately the
correct semantics for such syscalls are much less clear (see [1] for
more detail). In future we could come up with a more sane idea for how
the syscall interface should look.

[1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
 robustify sched_read_attr() ABI logic and code")

Reviewed-by: Kees Cook 
Signed-off-by: Aleksa Sarai 
---
 kernel/sched/core.c | 43 +++
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..dd05a378631a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5106,9 +5106,6 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
u32 size;
int ret;
 
-   if (!access_ok(uattr, SCHED_ATTR_SIZE_VER0))
-   return -EFAULT;
-
/* Zero the full structure, so that a short copy will be nice: */
memset(attr, 0, sizeof(*attr));
 
@@ -5116,45 +5113,19 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
if (ret)
return ret;
 
-   /* Bail out on silly large: */
-   if (size > PAGE_SIZE)
-   goto err_size;
-
/* ABI compatibility quirk: */
if (!size)
size = SCHED_ATTR_SIZE_VER0;
-
-   if (size < SCHED_ATTR_SIZE_VER0)
+   if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE)
goto err_size;
 
-   /*
-* If we're handed a bigger struct than we know of,
-* ensure all the unknown bits are 0 - i.e. new
-* user-space does not rely on any kernel feature
-* extensions we dont know about yet.
-*/
-   if (size > sizeof(*attr)) {
-   unsigned char __user *addr;
-   unsigned char __user *end;
-   unsigned char val;
-
-   addr = (void __user *)uattr + sizeof(*attr);
-   end  = (void __user *)uattr + size;
-
-   for (; addr < end; addr++) {
-   ret = get_user(val, addr);
-   if (ret)
-   return ret;
-   if (val)
-   goto err_size;
-   }
-   size = sizeof(*attr);
+   ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
+   if (ret) {
+   if (ret == -E2BIG)
+   goto err_size;
+   return ret;
}
 
-   ret = copy_from_user(attr, uattr, size);
-   if (ret)
-   return -EFAULT;
-
if ((attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) &&
size < SCHED_ATTR_SIZE_VER1)
return -EINVAL;
@@ -5354,7 +5325,7 @@ sched_attr_copy_to_user(struct sched_attr __user *uattr,
  * sys_sched_getattr - similar to sched_getparam, but with sched_attr
  * @pid: the pid in question.
  * @uattr: structure containing the extended parameters.
- * @usize: sizeof(attr) that user-space knows about, for forwards and backwards compatibility.
+ * @usize: sizeof(attr) for fwd/bwd comp.
  * @flags: for future extension.
  */
 SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
-- 
2.23.0

[PATCH v4 4/4] perf_event_open: switch to copy_struct_from_user()

2019-09-30 Thread Aleksa Sarai

The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls.

Reviewed-by: Kees Cook 
Signed-off-by: Aleksa Sarai 
---
 kernel/events/core.c | 47 +---
 1 file changed, 9 insertions(+), 38 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..3f0cb82e4fbc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10586,55 +10586,26 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
u32 size;
int ret;
 
-   if (!access_ok(uattr, PERF_ATTR_SIZE_VER0))
-   return -EFAULT;
-
-   /*
-* zero the full structure, so that a short copy will be nice.
-*/
+   /* Zero the full structure, so that a short copy will be nice. */
memset(attr, 0, sizeof(*attr));
 
ret = get_user(size, &uattr->size);
if (ret)
return ret;
 
-   if (size > PAGE_SIZE)   /* silly large */
-   goto err_size;
-
-   if (!size)  /* abi compat */
+   /* ABI compatibility quirk: */
+   if (!size)
size = PERF_ATTR_SIZE_VER0;
-
-   if (size < PERF_ATTR_SIZE_VER0)
+   if (size < PERF_ATTR_SIZE_VER0 || size > PAGE_SIZE)
goto err_size;
 
-   /*
-* If we're handed a bigger struct than we know of,
-* ensure all the unknown bits are 0 - i.e. new
-* user-space does not rely on any kernel feature
-* extensions we dont know about yet.
-*/
-   if (size > sizeof(*attr)) {
-   unsigned char __user *addr;
-   unsigned char __user *end;
-   unsigned char val;
-
-   addr = (void __user *)uattr + sizeof(*attr);
-   end  = (void __user *)uattr + size;
-
-   for (; addr < end; addr++) {
-   ret = get_user(val, addr);
-   if (ret)
-   return ret;
-   if (val)
-   goto err_size;
-   }
-   size = sizeof(*attr);
+   ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
+   if (ret) {
+   if (ret == -E2BIG)
+   goto err_size;
+   return ret;
}
 
-   ret = copy_from_user(attr, uattr, size);
-   if (ret)
-   return -EFAULT;
-
attr->size = size;
 
if (attr->__reserved_1)
-- 
2.23.0

[PATCH v4 2/4] clone3: switch to copy_struct_from_user()

2019-09-30 Thread Aleksa Sarai

The change is very straightforward, and helps unify the syscall
interface for struct-from-userspace syscalls. Additionally, explicitly
define CLONE_ARGS_SIZE_VER0 to match the other users of the
struct-extension pattern.

Reviewed-by: Kees Cook 
Signed-off-by: Aleksa Sarai 
---
 include/uapi/linux/sched.h |  2 ++
 kernel/fork.c  | 34 +++---
 2 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index b3105ac1381a..0945805982b4 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -47,6 +47,8 @@ struct clone_args {
__aligned_u64 tls;
 };
 
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+
 /*
  * Scheduling policies
  */
diff --git a/kernel/fork.c b/kernel/fork.c
index f9572f416126..2ef529869c64 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2525,39 +2525,19 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
 #ifdef __ARCH_WANT_SYS_CLONE3
 noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
  struct clone_args __user *uargs,
- size_t size)
+ size_t usize)
 {
+   int err;
struct clone_args args;
 
-   if (unlikely(size > PAGE_SIZE))
+   if (unlikely(usize > PAGE_SIZE))
return -E2BIG;
-
-   if (unlikely(size < sizeof(struct clone_args)))
+   if (unlikely(usize < CLONE_ARGS_SIZE_VER0))
return -EINVAL;
 
-   if (unlikely(!access_ok(uargs, size)))
-   return -EFAULT;
-
-   if (size > sizeof(struct clone_args)) {
-   unsigned char __user *addr;
-   unsigned char __user *end;
-   unsigned char val;
-
-   addr = (void __user *)uargs + sizeof(struct clone_args);
-   end = (void __user *)uargs + size;
-
-   for (; addr < end; addr++) {
-   if (get_user(val, addr))
-   return -EFAULT;
-   if (val)
-   return -E2BIG;
-   }
-
-   size = sizeof(struct clone_args);
-   }
-
-   if (copy_from_user(&args, uargs, size))
-   return -EFAULT;
+   err = copy_struct_from_user(&args, sizeof(args), uargs, usize);
+   if (err)
+   return err;
 
/*
 * Verify that higher 32bits of exit_signal are unset and that
-- 
2.23.0

[PATCH v4 1/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Aleksa Sarai

A common pattern for syscall extensions is increasing the size of a
struct passed from userspace, such that the zero-value of the new fields
result in the old kernel behaviour (allowing for a mix of userspace and
kernel vintages to operate on one another in most cases).

While this interface exists for communication in both directions, only
one interface is straightforward to have reasonable semantics for
(userspace passing a struct to the kernel). For kernel returns to
userspace, what the correct semantics are (whether there should be an
error if userspace is unaware of a new extension) is very
syscall-dependent and thus probably cannot be unified between syscalls
(a good example of this problem is [1]).

Previously there was no common lib/ function that implemented
the necessary extension-checking semantics (and different syscalls
implemented them slightly differently or incompletely[2]). Future
patches replace common uses of this pattern to make use of
copy_struct_from_user().

Also included are some in-kernel selftests that ensure that alignment and
various byte patterns are all handled identically to memchr_inv() usage.

[1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
 robustify sched_read_attr() ABI logic and code")

[2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do
 similar checks to copy_struct_from_user() while rt_sigprocmask(2)
 always rejects differently-sized struct arguments.

Suggested-by: Rasmus Villemoes 
Signed-off-by: Aleksa Sarai 
---
 include/linux/bitops.h  |   7 +++
 include/linux/uaccess.h |  70 +
 lib/strnlen_user.c  |   8 +--
 lib/test_user_copy.c| 136 ++--
 lib/usercopy.c  |  55 
 5 files changed, 263 insertions(+), 13 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index cf074bce3eb3..c94a9ff9f082 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -4,6 +4,13 @@
 #include 
 #include 
 
+/* Set bits in the first 'n' bytes when loaded from memory */
+#ifdef __LITTLE_ENDIAN
+#  define aligned_byte_mask(n) ((1UL << 8*(n))-1)
+#else
+#  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
+#endif
+
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
 #define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 70bbdc38dc37..8abbc713f7fb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -231,6 +231,76 @@ __copy_from_user_inatomic_nocache(void *to, const void __user *from,
 
 #endif /* ARCH_HAS_NOCACHE_UACCESS */
 
+extern int check_zeroed_user(const void __user *from, size_t size);
+
+/**
+ * copy_struct_from_user: copy a struct from userspace
+ * @dst:   Destination address, in kernel space. This buffer must be @ksize
+ * bytes long.
+ * @ksize: Size of @dst struct.
+ * @src:   Source address, in userspace.
+ * @usize: (Alleged) size of @src struct.
+ *
+ * Copies a struct from userspace to kernel space, in a way that guarantees
+ * backwards-compatibility for struct syscall arguments (as long as future
+ * struct extensions are made such that all new fields are *appended* to the
+ * old struct, and zeroed-out new fields have the same meaning as the old
+ * struct).
+ *
+ * @ksize is just sizeof(*dst), and @usize should've been passed by userspace.
+ * The recommended usage is something like the following:
+ *
+ *   SYSCALL_DEFINE2(foobar, const struct foo __user *, uarg, size_t, usize)
+ *   {
+ *      int err;
+ *      struct foo karg = {};
+ *
+ *      if (usize > PAGE_SIZE)
+ *        return -E2BIG;
+ *      if (usize < FOO_SIZE_VER0)
+ *        return -EINVAL;
+ *
+ *      err = copy_struct_from_user(&karg, sizeof(karg), uarg, usize);
+ *      if (err)
+ *        return err;
+ *
+ *      // ...
+ *   }
+ *
+ * There are three cases to consider:
+ *  * If @usize == @ksize, then it's copied verbatim.
+ *  * If @usize < @ksize, then the userspace has passed an old struct to a
+ *    newer kernel. The rest of the trailing bytes in @dst (@ksize - @usize)
+ *    are to be zero-filled.
+ *  * If @usize > @ksize, then the userspace has passed a new struct to an
+ *    older kernel. The trailing bytes unknown to the kernel (@usize - @ksize)
+ *    are checked to ensure they are zeroed, otherwise -E2BIG is returned.
+ *
+ * Returns (in all cases, some data may have been copied):
+ *  * -E2BIG:  (@usize > @ksize) and there are non-zero trailing bytes in @src.
+ *  * -EFAULT: access to userspace failed.
+ */
+static __always_inline
+int copy_struct_from_user(void *dst, size_t ksize,
+ const void __user *src, size_t usize)
+{
+   size_t size = min(ksize, usize);
+   size_t rest = max(ksize, usize) - size;
+
+   /* Deal with trailing bytes. */
+   if (usize < ksize) {
+   memset(dst + size, 0, rest);
+   } else if (usize >

[PATCH v4 0/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Aleksa Sarai

Patch changelog:
 v4:
  * __always_inline copy_struct_from_user(). [Kees Cook]
  * Rework test_user_copy.ko changes. [Kees Cook]
 v3: 
 
 v2: 
 v1: 

This series was split off from the openat2(2) syscall discussion[1].
However, the copy_struct_to_user() helper has been dropped, because
after some discussion it appears that there is no really obvious
semantics for how copy_struct_to_user() should work on mixed-vintages
(for instance, whether [2] is the correct semantics for all syscalls).

A common pattern for syscall extensions is increasing the size of a
struct passed from userspace, such that the zero-value of the new fields
result in the old kernel behaviour (allowing for a mix of userspace and
kernel vintages to operate on one another in most cases).

Previously there was no common lib/ function that implemented
the necessary extension-checking semantics (and different syscalls
implemented them slightly differently or incompletely[3]). This series
implements the helper and ports several syscalls to use it.

Some in-kernel selftests are included in this patch. More complete
self-tests for copy_struct_from_user() are included in the openat2()
patchset.
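
For readers who want the three size cases spelled out, here is a rough
userspace model of the size handling. It is only a sketch under stated
assumptions: copy_struct_model(), struct foo_v0 and struct foo_v1 are names
made up for this illustration, plain memory stands in for the __user pointer
(so there is no -EFAULT path), and the model copies nothing when it returns
-E2BIG, whereas the helper's documentation only says that some data may have
been copied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the size handling (NOT the kernel helper).
 * Plain memory stands in for the __user pointer.
 */
static int copy_struct_model(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst != NULL && src != NULL);

	if (usize < ksize) {
		/* Old userspace, newer kernel: zero-fill the unknown fields. */
		memset((unsigned char *)dst + size, 0, ksize - usize);
	} else if (usize > ksize) {
		/* New userspace, older kernel: trailing bytes must be zero. */
		const unsigned char *tail = (const unsigned char *)src + size;

		for (i = 0; i < usize - ksize; i++)
			if (tail[i] != 0)
				return -E2BIG;
	}

	/* Copy the part that both sides understand. */
	memcpy(dst, src, size);
	return 0;
}

/* Hypothetical struct versions, used only for this illustration. */
struct foo_v0 { unsigned long long flags; };
struct foo_v1 { unsigned long long flags; unsigned long long extra; };
```

Passing a struct foo_v0 to a "kernel" that knows struct foo_v1 leaves extra
zeroed, while passing a struct foo_v1 with a non-zero extra to a "kernel" that
only knows struct foo_v0 is rejected with -E2BIG.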

[1]: https://lore.kernel.org/lkml/20190904201933.10736-1-cyp...@cyphar.com/

[2]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
 robustify sched_read_attr() ABI logic and code")

[3]: For instance {sched_setattr,perf_event_open,clone3}(2) all do
 similar checks to copy_struct_from_user() while rt_sigprocmask(2)
 always rejects differently-sized struct arguments.

Aleksa Sarai (4):
  lib: introduce copy_struct_from_user() helper
  clone3: switch to copy_struct_from_user()
  sched_setattr: switch to copy_struct_from_user()
  perf_event_open: switch to copy_struct_from_user()

 include/linux/bitops.h |   7 ++
 include/linux/uaccess.h|  70 +++
 include/uapi/linux/sched.h |   2 +
 kernel/events/core.c   |  47 +++--
 kernel/fork.c  |  34 ++
 kernel/sched/core.c|  43 ++--
 lib/strnlen_user.c |   8 +--
 lib/test_user_copy.c   | 136 +++--
 lib/usercopy.c |  55 +++
 9 files changed, 288 insertions(+), 114 deletions(-)

-- 
2.23.0

Re: [PATCH 5.3 00/25] 5.3.2-stable review

2019-09-30 Thread Dan Rue

On Sun, Sep 29, 2019 at 03:56:03PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.3.2 release.
> There are 25 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Tue 01 Oct 2019 01:47:47 PM UTC.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 5.3.2-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.3.y
git commit: 5910f7ae17298c45fce24a2f314573bcb7a86284
git describe: v5.3.1-26-g5910f7ae1729
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.3-oe/build/v5.3.1-26-g5910f7ae1729

No regressions (compared to build v5.3.1)

No fixes (compared to build v5.3.1)

Ran 23295 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86

Test Suites
---
* build
* install-android-platform-tools-r2600
* kselftest
* libgpiod
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-open-posix-tests
* network-basic-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite

-- 
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH 5.2 00/45] 5.2.18-stable review

2019-09-30 Thread Dan Rue

On Sun, Sep 29, 2019 at 03:55:28PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.2.18 release.
> There are 45 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Tue 01 Oct 2019 01:47:47 PM UTC.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 5.2.18-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.2.y
git commit: 70cc0b99b90f823b81175b1f15f73ced86135c5b
git describe: v5.2.17-46-g70cc0b99b90f
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.2-oe/build/v5.2.17-46-g70cc0b99b90f

No regressions (compared to build v5.2.17)

No fixes (compared to build v5.2.17)

Ran 22166 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86

Test Suites
---
* build
* install-android-platform-tools-r2600
* kselftest
* libgpiod
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-open-posix-tests
* network-basic-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite

-- 
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH 4.19 00/63] 4.19.76-stable review

2019-09-30 Thread Dan Rue

On Sun, Sep 29, 2019 at 03:53:33PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.76 release.
> There are 63 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Tue 01 Oct 2019 01:47:47 PM UTC.
> Anything received after that time might be too late.

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.19.76-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: b52c75f7b9785d0d0e6bf145787ed2fc99f5483c
git describe: v4.19.75-64-gb52c75f7b978
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.75-64-gb52c75f7b978

No regressions (compared to build v4.19.75)

No fixes (compared to build v4.19.75)

Ran 23641 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
---
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-fs-tests
* network-basic-tests
* ltp-open-posix-tests
* kvm-unit-tests
* ssuite
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none

-- 
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH 5.2 00/45] 5.2.18-stable review

2019-09-30 Thread shuah


On 9/29/19 7:55 AM, Greg Kroah-Hartman wrote:

> This is the start of the stable review cycle for the 5.2.18 release.
> There are 45 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 01 Oct 2019 01:47:47 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.18-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h



Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

[PATCH v7 0/1] Add bounds check for Hotplugged memory

2019-09-30 Thread Alastair D'Silva

From: Alastair D'Silva 

This series adds bounds checks for hotplugged memory, ensuring that
it is within the physically addressable range (for platforms that
define MAX_(POSSIBLE_)PHYSMEM_BITS).

This allows for early failure, rather than attempting to access
bogus section numbers.
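
As a rough userspace illustration of the check (not the patch itself), the
sketch below models the same comparison. check_addressable() and the MODEL_*
constants are hypothetical names; the 12-bit page shift and 46-bit physical
limit are assumed example values, not taken from any particular platform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT   12	/* assumption: 4 KiB pages */
#define MODEL_PHYSMEM_BITS 46	/* assumption: example limit only */

/* Reject a PFN range whose last byte lies above the addressable limit. */
static int check_addressable(uint64_t pfn, uint64_t nr_pages)
{
	/* Address of the last byte in the range. */
	const uint64_t max_addr = ((pfn + nr_pages) << MODEL_PAGE_SHIFT) - 1;

	assert(nr_pages > 0);

	/* Any bit set at or above the limit means the range does not fit. */
	if (max_addr >> MODEL_PHYSMEM_BITS)
		return -E2BIG;

	return 0;
}
```

With these example values, a range ending exactly at the 2^46 boundary is
accepted, and a range that extends one page beyond it is rejected.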

Changelog:
 V7:
   - Cast PFN_PHYS as u64 since its type is platform dependent
 V6:
   - Fix printf formats
 V5:
   - Factor out calculation into max_allowed var
   - Declare unchanging vars as const
   - Use PFN_PHYS macro instead of shifting by PAGE_SHIFT
 V4:
   - Relocate call to __add_pages
   - Add a warning when the addressable check fails
 V3:
   - Perform the addressable check before we take the hotplug lock
 V2:
   - Don't use MAX_POSSIBLE_PHYSMEM_BITS as it's wider than what
 may be available

Alastair D'Silva (1):
  memory_hotplug: Add a bounds check to __add_pages

 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

-- 
2.21.0

[PATCH v7 1/1] memory_hotplug: Add a bounds check to __add_pages

2019-09-30 Thread Alastair D'Silva

From: Alastair D'Silva 

On PowerPC, the address ranges allocated to OpenCAPI LPC memory
are allocated from firmware. These address ranges may be higher
than what older kernels permit, as we increased the maximum
permissible address in commit 4ffe713b7587
("powerpc/mm: Increase the max addressable memory to 2PB"). It is
possible that the addressable range may change again in the
future.

In this scenario, we end up with a bogus section returned from
__section_nr (see the discussion on the thread "mm: Trigger bug on
if a section is not found in __section_nr").

Adding a check here means that we fail early and have an
opportunity to handle the error gracefully, rather than rumbling
on and potentially accessing an incorrect section.

Further discussion is also on the thread ("powerpc: Perform a bounds
check in arch_add_memory")
http://lkml.kernel.org/r/20190827052047.31547-1-alast...@au1.ibm.com

Signed-off-by: Alastair D'Silva 
---
 mm/memory_hotplug.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c73f09913165..5af9f4466ad1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,22 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
 }
 
+static int check_hotplug_memory_addressable(unsigned long pfn,
+   unsigned long nr_pages)
+{
+   const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
+
+   if (max_addr >> MAX_PHYSMEM_BITS) {
+   const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
+   WARN(1,
+"Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
+(u64)PFN_PHYS(pfn), max_addr, max_allowed);
+   return -E2BIG;
+   }
+
+   return 0;
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -291,6 +307,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
unsigned long nr, start_sec, end_sec;
struct vmem_altmap *altmap = restrictions->altmap;
 
+   err = check_hotplug_memory_addressable(pfn, nr_pages);
+   if (err)
+   return err;
+
if (altmap) {
/*
 * Validate altmap is within bounds of the total request
-- 
2.21.0

Re: [PATCH RESEND v3 2/4] clone3: switch to copy_struct_from_user()

2019-09-30 Thread Aleksa Sarai

On 2019-09-30, Kees Cook  wrote:
> On Tue, Oct 01, 2019 at 05:15:24AM +1000, Aleksa Sarai wrote:
> > From: Aleksa Sarai 
> > 
> > The change is very straightforward, and helps unify the syscall
> > interface for struct-from-userspace syscalls. Additionally, explicitly
> > define CLONE_ARGS_SIZE_VER0 to match the other users of the
> > struct-extension pattern.
> > 
> > Signed-off-by: Aleksa Sarai 
> > ---
> >  include/uapi/linux/sched.h |  2 ++
> >  kernel/fork.c  | 34 +++---
> >  2 files changed, 9 insertions(+), 27 deletions(-)
> > 
> > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > index b3105ac1381a..0945805982b4 100644
> > --- a/include/uapi/linux/sched.h
> > +++ b/include/uapi/linux/sched.h
> > @@ -47,6 +47,8 @@ struct clone_args {
> > __aligned_u64 tls;
> >  };
> >  
> > +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> > +
> >  /*
> >   * Scheduling policies
> >   */
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index f9572f416126..2ef529869c64 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2525,39 +2525,19 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
> >  #ifdef __ARCH_WANT_SYS_CLONE3
> >  noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> >   struct clone_args __user *uargs,
> > - size_t size)
> > + size_t usize)
> >  {
> > +   int err;
> > struct clone_args args;
> >  
> > -   if (unlikely(size > PAGE_SIZE))
> > +   if (unlikely(usize > PAGE_SIZE))
> > return -E2BIG;
> 
> I quickly looked through the earlier threads and couldn't find it, but
> I have a memory of some discussion about moving this test into the
> copy_struct_from_user() function itself? That would seem like a
> reasonable idea? ("4k should be enough for any structure!")

Yes (and this also seemed the most reasonable way to do it to me), but
the main counter-arguments which swayed me were:

 1. Putting it in the hands of the caller allows them to decide if they
want to have a limit, because if you institute a limit in one kernel
vintage then expanding it later will be less-than-ideally-smooth.

 2. There is no amplification, so doing copy_struct_from_user() for a
really big usize boils down to the userspace program blocking for
the kernel to check if some of your memory is zeroed. Thus there
doesn't seem to be much DoS potential.

Not to mention that users of copy_struct_from_user() will end up doing
some kind of usize comparison anyway (to check if it's smaller than
the version-0 size).
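
To make that last point concrete, the caller-side window check can be
modelled in userspace roughly as follows. This is only a sketch:
check_usize() and the MODEL_* constants are hypothetical names, the 4096-byte
page size is an assumption, and 64 mirrors CLONE_ARGS_SIZE_VER0 from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define MODEL_SIZE_VER0 64u	/* mirrors CLONE_ARGS_SIZE_VER0 */

/* Caller-side sanity window, applied before the struct copy is attempted. */
static int check_usize(size_t usize)
{
	/* The window must be non-empty for the checks below to make sense. */
	assert(MODEL_SIZE_VER0 <= MODEL_PAGE_SIZE);

	if (usize > MODEL_PAGE_SIZE)
		return -E2BIG;	/* caller-chosen upper limit */
	if (usize < MODEL_SIZE_VER0)
		return -EINVAL;	/* smaller than the first published struct */

	return 0;
}
```

The upper bound is a policy choice of each caller, while the lower bound is
needed regardless, which is why both comparisons sit next to each other here.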

> Either way:
> 
> Reviewed-by: Kees Cook 
> 
> 
> > -
> > -   if (unlikely(size < sizeof(struct clone_args)))
> > +   if (unlikely(usize < CLONE_ARGS_SIZE_VER0))
> > return -EINVAL;
> >  
> > -   if (unlikely(!access_ok(uargs, size)))
> > -   return -EFAULT;
> > -
> > -   if (size > sizeof(struct clone_args)) {
> > -   unsigned char __user *addr;
> > -   unsigned char __user *end;
> > -   unsigned char val;
> > -
> > -   addr = (void __user *)uargs + sizeof(struct clone_args);
> > -   end = (void __user *)uargs + size;
> > -
> > -   for (; addr < end; addr++) {
> > -   if (get_user(val, addr))
> > -   return -EFAULT;
> > -   if (val)
> > -   return -E2BIG;
> > -   }
> > -
> > -   size = sizeof(struct clone_args);
> > -   }
> > -
> > -   if (copy_from_user(&args, uargs, size))
> > -   return -EFAULT;
> > +   err = copy_struct_from_user(&args, sizeof(args), uargs, usize);
> > +   if (err)
> > +   return err;
> >  
> > /*
> >  * Verify that higher 32bits of exit_signal are unset and that
> > -- 
> > 2.23.0
> > 

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH




Re: [PATCH 1/3] KVM: X86: Add "nopvspin" parameter to disable PV spinlocks

2019-09-30 Thread Zhenzhong Duan




On 2019/9/30 23:41, Vitaly Kuznetsov wrote:
> Zhenzhong Duan  writes:
>
>> There are cases where a guest tries to switch spinlocks to bare metal
>> behavior (e.g. by setting "xen_nopvspin" on XEN platform and
>> "hv_nopvspin" on HYPER_V).
>>
>> That feature is missing on KVM; add a new parameter "nopvspin" to disable
>> PV spinlocks for KVM guest.
>>
>> This new parameter is also intended to replace "xen_nopvspin" and
>> "hv_nopvspin" in the future.
>
> Any reason to not do it right now? We will probably need to have compat
> code to support xen_nopvspin/hv_nopvspin too but emit a 'is deprecated'
> warning.

Sorry the description isn't clear, I'll fix it.

I did the compat work in the other two patches.
[PATCH 2/3] xen: Mark "xen_nopvspin" parameter obsolete and map it to "nopvspin"
[PATCH 3/3] x86/hyperv: Mark "hv_nopvspin" parameter obsolete and map it to "nopvspin"

>> The global variable pvspin isn't defined as __initdata as it's used at
>> runtime by XEN guest.
>>
>> Refactor the print stuff with pr_* which is preferred.
>
> Please do it in a separate patch.

Ok, I'll do that in v2. Thanks for review.

Zhenzhong

[PATCH v3] perf tools: avoid sample_reg_masks being const + weak

2019-09-30 Thread Ian Rogers

Being const + weak breaks with some compilers that constant-propagate
from the weak symbol. This behavior is outside of the specification, but
in LLVM is chosen to match GCC's behavior.

LLVM's implementation was set in this patch:
https://github.com/llvm/llvm-project/commit/f49573d1eedcf1e44893d5a062ac1b72c8419646
A const + weak symbol is set to be weak_odr:
https://llvm.org/docs/LangRef.html
ODR is one definition rule, and given there is one constant definition
constant-propagation is possible. It is possible to get this code to
miscompile with LLVM when applying link time optimization. As compilers
become more aggressive, this is likely to break in more instances.

Move the definition of sample_reg_masks to the conditional part of
perf_regs.h and guard usage with HAVE_PERF_REGS_SUPPORT. This avoids the
weak symbol.

Fix an issue when HAVE_PERF_REGS_SUPPORT isn't defined from patch v1.
In v3, add perf_regs.c for architectures that HAVE_PERF_REGS_SUPPORT but
don't declare sample_reg_masks.

Signed-off-by: Ian Rogers 
---
 tools/perf/arch/arm/util/Build | 2 ++
 tools/perf/arch/arm/util/perf_regs.c   | 6 ++
 tools/perf/arch/arm64/util/Build   | 1 +
 tools/perf/arch/arm64/util/perf_regs.c | 6 ++
 tools/perf/arch/csky/util/Build| 2 ++
 tools/perf/arch/csky/util/perf_regs.c  | 6 ++
 tools/perf/arch/riscv/util/Build   | 2 ++
 tools/perf/arch/riscv/util/perf_regs.c | 6 ++
 tools/perf/arch/s390/util/Build| 1 +
 tools/perf/arch/s390/util/perf_regs.c  | 6 ++
 tools/perf/util/parse-regs-options.c   | 8 ++--
 tools/perf/util/perf_regs.c| 4 
 tools/perf/util/perf_regs.h| 4 ++--
 13 files changed, 46 insertions(+), 8 deletions(-)
 create mode 100644 tools/perf/arch/arm/util/perf_regs.c
 create mode 100644 tools/perf/arch/arm64/util/perf_regs.c
 create mode 100644 tools/perf/arch/csky/util/perf_regs.c
 create mode 100644 tools/perf/arch/riscv/util/perf_regs.c
 create mode 100644 tools/perf/arch/s390/util/perf_regs.c

diff --git a/tools/perf/arch/arm/util/Build b/tools/perf/arch/arm/util/Build
index 296f0eac5e18..37fc63708966 100644
--- a/tools/perf/arch/arm/util/Build
+++ b/tools/perf/arch/arm/util/Build
@@ -1,3 +1,5 @@
+perf-y += perf_regs.o
+
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 
 perf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind.o
diff --git a/tools/perf/arch/arm/util/perf_regs.c 
b/tools/perf/arch/arm/util/perf_regs.c
new file mode 100644
index ..2864e2e3776d
--- /dev/null
+++ b/tools/perf/arch/arm/util/perf_regs.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../../util/perf_regs.h"
+
+const struct sample_reg sample_reg_masks[] = {
+   SMPL_REG_END
+};
diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 3cde540d2fcf..0a7782c61209 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -1,4 +1,5 @@
 perf-y += header.o
+perf-y += perf_regs.o
 perf-y += sym-handling.o
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
diff --git a/tools/perf/arch/arm64/util/perf_regs.c 
b/tools/perf/arch/arm64/util/perf_regs.c
new file mode 100644
index ..2864e2e3776d
--- /dev/null
+++ b/tools/perf/arch/arm64/util/perf_regs.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../../util/perf_regs.h"
+
+const struct sample_reg sample_reg_masks[] = {
+   SMPL_REG_END
+};
diff --git a/tools/perf/arch/csky/util/Build b/tools/perf/arch/csky/util/Build
index 1160bb2332ba..7d3050134ae0 100644
--- a/tools/perf/arch/csky/util/Build
+++ b/tools/perf/arch/csky/util/Build
@@ -1,2 +1,4 @@
+perf-y += perf_regs.o
+
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
diff --git a/tools/perf/arch/csky/util/perf_regs.c 
b/tools/perf/arch/csky/util/perf_regs.c
new file mode 100644
index ..2864e2e3776d
--- /dev/null
+++ b/tools/perf/arch/csky/util/perf_regs.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../../util/perf_regs.h"
+
+const struct sample_reg sample_reg_masks[] = {
+   SMPL_REG_END
+};
diff --git a/tools/perf/arch/riscv/util/Build b/tools/perf/arch/riscv/util/Build
index 1160bb2332ba..7d3050134ae0 100644
--- a/tools/perf/arch/riscv/util/Build
+++ b/tools/perf/arch/riscv/util/Build
@@ -1,2 +1,4 @@
+perf-y += perf_regs.o
+
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
diff --git a/tools/perf/arch/riscv/util/perf_regs.c 
b/tools/perf/arch/riscv/util/perf_regs.c
new file mode 100644
index ..2864e2e3776d
--- /dev/null
+++ b/tools/perf/arch/riscv/util/perf_regs.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "../../util/perf_regs.h"
+
+const struct sample_reg sample_reg_masks[] = {
+   SMPL_REG_END
+};
diff --git a/tools/perf/arch/s390/util/Build b/tools/perf/arch/s390/util/Build
index 22797f043b84..3d9d0f4f72ca 100644
---

Re: [PATCH v2] perf tools: avoid sample_reg_masks being const + weak

2019-09-30 Thread Ian Rogers

Apologies for that. I've addressed in v3 but only tested for riscv.
There is potential for additional tidy up related to this change, let
me know what would be appropriate.

Thanks,
Ian

On Mon, Sep 30, 2019 at 5:42 AM Jiri Olsa  wrote:
>
> On Mon, Sep 30, 2019 at 09:23:35AM -0300, Arnaldo Carvalho de Melo wrote:
>
> SNIP
>
> >   CC   /tmp/build/perf/util/lzma.o
> >   CC   /tmp/build/perf/util/demangle-java.o
> >   CC   /tmp/build/perf/util/demangle-rust.o
> >   CC   /tmp/build/perf/util/jitdump.o
> >   CC   /tmp/build/perf/util/genelf.o
> >   CC   /tmp/build/perf/util/genelf_debug.o
> >   CC   /tmp/build/perf/util/perf-hooks.o
> >   CC   /tmp/build/perf/util/bpf-event.o
> >   FLEX /tmp/build/perf/util/parse-events-flex.c
> >   LD   /tmp/build/perf/util/intel-pt-decoder/perf-in.o
> >   FLEX /tmp/build/perf/util/pmu-flex.c
> >   CC   /tmp/build/perf/util/pmu-bison.o
> >   CC   /tmp/build/perf/util/expr-bison.o
> >   CC   /tmp/build/perf/util/parse-events.o
> >   CC   /tmp/build/perf/util/parse-events-flex.o
> >   CC   /tmp/build/perf/util/pmu.o
> >   CC   /tmp/build/perf/util/pmu-flex.o
> >   LD   /tmp/build/perf/util/perf-in.o
> >   LD   /tmp/build/perf/perf-in.o
> >   LINK /tmp/build/perf/perf
> > /usr/lib/gcc-cross/aarch64-linux-gnu/8/../../../../aarch64-linux-gnu/bin/ld:
> >  /tmp/build/perf/perf-in.o: in function `__parse_regs':
> > /git/linux/tools/perf/util/parse-regs-options.c:39: undefined reference to 
> > `sample_reg_masks'
> > /usr/lib/gcc-cross/aarch64-linux-gnu/8/../../../../aarch64-linux-gnu/bin/ld:
> >  /git/linux/tools/perf/util/parse-regs-options.c:47: undefined reference to 
> > `sample_reg_masks'
> > /usr/lib/gcc-cross/aarch64-linux-gnu/8/../../../../aarch64-linux-gnu/bin/ld:
> >  /git/linux/tools/perf/util/parse-regs-options.c:60: undefined reference to 
> > `sample_reg_masks'
> > /usr/lib/gcc-cross/aarch64-linux-gnu/8/../../../../aarch64-linux-gnu/bin/ld:
> >  /git/linux/tools/perf/util/parse-regs-options.c:50: undefined reference to 
> > `sample_reg_masks'
>
> argh.. I tried on power.. should have tried on arm ;-)
>
> I expected that all the archs that set NO_PERF_REGS := 0 would have
> sample_reg_masks defined..  all those archs did fallback to the:
>
>   const struct sample_reg __weak sample_reg_masks[] = {
>SMPL_REG_END
>   };
>
> those archs are not able to use --user-regs/--intr-regs options,
> but for dwarf unwind we set those registers manually, so that works
>
> so I guess we need to define empty sample_reg_masks for those archs
>
> jirka
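
As a rough illustration of why an empty, sentinel-only table is enough for those archs, here is a small userspace sketch. The sketch_ names and register names are made up for this example; only the "terminate the table with a NULL name" idea is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a name/mask table entry. */
struct sketch_reg {
	const char *name;
	uint64_t mask;
};

#define SKETCH_REG(n, b) { .name = #n, .mask = 1ULL << (b) }
#define SKETCH_REG_END   { .name = NULL }

/* What an arch without named registers would provide: just the sentinel. */
static const struct sketch_reg sketch_empty_masks[] = {
	SKETCH_REG_END
};

/* An arch with a couple of (made-up) registers. */
static const struct sketch_reg sketch_some_masks[] = {
	SKETCH_REG(r0, 0),
	SKETCH_REG(r1, 1),
	SKETCH_REG_END
};

/* Walk the table until the sentinel; return the mask, or 0 if unknown. */
static uint64_t sketch_lookup(const struct sketch_reg *tbl, const char *name)
{
	const struct sketch_reg *r;

	for (r = tbl; r->name; r++)
		if (!strcmp(r->name, name))
			return r->mask;
	return 0;
}

static void sketch_regs_demo(void)
{
	assert(sketch_lookup(sketch_some_masks, "r1") == 2);
	/* The empty table is walked safely and simply matches nothing. */
	assert(sketch_lookup(sketch_empty_masks, "r1") == 0);
}
```

With a strong definition per arch, the linker error above goes away without relying on a weak fallback that a compiler might constant-propagate.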

Re: [PATCH RESEND v3 1/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Aleksa Sarai

On 2019-09-30, Kees Cook  wrote:
> On Tue, Oct 01, 2019 at 05:15:23AM +1000, Aleksa Sarai wrote:
> > From: Aleksa Sarai 
> > 
> > A common pattern for syscall extensions is increasing the size of a
> > struct passed from userspace, such that the zero-value of the new fields
> > result in the old kernel behaviour (allowing for a mix of userspace and
> > kernel vintages to operate on one another in most cases).
> > 
> > While this interface exists for communication in both directions, only
> > one interface is straightforward to have reasonable semantics for
> > (userspace passing a struct to the kernel). For kernel returns to
> > userspace, what the correct semantics are (whether there should be an
> > error if userspace is unaware of a new extension) is very
> > syscall-dependent and thus probably cannot be unified between syscalls
> > (a good example of this problem is [1]).
> > 
> > Previously there was no common lib/ function that implemented
> > the necessary extension-checking semantics (and different syscalls
> > implemented them slightly differently or incompletely[2]). Future
> > patches replace common uses of this pattern to make use of
> > copy_struct_from_user().
> > 
> > Also add some in-kernel selftests that ensure that alignment and
> > various byte patterns are all handled identically to memchr_inv() usage.
> > 
> > [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
> >  robustify sched_read_attr() ABI logic and code")
> > 
> > [2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do
> >  similar checks to copy_struct_from_user() while rt_sigprocmask(2)
> >  always rejects differently-sized struct arguments.
> > 
> > Suggested-by: Rasmus Villemoes 
> > Signed-off-by: Aleksa Sarai 
> > ---
> >  include/linux/bitops.h  |   7 +++
> >  include/linux/uaccess.h |   4 ++
> >  lib/strnlen_user.c  |   8 +--
> >  lib/test_user_copy.c| 133 ++--
> >  lib/usercopy.c  | 123 +
> >  5 files changed, 262 insertions(+), 13 deletions(-)
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index cf074bce3eb3..c94a9ff9f082 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -4,6 +4,13 @@
> >  #include 
> >  #include 
> >  
> > +/* Set bits in the first 'n' bytes when loaded from memory */
> > +#ifdef __LITTLE_ENDIAN
> > +#  define aligned_byte_mask(n) ((1UL << 8*(n))-1)
> > +#else
> > +#  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> > +#endif
> > +
> >  #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
> >  #define BITS_TO_LONGS(nr)  DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
> >  
> > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> > index 70bbdc38dc37..94f20e6ec6ab 100644
> > --- a/include/linux/uaccess.h
> > +++ b/include/linux/uaccess.h
> > @@ -231,6 +231,10 @@ __copy_from_user_inatomic_nocache(void *to, const void 
> > __user *from,
> >  
> >  #endif /* ARCH_HAS_NOCACHE_UACCESS */
> >  
> > +extern int check_zeroed_user(const void __user *from, size_t size);
> > +extern int copy_struct_from_user(void *dst, size_t ksize,
> > +const void __user *src, size_t usize);
> > +
> >  /*
> >   * probe_kernel_read(): safely attempt to read from a location
> >   * @dst: pointer to the buffer that shall take the data
> > diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
> > index 28ff554a1be8..6c0005d5dd5c 100644
> > --- a/lib/strnlen_user.c
> > +++ b/lib/strnlen_user.c
> > @@ -3,16 +3,10 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  
> > -/* Set bits in the first 'n' bytes when loaded from memory */
> > -#ifdef __LITTLE_ENDIAN
> > -#  define aligned_byte_mask(n) ((1ul << 8*(n))-1)
> > -#else
> > -#  define aligned_byte_mask(n) (~0xfful << (BITS_PER_LONG - 8 - 8*(n)))
> > -#endif
> > -
> >  /*
> >   * Do a strnlen, return length of string *with* final '\0'.
> >   * 'count' is the user-supplied count, while 'max' is the
> > diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c
> > index 67bcd5dfd847..3a17f71029bb 100644
> > --- a/lib/test_user_copy.c
> > +++ b/lib/test_user_copy.c
> > @@ -16,6 +16,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  /*
> >   * Several 32-bit architectures support 64-bit {get,put}_user() calls.
> > @@ -31,14 +32,129 @@
> >  # define TEST_U64
> >  #endif
> >  
> > -#define test(condition, msg)   \
> > -({ \
> > -   int cond = (condition); \
> > -   if (cond)   \
> > -   pr_warn("%s\n", msg);   \
> > -   cond;   \
> > +#define test(condition, msg, ...)  \
> > +({ \
> > +   int cond = (condition); \
> > +   if

[v6 PATCH] RISC-V: Remove unsupported isa string info print

2019-09-30 Thread Atish Patra

/proc/cpuinfo should just print the whole ISA string as information
instead of determining what is supported or not. Userspace can use the
ELF hwcap to figure that out.

Simplify the isa string printing by removing the unsupported isa string
print and all related code.

The relevant discussion can be found at
http://lists.infradead.org/pipermail/linux-riscv/2019-September/006702.html

Signed-off-by: Atish Patra 
---
 arch/riscv/kernel/cpu.c | 35 ---
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 7da3c6a93abd..9486c426af86 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -48,49 +48,22 @@ int riscv_of_processor_hartid(struct device_node *node)
 
 static void print_isa(struct seq_file *f, const char *orig_isa)
 {
-   static const char *ext = "mafdcsu";
-   const char *isa = orig_isa;
-   const char *e;
-
/*
 * Linux doesn't support rv32e or rv128i, and we only support booting
 * kernels on harts with the same ISA that the kernel is compiled for.
 */
 #if defined(CONFIG_32BIT)
-   if (strncmp(isa, "rv32i", 5) != 0)
+   if (strncmp(orig_isa, "rv32i", 5) != 0)
return;
 #elif defined(CONFIG_64BIT)
-   if (strncmp(isa, "rv64i", 5) != 0)
+   if (strncmp(orig_isa, "rv64i", 5) != 0)
return;
 #endif
 
-   /* Print the base ISA, as we already know it's legal. */
+   /* Print the entire ISA as it is */
seq_puts(f, "isa\t\t: ");
-   seq_write(f, isa, 5);
-   isa += 5;
-
-   /*
-* Check the rest of the ISA string for valid extensions, printing those
-* we find.  RISC-V ISA strings define an order, so we only print the
-* extension bits when they're in order. Hide the supervisor (S)
-* extension from userspace as it's not accessible from there.
-*/
-   for (e = ext; *e != '\0'; ++e) {
-   if (isa[0] == e[0]) {
-   if (isa[0] != 's')
-   seq_write(f, isa, 1);
-
-   isa++;
-   }
-   }
+   seq_write(f, orig_isa, strlen(orig_isa));
seq_puts(f, "\n");
-
-   /*
-* If we were given an unsupported ISA in the device tree then print
-* a bit of info describing what went wrong.
-*/
-   if (isa[0] != '\0')
-   pr_info("unsupported ISA \"%s\" in device tree\n", orig_isa);
 }
 
 static void print_mmu(struct seq_file *f, const char *mmu_type)
-- 
2.21.0
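
For readers who want to try the simplified logic outside the kernel, the following is a userspace sketch only. It writes into a caller-supplied buffer instead of a seq_file, and the "base" argument is an assumption that stands in for the CONFIG_32BIT/CONFIG_64BIT choice; the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model of the simplified print_isa(): check the base ISA
 * prefix, then emit the device tree string unmodified.
 */
static int sketch_print_isa(char *out, size_t len,
			    const char *orig_isa, const char *base)
{
	/* Only accept the base ISA the "kernel" was built for. */
	if (strncmp(orig_isa, base, strlen(base)) != 0)
		return -1;

	/* Print the entire ISA string as it is. */
	snprintf(out, len, "isa\t\t: %s\n", orig_isa);
	return 0;
}

static void sketch_isa_demo(void)
{
	char buf[64];

	assert(sketch_print_isa(buf, sizeof(buf), "rv64imafdcsu", "rv64i") == 0);
	assert(strcmp(buf, "isa\t\t: rv64imafdcsu\n") == 0);
	/* A mismatched base ISA prints nothing. */
	assert(sketch_print_isa(buf, sizeof(buf), "rv32imac", "rv64i") == -1);
}
```

Note that the 's' extension is no longer hidden in this model, matching the patch, which stops filtering individual extension letters.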

[PATCH mmotm] sparc64: pgtable_64.h: fix mismatched parens

2019-09-30 Thread Randy Dunlap

From: Randy Dunlap 

Fix lib-untag-user-pointers-in-strn_user.patch unmatched left paren.
Fixes many of these build errors:

../mm/gup.c: In function '__get_user_pages':
../mm/gup.c:791:30: error: expected ')' before ';' token
  start = untagged_addr(start);
  ^
In file included from ../arch/sparc/include/asm/pgtable.h:5,
 from ../include/linux/mm.h:99,
 from ../mm/gup.c:7:
../arch/sparc/include/asm/pgtable_64.h:1102:2: note: to match this '('
  ((__typeof__(addr))(__untagged_addr((unsigned long)(addr)))
  ^
../mm/gup.c:791:10: note: in expansion of macro 'untagged_addr'
  start = untagged_addr(start);
  ^
../mm/gup.c:892:21: error: expected ';' before '}' token

Signed-off-by: Randy Dunlap 
Cc: Andrey Konovalov 
---

Is this already fixed???


 arch/sparc/include/asm/pgtable_64.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- mmotm-2019-0925-1810.orig/arch/sparc/include/asm/pgtable_64.h
+++ mmotm-2019-0925-1810/arch/sparc/include/asm/pgtable_64.h
@@ -1099,7 +1099,7 @@ static inline unsigned long __untagged_a
return start;
 }
 #define untagged_addr(addr) \
-   ((__typeof__(addr))(__untagged_addr((unsigned long)(addr)))
+   ((__typeof__(addr))(__untagged_addr((unsigned long)(addr))))
 
 static inline bool pte_access_permitted(pte_t pte, bool write)
 {
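
A standalone sketch of the balanced macro, for anyone who wants to check the parenthesisation outside the sparc tree. sketch_untag() is a hypothetical stand-in that just clears the top byte; it is not what the sparc64 helper does. __typeof__ is a GCC/Clang extension.

```c
#include <assert.h>

/* Hypothetical stand-in: clears the top byte, purely for illustration. */
static inline unsigned long sketch_untag(unsigned long addr)
{
	return addr & ~(0xffUL << (sizeof(unsigned long) * 8 - 8));
}

/* The outer parens wrap the whole cast expression; every '(' has its ')'. */
#define sketch_untagged_addr(addr) \
	((__typeof__(addr))(sketch_untag((unsigned long)(addr))))

static void sketch_untag_demo(void)
{
	unsigned long tagged =
		(0xabUL << (sizeof(unsigned long) * 8 - 8)) | 0x1000UL;
	unsigned long start;

	/* With balanced parens the macro is usable as a plain expression. */
	start = sketch_untagged_addr(tagged);
	assert(start == 0x1000UL);
}
```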

loan

2019-09-30 Thread Simple Federal Credit Union

Hello dear customer,

This is Simple Federal Credit Union. We offer loans ranging from 5,000 
to 30,000,000 euros over a period of 1 to 30 years at an interest rate of 3%.
So we are ready to help you with private lenders / investors, and we offer 
personal loans, start-up loans, educational / agricultural loans, real estate / 
construction loans, property loans, debt-free loans, mobile home loans, hard 
money loans, secured / unsecured loans, investment financing, expansion loans, 
JV capital / rehab loans, equity / refinancing loans, etc. Interested persons 
from any country should contact us by e-mail: simpelfedcre...@outlook.com

First and last name:
Telephone number:
Required loan amount:
Loan term:

Thank you,
Eric.

Re: What populates /proc/partitions ?

2019-09-30 Thread David F.

Thanks for the replies.   I'll see if I can figure this out.   I know
with the same kernel and older udev version in use that it didn't add
it, but with the new udev (eudev) it does (one big difference is the
new one requires and uses devtmpfs and the old one didn't).

I tried making the floppy a module but it still loads on vmware player
and the physical test system I'm using that doesn't have one but
reports it as existing (vmware doesn't hang, just adds fd0 read errors
to log, but physical system does hang while fdisk -l, mdadm --scan
runs, etc..).

As far as the log, debugging udev output, it's close to the same, the
message log (busybox) not much in there to say what's up.   I even
tried the old .rules that were being used with the old udev version,
but made no difference.

On Mon, Sep 30, 2019 at 4:49 PM Randy Dunlap  wrote:
>
> On 9/30/19 3:47 PM, David F. wrote:
> > Hi,
> >
> > I want to find out why fd0 is being added to /proc/partitions and stop
> > that for my build.  I've searched "/proc/partitions" and "partitions",
> > not finding anything that matters.
>
> /proc/partitions is produced on demand by causing a read of it.
> That is done by these functions (pointers) in block/genhd.c:
>
> static const struct seq_operations partitions_op = {
> .start  = show_partition_start,
> .next   = disk_seqf_next,
> .stop   = disk_seqf_stop,
> .show   = show_partition
> };
>
> in particular, show_partition().  In turn, that function uses data that was
> produced upon block device discovery, also in block/genhd.c.
> See functions disk_get_part(), disk_part_iter_init(), disk_part_iter_next(),
> disk_part_iter_exit(), __device_add_disk(), and get_gendisk().
>
> > If udev is doing it, what function does it call so I can search on that?
>
> I don't know about that.  I guess in the kernel it is about "uevents".
> E.g., in block/genhd.c, there are some calls to kobject_uevent() or variants
> of it.
>
> > TIA!!
>
> There should be something in your boot log about "fd" or "fd0" or floppy.
> eh?
>
> --
> ~Randy

clk/clk-next boot bisection: v5.4-rc1 on bcm2837-rpi-3-b-32

2019-09-30 Thread kernelci.org bot

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
*   *
* If you do send a fix, please include this trailer:*
*   Reported-by: "kernelci.org bot"   *
*   *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

clk/clk-next boot bisection: v5.4-rc1 on bcm2837-rpi-3-b-32

Summary:
  Start:  54ecb8f7028c Linux 5.4-rc1
  Details:https://kernelci.org/boot/id/5d92746459b514e473d857e7
  Plain log:  
https://storage.kernelci.org//clk/clk-next/v5.4-rc1/arm/bcm2835_defconfig/gcc-8/lab-baylibre/boot-bcm2837-rpi-3-b.txt
  HTML log:   
https://storage.kernelci.org//clk/clk-next/v5.4-rc1/arm/bcm2835_defconfig/gcc-8/lab-baylibre/boot-bcm2837-rpi-3-b.html
  Result: ac7c3e4ff401 compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   clk
  URL:https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
  Branch: clk-next
  Target: bcm2837-rpi-3-b-32
  CPU arch:   arm
  Lab:lab-baylibre
  Compiler:   gcc-8
  Config: bcm2835_defconfig
  Test suite: boot

Breaking commit found:

---
commit ac7c3e4ff401b304489a031938dbeaab585bfe0a
Author: Masahiro Yamada 
Date:   Wed Sep 25 16:47:42 2019 -0700

compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

Commit 9012d011660e ("compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable this
option.  A couple of build errors were reported by randconfig, but all of
them have been ironed out.

Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely (and it
will simplify the 'inline' macro in compiler_types.h), this commit changes
it to always-on option.  Going forward, the compiler will always be
allowed to not inline functions marked 'inline'.

This is not a problem for x86 since it has been long used by
arch/x86/configs/{x86_64,i386}_defconfig.

I am keeping the config option just in case any problem crops up for other
architectures.

The code clean-up will be done after confirming this is solid.

Link: 
http://lkml.kernel.org/r/20190830034304.24259-1-yamada.masah...@socionext.com
Signed-off-by: Masahiro Yamada 
Acked-by: Nick Desaulniers 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Miguel Ojeda 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6b1b1703a646..93d97f9b0157 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -311,7 +311,7 @@ config HEADERS_CHECK
  relevant for userspace, say 'Y'.
 
 config OPTIMIZE_INLINING
-   bool "Allow compiler to uninline functions marked 'inline'"
+   def_bool y
help
  This option determines if the kernel forces gcc to inline the 
functions
  developers have marked 'inline'. Doing so takes away freedom from gcc 
to
@@ -322,8 +322,6 @@ config OPTIMIZE_INLINING
  decision will become the default in the future. Until then this option
  is there to test gcc for this.
 
- If unsure, say N.
-
 config DEBUG_SECTION_MISMATCH
bool "Enable full Section mismatch analysis"
help
---
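
For context on what the commit above changes, here is a small sketch of the difference between a plain 'inline' hint and forced inlining. It assumes a GCC or Clang compiler for the always_inline attribute; the sketch_ function names are hypothetical.

```c
#include <assert.h>

/*
 * Plain 'inline' is only a hint: the compiler is free to emit an
 * out-of-line copy and call it instead.
 */
static inline int sketch_add_hint(int a, int b)
{
	return a + b;
}

/*
 * With the always_inline attribute the compiler is asked to inline at
 * every call site, which is the "forced" behaviour the help text above
 * describes for kernels built without the option.
 */
static inline __attribute__((__always_inline__))
int sketch_add_forced(int a, int b)
{
	return a + b;
}

static void sketch_inline_demo(void)
{
	/* Both compute the same result; only code generation may differ. */
	assert(sketch_add_hint(2, 3) == 5);
	assert(sketch_add_forced(2, 3) == 5);
}
```

Code that is only correct when a helper really is inlined would need the explicit attribute once plain 'inline' becomes a hint everywhere, which is one way a boot regression like the one bisected here could arise. That is a guess, not a finding of this report.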


Git bisection log:

---
git bisect start
# good: [ebd47c8434064687ab6641e837144e0a3ea3872d] Merge branches 
'clk-bulk-fix', 'clk-at91' and 'clk-sprd' into clk-next
git bisect good ebd47c8434064687ab6641e837144e0a3ea3872d
# bad: [54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c] Linux 5.4-rc1
git bisect bad 54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c
# good: [8b53c76533aa4356602aea98f98a2f3b4051464c] Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 8b53c76533aa4356602aea98f98a2f3b4051464c
# good: [574cc4539762561d96b456dbc0544d8898bd4c6e] Merge tag 
'drm-next-2019-09-18' of git://anongit.freedesktop.org/drm/drm
git bisect good 574cc4539762561d96b456dbc0544d8898bd4c6e
# good: [7e2f2a0cd17cfc42acb4b6a293d5cb6c7eda9862] mm, page_owner: record page 
owner for each subpage
git bisect good 7e2f2a0cd17cfc42acb4b6a293d5cb6c7eda9862
# bad: [972a2bf7dfe39ebf49dd47f68d27c416392e53b1] Merge tag 'nfs-for-5.4-1' of 
git://git.linux-nfs.org/projects/anna/linux-nfs
git bisect bad 972a2bf7dfe39ebf49dd47f68d27c416392e53b1
#

[PATCH RFC 0/1] VFIO: Region-specific file descriptors

2019-09-30 Thread Shawn Anastasio

This patch adds region file descriptors to VFIO, a simple file descriptor type
that allows read/write/mmap operations on a single region of a VFIO device.

This feature is particularly useful for privileged applications that use VFIO
and wish to share file descriptors with unprivileged applications without
handing over full control of the device. It also allows applications to use
regular offsets in read/write/mmap instead of the region index + offset that
must be used with device file descriptors.

The current implementation is very raw (PCI only, no reference counting which
is probably wrong), but I wanted to get a sense to see if this feature is
desired. If it is, tips on how to implement this more correctly are
appreciated.

Comments welcome!


Shawn Anastasio (1):
  vfio/pci: Introduce region file descriptors

 drivers/vfio/pci/vfio_pci.c | 105 
 drivers/vfio/pci/vfio_pci_private.h |   5 ++
 include/uapi/linux/vfio.h   |  14 
 3 files changed, 124 insertions(+)

-- 
2.20.1
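
The "region index + offset" encoding mentioned in the cover letter can be sketched as follows. This is an illustration only: the shift value of 40 and the SKETCH_/sketch_ names are assumptions made for this example, and the function models just the offset translation, not any I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: region index in the bits above 40. */
#define SKETCH_OFFSET_SHIFT 40
#define SKETCH_INDEX_TO_OFFSET(index) \
	((uint64_t)(index) << SKETCH_OFFSET_SHIFT)
#define SKETCH_OFFSET_MASK (((uint64_t)1 << SKETCH_OFFSET_SHIFT) - 1)

/*
 * Translate a region-relative offset (what a region fd user passes) into
 * the device-fd offset that carries the region index in its high bits.
 * Returns 0 on success, -1 if the offset does not fit the region window.
 */
static int sketch_region_to_device_off(uint32_t index, uint64_t region_off,
				       uint64_t *device_off)
{
	if (region_off > SKETCH_OFFSET_MASK)
		return -1;
	*device_off = SKETCH_INDEX_TO_OFFSET(index) | region_off;
	return 0;
}

static void sketch_vfio_demo(void)
{
	uint64_t off;

	assert(sketch_region_to_device_off(2, 0x10, &off) == 0);
	assert(off == (((uint64_t)2 << 40) | 0x10));
	/* The two halves can be recovered independently. */
	assert((off >> SKETCH_OFFSET_SHIFT) == 2);
	assert((off & SKETCH_OFFSET_MASK) == 0x10);
	/* Offsets that would spill into the index bits are rejected. */
	assert(sketch_region_to_device_off(2, (uint64_t)1 << 40, &off) == -1);
}
```

A region fd hides this translation from the unprivileged side, which only ever sees offsets starting at zero.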

[PATCH RFC 1/1] vfio/pci: Introduce region file descriptors

2019-09-30 Thread Shawn Anastasio

Introduce a new type of VFIO file descriptor that allows
memfd-style semantics for regions of a VFIO device.

Unlike VFIO device file descriptors, region file descriptors
are limited to a single region, and all offsets (read, etc.)
are relative to the start of the region.

This allows for finer granularity when sharing VFIO fds,
as applications can now choose to only share specific regions.

Signed-off-by: Shawn Anastasio 
---
 drivers/vfio/pci/vfio_pci.c | 105 
 drivers/vfio/pci/vfio_pci_private.h |   5 ++
 include/uapi/linux/vfio.h   |  14 
 3 files changed, 124 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 02206162eaa9..132ed245cd68 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -688,6 +689,9 @@ int vfio_pci_register_dev_region(struct vfio_pci_device 
*vdev,
return 0;
 }
 
+
+static const struct file_operations vfio_region_fops;
+
 static long vfio_pci_ioctl(void *device_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -1137,6 +1141,54 @@ static long vfio_pci_ioctl(void *device_data,
 
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
  ioeventfd.data, count, ioeventfd.fd);
+   } else if (cmd == VFIO_DEVICE_GET_REGION_FD) {
+   struct pci_dev *pdev = vdev->pdev;
+   u32 index;
+   u32 len;
+   int ret;
+   struct file *filep;
+   struct vfio_pci_region_info *info;
+
+   if (copy_from_user(&index, (void __user *)arg, sizeof(u32)))
+   return -EFAULT;
+
+   /* Don't support non-BAR regions */
+   if (index > VFIO_PCI_BAR5_REGION_INDEX)
+   return -EINVAL;
+
+   len = pci_resource_len(pdev, index);
+   if (!len)
+   return -EINVAL;
+
+   if (!vdev->bar_mmap_supported[index])
+   return -EINVAL;
+
+   info = kzalloc(sizeof(*info), GFP_KERNEL);
+   if (!info)
+   return -ENOMEM;
+
+   info->index = index;
+   info->vdev = vdev;
+
+   ret = get_unused_fd_flags(O_CLOEXEC);
+   if (ret < 0) {
+   kfree(info);
+   return ret;
+   }
+
+   filep = anon_inode_getfile("[vfio-region]", &vfio_region_fops,
+  info, O_RDWR);
+   if (IS_ERR(filep)) {
+   put_unused_fd(ret);
+   ret = PTR_ERR(filep);
+   kfree(info);
+   return ret;
+   }
+   filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+   fd_install(ret, filep);
+
+   return ret;
}
 
return -ENOTTY;
@@ -1286,6 +1338,59 @@ static const struct vfio_device_ops vfio_pci_ops = {
.request= vfio_pci_request,
 };
 
+static int vfio_region_fops_release(struct inode *inode, struct file *filep)
+{
+   kfree(filep->private_data);
+   return 0;
+}
+
+static ssize_t vfio_region_fops_read(struct file *filep, char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct vfio_pci_region_info *info = filep->private_data;
+
+   if (*ppos > VFIO_PCI_OFFSET_MASK)
+   return -EINVAL;
+
+   *ppos |= VFIO_PCI_INDEX_TO_OFFSET(info->index);
+
+   return vfio_pci_rw(info->vdev, buf, count, ppos, false);
+}
+
+static ssize_t vfio_region_fops_write(struct file *filep,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   struct vfio_pci_region_info *info = filep->private_data;
+
+   if (*ppos > VFIO_PCI_OFFSET_MASK)
+   return -EINVAL;
+
+   *ppos |= VFIO_PCI_INDEX_TO_OFFSET(info->index);
+
+   return vfio_pci_rw(info->vdev, (char __user *)buf, count, ppos, true);
+}
+
+static int vfio_region_fops_mmap(struct file *filep, struct vm_area_struct 
*vma)
+{
+   struct vfio_pci_region_info *info = filep->private_data;
+
+   if (vma->vm_pgoff > ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1))
+   return -EINVAL;
+
+   vma->vm_pgoff |= info->index << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+
+   return vfio_pci_mmap(info->vdev, vma);
+}
+
+static const struct file_operations vfio_region_fops = {
+   .owner = THIS_MODULE,
+   .release = vfio_region_fops_release,
+   .read = vfio_region_fops_read,
+   .write = vfio_region_fops_write,
+   .mmap = vfio_region_fops_mmap
+};
+
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
 static void vfio_pci_reflck_put(struct

Re: [PATCH 1/1] blk-mq: fill header with kernel-doc

2019-09-30 Thread André Almeida

On 9/30/19 6:54 PM, Minwoo Im wrote:
> Hi André,
> 
>> -/*
>> +/**
>> + * blk_mq_rq_from_pdu - cast a PDU to a request
>> + * @pdu: the PDU (protocol unit request) to be casted
> 
> It makes sense, but it looks like PDU stands for protocol unit request.
> Could we have it "PDU(Protocol Data Unit)" ?
> 
Oops, thank you for the input :)

> Thanks,
> 
>

Re: [PATCH 1/1] blk-mq: fill header with kernel-doc

2019-09-30 Thread André Almeida

On 9/30/19 7:01 PM, Bart Van Assche wrote:
> On 9/30/19 12:48 PM, André Almeida wrote:
>> Insert documentation for structs, enums and functions at header file.
>> Format existing and new comments at struct blk_mq_ops as
>> kernel-doc comments.
> 
> Hi André,
> 
> Seeing the documentation being improved is great. However, this patch
> conflicts with a patch series in my tree and that I plan to post soon.
> So I would appreciate it if this patch would be withheld until after my
> patch series has been accepted.
>

Sure, no problem. If it helps the workflow, I could rebase my patch on
top of your tree.


> Thanks,
> 
> Bart.
>

Re: What populates /proc/partitions ?

2019-09-30 Thread Randy Dunlap

On 9/30/19 3:47 PM, David F. wrote:
> Hi,
> 
> I want to find out why fd0 is being added to /proc/partitions and stop
> that for my build.  I've searched "/proc/partitions" and "partitions",
> not finding anything that matters.

/proc/partitions is generated on demand whenever it is read.
That is done by these functions (pointers) in block/genhd.c:

static const struct seq_operations partitions_op = {
.start  = show_partition_start,
.next   = disk_seqf_next,
.stop   = disk_seqf_stop,
.show   = show_partition
};

in particular, show_partition().  In turn, that function uses data that was
produced upon block device discovery, also in block/genhd.c.
See functions disk_get_part(), disk_part_iter_init(), disk_part_iter_next(),
disk_part_iter_exit(), __device_add_disk(), and get_gendisk().

> If udev is doing it, what function is it call so I can search on that?

I don't know about that.  I guess in the kernel it is about "uevents".
E.g., in block/genhd.c, there are some calls to kobject_uevent() or variants
of it.

> TIA!!

There should be something in your boot log about "fd" or "fd0" or floppy.
eh?

-- 
~Randy

Re: [PATCH v2 2/3] mm, page_owner: decouple freeing stack trace from debug_pagealloc

2019-09-30 Thread Qian Cai

> On Sep 30, 2019, at 5:43 PM, Vlastimil Babka  wrote:
> 
> Well, my use case is shipping production kernels with CONFIG_PAGE_OWNER
> and CONFIG_DEBUG_PAGEALLOC enabled, and instructing users to boot-time
> enable only for troubleshooting a crash or memory leak, without a need
> to install a debug kernel. Things like static keys and page_ext
> allocations makes this possible without CPU and memory overhead when not
> boot-time enabled. I don't know too much about KASAN internals, but I
> assume it's not possible to use it that way on production kernels yet?

In that case, why can’t users simply enable page_owner=on and
debug_pagealloc=on for troubleshooting? The latter makes the kernel slower, but
I am not sure it is worth optimizing by adding a new parameter. There are
already quite a few MM-related kernel parameters that could be tidied up a bit
in the future.

Re: What populates /proc/partitions ?

2019-09-30 Thread Brian Masney

On Mon, Sep 30, 2019 at 03:47:21PM -0700, David F. wrote:
> Hi,
> 
> I want to find out why fd0 is being added to /proc/partitions and stop
> that for my build.  I've searched "/proc/partitions" and "partitions",
> not finding anything that matters.

It looks like it is done in block/genhd.c with this function:

static int __init proc_genhd_init(void)
{
proc_create_seq("diskstats", 0, NULL, &diskstats_op);
proc_create_seq("partitions", 0, NULL, &partitions_op);
return 0;
}
module_init(proc_genhd_init);

Brian


> 
> If udev is doing it, what function is it call so I can search on that?
> 
> TIA!!

Re: [GIT PULL] scheduler fixes

2019-09-30 Thread John Stultz

On Sat, Sep 28, 2019 at 5:40 AM Ingo Molnar  wrote:
>
> Please pull the latest sched-urgent-for-linus git tree from:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> sched-urgent-for-linus
>
># HEAD: 4892f51ad54ddff2883a60b6ad4323c1f632a9d6 sched/fair: Avoid 
> redundant EAS calculation
>
> The changes are:
>
>  - Apply a number of membarrier related fixes and cleanups, which fixes a
>use-after-free race in the membarrier code.
>
>  - Introduce proper RCU protection for tasks on the runqueue - to get rid
>of the subtle task_rcu_dereference() interface that was easy to get
>wrong.
>
>  - Misc fixes, but also an EAS speedup.
>
>  Thanks,
>
> Ingo
>
> -->
> Eric W. Biederman (4):
>   tasks: Add a count of task RCU users
>   tasks, sched/core: Ensure tasks are available for a grace period after 
> leaving the runqueue
>   tasks, sched/core: With a grace period after finish_task_switch(), 
> remove unnecessary code
>   tasks, sched/core: RCUify the assignment of rq->curr
>
> KeMeng Shi (1):
>   sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
>
> Mathieu Desnoyers (7):
>   sched/membarrier: Fix private expedited registration check
>   sched/membarrier: Remove redundant check
>   sched/membarrier: Call sync_core only before usermode for same mm
>   sched/membarrier: Fix p->mm->membarrier_state racy load
>   selftests, sched/membarrier: Add multi-threaded test
>   sched/membarrier: Skip IPIs when mm->mm_users == 1
>   sched/membarrier: Return -ENOMEM to userspace on memory allocation 
> failure
>
> Qian Cai (3):
>   sched/fair: Remove unused cfs_rq_clock_task() function
>   sched/core: Convert vcpu_is_preempted() from macro to an inline function
>   sched/fair: Fix -Wunused-but-set-variable warnings
>
> Quentin Perret (1):
>   sched/fair: Avoid redundant EAS calculation
>
> Valentin Schneider (2):
>   sched/core: Fix preempt_schedule() interrupt return comment
>   sched/core: Remove double update_max_interval() call on CPU startup

Hey all,
  After rebasing my hikey960 patches onto v5.4-rc1, I started seeing
boot hangs/stalls trying to boot AOSP:

[9.788182] ------------[ cut here ]------------
[9.792829] WARNING: CPU: 7 PID: 516 at
kernel/rcu/tree_plugin.h:293 rcu_note_context_switch+0x48/0x4a8
[9.802229] Modules linked in:
[9.805298] CPU: 7 PID: 516 Comm: Jit thread pool Not tainted
5.3.0-13104-g0dbefe07634f #1126
[9.813822] Hardware name: HiKey960 (DT)
[9.817742] pstate: 20400085 (nzCv daIf +PAN -UAO)
[9.822530] pc : rcu_note_context_switch+0x48/0x4a8
[9.827403] lr : rcu_note_context_switch+0x1c/0x4a8
[9.832273] sp : ffc012ee3a60
[9.835581] x29: ffc012ee3a60 x28: ff82192d4140
[9.840889] x27:  x26: ff821f7b38c0
[9.846195] x25: efb51cf8 x24: ffc0117ba000
[9.851501] x23:  x22: ff82192d4140
[9.856806] x21:  x20: ff821f7b38c0
[9.862111] x19: ff821f7b44c0 x18: 
[9.867416] x17:  x16: 
[9.872721] x15:  x14: 
[9.878026] x13:  x12: 
[9.883331] x11:  x10: 
[9.888636] x9 :  x8 : ffc012ee3c60
[9.893941] x7 : ffc012ee3c70 x6 : ff8219026788
[9.899246] x5 : 014a2000 x4 : 
[9.904551] x3 : ffc20e1fe000 x2 : 0001
[9.909856] x1 : ffc0117ba428 x0 : 0023
[9.915163] Call trace:
[9.917605]  rcu_note_context_switch+0x48/0x4a8
[9.922134]  __schedule+0x90/0x7d8
[9.925530]  schedule+0x38/0xc0
[9.928667]  futex_wait_queue_me+0xc0/0x140
[9.932847]  futex_wait+0xe0/0x210
[9.936242]  do_futex+0x618/0xdf8
[9.939551]  __arm64_sys_futex_time32+0xfc/0x148
[9.944167]  el0_svc_common.constprop.1+0x64/0x188
[9.948955]  el0_svc_compat_handler+0x18/0x38
[9.953307]  el0_svc_compat+0x8/0x2c
[9.956876] ---[ end trace cdf2ffd45270a24d ]---

Usually followed by:
[   30.807092] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   30.813207] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P521 P519
[   30.819998]  (detected by 4, t=5255 jiffies, g=169, q=5967)
[   30.825568] Jit thread pool S0   521  1 0x
[   30.831050] Call trace:
[   30.833498]  __switch_to+0xd4/0x230
[   30.836984]  __schedule+0x320/0x7d8
[   30.840464]  schedule+0x38/0xc0
[   30.843600]  futex_wait_queue_me+0xc0/0x140
[   30.847776]  futex_wait+0xe0/0x210
[   30.851169]  do_futex+0x618/0xdf8
[   30.854476]  __arm64_sys_futex+0xfc/0x148
[   30.858479]  el0_svc_common.constprop.1+0x64/0x188
[   30.863262]  el0_svc_handler+0x20/0x80
[   30.867003]  el0_svc+0x8/0xc
[   30.869876] Jit thread pool S0   519  1 0x0040
[   30.875353] Call trace:
[

Re: [PATCH RESEND v3 4/4] perf_event_open: switch to copy_struct_from_user()

2019-09-30 Thread Kees Cook

On Tue, Oct 01, 2019 at 05:15:26AM +1000, Aleksa Sarai wrote:
> From: Aleksa Sarai 
> 
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls.
> 
> Signed-off-by: Aleksa Sarai 

Reviewed-by: Kees Cook 

-Kees

> ---
>  kernel/events/core.c | 47 +---
>  1 file changed, 9 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4655adbbae10..3f0cb82e4fbc 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10586,55 +10586,26 @@ static int perf_copy_attr(struct perf_event_attr 
> __user *uattr,
>   u32 size;
>   int ret;
>  
> - if (!access_ok(uattr, PERF_ATTR_SIZE_VER0))
> - return -EFAULT;
> -
> - /*
> -  * zero the full structure, so that a short copy will be nice.
> -  */
> + /* Zero the full structure, so that a short copy will be nice. */
>   memset(attr, 0, sizeof(*attr));
>  
>   ret = get_user(size, &uattr->size);
>   if (ret)
>   return ret;
>  
> - if (size > PAGE_SIZE)   /* silly large */
> - goto err_size;
> -
> - if (!size)  /* abi compat */
> + /* ABI compatibility quirk: */
> + if (!size)
>   size = PERF_ATTR_SIZE_VER0;
> -
> - if (size < PERF_ATTR_SIZE_VER0)
> + if (size < PERF_ATTR_SIZE_VER0 || size > PAGE_SIZE)
>   goto err_size;
>  
> - /*
> -  * If we're handed a bigger struct than we know of,
> -  * ensure all the unknown bits are 0 - i.e. new
> -  * user-space does not rely on any kernel feature
> -  * extensions we dont know about yet.
> -  */
> - if (size > sizeof(*attr)) {
> - unsigned char __user *addr;
> - unsigned char __user *end;
> - unsigned char val;
> -
> - addr = (void __user *)uattr + sizeof(*attr);
> - end  = (void __user *)uattr + size;
> -
> - for (; addr < end; addr++) {
> - ret = get_user(val, addr);
> - if (ret)
> - return ret;
> - if (val)
> - goto err_size;
> - }
> - size = sizeof(*attr);
> + ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
> + if (ret) {
> + if (ret == -E2BIG)
> + goto err_size;
> + return ret;
>   }
>  
> - ret = copy_from_user(attr, uattr, size);
> - if (ret)
> - return -EFAULT;
> -
>   attr->size = size;
>  
>   if (attr->__reserved_1)
> -- 
> 2.23.0
> 

-- 
Kees Cook

Re: [PATCH RESEND v3 3/4] sched_setattr: switch to copy_struct_from_user()

2019-09-30 Thread Kees Cook

On Tue, Oct 01, 2019 at 05:15:25AM +1000, Aleksa Sarai wrote:
> From: Aleksa Sarai 
> 
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls. Ideally we could also
> unify sched_getattr(2)-style syscalls as well, but unfortunately the
> correct semantics for such syscalls are much less clear (see [1] for
> more detail). In future we could come up with a more sane idea for how
> the syscall interface should look.
> 
> [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
>  robustify sched_read_attr() ABI logic and code")
> 
> Signed-off-by: Aleksa Sarai 

Reviewed-by: Kees Cook 

-Kees

> ---
>  kernel/sched/core.c | 43 +++
>  1 file changed, 7 insertions(+), 36 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..dd05a378631a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5106,9 +5106,6 @@ static int sched_copy_attr(struct sched_attr __user 
> *uattr, struct sched_attr *a
>   u32 size;
>   int ret;
>  
> - if (!access_ok(uattr, SCHED_ATTR_SIZE_VER0))
> - return -EFAULT;
> -
>   /* Zero the full structure, so that a short copy will be nice: */
>   memset(attr, 0, sizeof(*attr));
>  
> @@ -5116,45 +5113,19 @@ static int sched_copy_attr(struct sched_attr __user 
> *uattr, struct sched_attr *a
>   if (ret)
>   return ret;
>  
> - /* Bail out on silly large: */
> - if (size > PAGE_SIZE)
> - goto err_size;
> -
>   /* ABI compatibility quirk: */
>   if (!size)
>   size = SCHED_ATTR_SIZE_VER0;
> -
> - if (size < SCHED_ATTR_SIZE_VER0)
> + if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE)
>   goto err_size;
>  
> - /*
> -  * If we're handed a bigger struct than we know of,
> -  * ensure all the unknown bits are 0 - i.e. new
> -  * user-space does not rely on any kernel feature
> -  * extensions we dont know about yet.
> -  */
> - if (size > sizeof(*attr)) {
> - unsigned char __user *addr;
> - unsigned char __user *end;
> - unsigned char val;
> -
> - addr = (void __user *)uattr + sizeof(*attr);
> - end  = (void __user *)uattr + size;
> -
> - for (; addr < end; addr++) {
> - ret = get_user(val, addr);
> - if (ret)
> - return ret;
> - if (val)
> - goto err_size;
> - }
> - size = sizeof(*attr);
> + ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
> + if (ret) {
> + if (ret == -E2BIG)
> + goto err_size;
> + return ret;
>   }
>  
> - ret = copy_from_user(attr, uattr, size);
> - if (ret)
> - return -EFAULT;
> -
>   if ((attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) &&
>   size < SCHED_ATTR_SIZE_VER1)
>   return -EINVAL;
> @@ -5354,7 +5325,7 @@ sched_attr_copy_to_user(struct sched_attr __user *uattr,
>   * sys_sched_getattr - similar to sched_getparam, but with sched_attr
>   * @pid: the pid in question.
>   * @uattr: structure containing the extended parameters.
> - * @usize: sizeof(attr) that user-space knows about, for forwards and 
> backwards compatibility.
> + * @usize: sizeof(attr) for fwd/bwd comp.
>   * @flags: for future extension.
>   */
>  SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
> -- 
> 2.23.0
> 

-- 
Kees Cook

Re: [PATCH 00/14] Add support for FM radio in hcill and kill TI_ST

2019-09-30 Thread Adam Ford

On Fri, May 10, 2019 at 10:38 AM Marcel Holtmann  wrote:
>
> Hi Adam,
>
> >>> This moves all remaining users of the legacy TI_ST driver to hcill 
> >>> (patches
> >>> 1-3). Then patches 4-7 convert wl128x-radio driver to a standard 
> >>> platform
> >>> device driver with support for multiple instances. Patch 7 will 
> >>> result in
> >>> (userless) TI_ST driver no longer supporting radio at runtime. Patch 
> >>> 8-11 do
> >>> some cleanups in the wl128x-radio driver. Finally patch 12 removes 
> >>> the TI_ST
> >>> specific parts from wl128x-radio and adds the required infrastructure 
> >>> to use it
> >>> with the serdev hcill driver instead. The remaining patches 13 and 14 
> >>> remove
> >>> the old TI_ST code.
> >>>
> >>> The new code has been tested on the Motorola Droid 4. For testing the 
> >>> audio
> >>> should be configured to route Ext to Speaker or Headphone. Then you 
> >>> need to
> >>> plug headphone, since its cable is used as antenna. For testing there 
> >>> is a
> >>> 'radio' utility packages in Debian. When you start the utility you 
> >>> need to
> >>> specify a frequency, since initial get_frequency returns an error:
> >>
> >> What is the status of this series?
> >>
> >> Based on some of the replies (from Adam Ford in particular) it appears 
> >> that
> >> this isn't ready to be merged, so is a v2 planned?
> >
> > Yes, a v2 is planned, but I'm super busy at the moment. I don't
> > expect to send something for this merge window. Neither LogicPD
> > nor IGEP use FM radio, so I can just remove FM support from the
> > TI_ST framework. Converting those platforms to hci_ll can be done
> > in a different patchset.
> >
> > If that was the only issue there would be a v2 already. But Marcel
> > Holtmann suggested to pass the custom packet data through the BT
> > subsystem, which is non-trivial (at least for me) :)
> 
>  I am running some tests today on the wl1283-st on the Logic PD Torpedo
>  board.  Tony had suggested a few options, so I'm going to try those.
>  Looking at those today.  If/when you have a V2, please CC me on it. If
>  it's been posted, can you send me a link?  I would really like to see
>  the st-kim driver go away so I'd like to resolve the issues with the
>  torpedo board.
> >>>
> >>> I have run a bunch of tests on the 5.1 kernel.  I am able to get the
> >>> firmware to load now and the hci0 goes up.  I was able to establish a
> >>> BLE connection to a TI Sensor Tag and read and write data to it with
> >>> good success on the wl1283.
> >>>
> >>> Unfortunately, when I tried to do some more extensive testing over
> >>> classic Bluetooth, I got an error that repeats itself at seemingly
> >>> random intervals:
> >>> Bluetooth: hci0: Frame reassembly failed (-84)
> >>>
> >>> I can still scan and pair, but these Frame reassembly failed errors
> >>> appear to come and go.
> >>
> >> there are only 3 places in h4_recv_buf that return EILSEQ. Just add an 
> >> extra printk to these to figure out which one it is. Maybe it is just 
> >> extra packet types that we need to handle. If it is not the packet type 
> >> one, print what packet we have that is causing this.
> >>
> >
> > I added some code around
> >
> > /* Check for invalid packet type */
> >if (!skb) {
> > printk("Check for invalid packet type %x\n", (unsigned int)
> > (&pkts[i])->type);
> > return ERR_PTR(-EILSEQ);
> > }
> >
> > I don't know if I did it right or I am reading the packet type
> > correctly, but the frame reassembly errors are being caught here.
> >
> > [  408.519165] Check for invalid packet type ff
> > [  408.523559] Bluetooth: hci0: Frame reassembly failed (-84)
>
> so now we need to figure our on how to handle HCI_VENDOR_PKT.
>
> #define LL_RECV_VENDOR \
> .type = HCI_VENDOR_PKT, \
> .hlen = aaa, \
> .loff = bbb, \
> .lsize = ccc, \
> .maxlen = ddd
>
> static const struct h4_recv_pkt ll_recv_pkts[] = {
> ...
> { LL_RECV_WAKE_ACK,  .recv = ll_recv_frame  },
> { LL_RECV_VENDOR,.recv = hci_recv_diag  },
> };
>
> Can you hexdump the data inside the skb and we can figure out what it uses 
> for the header and size.
>
> In hci_bcm.c there are a few examples of fixed size packets and bpa10x.c 
> contains one where it follows an actual header definition. Also hci_nokia.c 
> contains a few for their packets.

I haven't forgotten this, but I was highly distracted.  I wanted to
test a bunch of stuff on omap3630 and imx6 boards to prep them for the
upcoming 5.4 LTS kernel.  As of now I 'think' this is the last item on
my to-do list.

I'm going to try and throw some debug code into the older st/kim
driver as well as debug this.  I know some people have stated they
have wl1283-st working on a dm3730.  dump some logs?  I am curious to
see if there is

Re: [PATCH RESEND v3 2/4] clone3: switch to copy_struct_from_user()

2019-09-30 Thread Kees Cook

On Tue, Oct 01, 2019 at 05:15:24AM +1000, Aleksa Sarai wrote:
> From: Aleksa Sarai 
> 
> The change is very straightforward, and helps unify the syscall
> interface for struct-from-userspace syscalls. Additionally, explicitly
> define CLONE_ARGS_SIZE_VER0 to match the other users of the
> struct-extension pattern.
> 
> Signed-off-by: Aleksa Sarai 
> ---
>  include/uapi/linux/sched.h |  2 ++
>  kernel/fork.c  | 34 +++---
>  2 files changed, 9 insertions(+), 27 deletions(-)
> 
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index b3105ac1381a..0945805982b4 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -47,6 +47,8 @@ struct clone_args {
>   __aligned_u64 tls;
>  };
>  
> +#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> +
>  /*
>   * Scheduling policies
>   */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f9572f416126..2ef529869c64 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2525,39 +2525,19 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, 
> unsigned long, newsp,
>  #ifdef __ARCH_WANT_SYS_CLONE3
>  noinline static int copy_clone_args_from_user(struct kernel_clone_args 
> *kargs,
> struct clone_args __user *uargs,
> -   size_t size)
> +   size_t usize)
>  {
> + int err;
>   struct clone_args args;
>  
> - if (unlikely(size > PAGE_SIZE))
> + if (unlikely(usize > PAGE_SIZE))
>   return -E2BIG;

I quickly looked through the earlier threads and couldn't find it, but
I have a memory of some discussion about moving this test into the
copy_struct_from_user() function itself? That would seem like a
reasonable idea? ("4k should be enough for any structure!")

Either way:

Reviewed-by: Kees Cook 


> -
> - if (unlikely(size < sizeof(struct clone_args)))
> + if (unlikely(usize < CLONE_ARGS_SIZE_VER0))
>   return -EINVAL;
>  
> - if (unlikely(!access_ok(uargs, size)))
> - return -EFAULT;
> -
> - if (size > sizeof(struct clone_args)) {
> - unsigned char __user *addr;
> - unsigned char __user *end;
> - unsigned char val;
> -
> - addr = (void __user *)uargs + sizeof(struct clone_args);
> - end = (void __user *)uargs + size;
> -
> - for (; addr < end; addr++) {
> - if (get_user(val, addr))
> - return -EFAULT;
> - if (val)
> - return -E2BIG;
> - }
> -
> - size = sizeof(struct clone_args);
> - }
> -
> - if (copy_from_user(&args, uargs, size))
> - return -EFAULT;
> + err = copy_struct_from_user(&args, sizeof(args), uargs, usize);
> + if (err)
> + return err;
>  
>   /*
>* Verify that higher 32bits of exit_signal are unset and that
> -- 
> 2.23.0
> 

-- 
Kees Cook

Re: [PATCH RESEND v3 1/4] lib: introduce copy_struct_from_user() helper

2019-09-30 Thread Kees Cook

On Tue, Oct 01, 2019 at 05:15:23AM +1000, Aleksa Sarai wrote:
> From: Aleksa Sarai 
> 
> A common pattern for syscall extensions is increasing the size of a
> struct passed from userspace, such that the zero-value of the new fields
> result in the old kernel behaviour (allowing for a mix of userspace and
> kernel vintages to operate on one another in most cases).
> 
> While this interface exists for communication in both directions, only
> one interface is straightforward to have reasonable semantics for
> (userspace passing a struct to the kernel). For kernel returns to
> userspace, what the correct semantics are (whether there should be an
> error if userspace is unaware of a new extension) is very
> syscall-dependent and thus probably cannot be unified between syscalls
> (a good example of this problem is [1]).
> 
> Previously there was no common lib/ function that implemented
> the necessary extension-checking semantics (and different syscalls
> implemented them slightly differently or incompletely[2]). Future
> patches replace common uses of this pattern to make use of
> copy_struct_from_user().
> 
> Some in-kernel selftests that insure that the handling of alignment and
> various byte patterns are all handled identically to memchr_inv() usage.
> 
> [1]: commit 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
>  robustify sched_read_attr() ABI logic and code")
> 
> [2]: For instance {sched_setattr,perf_event_open,clone3}(2) all do do
>  similar checks to copy_struct_from_user() while rt_sigprocmask(2)
>  always rejects differently-sized struct arguments.
> 
> Suggested-by: Rasmus Villemoes 
> Signed-off-by: Aleksa Sarai 
> ---
>  include/linux/bitops.h  |   7 +++
>  include/linux/uaccess.h |   4 ++
>  lib/strnlen_user.c  |   8 +--
>  lib/test_user_copy.c| 133 ++--
>  lib/usercopy.c  | 123 +
>  5 files changed, 262 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index cf074bce3eb3..c94a9ff9f082 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -4,6 +4,13 @@
>  #include 
>  #include 
>  
> +/* Set bits in the first 'n' bytes when loaded from memory */
> +#ifdef __LITTLE_ENDIAN
> +#  define aligned_byte_mask(n) ((1UL << 8*(n))-1)
> +#else
> +#  define aligned_byte_mask(n) (~0xffUL << (BITS_PER_LONG - 8 - 8*(n)))
> +#endif
> +
>  #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
>  #define BITS_TO_LONGS(nr)DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
>  
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 70bbdc38dc37..94f20e6ec6ab 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -231,6 +231,10 @@ __copy_from_user_inatomic_nocache(void *to, const void 
> __user *from,
>  
>  #endif   /* ARCH_HAS_NOCACHE_UACCESS */
>  
> +extern int check_zeroed_user(const void __user *from, size_t size);
> +extern int copy_struct_from_user(void *dst, size_t ksize,
> +  const void __user *src, size_t usize);
> +
>  /*
>   * probe_kernel_read(): safely attempt to read from a location
>   * @dst: pointer to the buffer that shall take the data
> diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
> index 28ff554a1be8..6c0005d5dd5c 100644
> --- a/lib/strnlen_user.c
> +++ b/lib/strnlen_user.c
> @@ -3,16 +3,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> -/* Set bits in the first 'n' bytes when loaded from memory */
> -#ifdef __LITTLE_ENDIAN
> -#  define aligned_byte_mask(n) ((1ul << 8*(n))-1)
> -#else
> -#  define aligned_byte_mask(n) (~0xfful << (BITS_PER_LONG - 8 - 8*(n)))
> -#endif
> -
>  /*
>   * Do a strnlen, return length of string *with* final '\0'.
>   * 'count' is the user-supplied count, while 'max' is the
> diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c
> index 67bcd5dfd847..3a17f71029bb 100644
> --- a/lib/test_user_copy.c
> +++ b/lib/test_user_copy.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Several 32-bit architectures support 64-bit {get,put}_user() calls.
> @@ -31,14 +32,129 @@
>  # define TEST_U64
>  #endif
>  
> -#define test(condition, msg) \
> -({   \
> - int cond = (condition); \
> - if (cond)   \
> - pr_warn("%s\n", msg);   \
> - cond;   \
> +#define test(condition, msg, ...)\
> +({   \
> + int cond = (condition); \
> + if (cond)   \
> + pr_warn("[%d] " msg "\n", __LINE__, ##__VA_ARGS__); \
> + cond;   \
>  })
>  
> +static bool is_zeroed(void *from, size_t

Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

2019-09-30 Thread Andrea Arcangeli

Hello,

On Fri, Sep 13, 2019 at 12:05:26PM -0600, Alex Williamson wrote:
> On Mon, 2 Sep 2019 15:32:42 +0800
> Ben Luo  wrote:
> 
> > On 2019/8/30 at 1:06 AM, Alex Williamson wrote:
> > > On Fri, 30 Aug 2019 00:58:22 +0800
> > > Ben Luo  wrote:
> > >  
> > >> On 2019/8/28 at 11:55 PM, Alex Williamson wrote:
> > >>> On Wed, 28 Aug 2019 12:28:04 +0800
> > >>> Ben Luo  wrote:
> > >>> 
> >  currently, if the page is not a tail of compound page, it will be
> >  checked twice for the same thing.
> > 
> >  Signed-off-by: Ben Luo 
> >  ---
> > drivers/vfio/vfio_iommu_type1.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> >  diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >  b/drivers/vfio/vfio_iommu_type1.c
> >  index 054391f..d0f7346 100644
> >  --- a/drivers/vfio/vfio_iommu_type1.c
> >  +++ b/drivers/vfio/vfio_iommu_type1.c
> >  @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, 
> >  long npage, bool async)
> > static bool is_invalid_reserved_pfn(unsigned long pfn)
> > {
> > if (pfn_valid(pfn)) {
> >  -  bool reserved;
> > struct page *tail = pfn_to_page(pfn);
> > struct page *head = compound_head(tail);
> >  -  reserved = !!(PageReserved(head));
> > if (head != tail) {
> >  +  bool reserved = PageReserved(head);
> > /*
> >  * "head" is not a dangling pointer
> >  * (compound_head takes care of that)  
> > >>> Thinking more about this, the code here was originally just a copy of
> > >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> > >>> Should we instead do the same thing here?  Thanks,
> > >>>
> > >>> Alex  
> > >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> > >> a look at that and compose a new patch  
> > > I'm not sure if the further updates are quite as relevant for vfio, but
> > > appreciate your review of them.  Thanks,
> > >
> > > Alex  
> > 
> > After studying the related code, my personal understandings are:
> > 
> > kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO 
> > mapped so that to set
> > the proper MTRR TYPE to spte.
> > 
> > is_invalid_reserved_pfn() is used in two scenarios:
> >      1. to tell whether a page should be counted against user's mlock 
> > limits, as the function's name
> > implies, all 'invalid' PFNs who are not backed by struct page and those 
> > reserved pages (including
> > zero page and those from NVDIMM DAX) should be excluded.
> > 2. to check if we have got a valid and pinned pfn for the vma 
> > with VM_PFNMAP flag.
> > 
> > So, for the zero page and 'RAM' backed PFNs without 'struct page', 
> > kvm_is_mmio_pfn() should
> > return false because they are not MMIO and are cacheable, but 
> > is_invalid_reserved_pfn() should
> > return true since they are truely reserved or invalid and should not be 
> > counted against user's
> > mlock limits.
> > 
> > For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO 
> > also returns this
> > error code to user, seems not support fsdax for now, so there is no 
> > chance to call into
> > is_invalid_reserved_pfn() currently, if fsdax is to be supported, not 
> > only this function needs to be
> > updated, vaddr_get_pfn() also needs some changes.
> > 
> > Now, with the assumption that PFNs of compound pages with reserved bit 
> > set in head will not be
> > passed to is_invalid_reserved_pfn(), we can simplify this function to:
> > 
> > static bool is_invalid_reserved_pfn(unsigned long pfn)
> > {
> >      if (pfn_valid(pfn))
> >      return PageReserved(pfn_to_page(pfn));
> > 
> >      return true;
> > }
> > 
> > But, I am not sure if the assumption above is true, if not, we still 
> > need to check the reserved bit of
> > head for a tail page as this PATCH v2 does.
> 
> I believe what you've described is correct.  Andrea, have we missed
> anything?  Thanks,

Yes it looks good. The only reason for ever wanting to check the head
page reserved bit (instead of only checking the tail page reserved
bit) would be if any code would transfer the reserved bit from head to
tail during a hugepage split, but no hugepage split code can transfer
the reserved bit from head to tail during the split, so checking the
head can't make a difference.

The buddy allocator wouldn't allow the driver to allocate a hugepage if any
subpage is reserved in the e820 map at boot, so non-RAM pages with a
backing struct page aren't an issue here. This was only meaningful for
PFNMAP in case the PG_reserved bit was set by the driver on a hugepage
before mapping it in userland, in which case the driver needs to set
the reserved bit in all subpages to be safe (not only in the head).

Thanks,
Andrea

[PATCH 0/3] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event

2019-09-30 Thread Like Xu

Hi Paolo & Community:

The Performance Monitoring Unit (PMU) is designed to monitor microarchitectural
events, which helps in analyzing how an application or operating system
performs on the processor. In KVM/x86, version 2 of the Architectural
PMU has been enabled on Intel and AMD hosts.

This patch series improves vPMU efficiency for guest perf users,
which is mainly measured by guest NMI handler latency for basic perf usage
[1][2][3][4] with a hardware PMU. It's not a passthrough solution; it builds
on the legacy vPMU implementation (since 2011) and is backport-friendly.

The general idea (defined in patch 2/3) is to reuse the last created perf_event
for the same vPMC when the newly requested config is exactly the same as the
last programmed config (used by pmc_reprogram_counter()) AND the new event
period is appropriate and accepted (via perf_event_period() in patch 1/3).
Before the perf_event is reused, it is kept disabled until it can be
reused and assigned a hw-counter again to serve the vPMC.

If the disabled perf_event is no longer reused, a lazy release mechanism
(defined in patch 3/3) takes over: in short, the disabled perf_events are
released on the first call of vcpu_enter_guest() after the vcpu is next
scheduled in, if its MSRs were not accessed in the last sched time slice.
The bitmap pmu->lazy_release_ctrl is added to track this. kvm_pmu_cleanup()
is called on the first run of vcpu_enter_guest() after the vcpu sched_in,
and its overhead is very limited.

With this optimization, the average latency of the guest NMI handler is
reduced from 99450 ns to 56195 ns (1.76x speedup on CLX-AP with v5.3).
If the host disables the watchdog (echo 0 > /proc/sys/kernel/watchdog), the
minimum latency of the guest NMI handler is sped up by 2994x and the
average by 685x. The run time of a workload with perf attached inside
the guest can be reduced significantly with this optimization.

Please check each commit for more details and share your comments with us.

Thanks,
Like Xu 

---
[1] multiplexing sampling usage: time perf record  -e \
`perf list | grep Hardware | grep event |\
awk '{print $1}' | head -n 10 |tr '\n' ',' | sed 's/,$//' ` ./ftest
[2] one gp counter sampling usage: perf record -e branch-misses ./ftest
[3] one fixed counter sampling usage: perf record -e instructions ./ftest
[4] event count usage: perf stat -e branch-misses ./ftest

Like Xu (3):
  perf/core: Provide a kernel-internal interface to recalibrate event
period
  KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
pmc_reprogram_counter
  KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

 arch/x86/include/asm/kvm_host.h | 10 
 arch/x86/kvm/pmu.c  | 88 -
 arch/x86/kvm/pmu.h  | 15 +-
 arch/x86/kvm/pmu_amd.c  | 14 ++
 arch/x86/kvm/vmx/pmu_intel.c| 27 ++
 arch/x86/kvm/x86.c  |  6 +++
 include/linux/perf_event.h  |  5 ++
 kernel/events/core.c| 28 ---
 8 files changed, 182 insertions(+), 11 deletions(-)

-- 
2.21.0

[PATCH 2/3] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter

2019-09-30 Thread Like Xu

The perf_event_create_kernel_counter() call in pmc_reprogram_counter() is
a high-frequency and heavyweight operation, especially when the host
disables the watchdog (maximum 2100 ns), which leads to an unacceptable
latency of the guest NMI handler and limits the vPMU usage scenarios.

When a vPMC is fully enabled, the legacy reprogram_*_counter() stops and
releases its existing perf_event (if any) every time, EVEN though in most
cases almost the same perf_event is then created and configured again.

For each vPMC, if the requested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its last programmed config AND a new sample period
based on pmc->counter is accepted by the host perf interface, the current
event can be reused as safely as a newly created one. Otherwise, release the
undesirable perf_event and reprogram a new one as usual.
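
The reuse rule described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: struct model_pmc,
model_period_ok() and model_reprogram() are hypothetical stand-ins and not
KVM or perf interfaces. The model merely counts how often a new event would
have had to be created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one vPMC; not the kernel's struct kvm_pmc. */
struct model_pmc {
	bool has_event;            /* stands in for pmc->perf_event != NULL */
	uint64_t programed_config; /* config used when the event was created */
	uint64_t counter;          /* current guest-visible counter value */
	unsigned int creations;    /* number of expensive event creations */
};

/* Assumed stand-in for the host-side check: a zero period is rejected. */
static bool model_period_ok(uint64_t period)
{
	return period != 0;
}

/*
 * Returns true if the existing event was reused, false if a new one
 * had to be created.
 */
bool model_reprogram(struct model_pmc *pmc, uint64_t config, uint64_t mask)
{
	uint64_t period;

	assert(mask != 0);
	period = (0 - pmc->counter) & mask;

	if (pmc->has_event && pmc->programed_config == config &&
	    model_period_ok(period))
		return true;	/* cheap path: pause and resume the same event */

	/* slow path: release the old event (if any) and create a new one */
	pmc->has_event = true;
	pmc->programed_config = config;
	pmc->creations++;
	return false;
}
```

In this model, reprogramming twice with the same config creates one event
and reuses it once; a changed config falls back to the slow path.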

It's lightweight to call pmc_pause_counter (disable event & reset count)
and pmc_resume_counter (recalibrate period & re-enable event) as the guest
expects, instead of releasing and re-creating the event unconditionally.
Rather than using the filterable event->attr or hw.config, a new
'u64 programed_config' field is added to save the last originally
programmed config for each vPMC.
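
As a worked example of the period recalibration: pmc_resume_counter() in
this patch passes (-pmc->counter) & pmc_bitmask(pmc) as the new sample
period, which is the number of increments left before the counter wraps.
The helper name and the 48-bit width used in the examples are assumptions
made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Events remaining until a counter of the given bit width wraps around. */
uint64_t events_until_overflow(uint64_t counter, unsigned int width)
{
	uint64_t mask;

	assert(width > 0 && width <= 64);
	mask = (width == 64) ? UINT64_MAX : ((1ULL << width) - 1);

	/* two's complement negation, truncated to the counter width */
	return (0 - counter) & mask;
}
```

For a 48-bit counter holding 0xfffffffffff0 this gives 16. A counter value
of 0 gives a period of 0, which _perf_event_period() in patch 1/3 rejects
with -EINVAL, so that case falls back to creating a new event.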

Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~94% for a gp sampling event and ~99.9% for a fixed event.
In multiplexing perf sampling mode, the average latency of the guest NMI
handler is reduced from 99450 ns to 56195 ns (1.76x speedup).
If the host disables the watchdog, the minimum latency of the guest NMI
handler is sped up by 2994x (from 18134692 ns to 6057 ns) and the average
by 685x.

Suggested-by: Kan Liang 
Signed-off-by: Like Xu 
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/pmu.c  | 45 +++--
 arch/x86/kvm/pmu.h  | 12 +++--
 arch/x86/kvm/pmu_amd.c  |  1 +
 arch/x86/kvm/vmx/pmu_intel.c|  2 ++
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 23edf56cf577..15f2ebad94f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -458,6 +458,8 @@ struct kvm_pmc {
u64 eventsel;
struct perf_event *perf_event;
struct kvm_vcpu *vcpu;
+   /* the exact requested config for perf_event reusability check */
+   u64 programed_config;
 };
 
 struct kvm_pmu {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 46875bbd0419..74bc5c42b8b5 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 
type,
clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
+static void pmc_pause_counter(struct kvm_pmc *pmc)
+{
+   if (!pmc->perf_event)
+   return;
+
+   pmc->counter = pmc_read_counter(pmc);
+
+   perf_event_disable(pmc->perf_event);
+
+   /* reset count to avoid redundant accumulation */
+   local64_set(&pmc->perf_event->count, 0);
+}
+
+static bool pmc_resume_counter(struct kvm_pmc *pmc)
+{
+   if (!pmc->perf_event)
+   return false;
+
+   /* recalibrate sample period and check if it's accepted by perf core */
+   if (perf_event_period(pmc->perf_event,
+   (-pmc->counter) & pmc_bitmask(pmc)))
+   return false;
+
+   /* reuse perf_event to serve as pmc_reprogram_counter() does*/
+   perf_event_enable(pmc->perf_event);
+   clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+   return true;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
unsigned config, type = PERF_TYPE_RAW;
@@ -154,7 +183,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
pmc->eventsel = eventsel;
 
-   pmc_stop_counter(pmc);
+   pmc_pause_counter(pmc);
 
if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
return;
@@ -193,6 +222,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 
eventsel)
if (type == PERF_TYPE_RAW)
config = eventsel & X86_RAW_EVENT_MASK;
 
+   if (pmc->programed_config == eventsel && pmc_resume_counter(pmc))
+   return;
+
+   pmc_release_perf_event(pmc);
+
+   pmc->programed_config = eventsel;
pmc_reprogram_counter(pmc, type, config,
  !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
  !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
@@ -209,7 +244,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, 
int idx)
struct kvm_pmu_event_filter *filter;
struct kvm *kvm = pmc->vcpu->kvm;
 
-   pmc_stop_counter(pmc);
+   pmc_pause_counter(pmc);
 
if (!en_field || !pmc_is_enabled(pmc))
return;
@@ -224,6 +259,12 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8

[PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

2019-09-30 Thread Like Xu

Currently, a host perf_event is created to emulate a vPMC's functionality.
It's impossible to predict whether a disabled perf_event will be reused.
If such events stay disabled and are not reused for a considerable period
of time, these obsolete perf_events increase host context switch overhead
that could have been avoided.

If the guest doesn't access (set_msr/get_msr/rdpmc) any of the vPMC's MSRs
during an entire vcpu sched time slice, and the vPMC's own enable bit
isn't set, we can predict that the guest has finished using this vPMC.
It is then time to release the non-reused perf_event on the first call of
vcpu_enter_guest() after the vcpu is next scheduled in.

This lazy mechanism delays the event release to the beginning of the next
scheduled time slice if the vPMC's MSRs aren't accessed during this time
slice. If the guest comes back to use this vPMC in the next time slice, a
new perf event is re-created via perf_event_create_kernel_counter() as usual.
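
The lazy release policy can be sketched as a userspace model. Everything
here (struct model_pmu, model_mark_access(), model_sched_in(),
model_cleanup() and the counter count of 8) is hypothetical and only mirrors
the description above; it is not the code added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_PMC 8	/* assumed number of counters in this model */

struct model_pmu {
	uint64_t lazy_release_ctrl;	/* bit i set: vPMC i was accessed */
	bool enable_cleanup;		/* set when the vcpu is scheduled in */
	bool has_event[MODEL_NR_PMC];	/* vPMC i currently owns an event */
	bool enabled[MODEL_NR_PMC];	/* guest enable bit of vPMC i */
};

/* Called for every modelled MSR access or rdpmc on vPMC idx. */
void model_mark_access(struct model_pmu *pmu, unsigned int idx)
{
	assert(idx < MODEL_NR_PMC);
	pmu->lazy_release_ctrl |= 1ULL << idx;
}

void model_sched_in(struct model_pmu *pmu)
{
	pmu->enable_cleanup = true;
}

/*
 * Called on every guest entry, but only does work on the first entry
 * after a sched-in. Returns the number of events released.
 */
unsigned int model_cleanup(struct model_pmu *pmu)
{
	unsigned int i, released = 0;

	if (!pmu->enable_cleanup)
		return 0;
	pmu->enable_cleanup = false;

	for (i = 0; i < MODEL_NR_PMC; i++) {
		bool accessed = (pmu->lazy_release_ctrl >> i) & 1;

		if (!accessed && pmu->has_event[i] && !pmu->enabled[i]) {
			pmu->has_event[i] = false;
			released++;
		}
	}

	/* start tracking accesses for the new time slice */
	pmu->lazy_release_ctrl = 0;
	return released;
}
```

A counter that was touched, or whose enable bit is still set, keeps its
event for one more time slice; an untouched, disabled counter loses it.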

Suggested-by: Wei W Wang 
Signed-off-by: Like Xu 
---
 arch/x86/include/asm/kvm_host.h |  8 ++
 arch/x86/kvm/pmu.c  | 43 +
 arch/x86/kvm/pmu.h  |  3 +++
 arch/x86/kvm/pmu_amd.c  | 13 ++
 arch/x86/kvm/vmx/pmu_intel.c| 25 +++
 arch/x86/kvm/x86.c  |  6 +
 6 files changed, 98 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 15f2ebad94f9..6723c04c8dc6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -479,6 +479,14 @@ struct kvm_pmu {
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
struct irq_work irq_work;
u64 reprogram_pmi;
+
+   /* for each PMC bit set, do not release its perf_event (if any) */
+   u64 lazy_release_ctrl;
+
+   union {
+   u8 event_count :7; /* the total number of created perf_events */
+   bool enable_cleanup :1;
+   } state;
 };
 
 struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 74bc5c42b8b5..1b3cec38b1a1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -137,6 +137,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 
type,
}
 
pmc->perf_event = event;
+   pmc_to_pmu(pmc)->state.event_count++;
clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
@@ -368,6 +369,7 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 
*data)
if (!pmc)
return 1;
 
+   __set_bit(pmc->idx, (unsigned long *)&pmu->lazy_release_ctrl);
*data = pmc_read_counter(pmc) & mask;
return 0;
 }
@@ -385,11 +387,13 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
+   kvm_x86_ops->pmu_ops->update_lazy_release_ctrl(vcpu, msr);
return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data);
 }
 
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+   kvm_x86_ops->pmu_ops->update_lazy_release_ctrl(vcpu, msr_info->index);
return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
 }
 
@@ -417,9 +421,48 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
memset(pmu, 0, sizeof(*pmu));
kvm_x86_ops->pmu_ops->init(vcpu);
init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
+   pmu->lazy_release_ctrl = 0;
+   pmu->state.event_count = 0;
+   pmu->state.enable_cleanup = false;
kvm_pmu_refresh(vcpu);
 }
 
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+{
+   struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+   if (pmc_is_fixed(pmc))
+   return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+   pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
+
+   return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   struct kvm_pmc *pmc = NULL;
+   u64 bitmask = ~pmu->lazy_release_ctrl;
+   int i;
+
+   if (!unlikely(pmu->state.enable_cleanup))
+   return;
+
+   /* do cleanup before the first time of running vcpu after sched_in */
+   pmu->state.enable_cleanup = false;
+
+   /* cleanup unmarked vPMC in the last sched time slice */
+   for_each_set_bit(i, (unsigned long *)&bitmask, X86_PMC_IDX_MAX) {
+   pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+   if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
+   pmc_stop_counter(pmc);
+   }
+
+   /* reset vPMC lazy-release states for this sched time slice */
+   pmu->lazy_release_ctrl = 0;
+}
+
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 {
kvm_pmu_reset(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 3a95952702d2..c681738ba59c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -34,6 +34,7 @@ struct kvm_pmu_ops {

[PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

2019-09-30 Thread Like Xu

Currently, perf_event_period() is used by user tools via ioctl. Export
perf_event_period() for kernel users (such as KVM) who may recalibrate the
event period for their assigned counters according to their requirements.

The perf_event_period() is an external accessor, just like the
perf_event_{en,dis}able() and should thus use perf_event_ctx_lock().
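
The split between an inner helper that expects the context lock to be held
and a public accessor that takes the lock itself can be sketched in
userspace as follows. All names (model_ctx, model_event,
model_event_period_locked(), model_event_period()) are hypothetical, and a
plain depth counter stands in for the real perf context lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct model_ctx {
	int lock_depth;		/* stand-in for the context lock */
};

struct model_event {
	struct model_ctx *ctx;
	bool sampling;		/* stand-in for is_sampling_event() */
	uint64_t period;
};

/* Inner helper: the caller must already hold the context lock. */
static int model_event_period_locked(struct model_event *ev, uint64_t value)
{
	assert(ev->ctx->lock_depth == 1);

	if (!ev->sampling)
		return -EINVAL;
	if (!value)
		return -EINVAL;

	ev->period = value;
	return 0;
}

/* External accessor: takes and drops the lock around the inner helper. */
int model_event_period(struct model_event *ev, uint64_t value)
{
	int ret;

	ev->ctx->lock_depth++;
	ret = model_event_period_locked(ev, value);
	ev->ctx->lock_depth--;

	return ret;
}
```

A path that already holds the lock (the ioctl path in this patch) would
call the inner helper directly, while an outside caller such as KVM would
use the locked accessor.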

Suggested-by: Kan Liang 
Reviewed-by: Kan Liang 
Signed-off-by: Like Xu 
---
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 28 +---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61448c19a132..83db24173e4d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1336,6 +1336,7 @@ extern void perf_event_disable_local(struct perf_event 
*event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
+extern int perf_event_period(struct perf_event *event, u64 value);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1415,6 +1416,10 @@ static inline void perf_event_disable(struct perf_event 
*event)  { }
 static inline int __perf_event_disable(void *info) { 
return -1; }
 static inline void perf_event_task_tick(void)  { }
 static inline int perf_event_release_kernel(struct perf_event *event)  { 
return 0; }
+static inline int perf_event_period(struct perf_event *event, u64 value)
+{
+   return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..de740d20b028 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5094,16 +5094,11 @@ static int perf_event_check_period(struct perf_event 
*event, u64 value)
return event->pmu->check_period(event, value);
 }
 
-static int perf_event_period(struct perf_event *event, u64 __user *arg)
+static int _perf_event_period(struct perf_event *event, u64 value)
 {
-   u64 value;
-
if (!is_sampling_event(event))
return -EINVAL;
 
-   if (copy_from_user(&value, arg, sizeof(value)))
-   return -EFAULT;
-
if (!value)
return -EINVAL;
 
@@ -5121,6 +5116,19 @@ static int perf_event_period(struct perf_event *event, 
u64 __user *arg)
return 0;
 }
 
+int perf_event_period(struct perf_event *event, u64 value)
+{
+   struct perf_event_context *ctx;
+   int ret;
+
+   ctx = perf_event_ctx_lock(event);
+   ret = _perf_event_period(event, value);
+   perf_event_ctx_unlock(event, ctx);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(perf_event_period);
+
 static const struct file_operations perf_fops;
 
 static inline int perf_fget_light(int fd, struct fd *p)
@@ -5164,8 +5172,14 @@ static long _perf_ioctl(struct perf_event *event, 
unsigned int cmd, unsigned lon
return _perf_event_refresh(event, arg);
 
case PERF_EVENT_IOC_PERIOD:
-   return perf_event_period(event, (u64 __user *)arg);
+   {
+   u64 value;
+
+   if (get_user(value, (u64 __user *)))
+   return -EFAULT;
 
+   return _perf_event_period(event, value);
+   }
case PERF_EVENT_IOC_ID:
{
u64 id = primary_event_id(event);
-- 
2.21.0

Re: [PATCH v2] i2c: i2c-mt65xx: fix NULL ptr dereference

2019-09-30 Thread Hsin-Yi Wang

On Mon, Sep 30, 2019 at 2:20 PM Cengiz Can  wrote:
>
> On 2019-09-30 18:28, Fabien Parent wrote:
> > Fixes: abf4923e97c3 ("i2c: mediatek: disable zero-length transfers for
> > mt8183")
> > Signed-off-by: Fabien Parent 
>
> Reviewed-by: Cengiz Can 
Reviewed-by: Hsin-Yi Wang 

Thanks!

[PATCH] scatterlist: Validate page before calling PageSlab()

2019-09-30 Thread Alan Mikhak

From: Alan Mikhak 

Modify sg_miter_stop() to validate the page pointer
before calling PageSlab(). This check prevents a crash
that will occur if PageSlab() gets called with a page
pointer that is not backed by page struct.

A virtual address obtained from ioremap() for a physical
address in PCI address space can be assigned to a
scatterlist segment using the public scatterlist API
as in the following example:

my_sg_set_page(struct scatterlist *sg,
   const void __iomem *ioaddr,
   size_t iosize)
{
sg_set_page(sg,
virt_to_page(ioaddr),
(unsigned int)iosize,
offset_in_page(ioaddr));
sg_init_marker(sg, 1);
}

If the virtual address obtained from ioremap() is not
backed by a page struct, virt_to_page() returns an
invalid page pointer. However, sg_copy_buffer() can
correctly recover the original virtual address. Such
addresses can successfully be assigned to scatterlist
segments to transfer data across the PCI bus with
sg_copy_buffer() if it were not for the crash in
PageSlab() when called by sg_miter_stop().

Signed-off-by: Alan Mikhak 
---
 lib/scatterlist.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c2cf2c311b7d..f5c61cad40ba 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -807,6 +807,7 @@ void sg_miter_stop(struct sg_mapping_iter *miter)
miter->__remaining -= miter->consumed;
 
if ((miter->__flags & SG_MITER_TO_SG) &&
+   pfn_valid(page_to_pfn(miter->page)) &&
!PageSlab(miter->page))
flush_kernel_dcache_page(miter->page);
 
-- 
2.7.4

Re: [PATCH] PCI: Enhance the ACS quirk for Cavium devices

2019-09-30 Thread Jayachandran Chandrasekharan Nair

On Mon, Sep 30, 2019 at 03:34:10PM -0500, Bjorn Helgaas wrote:
> [+cc Vadim, Manish]

Manish and Vadim are no longer with Cavium, adding Robert for
ThunderX1 and Sunil for Cavium networking processors.

> On Thu, Sep 19, 2019 at 02:43:34AM +, George Cherian wrote:
> > Enhance the ACS quirk for Cavium Processors. Add the root port
> > vendor ID's in an array and use the same in match function.
> > For newer devices add the vendor ID's in the array so that the
> > match function is simpler.
> > 
> > Signed-off-by: George Cherian 
> > ---
> >  drivers/pci/quirks.c | 28 +++-
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 44c4ae1abd00..64deeaddd51c 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4241,17 +4241,27 @@ static int pci_quirk_amd_sb_acs(struct pci_dev 
> > *dev, u16 acs_flags)
> >  #endif
> >  }
> >  
> > +static const u16 pci_quirk_cavium_acs_ids[] = {
> > +   /* CN88xx family of devices */
> > +   0xa180, 0xa170,
> > +   /* CN99xx family of devices */
> > +   0xaf84,
> > +   /* CN11xxx family of devices */
> > +   0xb884,
> > +};
> > +
> >  static bool pci_quirk_cavium_acs_match(struct pci_dev *dev)
> >  {
> > -   /*
> > -* Effectively selects all downstream ports for whole ThunderX 1
> > -* family by 0xf800 mask (which represents 8 SoCs), while the lower
> > -* bits of device ID are used to indicate which subdevice is used
> > -* within the SoC.
> > -*/
> > -   return (pci_is_pcie(dev) &&
> > -   (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) &&
> > -   ((dev->device & 0xf800) == 0xa000));
> > +   int i;
> > +
> > +   if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> > +   return false;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(pci_quirk_cavium_acs_ids); i++)
> > +   if (pci_quirk_cavium_acs_ids[i] == dev->device)
> 
> I'm a little skeptical of this because the previous test:
> 
>   (dev->device & 0xf800) == 0xa000
> 
> could match *many* devices, but of those, the new code only matches two
> (0xa180, 0xa170).
> 
> And the comment says the new code matches the CN99xx and CN11xxx
> *families*, but it only matches a single device ID for each, which
> makes me think there may be more devices to come.
> 
> Maybe this is all what you want, but please confirm.

There are only a few device IDs for root ports, so just listing
them out like this may be better. The earlier match covered a lot of
ThunderX1 devices, but did not really match the ThunderX2 root ports.
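
To make the difference concrete, here is a small standalone comparison of
the old mask test and the new table lookup, using the device IDs from the
quoted patch. The function names are made up for this sketch, and 0xa001 is
just an arbitrary ID inside the old mask range that the patch does not list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Root port device IDs copied from the quoted patch. */
static const uint16_t cavium_acs_ids[] = {
	0xa180, 0xa170,	/* CN88xx */
	0xaf84,		/* CN99xx */
	0xb884,		/* CN11xxx */
};

/* Old rule: any device ID in the range 0xa000-0xa7ff. */
bool old_mask_match(uint16_t device)
{
	return (device & 0xf800) == 0xa000;
}

/* New rule: exact match against the table. */
bool new_table_match(uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(cavium_acs_ids) / sizeof(cavium_acs_ids[0]); i++)
		if (cavium_acs_ids[i] == device)
			return true;

	return false;
}

void cavium_match_examples(void)
{
	/* CN88xx: matched by both rules */
	assert(old_mask_match(0xa180) && new_table_match(0xa180));
	/* CN99xx: 0xaf84 & 0xf800 == 0xa800, so only the table matches */
	assert(!old_mask_match(0xaf84) && new_table_match(0xaf84));
	/* unlisted ID in the old range: only the mask matches */
	assert(old_mask_match(0xa001) && !new_table_match(0xa001));
}
```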

This looks ok for ThunderX2. Sunil & Robert can comment on other
processor families I hope.
 
> The commit log should be explicit that this adds CN99xx and CN11xxx,
> which previously were not matched.
> 
> This looks like stable material?
> 
> > +   return true;
> > +
> > +   return false;
> >  }
> >  
> >  static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)

JC

Re: [PATCH] uaccess: Add missing __must_check attributes

2019-09-30 Thread Kees Cook

On Mon, Sep 30, 2019 at 12:33:19PM +0200, Arnd Bergmann wrote:
> On Wed, Aug 28, 2019 at 7:38 PM Kees Cook  wrote:
> >
> > The usercopy implementation comments describe that callers of the
> > copy_*_user() family of functions must always have their return values
> > checked. This can be enforced at compile time with __must_check, so add
> > it where needed.
> >
> > Signed-off-by: Kees Cook 
> 
> I can't find any other reports, so I'd point out here that this found what
> looks like a bug in the x86 math-emu code:

Oh interesting!

> arch/x86/math-emu/reg_ld_str.c:88:2: error: ignoring return value of
> function declared with 'warn_unused_result' attribute
> [-Werror,-Wunused-result]
> __copy_from_user(sti_ptr, s, 10);
> ^~~~ ~~
> arch/x86/math-emu/reg_ld_str.c:1129:2: error: ignoring return value of
> function declared with 'warn_unused_result' attribute
> [-Werror,-Wunused-result]
> __copy_from_user(register_base + offset, s, other);
> ^~~~ 
> arch/x86/math-emu/reg_ld_str.c:1131:3: error: ignoring return value of
> function declared with 'warn_unused_result' attribute
> [-Werror,-Wunused-result]
> __copy_from_user(register_base, s + other, offset);
> ^~~~ 

What was the CONFIG for this? I didn't hit these in my build tests.

> Moreover, the same code also ignores the return value from most
> get_user()/put_user()/FPU_get_user()/FPU_put_user() calls,
> which have no warn_unused_result annotation (they are macros,
> but I think something could be done if we want to have that
> annotation to catch some of the other such users).

It would certainly make sense to mark those as __must_check too... now
tracking this here for anyone that wants to take a stab at it:
https://github.com/KSPP/linux/issues/16
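
For anyone picking that up, a minimal userspace sketch of the annotation
follows. The names model_must_check and model_copy() are hypothetical; the
helper only borrows the copy_*_user() convention of returning the number of
bytes that could not be copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang attribute behind the 'warn_unused_result' diagnostics above. */
#define model_must_check __attribute__((warn_unused_result))

/*
 * Copies at most dst_len bytes and returns the number of bytes that
 * were NOT copied (0 on success).
 */
model_must_check
size_t model_copy(void *dst, size_t dst_len, const void *src, size_t n)
{
	size_t done = n < dst_len ? n : dst_len;

	assert(dst && src);
	memcpy(dst, src, done);
	return n - done;
}
```

Calling model_copy() as a bare statement and dropping its result should
trigger -Wunused-result on GCC and Clang, which is the class of warning
quoted from math-emu above.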

-- 
Kees Cook

Re: [PATCH v2 linux-next 4/4] arm64: configs: defconfig: Change CONFIG_REMOTEPROC from m to y

2019-09-30 Thread keerthy





On 10/1/2019 12:16 AM, Olof Johansson wrote:

On Mon, Sep 30, 2019 at 6:49 AM Will Deacon  wrote:


On Fri, Sep 20, 2019 at 01:29:46PM +0530, Keerthy wrote:

Commit 6334150e9a36 ("remoteproc: don't allow modular build")
changes CONFIG_REMOTEPROC to a boolean from a tristate config
option which inhibits all defconfigs marking CONFIG_REMOTEPROC as
a module in compiling the remoteproc and dependent config options.

So fix the defconfig to have CONFIG_REMOTEPROC built in.

Fixes: 6334150e9a36 ("remoteproc: don't allow modular build")
Signed-off-by: Keerthy 
---
  arch/arm64/configs/defconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 8e05c39eab08..c9a867ac32d4 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -723,7 +723,7 @@ CONFIG_TEGRA_IOMMU_SMMU=y
  CONFIG_ARM_SMMU=y
  CONFIG_ARM_SMMU_V3=y
  CONFIG_QCOM_IOMMU=y
-CONFIG_REMOTEPROC=m
+CONFIG_REMOTEPROC=y
  CONFIG_QCOM_Q6V5_MSS=m
  CONFIG_QCOM_Q6V5_PAS=m
  CONFIG_QCOM_SYSMON=m


Acked-by: Will Deacon 

This fixes the following annoying warning from "make defconfig" on arm64:

   arch/arm64/configs/defconfig:726:warning: symbol value 'm' invalid for 
REMOTEPROC

I'm assuming the fix will go via arm-soc, but I can take it otherwise
(please just let me know).


Thanks, I'll pick this up, but I'll squash the 4 one-line changes into
one commit instead of separate patches.


Thanks Olof.




-Olof

[PATCH 02/37] MIPS: Use compact branch for LL/SC loops on MIPSr6+

2019-09-30 Thread Paul Burton

When targeting MIPSr6 or higher make use of a compact branch in LL/SC
loops, preventing the insertion of a delay slot nop that only serves to
waste space.

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/llsc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index 9b19f38562ac..d240a4a2d1c4 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_LLSC_H
 #define __ASM_LLSC_H
 
+#include 
+
 #if _MIPS_SZLONG == 32
 #define SZLONG_LOG 5
 #define SZLONG_MASK 31UL
@@ -32,6 +34,8 @@
  */
 #if R1_LLSC_WAR
 # define __SC_BEQZ "beqzl  "
+#elif MIPS_ISA_REV >= 6
+# define __SC_BEQZ "beqzc  "
 #else
 # define __SC_BEQZ "beqz   "
 #endif
-- 
2.23.0

[PATCH 03/37] MIPS: barrier: Add __SYNC() infrastructure

2019-09-30 Thread Paul Burton

Introduce an asm/sync.h header which provides infrastructure that can be
used to generate sync instructions of various types, and for various
reasons. For example if we need a sync instruction that provides a full
completion barrier but only on systems which have weak memory ordering,
we can generate the appropriate assembly code using:

  __SYNC(full, weak_ordering)

When the kernel is configured to run on systems with weak memory
ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync
instruction. When the kernel is configured to run on systems with strong
memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit
nothing. The caller doesn't need to know which happened - it simply says
what it needs & when, with no concern for checking the kernel
configuration.

There are some scenarios in which we may want to emit code only when we
*didn't* emit a sync instruction. For example, some Loongson3 CPUs
suffer from a bug that requires us to emit a sync instruction prior to
each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In
cases where this bug workaround is enabled, it's wasteful to then have
more generic code emit another sync instruction to provide barriers we
need in general. A __SYNC_ELSE() macro allows for this, providing an
extra argument that contains code to be assembled only in cases where
the sync instruction was not emitted. For example if we have a scenario
in which we generally want to emit a release barrier but for affected
Loongson3 configurations upgrade that to a full completion barrier, we
can do that like so:

  __SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))

The assembly generated by these macros can be used either as inline
assembly or in assembly source files.

Differing types of sync as provided by MIPSr6 are defined, but currently
they all generate a full completion barrier except in kernels configured
for Cavium Octeon systems. There the wmb sync-type is used, and rmb
syncs are omitted, as has been the case since commit 6b07d38aaa52
("MIPS: Octeon: Use optimized memory barrier primitives."). Using
__SYNC() with the wmb or rmb types will abstract away the Octeon
specific behavior and allow us to later clean up asm/barrier.h code that
currently includes a plethora of #ifdef's.
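
The conditional-emission idea can be illustrated with a small preprocessor
model that builds strings instead of real assembly. This is only a sketch:
the MODEL_* names, the fixed configuration values and the "sync <type>"
output format are assumptions for illustration and do not reflect the
actual contents of asm/sync.h.

```c
#include <assert.h>
#include <string.h>

/* Assumed configuration for this model: 1 = selected, 0 = not selected. */
#define MODEL_REASON_always		1
#define MODEL_REASON_weak_ordering	0	/* strongly ordered system */
#define MODEL_REASON_loongson3_war	1	/* workaround enabled */

#define MODEL_CAT_(a, b)	a##b
#define MODEL_CAT(a, b)		MODEL_CAT_(a, b)

/* Pick the instruction if the reason applies, else the alternative. */
#define MODEL_EMIT_1(insn, alt)	insn
#define MODEL_EMIT_0(insn, alt)	alt

#define MODEL_SYNC_ELSE(type, reason, alt)			\
	MODEL_CAT(MODEL_EMIT_, MODEL_REASON_##reason)(		\
		"sync " #type "\n", alt)

#define MODEL_SYNC(type, reason) MODEL_SYNC_ELSE(type, reason, "")

/* Full barrier only on weakly ordered systems: nothing emitted here. */
const char *model_weak_barrier(void)
{
	return MODEL_SYNC(full, weak_ordering);
}

/* Release barrier, upgraded to a full one when the workaround is on. */
const char *model_release_barrier(void)
{
	return MODEL_SYNC_ELSE(full, loongson3_war, MODEL_SYNC(rl, always));
}

void model_sync_examples(void)
{
	assert(strcmp(model_weak_barrier(), "") == 0);
	assert(strcmp(model_release_barrier(), "sync full\n") == 0);
	assert(strcmp(MODEL_SYNC(rl, always), "sync rl\n") == 0);
}
```

With MODEL_REASON_loongson3_war set to 0 instead, model_release_barrier()
would yield the alternative, "sync rl\n", so exactly one barrier is emitted
in either configuration.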

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/barrier.h | 113 +
 arch/mips/include/asm/sync.h| 207 
 arch/mips/kernel/pm-cps.c   |  20 +--
 3 files changed, 219 insertions(+), 121 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 9228f7386220..5ad39bfd3b6d 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -9,116 +9,7 @@
 #define __ASM_BARRIER_H
 
 #include 
-
-/*
- * Sync types defined by the MIPS architecture (document MD00087 table 6.5)
- * These values are used with the sync instruction to perform memory barriers.
- * Types of ordering guarantees available through the SYNC instruction:
- * - Completion Barriers
- * - Ordering Barriers
- * As compared to the completion barrier, the ordering barrier is a
- * lighter-weight operation as it does not require the specified instructions
- * before the SYNC to be already completed. Instead it only requires that those
- * specified instructions which are subsequent to the SYNC in the instruction
- * stream are never re-ordered for processing ahead of the specified
- * instructions which are before the SYNC in the instruction stream.
- * This potentially reduces how many cycles the barrier instruction must stall
- * before it completes.
- * Implementations that do not use any of the non-zero values of stype to 
define
- * different barriers, such as ordering barriers, must make those stype values
- * act the same as stype zero.
- */
-
-/*
- * Completion barriers:
- * - Every synchronizable specified memory instruction (loads or stores or 
both)
- *   that occurs in the instruction stream before the SYNC instruction must be
- *   already globally performed before any synchronizable specified memory
- *   instructions that occur after the SYNC are allowed to be performed, with
- *   respect to any other processor or coherent I/O module.
- *
- * - The barrier does not guarantee the order in which instruction fetches are
- *   performed.
- *
- * - A stype value of zero will always be defined such that it performs the 
most
- *   complete set of synchronization operations that are defined.This means
- *   stype zero always does a completion barrier that affects both loads and
- *   stores preceding the SYNC instruction and both loads and stores that are
- *   subsequent to the SYNC instruction. Non-zero values of stype may be 
defined
- *   by the architecture or specific implementations to perform synchronization
- *   behaviors that are less complete than that of stype zero. If an
- *   implementation does not use one of these

[PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher

2019-09-30 Thread Paul Burton

set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.

That is, rather than the following:

  li   t0, -1
  ll   t1, 0(t2)
  ins  t1, t0, 4, 1
  sc   t1, 0(t2)

We can have the simpler:

  ll   t1, 0(t2)
  ori  t1, t1, 0x10
  sc   t1, 0(t2)

The or path already allows immediates to be used, so simply restricting
the ins path to bits that don't fit in immediates is sufficient to take
advantage of this.
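
The cut-off at bit 16 follows from ori taking a 16-bit zero-extended
immediate. A tiny check of that boundary, with a helper name made up for
this sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* True if the mask for this bit can be encoded as an ori immediate. */
bool bit_fits_ori_immediate(unsigned int bit)
{
	assert(bit < 64);
	return (1ULL << bit) <= 0xffffULL;
}
```

Bit 4 gives the mask 0x10 used in the example above and fits; bit 15
(0x8000) is the last one that does, and bit 16 (0x10000) needs the ins path.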

Signed-off-by: Paul Burton 
---

 arch/mips/include/asm/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index d3f3f37ca0b1..3ea4f172ac08 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -77,7 +77,7 @@ static inline void set_bit(unsigned long nr, volatile 
unsigned long *addr)
}
 
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-   if (__builtin_constant_p(bit)) {
+   if (__builtin_constant_p(bit) && (bit >= 16)) {
loongson_llsc_mb();
do {
__asm__ __volatile__(
-- 
2.23.0

[PATCH 22/37] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit

2019-09-30 Thread Paul Burton

The logical operations or & xor used in the test_and_set_bit_lock(),
test_and_clear_bit() & test_and_change_bit() functions currently force
the value 1<
---

 arch/mips/include/asm/bitops.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 34d6fe3f18d0..0b0ce0adce8f 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -261,7 +261,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+m" (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -274,7 +274,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -332,7 +332,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
loongson_llsc_mb();
@@ -358,7 +358,7 @@ static inline int test_and_clear_bit(unsigned long nr,
"   " __SC  "%2, %1 \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
@@ -400,7 +400,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   and %2, %0, %3  \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" (res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} else {
loongson_llsc_mb();
@@ -413,7 +413,7 @@ static inline int test_and_change_bit(unsigned long nr,
"   " __SC  "\t%2, %1   \n"
"   .setpop \n"
: "=" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=" 
(res)
-   : "r" (1UL << bit)
+   : "ir" (1UL << bit)
: __LLSC_CLOBBER);
} while (unlikely(!res));
 
-- 
2.23.0
