date:20171113

Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra

On Tue, Nov 14, 2017 at 02:28:56PM +0800, Wanpeng Li wrote:
> >  - have the TLB invalidate handler do something like:
> >
> >       state = READ_ONCE(src->preempted);
> >       if (!(state & KVM_VCPU_IPI_PENDING))
> >               return;
> >
> >       local_flush_tlb();
> >
> >       do {
> >       } while (!try_cmpxchg(&src->preempted, &state,
> >                             state & ~KVM_VCPU_IPI_PENDING));
> 
> There are a lot of cases handled by flush_tlb_func_remote() ->
> flush_tlb_func_common(), so I'm afraid of leaving a hole.

Sure, just fix the handler to do what must be done. The above was merely
a sketch. The important part is to only clear IPI_PENDING after we do
the actual flushing, since the caller is waiting for that bit.

So flush_tlb_others() has two callers:

 - arch_tlbbatch_flush() -- info::end = TLB_FLUSH_ALL
 - flush_tlb_mm_range()  -- info::mm = mm

native_flush_tlb_others() does smp_call_function_many(
.func=flush_tlb_func_remote) which in turn calls flush_tlb_func_common(
.local=false, .reason=TLB_REMOTE_SHOOTDOWN).

So something like:

	struct flush_tlb_info info = {
		.start = 0,
		.end = TLB_FLUSH_ALL,
	};

	flush_tlb_func_common(&info, false, TLB_REMOTE_SHOOTDOWN);

should work for the new IPI. It 'upgrades' all ranges to full flushes,
but meh.
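
The ordering point made above (flush first, clear IPI_PENDING afterwards, because the caller waits on that bit) can be illustrated with a small userspace sketch. It uses C11 atomics in place of the kernel's READ_ONCE()/try_cmpxchg(). The flag values, fake_flush_tlb() and handle_flush_ipi() are hypothetical names made up for this illustration and are not taken from the patch series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real KVM_VCPU_* flags come from the patch series. */
#define VCPU_PREEMPTED   (1u << 0)
#define VCPU_IPI_PENDING (1u << 1)

/* Counter standing in for the actual TLB flush. */
static unsigned int flush_count;

static void fake_flush_tlb(void)
{
	flush_count++;
}

/*
 * Handler side: do nothing unless a flush was requested, perform the
 * flush, and only then clear the pending bit. Returns true if a flush
 * was performed.
 */
static bool handle_flush_ipi(_Atomic unsigned int *preempted)
{
	unsigned int state;

	assert(preempted != NULL);

	state = atomic_load(preempted);
	if (!(state & VCPU_IPI_PENDING))
		return false;

	fake_flush_tlb();

	/* On failure the compare-exchange reloads 'state', so just retry. */
	while (!atomic_compare_exchange_weak(preempted, &state,
					     state & ~VCPU_IPI_PENDING))
		;

	return true;
}
```

Because the pending bit is cleared with a compare-exchange loop rather than a plain store, other bits that change concurrently in the same word are preserved.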

Re: [PATCH 0/2] backlight: pwm_bl: prevent backlight flicker when switching PWM on

2017-11-13 Thread Lothar Waßmann

Hi,

On Fri, 10 Nov 2017 12:22:15 +0100 Enric Balletbo i Serra wrote:
> Hi all,
> 
> On 08/11/17 11:48, Daniel Thompson wrote:
> > On 26/10/17 13:49, Lothar Waßmann wrote:
> >> These patches implement some measures to prevent backlight flicker
> >> when the backlight is being switched on for a display with an active
> >> low brightness control pin.
> >> GIT: [PATCH 1/2] backlight: pwm_bl: Enable PWM before switching regulator
> >> GIT: [PATCH 2/2] backlight: pwm_bl: add configurable delay between
> > 
> > Other than hoisting the pwm_enable() even earlier in the setup sequence this
> > patchset seems to have a significant overlap with Enric's recent posting:
> > 
> >   https://lkml.org/lkml/2017/7/21/211
> > 
> > Any chance of a shared view on this, especially on the DT bindings?
> > 
> 
> The DT bindings were discussed some time ago for the patch series I sent,
> though there isn't a final ack from the DT maintainer.
> 
> Lothar, the series I sent has been reviewed and acked, can you test if the
> series fixes the problem for you too?
> 
I'll try to test it within the next couple of days and will report back.


Lothar Waßmann

Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman

On Mon, Nov 13, 2017 at 02:29:09PM -0800, Guenter Roeck wrote:
> On Mon, Nov 13, 2017 at 01:56:21PM +0100, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.13.13 release.
> > There are 33 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:55:46 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 145 pass: 145 fail: 0
> Qemu test results:
>   total: 123 pass: 123 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall


On 14.11.2017 at 08:41, Greg Kroah-Hartman wrote:
> On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:
> > ahm it compiles well. but
> >
> > [   24.838120] Unable to handle kernel NULL pointer dereference at virtual address 0055
> > [   24.846193] pgd = c0004000
> > [   24.848893] [0055] *pgd=
> > [   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
> > [   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 compat
> > [   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
> > [   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
> > [   24.892921] task: ef029ac0 task.stack: ef05a000
> > [   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
> > [   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
> > [   24.907869] pc : []    lr : []    psr: 6153
> > [   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
> > [   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
> > [   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
> > [   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
> > [   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
> > [   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
> > [   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
> > [   24.956477] Stack: (0xef05bb58 to 0xef05c000)
> >
> > will dig into the code to find the reason
>
> Can you run 'git bisect' or if you use quilt, do a manual bisect to find
> the offending patch?

Already done:

netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"

This one caused the crash. If I revert it, it's working again.


Sebastian


--
Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Stubenwaldallee 21a, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottsch...@dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565

RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Ran Wang

Hi Michal,
> -----Original Message-----
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Tuesday, November 14, 2017 3:07 PM
> To: Ran Wang 
> Cc: linux...@kvack.org; Michael Ellerman ; Vlastimil
> Babka ; Andrew Morton ;
> KAMEZAWA Hiroyuki ; Reza Arbab
> ; Yasuaki Ishimatsu ;
> qiuxi...@huawei.com; Igor Mammedov ; Vitaly
> Kuznetsov ; LKML ;
> Leo Li ; Xiaobo Xie 
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Tue 14-11-17 06:10:00, Ran Wang wrote:
> [...]
> > > > > Dropping these checks causes the DWC3 USB controller to fail on
> > > > > initialization with Layerscape processors (such as LS1043A), as below:
> > > > >
> > > > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
> > > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > > >
> > > > And I notice that someone also reported to you that DWC2 got
> > > > affected recently, so do you have the solution now?
> > >
> > > Yes. It should be in linux-next. Have a look at the following email
> > > thread:
> > >
> > > > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D&reserved=0
> 
> I really have no idea where the above link came from because my email had
> a reference to
> http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz
> Has your email client modified the original email?
> 
> > Thanks for the info. Although I failed to open the link you shared, I
> > got the patch from my colleague and it fixed the issue on my side. Just
> > letting you know, thanks.
> 
> Thanks for your testing anyway. Can I assume your Tested-by?
Yes, please.

BR
Ran
> --
> Michal Hocko
> SUSE Labs

Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-13 Thread Ingo Molnar


* Quan Xu  wrote:

> 
> 
> On 2017/11/13 23:08, Ingo Molnar wrote:
> > * Quan Xu  wrote:
> > 
> > > From: Quan Xu 
> > > 
> > > To reduce the cost of poll, we introduce three sysctl to control the
> > > poll time when running as a virtual machine with paravirt.
> > > 
> > > Signed-off-by: Yang Zhang 
> > > Signed-off-by: Quan Xu 
> > > ---
> > >   Documentation/sysctl/kernel.txt |   35 
> > > +++
> > >   arch/x86/kernel/paravirt.c  |4 
> > >   include/linux/kernel.h  |6 ++
> > >   kernel/sysctl.c |   34 
> > > ++
> > >   4 files changed, 79 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/Documentation/sysctl/kernel.txt 
> > > b/Documentation/sysctl/kernel.txt
> > > index 694968c..30c25fb 100644
> > > --- a/Documentation/sysctl/kernel.txt
> > > +++ b/Documentation/sysctl/kernel.txt
> > > @@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this 
> > > one.
> > >   ==
> > > +paravirt_poll_grow: (X86 only)
> > > +
> > > +Multiplied value to increase the poll time. This is expected to take
> > > +effect only when running as a virtual machine with CONFIG_PARAVIRT
> > > > +enabled. This can't bring any benefit on bare metal even with
> > > +CONFIG_PARAVIRT enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_shrink: (X86 only)
> > > +
> > > +Divided value to reduce the poll time. This is expected to take effect
> > > +only when running as a virtual machine with CONFIG_PARAVIRT enabled.
> > > > +This can't bring any benefit on bare metal even with CONFIG_PARAVIRT
> > > +enabled.
> > > +
> > > +By default this value is 2. Possible values to set are in range {2..16}.
> > > +
> > > +==
> > > +
> > > +paravirt_poll_threshold_ns: (X86 only)
> > > +
> > > +Controls the maximum poll time before entering real idle path. This is
> > > +expected to take effect only when running as a virtual machine with
> > > > +CONFIG_PARAVIRT enabled. This can't bring any benefit on bare metal
> > > +even with CONFIG_PARAVIRT enabled.
> > > +
> > > > +By default, this value is 0, which means not to poll. Possible values to set
> > > +are in range {0..50}. Change the value to non-zero if running
> > > +latency-bound workloads in a virtual machine.
> > I absolutely hate it how this hybrid idle loop polling mechanism is not
> > self-tuning!
> 
> Ingo, actually it is self-tuning..

Then why the hell does it touch the syscall ABI?

> Could I leave only the paravirt_poll_threshold_ns parameter (the maximum
> poll time), which is similar to the "adaptive halt-polling" Wanpeng
> mentioned? Then the user can turn it off, or find an appropriate threshold
> for some odd scenario.

That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to 
userspace: NAK.

Thanks,

Ingo

Re: [PATCH 3.18 00/28] 3.18.81-stable review

2017-11-13 Thread Greg Kroah-Hartman

On Mon, Nov 13, 2017 at 02:50:22PM -0700, Shuah Khan wrote:
> On 11/13/2017 05:54 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.18.81 release.
> > There are 28 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Nov 15 12:53:41 UTC 2017.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.81-rc1.gz
> > or in the git tree and branch at:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH 4.13 00/33] 4.13.13-stable review

2017-11-13 Thread Greg Kroah-Hartman

On Mon, Nov 13, 2017 at 02:02:12PM -0800, kernelci.org bot wrote:
> stable-rc/linux-4.13.y boot: 127 boots: 10 failed, 116 passed with 1 conflict 
> (v4.13.12-34-g109b28ca1340)
> 
> Full Boot Summary: 
> https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> Full Build Summary: 
> https://kernelci.org/build/stable-rc/branch/linux-4.13.y/kernel/v4.13.12-34-g109b28ca1340/
> 
> Tree: stable-rc
> Branch: linux-4.13.y
> Git Describe: v4.13.12-34-g109b28ca1340
> Git Commit: 109b28ca1340961002d4bede168f07823451b8e4
> Git URL: 
> http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Tested: 51 unique boards, 17 SoC families, 18 builds out of 188
> 
> Boot Regressions Detected:
> 
> arm:
> 
> at91_dt_defconfig:
> at91rm9200ek_rootfs:nfs:
> lab-free-electrons: new failure (last pass: v4.13.12)
> 
> exynos_defconfig:
> exynos4412-odroidx2_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)
> exynos5250-snow_rootfs:nfs:
> lab-collabora: failing since 4 days (last pass: 
> v4.13.11-37-g2295c8345797 - first fail: v4.13.12)

Thanks for these "failing since..." markings, that makes me feel better
that I didn't break anything on my end :)

greg k-h

Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Greg Kroah-Hartman

On Tue, Nov 14, 2017 at 07:48:47AM +0100, Sebastian Gottschall wrote:
> ahm it compiles well. but
> 
> [   24.838120] Unable to handle kernel NULL pointer dereference at virtual
> address 0055
> [   24.846193] pgd = c0004000
> [   24.848893] [0055] *pgd=
> [   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
> [   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd
> ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk
> gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common
> ath9k_hw ath mac80211 cfg80211 compat
> [   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
> [   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
> [   24.892921] task: ef029ac0 task.stack: ef05a000
> [   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
> [   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
> [   24.907869] pc : []    lr : []    psr: 6153
> [   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
> [   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
> [   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
> [   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
> [   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment
> user
> [   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
> [   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
> [   24.956477] Stack: (0xef05bb58 to 0xef05c000)
> 
> 
> will dig into the code to find the reason

Can you run 'git bisect' or if you use quilt, do a manual bisect to find
the offending patch?

thanks,

greg k-h

Re: [PATCH 0/7] net: core: devname allocation cleanups

2017-11-13 Thread David Miller

From: Rasmus Villemoes 
Date: Mon, 13 Nov 2017 00:15:03 +0100

> It's somewhat confusing to have both dev_alloc_name and
> dev_get_valid_name. I can't see why the former is less strict than the
> latter, so make them (or rather dev_alloc_name_ns and
> dev_get_valid_name) equivalent, hardening dev_alloc_name() a little.
> 
> Obvious follow-up patches would be to only export one function, and
> make dev_alloc_name a static inline wrapper for that (whichever name
> is chosen for the exported interface). But maybe there is a good
> reason the two exported interfaces do different checking, so I'll
> refrain from including the trivial but tree-wide renaming in this
> series.

Series applied, thanks.

Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Peter Zijlstra

On Tue, Nov 14, 2017 at 02:10:16PM +0800, Wanpeng Li wrote:
> 2017-11-13 21:02 GMT+08:00 Peter Zijlstra :
> > That can be written like:
> >
> >       do {
> >               if (state & KVM_VCPU_PREEMPTED)
> >                       new_state = state | KVM_VCPU_SHOULD_FLUSH;
> >               else
> >                       new_state = state | KVM_VCPU_IPI_PENDING;
> >       } while (!try_cmpxchg(&src->preempted, &state, new_state));
> >
> > if (new_state & KVM_VCPU_IPI_PENDING)
> 
> Should be new_state & KVM_VCPU_SHOULD_FLUSH I think.

Quite so indeed.
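
For readers following the loop quoted above, here is a minimal userspace sketch of the requesting side, written with C11 atomics instead of the kernel's try_cmpxchg(). The flag values and the function name request_remote_flush() are hypothetical. What the caller does with the result is not shown in this thread, so the sketch only reports which path was taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real KVM_VCPU_* flags come from the patch series. */
#define VCPU_PREEMPTED    (1u << 0)
#define VCPU_SHOULD_FLUSH (1u << 1)
#define VCPU_IPI_PENDING  (1u << 2)

/*
 * Atomically mark the target vCPU: if it is preempted, ask it to flush
 * later; otherwise record that an IPI is pending. Returns true when the
 * deferred-flush path (SHOULD_FLUSH) was chosen.
 */
static bool request_remote_flush(_Atomic unsigned int *preempted)
{
	unsigned int state, new_state;

	assert(preempted != NULL);

	state = atomic_load(preempted);
	do {
		if (state & VCPU_PREEMPTED)
			new_state = state | VCPU_SHOULD_FLUSH;
		else
			new_state = state | VCPU_IPI_PENDING;
		/* A failed compare-exchange refreshes 'state' for the retry. */
	} while (!atomic_compare_exchange_weak(preempted, &state, new_state));

	return (new_state & VCPU_SHOULD_FLUSH) != 0;
}
```

The decision and the update happen in one atomic step, so the target cannot change between "preempted" and "running" unnoticed while the flag is being set.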

Re: [RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ingo Molnar


* Ricardo Neri  wrote:

> +const char * const umip_insns[5] = {
> + [UMIP_INST_SGDT] = "sgdt",
> + [UMIP_INST_SIDT] = "sidt",
> + [UMIP_INST_SMSW] = "smsw",
> + [UMIP_INST_SLDT] = "sldt",
> + [UMIP_INST_STR] = "str",
> +};

Sigh ...

> +/*
> + * If you change these strings, ensure that buffers using them are sufficiently
> + * large.
> + */
> +static const char umip_warn_use[] = "cannot be used by applications.";
> +static const char umip_warn_emu[] = "For now, expensive software emulation returns result.";

Please use the string literals directly, don't add an extra obfuscation layer.

Plus:

> + unsigned char buf[MAX_INSN_SIZE], warn[128];

> + snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
> +  umip_warn_use);

This is incredibly fragile against future buffer overflows, and warning about it
in comments does not make it less fragile!

Thanks,

Ingo
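
To make the fragility concern concrete, here is a small userspace sketch, not the fix requested in the review. It keeps the string literal directly in the format string and checks the return value of snprintf(), so truncation is detected instead of depending on a comment about buffer sizes. The helper name format_umip_warning() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format "<insn> cannot be used by applications." into buf.
 * Returns true only if the whole message fit. snprintf() reports the
 * length the full output would have had, so a result >= size means the
 * text was truncated.
 */
static bool format_umip_warning(char *buf, size_t size, const char *insn)
{
	int n;

	assert(buf != NULL && size > 0 && insn != NULL);

	n = snprintf(buf, size, "%s cannot be used by applications.", insn);

	return n >= 0 && (size_t)n < size;
}
```

With this shape, a later change that lengthens the message shows up as a failed check rather than as a silently cut-off warning.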

RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei

> > +   if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
> > +   !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
> > +   !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
> > +   _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
> 
> Also, you are not checking anywhere if the SUPPRESS_PIP controls are 
> available.  This is probably the best place.

SUPPRESS_PIP (it should be "CONCEAL"; I will fix it) is used to control
processor trace packets.
I think we should clear it in SYSTEM mode (for example, PIPs are generated
on VM exit, with NonRoot=0; on VM exit to SMM, VMCS packets are additionally
generated). Why do we need to check this here?

> 
> > +   _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +   _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
> 
> These two are not needed; disabling SECONDARY_EXEC_PT_USE_GPA is enough.
> The tracing mode will revert to PT_SYSTEM, which does not use the load/clear 
> RTIT_CTL controls.
> 

The state of the *_RTIT_CTL controls must match SECONDARY_EXEC_PT_USE_GPA;
otherwise VM entry fails.
(architecture-instruction-set-extensions-programming-reference  5.2.3)

[f2fs-dev] [PATCH RESEND] f2fs: fix concurrent problem for updating free bitmap

2017-11-13 Thread LiFan

alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread A                                Thread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
                                        - alloc_nid_failed
                                         - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
a free nid whose bit in the free bitmap is cleared.
This patch updates the bitmap under the same nid_list_lock in add_free_nid.

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 82 ++
 1 file changed, 42 insertions(+), 40 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b965a53..0a217d2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1811,8 +1811,33 @@ static void __move_free_nid(struct f2fs_sb_info *sbi, struct free_nid *i,
}
 }
 
+static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
+   bool set, bool build)
+{
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
+   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
+   unsigned int nid_ofs = nid - START_NID(nid);
+
+   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
+   return;
+
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   nm_i->free_nid_count[nat_ofs]++;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
+}
+
 /* return if the nid is recognized as free */
-static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
+static bool add_free_nid(struct f2fs_sb_info *sbi,
+   nid_t nid, bool build, bool update)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *e;
@@ -1870,6 +1895,11 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
ret = true;
err = __insert_free_nid(sbi, i, FREE_NID);
 err_out:
+   if (update) {
+   update_free_nid_bitmap(sbi, nid, ret, build);
+   if (!build)
+   nm_i->available_nids++;
+   }
spin_unlock(&nm_i->nid_list_lock);
radix_tree_preload_end();
 err:
@@ -1896,30 +1926,6 @@ static void remove_free_nid(struct f2fs_sb_info *sbi, nid_t nid)
kmem_cache_free(free_nid_slab, i);
 }
 
-static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
-   bool set, bool build)
-{
-   struct f2fs_nm_info *nm_i = NM_I(sbi);
-   unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
-   unsigned int nid_ofs = nid - START_NID(nid);
-
-   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
-   return;
-
-   if (set) {
-   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   nm_i->free_nid_count[nat_ofs]++;
-   } else {
-   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
-   return;
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
-   }
-}
-
 static void scan_nat_page(struct f2fs_sb_info *sbi,
struct page *nat_page, nid_t start_nid)
 {
@@ -1937,18 +1943,18 @@ static void scan_nat_page(struct f2fs_sb_info *sbi,
i = start_nid % NAT_ENTRY_PER_BLOCK;
 
for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) {
-   bool freed = false;
-
if (unlikely(start_nid >= nm_i->max_nid))
break;
 
blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
f2fs_bug_on(sbi, blk_addr == NEW_ADDR);
-   if (blk_addr == NULL_ADDR)
-   freed = add_free_nid(sbi, start_nid, true);
-   spin_lock(&NM_I(sbi)->nid_list_lock);
-   update_free_nid_bitmap(sbi, start_nid, freed, true);
-   spin_unlock(&NM_I(sbi)->nid_list_lock);
+   if (blk_addr == NULL_ADDR) {
+   add_free_nid(sbi, start_nid, true, true);
+   } else {
+   spin_lock(&NM_I(sbi)->nid_list_lock);
+   update_free_nid_bitmap(sbi, start_nid, false

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Juergen Gross

On 14/11/17 08:02, Quan Xu wrote:
> 
> 
> On 2017/11/13 18:53, Juergen Gross wrote:
>> On 13/11/17 11:06, Quan Xu wrote:
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, the idle path includes several heavy operations,
>>> including timer access (LAPIC timer or TSC deadline timer), which
>>> hurt performance, especially for latency-intensive workloads such as
>>> message-passing tasks. The cost comes mainly from the vmexit, which is a
>>> hardware context switch between the virtual machine and the hypervisor.
>>> Our solution is to poll for a while and not enter the real idle path if
>>> we get a schedule event during polling.
>>>
>>> Polling may waste CPU, so we adopt a smart polling mechanism to
>>> reduce useless polling.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
>     29031.6 bit/s -- 76.1 %CPU
> 
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
>     35787.7 bit/s -- 129.4 %CPU
> 
>  3. w/ kvm dynamic poll:
>     35735.6 bit/s -- 200.0 %CPU
> 
>  4. w/patch and w/ kvm dynamic poll:
>     42225.3 bit/s -- 198.7 %CPU
> 
>  5. idle=poll
>     37081.7 bit/s -- 998.1 %CPU
> 
> 
> 
>  With this patch, performance improves by 23%. With the patch plus kvm
>  dynamic poll, it improves by 45.4%. The CPU cost is also much lower than
>  in the 'idle=poll' case.

I don't question the general idea. I just think pvops isn't the best way
to implement it.

>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
> 
> I assume this feature will be ported to other archs. A new pvops makes the
> code clean and easy to maintain. I also tried to add it to the existing
> pvops, but it doesn't fit.

You are aware that pvops is x86 only?

I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
	if (static_key_false(&guest_idle_poll_key))
		guest_idle_poll_func();
}

And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.


Juergen
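
The function pointer plus static key variant above can be tried out in userspace with the following sketch. A plain bool stands in for the static key (an assumption of this sketch; a real static key patches the branch at runtime), and guest_idle_poll_register(), demo_poll() and demo_polls are hypothetical names added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hook installed by the hypervisor-specific code, if any. */
static void (*guest_idle_poll_func)(void);

/* Plain flag standing in for the kernel's static key. */
static bool guest_idle_poll_enabled;

/* Called from the generic idle path; does nothing until a hook is registered. */
static inline void guest_idle_poll(void)
{
	if (guest_idle_poll_enabled)
		guest_idle_poll_func();
}

/* What a guest such as KVM would do once: set the hook, then enable the key. */
static void guest_idle_poll_register(void (*fn)(void))
{
	assert(fn != NULL);

	guest_idle_poll_func = fn;
	guest_idle_poll_enabled = true;
}

/* Demo hook that just counts how often it was invoked. */
static unsigned int demo_polls;

static void demo_poll(void)
{
	demo_polls++;
}
```

The pointer is set before the flag is enabled, so the idle path never calls through a NULL hook. Nothing in this shape depends on x86.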

Re: [PATCH] iio: mma8452: add power_mode sysfs configuration

2017-11-13 Thread Martin Kepplinger


On 14.11.2017 at 05:56, harinath Nampally wrote:

Hi Martin,

But given your concerns, I would strip down this patch to only offer the
already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

OK then we can map 'low_noise' to high resolution mode. But I am afraid
I can't test the functionality because I don't have proper instruments to
measure the current draw (in microamps) accurately.


I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitly,
and given that there *is* the ABI, we could offer it for free.

I think 'oversampling' is already implemented, as I see
'case IIO_CHAN_INFO_OVERSAMPLING_RATIO:'
being handled, which is basically setting all 4 different power modes.

If we also add 'power_mode', I think it would be like having two
different user interfaces for the same functionality. So I don't see much
value in adding 'power_mode' as well.

Please correct me if I am wrong.

Thanks,
Harinath



You're right. I should've looked more closely. Oversampling is there and
seems to work. No need to blow up this driver, let alone extend an ABI now.
Let's drop this patch.

thanks
 martin


On Sun, Nov 12, 2017 at 7:28 AM, Martin Kepplinger wrote:

On 2017-11-11 01:33, Jonathan Cameron wrote:

On Mon, 6 Nov 2017 08:19:58 +0100
Martin Kepplinger  wrote:

This adds the power_mode sysfs interface to the device as documented in
sysfs-bus-iio.

---

Note that I explicitly don't sign off on this.

This is a starting point for anybody who can test it and check for correct
API usage, and ABI correctness, as documented in
Documentation/ABI/testing/sys-bus-iio
(grep it for "power_mode"). The ABI doc probably would need an addition
too, if the 4 power modes here seem generally useful (there are only
2 listed there)!

So, if you can test this, feel free to set up a proper patch or
two, and I'm happy to review.

Please note that this patch is quite old. It really should be that simple,
as far as my understanding went back then. We always list the available
frequencies of the given power mode we are in, for example, already, and
everything basically is in place except for the user interface.


Hmm. A lot of devices support something along these lines.  The issue
has always been - how is userspace to figure out what to do with it?
It's all very vague...

Funnily enough - this used to be really common, but is becoming less so
now - presumably because no one was using it much (or maybe I am reading
too much into that ;)

Now the question is whether it can be tied to better defined things?

Here low noise restricts the range to 4g.  Issue is that we don't actually
have writeable _available attributes (which correspond to range in this case).




Does it? Isn't it merely less oversampling.

Low power mode... This one is apparently oversampling.  If possible support
it as that as we have well defined interfaces for that.

Jonathan.


Ah, I remember; the oversampling settings was actually a reason why I
hadn't submitted the patch :) The oversampling API would definitely be
more accurate.

I would like "oversampling" more than this "power_mode" too. For this
driver it would be far more complicated to implement though. I doubt
that it'll be done. power_mode is basically already there implicitly,
and given that there *is* the ABI, we could offer it for free.

But given your concerns, I would strip down this patch to only offer the
already documented "low_noise" and "low_power" modes. It wouldn't be
worth it to extend the ABI just because of this!

Users would have a simple switch if they don't really *want* to know the
details. I think it can be useful to just say "I don't care about power
consumption. Be as accurate as possible" or "I just want this thing to
work. Use as little power as possible." Sure it's vague, but would it be
useless?

Re: [PATCH v3 2/3] usb: xhci: Add DbC support in xHCI driver

2017-11-13 Thread Felipe Balbi


Hi,

Mathias Nyman  writes:
>> +static int dbc_buf_alloc(struct dbc_buf *db, unsigned int size)
>> +{
>> +	db->buf_buf = kzalloc(size, GFP_KERNEL);
>> +	if (!db->buf_buf)
>> +		return -ENOMEM;
>> +
>> +	db->buf_size = size;
>> +	db->buf_put = db->buf_buf;
>> +	db->buf_get = db->buf_buf;
>> +
>> +	return 0;
>> +}

you may wanna have a look at kfifo.

-- 
balbi
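
As background for the kfifo hint, here is a minimal userspace sketch of a byte FIFO that uses free-running in/out indices over a power-of-two buffer instead of separate put/get pointers. This illustrates the general technique only and is not the kernel's kfifo API. struct ring, ring_put(), ring_get() and RING_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u	/* must be a power of two so masking works */

struct ring {
	unsigned char data[RING_SIZE];
	unsigned int in;	/* total bytes ever written */
	unsigned int out;	/* total bytes ever read */
};

/* Bytes currently stored; unsigned wraparound keeps this correct. */
static unsigned int ring_len(const struct ring *r)
{
	return r->in - r->out;
}

/* Copy up to n bytes in; returns how many were accepted. */
static unsigned int ring_put(struct ring *r, const unsigned char *src,
			     unsigned int n)
{
	unsigned int i, avail;

	assert(r != NULL && src != NULL);

	avail = RING_SIZE - ring_len(r);
	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		r->data[(r->in + i) & (RING_SIZE - 1)] = src[i];
	r->in += n;

	return n;
}

/* Copy up to n bytes out; returns how many were delivered. */
static unsigned int ring_get(struct ring *r, unsigned char *dst,
			     unsigned int n)
{
	unsigned int i, len;

	assert(r != NULL && dst != NULL);

	len = ring_len(r);
	if (n > len)
		n = len;
	for (i = 0; i < n; i++)
		dst[i] = r->data[(r->out + i) & (RING_SIZE - 1)];
	r->out += n;

	return n;
}
```

With counters that never wrap back to the buffer start, "empty" (in == out) and "full" (in - out == size) are distinguishable without a spare slot or an extra flag.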

Re: [PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rasmus Villemoes

On 14 November 2017 at 07:57, Rakib Mullick  wrote:
> Currently, during __bitmap_weight() calculation hweight_long() is used.
> Inside a hweight_long() a check has been made to figure out whether a
> hweight32() or hweight64() version to use.
>
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index d8f0c09..552096f 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
>  int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
>  {
> unsigned int k, lim = bits/BITS_PER_LONG;
> -   int w = 0;
> -
> -   for (k = 0; k < lim; k++)
> -   w += hweight_long(bitmap[k]);
> +   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
> +

hint: sizeof() very rarely evaluates to zero... So this is the same as
"is32 = 1". So the patch as-is is broken (and may explain the 1-byte
delta in vmlinux). But even if this condition is fixed, the patch
doesn't change anything, since the sizeof() evaluation is done at
compile-time, regardless of whether it happens inside the inlined
hweight_long or outside. So it is certainly not worth it to duplicate
the loop.

Rasmus

Re: [PATCH] iio: accel: mma8452: Add single pulse/tap event detection

2017-11-13 Thread Martin Kepplinger


On 14.11.2017 at 05:36, harinath Nampally wrote:

> This patch adds following related changes:
> - defines pulse event related registers
> - enables and handles single pulse interrupt for fxls8471
> - handles IIO_EV_DIR_EITHER in read/write callbacks (because
>   event direction for pulse is either rising or falling)
> - configures read/write event value for pulse latency register
>   using IIO_EV_INFO_HYSTERESIS
> - adds multiple events like pulse and transient event spec
>   as elements of event_spec array named 'mma8452_accel_events'
>
> Except mma8653 chip all other chips like mma845x and
> fxls8471 have single tap detection feature.
> Tested thoroughly using iio_event_monitor application on
> imx6ul-evk board which has fxls8471.
>
> Signed-off-by: Harinath Nampally 
> ---
What tree is this written against? It doesn't apply to the current -next
anyways.

Thanks for the review.
It is actually against 'testing' branch, I think two of my earlier
patches are not yet applied to
any branch, that might be reason this patch is not good against
current -next or 'togreg'.


I think the definitions would deserve to be in a separate patch, but
that's debatable.

Yes, I would argue that definitions are not a logical change.



I would argue definitions don't break the build and maybe slightly better
support features like bisect or revert :)


>   .type = IIO_EV_TYPE_MAG,
>   .dir = IIO_EV_DIR_RISING,
>   .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> @@ -1139,6 +1274,15 @@ static const struct iio_event_spec mma8452_transient_event[] = {
>   BIT(IIO_EV_INFO_PERIOD) |
>   BIT(IIO_EV_INFO_HIGH_PASS_FILTER_3DB)
>   },
> + {
> + //pulse event
> + .type = IIO_EV_TYPE_MAG,
> + .dir = IIO_EV_DIR_EITHER,
> + .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> + .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE) |
> + BIT(IIO_EV_INFO_PERIOD) |
> + BIT(IIO_EV_INFO_HYSTERESIS)
> + },
>  };
>
>  static const struct iio_event_spec mma8452_motion_event[] = {
> @@ -1202,8 +1346,8 @@ static struct attribute_group mma8452_event_attribute_group = {
>   .shift = 16 - (bits), \
>   .endianness = IIO_BE, \
>   }, \
> - .event_spec = mma8452_transient_event, \
> - .num_event_specs = ARRAY_SIZE(mma8452_transient_event), \
> + .event_spec = mma8452_accel_events, \
> + .num_event_specs = ARRAY_SIZE(mma8452_accel_events), \
that would go in the mentioned separate renaming-patch

OK, so I will make a patch set: patch 1/2 just renames
'mma8452_transient_event[]' to 'mma8452_accel_events[]' (without adding
the pulse event), and everything else goes in 2/2. Does that make sense?



It does to me.

Re: [RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar


* Ricardo Neri  wrote:

> The instructions STR and SLDT are not emulated in any case. Thus, it made
> sense to not implement functionality to identify them. However, a
> subsequent commit will introduce functionality to warn about the use of
> all the instructions that UMIP protect, not only those that are emulated.
> A first step for that is the ability to identify them.
> 
> Plus, now that STR and SLDT are identified, we need to explicitly avoid
> their emulation (i.e., not rely on unsuccessful identification). Group
> together all the cases that we do not want to emulate: STR, SLDT and user
> long mode processes.
> 
> Cc: Andy Lutomirski 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Tony Luck 
> Cc: Paolo Bonzini 
> Cc: Ravi V. Shankar 
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 

Sigh, the _title_ still refers to 'str'...

I'll fix it up, no need to resend, but this lack of attention to details is 
seriously sad.

Thanks,

Ingo

Re: [PATCH] arch, mm: introduce arch_tlb_gather_mmu_lazy (was: Re: [RESEND PATCH] mm, oom_reaper: gather each vma to prevent leaking TLB entry)

2017-11-13 Thread Michal Hocko

On Tue 14-11-17 10:45:49, Minchan Kim wrote:
[...]
> Anyway, I think Wang Nan's patch is already broken.
> http://lkml.kernel.org/r/%3c20171107095453.179940-1-wangn...@huawei.com%3E
> 
> Because unmap_page_range(ie, zap_pte_range) can flush TLB forcefully
> and free pages. However, the architecture code for TLB flush cannot
> flush at all by wrong fullmm so other threads can write freed-page.

I am not sure I understand what you mean. How is that any different from
any other explicit partial madvise call?
-- 
Michal Hocko
SUSE Labs

[f2fs-dev] [PATCH RESEND v2] f2fs: validate before set/clear free nat bitmap

2017-11-13 Thread LiFan

In flush_nat_entries, all dirty nats will be flushed and if their new address
isn't NULL_ADDR, their bitmaps will be updated, the free_nid_count of the
bitmaps will be increased regardless of whether the nats have already been
occupied before. This could lead to wrong free_nid_count.
So this patch checks the status of the bits before actually setting/clearing them.

Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d234c6e..b965a53 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
return;
 
-   if (set)
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
__set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   else
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
-   if (set)
nm_i->free_nid_count[nat_ofs]++;
-   else if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
--
2.7.4
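
The idea behind the patch above, touching the counter only when a bit really changes state, can be sketched outside the kernel. This is a simplified illustration under assumptions: struct free_map, map_test() and map_update() are hypothetical names, NBITS is an arbitrary size, and the 'build' special case from update_free_nid_bitmap() is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS 64	/* illustrative size, not an f2fs constant */

/* Hypothetical stand-in for one NAT block's free-nid bitmap and counter. */
struct free_map {
	unsigned char bits[NBITS / 8];
	unsigned int free_count;
};

static bool map_test(const struct free_map *m, unsigned int n)
{
	return m->bits[n / 8] & (1u << (n % 8));
}

/* Set or clear bit n; adjust the counter only on a real state change. */
static void map_update(struct free_map *m, unsigned int n, bool set)
{
	assert(n < NBITS);
	if (map_test(m, n) == set)
		return;		/* already in the requested state */
	if (set) {
		m->bits[n / 8] |= 1u << (n % 8);
		m->free_count++;
	} else {
		m->bits[n / 8] &= ~(1u << (n % 8));
		m->free_count--;
	}
}
```

Without the early return, setting an already-set bit twice would count the same nid twice, which is the wrong free_nid_count the changelog describes.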

Re: [PATCH 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ingo Molnar


* Ricardo Neri  wrote:

> On Mon, Nov 13, 2017 at 09:12:03AM +0100, Ingo Molnar wrote:
> > 
> > * Ricardo Neri  wrote:
> > 
> > > The instructions str and sldt are not emulated in any case. Thus, it made
> > > sense to not implement functionality to identify them. However, a
> > > subsequent commit will introduce functionality to warn about the use of
> > > all the instructions that UMIP protect, not only those that are emulated.
> > > A first step for that is the ability to identify them.
> > > 
> > > Plus, now that str and sldt are identified, we need to explicitly avoid
> > > their emulation (i.e., not rely on unsuccessful identification). Group
> > > together all the cases that we do not want to emulate: str, sldt and user
> > > long mode processes.
> > 
> > Did you notice how in all your previous patches (both in the code and in
> > the changelogs) I have manually fixed up the capitalization of these
> > instruction mnemonics?
> 
> I am sorry, I tried to see where you made these changes but I could not find
> any. I did a git diff of arch/x86/kernel/umip.c between the branch
> rneri/umip_v11 of my repository [1] and the master branch of the tip tree
> and I did not find any differences.

For example, I turned:

  [PATCH v11 12/12] selftests/x86: Add tests for instruction str and sldt

  The instructions str and sldt are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

into:

  a9e017d5619e: selftests/x86: Add tests for the STR and SLDT instructions

  The STR and SLDT instructions are not valid when running on virtual-8086
  mode and generate an invalid operand exception.
  ...

I did not catch every case though.

Thanks,

Ingo

Re: [PATCH net-next 0/3] rxrpc: Fixes

2017-11-13 Thread David Miller

From: David Howells 
Date: Sat, 11 Nov 2017 17:57:52 +

> 
> Here are some patches that fix some things in AF_RXRPC:
> 
>  (1) Prevent notifications from being passed to a kernel service for a call
>  that it has ended.
> 
>  (2) Fix a null pointer dereference that occurs under some circumstances when an
>  ACK is generated.
> 
>  (3) Fix a number of things to do with call expiration.
> 
> The patches can be found here also:
> 
>   http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-next-2017

Pulled, thanks David.

Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack

2017-11-13 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> I have old patches to stop using IST for #DB and #BP, but I never finished 
> them.

I'm all in favor of reviving that effort!

Thanks,

Ingo

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Wanpeng Li

2017-11-14 15:02 GMT+08:00 Quan Xu :
>
>
> On 2017/11/13 18:53, Juergen Gross wrote:
>>
>> On 13/11/17 11:06, Quan Xu wrote:
>>>
>>> From: Quan Xu 
>>>
>>> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
>>> in idle path which will poll for a while before we enter the real idle
>>> state.
>>>
>>> In virtualization, idle path includes several heavy operations
>>> includes timer access(LAPIC timer or TSC deadline timer) which will
>>> hurt performance especially for latency intensive workload like message
>>> passing task. The cost is mainly from the vmexit which is a hardware
>>> context switch between virtual machine and hypervisor. Our solution is
>>> to poll for a while and do not enter real idle path if we can get the
>>> schedule event during polling.
>>>
>>> Poll may cause the CPU waste so we adopt a smart polling mechanism to
>>> reduce the useless poll.
>>>
>>> Signed-off-by: Yang Zhang 
>>> Signed-off-by: Quan Xu 
>>> Cc: Juergen Gross 
>>> Cc: Alok Kataria 
>>> Cc: Rusty Russell 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: virtualizat...@lists.linux-foundation.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>>
>> Hmm, is the idle entry path really so critical to performance that a new
>> pvops function is necessary?
>
> Juergen, Here is the data we get when running benchmark netperf:
>  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
> 29031.6 bit/s -- 76.1 %CPU
>
>  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
> 35787.7 bit/s -- 129.4 %CPU
>
>  3. w/ kvm dynamic poll:
> 35735.6 bit/s -- 200.0 %CPU

Actually we can reduce the CPU utilization by sleeping for a period of
time, as is already done in the poll logic of the IO subsystem. Then we
can improve the algorithm in kvm instead of introducing another
duplicate one in the kvm guest.

Regards,
Wanpeng Li

>
>  4. w/patch and w/ kvm dynamic poll:
> 42225.3 bit/s -- 198.7 %CPU
>
>  5. idle=poll
> 37081.7 bit/s -- 998.1 %CPU
>
>
>
>  w/ this patch, we will improve performance by 23%.. even we could improve
>  performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
>  cost of CPU is much lower than 'idle=poll' case..
>
>> Wouldn't a function pointer, maybe guarded
>> by a static key, be enough? A further advantage would be that this would
>> work on other architectures, too.
>
>
> I assume this feature will be ported to other archs.. a new pvops makes code
> clean and easy to maintain. also I tried to add it into existed pvops, but
> it doesn't match.
>
>
>
> Quan
> Alibaba Cloud
>>
>>
>> Juergen
>>
>

RE: [patch v2 3/8] KVM: x86: add Intel processor trace virtualization mode

2017-11-13 Thread Kang, Luwei

> > +#define VM_EXIT_PT_SUPPRESS_PIP0x0100
> > +#define VM_EXIT_CLEAR_IA32_RTIT_CTL0x0200
> >
> >  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dff
> >
> > @@ -108,6 +112,8 @@
> >  #define VM_ENTRY_LOAD_IA32_PAT 0x4000
> >  #define VM_ENTRY_LOAD_IA32_EFER 0x8000
> >  #define VM_ENTRY_LOAD_BNDCFGS   0x0001
> > +#define VM_ENTRY_PT_SUPPRESS_PIP   0x0002
> > +#define VM_ENTRY_LOAD_IA32_RTIT_CTL0x0004
> 
> 
> Please use PT_CONCEAL instead of PT_SUPPRESS_PIP, to better match the SDM 
> (for both vmexit and vmentry controls).
> 
> > +   if (!enable_ept)
> > +   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > +
> 
> Why is this (and the similar bit-clear operation in vmx_vmentry_control) 
> needed only for !enable_ept?
> 
> Shouldn't it be like
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmexit_control &= ~VM_EXIT_PT_SUPPRESS_PIP;
>   vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
>   }
> 
> and
> 
>   if (pt_mode == PT_MODE_SYSTEM) {
>   vmentry_control &= ~VM_ENTRY_PT_SUPPRESS_PIP;
>   vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
>   }
> 

I think I misunderstood 'always set "use GPA for processor tracing" in the
secondary execution control if it can be'.
"use GPA for processor tracing" can't be set in SYSTEM mode even if the
hardware supports this bit, because the address would still be treated as a
GPA and translated by EPT. In fact, RTIT_OUTPUT_BASE will always be an HPA in
SYSTEM mode.
Will fix in the next version.

Thanks,
Luwei Kang

Re: Improving documentation of parent-ID field in /proc/PID/mountinfo

2017-11-13 Thread Michael Kerrisk (man-pages)

Hi Miklos, Ram

Thanks for your comments. A question below.

On 13 November 2017 at 09:11, Miklos Szeredi  wrote:
> On Mon, Nov 13, 2017 at 8:55 AM, Ram Pai  wrote:
>> On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) wrote:
>>> Hello Ram,
>>>
>>> Long ago (2.6.29) you added the /proc/PID/mountinfo file and
>>> associated documentation in Documentation/filesystems/proc.txt. Later,
>>> I pasted much of that documentation into the proc(5) manual page.
>>>
>>> That documentation says of the second field in the file:
>>>
>>> [[
>>> This file contains lines of the form:
>>>
>>> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
>>> (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
>>>
>>> (1) mount ID:  unique identifier of the mount (may be reused after umount)
>>> (2) parent ID:  ID of parent (or of self for the top of the mount tree)
>>> ...
>>> ]]
>>>
>>> The last piece of the description of field (2) doesn't seem to be
>>> correct, or is at least rather unclear. I take this to be saying
>>> that for the root mount point, /, field (2) will have the same value
>>> as field (1). I never actually looked at this detail closely, but
>>> Alexander pointed out that this is obviously not so, as one can
>>> immediately verify:
>>>
>>> $ grep '/ / ' /proc/$$/mountinfo
>>> 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
>>>
>>> I dug around in the kernel source for a bit. I do not have an exact
>>> handle on the details, but I can see roughly what is going on.
>>> Internally, there seems to be one ("hidden") mount ID reserved to each
>>> mount namespace, and that ID is the parent of the root mount point.
>>>
>>> Looking through the (4.14) kernel source, mount IDs are allocated by
>>> mnt_alloc_id() (in fs/namespace.c), which is in turn called by
>>> alloc_vfsmnt() which is in turn called by clone_mnt().
>>>
>>> A new mount namespace is created by the kernel function copy_mnt_ns()
>>> (in fs/namespace.c, called by create_new_namespaces() in
>>> kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
>>> fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
>>> The first of these is the call that creates the "hidden" mount ID that
>>> becomes the parent of the root mount point. (I verified this by
>>> instrumenting the kernel with a few printk() calls to display the
>>> IDs.)  The second place where copy_tree() calls clone_mnt() is in a
>>> loop that replicates each of the mount points (including the root
>>> mount point) in the source mount namespace.
>>
>> We used to report that mount, once upon a time.  Something has changed
>> the behavior since then and it's not reported any more, thus making it
>> hidden.
>
> The hidden one is the initramfs, I believe.  That's the root of the
> mount namespace, and when a namespace is cloned, the tree is
> copied from the namespace root.
>
> It is "hidden" because no process has its root there.  Note the
> difference between namespace root and process root: the first is the
> real root of the mount tree and is unchangeable, the second is
> pointing to some place in a mount tree and can be changed (chroot).
>
> So there's nothing special in this rootfs, it is just hidden because
> it's not the root of any task.
>
> The description of  field (2) is correct, it just does not make it
> clear what it means by "root".

Sorry -- do you mean the old description is correct, or my new
description (below)?

Cheers,

Michael


> Thanks,
> Miklos
>
>>
>>>
>>> With these details in mind, I propose to patch the man page to read as
>>> below. Perhaps you have some corrections or improvements to suggest
>>> for this text?
>>>
>>> [[
>>>   (2)  parent ID: the ID of the parent mount.  For the root
>>>        mount point, the ID shown here is a hidden mount ID
>>>        associated with the mount namespace.  That ID is distinct
>>>        from any of the IDs shown in field (1) of the records
>>>        shown in the mountinfo file, and does not appear in
>>>        field (1) in the mountinfo file in any other mount
>>>        namespace.  (In the initial mount namespace, this hidden
>>>        ID has the value 0.)
>>
>> It captures the current semantics correctly.
>>
>> RP
>>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
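
To look at fields (1) and (2) programmatically rather than with grep, something like the following could be used. It is a minimal sketch: parse_mountinfo() and is_root_mount() are hypothetical helper names, the 256-byte mount point buffer is an arbitrary choice, and only the first five fields of a line are examined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract mount ID (1), parent ID (2) and mount point (5) from one
 * /proc/PID/mountinfo line. Fields (3) and (4) are skipped.
 * Returns 0 on success, -1 if the line does not have that shape.
 */
static int parse_mountinfo(const char *line, int *mount_id, int *parent_id,
			   char mnt[256])
{
	assert(line && mount_id && parent_id && mnt);
	return sscanf(line, "%d %d %*s %*s %255s",
		      mount_id, parent_id, mnt) == 3 ? 0 : -1;
}

/* True if the mount point is the process root, "/". */
static int is_root_mount(const char *mnt)
{
	return strcmp(mnt, "/") == 0;
}
```

Feeding each line of /proc/self/mountinfo (read with fgets()) through parse_mountinfo() and printing the parent ID where is_root_mount() holds shows the ID under discussion; on the system quoted in the mail that would be the line starting "65 0".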

Re: linux-next: Tree for Nov 7

2017-11-13 Thread Michal Hocko

On Mon 13-11-17 09:35:22, Khalid Aziz wrote:
> On 11/13/2017 09:06 AM, Michal Hocko wrote:
> > OK, so this one should take care of the backward compatibility while
> > still not touching the arch code
> > ---
> > commit 39ff9bf8597e79a032da0954aea1f0d77d137765
> > Author: Michal Hocko 
> > Date:   Mon Nov 13 17:06:24 2017 +0100
> > 
> >  mm: introduce MAP_FIXED_SAFE
> >  MAP_FIXED is used quite often but it is inherently dangerous because it
> >  unmaps an existing mapping covered by the requested range. While this
> >  might be the desired behavior in many cases, there are others which
> >  would rather see a failure than a silent memory corruption.
> >  Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> >  It is a MAP_FIXED extension with a single exception that it fails with
> >  ENOMEM if the requested address is already covered by an existing
> >  mapping. We still do rely on get_unmapped_area to handle all the arch
> >  specific MAP_FIXED treatment and check for a conflicting vma after it
> >  returns.
> >  Signed-off-by: Michal Hocko 
> > 
> > .. deleted ...
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..aad8d37f0205 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > if (mm->map_count > sysctl_max_map_count)
> > return -ENOMEM;
> > +   /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +   if (flags & MAP_FIXED_SAFE)
> > +   flags |= MAP_FIXED;
> > +
> > /* Obtain the address to map to. we verify (or select) it and ensure
> >  * that it represents a valid section of the address space.
> >  */
> 
> Do you need to move this code above:
> 
> if (!(flags & MAP_FIXED))
> addr = round_hint_to_min(addr);
> 
> /* Careful about overflows.. */
> len = PAGE_ALIGN(len);
> if (!len)
> return -ENOMEM;
> 
> Not doing that might mean the hint address will end up being rounded for
> MAP_FIXED_SAFE which would change the behavior from MAP_FIXED.

Yes, I will move it.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Michal Hocko

On Tue 14-11-17 06:10:00, Ran Wang wrote:
[...]
> > > This drop causes the DWC3 USB controller to fail on initialization with
> > > Layerscape processors (such as LS1043A) as below:
> > >
> > > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
> > > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> > >
> > > And I notice that someone also reported to you that DWC2 got affected
> > > recently, so do you have the solution now?
> > 
> > Yes. It should be in linux-next. Have a look at the following email
> > thread:
> > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.
> > kernel.org%2Fr%2F20171104082500.qvzbb2kw4suo6cgy%40dhcp22.suse.cz&
> > data=02%7C01%7Cran.wang_1%40nxp.com%7C5e73c6a941fc4f1c10e708d52
> > a860c5b%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636461677
> > 583607877&sdata=zlRxJ4LZwOBsit5qRx9yFT5qfP54wZ0z6G1z%2Bcywf5g%3D
> > &reserved=0

I really have no idea where the above link came from because my email
had a reference to 
http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6...@dhcp22.suse.cz
Has your email client modified the original email?

> Thanks for the info. Although I failed to open the link you shared, I got
> the patch from my colleague and the issue is fixed on my side. Just
> letting you know, thanks.

Thanks for your testing anyway. Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-13 Thread Quan Xu




On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu 

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, idle path includes several heavy operations
includes timer access(LAPIC timer or TSC deadline timer) which will
hurt performance especially for latency intensive workload like message
passing task. The cost is mainly from the vmexit which is a hardware
context switch between virtual machine and hypervisor. Our solution is
to poll for a while and do not enter real idle path if we can get the
schedule event during polling.

Poll may cause the CPU waste so we adopt a smart polling mechanism to
reduce the useless poll.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
Cc: Juergen Gross 
Cc: Alok Kataria 
Cc: Rusty Russell 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
    29031.6 bit/s -- 76.1 %CPU

 2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
    35787.7 bit/s -- 129.4 %CPU

 3. w/ kvm dynamic poll:
    35735.6 bit/s -- 200.0 %CPU

 4. w/patch and w/ kvm dynamic poll:
    42225.3 bit/s -- 198.7 %CPU

 5. idle=poll
    37081.7 bit/s -- 998.1 %CPU



 w/ this patch, we will improve performance by 23%.. even we could improve
 performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the
 cost of CPU is much lower than 'idle=poll' case..


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


I assume this feature will be ported to other archs.. a new pvops makes code
clean and easy to maintain. also I tried to add it into existed pvops, but it
doesn't match.



Quan
Alibaba Cloud


Juergen
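
The percentages quoted with the netperf figures above follow directly from the listed throughputs. A small sketch of the arithmetic, where gain_percent() is a made-up helper and the inputs are the numbers from the mail:

```c
#include <assert.h>

/* Relative gain of value over baseline, in percent. */
static double gain_percent(double baseline, double value)
{
	assert(baseline > 0.0);
	return (value - baseline) / baseline * 100.0;
}
```

With case 1 (29031.6) as the baseline, case 2 (35787.7) works out to roughly 23% and case 4 (42225.3) to roughly 45.4%, matching the claims in the message.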

RE: [patch v2 8/8] KVM: x86: Disable intercept for Intel processor trace MSRs

2017-11-13 Thread Kang, Luwei

> > +   if (pt_mode == PT_MODE_HOST_GUEST) {
> > +   u32 i, eax, ebx, ecx, edx;
> > +
> > +   cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
> > +   for (i = 0; i < (eax & 0x7); i++)
> > +   vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
> > +   false);
> > +   }
> > +
> 
> As I mentioned earlier, this probably makes vmentry/vmexit too expensive when 
> guests are not using processor tracing.  I would do  it only if guest 
> TRACEEN=1 (since anyway the values have to be correct if guest TRACEEN=1, and 
> a change in TRACEEN always causes a vmexit).
> 

Will change in next version.

Thanks,
Luwei Kang

> 
> > return alloc_kvm_area();
> >
> >  out:
> >

RE: [patch v2 7/8] KVM: x86: add Intel PT msr RTIT_CTL read/write

2017-11-13 Thread Kang, Luwei

> >  static DEFINE_PER_CPU(struct vmcs *, vmxarea);  static
> > DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -3384,6 +3385,11 @@
> > static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 1;
> > msr_info->data = vcpu->arch.ia32_xss;
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported())
> > +   return 1;
> > +   msr_info->data = vmcs_read64(GUEST_IA32_RTIT_CTL);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -3508,6 
> > +3514,11
> > @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > else
> > clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
> > break;
> > +   case MSR_IA32_RTIT_CTL:
> > +   if (!vmx_pt_supported() || to_vmx(vcpu)->nested.vmxon)
> > +   return 1;
> 
> VMXON must also clear TraceEn bit (see 23.4 in the SDM).

Will clear TraceEn bit in handle_vmon() function.

> 
> You also need to support R/W of all the other MSRs, in order to make them 
> accessible to userspace, and add them in msrs_to_save and kvm_init_msr_list.
> 

Will add it in next version. This is used for live migration, is that right?

> Regarding the save/restore of the state, I am now wondering if you could also 
> use XSAVES and XRSTORS instead of multiple rdmsr/wrmsr in a loop.
> The cost is two MSR writes on vmenter (because the guest must run with 
> IA32_XSS=0) and two on vmexit.
> 

If we use XSAVES and XRSTORS for the context switch:
1. Before kvm_load_guest_fpu(vcpu), we need to save the host RTIT_CTL, disable
PT and restore the value of "vmx->pt_desc.guest.ctl" to GUEST_IA32_RTIT_CTL.
Is that right?
2. After VM-exit (returning from kvm_x86_ops->run(vcpu)), we need to save the
state of GUEST_IA32_RTIT_CTL. TRACEEN=0 and the other MSRs still hold guest
state. Where should PT be enabled in host-guest mode? I think we should enable
PT after VM-exit, but that may cause #GP: "If XRSTORS would restore (or
initialize) PT state and IA32_RTIT_CTL.TraceEn = 1, the instruction causes a
general-protection exception" (SDM 13.5.6). If we enable it after
kvm_put_guest_fpu(), I think it is too late.

Thanks,
Luwei Kang
> 
> > +   vmcs_write64(GUEST_IA32_RTIT_CTL, data);
> > +   break;
> > case MSR_TSC_AUX:
> > if (!msr_info->host_initiated &&
> > !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> >

[PATCH] lib: Avoid redundant sizeof checking in __bitmap_weight() calculation.

2017-11-13 Thread Rakib Mullick

Currently, during __bitmap_weight() calculation hweight_long() is used.
Inside a hweight_long() a check has been made to figure out whether a
hweight32() or hweight64() version to use.

However, it's unnecessary to do it in case of __bitmap_weight() calculation
inside the loop. We can detect whether to use hweight32() or hweight64()
upfront and call respective function directly. It also reduces the
vmlinux size.

Before the patch:
    text    data      bss      dec     hex filename
12901332 7798930 14541816 35242078 219c05e vmlinux

After the patch:
    text    data      bss      dec     hex filename
12901331 7798930 14541816 35242077 219c05d vmlinux

Signed-off-by: Rakib Mullick 
Cc: Andrew Morton 
Cc: Rasmus Villemoes 
Cc: Matthew Wilcox 
Cc: Yury Norov 
Cc: Mauro Carvalho Chehab 
 
---
Patch was created against Torvalds' tree (commit 43ff2f4db9d0f764).

 lib/bitmap.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/bitmap.c b/lib/bitmap.c
index d8f0c09..552096f 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -241,10 +241,15 @@ EXPORT_SYMBOL(__bitmap_subset);
 int __bitmap_weight(const unsigned long *bitmap, unsigned int bits)
 {
unsigned int k, lim = bits/BITS_PER_LONG;
-   int w = 0;
-
-   for (k = 0; k < lim; k++)
-   w += hweight_long(bitmap[k]);
+   int w = 0, is32 = sizeof(bitmap[0]) ? 1 : 0;
+
+   if (is32) {
+   for (k = 0; k < lim; k++)
+   w += hweight32(bitmap[k]);
+   } else {
+   for (k = 0; k < lim; k++)
+   w += hweight64(bitmap[k]);
+   }
 
if (bits % BITS_PER_LONG)
w += hweight_long(bitmap[k] & BITMAP_LAST_WORD_MASK(bits));
-- 
2.9.3

Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Sowmini Varadhan

On (11/13/17 19:30), Girish Moodalbail wrote:
> (L538-540). However, it leaves behind some of the rds_tcp connections that
> shared the same underlying RDS connection (L534 and 535). These connections
> with pointer to stale network namespace are left behind in the global list.

It leaves behind no such thing. After mprds, you want to collect
only one instance of the conn that is being removed, that's why
lines 534-535 skip over duplicate instances of the same conn
(for multiple paths in the same conn).

> When the 2nd network namespace is deleted, we will hit the above stale
> pointer and hit UAF panic.
> I think we should move away from global list to a per-namespace list. The
> global list are used only in two places (both of which are per-namespace
> operations):

Nice try, but not so. 

Let me look at this tomorrow, I missed this mail in my mbox.

--Sowmini

[PATCH -next] irqchip/exiu: Fix return value check in exiu_init()

2017-11-13 Thread Wei Yongjun

In case of error, the function of_iomap() returns NULL pointer not
ERR_PTR(). The IS_ERR() test in the return value check should be
replaced with NULL test.

Fixes: 706cffc1b912 ("irqchip/exiu: Add support for Socionext Synquacer EXIU controller")
Signed-off-by: Wei Yongjun 
---
 drivers/irqchip/irq-sni-exiu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-sni-exiu.c b/drivers/irqchip/irq-sni-exiu.c
index 1b6e2f7..1927b2f 100644
--- a/drivers/irqchip/irq-sni-exiu.c
+++ b/drivers/irqchip/irq-sni-exiu.c
@@ -196,8 +196,8 @@ static int __init exiu_init(struct device_node *node,
}
 
data->base = of_iomap(node, 0);
-   if (IS_ERR(data->base)) {
-   err = PTR_ERR(data->base);
+   if (!data->base) {
+   err = -ENODEV;
goto out_free;
}
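
Why the original IS_ERR() test could never fire is easy to show in user space. The helpers below are re-creations written for this illustration, not the kernel's err.h: is_err() and err_ptr() mimic IS_ERR() and ERR_PTR(), while fake_iomap() and map_or_errno() are hypothetical stand-ins for a mapper that, like of_iomap(), reports failure as NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* same bound the kernel's err.h uses */

/* Error pointers live in the top MAX_ERRNO values of the address space. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Hypothetical mapper that signals failure with NULL. */
static void *fake_iomap(bool ok)
{
	static int window;

	return ok ? &window : NULL;
}

/* Correct failure check for a NULL-on-error API. */
static int map_or_errno(bool ok, void **out)
{
	void *base = fake_iomap(ok);

	assert(out);
	if (!base)		/* is_err(base) would be false here */
		return -ENODEV;
	*out = base;
	return 0;
}
```

NULL is address 0, far below the error-pointer range, so an IS_ERR()-style check lets a failed mapping through and the NULL pointer is then used as the register base.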

Re: [PATCH] wcn36xx: Set BTLE coexistence related configuration values to defaults

2017-11-13 Thread Kalle Valo

Ramon Fried  writes:

> From: Eyal Ilsar 
>
> If the value for the firmware configuration parameters
> BTC_STATIC_LEN_LE_BT and BTC_STATIC_LEN_LE_WLAN are not set the duty
> cycle between BT and WLAN is such that if BT (including BLE) is active
> WLAN gets 0 bandwidth. When tuning these parameters having a too high
> value for WLAN means that BLE performance degrades. The "sweet" point
> of roughly half of the maximal values was empirically found to achieve
> a balance between BLE and Wi-Fi coexistence performance.
>
> Signed-off-by: Eyal Ilsar 
> Signed-off-by: Ramon Fried 

When you submit a new version of the patch, please include the version
number:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#patch_version_missing

So after fixing Bjorn's comments the next version should be v3.

-- 
Kalle Valo

Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Baoquan He

On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 

ACK

Acked-by: Baoquan He 

Thanks
Baoquan

> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
>

Re: [PATCH 4.9 00/87] 4.9.62-stable review --> crash

2017-11-13 Thread Sebastian Gottschall


Hmm, it compiles fine, but:

[   24.838120] Unable to handle kernel NULL pointer dereference at virtual address 0055

[   24.846193] pgd = c0004000
[   24.848893] [0055] *pgd=
[   24.852472] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   24.858463] Modules linked in: xhci_plat_hcd xhci_pci xhci_hcd ohci_hcd ehci_pci ehci_platform ehci_hcd usbcore usb_common nls_base qca_ssdk gpio_pca953x mii_gpio wil6210 ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 compat

[   24.880852] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.62-rc1 #90
[   24.887189] Hardware name: AnnapurnaLabs Alpine (Device Tree)
[   24.892921] task: ef029ac0 task.stack: ef05a000
[   24.897444] PC is at nf_nat_cleanup_conntrack+0x4c/0x74
[   24.902657] LR is at nf_nat_cleanup_conntrack+0x38/0x74
[   24.907869] pc : []    lr : []    psr: 6153
[   24.907869] sp : ef05bb58  ip : ef05bb58  fp : ef05bb6c
[   24.919317] r10: ed230cc0  r9 : ed230c00  r8 : edf45800
[   24.924529] r7 : ebcd2f00  r6 : ec33739e  r5 : c0892294  r4 : ebcd2f00
[   24.931040] r3 :   r2 : 0055  r1 :   r0 : c0892718
[   24.937551] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user

[   24.944755] Control: 10c5387d  Table: 2bd1006a  DAC: 0055
[   24.950486] Process swapper/2 (pid: 0, stack limit = 0xef05a210)
[   24.956477] Stack: (0xef05bb58 to 0xef05c000)


will dig into the code to find the reason


On 13.11.2017 at 13:55, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.9.62 release.
There are 87 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Nov 15 12:55:40 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.62-rc1.gz
or in the git tree and branch at:
   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-
Pseudo-Shortlog of commits:

Greg Kroah-Hartman 
 Linux 4.9.62-rc1

Borislav Petkov 
 x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context

Pavel Tatashin 
 x86/smpboot: Make optimization of delay calibration work correctly

Florian Westphal 
 netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"

Richard Schütz 
 can: c_can: don't indicate triple sampling support for D_CAN

Marek Vasut 
 can: ifi: Fix transmitter delay calculation

Gerhard Bertelsmann 
 can: sun4i: handle overrun in RX FIFO

John Stultz 
 drm/bridge: adv7511: Re-write the i2c address before EDID probing

John Stultz 
 drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID

John Stultz 
 drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally

Sinclair Yeh 
 drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue

Ilya Dryomov 
 rbd: use GFP_NOIO for parent stat and data requests

Kai-Heng Feng 
 Input: elan_i2c - add ELAN060C to the ACPI table

Oswald Buddenhagen 
 MIPS: AR7: Ensure that serial ports are properly set up

Jonas Gorski 
 MIPS: AR7: Defer registration of GPIO

Jaedon Shin 
 MIPS: BMIPS: Fix missing cbr address

Marcus Cooper 
 ASoC: sun4i-spdif: remove legacy dapm components

Luis R. Rodriguez 
 tools: firmware: check for distro fallback udev cancel rule

Luis R. Rodriguez 
 selftests: firmware: send expected errors to /dev/null

Matt Redfearn 
 MIPS: SMP: Fix deadlock & online race

Matija Glavinic Pecotic 
 MIPS: Fix race on setting and getting cpu_online_mask

Matt Redfearn 
 MIPS: SMP: Use a completion event to signal CPU up

Paul Burton 
 MIPS: Fix CM region target definitions

Gustavo A. R. Silva 
 MIPS: microMIPS: Fix incorrect mask in insn_table_MM

Maarten Lankhorst 
 drm/i915: Do not rely on wm preservation for ILK watermarks

Takashi Iwai 
 ALSA: seq: Avoid invalid lockdep class warning

Takashi Iwai 
 ALSA: seq: Fix OSS sysex delivery in OSS emulation

Mark Rutland 
 ARM: 8720/1: ensure dump_instr() checks addr_limit

Eric Biggers 
 KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]

Andrey Ryabinin 
 crypto: x86/sha256-mb - fix panic due to unaligned access

Andrey Ryabinin 
 crypto: x86/sha1-mb - fix panic due to unaligned access

Romain Izard 
 crypto: ccm - preserve the IV buffer

Li Bin 
 workqueue: Fix NULL pointer dereference

Peter Zijlstra 
 x86/uaccess, sched/preempt: Verify access_ok() context

Carlo Caione 
 platform/x86: hp-wmi: Do not shadow error values

Carlo Caione 
 platform/x86: hp-wmi: Fix error value for hp_wmi_tablet_state

Eric Biggers 
 KEYS: trusted: fix writing past end of buffer in trusted_read()

Eric Biggers 
 KEYS: tr

Re: [PATCH 1/2] arm64: mm: abort uaccess retries upon fatal signal

2017-11-13 Thread Rabin Vincent

On Tue, Aug 22, 2017 at 10:45:24AM +0100, Will Deacon wrote:
> On Mon, Aug 21, 2017 at 02:42:03PM +0100, Mark Rutland wrote:
> > On Tue, Jul 11, 2017 at 03:58:49PM +0100, Will Deacon wrote:
> > > On Tue, Jul 11, 2017 at 03:19:22PM +0100, Mark Rutland wrote:
> > > > When there's a fatal signal pending, arm64's do_page_fault()
> > > > implementation returns 0. The intent is that we'll return to the
> > > > faulting userspace instruction, delivering the signal on the way.
> > > > 
> > > > However, if we take a fatal signal during fixing up a uaccess, this
> > > > results in a return to the faulting kernel instruction, which will be
> > > > instantly retried, resulting in the same fault being taken forever. As
> > > > the task never reaches userspace, the signal is not delivered, and the
> > > > task is left unkillable. While the task is stuck in this state, it can
> > > > inhibit the forward progress of the system.
> > > > 
> > > > To avoid this, we must ensure that when a fatal signal is pending, we
> > > > apply any necessary fixup for a faulting kernel instruction. Thus we
> > > > will return to an error path, and it is up to that code to make forward
> > > > progress towards delivering the fatal signal.
> > > > 
> > > > Signed-off-by: Mark Rutland 
> > > > Reviewed-by: Steve Capper 
> > > > Tested-by: Steve Capper 
> > > > Cc: Catalin Marinas 
> > > > Cc: James Morse 
> > > > Cc: Laura Abbott 
> > > > Cc: Will Deacon 
> > > > Cc: sta...@vger.kernel.org
> > > > ---
> > > >  arch/arm64/mm/fault.c | 5 -
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > > > index 37b95df..3952d5e 100644
> > > > --- a/arch/arm64/mm/fault.c
> > > > +++ b/arch/arm64/mm/fault.c
> > > > @@ -397,8 +397,11 @@ static int __kprobes do_page_fault(unsigned long 
> > > > addr, unsigned int esr,
> > > >  * signal first. We do not need to release the mmap_sem because 
> > > > it
> > > >  * would already be released in __lock_page_or_retry in 
> > > > mm/filemap.c.
> > > >  */
> > > > -   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > > +   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> > > > +   if (!user_mode(regs))
> > > > +   goto no_context;
> > > > return 0;
> > > > +   }
> > > 
> > > This will need rebasing at -rc1 (take a look at current HEAD).
> > > 
> > > Also, I think it introduces a weird corner case where we take a page fault
> > > when writing the signal frame to the user stack to deliver a SIGSEGV. If
> > > we end up with VM_FAULT_RETRY and somebody has sent a SIGKILL to the task,
> > > then we'll fail setup_sigframe and force an un-handleable SIGSEGV instead
> > > of SIGKILL.
> > > 
> > > The end result (task is killed) is the same, but the fatal signal is 
> > > wrong.
> > 
> > That doesn't seem to be the case, testing on v4.13-rc5.
> > 
> > I used sigaltstack() to use the userfaultfd region as signal stack,
> > registered a SIGSEGV handler, and dereferenced NULL. The task locks up,
> > but when killed with a SIGINT or SIGKILL, the exit status reflects that
> > signal, rather than the SIGSEGV.
> > 
> > If I move the SIGINT handler onto the userfaultfd-monitored stack, then
> > delivering SIGINT hangs, but can be killed with SIGKILL, and the exit
> > status reflects that SIGKILL.
> > 
> > As you say, it does look like we'd try to set up a deferred SIGSEGV for
> > the failed signal delivery.
> > 
> > I haven't yet figured out exactly how that works; I'll keep digging.
> 
> The SEGV makes it all the way into do_group_exit, but then signal_group_exit
> is set and the exit_code is overridden with SIGKILL at the last minute (see
> complete_signal).

Unfortunately, this last minute is too late for print-fatal-signals.
With print-fatal-signals enabled, this patch leads to misleading
"potentially unexpected fatal signal 11" warnings if a process is
SIGKILL'd at the right time.

I've seen it without userfaultfd, but it's most easily reproduced by
patching Mark's original test code [1] with the following patch and
simply running "pkill -WINCH foo; pkill -KILL foo".  This results in:

 foo: potentially unexpected fatal signal 11.
 CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
 task: b3534780 task.stack: b4b7c000
 PC is at 0x76effb60
 LR is at 0x4227f4
 pc : [<76effb60>]lr : [<004227f4>]psr: 600b0010
 sp : 7eaf7bb4  ip :   fp : 
 r10: 0001  r9 : 0003  r8 : 76fcd000
 r7 : 001d  r6 : 76fd0cf0  r5 : 7eaf7c08  r4 : 
 r3 :   r2 :   r1 : 7eaf7a88  r0 : fffc
 Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
 Control: 10c5387d  Table: 3357404a  DAC: 0055
 CPU: 1 PID: 1793 Comm: foo Not tainted 4.9.58-devel #3
 [<801113f0>] (unwind_backtrace) from [<8010cfb0>] (show_stack+0x18/0x1c)
 [<8010cfb0>] (show_stack) from [<8039725c>] (dump_stack+0x84/0

Re: [PATCH v2] coccinelle: fix parallel build with CHECK=scripts/coccicheck

2017-11-13 Thread Julia Lawall

On Tue, 14 Nov 2017, Masahiro Yamada wrote:

> Hi Julia,
>
> 2017-11-14 1:45 GMT+09:00 Julia Lawall :
> >
> >
> > On Tue, 14 Nov 2017, Masahiro Yamada wrote:
> >
> >> Hi Julia,
> >>
> >>
> >> 2017-11-14 0:30 GMT+09:00 Julia Lawall :
> >> >
> >> >
> >> > On Thu, 9 Nov 2017, Masahiro Yamada wrote:
> >> >
> >> >> The command "make -j8 C=1 CHECK=scripts/coccicheck" produces lots of
> >> >> "coccicheck failed" error messages.
> >> >>
> >> >> I do not know the coccinelle internals, but I guess --jobs does not
> >> >> work well if spatch is invoked from Make running in parallel.
> >> >> Disable --jobs in this case.
> >> >
> >> > Why is this change under:
> >> >
> >> > if [ "$C" = "1" -o "$C" = "2" ];
> >> >
> >> > The coccicheck failed messages come also if one runs Coccinelle on the
> >> > entire kernel.
> >>
> >> As far as I tested, "coccicheck failed" error only happens
> >> when ONLINE=1.
> >>
> >>
> >> make -j8 C=1 CHECK=scripts/coccicheck  
> >> COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> emits lots of errors.
> >>
> >>
> >> make -j8 coccicheck  COCCI=scripts/coccinelle/misc/bugon.cocci
> >>
> >> is fine.
> >>
> >>
> >> Have you tested it?
> >> Do you mean you got a different result from mine?
> >
> > I agree with your results, with respect to the number of errors.
> >
> > julia
> >
>
> So, what shall we do?
>
> If you do not like to fix it (or you can fix coccinelle itself),
> I can take back this patch.

I'm OK with your fix.  I will check and ack it today.

> I am not a coccinelle developer, so
> setting USE_JOBS="no" is the best I can do.

The problem on the Coccinelle side is that it uses a subdirectory with the
name of the semantic patch to store standard output and standard error for
the different threads.  I didn't want to use a name with the pid, so that
one could easily find this information while Coccinelle is running.
Normally the subdirectory is cleaned up when Coccinelle completes, so
there is only one of them at a time.  Maybe it is best to just add the
pid.  There is the risk that these subdirectories will accumulate if
Coccinelle crashes in a way such that they don't get cleaned up, but
Coccinelle could print a warning if it detects this case, rather than
failing.

Still I think it is useful to do something on the make coccicheck side,
because there is no need for the double layer of parallelism.

julia

Re: [PATCH v6 4/5] crash: export paddr_vmcoreinfo_note()

2017-11-13 Thread Dave Young

On 11/13/17 at 08:29pm, Marc-André Lureau wrote:
> The following patch is going to use the symbol from the fw_cfg module,
> to call the function and write the note location details in the
> vmcoreinfo entry, so qemu can produce dumps with the vmcoreinfo note.
> 
> CC: Andrew Morton 
> CC: Baoquan He 
> CC: Dave Young 
> CC: Dave Young 
> CC: Hari Bathini 
> CC: Tony Luck 
> CC: Vivek Goyal 
> Signed-off-by: Marc-André Lureau 
> Acked-by: Gabriel Somlo 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 6db80fc0810b..47541c891810 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -375,6 +375,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  {
>   return __pa(vmcoreinfo_note);
>  }
> +EXPORT_SYMBOL(paddr_vmcoreinfo_note);
>  
>  static int __init crash_save_vmcoreinfo_init(void)
>  {
> -- 
> 2.15.0.125.g8f49766d64
> 

Acked-by: Dave Young 

Thanks
Dave

Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim

On Mon, Nov 13, 2017 at 01:15:30PM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [171110 07:36]:
> > * Joonsoo Kim  [171110 06:34]:
> > > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > > +#define OMAP34XX_SRAM_PHYS 0x4020
> > > > +#define OMAP34XX_SRAM_VIRT 0xd001
> > > > +#define OMAP34XX_SRAM_SIZE 0x1
> > > 
> > > For my testing environment, vmalloc address space is started at
> > > roughly 0xe000 so 0xd001 would not be valid.
> > 
> > Well we can map it anywhere we want, got any preferences?
> 
> Hmm and I'm also wondering what you do to make vmalloc space to
> start at 0xe000 instead of 0xd000?

Please see my other reply.

> 
> The reason I'm asking is because I think we can just move all of
> save_secure_ram_context to run from DDR instead of SRAM. But I'd
> rather do a minimal patch first that fixes your series and then we
> can test the further changes with more time.

Okay. I agree: make a minimal patch first and then move on to the next step.

> After moving save_secure_ram_context to DDR, we can call
> save_secure_ram_context directly with something like:
> 
>   args_pa = __pa(omap3_secure_ram_storage);
>   offset = (unsigned long)omap3_secure_ram_storage - args_pa;
>   ret = save_secure_ram_context(args_pa, offset);
> 
> > Just that the current save_secure_ram_context uses "high_mask"
> > of 0x to translate the address. To make this more flexible,
> > we need the save_secure_ram_context changes too. So we might
> > want to do the static mapping and save_secure_ram_context changes
> > as a single patch.
> > 
> > > And, PHYS can be different according to the system type. Maybe either
> > > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > > be considered, too. Is my understanding correct?
> > 
> > We can have a static map for the whole SRAM area, see function
> > __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> > static mapping whenever possible". So the different public SRAM start
> > addresses and sizes don't matter there.
> 
> And then if save_secure_ram_context runs in DDR, no static map is
> needed.

Okay.

Thanks.

Re: Fwd: linux v4.14 causes firmware iwlwifi errors on Lenovo Thinkpad T440s

2017-11-13 Thread Luciano Coelho

On Mon, 2017-11-13 at 16:23 -0600, Larry Finger wrote:
> On 11/13/2017 03:30 PM, Bartosz Golaszewski wrote:
> > 2017-11-13 21:45 GMT+01:00 Larry Finger 
> > :
> > > On 11/13/2017 02:22 PM, Bartosz Golaszewski wrote:
> > > > 
> > > > Forwarding here too as I messed up the address the last time.
> > > > --
> > > > 
> > > > Hi,
> > > > 
> > > > I noticed my wireless interface can't get up with linux v4.14
> > > > and the
> > > > kernel log is flooded with firmware errors:
> > > > 
> > > > iwlwifi :03:00.0: Firmware error during reconfiguration -
> > > > reprobe!
> > > > iwlwifi :03:00.0: FW error in SYNC CMD DQA_ENABLE_CMD
> > > > 
> > > > and
> > > > 
> > > > ieee80211 phy63: Hardware restart was requested.
> > > > 
> > > > The wireless controller is: Intel Corporation Wireless 7260
> > > > (rev 83)
> > > > Firmware used is: iwlwifi-7260-17
> > > > 
> > > > Everything works fine with v4.13.12.
> > > > 
> > > > I didn't have time today to bisect for the offending commit.
> > > > Full log
> > > > uploaded[1].
> > > > 
> > > > Best regards,
> > > > Bartosz Golaszewski
> > > > 
> > > > [1] https://pastebin.com/jksqxvS6
> > > 
> > > 
> > > Your log shows "iwlwifi :03:00.0: loaded firmware version
> > > 17.228510.0
> > > op_mode iwlmvm"
> > > 
> > > Mine, where the 7260 works, shows "iwlwifi :04:00.0: loaded
> > > firmware
> > > version 17.459231.0 op_mode iwlmvm".
> > > 
> > > It seems as if you need newer firmware. A detailed file listing
> > > shows
> > > "-rw-r--r-- 1 root root 1049340 Oct  9 12:03
> > > /lib/firmware/iwlwifi-7260-17.ucode". That date is likely when I
> > > installed
> > > the updated kernel firmware package from my distro. The md5sum
> > > for the file
> > > is 73a217f55c47d3a70bb5dbbe1d676423.
> > > 
> > > Larry
> > > 
> > 
> > Ok so it seems the version in linux-firmware is outdated. The file
> > you're using is available on github[1] and fixed the issue for me.
> > 
> > Thanks!
> > Bartosz Golaszewski
> > 
> > [1] https://github.com/OpenELEC/iwlwifi-firmware
> 
> Interesting. Using md5sum of the git repo for linux-firmware gets
> 73a217f55c47d3a70bb5dbbe1d676423  iwlwifi-7260-17.ucode.
> 
> That is the file I'm using.

You shouldn't use firmware from github or anywhere else except from
the two official places where we release it:

Mainline (but slow to get updated):

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/iwlwifi-7260-17.ucode


Our official public tree:

https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7260-17.ucode


And, of course, your distro may distribute these in an official package
as well, but it's good to check the versions to be sure you're running
the latest one we released.

--
Cheers,
Luca.

Re: n900 in next-20170901

2017-11-13 Thread Joonsoo Kim

On Fri, Nov 10, 2017 at 07:36:20AM -0800, Tony Lindgren wrote:
> * Joonsoo Kim  [171110 06:34]:
> > On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> > > +#define OMAP34XX_SRAM_PHYS   0x4020
> > > +#define OMAP34XX_SRAM_VIRT   0xd001
> > > +#define OMAP34XX_SRAM_SIZE   0x1
> > 
> > For my testing environment, vmalloc address space is started at
> > roughly 0xe000 so 0xd001 would not be valid.
> 
> Well we can map it anywhere we want, got any preferences?

My testing environment is a beagle-(xm?) machine under QEMU. It is configured
with CONFIG_VMSPLIT_3G=y, so the kernel address space starts at 0xc000.
It has 512 MB of memory, so 0xc000 ~ 0xdff0 is used for the
direct mapping. See below.

[0.00] Memory: 429504K/522240K available (11264K kernel code, 1562K rwdata, 4288K rodata, 2048K init, 405K bss, 27200K reserved, 65536K cma-reserved, 0K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xe000 - 0xff80   ( 504 MB)
[0.00] lowmem  : 0xc000 - 0xdff0   ( 511 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00]   .text : 0xc0208000 - 0xc0e0   (12256 kB)
[0.00]   .init : 0xc130 - 0xc150   (2048 kB)
[0.00]   .data : 0xc150 - 0xc1686810   (1563 kB)
[0.00].bss : 0xc168fc68 - 0xc16f512c   ( 406 kB)

Therefore, if OMAP34XX_SRAM_VIRT is 0xd001, the direct mapping is
broken and the system doesn't work. I guess we should use a more
stable address, such as 0xf000.
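
A quick way to see the conflict is to test candidate addresses against the layout printed above. The sketch below rests on assumptions: the archive shows the addresses with digits dropped, so the full 32-bit bounds here are reconstructed from the region sizes in the log (511 MB lowmem, 504 MB vmalloc), and 0xd0010000 and 0xf0000000 are assumed expansions of the two candidate virtual addresses. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds reconstructed from the boot log sizes (assumption, see lead-in). */
#define LOWMEM_START	0xc0000000u
#define LOWMEM_END	0xdff00000u	/* LOWMEM_START + 511 MB */
#define VMALLOC_START	0xe0000000u
#define VMALLOC_END	0xff800000u	/* VMALLOC_START + 504 MB */

/* Half-open range test: start <= addr < end. */
static int in_range(uint32_t addr, uint32_t start, uint32_t end)
{
	return addr >= start && addr < end;
}

/* True if the address is covered by the kernel direct mapping. */
static int in_lowmem(uint32_t addr)
{
	return in_range(addr, LOWMEM_START, LOWMEM_END);
}

/* True if the address lies in the vmalloc/ioremap area. */
static int in_vmalloc(uint32_t addr)
{
	return in_range(addr, VMALLOC_START, VMALLOC_END);
}
```

Under these assumptions, in_lowmem(0xd0010000u) is true, which is the overlap with the direct mapping described above, while 0xf0000000 falls inside the vmalloc range.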

> 
> Just that the current save_secure_ram_context uses "high_mask"
> of 0x to translate the address. To make this more flexible,
> we need the save_secure_ram_context changes too. So we might
> want to do the static mapping and save_secure_ram_context changes
> as a single patch.
> 
> > And, PHYS can be different according to the system type. Maybe either
> > OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
> > be considered, too. Is my understanding correct?
> 
> We can have a static map for the whole SRAM area, see function
> __arm_ioremap_pfn_caller() for the comment "Try to reuse one of the
> static mapping whenever possible". So the different public SRAM start
> addresses and sizes don't matter there.

Okay. That looks fine for the SRAM start addresses and sizes. However, we
need to consider mtype, since __arm_ioremap_pfn_caller() doesn't reuse the
mapping if mtype is different. mtype can be either MT_MEMORY_RWX or
MT_MEMORY_RWX_NONCACHED.

Thanks.

[RESEND PATCH v2 2/4] x86/umip: Inform that UMIP has been enabled

2017-11-13 Thread Ricardo Neri

Let us have an indication that this feature has been enabled.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Ingo Molnar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 13ae9e5..fa998ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -341,6 +341,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
cr4_set_bits(X86_CR4_UMIP);
 
+   pr_info("x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature\n");
+
return;
 
 out:
-- 
2.7.4

[PATCH] usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub

2017-11-13 Thread Kai-Heng Feng

KY-688 USB 3.1 Type-C Hub internally uses a Genesys Logic hub to connect
to Realtek r8153.

Similar to commit 7496cfe5431f2 ("usb: quirks: Add no-lpm quirk for Moshi
USB to Ethernet Adapter"), no-lpm can make r8153 ethernet work.

Signed-off-by: Kai-Heng Feng 
---
 drivers/usb/core/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index a6aaf2f193a4..12246da8fcf6 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -151,6 +151,9 @@ static const struct usb_device_id usb_quirk_list[] = {
/* appletouch */
{ USB_DEVICE(0x05ac, 0x021a), .driver_info = USB_QUIRK_RESET_RESUME },
 
+   /* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */
+   { USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM },
+
/* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter */
{ USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM },
 
-- 
2.14.1
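
How a table like this is consulted can be sketched in userspace C. This is a minimal model under stated assumptions: struct quirk_entry, lookup_quirks() and the flag bit values are hypothetical and do not reproduce the kernel's definitions; only the vendor/product IDs come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define QUIRK_RESET_RESUME	(1u << 0)
#define QUIRK_NO_LPM		(1u << 1)

struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned int quirks;
};

/* IDs taken from the hunk above; everything else is illustrative. */
static const struct quirk_entry quirk_table[] = {
	{ 0x05ac, 0x021a, QUIRK_RESET_RESUME },	/* appletouch */
	{ 0x05e3, 0x0612, QUIRK_NO_LPM },	/* hub inside KY-688 */
	{ 0x05e3, 0x0616, QUIRK_NO_LPM },	/* hub inside Moshi adapter */
};

/* Return the quirk flags for a device, or 0 if it is not listed. */
static unsigned int lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].product == product)
			return quirk_table[i].quirks;
	}
	return 0;
}
```

A device that reports 0x05e3:0x0612 gets the no-LPM flag, while an unlisted product ID from the same vendor gets no quirks at all.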

[RESEND PATCH v2 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ricardo Neri

The instructions STR and SLDT are not emulated in any case. Thus, it made
sense to not implement functionality to identify them. However, a
subsequent commit will introduce functionality to warn about the use of
all the instructions that UMIP protects, not only those that are emulated.
A first step for that is the ability to identify them.

Plus, now that STR and SLDT are identified, we need to explicitly avoid
their emulation (i.e., not rely on unsuccessful identification). Group
together all the cases that we do not want to emulate: STR, SLDT and user
long mode processes.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Signed-off-by: Ricardo Neri 
---
This patch also corrects the #define of SMSW. This change does not have a
functional impact as it is only used as an identifier.
---
 arch/x86/kernel/umip.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 6ba82be..2e09b5b 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -78,7 +78,9 @@
 
 #defineUMIP_INST_SGDT  0   /* 0F 01 /0 */
 #defineUMIP_INST_SIDT  1   /* 0F 01 /1 */
-#defineUMIP_INST_SMSW  3   /* 0F 01 /4 */
+#defineUMIP_INST_SMSW  2   /* 0F 01 /4 */
+#defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
+#defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
 /**
  * identify_insn() - Identify a UMIP-protected instruction
@@ -118,10 +120,16 @@ static int identify_insn(struct insn *insn)
default:
return -EINVAL;
}
+   } else if (insn->opcode.bytes[1] == 0x0) {
+   if (X86_MODRM_REG(insn->modrm.value) == 0)
+   return UMIP_INST_SLDT;
+   else if (X86_MODRM_REG(insn->modrm.value) == 1)
+   return UMIP_INST_STR;
+   else
+   return -EINVAL;
+   } else {
+   return -EINVAL;
}
-
-   /* SLDT AND STR are not emulated */
-   return -EINVAL;
 }
 
 /**
@@ -267,10 +275,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!regs)
return false;
 
-   /* Do not emulate 64-bit processes. */
-   if (user_64bit_mode(regs))
-   return false;
-
/*
 * If not in user-space long mode, a custom code segment could be in
 * use. This is true in protected mode (if the process defined a local
@@ -322,6 +326,11 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   /* Do not emulate SLDT, STR or user long mode processes. */
+   if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
+   user_64bit_mode(regs))
+   return false;
+
if (emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
return false;
 
-- 
2.7.4
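
The identification step can be sketched outside the kernel as follows. This is a simplified model, assuming the opcode bytes and the ModRM byte have already been extracted: identify_umip_insn() and modrm_reg() are hypothetical names, and the real code works on a decoded struct insn. The encodings themselves (0F 01 /0, /1, /4 and 0F 00 /0, /1) are the ones listed in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Identifiers numbered as in the patch above. */
enum {
	INSN_UNKNOWN = -1,
	INSN_SGDT = 0,	/* 0F 01 /0 */
	INSN_SIDT = 1,	/* 0F 01 /1 */
	INSN_SMSW = 2,	/* 0F 01 /4 */
	INSN_SLDT = 3,	/* 0F 00 /0 */
	INSN_STR = 4,	/* 0F 00 /1 */
};

/* The reg field occupies bits 5:3 of the ModRM byte. */
static int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/* Map a two-byte opcode plus ModRM byte to one of the identifiers. */
static int identify_umip_insn(uint8_t op0, uint8_t op1, uint8_t modrm)
{
	if (op0 != 0x0f)
		return INSN_UNKNOWN;

	if (op1 == 0x01) {
		switch (modrm_reg(modrm)) {
		case 0: return INSN_SGDT;
		case 1: return INSN_SIDT;
		case 4: return INSN_SMSW;
		default: return INSN_UNKNOWN;
		}
	}

	if (op1 == 0x00) {
		switch (modrm_reg(modrm)) {
		case 0: return INSN_SLDT;
		case 1: return INSN_STR;
		default: return INSN_UNKNOWN;
		}
	}

	return INSN_UNKNOWN;
}
```

For instance, the bytes 0F 00 C8 (str %ax) have reg field 1 and are identified as STR, which the caller then declines to emulate.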

[RESEND PATCH v2 4/4] x86/umip: Warn if UMIP-protected instructions are used

2017-11-13 Thread Ricardo Neri

Issue a rate-limited warning whenever any of the instructions that UMIP
protects (i.e., SGDT, SIDT, SLDT, STR and SMSW) are used by user space
programs.

This is useful because, with UMIP enabled, the few programs that use such
instructions will start receiving a SIGSEGV signal. In the specific cases
for which emulation is provided (instructions SGDT, SIDT and SMSW in
protected and virtual-8086 modes), a warning is also helpful to encourage
updates in such programs to avoid the use of such instructions.

An existing rate-limited pr_err() is converted to use the new function
umip_pr_warn() in order to have it printing at the same rate and log
level.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Linus Torvalds 
Signed-off-by: Ricardo Neri 
---
 arch/x86/kernel/umip.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 2e09b5b..50f4b11 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -82,6 +82,54 @@
 #defineUMIP_INST_SLDT  3   /* 0F 00 /0 */
 #defineUMIP_INST_STR   4   /* 0F 00 /1 */
 
+const char * const umip_insns[5] = {
+   [UMIP_INST_SGDT] = "sgdt",
+   [UMIP_INST_SIDT] = "sidt",
+   [UMIP_INST_SMSW] = "smsw",
+   [UMIP_INST_SLDT] = "sldt",
+   [UMIP_INST_STR] = "str",
+};
+
+/*
+ * If you change these strings, ensure that buffers using them are sufficiently
+ * large.
+ */
+static const char umip_warn_use[] = "cannot be used by applications.";
+static const char umip_warn_emu[] = "For now, expensive software emulation returns result.";
+
+/**
+ * umip_pr_warn() - Print a rate-limited warning
+ * @regs:  Register set with the context in which the warning is printed
+ * @msg:   Pointer to a string with the warning message
+ * @error: Error code to print along with the warning
+ *
+ * Print the message contained in @msg along with the task name, ID number and
+ * instruction and stack pointers of the associated process. Optionally, an
+ * error code is printed if @error is not zero. These warning messages are
+ * limited to a burst of 5 messages every two minutes.
+ *
+ * Returns:
+ *
+ * None.
+ */
+static void umip_pr_warn(struct pt_regs *regs, char *msg, long error)
+{
+   struct task_struct *tsk = current;
+   char err_str[8 + BITS_PER_LONG / 4] = "";
+
+   /* Bursts of 5 messages every two minutes */
+   static DEFINE_RATELIMIT_STATE(ratelimit, 2 * 60 * HZ, 5);
+
+   if (!__ratelimit(&ratelimit))
+   return;
+
+   if (error)
+   snprintf(err_str, sizeof(err_str), " error:%lx", error);
+
+   pr_warn("%s[%d] %s ip:%lx sp:%lx%s\n", tsk->comm, task_pid_nr(tsk), msg,
+   regs->ip, regs->sp, err_str);
+}
+
 /**
  * identify_insn() - Identify a UMIP-protected instruction
  * @insn:  Instruction structure with opcode and ModRM byte.
@@ -236,10 +284,7 @@ static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs)
if (!(show_unhandled_signals && unhandled_signal(tsk, SIGSEGV)))
return;
 
-   pr_err_ratelimited("%s[%d] umip emulation segfault ip:%lx sp:%lx error:%x in %lx\n",
-  tsk->comm, task_pid_nr(tsk), regs->ip,
-  regs->sp, X86_PF_USER | X86_PF_WRITE,
-  regs->ip);
+   umip_pr_warn(regs, "segfault at", X86_PF_USER | X86_PF_WRITE);
 }
 
 /**
@@ -264,10 +309,10 @@ static void force_sig_info_umip_fault(void __user *addr, struct pt_regs *regs)
 bool fixup_umip_exception(struct pt_regs *regs)
 {
int not_copied, nr_copied, reg_offset, dummy_data_size, umip_inst;
+   unsigned char buf[MAX_INSN_SIZE], warn[128];
unsigned long seg_base = 0, *reg_addr;
/* 10 bytes is the maximum size of the result of UMIP instructions */
unsigned char dummy_data[10] = { 0 };
-   unsigned char buf[MAX_INSN_SIZE];
void __user *uaddr;
struct insn insn;
char seg_defs;
@@ -326,10 +371,18 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (umip_inst < 0)
return false;
 
+   snprintf(warn, sizeof(warn), "%s %s", umip_insns[umip_inst],
+umip_warn_use);
+
/* Do not emulate SLDT, STR or user long mode processes. */
if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT ||
-   user_64bit_mode(regs))
+   user_64bit_mode(regs)) {
+   umip_pr_warn(regs, warn, 0);
return false;
+   }
+
+   snprintf(warn, sizeof(warn), "%s %s", warn, umip_warn_emu);
+   umip_pr_warn(regs, warn, 0);
 
if (emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
return false;
-- 
2.7.4
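
The "burst of 5 messages every two minutes" behaviour can be illustrated with a small userspace model. This is a simplified sketch of a fixed-window limiter, not the kernel's ratelimit implementation: struct ratelimit, ratelimit_allow() and count_allowed() are hypothetical names, and time is passed in explicitly as seconds instead of being read from jiffies.

```c
#include <assert.h>

struct ratelimit {
	long interval;	/* window length, in seconds in this model */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages emitted in the current window */
	int started;	/* nonzero once the first window has opened */
};

/* Return 1 if a message may be printed at time 'now', 0 if suppressed. */
static int ratelimit_allow(struct ratelimit *rl, long now)
{
	/* Open a fresh window on first use or once the old one expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->started = 1;
		rl->begin = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return 0;

	rl->printed++;
	return 1;
}

/* Simulate 'events' messages spaced 'spacing' seconds apart, starting at 0. */
static int count_allowed(long interval, int burst, int events, long spacing)
{
	struct ratelimit rl = { interval, burst, 0, 0, 0 };
	int i, allowed = 0;

	for (i = 0; i < events; i++)
		allowed += ratelimit_allow(&rl, i * spacing);

	return allowed;
}
```

With a 120-second window and a burst of 5, twenty warnings arriving one second apart produce only five log lines in this model; the rest are dropped until the window rolls over.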

[RESEND PATCH v2 0/4] x86: Tweaks for UMIP

2017-11-13 Thread Ricardo Neri

[To tip maintainers: This is a resend to copy the Linux kernel mailing
list. No changes in the patches since my original v2 submission.]

Now that the support for UMIP [1], [2] has been merged in the tip tree,
this series adds a couple of tweaks.

Ingo asked for two small additions to select UMIP by default when building
and inform of this feature being enabled [3].

Also, Linus suggested issuing a rate-limited warning whenever any of
the instructions that UMIP protects are used by user space programs [4].
This is useful to give programs a hint on the reason for which they start
seeing an unexpected SIGSEGV signal. Also, it helps to encourage updates
to those programs and avoid using these instructions if possible.

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2017/10/27/699
[2]. https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1523438.html
[3]. https://lkml.org/lkml/2017/11/8/238
[4]. https://lkml.org/lkml/2017/11/8/593

Changes since V1:
* Capitalize all the instructions' mnemonics in both code and patch
  descriptions.
* Correct documentation of umip_pr_warn() to correctly reflect the function
  name.
* Update description of patch #4 to describe the update to the existing
  rate-limited pr_err().

Ricardo Neri (4):
  x86/umip: Select X86_INTEL_UMIP by default
  x86/umip: Inform that UMIP has been enabled
  x86/umip: Identify the str and sldt instructions
  x86/umip: Warn if UMIP-protected instructions are used

 arch/x86/Kconfig | 10 -
 arch/x86/kernel/cpu/common.c |  2 +
 arch/x86/kernel/umip.c   | 88 +---
 3 files changed, 85 insertions(+), 15 deletions(-)

-- 
2.7.4
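
The rate-limited warning mentioned in the cover letter can be illustrated with a small userspace sketch. This is not the kernel's ratelimit implementation; struct ratelimit and ratelimit_allow() are hypothetical names, and time is passed in by the caller as an abstract tick count so the behavior is deterministic:

```c
#include <stdbool.h>

/* Hypothetical limiter: at most 'burst' messages per 'interval' ticks. */
struct ratelimit {
	long interval;	/* window length in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_allow(struct ratelimit *rl, long now)
{
	/* Open a new window once the previous one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

A caller would wrap each warning in `if (ratelimit_allow(&rl, now))`, so a program that executes a protected instruction in a tight loop cannot flood the log.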

[RESEND PATCH v2 1/4] x86/umip: Select X86_INTEL_UMIP by default

2017-11-13 Thread Ricardo Neri

UMIP does not incur a significant performance penalty. Furthermore, it
is triggered only when a small group of instructions is used from user
space programs.

While here, provide more details on the benefits UMIP provides and the
behavior that the few applications using the instructions protected by
UMIP can expect.

Cc: Andy Lutomirski 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Paolo Bonzini 
Cc: Ravi V. Shankar 
Cc: x...@kernel.org
Suggested-by: Ingo Molnar 
Signed-off-by: Ricardo Neri 
---
 arch/x86/Kconfig | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f08977d..a524a7a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1805,14 +1805,20 @@ config X86_SMAP
  If unsure, say Y.
 
 config X86_INTEL_UMIP
-   def_bool n
+   def_bool y
depends on CPU_SUP_INTEL
prompt "Intel User Mode Instruction Prevention" if EXPERT
---help---
  The User Mode Instruction Prevention (UMIP) is a security
  feature in newer Intel processors. If enabled, a general
  protection fault is issued if the instructions SGDT, SLDT,
- SIDT, SMSW and STR are executed in user mode.
+ SIDT, SMSW and STR are executed in user mode. These instructions
+ unnecessarily expose information about the hardware state.
+
+ The vast majority of applications do not use these instructions.
+ For the very few that do, software emulation is provided in
+ specific cases in protected and virtual-8086 modes. Emulated
+ results are dummy.
 
 config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)"
-- 
2.7.4

Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Wanpeng Li

2017-11-13 18:46 GMT+08:00 Peter Zijlstra :
> On Mon, Nov 13, 2017 at 04:26:57PM +0800, Wanpeng Li wrote:
>> 2017-11-13 16:04 GMT+08:00 Peter Zijlstra :
>
>> > So if at this point a vCPU gets preempted we'll still spin-wait for it,
>> > which is sub-optimal.
>> >
>> > I think we can come up with something to get around that 'problem' if
>> > indeed it is a problem. But we can easily do that as follow up patches.
>> > Just let me know if you think it's worth spending more time on.
>>
>> You can post your idea, it is always smart. :) Then we can evaluate
>> the complexity and gains.
>
> I'm not sure I have a fully baked idea just yet, but the general idea
> would be something like:
>
>  - switch (back) to a dedicated TLB invalidate IPI
>
>  - introduce KVM_VCPU_IPI_PENDING
>
>  - change flush_tlb_others() into something like:
>
>for_each_cpu(cpu, flushmask) {
>  src = &per_cpu(steal_time, cpu);
>  state = READ_ONCE(src->preempted);
>  do {
>  if (state & KVM_VCPU_PREEMPTED) {
>  if (try_cmpxchg(&src->preempted, &state,
>  state | 
> KVM_VCPU_SHOULD_FLUSH)) {
>  __cpumask_clear_cpu(cpu, flushmask);
>  break;
>  }
>  }
>  } while (!try_cmpxchg(&src->preempted, &state,
>  state | KVM_VCPU_IPI_PENDING));
>}
>
>apic->send_IPI_mask(flushmask, CALL_TLB_VECTOR);
>
>for_each_cpu(cpu, flushmask) {
>  src = &per_cpu(steal_time, cpu);
>  smp_cond_load_acquire(&src->preempted, !(VAL & KVM_VCPU_IPI_PENDING));
>}
>
>
>  - have the TLB invalidate handler do something like:
>
>state = READ_ONCE(src->preempted);
>if (!(state & KVM_VCPU_IPI_PENDING))
>return;
>
>local_flush_tlb();
>
>do {
>} while (!try_cmpxchg(&src->preempted, &state,
>  state & ~KVM_VCPU_IPI_PENDING));

There are a lot of cases handled by flush_tlb_func_remote() ->
flush_tlb_func_common(), so I'm afraid of leaving a hole.

Regards,
Wanpeng Li

>
>  - then at VMEXIT time do something like:
>
>state = READ_ONCE(src->preempted);
>do {
> if (!(state & KVM_VCPU_IPI_PENDING))
> break;
>} while (!try_cmpxchg(&src->preempted, &state,
>  (state & ~KVM_VCPU_IPI_PENDING) |
>  KVM_VCPU_SHOULD_FLUSH));
>
>and clear any possible pending TLB_VECTOR in the guest state to avoid
>raising that IPI spuriously on enter again.
>
>
> This way the preemption will clear the IPI_PENDING and the
> flush_others() wait loop will terminate.
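
The sender-side decision in the proposal above (mark a preempted vCPU for a deferred flush, otherwise mark an IPI as pending) can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: the flag values are made up for the example, KVM_VCPU_IPI_PENDING does not exist in the kernel at this point in the thread, and request_remote_flush() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; assumptions, not the kernel's definitions. */
#define VCPU_PREEMPTED    (1u << 0)
#define VCPU_SHOULD_FLUSH (1u << 1)
#define VCPU_IPI_PENDING  (1u << 2)

/*
 * Decide how to reach one target vCPU. Returns true when the caller must
 * send an IPI and later wait for VCPU_IPI_PENDING to clear; returns false
 * when the vCPU was preempted and has been told to flush on its own.
 */
static bool request_remote_flush(_Atomic uint8_t *preempted)
{
	uint8_t state = atomic_load_explicit(preempted, memory_order_relaxed);
	uint8_t new_state;

	do {
		if (state & VCPU_PREEMPTED)
			new_state = state | VCPU_SHOULD_FLUSH;
		else
			new_state = state | VCPU_IPI_PENDING;
		/* On failure 'state' is reloaded and the choice is redone. */
	} while (!atomic_compare_exchange_weak(preempted, &state, new_state));

	return new_state & VCPU_IPI_PENDING;
}
```

A caller would clear the CPU from its flush mask whenever the function returns false, then send the IPI to the remaining CPUs and wait for each pending bit to clear.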

Re: Reply: [f2fs-dev] [PATCH RESEND] f2fs: validate before set/clear free nat bitmap

2017-11-13 Thread Chao Yu

On 2017/11/14 13:59, LiFan wrote:
> Sorry, it seems my company mail system automatically wraps long lines in
> outgoing mail.
> It looked fine in my Outlook client, so I overlooked it.

Maybe 'git send-email' can be one of your options to save some work in your
email client? ;)

Thanks,

> I haven't found a way to solve that yet, so please hold both of my new patches.
> I will fix it as soon as possible.
> 
> 
> -Original Message-
> From: Jaegeuk Kim [mailto:jaeg...@kernel.org] 
> Sent: November 14, 2017 12:54
> To: LiFan
> Cc: 'Chao Yu'; 'Chao Yu'; linux-kernel@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: validate before set/clear free nat
> bitmap
> 
> Sorry, I can't merge this patch due to wrong format.
> 
> On 11/11, LiFan wrote:
>> In flush_nat_entries, all dirty nats will be flushed and if their new 
>> address isn't NULL_ADDR, their bitmaps will be updated, the 
>> free_nid_count of the bitmaps will be increased regardless of whether 
>> the nats have already been occupied before. This could lead to wrong 
>> free_nid_count.
>> So this patch checks the status of the bits before actually set/clear
> them.
>>
>> Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT
>> blocks")
>>
>> Signed-off-by: Fan li 
>> ---
>>  fs/f2fs/node.c | 17 ++---
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d234c6e..b965a53 
>> 100644
>> --- a/fs/f2fs/node.c
>> +++ b/fs/f2fs/node.c
>> @@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct 
>> f2fs_sb_info *sbi, nid_t nid,
>>  if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
>>  return;
>>  
>> -if (set)
>> +if (set) {
>> +if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
>> +return;
>>  __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
>> -else
>> -__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
>> -
>> -if (set)
>>  nm_i->free_nid_count[nat_ofs]++;
>> -else if (!build)
>> -nm_i->free_nid_count[nat_ofs]--;
>> +} else {
>> +if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
>> +return;
>> +__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
>> +if (!build)
>> +nm_i->free_nid_count[nat_ofs]--;
>> +}
>>  }
>>  
>>  static void scan_nat_page(struct f2fs_sb_info *sbi,
>> --
>> 2.7.4
>>
> 
> 
> 
> 
>
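
The idea in the quoted patch, touching the counter only when a bit actually changes state, can be shown with a standalone sketch. The names below are hypothetical, and the sketch leaves out the patch's `build` condition and the little-endian bitmap helpers:

```c
#include <stdbool.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Report whether bit 'nr' is currently set in the bitmap. */
static bool bit_is_set(const unsigned long *map, unsigned int nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Set or clear bit 'nr' and keep *free_count in sync. The counter is
 * adjusted only on a real transition, so repeated set or clear requests
 * for the same bit cannot skew it.
 */
static void update_free_bit(unsigned long *map, unsigned int *free_count,
			    unsigned int nr, bool set)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];

	if (set) {
		if (*word & mask)
			return;		/* already free: nothing to count */
		*word |= mask;
		(*free_count)++;
	} else {
		if (!(*word & mask))
			return;		/* already in use: nothing to count */
		*word &= ~mask;
		(*free_count)--;
	}
}
```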

[p...@keyserver.paulfurley.com: PGP key expires in 3 days: 0x7179B76704ABA18B (it can be extended)]

2017-11-13 Thread Jarkko Sakkinen

James,

Refreshed my key with expiration +2y in keys.gnupg.net and pgp.mit.edu.
Please update.

/Jarkko

Re: [PATCH V2 net] net: hns3: Updates MSI/MSI-X alloc/free APIs(depricated) to new APIs

2017-11-13 Thread Yuval Shaia

On Thu, Nov 09, 2017 at 04:38:13PM +, Salil Mehta wrote:
> This patch migrates the HNS3 driver code from use of deprecated PCI
> MSI/MSI-X interrupt vector allocation/free APIs to new common APIs.
> 
> Signed-off-by: Salil Mehta 
> Suggested-by: Christoph Hellwig 
> ---
> PATCH V2: Yuval Shaia 
>   Link -> https://lkml.org/lkml/2017/11/9/138
> PATCH V1: Initial Submit
> ---
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 107 
> +++--
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  15 ++-
>  2 files changed, 42 insertions(+), 80 deletions(-)

Reviewed-by: Yuval Shaia 

> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> index c1cdbfd..d65c599 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> @@ -885,14 +885,14 @@ static int hclge_query_pf_resource(struct hclge_dev 
> *hdev)
>   hdev->pkt_buf_size = __le16_to_cpu(req->buf_size) << HCLGE_BUF_UNIT_S;
>  
>   if (hnae3_dev_roce_supported(hdev)) {
> - hdev->num_roce_msix =
> + hdev->num_roce_msi =
>   hnae_get_field(__le16_to_cpu(req->pf_intr_vector_number),
>  HCLGE_PF_VEC_NUM_M, HCLGE_PF_VEC_NUM_S);
>  
>   /* PF should have NIC vectors and Roce vectors,
>* NIC vectors are queued before Roce vectors.
>*/
> - hdev->num_msi = hdev->num_roce_msix  + HCLGE_ROCE_VECTOR_OFFSET;
> + hdev->num_msi = hdev->num_roce_msi  + HCLGE_ROCE_VECTOR_OFFSET;
>   } else {
>   hdev->num_msi =
>   hnae_get_field(__le16_to_cpu(req->pf_intr_vector_number),
> @@ -1835,7 +1835,7 @@ static int hclge_init_roce_base_info(struct hclge_vport 
> *vport)
>   struct hnae3_handle *roce = &vport->roce;
>   struct hnae3_handle *nic = &vport->nic;
>  
> - roce->rinfo.num_vectors = vport->back->num_roce_msix;
> + roce->rinfo.num_vectors = vport->back->num_roce_msi;
>  
>   if (vport->back->num_msi_left < vport->roce.rinfo.num_vectors ||
>   vport->back->num_msi_left == 0)
> @@ -1853,67 +1853,47 @@ static int hclge_init_roce_base_info(struct 
> hclge_vport *vport)
>   return 0;
>  }
>  
> -static int hclge_init_msix(struct hclge_dev *hdev)
> +static int hclge_init_msi(struct hclge_dev *hdev)
>  {
>   struct pci_dev *pdev = hdev->pdev;
> - int ret, i;
> -
> - hdev->msix_entries = devm_kcalloc(&pdev->dev, hdev->num_msi,
> -   sizeof(struct msix_entry),
> -   GFP_KERNEL);
> - if (!hdev->msix_entries)
> - return -ENOMEM;
> -
> - hdev->vector_status = devm_kcalloc(&pdev->dev, hdev->num_msi,
> -sizeof(u16), GFP_KERNEL);
> - if (!hdev->vector_status)
> - return -ENOMEM;
> + int vectors;
> + int i;
>  
> - for (i = 0; i < hdev->num_msi; i++) {
> - hdev->msix_entries[i].entry = i;
> - hdev->vector_status[i] = HCLGE_INVALID_VPORT;
> + vectors = pci_alloc_irq_vectors(pdev, 1, hdev->num_msi,
> + PCI_IRQ_MSI | PCI_IRQ_MSIX);
> + if (vectors < 0) {
> + dev_err(&pdev->dev,
> + "failed(%d) to allocate MSI/MSI-X vectors\n",
> + vectors);
> + return vectors;
>   }
> + if (vectors < hdev->num_msi)
> + dev_warn(&hdev->pdev->dev,
> +  "requested %d MSI/MSI-X, but allocated %d MSI/MSI-X\n",
> +  hdev->num_msi, vectors);
>  
> - hdev->num_msi_left = hdev->num_msi;
> - hdev->base_msi_vector = hdev->pdev->irq;
> + hdev->num_msi = vectors;
> + hdev->num_msi_left = vectors;
> + hdev->base_msi_vector = pdev->irq;
>   hdev->roce_base_vector = hdev->base_msi_vector +
>   HCLGE_ROCE_VECTOR_OFFSET;
>  
> - ret = pci_enable_msix_range(hdev->pdev, hdev->msix_entries,
> - hdev->num_msi, hdev->num_msi);
> - if (ret < 0) {
> - dev_info(&hdev->pdev->dev,
> -  "MSI-X vector alloc failed: %d\n", ret);
> - return ret;
> - }
> -
> - return 0;
> -}
> -
> -static int hclge_init_msi(struct hclge_dev *hdev)
> -{
> - struct pci_dev *pdev = hdev->pdev;
> - int vectors;
> - int i;
> -
>   hdev->vector_status = devm_kcalloc(&pdev->dev, hdev->num_msi,
>  sizeof(u16), GFP_KERNEL);
> - if (!hdev->vector_status)
> + if (!hdev->vector_status) {
> + pci_free_irq_vectors(pdev);
>   return -ENOMEM;
> + }
>  
>   for (i = 0; i < hdev->num_msi; i++)
>   hdev->vector_status[i] = HCLGE_INVALID_VPORT;
>  
> - vectors = pci_alloc_irq_vectors(pdev, 1, hde

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-13 Thread Chao Yu

On 2017/11/14 12:20, Jaegeuk Kim wrote:
> On 11/13, Hyunchul Lee wrote:
>> On 11/13/2017 10:59 AM, Chao Yu wrote:
>>> On 2017/11/13 9:35, Hyunchul Lee wrote:
 On 11/13/2017 10:26 AM, Chao Yu wrote:
> On 2017/11/13 8:24, Hyunchul Lee wrote:
>> On 11/10/2017 03:42 PM, Chao Yu wrote:
>>> On 2017/11/10 8:23, Hyunchul Lee wrote:
 Hello, Chao

 On 11/09/2017 06:12 PM, Chao Yu wrote:
> On 2017/11/9 13:51, Hyunchul Lee wrote:
>> From: Hyunchul Lee 
>>
>> Using write hints[1], applications can indicate the lifetime of the data
>> written to devices, and [2] reported that the write hints patch
>> decreased writes in NAND by 25%.
>>
>> These hints help F2FS determine the following:
>>   1) the segment types where the data will be written.
>>   2) the hints that will be passed down to devices with the data of 
>> segments.
>>
>> This patch set implements the first mapping from write hints to 
>> segment types
>> as shown below.
>>
>>   hints               segment type
>>   -----               ------------
>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>   others              CURSEG_WARM_DATA
>>
>> The F2FS policy for hot/cold separation has precedence over these 
>> hints, and
>> hints are not applied to in-place updates.
>
> Could we change to disable IPU if file/inode write hint is existing?
>

 I am afraid that this has side effects. For example, this could cause
 out-of-place updates even when there are not enough free segments. 
 I can write the patch that handles these situations. But I wonder 
 whether this is required, and I am not sure which IPU policies can be 
 disabled.
>>>
>>> Oh, As I replied in another thread, I think IPU just affects filesystem
>>> hot/cold separating, rather than this feature. So I think it will be 
>>> okay
>>> to not consider it.
>>>

>>
>> Before the second mapping is implemented, write hints are not passed 
>> down
>> to devices. Because it is better that the data of a segment have the 
>> same 
>> hint.
>>
>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>> [2]: https://lwn.net/Articles/726477/
>
> Could you write a patch to support passing write hint to block layer 
> for
> buffered writes as below commit:
> 0127251c45ae ("ext4: add support for passing in write hints for 
> buffered writes")
>

 Sure I will. I wrote it already ;)
>>>
>>> Cool, ;)
>>>
 I think that data from the same segment should be passed down with 
 the same
 hint, and the following mapping is reasonable. I wonder what your 
 opinion
 about it is.
 
   segment type       hints
   ------------       -----
   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
>>>
>>> We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
>>>
   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
>>>
>>> As I know, in scenario of cell phone, data of meta_inode is hottest, 
>>> then hot
>>> data, warm node, and cold node should be coldest. So I suggested we can 
>>> define
>>> as below:
>>>
>>> META_DATA               WRITE_LIFE_SHORT
>>> HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
>>> HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
>>> COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
>>>
>>
>> I agree, But I am not sure that assigning the same hint to a node and 
>> data
>> segment is good. Because NVMe is likely to write them in the same erase 
>> block if they have the same hint.
>
> If we do not give the hint, they can still be written to the same erase 
> block,
>>>
>>> I mean it's possible to write them to the same erase block. :)
>>>
> right? it will not be worse?
>

 If the hint is not given, I think that they could be written to 
 the same erase block, or not. But if we give the same hint, they are 
 written
 to the same block.
>>>
>>> IMO, Only if underlying device can support more hint type or opened 
>>> channels,
>>> and actual temperature of data segment and node segment is quite different, 
>>> we
>>> can separate them.
>>>
>>
>> Okay, If Jaegeuk Kim agrees with this, I will submit the patch that 
>> implements your proposed mapping.
> 
> How about this? We'd better to split data and node blocks as much as
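
The segment-to-hint mapping proposed earlier in this thread (META_DATA to SHORT, HOT_DATA and WARM_NODE to MEDIUM, HOT_NODE and WARM_DATA to LONG, COLD_NODE and COLD_DATA to EXTREME) can be sketched as a lookup function. The enum names are local stand-ins rather than the kernel's CURSEG_* or WRITE_LIFE_* definitions, and the thread had not settled on this mapping:

```c
/* Hypothetical segment classes, mirroring the names used in the thread. */
enum seg_type {
	SEG_META_DATA,
	SEG_HOT_DATA,
	SEG_WARM_DATA,
	SEG_COLD_DATA,
	SEG_HOT_NODE,
	SEG_WARM_NODE,
	SEG_COLD_NODE,
};

/* Hypothetical lifetime hints, shortest to longest expected lifetime. */
enum life_hint {
	LIFE_NOT_SET,
	LIFE_SHORT,
	LIFE_MEDIUM,
	LIFE_LONG,
	LIFE_EXTREME,
};

/* Map a segment class to the hint that would be passed down with it. */
static enum life_hint seg_to_hint(enum seg_type type)
{
	switch (type) {
	case SEG_META_DATA:
		return LIFE_SHORT;
	case SEG_HOT_DATA:
	case SEG_WARM_NODE:
		return LIFE_MEDIUM;
	case SEG_HOT_NODE:
	case SEG_WARM_DATA:
		return LIFE_LONG;
	case SEG_COLD_NODE:
	case SEG_COLD_DATA:
		return LIFE_EXTREME;
	}
	return LIFE_NOT_SET;	/* unknown class: pass no hint */
}
```

Because node and data segments share hints here, a device that groups writes by hint may place them in the same erase block, which is the concern raised in the discussion.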

linux-next: Tree for Nov 14

2017-11-13 Thread Stephen Rothwell

Hi all,

Please do not add any v4.16 material to your linux-next included trees
until v4.15-rc1 has been released.

Changes since 20171113:

The powerpc tree lost its build failure.

The keys tree lost its build failure.

The nvdimm tree gained a conflict against the parisc-hd tree.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 11762
 10963 files changed, 528527 insertions(+), 258423 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 272 trees (counting Linus' and 42 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (8e9a2dba8686 Merge branch 'locking-core-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures 
with sparse check)
Merging arc-current/for-curr (92d44128241f ARCv2: Accomodate HS48 MMUv5 by 
relaxing MMU ver checking)
Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks 
addr_limit)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs 
for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug 
callback failure during imc initialization)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (c9f3f813d462 xfrm: Fix stack-out-of-bounds read in 
xfrm_state_find.)
Merging netfilter/master (7400bb4b5800 netfilter: nf_reject_ipv4: Fix 
use-after-free in send_reset)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a6127b4440d1 Merge tag 
'iwlwifi-for-kalle-2017-10-06' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging sound-current/for-linus (7087cb8fad5e Documentation: sound: hd-audio: 
notes.rst)
Merging pci-current/for-linus (6b7be529634b MAINTAINERS: Add Lorenzo Pieralisi 
for PCI host bridge drivers)
Merging driver-core.current/driver-core-linus (39dae59d66ac Linux 4.14-rc8)
Merging tty.current/tty-linus (8a5776a5f498 Linux 4.14-rc4)
Merging usb.current/usb-linus (bb176f67090c Linux 4.14-rc6)
Merging usb-gadget-fixes/fixes (7c80f9e4a588 usb: usbtest: fix NULL pointer 
dereference)
Merging usb-serial-fixes/usb-linus (0b07194bb55e Linux 4.14-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (2fb850092fd9 phy: rockchip-typec: Check for errors from 
tcphy_phy_init())
Merging staging.current/staging-linus (bb176f67090c Linux 4.14-rc6)
Merging char-misc.current/char-misc-linus (bb176f67090c Linux 4.14-rc6)
Merging input-current/for-linus (26dd633e437d Input: synaptics-rmi4 - RMI4 can 
also use

Re: [PATCH v4 2/4] KVM: X86: Add paravirt remote TLB flush

2017-11-13 Thread Wanpeng Li

2017-11-13 21:02 GMT+08:00 Peter Zijlstra :
> On Mon, Nov 13, 2017 at 11:46:34AM +0100, Peter Zijlstra wrote:
>> On Mon, Nov 13, 2017 at 04:26:57PM +0800, Wanpeng Li wrote:
>> > 2017-11-13 16:04 GMT+08:00 Peter Zijlstra :
>>
>> > > So if at this point a vCPU gets preempted we'll still spin-wait for it,
>> > > which is sub-optimal.
>> > >
>> > > I think we can come up with something to get around that 'problem' if
>> > > indeed it is a problem. But we can easily do that as follow up patches.
>> > > Just let me know if you think it's worth spending more time on.
>> >
>> > You can post your idea, it is always smart. :) Then we can evaluate
>> > the complexity and gains.
>>
>> I'm not sure I have a fully baked idea just yet, but the general idea
>> would be something like:
>>
>>  - switch (back) to a dedicated TLB invalidate IPI
>
> Just for PV that is; the !PV code can continue doing what it does today.
>
>>  - introduce KVM_VCPU_IPI_PENDING
>>
>>  - change flush_tlb_others() into something like:
>>
>>for_each_cpu(cpu, flushmask) {
>>src = &per_cpu(steal_time, cpu);
>>state = READ_ONCE(src->preempted);
>>do {
>>if (state & KVM_VCPU_PREEMPTED) {
>>if (try_cmpxchg(&src->preempted, &state,
>>state | 
>> KVM_VCPU_SHOULD_FLUSH)) {
>>__cpumask_clear_cpu(cpu, flushmask);
>>break;
>>}
>>}
>>} while (!try_cmpxchg(&src->preempted, &state,
>>state | KVM_VCPU_IPI_PENDING));
>
> That can be written like:
>
> do {
> if (state & KVM_VCPU_PREEMPTED)
> new_state = state | KVM_VCPU_SHOULD_FLUSH;
> else
> new_state = state | KVM_VCPU_IPI_PENDING;
> } while (!try_cmpxchg(&src->preempted, &state, new_state));
>
> if (new_state & KVM_VCPU_IPI_PENDING)

Should be new_state & KVM_VCPU_SHOULD_FLUSH I think.

Regards,
Wanpeng Li

> __cpumask_clear_cpu(cpu, flushmask);
>
>>}
>>
>>apic->send_IPI_mask(flushmask, CALL_TLB_VECTOR);
>>
>>for_each_cpu(cpu, flushmask) {
>>src = &per_cpu(steal_time, cpu);
>
> /*
>  * The ACQUIRE pairs with the cmpxchg clearing IPI_PENDING,
>  * which is either the TLB IPI handler, or the VMEXIT path.
>  * It ensures that the invalidate happens-before.
>  */
>>smp_cond_load_acquire(&src->preempted, !(VAL & KVM_VCPU_IPI_PENDING));
>>}
>
> And here we wait for completion of the invalidate; but because of the
> VMEXIT change below, this will never stall on a !running vCPU.
>
> Note that PLE will not help (much) here; without this extra IPI_PENDING
> state and the VMEXIT transferring it to SHOULD_FLUSH, this vCPU's progress
> will be held up until all the vCPUs you've IPI'd have run the IPI
> handler, which in the worst case is still a very long time.
>
>>  - have the TLB invalidate handler do something like:
>>
>>state = READ_ONCE(src->preempted);
>>if (!(state & KVM_VCPU_IPI_PENDING))
>>  return;
>>
>>local_flush_tlb();
>>
>>do {
>>} while (!try_cmpxchg(&src->preempted, &state,
>>state & ~KVM_VCPU_IPI_PENDING));
>
> That needs to be:
>
> /*
>  * Clear KVM_VCPU_IPI_PENDING to 'complete' flush_tlb_others().
>  */
> do {
> /*
>  * VMEXIT could have cleared this for us, in which case
>  * we're done.
>  */
> if (!(state & KVM_VCPU_IPI_PENDING))
> return;
>
> } while (!try_cmpxchg(&src->preempted, &state,
> state & ~KVM_VCPU_IPI_PENDING));
>
>>  - then at VMEXIT time do something like:
>>
> /*
>  * If we have IPI_PENDING set at VMEXIT time, transfer it to
>  * SHOULD_FLUSH. Clearing IPI_PENDING here allows the
>  * flush_others() vCPU to continue while the SHOULD_FLUSH
>  * guarantees this vCPU will flush TLBs before it continues
>  * execution.
>  */
>
>>state = READ_ONCE(src->preempted);
>>do {
>>   if (!(state & KVM_VCPU_IPI_PENDING))
>>   break;
>>} while (!try_cmpxchg(&src->preempted, &state,
>>(state & ~KVM_VCPU_IPI_PENDING) |
>>KVM_VCPU_SHOULD_FLUSH));
>>
>>and clear any possible pending TLB_VECTOR in the guest state to avoid
>>raising that IPI spuriously on enter again.
>>
>
>
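
The two paths that clear the pending bit in the discussion above, the IPI handler and the VMEXIT hand-off, can be sketched in userspace with C11 atomics. The flag values, function names and the counting flush stand-in are assumptions made for the example; none of them are kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; assumptions, not the kernel's definitions. */
#define VCPU_PREEMPTED    (1u << 0)
#define VCPU_SHOULD_FLUSH (1u << 1)
#define VCPU_IPI_PENDING  (1u << 2)

/* Stand-in for a local TLB flush so the sketch can be exercised. */
static int fake_flush_calls;
static void fake_flush(void)
{
	fake_flush_calls++;
}

/*
 * IPI handler side: flush first, then clear IPI_PENDING with release
 * ordering so a waiter that observes the cleared bit also observes the
 * flush. Returns true if this call cleared the bit.
 */
static bool ipi_handler_complete(_Atomic uint8_t *preempted,
				 void (*flush)(void))
{
	uint8_t state = atomic_load_explicit(preempted, memory_order_relaxed);

	if (!(state & VCPU_IPI_PENDING))
		return false;

	flush();

	do {
		/* The VMEXIT path may have cleared the bit already. */
		if (!(state & VCPU_IPI_PENDING))
			return false;
	} while (!atomic_compare_exchange_weak_explicit(preempted, &state,
			(uint8_t)(state & ~VCPU_IPI_PENDING),
			memory_order_release, memory_order_relaxed));

	return true;
}

/*
 * VMEXIT side: convert a pending IPI into a deferred flush so the sender
 * stops waiting while this vCPU is not running.
 */
static void vmexit_transfer(_Atomic uint8_t *preempted)
{
	uint8_t state = atomic_load_explicit(preempted, memory_order_relaxed);

	do {
		if (!(state & VCPU_IPI_PENDING))
			return;
	} while (!atomic_compare_exchange_weak_explicit(preempted, &state,
			(uint8_t)((state & ~VCPU_IPI_PENDING) | VCPU_SHOULD_FLUSH),
			memory_order_release, memory_order_relaxed));
}
```

Either path leaves IPI_PENDING clear, which is the condition the sender's wait loop polls for.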

Re: [PATCH] docs: dev-tools: coccinelle: delete out of date wiki reference

2017-11-13 Thread Julia Lawall



On Tue, 14 Nov 2017, Masahiro Yamada wrote:

> Hi Julia, Jon,
>
> 2017-11-14 1:50 GMT+09:00 Julia Lawall :
> > The wiki is no longer available.
> >
> > Signed-off-by: Julia Lawall 
> >
>
>
> Jon sent the doc pull request yesterday.
>
> I will pick this up for Kbuild tree
> because I have not sent pull requests for this MW yet.

OK, thanks.

julia

>
>
>
> >
> > diff --git a/Documentation/dev-tools/coccinelle.rst 
> > b/Documentation/dev-tools/coccinelle.rst
> > index 37e474f..94f41c2 100644
> > --- a/Documentation/dev-tools/coccinelle.rst
> > +++ b/Documentation/dev-tools/coccinelle.rst
> > @@ -33,9 +33,6 @@ of many distributions, e.g. :
> >  You can get the latest version released from the Coccinelle homepage at
> >  http://coccinelle.lip6.fr/
> >
> > -Information and tips about Coccinelle are also provided on the wiki
> > -pages at http://cocci.ekstranet.diku.dk/wiki/doku.php
> > -
> >  Once you have it, run the following command::
> >
> > ./configure
>
>
>
> --
> Best Regards
> Masahiro Yamada
>

RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-13 Thread Ran Wang

Hi Michal,

> -Original Message-
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Monday, November 13, 2017 7:03 PM
> To: Ran Wang 
> Cc: linux...@kvack.org; Michael Ellerman ; Vlastimil
> Babka ; Andrew Morton ;
> KAMEZAWA Hiroyuki ; Reza Arbab
> ; Yasuaki Ishimatsu ;
> qiuxi...@huawei.com; Igor Mammedov ; Vitaly
> Kuznetsov ; LKML ;
> Leo Li ; Xiaobo Xie 
> Subject: Re: [PATCH 1/2] mm: drop migrate type checks from
> has_unmovable_pages
> 
> On Mon 13-11-17 07:33:13, Ran Wang wrote:
> > Hello Michal,
> >
> > 
> >
> > > Date: Fri, 13 Oct 2017 14:00:12 +0200
> > >
> > > From: Michal Hocko 
> > >
> > > Michael has noticed that the memory offline tries to migrate kernel
> > > code pages when doing  echo 0 >
> > > /sys/devices/system/memory/memory0/online
> > >
> > > The current implementation will fail the operation after several
> > > failed page migration attempts but we shouldn't even attempt to
> > > migrate that memory and fail right away because this memory is
> > > clearly not migrateable. This will become a real problem when we drop
> the retry loop counter resp. timeout.
> > >
> > > The real problem is in has_unmovable_pages in fact. We should fail
> > > if there are any non migrateable pages in the area. In order to
> > > guarantee that remove the migrate type checks because
> > > MIGRATE_MOVABLE is not guaranteed to contain only migrateable pages.
> It is merely a heuristic.
> > > Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> > > allocate any non-migrateable pages from the block but CMA
> > > allocations themselves are unlikely to be migrateable. Therefore remove
> both checks.
> > >
> > > Reported-by: Michael Ellerman 
> > > Signed-off-by: Michal Hocko 
> > > Tested-by: Michael Ellerman 
> > > Acked-by: Vlastimil Babka 
> > > ---
> > >  mm/page_alloc.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> > > 3badcedf96a7..ad0294ab3e4f 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone,
> > > struct page *page, int count,
> > >*/
> > >   if (zone_idx(zone) == ZONE_MOVABLE)
> > >   return false;
> > > - mt = get_pageblock_migratetype(page);
> > > - if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> > > - return false;
> >
> > This removal causes the DWC3 USB controller to fail on initialization with
> > Layerscape processors (such as LS1043A), as shown below:
> >
> > [2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned
> bus number 1
> > [2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
> > [2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
> > [2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
> > [2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
> > [2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
> >
> > I also noticed that someone recently reported to you that DWC2 was
> > affected, so do you have a solution now?
> 
> Yes. It should be in linux-next. Have a look at the following email
> thread:
> http://lkml.kernel.org/r/20171104082500.qvzbb2kw4suo6cgy@dhcp22.suse.cz

Thanks for the info. I could not open the link you shared, but I got the
patch from my colleague and the issue is fixed on my side. Thanks.

Best Regards,
Ran
> --
> Michal Hocko
> SUSE Labs

[PATCH] x86/mce: add support SRAO reported via CMC check

2017-11-13 Thread Xie XiuQi

According to the Intel SDM Volume 3B (253669-063US, July 2017), SRAO can be
reported via CMC:

  In cases when SRAO is signaled via CMCI the error signature is
  indicated via UC=1, PCC=0, S=0.

So add checks for those known AO MCACODs to mce_severity().

Signed-off-by: Xie XiuQi 
Tested-by: Chen Wei 
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c 
b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4ca632a..48f239a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -101,6 +101,16 @@
NOSER, BITCLR(MCI_STATUS_UC)
),
 
+   /* known AO MCACODs reported via CMC: */
+   MCESEV(
+   AO, "Action optional: memory scrubbing error",
+   SER, MASK(MCI_UC_SAR|MCACOD_SCRUBMSK, MCI_STATUS_UC|MCACOD_SCRUB)
+   ),
+   MCESEV(
+   AO, "Action optional: last level cache writeback error",
+   SER, MASK(MCI_UC_SAR|MCACOD, MCI_STATUS_UC|MCACOD_L3WB)
+   ),
+
/* ignore OVER for UCNA */
MCESEV(
UCNA, "Uncorrected no action required",
-- 
1.8.3.1
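
The new table entries match on a (mask, result) pair: a rule applies when the status bits selected by the mask equal the expected result. A minimal userspace sketch of that matching follows. The bit positions and MCACOD constants are written from memory of the kernel headers of that era and should be treated as assumptions; struct sev_rule and classify() are hypothetical names, and the sketch ignores the SER and context checks the real table also applies.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed MCi_STATUS bit positions and MCACOD values (illustrative). */
#define STATUS_UC	(1ULL << 61)
#define STATUS_S	(1ULL << 56)
#define STATUS_AR	(1ULL << 55)
#define UC_SAR		(STATUS_UC | STATUS_S | STATUS_AR)
#define MCACOD		0xefffULL
#define MCACOD_SCRUB	0x00c0ULL
#define MCACOD_SCRUBMSK	0xeff0ULL
#define MCACOD_L3WB	0x017aULL

struct sev_rule {
	uint64_t mask;		/* status bits that take part in the match */
	uint64_t result;	/* required value of those bits */
	const char *msg;
};

/* UC=1, S=0, AR=0 plus a known action-optional MCACOD. */
static const struct sev_rule rules[] = {
	{ UC_SAR | MCACOD_SCRUBMSK, STATUS_UC | MCACOD_SCRUB,
	  "Action optional: memory scrubbing error" },
	{ UC_SAR | MCACOD, STATUS_UC | MCACOD_L3WB,
	  "Action optional: last level cache writeback error" },
};

/* Return the message of the first matching rule, or NULL if none match. */
static const char *classify(uint64_t status)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		if ((status & rules[i].mask) == rules[i].result)
			return rules[i].msg;
	}
	return NULL;
}
```

Including S and AR in the mask while leaving them out of the result is what restricts the match to the S=0 signature quoted from the SDM.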

[GIT PULL]: dmaengine updates for 4.15-rc1

2017-11-13 Thread Vinod Koul

Hi Linus,

Here is the pull request for dmaengine updates for 4.15-rc1. As you may have
noticed, I am also using topic branches, but the branch (for-linus) contains
only merge commits. I was not at KS, but from reading the coverage
I gathered that you would like it this way; if not, do let me know and I
shall adjust accordingly.

Further we have also done RST conversion for dmaengine documentation. That
would come from Jon's tree.

The following changes since commit 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e:

  Linux 4.14-rc1 (2017-09-16 15:47:51 -0700)

are available in the git repository at:

  git://git.infradead.org/users/vkoul/slave-dma.git tags/dmaengine-4.15-rc1

for you to fetch changes up to cecd5fc5512349662b9e7a9e06231055d803e3f6:

  Merge branch 'topic/xilinx' into for-linus (2017-11-14 10:37:28 +0530)


dmaengine updates for 4.15-rc1

Updates for this cycle include:
- New driver for Spreadtrum dma controller, ST MDMA and DMAMUX controllers
- PM support for IMG MDC drivers
- Updates to bcm-sba-raid driver and improvements to sun6i driver
- Subsystem conversion for:
  - timers to use timer_setup()
  - remove usage of PCI pool API
  - usage of %p format specifier
- Minor updates to bunch of drivers


Adam Wallis (1):
  dmaengine: dmatest: warn user when dma test times out

Alexander Kochetkov (1):
  dmaengine: pl330: fix descriptor allocation fail

Andy Shevchenko (1):
  MAINTAINERS: Step down from a co-maintaner of DW DMAC driver

Anup Patel (4):
  dmaengine: bcm-sba-raid: serialize dma_cookie_complete() using reqs_lock
  dmaengine: bcm-sba-raid: Use only single mailbox channel
  dmaengine: bcm-sba-raid: Use common GPL comment header
  dmaengine: Build bcm-sba-raid driver as loadable module for iProc SoCs

Arnd Bergmann (1):
  dmaengine: stm32_mdma: add CONFIG_OF dependency

Baolin Wang (2):
  dt-bindings: dmaengine: Add Spreadtrum SC9860 DMA controller
  dmaengine: sprd: Add Spreadtrum DMA driver

Biju Das (1):
  dmaengine: usb-dmac: Add compatible string for r8a7743/5

Colin Ian King (1):
  dmaengine: stm32: remove redundant initialization of hwdesc

Corentin Labbe (1):
  dmaengine: sun6i: use of_device_get_match_data

Dan Carpenter (1):
  dmaengine: stm32-dmamux: Fix a NULL vs IS_ERR() check in probe

Ed Blake (2):
  dmaengine: img-mdc: Add suspend / resume handling
  dmaengine: img-mdc: Add runtime PM

Geert Uytterhoeven (1):
  dmaengine: nbpfaxi: Use of_device_get_match_data() helper

Hiroyuki Yokoyama (1):
  dmaengine: rcar-dmac: use TCRB instead of TCR for residue

Kees Cook (1):
  dmaengine: Convert timers to use timer_setup()

Lars-Peter Clausen (3):
  dmaengine: axi-dmac: Only use hardware cyclic mode for single segment transfers
  dmaengine: axi-dmac: Fix software cyclic mode
  dmaengine: xilinx_dma: Move enum xdma_ip_type to driver file

Nicolin Chen (1):
  dmaengine: imx-sdma: Correct src_addr_widths and directions

Peter Ujfalusi (3):
  dmaengine: edma: Implement protection for invalid max_burst
  dmaengine: omap-dma: Implement protection for invalid max_burst
  dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type

Pierre-Yves MORDRET (6):
  dt-bindings: Document the STM32 DMAMUX bindings
  dmaengine: Add STM32 DMAMUX driver
  dt-bindings: stm32-dma: add a property to handle STM32 DMAMUX
  dt-bindings: Document the STM32 MDMA bindings
  dmaengine: Add STM32 MDMA driver
  dmaengine: stm32_mdma: activate pack/unpack feature

Romain Perier (1):
  dmaengine: pch_dma: Replace PCI pool old API

Russell King (1):
  dmaengine: sa11x0: add DMA filters

Sricharan R (1):
  dmaengine: qcom-bam: Process multiple pending descriptors

Stefan Brüns (10):
  dmaengine: List all allowed values for src/dst_addr_width in kernel doc
  dmaengine: Mark struct dma_slave_caps kernel-doc correctly, clarify
  dmaengine: sun6i: Correct setting of clock autogating register for A83T/H3
  dmaengine: sun6i: Correct burst length field offsets for H3
  dmaengine: sun6i: Restructure code to allow extension for new SoCs
  dmaengine: sun6i: Enable additional burst lengths/widths on H3
  dmaengine: sun6i: Move number of pchans/vchans/request to device struct
  arm64: allwinner: a64: Add devicetree binding for DMA controller
  dmaengine: sun6i: Add support for Allwinner A64 and compatibles
  dmaengine: sun6i: Retrieve channel count/max request from devicetree

Vinod Koul (21):
  dmaengine: stm32: use %p format specfier for pointer
  dmaengine: coh901318: Remove unnecessary 0x prefixes before %pad
  dmaengine: at_hdmac: Remove unnecessary 0x prefixes before %pad
  dmaengine: Revert "rcar-dmac: use TCRB instead of TCR for residue"
  Merge branch 'topic/print_fixes' into for-linus

Re: [PATCH v2 1/3] dt-bindings: phy: Add Cygnus usb phy binding

2017-11-13 Thread Raveendra Padasalagi

Hi Rob,

On Mon, Nov 13, 2017 at 11:23 PM, Rob Herring  wrote:
> On Sun, Nov 12, 2017 at 10:23 PM, Raveendra Padasalagi
>  wrote:
>> Hi,
>>
>> On Sat, Nov 11, 2017 at 3:14 AM, Rob Herring  wrote:
>>> On Wed, Nov 08, 2017 at 01:16:41PM +0530, Raveendra Padasalagi wrote:
>>>> Add devicetree binding document for broadcom's
>>>> Cygnus SoC specific usb phy controller driver.
>>>>
>>>> Signed-off-by: Raveendra Padasalagi
>>>> ---
>>>>  .../bindings/phy/brcm,cygnus-usb-phy.txt   | 106 +
>>>>  1 file changed, 106 insertions(+)
>>>>  create mode 100644 Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
>>>> new file mode 100644
>>>> index 000..bbc4b94
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
>>>> @@ -0,0 +1,106 @@
>>>> +BROADCOM CYGNUS USB PHY
>>>> +
>>>> +Required Properties:
>>>> +- compatible:  brcm,cygnus-usb-phy
>>>> +- reg : the register start address and length for
>>>> +crmu_usbphy_aon_ctrl,
>>>> +cdru usb phy control,
>>>> +usb host idm registers,
>>>> +usb device idm registers.
>>>> +- reg-names: a list of the names corresponding to the previous register ranges
>>>> +  Should contain
>>>> +"crmu-usbphy-aon-ctrl",
>>>> +"cdru-usbphy",
>>>> +"usb2h-idm",
>>>> +"usb2d-idm".
>>>> +- address-cells: should be 1
>>>> +- size-cells: should be 0
>>>> +
>>>> +Sub-nodes:
>>>> +  Each port's PHY should be represented as a sub-node.
>>>> +
>>>> +Sub-nodes required properties:
>>>> +- reg: the PHY number
>>>> +- #phy-cells must be 1
>>>> +  The node that uses the phy must provide 1 integer argument specifying
>>>> +  port number.
>>>> +
>>>> +Optional Properties:
>>>> +- vbus-p#-supply : The regulator for vbus out control for the host
>>>
>>> Is this a literal # or something else?
>>
>> Yes, this is a literal. It's assumed # will replace numeric 0-2 for
>> each of the ports.
>
> I'm still confused. Which is valid? "vbus-p#-supply" or "vbus-p0-supply"
>
I agree, it's creating confusion. Instead of enumerating
"vbus-p0-supply", "vbus-p1-supply", "vbus-p2-supply", I kept "vbus-p#-supply".

Yes, as you suggested, "vbus-supply" should be sufficient, since it is in each
of the phy sub-nodes.

> If the latter, you need to enumerate all valid options. But these are
> in sub nodes for each port, so just "vbus-supply" should be
> sufficient.

Keeping "vbus-supply" should not create any confusion. I will send out the
change in the next version of the patch.

> One more question, does Vbus actually supply power to the phy or you
> are just associating the Vbus supply to a connector with a port? The
> latter needs a connector node instead and Vbus should be part of that.
> There's been some attempts at USB connectors, but we don't really have
> one yet (the extcon binding is not it).

Vbus is not supplied to the phy; it is supplied to the devices connected on
the port. On our platform, vbus is controlled through an external regulator,
which in turn is controlled through a gpio.
So the "vbus-supply" shown above actually points to the phandle of the vbus
regulator node.

>> In the example it's not shown as the regulators specified in vbus-p#-supply
>> are board specific.
>
> Please show in the example. Examples should be complete.

Ok. Sure.

> Rob

Re: [GIT PULL] USB/PHY driver changes for 4.15-rc1

2017-11-13 Thread Linus Torvalds

On Mon, Nov 13, 2017 at 8:19 AM, Greg KH  wrote:
>
> Other major thing is the typec code that moved out of staging and into
> the "real" part of the drivers/usb/ tree, which was nice to see happen.

Hmm. So now it asks me about Type-C Port Controller Manager. Fair
enough. I say "N", because I have none. But then it still asks me
about that TI TPS6598x driver...

So I do see the _technical_ logic in there - the "TYPEC" config option
is a hidden internal option, and it's selected by the things that need
it.

But from a user perspective, this configuration model is really strange.

Why is TYPEC_TCPM something you ask the user, but not "do you want
Type-C support"?  And if you single out the PCM side to ask about, why
don't you single out the power delivery side?

Wouldn't it make more sense to at least ask whether I want Type-C
power delivery chips before it then starts asking about individual PD
drivers, the same way you asked about the port controller before you
started asking about individual port controller drivers?

Or is it just me who finds this a bit odd?

   Linus

[PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run

2017-11-13 Thread Rik van Riel

Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy. 

There may be other tests where performance changes are noticeable.

Signed-off-by: Rik van Riel 
Suggested-by: Christian Borntraeger 
---
 arch/x86/include/asm/kvm_host.h | 13 +
 arch/x86/kvm/x86.c  | 29 -
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493adf07..92e66685249e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+   /*
+* QEMU userspace and the guest each have their own FPU state.
+* In vcpu_run, we switch between the user and guest FPU contexts.
+* While running a VCPU, the VCPU thread will have the guest FPU
+* context.
+*
+* Note that while the PKRU state lives inside the fpu registers,
+* it is switched out separately at VMENTER and VMEXIT time. The
+* "guest_fpu" state here contains the guest FPU context, with the
+* host PKRU bits.
+*/
+   struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..59912b20a830 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
-   kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
 }
 
@@ -6908,7 +6907,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
-   kvm_load_guest_fpu(vcpu);
 
/*
 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
@@ -7095,6 +7093,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 
+   kvm_load_guest_fpu(vcpu);
+
for (;;) {
if (kvm_vcpu_running(vcpu)) {
r = vcpu_enter_guest(vcpu);
@@ -7132,6 +7132,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
}
}
 
+   kvm_put_guest_fpu(vcpu);
+
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 
return r;
@@ -7663,32 +7665,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
 }
 
+/* Swap (qemu) user FPU context for the guest FPU context. */
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-   if (vcpu->guest_fpu_loaded)
-   return;
-
-   /*
-* Restore all possible states in the guest,
-* and assume host would use all available bits.
-* Guest xcr0 would be loaded later.
-*/
-   vcpu->guest_fpu_loaded = 1;
-   __kernel_fpu_begin();
+   preempt_disable();
+   copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run.  */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
~XFEATURE_MASK_PKRU);
+   preempt_enable();
trace_kvm_fpu(1);
 }
 
+/* When vcpu_run ends, restore user space FPU context. */
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->guest_fpu_loaded)
-   return;
-
-   vcpu->guest_fpu_loaded = 0;
+   preempt_disable();
copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
-   __kernel_fpu_end();
+   copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+   preempt_enable();
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
 }

[PATCH] f2fs: expose quota information in debugfs

2017-11-13 Thread Jaegeuk Kim

This patch shows # of dirty pages and # of hidden quota files.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/debug.c | 11 +++
 fs/f2fs/f2fs.h  | 10 --
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index f7eec506ceea..ecada8425268 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -45,9 +45,18 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA);
+   si->ndirty_qdata = get_pages(sbi, F2FS_DIRTY_QDATA);
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+
+   si->nquota_files = 0;
+   if (f2fs_sb_has_quota_ino(sbi->sb)) {
+   for (i = 0; i < MAXQUOTAS; i++) {
+   if (f2fs_qf_ino(sbi->sb, i))
+   si->nquota_files++;
+   }
+   }
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
si->aw_cnt = atomic_read(&sbi->aw_cnt);
@@ -369,6 +378,8 @@ static int stat_show(struct seq_file *s, void *v)
   si->ndirty_dent, si->ndirty_dirs, si->ndirty_all);
seq_printf(s, "  - datas: %4d in files:%4d\n",
   si->ndirty_data, si->ndirty_files);
+   seq_printf(s, "  - quota datas: %4d in quota files:%4d\n",
+  si->ndirty_qdata, si->nquota_files);
seq_printf(s, "  - meta: %4d in %4d\n",
   si->ndirty_meta, si->meta_pages);
seq_printf(s, "  - imeta: %4d\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 5c379a8ea075..44f874483ecf 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -865,6 +865,7 @@ struct f2fs_sm_info {
 enum count_type {
F2FS_DIRTY_DENTS,
F2FS_DIRTY_DATA,
+   F2FS_DIRTY_QDATA,
F2FS_DIRTY_NODES,
F2FS_DIRTY_META,
F2FS_INMEM_PAGES,
@@ -1642,6 +1643,8 @@ static inline void inode_inc_dirty_pages(struct inode *inode)
atomic_inc(&F2FS_I(inode)->dirty_pages);
inc_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
+   if (IS_NOQUOTA(inode))
+   inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
 }
 
 static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
@@ -1658,6 +1661,8 @@ static inline void inode_dec_dirty_pages(struct inode *inode)
atomic_dec(&F2FS_I(inode)->dirty_pages);
dec_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
+   if (IS_NOQUOTA(inode))
+   dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
 }
 
 static inline s64 get_pages(struct f2fs_sb_info *sbi, int count_type)
@@ -2771,9 +2776,10 @@ struct f2fs_stat_info {
unsigned long long hit_largest, hit_cached, hit_rbtree;
unsigned long long hit_total, total_ext;
int ext_tree, zombie_tree, ext_node;
-   int ndirty_node, ndirty_dent, ndirty_meta, ndirty_data, ndirty_imeta;
+   int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
+   int ndirty_data, ndirty_qdata;
int inmem_pages;
-   unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+   unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
-- 
2.14.0.rc1.383.gd1ce394fe2-goog

Re: [PATCH v1 4/5] perf, tools: Add fallback in perf_evsel__nr_cpus for no map

2017-11-13 Thread Andi Kleen

On Mon, Nov 13, 2017 at 10:22:30AM +0100, Jiri Olsa wrote:
> On Thu, Nov 09, 2017 at 06:55:27AM -0800, Andi Kleen wrote:
> > From: Andi Kleen 
> > 
> > Support the case of the event having no cpumap in perf_evsel__nr_cpus.
> > Just return 1 in this case.  This can happen in perf script
> > when it uses the perf stat shadow functions.
> 
> why 1, where in shadow code? you can synthesize cpus for event
> via event_update event

For sampling it should be always 1, right?

Where:

#0  0x00570e03 in __perf_evsel_stat__is (evsel=0x2690ce0,
id=PERF_STAT_EVSEL_ID__CYCLES_IN_TX) at util/stat.c:75
#1  0x00572375 in perf_stat__update_shadow_stats
(counter=0x2690ce0, count=3744, cpu=0) at util/stat-shadow.c:194

-Andi
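
A minimal sketch of the fallback under discussion, assuming simplified
stand-in types: struct cpu_map_model, struct evsel_model and evsel_nr_cpus()
are hypothetical names for illustration and are not the perf tool's own
definitions. It only shows the "no map means one CPU" rule described in the
patch; whether 1 is the right default is the open question in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the perf cpu map and event selector. */
struct cpu_map_model {
	int nr;                            /* number of CPUs in the map */
};

struct evsel_model {
	const struct cpu_map_model *cpus;  /* NULL when no map is attached */
};

/* Return the CPU count of the event, falling back to 1 without a map. */
static int evsel_nr_cpus(const struct evsel_model *evsel)
{
	return evsel->cpus ? evsel->cpus->nr : 1;
}
```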

Re: [GIT PULL] x86 updates for v4.15

2017-11-13 Thread Linus Torvalds

On Mon, Nov 13, 2017 at 12:24 AM, Ingo Molnar  wrote:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

Hmm #2.

My laptop had odd SIGBUS and IO errors after a suspend/resume cycle
when running commit d6ec9d9a4def, which is after my merge of the x86
core changes.

I'm probably not going to be able to bisect it - there's nothing in
the logs, probably because processes just died (and most likely the IO
errors were due to the disk having gone missing), but looking at the
merges I had done up until that point, all the suspect ones are from
you.

The x86 pull obviously being the most likely one, just based on
content, and based on that "after suspend/resume".

I'm wondering how much suspend/resume testing that entry code has
gotten. Last release it was the TLB ASID code that messed up on
suspend/resume, I suspect there is a decided lack of test coverage in
the otherwise good x86 farm..

I'll see if I can get anything interesting out of testing some more,
but I thought I'd give you guys a heads up.

Usually it's the networking tree and the PM tree that triggers issues
on my laptop, but neither of those had been merged at that point. But
there also really isn't anything else that looks odd in there.

 Linus

Re: [PATCH] iio: mma8452: add power_mode sysfs configuration

2017-11-13 Thread harinath Nampally

Hi Martin,

> But given your concerns, I would strip down this patch to only offer the
> already documented "low_noise" and "low_power" modes. It wouldn't be
> worth it to extend the ABI just because of this!
OK, then we can map 'low_noise' to high resolution mode. But I am afraid
I can't test the functionality because I don't have the proper instruments to
measure the current draw (in microamps) accurately.

> I would like "oversampling" more than this "power_mode" too. For this
> driver it would be far more complicated to implement though. I doubt
> that it'll be done. power_mode is basically already there implicitly,
> and given that there *is* the ABI, we could offer it for free.
I think 'oversampling' is already implemented, as I see
'case IIO_CHAN_INFO_OVERSAMPLING_RATIO:'
being handled, which basically selects among all 4 power modes.
If we also add 'power_mode', I think it would be like having two
different user interfaces for the same functionality, so I don't see much
value in adding 'power_mode' as well.
Please correct me if I am wrong.

Thanks,
Harinath

On Sun, Nov 12, 2017 at 7:28 AM, Martin Kepplinger  wrote:
> On 2017-11-11 01:33, Jonathan Cameron wrote:
>> On Mon, 6 Nov 2017 08:19:58 +0100
>> Martin Kepplinger  wrote:
>>
>>> This adds the power_mode sysfs interface to the device as documented in
>>> sysfs-bus-iio.
>>>
>>> ---
>>>
>>> Note that I explicitly don't sign off on this.
>>>
>>> This is a starting point for anybody who can test it and check for correct
>>> API usage, and ABI correctness, as documented in 
>>> Documentation/ABI/testing/sys-bus-iio
>>> (grep it for "power_mode"). The ABI doc probably would need an addition
>>> too, if the 4 power modes here seem generally useful (there are only
>>>  2 listed there)!
>>>
>>> So, if you can test this, feel free to set up a proper patch or
>>> two, and I'm happy to review.
>>>
>>> Please note that this patch is quite old. It really should be that simple
>>> as far as my understanding back then. We always list the available 
>>> frequencies
>>> of the given power mode we are in, for example, already, and everything
>>> basically is in place except for the user interface.
>>
>> Hmm. A lot of devices support something along these lines.  The issue
>> has always been - how is userspace to figure out what to do with it?
>> It's all very vague...
>>
>> Funnily enough - this used to be really common, but is becoming less so
>> now - presumably because no one was using it much (or maybe I am reading
>> too much into that ;)
>>
>> Now the question is whether it can be tied to better defined things?
>>
>> Here low noise restricts the range to 4g.  Issue is that we don't actually
>> have writeable _available attributes (which correspond to range in this 
>> case).
>>
>
> Does it? Isn't it merely less oversampling.
>
>> Low power mode... This one is apparently oversampling.  If possible support
>> it as that as we have well defined interfaces for that.
>>
>> Jonathan.
>
> Ah, I remember; the oversampling settings was actually a reason why I
> hadn't submitted the patch :) The oversampling API would definitely be
> more accurate.
>
> I would like "oversampling" more than this "power_mode" too. For this
> driver it would be far more complicated to implement though. I doubt
> that it'll be done. power_mode is basically already there implicitly,
> and given that there *is* the ABI, we could offer it for free.
>
> But given your concerns, I would strip down this patch to only offer the
> already documented "low_noise" and "low_power" modes. It wouldn't be
> worth it to extend the ABI just because of this!
>
> Users would have a simple switch if they don't really *want* to know the
> details. I think it can be useful to just say "I don't care about power
> consumption. Be as accurate as possible" or "I just want this thing to
> work. Use as little power as possible." Sure it's vague, but would it be
> useless?

Re: [f2fs-dev] [PATCH RESEND] f2fs: validate before set/clear free nat bitmap

2017-11-13 Thread Jaegeuk Kim

Sorry, I can't merge this patch due to wrong format.

On 11/11, LiFan wrote:
> In flush_nat_entries, all dirty nats will be flushed and if their new
> address isn't 
> NULL_ADDR, their bitmaps will be updated, the free_nid_count of the bitmaps 
> will be increased regardless of whether the nats have already been occupied 
> before. This could lead to wrong free_nid_count.
> So this patch checks the status of the bits before actually set/clear them.
> 
> Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT
> blocks")
> 
> Signed-off-by: Fan li 
> ---
>  fs/f2fs/node.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d234c6e..b965a53 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct
> f2fs_sb_info *sbi, nid_t nid,
>   if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
>   return;
>  
> - if (set)
> + if (set) {
> + if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
> + return;
>   __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
> - else
> - __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
> -
> - if (set)
>   nm_i->free_nid_count[nat_ofs]++;
> - else if (!build)
> - nm_i->free_nid_count[nat_ofs]--;
> + } else {
> + if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
> + return;
> + __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
> + if (!build)
> + nm_i->free_nid_count[nat_ofs]--;
> + }
>  }
>  
>  static void scan_nat_page(struct f2fs_sb_info *sbi,
> --
> 2.7.4
>
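
The check-before-update idea in the quoted patch can be sketched in user
space as follows. This is a simplified model, not f2fs code: struct
nat_block_model and update_free_bitmap() are hypothetical names, a single
64-bit word stands in for one NAT block's free nid bitmap (so ofs must be
below 64), and the 'build' flag is assumed to carry the same meaning as in
the patch, namely that clearing a bit during the initial build does not
decrement the counter.

```c
#include <stdbool.h>

/* Hypothetical model of one NAT block: a bitmap plus its free counter. */
struct nat_block_model {
	unsigned long long bitmap;   /* bit set = nid is free */
	unsigned int free_count;     /* must track the set bits */
};

/*
 * Set or clear one bit, touching the counter only when the bit actually
 * changes state. Repeating the same request is therefore a no-op.
 */
static void update_free_bitmap(struct nat_block_model *nb, unsigned int ofs,
			       bool set, bool build)
{
	unsigned long long bit = 1ULL << ofs;

	if (set) {
		if (nb->bitmap & bit)
			return;              /* already free: nothing to count */
		nb->bitmap |= bit;
		nb->free_count++;
	} else {
		if (!(nb->bitmap & bit))
			return;              /* already in use: nothing to count */
		nb->bitmap &= ~bit;
		if (!build)
			nb->free_count--;
	}
}
```

Without the early returns, setting an already-set bit would still increment
free_count, which is the miscount the patch description reports.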

Re: Prototype patch for Linux-kernel memory model

2017-11-13 Thread Paul E. McKenney

On Mon, Nov 13, 2017 at 03:09:11PM -0500, Alan Stern wrote:
> On Mon, 13 Nov 2017, Paul E. McKenney wrote:
> 
> > Hello!
> > 
> > Please see below for the git commit corresponding to a prototype
> > patch for the Linux-kernel memory model.  This addresses the feedback
> > we got at Linux Plumbers Conference:
> > 
> > 1.  There is a Documentation/recipes.txt file giving known-good
> > useful examples, along with corresponding litmus tests.
> > 
> > 2.  There is a Documentation/explanation.txt file giving an
> > overview of the memory model and its workings.
> > 
> > 3.  There is a Documentation/references.txt file giving some
> > background reading.
> > 
> > I believe that we have something that will be extremely useful and
> > valuable to novices and experts alike.
> > 
> > Please note that this version of the memory model does not yet reflect
> > the changes that make DEC Alpha no longer be a special case because
> > those changes have not yet hit mainline.  The model will be updated
> > once this happens.
> > 
> > Thoughts?
> 
> In references.txt, should we add URLs to non-paywalled PDFs?  Or should
> we assume that our readers are capable of using Google to find these 
> things on their own?
> 
> There are a few places where some comments should be resolved/removed
> before submission:
> 
>   Documentation/references.txt line 98:
>   Uncategorized stuff (any of this really needed?)
> 
>   litmus-tests/README line 92:
>   [ Shouldn't we have one with smp_wmb() in the process with both
> writes, and smp_mb() in the other process. ]

I updated these, recategorizing the "Uncategorized stuff" and removing
the note from litmus-tests/README -- we don't seem to use R in recipes
anyway.

> In the files defining the memory model, we should replace the GPL
> boilerplate with SPDX headers.

We can!

I pushed both commits.

Thanx, Paul

Re: [PATCH v9 3/7] mailbox: qcom: Move the apcs struct into a separate header

2017-11-13 Thread Bjorn Andersson

On Mon 13 Nov 18:12 PST 2017, Stephen Boyd wrote:

> On 10/27, Georgi Djakov wrote:
> > Hi Bjorn,
> > 
> > Thanks for reviewing!
> > 
> > On 10/26/2017 07:28 AM, Bjorn Andersson wrote:
> > > On Thu 21 Sep 09:49 PDT 2017, Georgi Djakov wrote:
> > > 
> > >> Move the structure shared by the APCS IPC device and its subdevices
> > >> into a separate header file.
> > >>
> > > 
> > > As you're creating the apcs regmap with devm_regmap_init_mmio() you can
> > > just call dev_get_regmap(dev->parent) in your child to get the handle.
> > 
> > Ok, thanks!
> > 
> > > 
> > > But I would prefer that you just add the clock code to the existing
> > > driver.
> > 
> > This will require an ack from Stephen, and i got the impression that he
> > prefers a separate clk driver [1].
> > 
> > Stephen, are you ok with registering the clocks from the apcs mailbox
> > driver?
> > 
> > [1] https://lkml.org/lkml/2017/6/26/750
> 
> The parent regmap "trick" was the plan. Is something wrong with
> that?
> 

Not at all, but then this patch (moving apcs context to a shared header
file) shouldn't be needed, or am I missing something?

> Not having random clk drivers scattered throughout the tree is
> sort of nice because it makes for an easier time finding things
> that are similar. Maybe that's an abuse of the driver model
> though? Just to get things into some same directory. I'm fine
> either way.
> 

Keeping the clock driver in the clock subsystem does make sense. I see
now that there is an include of a local header file as well, so that
would just be messy to keep split.

I'm fine with the extra driver instance, it's the DT that I don't think
should describe the fact that we want to keep the clock-part in the
clock subsystem.

Do you see any problems spawning the clock driver programmatically and
then calling of_clk_add_hw_provider() on the parent's of_node?

Regards,
Bjorn

Fwd: FW: [PATCH 18/31] nds32: Library functions

2017-11-13 Thread Vincent Chen

>>On Wed, Nov 08, 2017 at 01:55:06PM +0800, Greentime Hu wrote:
>
>> +#define __range_ok(addr, size) (size <= get_fs() && addr <= (get_fs()
>> +-size))
>> +
>> +#define access_ok(type, addr, size) \
>> + __range_ok((unsigned long)addr, (unsigned long)size)
>
>> +#define __get_user_x(__r2,__p,__e,__s,__i...)   
>>  \
>> +__asm__ __volatile__ (   \
>> + __asmeq("%0", "$r0") __asmeq("%1", "$r2")   \
>> + "bal__get_user_" #__s   \
>
>... which does not check access_ok() or do any visible equivalents; OK...
>
>> +#define get_user(x,p)   
>>  \
>> + ({  \
>> + const register typeof(*(p)) __user *__p asm("$r0") = (p);\
>> + register unsigned long __r2 asm("$r2"); \
>> + register int __e asm("$r0");\
>> + switch (sizeof(*(__p))) {   \
>> + case 1: \
>> + __get_user_x(__r2, __p, __e, 1, "$lp"); \
>
>... and neither does this, which is almost certainly *not* OK.
>
>> +#define put_user(x,p)   
>>  \
>
>Same here, AFAICS.

Thanks.
I will add access_ok() in get_user/put_user
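
A minimal user-space sketch of the range check that the quoted __range_ok()
macro performs, assuming a fixed limit in place of get_fs(): addr_limit and
range_ok() are hypothetical names and the limit value is arbitrary. The point
it illustrates is the order of the two comparisons, which avoids the
wraparound that a naive "addr + size <= limit" test would suffer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for get_fs(): the highest user-accessible address. */
static const unsigned long addr_limit = 0xbf000000UL;

/*
 * True when [addr, addr + size) lies at or below addr_limit.
 * size is compared first, so addr_limit - size cannot underflow,
 * and addr + size is never computed, so it cannot wrap around.
 */
static bool range_ok(unsigned long addr, unsigned long size)
{
	return size <= addr_limit && addr <= addr_limit - size;
}
```

A get_user()/put_user() style accessor would call such a check before
touching the user pointer and return an error when it fails.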

>> +extern unsigned long __arch_copy_from_user(void *to, const void __user * 
>> from,
>> +unsigned long n);
>
>> +static inline unsigned long raw_copy_from_user(void *to,
>> +const void __user * from,
>> +unsigned long n)
>> +{
>> + return __arch_copy_from_user(to, from, n); }
>
>Er...  Why not call your __arch_... raw_... and be done with that?

Thanks.
I will modify it in next patch version

>> +#define INLINE_COPY_FROM_USER
>> +#define INLINE_COPY_TO_USER
>
>Are those actually worth bothering?  IOW, have you compared behaviour with and 
>without them?

We compared the assembly code of the functions that call copy_from/to_user,
and we think the performance is better when copy_from/to_user is inlined.


>> +ENTRY(__arch_copy_to_user)
>> + push$r0
>> + push$r2
>> + beqz$r2, ctu_exit
>> + srli$p0, $r2, #2! $p0 = number of word to clear
>> + andi$r2, $r2, #3! Bytes less than a word to copy
>> + beqz$p0, byte_ctu   ! Only less than a word to copy
>> +word_ctu:
>> + lmw.bim $p1, [$r1], $p1 ! Load the next word
>> +USER(smw.bim,$p1, [$r0], $p1)! Store the next word
>
>Umm...  It's that happy with unaligned loads and stores?  Your memcpy seems to 
>be trying to avoid those...

Thanks.
These should be aligned loads and stores, too.
I will fix it in the next version of the patch.

>> +9001:
>> + pop $p1 ! Original $r2, n
>> + pop $p0 ! Original $r0, void *to
>> + sub $r1, $r0, $p0   ! Bytes copied
>> + sub $r2, $p1, $r1   ! Bytes left to copy
>> + push$lp
>> + move$r0, $p0
>> + bal memzero ! Clean up the memory
>
>Just what memory are you zeroing here?  The one you had been unable to store 
>into in the first place?
>
>> +ENTRY(__arch_copy_from_user)
>
>> +9001:
>> + pop $p1 ! Original $r2, n
>> + pop $p0 ! Original $r0, void *to
>> + sub $r1, $r1, $p0   ! Bytes copied
>> + sub $r2, $p1, $r1   ! Bytes left to copy
>> + push$lp
>> + bal memzero ! Clean up the memory
>
>Ditto, only this one is even worse - instead of just oopsing on you, it will 
>quietly destroy data past the area you've copied into.  raw_copy_..._user() 
>MUST NOT ZERO ANYTHING.  Ever.


Thanks.
So I should leave the destination area as it is, rather than zeroing it,
even if an unexpected exception occurs. Right?
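
The division of labour discussed above can be modelled in user space as
follows. This is a sketch under stated assumptions, not the nds32 or generic
kernel code: raw_copy_model() and copy_from_user_model() are hypothetical
names, and the 'readable' parameter simulates how many source bytes can be
read before a fault. The raw routine only reports how many bytes were not
copied; zeroing of the uncopied tail is assumed to be the job of the generic
copy_from_user() style wrapper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of a raw copy routine. Copies until the simulated fault and
 * returns the number of bytes NOT copied. The destination beyond the
 * copied part is left untouched.
 */
static size_t raw_copy_model(void *to, const void *from, size_t n,
			     size_t readable)
{
	size_t done = n < readable ? n : readable;

	memcpy(to, from, done);
	return n - done;
}

/*
 * Model of the generic wrapper: it, not the raw routine, clears the
 * part of the destination that could not be filled.
 */
static size_t copy_from_user_model(void *to, const void *from, size_t n,
				   size_t readable)
{
	size_t left = raw_copy_model(to, from, n, readable);

	if (left)
		memset((char *)to + (n - left), 0, left);
	return left;
}
```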


Best regards
Vincent

linux-next: manual merge of the nvdimm tree with the parisc-hd tree

2017-11-13 Thread Stephen Rothwell

Hi Dan,

Today's linux-next merge of the nvdimm tree got a conflict in:

  arch/parisc/include/uapi/asm/mman.h

between commit:

  48cd4dc8f57f ("parisc: Convert MAP_TYPE to cover 4 bits on parisc")

from the parisc-hd tree and commit:

  1c9725974074 ("mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flag")

from the nvdimm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/parisc/include/uapi/asm/mman.h
index 9a39035986cc,bca652aa1677..
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@@ -12,11 -11,10 +12,12 @@@
  
  #define MAP_SHARED0x01/* Share changes */
  #define MAP_PRIVATE   0x02/* Changes are private */
+ #define MAP_SHARED_VALIDATE 0x03  /* share + validate extension flags */
 -#define MAP_TYPE  0x03/* Mask for type of mapping */
 +#define MAP_TYPE  (MAP_SHARED|MAP_PRIVATE|MAP_RESRVD1|MAP_RESRVD2) /* Mask for type of mapping */
  #define MAP_FIXED 0x04/* Interpret addr exactly */
 +#define MAP_RESRVD1   0x08/* reserved for 3rd bit of MAP_TYPE */
  #define MAP_ANONYMOUS 0x10/* don't use a file */
 +#define MAP_RESRVD2   0x20/* reserved for 4th bit of MAP_TYPE */
  
  #define MAP_DENYWRITE 0x0800  /* ETXTBSY */
  #define MAP_EXECUTABLE0x1000  /* mark it as an executable */
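
As a rough illustration of why the resolution above keeps both changes, here is a minimal userspace sketch of the merged parisc flag layout. The constant values are copied from the hunk; the helper names map_type_of() and is_shared_validate() are hypothetical and exist only for this example. It shows that the widened MAP_TYPE mask still isolates MAP_SHARED_VALIDATE even though MAP_FIXED and MAP_ANONYMOUS sit between the type bits.

```c
#include <assert.h>

/* Values copied from the merged hunk; everything else here is illustrative. */
#define MAP_SHARED          0x01
#define MAP_PRIVATE         0x02
#define MAP_SHARED_VALIDATE 0x03
#define MAP_FIXED           0x04
#define MAP_RESRVD1         0x08
#define MAP_ANONYMOUS       0x10
#define MAP_RESRVD2         0x20
#define MAP_TYPE (MAP_SHARED | MAP_PRIVATE | MAP_RESRVD1 | MAP_RESRVD2)

/* Hypothetical helper: extract the mapping-type field from a flags word. */
unsigned int map_type_of(unsigned int flags)
{
	return flags & MAP_TYPE;
}

/* Hypothetical helper: true when the type field is MAP_SHARED_VALIDATE. */
int is_shared_validate(unsigned int flags)
{
	return map_type_of(flags) == MAP_SHARED_VALIDATE;
}
```

With these values the mask works out to 0x2b, so the two reserved bits are available for future mapping types without colliding with MAP_FIXED or MAP_ANONYMOUS.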

Re: [PATCH] iio: accel: mma8452: Add single pulse/tap event detection

2017-11-13 Thread harinath Nampally

> > This patch adds following related changes:
> > - defines pulse event related registers
> > - enables and handles single pulse interrupt for fxls8471
> > - handles IIO_EV_DIR_EITHER in read/write callbacks (because
> >   event direction for pulse is either rising or falling)
> > - configures read/write event value for pulse latency register
> >   using IIO_EV_INFO_HYSTERESIS
> > - adds multiple events like pulse and transient event spec
> >   as elements of event_spec array named 'mma8452_accel_events'
> >
> > Except mma8653 chip all other chips like mma845x and
> > fxls8471 have single tap detection feature.
> > Tested thoroughly using iio_event_monitor application on
> > imx6ul-evk board which has fxls8471.
> >
> > Signed-off-by: Harinath Nampally 
> > ---
> What tree is this written against? It doesn't apply to the current -next
> anyways.
Thanks for the review.
It is actually against the 'testing' branch. I think two of my earlier
patches are not yet applied to any branch, which might be the reason
this patch does not apply against current -next or 'togreg'.

> I think the definitions would deserve to be in a separate patch, but
> that's debatable.
Yes, I would argue that definitions are not a logical change.

> >   .type = IIO_EV_TYPE_MAG,
> >   .dir = IIO_EV_DIR_RISING,
> >   .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> > @@ -1139,6 +1274,15 @@ static const struct iio_event_spec mma8452_transient_event[] = {
> >   BIT(IIO_EV_INFO_PERIOD) |
> >   BIT(IIO_EV_INFO_HIGH_PASS_FILTER_3DB)
> >   },
> > + {
> > + //pulse event
> > + .type = IIO_EV_TYPE_MAG,
> > + .dir = IIO_EV_DIR_EITHER,
> > + .mask_separate = BIT(IIO_EV_INFO_ENABLE),
> > + .mask_shared_by_type = BIT(IIO_EV_INFO_VALUE) |
> > + BIT(IIO_EV_INFO_PERIOD) |
> > + BIT(IIO_EV_INFO_HYSTERESIS)
> > + },
> >  };
> >
> >  static const struct iio_event_spec mma8452_motion_event[] = {
> > @@ -1202,8 +1346,8 @@ static struct attribute_group mma8452_event_attribute_group = {
> >   .shift = 16 - (bits), \
> >   .endianness = IIO_BE, \
> >   }, \
> > - .event_spec = mma8452_transient_event, \
> > - .num_event_specs = ARRAY_SIZE(mma8452_transient_event), \
> > + .event_spec = mma8452_accel_events, \
> > + .num_event_specs = ARRAY_SIZE(mma8452_accel_events), \
> that would go in the mentioned separate renaming-patch
OK, so I will make a patch set: patch 1/2 just renames
'mma8452_transient_event[]' to 'mma8452_accel_events[]' (without adding
the pulse event), and everything else goes in 2/2. Does that make sense?

Thanks,
Harinath

On Fri, Nov 10, 2017 at 5:35 PM, Martin Kepplinger  wrote:
> On 2017-11-09 04:12, Harinath Nampally wrote:
>> This patch adds following related changes:
>> - defines pulse event related registers
>> - enables and handles single pulse interrupt for fxls8471
>> - handles IIO_EV_DIR_EITHER in read/write callbacks (because
>>   event direction for pulse is either rising or falling)
>> - configures read/write event value for pulse latency register
>>   using IIO_EV_INFO_HYSTERESIS
>> - adds multiple events like pulse and transient event spec
>>   as elements of event_spec array named 'mma8452_accel_events'
>>
>> Except mma8653 chip all other chips like mma845x and
>> fxls8471 have single tap detection feature.
>> Tested thoroughly using iio_event_monitor application on
>> imx6ul-evk board which has fxls8471.
>>
>> Signed-off-by: Harinath Nampally 
>> ---
>
> What tree is this written against? It doesn't apply to the current -next
> anyways.
>
>>  drivers/iio/accel/mma8452.c | 156 ++--
>>  1 file changed, 151 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c
>> index 43c3a6b..36f1b56 100644
>> --- a/drivers/iio/accel/mma8452.c
>> +++ b/drivers/iio/accel/mma8452.c
>> @@ -72,6 +72,19 @@
>>  #define  MMA8452_TRANSIENT_THS_MASK  GENMASK(6, 0)
>>  #define MMA8452_TRANSIENT_COUNT  0x20
>>  #define MMA8452_TRANSIENT_CHAN_SHIFT 1
>> +#define MMA8452_PULSE_CFG            0x21
>> +#define MMA8452_PULSE_CFG_CHAN(chan) BIT(chan * 2)
>> +#define MMA8452_PULSE_CFG_ELE        BIT(6)
>> +#define MMA8452_PULSE_SRC            0x22
>> +#define MMA8452_PULSE_SRC_XPULSE     BIT(4)
>> +#define MMA8452_PULSE_SRC_YPULSE     BIT(5)
>> +#define MMA8452_PULSE_SRC_ZPULSE     BIT(6)
>> +#define MMA8452_PULSE_THS            0x23
>> +#define MMA8452_PULSE_THS_MASK       GENMASK(6, 0)
>> +#define MMA8452_PULSE_COUNT          0x26
>> +#define MMA8452_PULSE_CHAN_SHIFT     2
>> +#define MMA8452_PULSE_LTCY           0x27
>> +
>>  #define MMA8452_CTR

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-13 Thread Jaegeuk Kim

On 11/13, Hyunchul Lee wrote:
> On 11/13/2017 10:59 AM, Chao Yu wrote:
> > On 2017/11/13 9:35, Hyunchul Lee wrote:
> >> On 11/13/2017 10:26 AM, Chao Yu wrote:
> >>> On 2017/11/13 8:24, Hyunchul Lee wrote:
>  On 11/10/2017 03:42 PM, Chao Yu wrote:
> > On 2017/11/10 8:23, Hyunchul Lee wrote:
> >> Hello, Chao
> >>
> >> On 11/09/2017 06:12 PM, Chao Yu wrote:
> >>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>  From: Hyunchul Lee 
> 
>  Using write hints[1], applications can tell the kernel the expected
>  lifetime of the data written to devices, and this[2] reported that the
>  write hints patch decreased writes in NAND by 25%.
> 
>  These hints help F2FS determine the following:
>    1) the segment types where the data will be written.
>    2) the hints that will be passed down to devices with the data of
>       segments.
> 
>  This patch set implements the first mapping, from write hints to
>  segment types, as shown below.
> 
>    hints               segment type
>    -----               ------------
>    WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>    WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>    others              CURSEG_WARM_DATA
> 
>  The F2FS policy for hot/cold separation has precedence over these
>  hints, and hints are not applied to in-place updates.
> >>>
> >>> Could we change to disable IPU if file/inode write hint is existing?
> >>>
> >>
> >> I am afraid that this makes side effects. for example, this could cause
> >> out-of-place updates even when there are not enough free segments. 
> >> I can write the patch that handles these situations. But I wonder 
> >> that this is required, and I am not sure which IPU polices can be 
> >> disabled.
> >
> > Oh, As I replied in another thread, I think IPU just affects filesystem
> > hot/cold separating, rather than this feature. So I think it will be 
> > okay
> > to not consider it.
> >
> >>
> 
>  Before the second mapping is implemented, write hints are not passed 
>  down
>  to devices. Because it is better that the data of a segment have the 
>  same 
>  hint.
> 
>  [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>  [2]: https://lwn.net/Articles/726477/
> >>>
> >>> Could you write a patch to support passing write hint to block layer 
> >>> for
> >>> buffered writes as below commit:
> >>> 0127251c45ae ("ext4: add support for passing in write hints for 
> >>> buffered writes")
> >>>
> >>
> >> Sure I will. I wrote it already ;)
> >
> > Cool, ;)
> >
> >> I think that datas from the same segment should be passed down with 
> >> the same
> >> hint, and the following mapping is reasonable. I wonder what is your 
> >> opinion
> >> about it.
> >>
> >>   segment type       hints
> >>   ------------       -----
> >>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
> >>   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
> >>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL
> >
> > We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?
> >
> >>   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM
> >
> > As far as I know, in the cell phone scenario, meta_inode data is the
> > hottest, followed by hot data and warm node, and cold node should be
> > the coldest. So I suggest we define the mapping as below:
> >
> > META_DATA               WRITE_LIFE_SHORT
> > HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
> > HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
> > COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
> >
> 
>  I agree, But I am not sure that assigning the same hint to a node and 
>  data
>  segment is good. Because NVMe is likely to write them in the same erase 
>  block if they have the same hint.
> >>>
> >>> If we do not give the hint, they can still be written to the same erase 
> >>> block,
> > 
> > I mean it's possible to write them to the same erase block. :)
> > 
> >>> right? it will not be worse?
> >>>
> >>
> >> If the hint is not given, I think that they could be written to 
> >> the same erase block, or not. But if we give the same hint, they are 
> >> written
> >> to the same block.
> > 
> > IMO, Only if underlying device can support more hint type or opened 
> > channels,
> > and actual temperature of data segment and node segment is quite different, 
> > we
> > can separate them.
> > 
> 
> Okay, If Jaegeuk Kim agrees with this, I will submit the patch that 
> implements your proposed mapping.
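
For readers following the thread, the mapping Chao proposed above (META_DATA to SHORT, and so on) could be sketched as a small lookup function. This is only an illustration under stated assumptions: the enums below are local stand-ins, not the kernel's WRITE_LIFE_* or CURSEG_* definitions, and seg_to_hint() is a hypothetical name.

```c
#include <assert.h>

/* Local stand-ins for the kernel's write-hint and segment-type constants. */
enum write_hint { HINT_NOT_SET, HINT_SHORT, HINT_MEDIUM, HINT_LONG, HINT_EXTREME };
enum seg_temp {
	SEG_META_DATA,
	SEG_HOT_DATA, SEG_WARM_DATA, SEG_COLD_DATA,
	SEG_HOT_NODE, SEG_WARM_NODE, SEG_COLD_NODE
};

/* Hypothetical helper implementing the table proposed in the thread. */
enum write_hint seg_to_hint(enum seg_temp t)
{
	switch (t) {
	case SEG_META_DATA:
		return HINT_SHORT;
	case SEG_HOT_DATA:
	case SEG_WARM_NODE:
		return HINT_MEDIUM;
	case SEG_HOT_NODE:
	case SEG_WARM_DATA:
		return HINT_LONG;
	case SEG_COLD_NODE:
	case SEG_COLD_DATA:
		return HINT_EXTREME;
	}
	return HINT_NOT_SET;
}
```

The sketch makes Hyunchul's concern visible: a node segment and a data segment can end up with the same hint, so a device that groups writes by hint may place them in the same erase block.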

How about this? We'd better split data and node blocks as much as possible.

segment typeh

Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-13 Thread Quan Xu




On 2017/11/13 23:08, Ingo Molnar wrote:

* Quan Xu  wrote:


From: Quan Xu 

To reduce the cost of poll, we introduce three sysctl to control the
poll time when running as a virtual machine with paravirt.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
---
  Documentation/sysctl/kernel.txt |   35 +++
  arch/x86/kernel/paravirt.c  |4 
  include/linux/kernel.h  |6 ++
  kernel/sysctl.c |   34 ++
  4 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..30c25fb 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
  
  ==
  
+paravirt_poll_grow: (X86 only)

+
+Multiplied value to increase the poll time. This is expected to take
+effect only when running as a virtual machine with CONFIG_PARAVIRT
+enabled. This can't bring any benefit on bare metal even with
+CONFIG_PARAVIRT enabled.
+
+By default this value is 2. Possible values to set are in range {2..16}.
+
+==
+
+paravirt_poll_shrink: (X86 only)
+
+Divided value to reduce the poll time. This is expected to take effect
+only when running as a virtual machine with CONFIG_PARAVIRT enabled.
+This can't bring any benefit on bare metal even with CONFIG_PARAVIRT
+enabled.
+
+By default this value is 2. Possible values to set are in range {2..16}.
+
+==
+
+paravirt_poll_threshold_ns: (X86 only)
+
+Controls the maximum poll time before entering real idle path. This is
+expected to take effect only when running as a virtual machine with
+CONFIG_PARAVIRT enabled. This can't bring any benefit on bare metal
+even with CONFIG_PARAVIRT enabled.
+
+By default this value is 0, which means not to poll. Possible values to set
+are in range {0..50}. Change the value to non-zero if running
+latency-bound workloads in a virtual machine.

I absolutely hate it how this hybrid idle loop polling mechanism is not
self-tuning!


Ingo, actually it is self-tuning..

Please make it all work fine by default, and automatically so, instead of adding
three random parameters...
.. I will make it all work fine by default. However, cloud environments are
diverse.

Could I keep only the paravirt_poll_threshold_ns parameter (the maximum
poll time), which is similar to the "adaptive halt-polling" Wanpeng
mentioned? Then users can turn it off, or find an appropriate threshold for
some odd scenario..

thanks for your comments!!
Quan
Alibaba Cloud

And if it cannot be done automatically then we should rather not do it at all.
Maybe the next submitter of a similar feature can think of a better approach.

Thanks,

Ingo
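
To make the three knobs discussed above concrete, here is a minimal userspace sketch of how a grow multiplier, a shrink divisor and a maximum threshold could drive an adaptive poll window. The update policy is an assumption made for illustration, not taken from the patch: the struct, the function names and POLL_START_NS are all hypothetical, and only the meaning of the three parameters comes from the documentation hunk.

```c
#include <assert.h>

/* Hypothetical initial window used when polling restarts from zero. */
#define POLL_START_NS 10UL

struct poll_tuning {
	unsigned int grow;          /* paravirt_poll_grow: multiplier */
	unsigned int shrink;        /* paravirt_poll_shrink: divisor */
	unsigned long threshold_ns; /* paravirt_poll_threshold_ns: 0 = no polling */
};

/* Widen the poll window, clamped to the configured maximum. */
unsigned long poll_grow(const struct poll_tuning *t, unsigned long cur_ns)
{
	unsigned long next;

	if (t->threshold_ns == 0)
		return 0;
	next = cur_ns ? cur_ns * t->grow : POLL_START_NS;
	return next > t->threshold_ns ? t->threshold_ns : next;
}

/* Narrow the poll window; a zero divisor or threshold disables polling. */
unsigned long poll_shrink(const struct poll_tuning *t, unsigned long cur_ns)
{
	if (t->threshold_ns == 0 || t->shrink == 0)
		return 0;
	return cur_ns / t->shrink;
}
```

Under this assumed policy, only the threshold has to be chosen by the administrator; grow and shrink could be fixed constants, which is close to what Quan suggests keeping.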

Re: [Regression/XFS/PM] Freeze tasks failed in xfsaild

2017-11-13 Thread Dave Chinner

On Tue, Nov 14, 2017 at 11:39:59AM +0800, Yu Chen wrote:
> Hi Dave,
> On Tue, Nov 14, 2017 at 09:52:16AM +1100, Dave Chinner wrote:
> > On Mon, Nov 13, 2017 at 06:31:39PM +0800, Yu Chen wrote:
> > > Hi all,
> > > Currently we are running hibernation stress test on a server
> > > and unfortunately after 48 rounds of cycling, it fails at a
> > > early stage that, the xfs task refuses to be frozen by the system:
> > > 
> > > [ 1934.221653] PM: Syncing filesystems ...
> > > [ 1934.661517] PM: done.
> > > [ 1934.664067] Freezing user space processes ... (elapsed 0.003 seconds) done.
> > > [ 1934.675251] OOM killer disabled.
> > > [ 1934.724317] PM: Preallocating image memory... done (allocated 6906555 pages)
> > > [ 1954.666378] PM: Allocated 27626220 kbytes in 19.93 seconds (1386.16 MB/s)
> > > [ 1954.673939] Freezing remaining freezable tasks ...
> > > [ 1974.681089] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > [ 1974.691169] xfsaild/dm-1    D    0  1362      2 0x0080
> > > [ 1974.697283] Call Trace:
> > > [ 1974.700014]  __schedule+0x3be/0x830
> > > [ 1974.703898]  schedule+0x36/0x80
> > > [ 1974.707440]  _xfs_log_force+0x143/0x280 [xfs]
> > > [ 1974.712295]  ? schedule_timeout+0x16b/0x350
> > > [ 1974.716953]  ? wake_up_q+0x80/0x80
> > > [ 1974.720752]  ? xfsaild+0x16f/0x770 [xfs]
> > > [ 1974.725134]  xfs_log_force+0x2c/0x80 [xfs]
> > > [ 1974.729707]  xfsaild+0x16f/0x770 [xfs]
> > > [ 1974.733885]  kthread+0x109/0x140
> > > [ 1974.737480]  ? kthread+0x109/0x140
> > > [ 1974.741271]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> > > [ 1974.747284]  ? kthread_park+0x60/0x60
> > > [ 1974.751354]  ret_from_fork+0x25/0x30
> > > [ 1974.755366] Restarting kernel threads ... done.
> > > [ 1978.259907] OOM killer enabled.
> > > [ 1978.263405] Restarting tasks ... done.
> > > 
> > > The reason for this failure might be that,
> > > while the kernel thread xfsaild/dm-1 is waiting for
> > > xfs-buf/dm-1 to wake it up, however the latter
> > > has already been frozen, thus xfsaild/dm-1 never
> > > has a chance to be woken up and get froze. (Although
> > > the xfsaild/dm-1 remains in TASK_UNINTERRUPTIBLE, which
> > > is quite similar to 'frozen'.)
> > 
> > Should be fixed by this commit in the for-next branch:
> > 
> > 0bd89676c4fe xfs: check kthread_should_stop() after the setting of task state
> > 
> > That should get merged into 4.15 with the next merge...
> >
> I did not quite catch why the above commit would fix the issue here.
> According to
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=0bd89676c4fed53b003025bc4a5200861ac5d8ef
> it addresses a race between umount and xfsaild around the
> kthread_should_stop() check, so that xfsaild does not fall asleep
> indefinitely.

Argh, got my threads slightly crossed there.

> But in our case xfsaild is waiting for xfs-buf to wake it up, which
> has nothing to do with the kthread_should_stop() check.
> Did I miss something?

Similar symptoms - the symptom that was fixed by the commit I
mentioned was the xfsaild getting stuck in sleeping forever and so
never seeing the KTHREAD_STOP bit - it was a "set bit vs wakeup"
race caused by the fact that we didn't reset the state of the
task correctly after wakeup.

You said:

>> (Although the xfsaild/dm-1 remains in TASK_UNINTERRUPTIBLE, which
>> is quite similar to 'frozen'.)

So from a quick look, it seemed like a similar race condition. I
missed the *un* part of the task state, though.
TASK_UNINTERRUPTIBLE implies waiting for IO completion, which is
what _xfs_log_force() is doing.

So, follow the other branch of the discussion: hibernation needs to
freeze filesystems so they can quiesce gracefully before the kernel
starts shutting down the infrastructure the filesystem relies on...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v2] coccinelle: fix parallel build with CHECK=scripts/coccicheck

2017-11-13 Thread Masahiro Yamada

Hi Julia,

2017-11-14 1:45 GMT+09:00 Julia Lawall :
>
>
> On Tue, 14 Nov 2017, Masahiro Yamada wrote:
>
>> Hi Julia,
>>
>>
>> 2017-11-14 0:30 GMT+09:00 Julia Lawall :
>> >
>> >
>> > On Thu, 9 Nov 2017, Masahiro Yamada wrote:
>> >
>> >> The command "make -j8 C=1 CHECK=scripts/coccicheck" produces lots of
>> >> "coccicheck failed" error messages.
>> >>
>> >> I do not know the coccinelle internals, but I guess --jobs does not
>> >> work well if spatch is invoked from Make running in parallel.
>> >> Disable --jobs in this case.
>> >
>> > Why is this change under:
>> >
>> > if [ "$C" = "1" -o "$C" = "2" ];
>> >
>> > The coccicheck failed messages come also if one runs Coccinelle on the
>> > entire kernel.
>>
>> As far as I tested, "coccicheck failed" error only happens
>> when ONLINE=1.
>>
>>
>> make -j8 C=1 CHECK=scripts/coccicheck COCCI=scripts/coccinelle/misc/bugon.cocci
>>
>> emits lots of errors.
>>
>>
>> make -j8 coccicheck  COCCI=scripts/coccinelle/misc/bugon.cocci
>>
>> is fine.
>>
>>
>> Have you tested it?
>> Do you mean you got a different result from mine?
>
> I agree with your results, with respect to the number of errors.
>
> julia
>

So, what shall we do?

If you do not like to fix it (or you can fix coccinelle itself),
I can take back this patch.

I am not a coccinelle developer, so
setting USE_JOBS="no" is the best I can do.




-- 
Best Regards
Masahiro Yamada

Re: [PATCH] quota: be aware of error from dquot_initialize

2017-11-13 Thread Chao Yu

On 2017/11/13 17:18, Jan Kara wrote:
> On Mon 13-11-17 11:31:48, Chao Yu wrote:
>> Commit 6184fc0b8dd7 ("quota: Propagate error from ->acquire_dquot()")
>> missed to handle error from dquot_initialize in dquot_file_open, fix it.
>>
>> Signed-off-by: Chao Yu 
> 
> Good spotting. I've added the patch to my tree.

Thanks for queuing the patch. :)

BTW, I noticed that add_dquot_ref also doesn't handle the error from
__dquot_initialize. Should we handle it too?

Thanks,

> 
>   Honza
> 
>> ---
>>  fs/quota/dquot.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
>> index 50b0556a124f..80002c094647 100644
>> --- a/fs/quota/dquot.c
>> +++ b/fs/quota/dquot.c
>> @@ -2133,7 +2133,7 @@ int dquot_file_open(struct inode *inode, struct file *file)
>>  
>>  	error = generic_file_open(inode, file);
>>  	if (!error && (file->f_mode & FMODE_WRITE))
>> -		dquot_initialize(inode);
>> +		error = dquot_initialize(inode);
>>  	return error;
>>  }
>>  EXPORT_SYMBOL(dquot_file_open);
>> -- 
>> 2.15.0.55.gc2ece9dc4de6
>>
>>
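
The effect of the one-line fix quoted above can be illustrated with a small userspace sketch. The two stub functions are hypothetical stand-ins for generic_file_open() and dquot_initialize(), and -ENOMEM is just an arbitrary example error; the point is only that the old code computed the quota error and then dropped it.

```c
#include <assert.h>
#include <errno.h>

#define FMODE_WRITE 0x2 /* same bit value as the kernel's FMODE_WRITE */

/* Hypothetical stubs standing in for the real kernel helpers. */
static int stub_generic_file_open(void)
{
	return 0;
}

static int stub_dquot_initialize(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Before the patch: the quota initialization result is ignored. */
int file_open_before(unsigned int f_mode, int quota_fails)
{
	int error = stub_generic_file_open();

	if (!error && (f_mode & FMODE_WRITE))
		stub_dquot_initialize(quota_fails);
	return error;
}

/* After the patch: the quota initialization result is propagated. */
int file_open_after(unsigned int f_mode, int quota_fails)
{
	int error = stub_generic_file_open();

	if (!error && (f_mode & FMODE_WRITE))
		error = stub_dquot_initialize(quota_fails);
	return error;
}
```

Read-only opens are unaffected in both versions, since quota initialization is only attempted when FMODE_WRITE is set.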

Re: [Regression/XFS/PM] Freeze tasks failed in xfsaild

2017-11-13 Thread Yu Chen

Hi Dave,
On Tue, Nov 14, 2017 at 09:52:16AM +1100, Dave Chinner wrote:
> On Mon, Nov 13, 2017 at 06:31:39PM +0800, Yu Chen wrote:
> > Hi all,
> > Currently we are running hibernation stress test on a server
> > and unfortunately after 48 rounds of cycling, it fails at a
> > early stage that, the xfs task refuses to be frozen by the system:
> > 
> > [ 1934.221653] PM: Syncing filesystems ...
> > [ 1934.661517] PM: done.
> > [ 1934.664067] Freezing user space processes ... (elapsed 0.003 seconds) done.
> > [ 1934.675251] OOM killer disabled.
> > [ 1934.724317] PM: Preallocating image memory... done (allocated 6906555 pages)
> > [ 1954.666378] PM: Allocated 27626220 kbytes in 19.93 seconds (1386.16 MB/s)
> > [ 1954.673939] Freezing remaining freezable tasks ...
> > [ 1974.681089] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
> > [ 1974.691169] xfsaild/dm-1    D    0  1362      2 0x0080
> > [ 1974.697283] Call Trace:
> > [ 1974.700014]  __schedule+0x3be/0x830
> > [ 1974.703898]  schedule+0x36/0x80
> > [ 1974.707440]  _xfs_log_force+0x143/0x280 [xfs]
> > [ 1974.712295]  ? schedule_timeout+0x16b/0x350
> > [ 1974.716953]  ? wake_up_q+0x80/0x80
> > [ 1974.720752]  ? xfsaild+0x16f/0x770 [xfs]
> > [ 1974.725134]  xfs_log_force+0x2c/0x80 [xfs]
> > [ 1974.729707]  xfsaild+0x16f/0x770 [xfs]
> > [ 1974.733885]  kthread+0x109/0x140
> > [ 1974.737480]  ? kthread+0x109/0x140
> > [ 1974.741271]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> > [ 1974.747284]  ? kthread_park+0x60/0x60
> > [ 1974.751354]  ret_from_fork+0x25/0x30
> > [ 1974.755366] Restarting kernel threads ... done.
> > [ 1978.259907] OOM killer enabled.
> > [ 1978.263405] Restarting tasks ... done.
> > 
> > The reason for this failure might be that,
> > while the kernel thread xfsaild/dm-1 is waiting for
> > xfs-buf/dm-1 to wake it up, however the latter
> > has already been frozen, thus xfsaild/dm-1 never
> > has a chance to be woken up and get froze. (Although
> > the xfsaild/dm-1 remains in TASK_UNINTERRUPTIBLE, which
> > is quite similar to 'frozen'.)
> 
> Should be fixed by this commit in the for-next branch:
> 
> 0bd89676c4fe xfs: check kthread_should_stop() after the setting of task state
> 
> That should get merged into 4.15 with the next merge...
>
I did not quite catch why the above commit would fix the issue here.
According to
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=0bd89676c4fed53b003025bc4a5200861ac5d8ef
it addresses a race between umount and xfsaild around the
kthread_should_stop() check, so that xfsaild does not fall asleep
indefinitely.
But in our case xfsaild is waiting for xfs-buf to wake it up, which
has nothing to do with the kthread_should_stop() check.
Did I miss something?
Thanks,
Yu

Re: [PATCH] tick/broadcast: Remove redundant code in tick_check_new_device()

2017-11-13 Thread Zhenzhong Duan


On 2017/11/14 0:54, Thomas Gleixner wrote:


On Wed, 8 Nov 2017, Zhenzhong Duan wrote:


There is no way a timer used as broadcast clockevent device is also used as
percpu tick clockevent device currently.

Correct.


It's better to put related code in tick_install_broadcast_device(), but I feel
it's harmless to give it back to the clockevents layer. Pls correct me if I'm
wrong.

You already established, that it _cannot_ be the broadcast device and the
per cpu device at the same time. So that condition can never be true. What
do you want to put into tick_install_broadcast_device()? This second
paragraph doesn't make sense, unless I'm missing something.


I couldn't find the reason in the history logs behind the comment saying 'If the
current device is the broadcast device, do not give it back to the clockevents
layer!'

If that reason still holds, tick_install_broadcast_device() is the proper place
for it. If not, I can resend the patch with a fresh description; please confirm.

--
thanks
zduan

[PATCH] uapi: fix linux/tls.h userspace compilation error

2017-11-13 Thread Dmitry V. Levin

Move the inclusion of the private kernel header net/tcp.h
from uapi/linux/tls.h to its only user, net/tls.h,
to fix the following linux/tls.h userspace compilation error:

/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory

Since uapi/linux/tls.h was totally unusable for userspace up to this point,
clean up this header file further by moving the other redundant includes
to net/tls.h.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Cc:  # v4.13+
Signed-off-by: Dmitry V. Levin 
---
 include/net/tls.h| 4 
 include/uapi/linux/tls.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index b89d397dd62f..c06db1eadac2 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -35,6 +35,10 @@
 #define _TLS_OFFLOAD_H
 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index d5e0682ab837..293b2cdad88d 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -35,10 +35,6 @@
 #define _UAPI_LINUX_TLS_H
 
 #include 
-#include 
-#include 
-#include 
-#include 
 
 /* TLS socket options */
 #define TLS_TX 1   /* Set transmit parameters */

-- 
ldv

Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Girish Moodalbail


On 11/7/17 12:28 PM, syzbot wrote:

Hello,

syzkaller hit the following crash on 287683d027a3ff83feb6c7044430c79881664ecf
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.




==
BUG: KASAN: use-after-free in rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
BUG: KASAN: use-after-free in rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568
Read of size 8 at addr 8801cd879200 by task kworker/u4:3/147

CPU: 0 PID: 147 Comm: kworker/u4:3 Not tainted 4.14.0-rc7+ #156
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011

Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  print_address_description+0x73/0x250 mm/kasan/report.c:252
  kasan_report_error mm/kasan/report.c:351 [inline]
  kasan_report+0x25b/0x340 mm/kasan/report.c:409
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
  rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
  rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568


The issue here is that we are trying to reference a network namespace (struct 
net *) that is long gone (i.e., L532 below -- c_net is the culprit).


528 spin_lock_irq(&rds_tcp_conn_lock);
529 list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list,
 t_tcp_node) {
530 struct net *c_net = tc->t_cpath->cp_conn->c_net;
531
532 if (net != c_net || !tc->t_sock)
533 continue;
534 if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
535 list_move_tail(&tc->t_tcp_node, &tmp_list);
536 }
537 spin_unlock_irq(&rds_tcp_conn_lock);
538 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
539 rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
540 rds_conn_destroy(tc->t_cpath->cp_conn);
541 }

When a network namespace is deleted, devices within that namespace are 
unregistered and removed one by one. RDS is notified about this event through 
rds_tcp_dev_event() callback. When the loopback device is removed from the 
namespace, the above RDS callback function destroys all the RDS connections in 
that namespace.


The loop at L529 above walks each rds_tcp connection in the global list
(rds_tcp_conn_list) to see whether that connection belongs to the namespace in
question. It collects all such connections and destroys them (L538-540).
However, it skips the rds_tcp connections that share an underlying RDS
connection already on the temporary list (L534 and 535). Those connections,
which still point at the stale network namespace, are left behind in the global
list. When a second network namespace is deleted, the loop dereferences the
stale pointer and triggers the use-after-free.


I think we should move away from the global list to a per-namespace list. The
global list is used in only two places (both of which are per-namespace operations):


 - to destroy all the RDS connections belonging to a namespace when the
   network namespace is being deleted.
 - to reset all the RDS connections  when socket parameters for a namespace are
   modified using sysctl

Thanks,
~Girish

Re: [Regression/XFS/PM] Freeze tasks failed in xfsaild

2017-11-13 Thread Yu Chen

On Mon, Nov 13, 2017 at 09:14:14PM +0100, Luis R. Rodriguez wrote:
> On Mon, Nov 13, 2017 at 06:31:39PM +0800, Yu Chen wrote:
> > The xfs-buf/dm-1 should be freezed according to
> > commit 8018ec083c72 ("xfs: mark all internal workqueues
> > as freezable"), thus a easier way might be have to revert
> > commit 18f1df4e00ce ("xfs: Make xfsaild freezeable
> > again") for now, after this reverting the xfsaild/dm-1
> > becomes non-freezable again, thus pm does not see this
> > thread - unless we find a graceful way to treat xfsaild/dm-1
> > as 'frozen' if it is waiting for an already 'frozen' task,
> > or if the filesystem freeze is added in.
> > 
> > Any comments would be much appreciated.
> 
> Reverting 18f1df4e00ce ("xfs: Make xfsaild freezeable again")
> would break the proper form of the kthread for it to be freezable.
> This "form" is not defined formally, and sadly its just a form
> learned throughout years over different kthreads in the kernel.
> 
> I'm also not convinced all our hibernation / suspend woes would be fixed by
> reverting this commit, its why I worked instead on formalizing a proper freeze
> / thaw, which a lot of filesystems already implement prior to system
> hibernation / suspend / resume [0].
> 
> I'll be respinning this series without the last patch, provided I'm able to
> ensure I don't need the ext[234] hack I did in that thread. Can you test the
> first 3 patches *only* on that series and seeing if that helps on your XFS
> front as well?
> 
> [0] https://lkml.kernel.org/r/20171003185313.1017-1-mcg...@kernel.org
> 
>   Luis
Thanks for the comment, Luis.
Yes, I agree that freezing the filesystem is the proper, thorough fix for this
kind of issue, but as Dan said, it might be a little risky for us to
deploy it on our products currently, unless it is in the
mainline/stable branch. Although the XFS issue might not be 100% reproducible,
we can help test the patch set while looking for a lightweight 'fix'.
Thanks,
Yu

[PATCH] perf annotate: Remove precision for mnemonics

2017-11-13 Thread Ravi Bangoria

There are many instructions, esp on powerpc, whose mnemonics are
longer than 6 characters. Using precision limit causes truncation
of such mnemonics.

Fix this by removing precision limit. Note that, 'width' is still
6, so alignment won't get affected for length <= 6.

Before:

   li     r11,-1
   xscvdp vs1,vs1
   add.   r10,r10,r11

After:

   li     r11,-1
   xscvdpsxds vs1,vs1
   add.   r10,r10,r11

Reported-by: Donald Stence 
Signed-off-by: Ravi Bangoria 
---
 tools/perf/util/annotate.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 54321b947de8..6462a7423beb 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -165,7 +165,7 @@ static void ins__delete(struct ins_operands *ops)
 static int ins__raw_scnprintf(struct ins *ins, char *bf, size_t size,
  struct ins_operands *ops)
 {
-   return scnprintf(bf, size, "%-6.6s %s", ins->name, ops->raw);
+   return scnprintf(bf, size, "%-6s %s", ins->name, ops->raw);
 }
 
 int ins__scnprintf(struct ins *ins, char *bf, size_t size,
@@ -230,12 +230,12 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
if (ops->target.name)
-   return scnprintf(bf, size, "%-6.6s %s", ins->name, ops->target.name);
+   return scnprintf(bf, size, "%-6s %s", ins->name, ops->target.name);
 
if (ops->target.addr == 0)
return ins__raw_scnprintf(ins, bf, size, ops);
 
-   return scnprintf(bf, size, "%-6.6s *%" PRIx64, ins->name, ops->target.addr);
+   return scnprintf(bf, size, "%-6s *%" PRIx64, ins->name, ops->target.addr);
 }
 
 static struct ins_ops call_ops = {
@@ -299,7 +299,7 @@ static int jump__scnprintf(struct ins *ins, char *bf, size_t size,
c++;
}
 
-   return scnprintf(bf, size, "%-6.6s %.*s%" PRIx64,
+   return scnprintf(bf, size, "%-6s %.*s%" PRIx64,
 ins->name, c ? c - ops->raw : 0, ops->raw,
 ops->target.offset);
 }
@@ -372,7 +372,7 @@ static int lock__scnprintf(struct ins *ins, char *bf, size_t size,
if (ops->locked.ins.ops == NULL)
return ins__raw_scnprintf(ins, bf, size, ops);
 
-   printed = scnprintf(bf, size, "%-6.6s ", ins->name);
+   printed = scnprintf(bf, size, "%-6s ", ins->name);
return printed + ins__scnprintf(&ops->locked.ins, bf + printed,
size - printed, ops->locked.ops);
 }
@@ -448,7 +448,7 @@ static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map *m
 static int mov__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
-   return scnprintf(bf, size, "%-6.6s %s,%s", ins->name,
+   return scnprintf(bf, size, "%-6s %s,%s", ins->name,
 ops->source.name ?: ops->source.raw,
 ops->target.name ?: ops->target.raw);
 }
@@ -488,7 +488,7 @@ static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops
 static int dec__scnprintf(struct ins *ins, char *bf, size_t size,
   struct ins_operands *ops)
 {
-   return scnprintf(bf, size, "%-6.6s %s", ins->name,
+   return scnprintf(bf, size, "%-6s %s", ins->name,
 ops->target.name ?: ops->target.raw);
 }
 
@@ -500,7 +500,7 @@ static struct ins_ops dec_ops = {
 static int nop__scnprintf(struct ins *ins __maybe_unused, char *bf, size_t size,
  struct ins_operands *ops __maybe_unused)
 {
-   return scnprintf(bf, size, "%-6.6s", "nop");
+   return scnprintf(bf, size, "%-6s", "nop");
 }
 
 static struct ins_ops nop_ops = {
@@ -990,7 +990,7 @@ void disasm_line__free(struct disasm_line *dl)
 int disasm_line__scnprintf(struct disasm_line *dl, char *bf, size_t size, bool raw)
 {
if (raw || !dl->ins.ops)
-   return scnprintf(bf, size, "%-6.6s %s", dl->ins.name, dl->ops.raw);
+   return scnprintf(bf, size, "%-6s %s", dl->ins.name, dl->ops.raw);
 
return ins__scnprintf(&dl->ins, bf, size, &dl->ops);
 }
-- 
2.13.6
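
Every hunk in the patch above makes the same change: the format goes from
"%-6.6s" to "%-6s". In standard printf formatting, the ".6" precision
truncates the string to six characters, while the "-6" width only pads
shorter strings. A minimal userspace sketch of the difference follows. It
uses snprintf() as a stand-in for perf's scnprintf(), and format_name() is
a hypothetical helper, not a function from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: print "name operands" into buf.
 * truncate != 0 uses the old format (pad to 6 and cut at 6 characters),
 * truncate == 0 uses the new format (pad to 6, never cut).
 */
int format_name(char *buf, size_t size, const char *name,
		const char *operands, int truncate)
{
	if (truncate)
		return snprintf(buf, size, "%-6.6s %s", name, operands);

	return snprintf(buf, size, "%-6s %s", name, operands);
}
```

With the old format, a long mnemonic such as "vpclmulqdq" comes out as
"vpclmu"; with the new one it is printed in full, and short names such as
"mov" are still padded to six columns either way.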

Re: video: fbdev: Convert timers to use timer_setup()

2017-11-13 Thread Kees Cook

On Mon, Nov 13, 2017 at 5:45 PM, Guenter Roeck  wrote:
> On Tue, Oct 24, 2017 at 08:20:26AM -0700, Kees Cook wrote:
>> In preparation for unconditionally passing the struct timer_list pointer to
>> all timer callbacks, switch to using the new timer_setup() and from_timer()
>> to pass the timer pointer explicitly. One tracking pointer was added, and
>> one initialization was cleaned up.
>>
>> Cc: Bartlomiej Zolnierkiewicz 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Tomi Valkeinen 
>> Cc: David Lechner 
>> Cc: Daniel Vetter 
>> Cc: Sean Paul 
>> Cc: Jean Delvare 
>> Cc: Hans de Goede 
>> Cc: "Gustavo A. R. Silva" 
>> Cc: linux-fb...@vger.kernel.org
>> Cc: dri-de...@lists.freedesktop.org
>> Cc: linux-o...@vger.kernel.org
>> Signed-off-by: Kees Cook 
>
> Hi Kees,
>
> this patch causes a large number of qemu crashes.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0194
> pgd = c0004000
> [0194] *pgd=
> Internal error: Oops: 5 [#1] ARM
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-next-20171113 #1
> Hardware name: ARM-Versatile (Device Tree Support)
> task: c04df238 task.stack: c04da000
> PC is at queue_work_on+0x1c/0x48
> ...
> [] (queue_work_on) from [] 
> (cursor_timer_handler+0x20/0x44)
> [] (cursor_timer_handler) from [] 
> (call_timer_fn+0x24/0xa0)
> [] (call_timer_fn) from [] (expire_timers+0x7c/0x8c)
> [] (expire_timers) from [] (run_timer_softirq+0x88/0x184)
> [] (run_timer_softirq) from [] (__do_softirq+0xe0/0x238)
> [] (__do_softirq) from [] (irq_exit+0xb4/0xd0)
> [] (irq_exit) from [] (__handle_domain_irq+0x50/0xa8)
> [] (__handle_domain_irq) from [] 
> (vic_handle_irq+0x54/0x94)
> [] (vic_handle_irq) from [] (__irq_svc+0x68/0x84)
>
> See
> http://kerneltests.org/builders/qemu-arm-next/builds/806/steps/qemubuildcommand/logs/stdio
> for complete crash logs.
>
> Reverting the patch fixes the problem.
>
> Images for various other architectures crash as well in next-20171113,
> but I didn't bisect those. It looks like there are additional (possibly irq
> related) problems in the latest -next kernel; I don't know if those are
> also related to timer changes.

I think this is already fixed here:
https://marc.info/?l=linux-fbdev&m=151056635200583&w=2

If not, please let me know! :)

-Kees

-- 
Kees Cook
Pixel Security
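
For readers unfamiliar with the conversion discussed in this thread: with
timer_setup(), the callback receives the struct timer_list pointer itself,
and from_timer() (a container_of() wrapper in the kernel) recovers the
structure that embeds it. The following is a userspace sketch of that
pattern only. struct timer_list is mocked here, and cursor_state,
container_of_sketch() and timer_setup_sketch() are hypothetical names, not
the fbdev code touched by the patch.

```c
#include <stddef.h>

/* Mock of the kernel's struct timer_list; only what the sketch needs. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Hypothetical driver state with an embedded timer. */
struct cursor_state {
	int blink_count;
	struct timer_list timer;
};

/* Userspace stand-in for container_of(): member pointer -> enclosing struct. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* New-style callback: it is handed the timer and recovers its owner. */
void cursor_timer_cb(struct timer_list *t)
{
	struct cursor_state *st =
		container_of_sketch(t, struct cursor_state, timer);

	st->blink_count++;
}

/* Stand-in for timer_setup(): this mock only records the callback. */
void timer_setup_sketch(struct timer_list *t,
			void (*fn)(struct timer_list *))
{
	t->function = fn;
}
```

The design point is that the callback no longer depends on a separately
stored "unsigned long data" cookie. Any state it needs must be reachable
from the structure that embeds the timer.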

Re: [intel-sgx-kernel-dev] [PATCH v5 11/11] intel_sgx: driver documentation

2017-11-13 Thread Kai Huang

On Mon, 2017-11-13 at 21:45 +0200, Jarkko Sakkinen wrote:
> Signed-off-by: Jarkko Sakkinen 
> ---
>  Documentation/index.rst |   1 +
>  Documentation/x86/intel_sgx.rst | 131
> 
>  2 files changed, 132 insertions(+)
>  create mode 100644 Documentation/x86/intel_sgx.rst
> 
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index cb7f1ba5b3b1..ccfebc260e04 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -86,6 +86,7 @@ implementation.
> :maxdepth: 2
>  
> sh/index
> +   x86/index
>  
>  Korean translations
>  ---
> diff --git a/Documentation/x86/intel_sgx.rst
> b/Documentation/x86/intel_sgx.rst
> new file mode 100644
> index ..34bcf6a2a495
> --- /dev/null
> +++ b/Documentation/x86/intel_sgx.rst
> @@ -0,0 +1,131 @@
> +===
> +Intel(R) SGX driver
> +===
> +
> +Introduction
> +
> +
> +Intel(R) SGX is a set of CPU instructions that can be used by
> applications to
> +set aside private regions of code and data. The code outside the
> enclave is
> +disallowed to access the memory inside the enclave by the CPU access
> control.
> +In a way you can think that SGX provides inverted sandbox. It
> protects the
> +application from a malicious host.
> +
> +There is a new hardware unit in the processor called Memory
> Encryption Engine
> +(MEE) starting from the Skylake microarchitecture. BIOS can define
> one or many
> +MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
> +
> +The MEE automatically encrypts the data leaving the processor
> package to the MEE
> +regions. The data is encrypted using a random key whose life-time is
> exactly one
> +power cycle.

Not sure whether you should talk about the MEE stuff here. It is not in
the SDM and (thus) may change in the future.

> +
> +You can tell if your CPU supports SGX by looking into
> ``/proc/cpuinfo``:
> +
> + ``cat /proc/cpuinfo  | grep sgx``
> +
> +Enclave data types
> +==
> +
> +SGX defines new data types to maintain information about the
> enclaves and their
> +security properties.
> +
> +The following data structures exist in MEE regions:
> +
> +* **Enclave Page Cache (EPC):** memory pages for protected code and
> data
> +* **Enclave Page Cache Map (EPCM):** meta-data for each EPC page
> +
> +The Enclave Page Cache holds following types of pages:
> +
> +* **SGX Enclave Control Structure (SECS)**: meta-data defining the
> global
> +  properties of an enclave such as range of addresses it can access.
> +* **Regular (REG):** containing code and data for the enclave.
> +* **Thread Control Structure (TCS):** defines an entry point for a
> hardware
> +  thread to enter into the enclave. The enclave can only be entered
> through
> +  these entry points.
> +* **Version Array (VA)**: an EPC page receives a unique 8 byte
> version number
> +  when it is swapped, which is then stored into a VA page. A VA page
> can hold up
> +  to 512 version numbers.
> +
> +Launch control
> +==
> +
> +For launching an enclave, two structures must be provided for
> ENCLS(EINIT):
> +
> +1. **SIGSTRUCT:** a signed measurement of the enclave binary.
> +2. **EINITTOKEN:** the measurement, the public key of the signer and
> various
> +   enclave attributes. This structure contains a MAC of its contents
> using
> +   hardware derived symmetric key called *launch key*.
> +
> +The hardware platform contains a root key pair for signing the
> SIGTRUCT
> +for a *launch enclave* that is able to acquire the *launch key* for
> +creating EINITTOKEN's for other enclaves.  For the launch enclave
> +EINITTOKEN is not needed because it is signed with the private root
> key.
> +
> +There are two feature control bits associate with launch control
> +
> +* **IA32_FEATURE_CONTROL[0]**: locks down the feature control
> register
> +* **IA32_FEATURE_CONTROL[17]**: allow runtime reconfiguration of
> +  IA32_SGXLEPUBKEYHASHn MSRs that define MRSIGNER hash for the
> launch
> +  enclave. Essentially they define a signing key that does not
> require
> +  EINITTOKEN to be let run.
> +
> +The BIOS can configure IA32_SGXLEPUBKEYHASHn MSRs before feature
> control
> +register is locked.
> +
> +It could be tempting to implement launch control by writing the MSRs
> +every time when an enclave is launched. This does not scale because
> for
> +generic case because BIOS might lock down the MSRs before handover
> to
> +the OS.
> +
> +Debug enclaves
> +--
> +
> +Enclave can be set as a *debug enclave* of which memory can be read
> or written
> +by using the ENCLS(EDBGRD) and ENCLS(EDBGWR) opcodes. The Intel
> provided launch
> +enclave provides them always a valid EINITTOKEN and therefore they
> are a low
> +hanging fruit way to try out SGX.
> +
> +Virtualization
> +==
> +
> +Launch control
> +--
> +
> +The values for IA32_SGXLEPUBKEYHASHn MSRs cannot be em

RE: [patch v2 4/8] KVM: x86: add Intel processor trace cpuid emulataion

2017-11-13 Thread Kang, Luwei

> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index
> > 0099e10..ef19a11 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -70,6 +70,7 @@ u64 kvm_supported_xcr0(void)
> >  /* These are scattered features in cpufeatures.h. */
> >  #define KVM_CPUID_BIT_AVX512_4VNNIW 2
> >  #define KVM_CPUID_BIT_AVX512_4FMAPS 3
> > +#define KVM_CPUID_BIT_INTEL_PT 25
> 
> This is not necessary, because there is no need to place processor tracing in 
> scattered features.  Can you replace this hunk, and the KF usage below, with 
> the following patch?
> 

Yes, this looks good to me. Will fix in the next version.

Thanks,
Luwei Kang

>  8< -
> From: Paolo Bonzini 
> Subject: [PATCH] x86: cpufeature: move processor tracing out of scattered 
> features
> 
> Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX), so do not 
> duplicate it in the scattered features word.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/include/asm/cpufeatures.h | 3 ++-
>  arch/x86/kernel/cpu/scattered.c| 1 -
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 2519c6c801c9..839781e78763 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -199,7 +199,7 @@
>  #define X86_FEATURE_SME  ( 7*32+10) /* AMD Secure Memory Encryption */
> 
>  #define X86_FEATURE_INTEL_PPIN   ( 7*32+14) /* Intel Processor Inventory Number */
> -#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
> +
>  #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
> 
> @@ -238,6 +238,7 @@
>  #define X86_FEATURE_AVX512IFMA  ( 9*32+21) /* AVX-512 Integer Fused 
> Multiply-Add instructions */
>  #define X86_FEATURE_CLFLUSHOPT   ( 9*32+23) /* CLFLUSHOPT instruction */
>  #define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
> +#define X86_FEATURE_INTEL_PT ( 9*32+25) /* Intel Processor Trace */
>  #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
>  #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and 
> Reciprocal */
>  #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> diff --git a/arch/x86/kernel/cpu/scattered.c 
> b/arch/x86/kernel/cpu/scattered.c index 05459ad3db46..d0e69769abfd 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -21,7 +21,6 @@ struct cpuid_bit {
>  static const struct cpuid_bit cpuid_bits[] = {
>   { X86_FEATURE_APERFMPERF,   CPUID_ECX,  0, 0x0006, 0 },
>   { X86_FEATURE_EPB,  CPUID_ECX,  3, 0x0006, 0 },
> - { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x0007, 0 },
>   { X86_FEATURE_AVX512_4VNNIW,CPUID_EDX,  2, 0x0007, 0 },
>   { X86_FEATURE_AVX512_4FMAPS,CPUID_EDX,  3, 0x0007, 0 },
>   { X86_FEATURE_CAT_L3,   CPUID_EBX,  1, 0x0010, 0 },
> 
> >  #define KF(x) bit(KVM_CPUID_BIT_##x)
> >
> >  int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -327,6 +328,7 @@
> > static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 
> > function,
> > unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
> > unsigned f_mpx = kvm_mpx_supported() ? F(MPX) : 0;
> > unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> > +   unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? KF(INTEL_PT) :
> > +0;
> >
> > /* cpuid 1.edx */
> > const u32 kvm_cpuid_1_edx_x86_features =
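
The defines in the quoted patch encode each feature as word*32+bit, so
moving Intel PT from ( 7*32+15) to ( 9*32+25) places it in word 9 at bit
25, matching the CPUID[7,0].EBX position named in the commit message. A
small userspace sketch of that arithmetic follows. The helper names and the
plain capability array are assumptions for illustration, not the kernel's
cpufeature API.

```c
#include <stdint.h>

/* Same word*32+bit encoding as the defines in the quoted patch. */
#define FEATURE_INTEL_PT_SKETCH (9 * 32 + 25)

/* Index of the 32-bit capability word that holds the feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a feature against an array of 32-bit capability words. */
int feature_present(const uint32_t *words, unsigned int feature)
{
	return (words[feature_word(feature)] >> feature_bit(feature)) & 1u;
}
```

Because word 9 is filled directly from CPUID leaf 7, a feature defined
there needs no separate entry in the scattered-features table.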

Re: [RFC PATCH v10 6/7] PCI / PM: Move acpi wakeup code to pci core

2017-11-13 Thread Brian Norris

Hi Rafael,

I'll answer some of it from my perspective, though Jeffy might have had
different ideas (and answers) when he implemented this.

On Wed, Nov 08, 2017 at 11:32:20PM +0100, Rafael J. Wysocki wrote:
> On Friday, October 27, 2017 9:26:11 AM CET Jeffy Chen wrote:
> > Move acpi wakeup code to pci core as pci_set_wakeup(), so that other
> > platforms could reuse it.
> 
> What exactly do you want to reuse?
> 
> It looks like that's just several lines of code in acpi_pci_wakeup()
> and acpi_pci_propagate_wakeup() which invoke ACPI-specific lower-level
> functions, so IMO not worth it at all.

The important part he's sharing here is the walking of the tree
structure, in which it's possible for some parent along the way to
handle wakeup for its children. I'm not sure how valuable nor how
reusable that is.

In this case (the Rockchip platforms Jeffy and I are working on), I
think we really want to just support a single WAKE# pin for the whole
system, so maybe the complexity isn't needed. The spec does describe
that there are good reasons for supporting more than 1 WAKE# pin though
(e.g., 1 per device), so it doesn't seem really wise to shoehorn
ourselves into a single setup.

But that can be implemented either via copying the "few" lines of
tree-walking logic, or by trying to share them.

> The structure for other platform code may be the same or similar, but
> the details will almost certainly be different and I don't think that
> having more callback pointers in pci_platform_pm_ops is necessarily better.

I suppose that's reasonable.

> > Also add .setup_dev() / .setup_host_bridge() / .cleanup() platform pm
> > ops's callbacks to setup and cleanup pci devices and host bridge for
> > wakeup.
> 
> Why are they needed?

The implementation is in patch 7, if you need more info about why or
want to suggest alternatives.

The current set of hooks assumes that there is no state information or
initialization needed for tracking actions of these platform PM hooks on
a device. For ACPI this works, because devices have "companion"
acpi_dev's to handle everything, and the ACPI framework generally
prepares GPE's for you, IIUC. For 'pci-mid', the operations happen to be
trivial (and arguably wrong -- several are no-ops, where we might expect
the platform to tell us whether the operation was actually supported or
not).

For device tree, there isn't really a canonical place to store this
information, nor to initialize something like wakeup interrupts.

Technically, we could shoehorn this into the .set_wakeup() call, but
we'd probably rather not do things like request_irq() on every attempt
to suspend/resume the system (among other reasons, we'd lose information
that we might otherwise track in /proc/ or /sys/).

Or the inverse of the above: where would you suggest initializing or
tearing down the wakeirq?

An alternative could be to include any necessary state into the
pci_host_bridge or pci_dev and just inline the setup code into
pci.c/remove.c (e.g., pci_register_host_bridge()) and pci-driver.c
(pci_device_{probe,remove}()). But I'm not sure that's much more
beautiful.

Brian

> > Signed-off-by: Jeffy Chen 
> 
> Thanks,
> Rafael
>
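
The tree walking described in this message (some parent along the way may
handle wakeup for its children) can be sketched in userspace as a walk
from a device toward the root. Everything here is a hypothetical
stand-in: struct node is not the kernel's struct pci_dev, and
handles_wakeup abstracts whatever check the platform code would perform.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCI device or bridge in the hierarchy. */
struct node {
	struct node *parent;	/* upstream bridge, NULL at the root */
	int handles_wakeup;	/* nonzero if wakeup can be armed here */
};

/*
 * Walk from dev toward the root and return the first node that can
 * handle wakeup, or NULL if nothing along the path can.
 */
struct node *find_wakeup_handler(struct node *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->handles_wakeup)
			return dev;

	return NULL;
}
```

In a single-WAKE# system the walk always ends at the same node. With one
WAKE# per device it ends immediately at the device itself.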

Re: [GIT pull] printk updates for 4.15

2017-11-13 Thread Linus Torvalds

On Mon, Nov 13, 2017 at 5:18 PM, Linus Torvalds
 wrote:
>
>  (b) just emit a "synchronization printk" every once in a while, which
> is obviously also using the same standard time source, but the line
> actually _says_ what the other time sources are.

Side note: there's a few good obvious times to do this. After a NTP
synchronization, after a resume, and maybe "every X hours if nothing
else is happening".

That "if nothing else is happening" would actually be a nice heartbeat
thing for people who care about that. I've had machines crash
overnight, and later wondered when it happened. Of course, these days
other system journal sources tend to be so chatty that it doesn't much
happen, but maybe it would still be appreciated in embedded places
where that isn't yet the case..

And that "how often you do the time sync printk" really could be a
kernel configuration thing then, but it wouldn't actually affect any
existing machinery unlike the "let's just change what the printk
header timestamp means".

  Linus

Re: [v8, 4/5] x86/xsave: Make XSAVE check the base CPUID features before enabling

2017-11-13 Thread Guenter Roeck


On 11/13/2017 05:28 PM, Andi Kleen wrote:
> Guenter,
>
> Do you have a command line that reproduces it and the exact log
> output?


Sorry, I forgot: Logs are at http://kerneltests.org/builders/qemu-x86_64-next.

Guenter

Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area

2017-11-13 Thread Andy Lutomirski

On Mon, Nov 13, 2017 at 6:28 PM, Linus Torvalds
 wrote:
> On Mon, Nov 13, 2017 at 6:25 PM, Andy Lutomirski  wrote:
>> On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
>>  wrote:
>>>
>>> I forget what the actual size is, but aligning the hardware TSS struct
>>> to 128 bytes might be sufficient. It's not that big.
>>
>> 104 bytes, so it's probably already fine.  For anything except an
>> actual task switch, only the first 12 or so bytes matter.
>
> Note that historically, about half of the Intel errata (that don't get
> fixed) are about TSS in oddball situations, mainly page crossers.
>
> I may be exaggerating just a tiny bit, but it's definitely a "don't do it".

:)

I suspect the major case where this matters is when we do a task
switch, which only ever happens on 32-bit double faults, at which
point we're already seriously screwed.  But yes, I agree.

Re: linux-next: Signed-off-by missing for commit in the f2fs tree

2017-11-13 Thread Jaegeuk Kim

On 11/13, Stephen Rothwell wrote:
> Hi Jaegeuk,
> 
> Commit
> 
>   c79d88f915ed ("f2fs: separate nat entry mem alloc from nat_tree_lock")
> 
> is missing a Signed-off-by from its author.

Thank you so much for pointing this out.
I fixed it.

Thanks,

> 
> -- 
> Cheers,
> Stephen Rothwell

Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area

2017-11-13 Thread Linus Torvalds

On Mon, Nov 13, 2017 at 6:25 PM, Andy Lutomirski  wrote:
> On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
>  wrote:
>>
>> I forget what the actual size is, but aligning the hardware TSS struct
>> to 128 bytes might be sufficient. It's not that big.
>
> 104 bytes, so it's probably already fine.  For anything except an
> actual task switch, only the first 12 or so bytes matter.

Note that historically, about half of the Intel errata (that don't get
fixed) are about TSS in oddball situations, mainly page crossers.

I may be exaggerating just a tiny bit, but it's definitely a "don't do it".

   Linus

Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area

2017-11-13 Thread Andy Lutomirski

On Mon, Nov 13, 2017 at 11:22 AM, Dave Hansen  wrote:
> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index fbc9b7f4e35e..8a9ba5553cab 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -52,6 +52,13 @@ extern unsigned long __FIXADDR_TOP;
>>  struct cpu_entry_area
>>  {
>>   char gdt[PAGE_SIZE];
>> +
>> + /*
>> +  * The gdt is just below cpu_tss and thus serves (on x86_64) as a
>> +  * a read-only guard page for the SYSENTER stack at the bottom
>> +  * of the TSS region.
>> +  */
>> + struct tss_struct tss;
>>  };
>>
>
> Aha, and here's the place that you need sizeof(tss_struct) to be nice
> and page-aligned.
>
> But why don't we just do:
>
> char tss_space[PAGE_SIZE*something];

The idea is to save some space.  The TSS plus IO bitmap is slightly
over a page, so, if we're giving it a dedicated block of pages, we
have almost a page of unused space.  I want to use some of that space
for the SYSENTER stack.  To reliably detect overflow, that space
should be at the beginning.

It turns out that using almost a page is way too *big*: it masks bugs.
I want anything nontrivial that accidentally runs on the SYSENTER
stack to overflow and crash very quickly rather than having a decent
chance of working or of causing nasty corruption with a crash down the
road.  So I'm going to make it much smaller and instead just add a
build-time assertion that we don't cross a page boundary.
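
A build-time check of the kind mentioned above could look roughly like the
following userspace sketch. The layout is a mock: the 104-byte size comes
from this thread, while the 512-byte stack, the 4096-byte page size and
all *_sketch names are assumptions for illustration, not the real
cpu_entry_area.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u

/* Mock sized like the 104-byte hardware TSS discussed in the thread. */
struct tss_sketch {
	unsigned char hw[104];
};

/* Mock entry area: a guard page, a small stack, then the TSS. */
struct entry_area_sketch {
	char gdt[PAGE_SIZE_SKETCH];
	char small_stack[512];	/* hypothetical small SYSENTER stack */
	struct tss_sketch tss;
};

/* Nonzero if [offset, offset + size) spans more than one page. */
int crosses_page(size_t offset, size_t size)
{
	return offset / PAGE_SIZE_SKETCH !=
	       (offset + size - 1) / PAGE_SIZE_SKETCH;
}

/* Compile-time version of the same check for the mock layout. */
_Static_assert(offsetof(struct entry_area_sketch, tss) / PAGE_SIZE_SKETCH ==
	       (offsetof(struct entry_area_sketch, tss) +
		sizeof(struct tss_sketch) - 1) / PAGE_SIZE_SKETCH,
	       "tss must not cross a page boundary");
```

If the stack in front of the TSS grew enough to push the TSS across a page
boundary, the build would fail instead of producing a layout that only
breaks at run time.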

Re: [PATCH 3/4] x86/umip: Identify the str and sldt instructions

2017-11-13 Thread Ricardo Neri

On Mon, Nov 13, 2017 at 09:12:03AM +0100, Ingo Molnar wrote:
> 
> * Ricardo Neri  wrote:
> 
> > The instructions str and sldt are not emulated in any case. Thus, it made
> > sense to not implement functionality to identify them. However, a
> > subsequent commit will introduce functionality to warn about the use of
> > all the instructions that UMIP protects, not only those that are emulated.
> > A first step for that is the ability to identify them.
> > 
> > Plus, now that str and sldt are identified, we need to explicitly avoid
> > their emulation (i.e., not rely on unsuccessful identification). Group
> > together all the cases that we do not want to emulate: str, sldt and user
> > long mode processes.
> 
> Did you notice how in all your previous patches (both in the code and in the
> changelogs) I have manually fixed up the capitalization of these instruction
> mnemonics?

I am sorry, I tried to see where you made these changes but I could not find
any. I did a git diff of arch/x86/kernel/umip.c between the branch
rneri/umip_v11 of my repository [1] and the master branch of the tip tree and
I did not find any differences.

Also, looking at the log of the master branch of the tip tree I see, for
instance:

commit 1e5db223696afa55e6a038fac638f759e1fdcc01
Author: Ricardo Neri 
Date:   Sun Nov 5 18:27:52 2017 -0800

x86/umip: Add emulation code for UMIP instructions

The feature User-Mode Instruction Prevention present in recent Intel
processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and
str) from being executed with CPL > 0. Otherwise, a general protection
fault is issued.
...

The instruction mnemonics were not capitalized. Is the master branch the one
where I can look at your fixes?

> 
> The capitalized form is much more readable, especially with seriously
> overloaded acronyms such as 'str' ...

I see.
> 
> You now repeat the same bad pattern, in fact you regress existing code:
> 
> > -   /* SLDT AND STR are not emulated */
> 
> > +   /* Do not emulate sldt, str or user long mode processes. */
> 
> Please be more careful with such details, and please fix & resend this series.

Sure, I will submit a v2 with capitalized mnemonics in both the code and the
patch descriptions. I will be more careful in the future.

Thanks and BR,
Ricardo

[1]. https://github.com/ricardon/tip/commits/rneri/umip_v11

Re: [RFC 6/7] x86/asm: Remap the TSS into the cpu entry area

2017-11-13 Thread Andy Lutomirski

On Mon, Nov 13, 2017 at 11:36 AM, Linus Torvalds
 wrote:
> On Mon, Nov 13, 2017 at 11:22 AM, Dave Hansen  wrote:
>>
>> Aha, and here's the place that you need sizeof(tss_struct) to be nice
>> and page-aligned.
>
> No, it should _not_ be page-aligned. It should fit _within_ a page,
> but if 'struct tss_struct' now has something else in front of it, then
> page-aligning that is actually pointless.
>
> I forget what the actual size is, but aligning the hardware TSS struct
> to 128 bytes might be sufficient. It's not that big.

104 bytes, so it's probably already fine.  For anything except an
actual task switch, only the first 12 or so bytes matter.

Re: [PATCH 1/3] perf help: Document missing options

2017-11-13 Thread Taeung Song


Hi Arnaldo and Namhyung :)

On 11/14/2017 09:15 AM, Namhyung Kim wrote:

Hi Arnaldo,

On Mon, Nov 13, 2017 at 03:29:56PM -0300, Arnaldo Carvalho de Melo wrote:

Em Sun, Nov 12, 2017 at 10:10:45AM +0900, Sihyeon Jang escreveu:

Cc: Jiri Olsa 
Cc: Namhyung Kim 
Signed-off-by: Sihyeon Jang 
---
  tools/perf/Documentation/perf-help.txt | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-help.txt b/tools/perf/Documentation/perf-help.txt
index 5143918..bb605af 100644
--- a/tools/perf/Documentation/perf-help.txt
+++ b/tools/perf/Documentation/perf-help.txt
@@ -7,7 +7,7 @@ perf-help - display help information about perf
  
  SYNOPSIS

  
-'perf help' [-a|--all] [COMMAND]
+'perf help' [--all] [--man|--web|--info] [COMMAND]


Can you try figuring out if this actually works? I tried here and it
doesn't, it's an area we took "for free" when we copied the initial
codebase from git.git, but I never looked at this area that much, now
that I try:


Yeah, I'm not sure we need to keep it.




[acme@jouet linux]$ perf help
Config with no key for man viewer: childrenError: wrong config key-value pair 
top.children=true
[acme@jouet linux]$

Unsure if this is something that got broken by the 'perf config'
patches, Taeung?


Looks like a bug in 8e99b6d4533c ("tools include: Adopt strstarts()
from the kernel").

Following patch should fix it:

Thanks,
Namhyung


I also checked this error and tested the patch below.
It seems that Namhyung has already fixed it!

Thanks,
Taeung




 From 096b78b437b5758acc025498e88d73d9d471b3c0 Mon Sep 17 00:00:00 2001
From: Namhyung Kim 
Date: Tue, 14 Nov 2017 09:10:43 +0900
Subject: [PATCH] perf help: Fix a bug during strstart() conversion

Commit 8e99b6d4533c changed prefixcmp() to strstarts() but missed
inverting the return value check in some places.  This makes perf help
print annoying output even for sane config items, like below:

   $ perf help
   '.root': unsupported man viewer sub key.
   ...

Fixes: 8e99b6d4533c ("tools include: Adopt strstarts() from the kernel")
Cc: Taeung Song 
Signed-off-by: Namhyung Kim 
---
  tools/perf/builtin-help.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c
index dbe4e4153bcf..ff51e5fc0daf 100644
--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -283,7 +283,7 @@ static int perf_help_config(const char *var, const char *value, void *cb)
add_man_viewer(value);
return 0;
}
-   if (!strstarts(var, "man."))
+   if (strstarts(var, "man."))
return add_man_viewer_info(var, value);
  
  	return 0;

@@ -313,7 +313,7 @@ static const char *cmd_to_page(const char *perf_cmd)
  
  	if (!perf_cmd)

return "perf";
-   else if (!strstarts(perf_cmd, "perf"))
+   else if (strstarts(perf_cmd, "perf"))
return perf_cmd;
  
  	return asprintf(&s, "perf-%s", perf_cmd) < 0 ? NULL : s;
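
The bug fixed by the patch above comes from the two helpers having
opposite return conventions: a strcmp()-style prefixcmp() returns 0 when
the prefix matches, while strstarts() returns true. A userspace sketch of
both follows. The *_sketch functions are local stand-ins written for
illustration, not the perf or kernel implementations.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the old helper: strcmp()-like, 0 means "prefix matches". */
int prefixcmp_sketch(const char *str, const char *prefix)
{
	for (;; str++, prefix++) {
		if (!*prefix)
			return 0;
		if (*str != *prefix)
			return (unsigned char)*prefix - (unsigned char)*str;
	}
}

/* Stand-in for strstarts(): true means "prefix matches". */
bool strstarts_sketch(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

So "!prefixcmp(var, "man.")" and "strstarts(var, "man.")" test the same
thing, and a mechanical rename that keeps the "!" inverts the condition.
That is consistent with the spurious man viewer warnings for unrelated
keys such as top.children shown earlier in the thread.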

Re: [RFC 1/7] x86/asm/64: Allocate and enable the SYSENTER stack

2017-11-13 Thread Andy Lutomirski

On Mon, Nov 13, 2017 at 11:07 AM, Dave Hansen  wrote:
> On 11/10/2017 08:05 PM, Andy Lutomirski wrote:
>> This will simplify some future code changes that will want some
>> temporary stack space in more places.  It also lets us get rid of a
>> SWAPGS_UNSAFE_STACK user.
>>
>> This does not depend on CONFIG_IA32_EMULATION because we'll want the
>> stack space even without IA32 emulation.
>
> It was never clear to me why we don't use this on 64-bit today.  Does
> anybody know why?

Nothing used it?

As far as I can tell, the original x86_64 Linux port was a little bit
more excited about IST than I think made sense.  As a result, we use
IST for #DB and #BP, which is IMO rather nasty and causes a lot more
problems than it solves.  But, since #DB uses IST, we don't actually
need a real stack for SYSENTER (because SYSENTER with TF set will
invoke #DB on the IST stack rather than the SYSENTER stack).

I have old patches to stop using IST for #DB and #BP, but I never finished them.
