Re: [QUESTION] Mainline support for B43_PHY_AC wifi cards

2018-03-25 Thread Juri Lelli
On 24/03/18 00:01, Rafał Miłecki wrote:
> On 23 March 2018 at 15:09, Juri Lelli  wrote:
> > On 23/03/18 14:43, Rafał Miłecki wrote:
> >> Hi,
> >>
> >> On 23 March 2018 at 10:47, Juri Lelli  wrote:
> >> > I've got a Dell XPS 13 9343/0TM99H (BIOS A15 01/23/2018) mounting a
> >> > BCM4352 802.11ac (rev 03) wireless card and so far I've been using it on
> >> > Fedora with broadcom-wl package (which I believe installs Broadcom's STA
> >> > driver?). It works well apart from occasional hiccups after suspend.
> >> >
> >> > I'd like to get rid of that dependency (you can understand that it's
> >> > particularly annoying when testing mainline kernels), but I found out
> >> > that support for my card is BROKEN in mainline [1]. Just to see what
> >> > happens, I forcibly enabled it and witnessed that it indeed crashes as
> >> > shown below, just as Kconfig warns. :)
> >> >
> >> >  bcma: bus0: Found chip with id 0x4352, rev 0x03 and package 0x00
> >> >  bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x2B, class 0x0)
> >> >  bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x2A, class 0x0)
> >> >  bcma: bus0: Core 2 found: ARM CR4 (manuf 0x4BF, id 0x83E, rev 0x02, class 0x0)
> >> >  bcma: bus0: Core 3 found: PCIe Gen2 (manuf 0x4BF, id 0x83C, rev 0x01, class 0x0)
> >> >  bcma: bus0: Core 4 found: USB 2.0 Device (manuf 0x4BF, id 0x81A, rev 0x11, class 0x0)
> >> >  bcma: Unsupported SPROM revision: 11
> >> >  bcma: bus0: Invalid SPROM read from the PCIe card, trying to use fallback SPROM
> >> >  bcma: bus0: Using fallback SPROM failed (err -2)
> >> >  bcma: bus0: No SPROM available
> >> >  bcma: bus0: Bus registered
> >> >  b43-phy0: Broadcom 4352 WLAN found (core revision 42)
> >> >  b43-phy0: Found PHY: Analog 12, Type 11 (AC), Revision 1
> >> >  b43-phy0: Found Radio: Manuf 0x17F, ID 0x2069, Revision 4, Version 0
> >> >  BUG: unable to handle kernel NULL pointer dereference at 
> >>
> >> This isn't really useful without a full backtrace.
> >
> > Sure. I cut it here because I didn't expect people to debug what is
> > already known to be broken (but still it seemed to carry useful
> > information about the hw). :)
> 
> Please paste the remaining part if you still got it.

Sure, please find it below.

Thanks!

- Juri

--->8---

[   60.732180] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   60.733048] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   60.733303] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   60.733305] cfg80211: failed to load regulatory.db
[   61.047277] bcma: bus0: Found chip with id 0x4352, rev 0x03 and package 0x00
[   61.047302] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x2B, class 0x0)
[   61.047316] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x2A, class 0x0)
[   61.047340] bcma: bus0: Core 2 found: ARM CR4 (manuf 0x4BF, id 0x83E, rev 0x02, class 0x0)
[   61.047366] bcma: bus0: Core 3 found: PCIe Gen2 (manuf 0x4BF, id 0x83C, rev 0x01, class 0x0)
[   61.047380] bcma: bus0: Core 4 found: USB 2.0 Device (manuf 0x4BF, id 0x81A, rev 0x11, class 0x0)
[   61.107321] bcma: Unsupported SPROM revision: 11
[   61.107325] bcma: bus0: Invalid SPROM read from the PCIe card, trying to use fallback SPROM
[   61.107326] bcma: bus0: Using fallback SPROM failed (err -2)
[   61.107327] bcma: bus0: No SPROM available
[   61.109830] bcma: bus0: Bus registered
[   61.242068] b43-phy0: Broadcom 4352 WLAN found (core revision 42)
[   61.242481] b43-phy0: Found PHY: Analog 12, Type 11 (AC), Revision 1
[   61.242487] b43-phy0: Found Radio: Manuf 0x17F, ID 0x2069, Revision 4, Version 0
[   61.242909] BUG: unable to handle kernel NULL pointer dereference at 
[   61.242916] IP:   (null)
[   61.242919] PGD 0 P4D 0 
[   61.242924] Oops: 0010 [#1] PREEMPT SMP PTI
[   61.242926] Modules linked in: b43(+) bcma mac80211 cfg80211 ssb mmc_core rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel btusb snd_hda_codec uvcvideo btrtl btbcm btintel snd_hda_core bluetooth snd_hwdep videobuf2_vmalloc snd_seq
[   61.242989]  videobuf2_memops videobuf2_v4l2 iTCO_wdt snd_seq_device snd_pcm videobuf2_common iTCO_vendor_support dell_laptop dell_wmi irqbypass videodev wmi_bmof sparse_keymap dell_smbios intel_cstate 

Re: [PATCH v2] PCI / PM: Always check PME wakeup capability for runtime wakeup support

2018-03-25 Thread Kai Heng Feng

Hi Bjorn, Rafael,

On Mar 19, 2018, at 10:09 PM, Kai-Heng Feng wrote:


USB controller ASM1042 stops working after commit de3ef1eb1cd0 ("PM /
core: Drop run_wake flag from struct dev_pm_info").

The device in question is not power-managed by platform firmware;
furthermore, it only supports PME# from D3cold:
Capabilities: [78] Power Management version 3
	Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
	Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

Before commit de3ef1eb1cd0, the device never gets runtime suspended.
After that commit, the device gets runtime suspended, so it does not
respond to any PME#.

usb_hcd_pci_probe() unconditionally calls device_wakeup_enable(), hence
device_can_wakeup() in pci_dev_run_wake() always returns true.

So pci_dev_run_wake() needs to check PME wakeup capability as its first
condition.

Fixes: de3ef1eb1cd0 ("PM / core: Drop run_wake flag from struct dev_pm_info")

Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Kai-Heng Feng 



Is there anything I should improve, or do you have any concerns about
this patch?

Kai-Heng


---
v2: Explicitly check dev->pme_support.

 drivers/pci/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..52821a21fc07 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2125,16 +2125,16 @@ bool pci_dev_run_wake(struct pci_dev *dev)
 {
struct pci_bus *bus = dev->bus;

-	if (device_can_wakeup(&dev->dev))
-   return true;
-
if (!dev->pme_support)
return false;

/* PME-capable in principle, but not from the target power state */
-   if (!pci_pme_capable(dev, pci_target_state(dev, false)))
+   if (!pci_pme_capable(dev, pci_target_state(dev, true)))
return false;

+	if (device_can_wakeup(&dev->dev))
+   return true;
+
while (bus->parent) {
struct pci_dev *bridge = bus->self;

--
2.15.1
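
[Editorial note: pieced together from the hunk above, pci_dev_run_wake()
after this patch reads roughly as follows -- a sketch, with the unchanged
bridge-walking tail elided:]

	bool pci_dev_run_wake(struct pci_dev *dev)
	{
		struct pci_bus *bus = dev->bus;

		/* No PME# support at all -> no runtime wakeup. */
		if (!dev->pme_support)
			return false;

		/* PME-capable in principle, but not from the target power state */
		if (!pci_pme_capable(dev, pci_target_state(dev, true)))
			return false;

		/*
		 * Only now consult the flag that usb_hcd_pci_probe()
		 * forces to true for every USB host controller.
		 */
		if (device_can_wakeup(&dev->dev))
			return true;

		/* ... walk up the bridges as in the unchanged tail ... */
	}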


Re: [PATCH v3] x86: i8237: Register based on FADT legacy boot flag

2018-03-25 Thread Rajneesh Bhardwaj
On Sun, Mar 25, 2018 at 01:50:40PM +0200, Thomas Gleixner wrote:
> On Thu, 22 Mar 2018, Anshuman Gupta wrote:
> 
> > From: Rajneesh Bhardwaj 
> > 
> > From Skylake onwards, the platform controller hub (Sunrisepoint PCH) does
> > not support legacy DMA operations to IO ports 81h-83h, 87h, 89h-8Bh, 8Fh.
> > Currently this driver registers as syscore ops and its resume function is
> > called on every resume from S3. On Skylake and Kabylake, this causes a
> > resume delay of around 100ms due to port IO operations, which is a problem.
> > 
> > This change allows to load the driver only when the platform bios
> > explicitly supports such devices or has a cut-off date earlier than 2017.
> 
> Please explain WHY 2017 is the cut-off date. I still have no clue how that
> is decided aside of being a random number.

Hello Thomas,

We tested on a few Intel platforms such as Skylake, Kabylake, and Geminilake,
and realized that the BIOS always sets the FADT flag even though the
device may not be physically present on the SoC. This is a BIOS bug. To keep
the impact minimal, we decided to add a cut-off date, since we are not aware
of any BIOS (other than the coreboot link provided in the commit msg) that
properly sets this field. SoCs released after Skylake will not have this DMA
device on the PCH. So, for these two reasons, we decided to add a
cut-off date of 2017.
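
[Editorial note: a hypothetical sketch of the resulting check, for readers
following along. The helpers named below -- x86_platform.legacy.devices.pnpbios,
which the kernel fills from the FADT legacy-devices boot flag, and
dmi_get_bios_year() -- are assumptions about how the posted patch is
implemented, not a quote of it.]

	static int __init i8237A_init_ops(void)
	{
		/*
		 * Register only when the BIOS explicitly declares legacy
		 * devices, or is old enough (pre-2017) that the FADT flag
		 * cannot be trusted anyway.
		 */
		if (!x86_platform.legacy.devices.pnpbios &&
		    dmi_get_bios_year() >= 2017)
			return -ENODEV;

		register_syscore_ops(&i8237_syscore_ops);
		return 0;
	}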

Please let us know if you feel strongly about it, and we can change or
remove it.

Ideally, we didn't want to add this BIOS check at all and only wanted to use
the inb() approach, but unfortunately that too was broken for port 0x81.

@Rafael / Alan / Andy - Please add more or correct me in case anything was
missed or not communicated fully.

> 
> Thanks,
> 
>   tglx

-- 
Best Regards,
Rajneesh


Re: [PATCH] ext4 : fix comments in ext4_swap_extents

2018-03-25 Thread Theodore Y. Ts'o
On Sat, Mar 24, 2018 at 03:28:24PM +0800, zhenwei.pi wrote:
> "mark_unwritten" in comment and "unwritten" in variable
> argument lists is mismatch.
> 
> Signed-off-by: zhenwei.pi 

Applied, thanks.

- Ted


linux-next: manual merge of the kvm tree with the kvm-fixes tree

2018-03-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/x86/kvm/vmx.c

between commit:

  9d1887ef3252 ("KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0")

from the kvm-fixes tree and commit:

  2bb8cafea80b ("KVM: vVMX: signal failure for nested VMEntry if emulation_required")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/kvm/vmx.c
index 92496b9b5f2b,b4d8da6c62c8..
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@@ -10952,6 -11010,19 +11021,14 @@@ static int prepare_vmcs02(struct kvm_vc
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
  
 -  if (vmx->nested.dirty_vmcs12) {
 -  prepare_vmcs02_full(vcpu, vmcs12, from_vmentry);
 -  vmx->nested.dirty_vmcs12 = false;
 -  }
 -
+   /*
+* Guest state is invalid and unrestricted guest is disabled,
+* which means L1 attempted VMEntry to L2 with invalid state.
+* Fail the VMEntry.
+*/
+   if (vmx->emulation_required)
+   return 1;
+ 
/* Shadow page tables on either EPT or shadow page tables. */
	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
				entry_failure_code))




[PATCH] ALSA: aloop: Mark paused device as inactive

2018-03-25 Thread Robert Rosengren
Show a paused ALSA aloop device as inactive, i.e. set the control
"PCM Slave Active" to false, with a notification sent upon state change.

This makes it possible for a client capturing from the aloop device to know
whether data is expected. Without it, the client expects data even if
playback is paused.

Signed-off-by: Robert Rosengren 
---
 sound/drivers/aloop.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sound/drivers/aloop.c b/sound/drivers/aloop.c
index 0333143a1fa7..5404ab11132d 100644
--- a/sound/drivers/aloop.c
+++ b/sound/drivers/aloop.c
@@ -291,6 +291,8 @@ static int loopback_trigger(struct snd_pcm_substream *substream, int cmd)
 		cable->pause |= stream;
 		loopback_timer_stop(dpcm);
 		spin_unlock(&cable->lock);
+		if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
+			loopback_active_notify(dpcm);
 		break;
 	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
 	case SNDRV_PCM_TRIGGER_RESUME:
@@ -299,6 +301,8 @@ static int loopback_trigger(struct snd_pcm_substream *substream, int cmd)
 		cable->pause &= ~stream;
 		loopback_timer_start(dpcm);
 		spin_unlock(&cable->lock);
+		if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
+			loopback_active_notify(dpcm);
 		break;
 	default:
 		return -EINVAL;
@@ -879,9 +883,11 @@ static int loopback_active_get(struct snd_kcontrol *kcontrol,
 			[kcontrol->id.subdevice][kcontrol->id.device ^ 1];
 	unsigned int val = 0;
 
-	if (cable != NULL)
-		val = (cable->running & (1 << SNDRV_PCM_STREAM_PLAYBACK)) ?
-							1 : 0;
+	if (cable != NULL) {
+		unsigned int running = cable->running ^ cable->pause;
+
+		val = (running & (1 << SNDRV_PCM_STREAM_PLAYBACK)) ? 1 : 0;
+	}
 	ucontrol->value.integer.value[0] = val;
 	return 0;
 }
-- 
2.11.0
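
[Editorial note: the heart of this change is the cable->running ^ cable->pause
expression in loopback_active_get(). A quick illustration of the bit
arithmetic, assuming the usual SNDRV_PCM_STREAM_PLAYBACK == 0 numbering,
i.e. playback occupies bit 0 of both bitmasks:]

	unsigned int running = 1 << 0;          /* playback triggered...   */
	unsigned int pause   = 1 << 0;          /* ...but currently paused */
	unsigned int active  = running ^ pause; /* == 0 -> shown inactive  */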



Re: [PATCH v2 1/4] clk: qcom: Clear hardware clock control bit of RCG

2018-03-25 Thread Nischal, Amit



On 3/20/2018 4:25 AM, Stephen Boyd wrote:

Quoting Amit Nischal (2018-03-07 23:18:12)

For upcoming targets like sdm845, the POR (power-on reset) value of the
hardware clock control bit is set for most root clocks, and it needs to be
cleared for software to be able to control them. For older targets like
MSM8996, this bit is reserved with a POR value of 0, so this patch works
for the older targets too. So update the configuration mask to clear the
hardware clock control bit.

Signed-off-by: Amit Nischal 
---
  drivers/clk/qcom/clk-rcg2.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
index bbeaf9c..e63db10 100644
--- a/drivers/clk/qcom/clk-rcg2.c
+++ b/drivers/clk/qcom/clk-rcg2.c
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2013, 2018, The Linux Foundation. All rights reserved.

It would be nice if lawyers over there could avoid forcing copyright
date updates when less than half the file changes.


Thanks for the review.
I will address the above in the next patch series.




   *
   * This software is licensed under the terms of the GNU General Public
   * License version 2, as published by the Free Software Foundation, and
@@ -42,6 +42,7 @@
  #define CFG_MODE_SHIFT 12
  #define CFG_MODE_MASK  (0x3 << CFG_MODE_SHIFT)
  #define CFG_MODE_DUAL_EDGE (0x2 << CFG_MODE_SHIFT)
+#define CFG_HW_CLK_CTRL_MASK   BIT(20)

  #define M_REG  0x8
  #define N_REG  0xc
@@ -276,7 +277,7 @@ static int clk_rcg2_configure(struct clk_rcg2 *rcg, const 
struct freq_tbl *f)
 }

 mask = BIT(rcg->hid_width) - 1;
-   mask |= CFG_SRC_SEL_MASK | CFG_MODE_MASK;
+   mask |= CFG_SRC_SEL_MASK | CFG_MODE_MASK | CFG_HW_CLK_CTRL_MASK;
 cfg = f->pre_div << CFG_SRC_DIV_SHIFT;
 cfg |= rcg->parent_map[index].cfg << CFG_SRC_SEL_SHIFT;
 if (rcg->mnd_width && f->n && (f->m != f->n))

Is there going to be a future patch to update the RCGs to indicate they
support hardware control or not?


As of now, there will not be any patch to update the RCGs to support HW control.
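
[Editorial note: why adding a bit to the mask clears it -- clk_rcg2_configure()
applies cfg through a masked register update, roughly as below (a sketch of
the upstream driver; CFG_REG is the offset used there). Any mask bit that cfg
never sets is written back as 0, which is exactly what happens to BIT(20):]

	/* cfg carries no CFG_HW_CLK_CTRL_MASK bit, so including BIT(20)
	 * in mask forces the hardware-clock-control bit to 0. */
	ret = regmap_update_bits(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG,
				 mask, cfg);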



Re: [Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-25 Thread Sergey Senozhatsky
Cc-ing the kernel list and printk people.

Wen Yang, any chance we can switch to email? Bugzilla is not very handy.


On (03/26/18 02:40), bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=199003
> 
> --- Comment #11 from Wen Yang (wen.yan...@zte.com.cn) ---
> Hello Steven,
> 
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> 
> It depends on this kernel parameter (printk.synchronous), but this parameter
> is read-only.
> So we must change the grub files, and also need to restart the server for
> changes to take effect.
> If we can configure it dynamically, it will be more useful.

So you are testing printk_kthread now, right? May I ask why?
Did Steven's patch help?

-ss
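
[Editorial note: on the read-only complaint quoted above -- making such a knob
writable at runtime is normally just a permissions change, sketched here
against Steven's snippet. Whether the surrounding printk code tolerates the
value flipping at runtime is a separate question.]

	/* 0644 instead of S_IRUGO lets root flip the knob via
	 * /sys/module/printk/parameters/synchronous without a reboot. */
	module_param_named(synchronous, printk_sync, bool, 0644);
	MODULE_PARM_DESC(synchronous, "make printing to console synchronous");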


Re: [Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-25 Thread Sergey Senozhatsky
On (03/23/18 14:16), Petr Mladek wrote:
[..]
> If I get it correctly, the reporter of this bug has not tried
> Steven's patches yet.

It's not immediately clear.

It's not even completely clear if we are looking at an "X CPUs printk,
1 CPU prints it all" scenario, and it's not clear if hand off will be
helpful here. I'll try to explain.

What we see is:

CPU0 locked up on blkg->q->queue_lock

  [] cfqg_print_rwstat_recursive+0x36/0x40
  [] cgroup_seqfile_show+0x73/0x80
  [] ? seq_buf_alloc+0x17/0x40
  [] seq_read+0x10a/0x3b0
  [] vfs_read+0x9e/0x170
  [] SyS_read+0x7f/0xe0
  [] system_call_fastpath+0x16/0x1b

blkg->q->queue_lock was held by CPU7, which was spinnig in wait_for_xmitr().

 #5 [881ffb0b7548] __const_udelay at 81326678
 #6 [881ffb0b7558] wait_for_xmitr at 814056e0
 #7 [881ffb0b7580] serial8250_console_putchar at 814058ac
 #8 [881ffb0b75a0] uart_console_write at 8140035a
 #9 [881ffb0b75d0] serial8250_console_write at 814057fe
#10 [881ffb0b7618] call_console_drivers.constprop.17 at 81087011
#11 [881ffb0b7640] console_unlock at 810889e9
#12 [881ffb0b7680] vprintk_emit at 81088df4
#13 [881ffb0b76f0] dev_vprintk_emit at 81428e72
#14 [881ffb0b77a8] dev_printk_emit at 81428eee
#15 [881ffb0b7808] __dev_printk at 8142937e
#16 [881ffb0b7818] dev_printk at 8142942d
#17 [881ffb0b7888] sdev_prefix_printk at 81463771
#18 [881ffb0b7918] scsi_prep_state_check at 814598e4
#19 [881ffb0b7928] scsi_prep_fn at 8145992d
#20 [881ffb0b7960] blk_peek_request at 812f0826
#21 [881ffb0b7988] scsi_request_fn at 8145b588
#22 [881ffb0b79f0] __blk_run_queue at 812ebd63
#23 [881ffb0b7a08] blk_queue_bio at 812f1013   <<< acquired q->queue_lock
#24 [881ffb0b7a50] generic_make_request at 812ef209
#25 [881ffb0b7a98] submit_bio at 812ef351
#26 [881ffb0b7af0] xfs_submit_ioend_bio at a0146a63 [xfs]
#27 [881ffb0b7b00] xfs_submit_ioend at a0146b31 [xfs]
#28 [881ffb0b7b40] xfs_vm_writepages at a0146e18 [xfs]
#29 [881ffb0b7bb8] do_writepages at 8118da6e
#30 [881ffb0b7bc8] __writeback_single_inode at 812293a0
#31 [881ffb0b7c08] writeback_sb_inodes at 8122a08e
#32 [881ffb0b7cb0] __writeback_inodes_wb at 8122a2ef
#33 [881ffb0b7cf8] wb_writeback at 8122ab33
#34 [881ffb0b7d70] bdi_writeback_workfn at 8122cb2b
#35 [881ffb0b7e20] process_one_work at 810a851b
#36 [881ffb0b7e68] worker_thread at 810a9356
#37 [881ffb0b7ec8] kthread at 810b0b6f
#38 [881ffb0b7f50] ret_from_fork at 81697a18


Given how slow serial8250_console_putchar()->wait_for_xmitr() can be -
10ms of delay for every char - it's possible that we had no concurrent
printk()-s from other CPUs. So may be we had just one printing CPU,
and several CPUs spinning on a spin_lock which was owned by the printing
CPU.

So that's why printk_deferred() helped here. It simply detached 8250
and made the spin_lock critical section as fast as printk->log_store().
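
[Editorial note: for readers unfamiliar with it, printk_deferred() takes the
same format arguments as printk() but only stores the message; the console is
flushed later from irq_work, outside the caller's locks. An illustration with
placeholder names:]

	spin_lock(&q->queue_lock);
	printk_deferred(KERN_INFO "slow path on %s\n", name);
	spin_unlock(&q->queue_lock);    /* lock hold time stays short */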


But here comes the tricky part. Suppose that we:
a) have at least two CPUs that call printk concurrently
b) have hand off enabled


Now, what will happen if we have something like this

CPU0                            CPU1            CPU2
spin_lock(queue_lock)
                                printk          printk
cfqg_print_rwstat_recursive()   serial8250
 spin_lock(queue_lock)          printk          serial8250
                                serial8250      printk
                                                serial8250


I suspect that handoff may not be very helpful. CPU1 and CPU2 will wait for
each other to finish serial8250 and to hand off printing to each other. So CPU1
will do 2 serial8250 invocations to printk its messages, and in between it
will spin waiting for CPU2 to do its printk->serial8250 and to hand off
printing back to CPU1. The problem is that CPU1 will be under spin_lock() all
that time, so CPU0 is going to suffer just like before.

Opinions?

-ss


Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree

2018-03-25 Thread Sasha Levin
On Sat, Mar 24, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
>On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
>> On Fri, Mar 23, 2018 at 06:23:02PM +, Luis R. Rodriguez wrote:
>> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
>> > > On Fri, Mar 23, 2018 at 05:08:13PM +, Luis R. Rodriguez wrote:
>> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
>> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +, Luis R. Rodriguez wrote:
>> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
>> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> > > > > > > index 61d1cb7..8012741 100644
>> > > > > > > --- a/fs/xfs/xfs_inode.c
>> > > > > > > +++ b/fs/xfs/xfs_inode.c
>> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
>> > > > > > >  }
>> > > > > > >
>> > > > > > >  /*
>> > > > > > > + * Free any local-format buffers sitting around before we reset 
>> > > > > > > to
>> > > > > > > + * extents format.
>> > > > > > > + */
> >> > > > > > +static inline void
> >> > > > > > +xfs_ifree_local_data(
> >> > > > > > +	struct xfs_inode	*ip,
> >> > > > > > +	int			whichfork)
> >> > > > > > +{
> >> > > > > > +	struct xfs_ifork	*ifp;
> >> > > > > > +
> >> > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> >> > > > > > +		return;
>> > > > > >
>> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused 
>> > > > > > with how
>> > > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents 
>> > > > > > into the
>> > > > > > inode fork") exacerbated the leak, isn't that commit about
>> > > > > > XFS_DINODE_FMT_EXTENTS?
>> > > > >
>> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> >> > > > > incore data was small enough to fit in if_inline_data.
>> > > >
>> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
>> > > >
>> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
>> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
>> > > > >
>> > > > > An empty directory is 6 bytes, which is what you get with a fresh 
>> > > > > mkdir
>> > > > > or after deleting everything in the directory.  Prior to the 
>> > > > > 43518812d2
>> > > > > patch we could get away with not even checking if we had to free 
>> > > > > if_data
>> > > > > when deleting a directory because it fit within if_inline_data.
>> > > >
>> > > > Ah got it. So your fix *is* also applicable even prior to commit 
>> > > > 43518812d2.
>> > >
>> > > You'd have to modify the patch so that it doesn't try to kmem_free
>> > > if_data if if_data == if_inline_data but otherwise (in theory) I think
>> > > that the concept applies to pre-4.15 kernels.
>> > >
>> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, 
>> > > etc...)
>> >
>> > Well... so we need a resolution and better get testing this already given 
>> > that
>> > *I believe* the new auto-selection algorithm used to cherry pick patches 
>> > onto
>> > stable for linux-4.14.y (covered on a paper [0] and when used, stable 
>> > patches
>> > are prefixed with AUTOSEL, a recent discussion covered this in November 
>> > 2017
>> > [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free 
>> > inline
>> > data before resetting inode fork during ifree") as stable commit 
>> > 1eccdbd4836a41
>> > on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
>> > inlining data/extents into the inode fork").
>> >
>> > Sasha, Greg,
>> >
>> > Can you confirm if the algorithm was used in this case?
>>
>> No idea.
>>
>> I think xfs should just be added to the "blacklist" so that it is not
>> even looked at for these types of auto-selected patches.  Much like the
>> i915 driver currently is handled (it too is ignored for these patches
>> due to objections from the maintainers of it.)
>
>Just out of curiosity, how does this autoselection mechanism work today?
>If it's smart enough to cherry pick patches, apply them to a kernel,
>build the kernel and run xfstests, and propose the patches if nothing
>weird happened, then I'd be interested in looking further.  I've nothing
>against algorithmic selection per se, but I'd want to know more about
>the data sets and parameters that feed the algorithm.

It won't go beyond build testing.

>I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
>figure out what automated regression testing, if any, had been done; or
>whether the patch submission was asking if we wanted it put into 4.14
>or if it was a declaration that they were on their way in.  Excuse me

There would be (at least) 3 different mails involved in this process:

 1. You'd get a mail from me, proposing this patch for stable. We give
 at least 1 week (but usually closer to 2) to comment on whether this
 patch should or should not go in 

Re: [PATCH v2 00/13] Major code reorganization to make all i2c transfers working

2018-03-25 Thread Abhishek Sahu

On 2018-03-24 17:52, Wolfram Sang wrote:

On Mon, Mar 12, 2018 at 06:44:49PM +0530, Abhishek Sahu wrote:

* v2:

1. Addressed review comments from v1
2. Changed the license to SPDX
3. Changed commit messages for some of the patches to have more detail
4. Removed event-based completion and changed transfer completion
   detection logic in the interrupt handler
5. Removed dma_threshold and blk_mode_threshold from the global structure
6. Improved the determine-mode logic for QUP v2 transfers
7. Fixed function comments
8. Fixed the auto build test WARNING "'idx' may be used uninitialized
   in this function"
9. Renamed tx/rx_buf to tx/rx_cnt

* v1:

The current driver is failing in the following test cases:
1. Handling of failure cases is not working in the long run for BAM
   mode. It sometimes generates the error message “bam-dma-engine
   7884000.dma: Cannot free busy channel”.
2. The following I2C transfers are failing:
   a. Single transfer with multiple read messages
   b. Single transfer with multiple read/write messages with the maximum
      allowed length per message (65K) in BAM mode
   c. Single transfer with a write greater than 32 bytes in QUP v1 and
      a write greater than 64 bytes in QUP v2 for non-DMA mode.
3. No handling is present for Block/FIFO interrupts. Any non-error
   interrupt is being treated as the transfer completion, and then
   polling is being done for available/free bytes in the FIFO.

To fix all these issues, major code changes are required. This patch
series fixes all the above issues and makes the driver interrupt based
instead of polling based. After these changes, all the mentioned test
cases are working properly.

The code changes have been tested for QUP v1 (IPQ8064) and QUP
v2 (IPQ8074) with sample application written over i2c-dev.

Abhishek Sahu (13):
  i2c: qup: fix copyrights and update to SPDX identifier
  i2c: qup: fixed releasing dma without flush operation completion
  i2c: qup: minor code reorganization for use_dma
  i2c: qup: remove redundant variables for BAM SG count
  i2c: qup: schedule EOT and FLUSH tags at the end of transfer
  i2c: qup: fix the transfer length for BAM RX EOT FLUSH tags
  i2c: qup: proper error handling for i2c error in BAM mode
  i2c: qup: use the complete transfer length to choose DMA mode
  i2c: qup: change completion timeout according to transfer length
  i2c: qup: fix buffer overflow for multiple msg of maximum xfer len
  i2c: qup: send NACK for last read sub transfers
  i2c: qup: reorganization of driver code to remove polling for qup v1
  i2c: qup: reorganization of driver code to remove polling for qup v2


Applied to for-next, thanks! Also thanks to the reviewers!


 Thanks Wolfram for your help in getting this big
 patch series applied to for-next.

 Thanks to Andy, Sricharan, Austin and other reviewers for
 reviewing/testing the patches.

 Regards,
 Abhishek


Re: [PATCH v6 05/21] tracing: probeevent: Cleanup print argument functions

2018-03-25 Thread Masami Hiramatsu
On Fri, 23 Mar 2018 12:36:47 -0400
Steven Rostedt  wrote:

> On Sat, 17 Mar 2018 21:41:12 +0900
> Masami Hiramatsu  wrote:
> 
> > The current print-argument functions print the argument
> > name too. That is not good for printing out multiple
> > values for one argument. This changes them to just print
> > out the value.
> 
> Hi Masami,
> 
> This is a confusing change log, as I have no idea what this patch does.
> Can you add a "before" and "after" of what you mean. Some examples of
> what it currently does to show why it looks bad, and then an example of
> what it looks like after the patch.

OK, this is actually just a cleanup patch; there is no functional difference
between "before" and "after". For more flexible arguments, like array types,
we need to decouple argument-name printing from value printing. Is the text
below clearer to you?


Clean up the argument-printing functions to decouple them into name-printing
and value-printing, so that they can support more flexible argument
expressions, like array types.
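
[Editorial note: a hypothetical illustration of the decoupling -- the
trace_seq helpers below are real APIs, but the arg/vals fields are made up
for the example. Before, one callback printed "name=value" as a unit; after,
name and value printing are separate, so an array argument can render as
name={v0,v1,...}:]

	trace_seq_printf(s, " %s=", arg->name);            /* name part  */
	for (i = 0; i < arg->count; i++)                   /* value part */
		trace_seq_printf(s, "%s%lx", i ? "," : "{", vals[i]);
	trace_seq_putc(s, '}');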

Thanks,



> 
> Thanks!
> 
> -- Steve
> 
> 
> > 
> > Signed-off-by: Masami Hiramatsu 
> > 


-- 
Masami Hiramatsu 


Re: [PATCH v1 09/16] rtc: mediatek: convert to use device managed functions

2018-03-25 Thread Sean Wang
On Fri, 2018-03-23 at 11:50 +0100, Alexandre Belloni wrote:
> On 23/03/2018 at 17:15:06 +0800, sean.w...@mediatek.com wrote:
> > From: Sean Wang 
> > 
> > Use device managed operations to simplify error handling, reduce source
> > code size, and reduce the likelihood of bugs, and remove our remove
> > callback, which does nothing that is not already done by the device
> > managed functions.
> > 
> > Signed-off-by: Sean Wang 
> > ---
> >  drivers/rtc/rtc-mt6397.c | 31 ---
> >  1 file changed, 8 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/rtc/rtc-mt6397.c b/drivers/rtc/rtc-mt6397.c
> > index cefb83b..bfc5d6f 100644
> > --- a/drivers/rtc/rtc-mt6397.c
> > +++ b/drivers/rtc/rtc-mt6397.c
> > @@ -14,6 +14,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -328,10 +329,10 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >  
> > platform_set_drvdata(pdev, rtc);
> >  
> > -   ret = request_threaded_irq(rtc->irq, NULL,
> > -  mtk_rtc_irq_handler_thread,
> > -  IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > -  "mt6397-rtc", rtc);
> > +   ret = devm_request_threaded_irq(&pdev->dev, rtc->irq, NULL,
> > +   mtk_rtc_irq_handler_thread,
> > +   IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > +   "mt6397-rtc", rtc);
> > if (ret) {
> > dev_err(>dev, "Failed to request alarm IRQ: %d: %d\n",
> > rtc->irq, ret);
> > @@ -340,30 +341,15 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >  
> > device_init_wakeup(&pdev->dev, 1);
> >  
> > -   rtc->rtc_dev = rtc_device_register("mt6397-rtc", &pdev->dev,
> > -  &mtk_rtc_ops, THIS_MODULE);
> > +   rtc->rtc_dev = devm_rtc_device_register(&pdev->dev, "mt6397-rtc",
> > +   &mtk_rtc_ops, THIS_MODULE);
> 
> You should probably switch to devm_rtc_allocate_device() and
> rtc_register_device instead of devm_rtc_device_register.
> 

Just would like to know some details.

It seems you're just encouraging me to switch to the new registration
method, and the devm_rtc_device_register currently used in the driver
shouldn't cause any harm, right?

> > if (IS_ERR(rtc->rtc_dev)) {
> > dev_err(>dev, "register rtc device failed\n");
> > ret = PTR_ERR(rtc->rtc_dev);
> > -   goto out_free_irq;
> > +   return ret;
> 
> ret doesn't seem necessary anymore here.


okay, it'll be removed

> 
> 
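
For reference, the two-step registration Alexandre suggests would look
roughly like the sketch below (a sketch only, against the 4.16-era RTC
core, with error handling trimmed):

/* Sketch: allocate first, configure, then register. */
rtc->rtc_dev = devm_rtc_allocate_device(&pdev->dev);
if (IS_ERR(rtc->rtc_dev))
	return PTR_ERR(rtc->rtc_dev);

rtc->rtc_dev->ops = &mtk_rtc_ops;	/* ops set before registration */

return rtc_register_device(rtc->rtc_dev);

The point of the split is that the driver can configure the rtc_device
before it becomes visible to userspace, which the one-shot
devm_rtc_device_register() cannot do.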




Re: [PATCH v6 21/21] perf-probe: Add array argument support

2018-03-25 Thread Masami Hiramatsu
On Thu, 22 Mar 2018 16:19:46 +0530
Ravi Bangoria  wrote:

> Hi Masami :)
> 
> On 03/22/2018 03:53 PM, Masami Hiramatsu wrote:
> > On Mon, 19 Mar 2018 13:29:59 +0530
> > Ravi Bangoria  wrote:
> >
> >>
> >> Is it okay to allow user to specify array size with type field?
> > For this patch, yes.
> 
> So IIUC, the perf _tool_ will allow the user to record an array either with
> "name[range]" or with "name:type[length]". Please correct me if that's wrong.

Yes, it is correct.

> And if the perf tool will allow an array length along with the TYPE field,
> I guess we should document that in man perf-probe?

Ah, right. OK, I'll add it.
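
For illustration, the two accepted forms would be written like this
(the probe point 'do_sys_open' and the variable 'buf' are placeholders,
not from the patch):

  # record 8 elements of 'buf' via an index range:
  perf probe 'do_sys_open buf[0..7]'

  # or cast with an explicit type plus array length:
  perf probe 'do_sys_open buf:u8[8]'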

Thanks!

> 
> Otherwise,
> 
> Acked-by: Ravi Bangoria 
> 
> Thanks,
> Ravi
> 
> >  The availability of type is checked only when
> > it is automatically generated.
> > IMO, it should be done in another patch, something like
> > "Validate user specified type casting" patch. Would you need it?
> >
> > Thank you,
> >
> 


-- 
Masami Hiramatsu 


Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers

2018-03-25 Thread Al Viro
On Mon, Mar 26, 2018 at 01:40:17AM +0100, Al Viro wrote:

> Kinda-sorta part:
>   * asmlinkage_protect is taken out for now, so m68k has problems.
>   * syscalls that run out of 6 slots barf violently.  For mips it's
> wrong (there we have 8 slots); for stuff like arm and ppc it's right, but
> it means that things like e.g. compat sync_file_range() should not even
> be compiled on those.  __ARCH_WANT_SYS_SYNC_FILE_RANGE, presumably...
> In any case, we *can't* do pt_regs-based wrappers for those syscalls on
> such architectures, so ifdefs around those puppies are probably the right
> thing to do.
>   * s390 macrology in compat_wrapper.c not even touched; it needs
> a trivial update to keep working (__MAP callbacks take an extra argument,
> unused for those users).
>   * sys_... and compat_sys_... aliases are unchanged; if we kill
> direct callers, we can trivially rename SyS##name and compat_SyS##name
> to sys##name and compat_sys##name and get rid of aliases.

* mips n32 and x86 x32 can become an extra source of headache.
That actually applies to any plans of passing struct pt_regs *.  As it
is, e.g. syscall 515 on amd64 is compat_sys_readv().  Dispatched via
this:
/*
 * NB: Native and x32 syscalls are dispatched from the same
 * table.  The only functional difference is the x32 bit in
 * regs->orig_ax, which changes the behavior of some syscalls.
 */
if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
}
Now, syscall 145 via 32bit call is *also* compat_sys_readv(), dispatched
via
nr = array_index_nospec(nr, IA32_NR_syscalls);
/*
 * It's possible that a 32-bit syscall implementation
 * takes a 64-bit parameter but nonetheless assumes that
 * the high bits are zero.  Make sure we zero-extend all
 * of the args.
 */
regs->ax = ia32_sys_call_table[nr](
(unsigned int)regs->bx, (unsigned int)regs->cx,
(unsigned int)regs->dx, (unsigned int)regs->si,
(unsigned int)regs->di, (unsigned int)regs->bp);
Right now it works - we call the same function, passing it arguments picked
from a different set of registers (di/si/dx in the x32 case, bx/cx/dx in the
i386 one).  But if we switch to passing struct pt_regs * and have the wrapper
fetch regs->{bx,cx,dx}, we have a problem: it won't work for both entry
points.

IMO it's a good reason to have the dispatcher(s) handle extraction from
pt_regs and let the wrapper deal with the resulting 6 u64 or 6 u32,
normalizing them and arranging them into the arguments expected by the
syscall body.

Linus, Dominik - how do you plan to deal with that fun?  Regardless of the
way we generate the glue, the issue remains.  We can't get the same
struct pt_regs *-taking function for both; we either need to produce
a separate chunk of glue for each compat_sys_... involved (either making
COMPAT_SYSCALL_DEFINE generate both, or having duplicate X32_SYSCALL_DEFINE
for each of those COMPAT_SYSCALL_DEFINE - with identical body, at that)
or we need to have the registers-to-slots mapping done in dispatcher...
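
To make the register-mapping conflict concrete: a pt_regs-taking wrapper
generated for the i386 entry would hard-code one mapping, roughly like
the sketch below (illustrative names, not actual macro output), and that
same function cannot serve x32, which passes its arguments in di/si/dx:

/* Sketch only: an i386-flavoured pt_regs wrapper for compat readv.
 * __do_compat_sys_readv() is a made-up inner function name. */
asmlinkage long compat_sys_readv(struct pt_regs *regs)
{
	return __do_compat_sys_readv((unsigned int)regs->bx,
				     (unsigned int)regs->cx,
				     (unsigned int)regs->dx);
}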


Re: possible deadlock in handle_rx

2018-03-25 Thread Jason Wang



On 2018-03-26 08:01, syzbot wrote:

Hello,

syzbot hit the following crash on upstream commit
cb6416592bc2a8b731dabcec0d63cda270764fc6 (Sun Mar 25 17:45:10 2018 +)
Merge tag 'dmaengine-fix-4.16-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma
syzbot dashboard link: 
https://syzkaller.appspot.com/bug?extid=7f073540b1384a614e09


So far this crash happened 4 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6506789075943424
syzkaller reproducer: 
https://syzkaller.appspot.com/x/repro.syz?id=5716250550337536
Raw console output: 
https://syzkaller.appspot.com/x/log.txt?id=5142038655795200
Kernel config: 
https://syzkaller.appspot.com/x/.config?id=-5034017172441945317

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the 
commit:

Reported-by: syzbot+7f073540b1384a614...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for 
details.

If you forward the report, please keep this part and the footer.



WARNING: possible recursive locking detected
4.16.0-rc6+ #366 Not tainted

vhost-4248/4760 is trying to acquire lock:
 (&vq->mutex){+.+.}, at: [<3482bddc>] 
vhost_net_rx_peek_head_len drivers/vhost/net.c:633 [inline]
 (&vq->mutex){+.+.}, at: [<3482bddc>] handle_rx+0xeb1/0x19c0 
drivers/vhost/net.c:784


but task is already holding lock:
 (&vq->mutex){+.+.}, at: [<4de72f44>] handle_rx+0x1f5/0x19c0 
drivers/vhost/net.c:766


other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(&vq->mutex);
  lock(&vq->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation


Yes, it's a missing lock nesting annotation.

Will post a patch soon.

Thanks
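
For context, the usual fix for this class of lockdep false positive is
to give the inner lock a lockdep subclass via mutex_lock_nested(); a
minimal sketch (the subclass value 1 is illustrative):

/* Sketch: annotate the other vq's mutex taken inside handle_rx() with
 * a nesting subclass so lockdep can tell the two apart. */
mutex_lock_nested(&tvq->mutex, 1);	/* 1 = inner vq, default is 0 */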



Re: linux-next: manual merge of the drm tree with Linus' tree

2018-03-25 Thread Stephen Rothwell
Hi all,

On Thu, 22 Mar 2018 17:37:22 +1100 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the drm tree got conflicts in several amdgpu
> files because there is a set of (mostly identical) patches that appears in
> both Linus' tree and the drm tree.  In each case I just used the version of
> the file from the drm tree.
> 
> You should do a test merge between your tree and Linus' tree and see what
> you want to do about the resolution (either do the back merge (I think
> with v4.16-rc6), or provide Linus with a branch that has the merge done).
> It's a bit of a mess :-(

I got a few more of these today.
-- 
Cheers,
Stephen Rothwell


Re: [RFC PATCH V2 0/8] Packed ring for vhost

2018-03-25 Thread Jason Wang

cc Jens, Tiwei and Wei

Thanks


On 2018-03-26 11:38, Jason Wang wrote:

Hi all:

This RFC implements the packed ring layout. The code was tested with the
pmd implementation by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor change
was needed in the pmd code to kick the virtqueue, since it assumes a
busy-polling backend.

Tests were done between localhost and guest. Testpmd (rxonly) in the
guest reports 2.4Mpps; testpmd (txonly) reports about 2.1Mpps.

Note: the event suppression/indirect descriptor support is compile
tested only, because of lacking driver support.

Changes from V1:

- Refactor vhost used elem code to avoid open coding on used elem
- Event suppression support (compile test only).
- Indirect descriptor support (compile test only).
- Zerocopy support.
- vIOMMU support.
- SCSI/VSOCK support (compile test only).
- Fix several bugs

For simplicity, I don't implement batching or other optimizations.

Please review.

Thanks

Jason Wang (8):
   vhost: move get_rx_bufs to vhost.c
   vhost: hide used ring layout from device
   vhost: do not use vring_used_elem
   vhost_net: do not explicitly manipulate vhost_used_elem
   vhost: vhost_put_user() can accept metadata type
   virtio: introduce packed ring defines
   vhost: packed ring support
   vhost: event suppression for packed ring

  drivers/vhost/net.c| 138 ++-
  drivers/vhost/scsi.c   |  62 +--
  drivers/vhost/vhost.c  | 818 ++---
  drivers/vhost/vhost.h  |  46 ++-
  drivers/vhost/vsock.c  |  42 +-
  include/uapi/linux/virtio_config.h |   9 +
  include/uapi/linux/virtio_ring.h   |  32 ++
  7 files changed, 921 insertions(+), 226 deletions(-)





Re: KASAN: use-after-free Read in __list_del_entry_valid (4)

2018-03-25 Thread syzbot

syzbot has found reproducer for the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
Linux 4.16-rc7
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=29ee8f76017ce6cf03da


So far this crash happened 4 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4763014771245056
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5870647779524608
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4652258302099456
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-2340295454854568752

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+29ee8f76017ce6cf0...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
==
BUG: KASAN: use-after-free in __list_del_entry_valid+0x144/0x150  
lib/list_debug.c:54

Read of size 8 at addr 8801b6022fa0 by task syzkaller871713/4346

CPU: 1 PID: 4346 Comm: syzkaller871713 Not tainted 4.16.0-rc7+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __list_del_entry_valid+0x144/0x150 lib/list_debug.c:54
 __list_del_entry include/linux/list.h:117 [inline]
 list_del include/linux/list.h:125 [inline]
 cma_cancel_listens drivers/infiniband/core/cma.c:1569 [inline]
 cma_cancel_operation+0x455/0xd60 drivers/infiniband/core/cma.c:1597
 rdma_destroy_id+0xff/0xda0 drivers/infiniband/core/cma.c:1661
 ucma_close+0x100/0x2f0 drivers/infiniband/core/ucma.c:1728
 __fput+0x327/0x7e0 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ad0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 get_signal+0x73a/0x16d0 kernel/signal.c:2469
 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x447529
RSP: 002b:7f782c2b0cf8 EFLAGS: 0202 ORIG_RAX: 00ca
RAX: 0001 RBX: 006ddc5c RCX: 00447529
RDX: 00447529 RSI: 0001 RDI: 006ddc5c
RBP: 006ddc58 R08:  R09: 
R10:  R11: 0202 R12: 
R13: 7fff8bb3d8cf R14: 7f782c2b19c0 R15: 0005

Allocated by task 4343:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
 kmem_cache_alloc_trace+0x136/0x740 mm/slab.c:3607
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 rdma_create_id+0xd0/0x630 drivers/infiniband/core/cma.c:787
 ucma_create_id+0x35f/0x920 drivers/infiniband/core/ucma.c:480
 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1649
 __vfs_write+0xef/0x970 fs/read_write.c:480
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 4346:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
 __cache_free mm/slab.c:3485 [inline]
 kfree+0xd9/0x260 mm/slab.c:3800
 rdma_destroy_id+0x821/0xda0 drivers/infiniband/core/cma.c:1691
 ucma_close+0x100/0x2f0 drivers/infiniband/core/ucma.c:1728
 __fput+0x327/0x7e0 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ad0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 get_signal+0x73a/0x16d0 kernel/signal.c:2469
 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ec/0x940 

[RFC PATCH V2 8/8] vhost: event suppression for packed ring

2018-03-25 Thread Jason Wang
This patch introduces basic support for event suppression aka driver
and device area. Compile tested only.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c| 169 ---
 drivers/vhost/vhost.h|  10 ++-
 include/uapi/linux/virtio_ring.h |  19 +
 3 files changed, 183 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6177e4d..ff83a2e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1143,10 +1143,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue 
*vq, unsigned int num,
   struct vring_used __user *used)
 {
struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+   struct vring_packed_desc_event *driver_event =
+   (struct vring_packed_desc_event *)avail;
+   struct vring_packed_desc_event *device_event =
+   (struct vring_packed_desc_event *)used;
 
-   /* FIXME: check device area and driver area */
return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
-  access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+  access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
+  access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
+  access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
 }
 
 static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
@@ -1222,14 +1227,27 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
return true;
 }
 
-int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
+int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
+{
+   int num = vq->num;
+
+   return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
+  num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+  iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
+  num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+  iotlb_access_ok(vq, VHOST_ACCESS_RO,
+  (u64)(uintptr_t)vq->driver_event,
+  sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
+  iotlb_access_ok(vq, VHOST_ACCESS_WO,
+  (u64)(uintptr_t)vq->device_event,
+  sizeof(*vq->device_event), VHOST_ADDR_USED);
+}
+
+int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
 {
size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
unsigned int num = vq->num;
 
-   if (!vq->iotlb)
-   return 1;
-
return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
   num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
   iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
@@ -1241,6 +1259,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
   num * sizeof(*vq->used->ring) + s,
   VHOST_ADDR_USED);
 }
+
+int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
+{
+   if (!vq->iotlb)
+   return 1;
+
+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+   return vq_iotlb_prefetch_packed(vq);
+   else
+   return vq_iotlb_prefetch_split(vq);
+}
 EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
 
 /* Can we log writes? */
@@ -1756,6 +1785,29 @@ static int vhost_update_used_flags(struct 
vhost_virtqueue *vq)
return 0;
 }
 
+static int vhost_update_device_flags(struct vhost_virtqueue *vq,
+__virtio16 device_flags)
+{
+   void __user *flags;
+
+   if (vhost_put_user(vq, cpu_to_vhost16(vq, device_flags),
+  &vq->device_event->desc_event_flags,
+  VHOST_ADDR_DESC) < 0)
+   return -EFAULT;
+   if (unlikely(vq->log_used)) {
+   /* Make sure the flag is seen before log. */
+   smp_wmb();
+   /* Log used flag write. */
+   flags = &vq->device_event->desc_event_flags;
+   log_write(vq->log_base, vq->log_addr +
+ (flags - (void __user *)vq->device_event),
+ sizeof(vq->used->flags));
+   if (vq->log_ctx)
+   eventfd_signal(vq->log_ctx, 1);
+   }
+   return 0;
+}
+
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 
avail_event)
 {
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
@@ -2667,16 +2719,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
-static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_notify_split(struct vhost_dev *dev,
+  struct vhost_virtqueue *vq)
 {
__u16 old, new;
__virtio16 event;
bool v;
 
-   /* FIXME: check driver 

[RFC PATCH V2 7/8] vhost: packed ring support

2018-03-25 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   |   5 +-
 drivers/vhost/vhost.c | 530 ++
 drivers/vhost/vhost.h |   7 +-
 3 files changed, 505 insertions(+), 37 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7be8b55..84905d5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -67,7 +67,8 @@ enum {
VHOST_NET_FEATURES = VHOST_FEATURES |
 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-(1ULL << VIRTIO_F_IOMMU_PLATFORM)
+(1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+(1ULL << VIRTIO_F_RING_PACKED)
 };
 
 enum {
@@ -706,6 +707,8 @@ static void handle_rx(struct vhost_net *net)
vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
vq->log : NULL;
mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+   /* FIXME: workaround for current dpdk prototype */
+   mergeable = false;
 
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
sock_len += sock_hlen;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dcac4d4..6177e4d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -324,6 +324,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vhost_reset_is_le(vq);
vhost_disable_cross_endian(vq);
vq->busyloop_timeout = 0;
+   vq->used_wrap_counter = true;
vq->umem = NULL;
vq->iotlb = NULL;
__vhost_vq_meta_reset(vq);
@@ -1136,10 +1137,22 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, 
u64 iova, int access)
return 0;
 }
 
-static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-   struct vring_desc __user *desc,
-   struct vring_avail __user *avail,
-   struct vring_used __user *used)
+static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
+  struct vring_desc __user *desc,
+  struct vring_avail __user *avail,
+  struct vring_used __user *used)
+{
+   struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+
+   /* FIXME: check device area and driver area */
+   return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
+  access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+}
+
+static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used)
 
 {
size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
@@ -1151,6 +1164,17 @@ static int vq_access_ok(struct vhost_virtqueue *vq, 
unsigned int num,
sizeof *used + num * sizeof *used->ring + s);
 }
 
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+   struct vring_desc __user *desc,
+   struct vring_avail __user *avail,
+   struct vring_used __user *used)
+{
+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+   return vq_access_ok_packed(vq, num, desc, avail, used);
+   else
+   return vq_access_ok_split(vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
 const struct vhost_umem_node *node,
 int type)
@@ -1763,6 +1787,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 
vhost_init_is_le(vq);
 
+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+   return 0;
+
r = vhost_update_used_flags(vq);
if (r)
goto err;
@@ -1836,7 +1863,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 
addr, u32 len,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain,
  * or -1U if we're at the end. */
-static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
+static unsigned next_desc_split(struct vhost_virtqueue *vq,
+   struct vring_desc *desc)
 {
unsigned int next;
 
@@ -1849,11 +1877,17 @@ static unsigned next_desc(struct vhost_virtqueue *vq, 
struct vring_desc *desc)
return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-   struct iovec iov[], unsigned int iov_size,
-   unsigned int *out_num, unsigned int *in_num,
-   struct vhost_log *log, unsigned int *log_num,
-   struct vring_desc *indirect)
+static unsigned next_desc_packed(struct vhost_virtqueue *vq,
+   

[RFC PATCH V2 6/8] virtio: introduce packed ring defines

2018-03-25 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 include/uapi/linux/virtio_config.h |  9 +
 include/uapi/linux/virtio_ring.h   | 13 +
 2 files changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 308e209..5903d51 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -71,4 +71,13 @@
  * this is for compatibility with legacy systems.
  */
#define VIRTIO_F_IOMMU_PLATFORM 33
+
+#define VIRTIO_F_RING_PACKED   34
+
+/*
+ * This feature indicates that all buffers are used by the device in
+ * the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5fa..e297580 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -43,6 +43,8 @@
 #define VRING_DESC_F_WRITE 2
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT  4
+#define VRING_DESC_F_AVAIL  7
+#define VRING_DESC_F_USED  15
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
@@ -62,6 +64,17 @@
  * at the end of the used ring. Guest should ignore the used->flags field. */
#define VIRTIO_RING_F_EVENT_IDX 29
 
+struct vring_desc_packed {
+   /* Buffer Address. */
+   __virtio64 addr;
+   /* Buffer Length. */
+   __virtio32 len;
+   /* Buffer ID. */
+   __virtio16 id;
+   /* The flags depending on descriptor type. */
+   __virtio16 flags;
+};
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc {
/* Address (guest-physical). */
-- 
2.7.4
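
For context on the AVAIL/USED bits above: in the packed ring each side
keeps a wrap counter, and a descriptor is available when its AVAIL bit
matches the driver's wrap counter while its USED bit does not. A
device-side check might look like this sketch (not part of the patch;
conversion from __virtio16 to native endianness omitted):

/* Sketch: test whether a packed descriptor is available, given the
 * current wrap counter; bit numbers from VRING_DESC_F_AVAIL/USED. */
static bool desc_is_avail(u16 flags, bool wrap_counter)
{
	bool avail = !!(flags & (1 << VRING_DESC_F_AVAIL));
	bool used  = !!(flags & (1 << VRING_DESC_F_USED));

	return avail == wrap_counter && used != wrap_counter;
}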



linux-next: manual merge of the drm tree with Linus' tree

2018-03-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm tree got conflicts in:

  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c

between commit:

  140bcaa23a1c ("drm/vmwgfx: Fix black screen and device errors when running 
without fbdev")

from Linus' tree and commit:

  c3b9b1657344 ("drm/vmwgfx: Improve on hibernation")

from the drm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 9116fe8baebc,9e60de95b863..
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@@ -938,7 -947,8 +947,9 @@@ int vmw_kms_present(struct vmw_private 
  int vmw_kms_update_layout_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
  void vmw_kms_legacy_hotspot_clear(struct vmw_private *dev_priv);
 +void vmw_kms_lost_device(struct drm_device *dev);
+ int vmw_kms_suspend(struct drm_device *dev);
+ int vmw_kms_resume(struct drm_device *dev);
  
  int vmw_dumb_create(struct drm_file *file_priv,
struct drm_device *dev,
diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 3c824fd7cbf3,3628a9fe705f..
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@@ -2561,11 -2551,10 +2557,12 @@@ int vmw_kms_helper_resource_prepare(str
if (res->backup) {
ret = vmw_kms_helper_buffer_prepare(res->dev_priv, res->backup,
interruptible,
-   res->dev_priv->has_mob);
+   res->dev_priv->has_mob,
+   false);
if (ret)
goto out_unreserve;
 +
 +  ctx->buf = vmw_dmabuf_reference(res->backup);
}
ret = vmw_resource_validate(res);
if (ret)
@@@ -2863,12 -2850,49 +2860,59 @@@ int vmw_kms_set_config(struct drm_mode_
  }
  
  
 +/**
 + * vmw_kms_lost_device - Notify kms that modesetting capabilities will be lost
 + *
 + * @dev: Pointer to the drm device
 + */
 +void vmw_kms_lost_device(struct drm_device *dev)
 +{
 +  drm_atomic_helper_shutdown(dev);
 +}
++
+ /**
+  * vmw_kms_suspend - Save modesetting state and turn modesetting off.
+  *
+  * @dev: Pointer to the drm device
+  * Return: 0 on success. Negative error code on failure.
+  */
+ int vmw_kms_suspend(struct drm_device *dev)
+ {
+   struct vmw_private *dev_priv = vmw_priv(dev);
+ 
+   dev_priv->suspend_state = drm_atomic_helper_suspend(dev);
+   if (IS_ERR(dev_priv->suspend_state)) {
+   int ret = PTR_ERR(dev_priv->suspend_state);
+ 
+   DRM_ERROR("Failed kms suspend: %d\n", ret);
+   dev_priv->suspend_state = NULL;
+ 
+   return ret;
+   }
+ 
+   return 0;
+ }
+ 
+ 
+ /**
+  * vmw_kms_resume - Re-enable modesetting and restore state
+  *
+  * @dev: Pointer to the drm device
+  * Return: 0 on success. Negative error code on failure.
+  *
+  * State is resumed from a previous vmw_kms_suspend(). It's illegal
+  * to call this function without a previous vmw_kms_suspend().
+  */
+ int vmw_kms_resume(struct drm_device *dev)
+ {
+   struct vmw_private *dev_priv = vmw_priv(dev);
+   int ret;
+ 
+   if (WARN_ON(!dev_priv->suspend_state))
+   return 0;
+ 
+   ret = drm_atomic_helper_resume(dev, dev_priv->suspend_state);
+   dev_priv->suspend_state = NULL;
+ 
+   return ret;
+ }




[RFC PATCH V2 5/8] vhost: vhost_put_user() can accept metadata type

2018-03-25 Thread Jason Wang
In the past, we assumed that the used ring update was the only user of
vhost_put_user(). This may not be the case for the incoming packed
ring, which may update the descriptor ring when marking buffers as
used. So introduce a new type parameter.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 65954d6..dcac4d4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -847,7 +847,7 @@ static inline void __user *__vhost_get_user(struct 
vhost_virtqueue *vq,
return __vhost_get_user_slow(vq, addr, size, type);
 }
 
-#define vhost_put_user(vq, x, ptr) \
+#define vhost_put_user(vq, x, ptr, type)   \
 ({ \
int ret = -EFAULT; \
if (!vq->iotlb) { \
@@ -855,7 +855,7 @@ static inline void __user *__vhost_get_user(struct 
vhost_virtqueue *vq,
} else { \
__typeof__(ptr) to = \
(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
- sizeof(*ptr), VHOST_ADDR_USED); \
+ sizeof(*ptr), type); \
if (to != NULL) \
ret = __put_user(x, to); \
else \
@@ -1716,7 +1716,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue 
*vq)
 {
void __user *used;
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
-  &vq->used->flags) < 0)
+  &vq->used->flags, VHOST_ADDR_USED) < 0)
return -EFAULT;
if (unlikely(vq->log_used)) {
/* Make sure the flag is seen before log. */
@@ -1735,7 +1735,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue 
*vq)
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 
avail_event)
 {
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
-  vhost_avail_event(vq)))
+  vhost_avail_event(vq), VHOST_ADDR_USED))
return -EFAULT;
if (unlikely(vq->log_used)) {
void __user *used;
@@ -2218,12 +2218,12 @@ static int __vhost_add_used_n(struct vhost_virtqueue 
*vq,
used = vq->used->ring + start;
for (i = 0; i < count; i++) {
if (unlikely(vhost_put_user(vq, heads[i].elem.id,
-   &heads[i].id))) {
+   &heads[i].id, VHOST_ADDR_USED))) {
vq_err(vq, "Failed to write used id");
return -EFAULT;
}
if (unlikely(vhost_put_user(vq, heads[i].elem.len,
-   &heads[i].len))) {
+   &heads[i].len, VHOST_ADDR_USED))) {
vq_err(vq, "Failed to write used len");
return -EFAULT;
}
@@ -2269,7 +2269,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct 
vhost_used_elem *heads,
/* Make sure buffer is written before we update index. */
smp_wmb();
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
-  &vq->used->idx)) {
+  &vq->used->idx, VHOST_ADDR_USED)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
-- 
2.7.4



[RFC PATCH V2 3/8] vhost: do not use vring_used_elem

2018-03-25 Thread Jason Wang
Instead of depending on the exported vring_used_elem, this patch
switches to a new internal structure, vhost_used_elem, which embeds
vring_used_elem. This could be used to let vhost record extra
metadata for the incoming packed ring layout.
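
Judging from the accesses in this patch (heads[i].elem.len and
friends), the wrapper is essentially the following sketch; the real
definition in vhost.h may carry extra fields in later patches:

    /* sketch, inferred from usage below */
    struct vhost_used_elem {
            struct vring_used_elem elem;
            /* room for packed-ring bookkeeping later */
    };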

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   | 19 ++-
 drivers/vhost/scsi.c  | 10 +-
 drivers/vhost/vhost.c | 33 -
 drivers/vhost/vhost.h | 18 +++---
 drivers/vhost/vsock.c |  6 +++---
 5 files changed, 45 insertions(+), 41 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f821fcd..7ea2aee 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,10 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net 
*net,
int j = 0;
 
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+   if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
-   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+   if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+   vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
++j;
} else
break;
@@ -363,7 +363,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, 
bool success)
rcu_read_lock_bh();
 
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = success ?
+   vq->heads[ubuf->desc].elem.len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
 
@@ -422,7 +422,7 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_virtqueue *vq,
-   struct vring_used_elem *used_elem,
+   struct vhost_used_elem *used_elem,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num)
 {
@@ -473,7 +473,7 @@ static void handle_tx(struct vhost_net *net)
size_t hdr_size;
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
-   struct vring_used_elem used;
+   struct vhost_used_elem used;
bool zcopy, zcopy_used;
 
mutex_lock(&vq->mutex);
@@ -537,9 +537,10 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
 
-   vq->heads[nvq->upend_idx].id =
-   cpu_to_vhost32(vq, used.id);
-   vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+   vq->heads[nvq->upend_idx].elem.id =
+   cpu_to_vhost32(vq, used.elem.id);
+   vq->heads[nvq->upend_idx].elem.len =
+   VHOST_DMA_IN_PROGRESS;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = nvq->ubufs;
ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 654c71f..ac11412 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -67,7 +67,7 @@ struct vhost_scsi_inflight {
 
 struct vhost_scsi_cmd {
/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-   struct vring_used_elem tvc_vq_used;
+   struct vhost_used_elem tvc_vq_used;
/* virtio-scsi initiator task attribute */
int tvc_task_attr;
/* virtio-scsi response incoming iovecs */
@@ -441,7 +441,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
struct virtio_scsi_event *event = &evt->event;
struct virtio_scsi_event __user *eventp;
-   struct vring_used_elem used;
+   struct vhost_used_elem used;
unsigned out, in;
int ret;
 
@@ -785,7 +785,7 @@ static void vhost_scsi_submission_work(struct work_struct 
*work)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
   struct vhost_virtqueue *vq,
-  struct vring_used_elem *used, unsigned out)
+  struct vhost_used_elem *used, unsigned out)
 {
struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
@@ -808,7 +808,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
struct virtio_scsi_cmd_req v_req;
struct virtio_scsi_cmd_req_pi v_req_pi;
struct vhost_scsi_cmd *cmd;
-   struct vring_used_elem used;
+   

[RFC PATCH V2 2/8] vhost: hide used ring layout from device

2018-03-25 Thread Jason Wang
We used to return the descriptor head from vhost_get_vq_desc() to the
device and pass it back to vhost_add_used() and its friends. This
exposes the internal used ring layout to the device, which makes it
hard to extend for e.g. the packed ring layout.

So this patch tries to hide the used ring layout by:

- letting vhost_get_vq_desc() return a pointer to struct vring_used_elem
- accepting a pointer to struct vring_used_elem in vhost_add_used() and
  vhost_add_used_and_signal()

This helps hide the used ring layout and makes it easier to implement
the packed ring on top (see the sketch below).
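
A sketch of the changed calling convention, distilled from the diff
below:

    /* before: a head index is returned; head == vq->num means "empty" */
    head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
                             &out, &in, NULL, NULL);

    /* after: a used element is filled in; -ENOSPC means "empty" */
    struct vring_used_elem used;
    err = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov),
                            &out, &in, NULL, NULL);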

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   | 46 +-
 drivers/vhost/scsi.c  | 62 +++
 drivers/vhost/vhost.c | 52 +-
 drivers/vhost/vhost.h |  9 +---
 drivers/vhost/vsock.c | 42 +-
 5 files changed, 112 insertions(+), 99 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 57dfa63..f821fcd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -422,22 +422,24 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_virtqueue *vq,
+   struct vring_used_elem *used_elem,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num)
 {
unsigned long uninitialized_var(endtime);
-   int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+   int r = vhost_get_vq_desc(vq, used_elem, vq->iov, ARRAY_SIZE(vq->iov),
  out_num, in_num, NULL, NULL);
 
-   if (r == vq->num && vq->busyloop_timeout) {
+   if (r == -ENOSPC && vq->busyloop_timeout) {
preempt_disable();
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(vq->dev, endtime) &&
   vhost_vq_avail_empty(vq->dev, vq))
cpu_relax();
preempt_enable();
-   r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- out_num, in_num, NULL, NULL);
+   r = vhost_get_vq_desc(vq, used_elem, vq->iov,
+ ARRAY_SIZE(vq->iov), out_num, in_num,
+ NULL, NULL);
}
 
return r;
@@ -459,7 +461,6 @@ static void handle_tx(struct vhost_net *net)
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = &nvq->vq;
unsigned out, in;
-   int head;
struct msghdr msg = {
.msg_name = NULL,
.msg_namelen = 0,
@@ -472,6 +473,7 @@ static void handle_tx(struct vhost_net *net)
size_t hdr_size;
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
+   struct vring_used_elem used;
bool zcopy, zcopy_used;
 
mutex_lock(&vq->mutex);
@@ -494,20 +496,20 @@ static void handle_tx(struct vhost_net *net)
vhost_zerocopy_signal_used(net, vq);
 
 
-   head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
-   ARRAY_SIZE(vq->iov),
-   &out, &in);
-   /* On error, stop handling until the next kick. */
-   if (unlikely(head < 0))
-   break;
+   err = vhost_net_tx_get_vq_desc(net, vq, &used, vq->iov,
+  ARRAY_SIZE(vq->iov),
+  &out, &in);
/* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   if (err == -ENOSPC) {
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);
continue;
}
break;
}
+   /* On error, stop handling until the next kick. */
+   if (unlikely(err < 0))
+   break;
if (in) {
vq_err(vq, "Unexpected descriptor format for TX: "
   "out %d, int %d\n", out, in);
@@ -535,7 +537,8 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
 
-   vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
+   vq->heads[nvq->upend_idx].id =
+   cpu_to_vhost32(vq, used.id);
vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;

[RFC PATCH V2 1/8] vhost: move get_rx_bufs to vhost.c

2018-03-25 Thread Jason Wang
Move get_rx_bufs() to vhost.c and rename it to
vhost_get_rx_bufs(). This helps hide the vring's internal layout from
specific device implementations. The packed ring implementation will
benefit from this.

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   | 83 ++-
 drivers/vhost/vhost.c | 78 +++
 drivers/vhost/vhost.h |  7 +
 3 files changed, 88 insertions(+), 80 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc7..57dfa63 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -658,83 +658,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net 
*net, struct sock *sk)
return len;
 }
 
-/* This is a multi-buffer version of vhost_get_desc, that works if
- * vq has read descriptors only.
- * @vq - the relevant virtqueue
- * @datalen  - data length we'll be reading
- * @iovcount - returned count of io vectors we fill
- * @log      - vhost log
- * @log_num  - log offset
- * @quota   - headcount quota, 1 for big buffer
- * returns number of buffer heads allocated, negative on error
- */
-static int get_rx_bufs(struct vhost_virtqueue *vq,
-  struct vring_used_elem *heads,
-  int datalen,
-  unsigned *iovcount,
-  struct vhost_log *log,
-  unsigned *log_num,
-  unsigned int quota)
-{
-   unsigned int out, in;
-   int seg = 0;
-   int headcount = 0;
-   unsigned d;
-   int r, nlogs = 0;
-   /* len is always initialized before use since we are always called with
-* datalen > 0.
-*/
-   u32 uninitialized_var(len);
-
-   while (datalen > 0 && headcount < quota) {
-   if (unlikely(seg >= UIO_MAXIOV)) {
-   r = -ENOBUFS;
-   goto err;
-   }
-   r = vhost_get_vq_desc(vq, vq->iov + seg,
- ARRAY_SIZE(vq->iov) - seg, &out,
- &in, log, log_num);
-   if (unlikely(r < 0))
-   goto err;
-
-   d = r;
-   if (d == vq->num) {
-   r = 0;
-   goto err;
-   }
-   if (unlikely(out || in <= 0)) {
-   vq_err(vq, "unexpected descriptor format for RX: "
-   "out %d, in %d\n", out, in);
-   r = -EINVAL;
-   goto err;
-   }
-   if (unlikely(log)) {
-   nlogs += *log_num;
-   log += *log_num;
-   }
-   heads[headcount].id = cpu_to_vhost32(vq, d);
-   len = iov_length(vq->iov + seg, in);
-   heads[headcount].len = cpu_to_vhost32(vq, len);
-   datalen -= len;
-   ++headcount;
-   seg += in;
-   }
-   heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
-   *iovcount = seg;
-   if (unlikely(log))
-   *log_num = nlogs;
-
-   /* Detect overrun */
-   if (unlikely(datalen > 0)) {
-   r = UIO_MAXIOV + 1;
-   goto err;
-   }
-   return headcount;
-err:
-   vhost_discard_vq_desc(vq, headcount);
-   return r;
-}
-
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
@@ -784,9 +707,9 @@ static void handle_rx(struct vhost_net *net)
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
sock_len += sock_hlen;
vhost_len = sock_len + vhost_hlen;
-   headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len,
-   &in, vq_log, &log,
-   likely(mergeable) ? UIO_MAXIOV : 1);
+   headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
+  &in, vq_log, &log,
+  likely(mergeable) ? UIO_MAXIOV : 1);
/* On error, stop handling until the next kick. */
if (unlikely(headcount < 0))
goto out;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d..c57df71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2098,6 +2098,84 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+/* This is a multi-buffer version of vhost_get_desc, that works if
+ * vq has read descriptors only.
+ * @vq - the relevant virtqueue
+ * @datalen  - data length we'll be reading
+ * @iovcount - returned count of io vectors we fill
+ * @log      - vhost log
+ * @log_num  - log offset
+ * 

[RFC PATCH V2 4/8] vhost_net: do not explicitly manipulate vhost_used_elem

2018-03-25 Thread Jason Wang
Two helpers for setting/getting the used length are introduced to
avoid explicitly manipulating vhost_used_elem in the zerocopy
code. This will be used to hide the used_elem internals and simplify
the packed ring implementation.
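
The pairing looks like this in use (sketch taken from the diff below):

    /* mark a zerocopy buffer in flight without open coding used.elem.len */
    vhost_set_used_len(vq, &used, VHOST_DMA_IN_PROGRESS);

    /* read the length back in native endianness */
    if (vhost_get_used_len(vq, &vq->heads[i]) == VHOST_DMA_FAILED_LEN)
            vhost_net_tx_err(net);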

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c   | 11 +--
 drivers/vhost/vhost.c | 12 ++--
 drivers/vhost/vhost.h |  5 +
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7ea2aee..7be8b55 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,9 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net 
*net,
int j = 0;
 
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
+   if (vhost_get_used_len(vq, &vq->heads[i]) ==
+   VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
-   if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+   if (VHOST_DMA_IS_DONE(vhost_get_used_len(vq, &vq->heads[i]))) {
vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
++j;
} else
@@ -537,10 +538,8 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
 
-   vq->heads[nvq->upend_idx].elem.id =
-   cpu_to_vhost32(vq, used.elem.id);
-   vq->heads[nvq->upend_idx].elem.len =
-   VHOST_DMA_IN_PROGRESS;
+   vhost_set_used_len(vq, &used, VHOST_DMA_IN_PROGRESS);
+   vq->heads[nvq->upend_idx] = used;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = nvq->ubufs;
ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8744dae..65954d6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2100,11 +2100,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
-static void vhost_set_used_len(struct vhost_virtqueue *vq,
-  struct vhost_used_elem *used, int len)
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+   struct vhost_used_elem *used, int len)
 {
used->elem.len = cpu_to_vhost32(vq, len);
 }
+EXPORT_SYMBOL_GPL(vhost_set_used_len);
+
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+  struct vhost_used_elem *used)
+{
+   return vhost32_to_cpu(vq, used->elem.len);
+}
+EXPORT_SYMBOL_GPL(vhost_get_used_len);
 
 /* This is a multi-buffer version of vhost_get_desc, that works if
  * vq has read descriptors only.
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8399887..d57c875 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -198,6 +198,11 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
   unsigned *log_num,
   unsigned int quota,
   s16 *count);
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+   struct vhost_used_elem *used,
+   int len);
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+  struct vhost_used_elem *used);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
-- 
2.7.4



[RFC PATCH V2 0/8] Packed ring for vhost

2018-03-25 Thread Jason Wang
Hi all:

This RFC implements the packed ring layout. The code was tested with
the pmd implementation by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor
change was needed in the pmd code to kick the virtqueue, since it
assumes a busy polling backend.

Tests were done between localhost and guest. Testpmd (rxonly) in the
guest reports 2.4 Mpps; testpmd (txonly) reports about 2.1 Mpps.

Note: the event suppression/indirect descriptor support is
compile-tested only, due to lack of driver support.
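
For reference, the packed layout replaces the split avail/used rings
with a single descriptor ring. A sketch of the descriptor format this
series targets, following the virtio 1.1 spec draft (field names are
the spec's and may differ from this series' headers):

    struct vring_packed_desc {
            __le64 addr;   /* buffer guest-physical address */
            __le32 len;    /* buffer length in bytes */
            __le16 id;     /* buffer id, echoed back once used */
            __le16 flags;  /* incl. avail/used wrap-counter bits */
    };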

Changes from V1:

- Refactor vhost used elem code to avoid open coding on used elem
- Event suppression support (compile test only).
- Indirect descriptor support (compile test only).
- Zerocopy support.
- vIOMMU support.
- SCSI/VSOCK support (compile test only).
- Fix several bugs

For simplicity, I don't implement batching or other optimizations.

Please review.

Thanks

Jason Wang (8):
  vhost: move get_rx_bufs to vhost.c
  vhost: hide used ring layout from device
  vhost: do not use vring_used_elem
  vhost_net: do not explicitly manipulate vhost_used_elem
  vhost: vhost_put_user() can accept metadata type
  virtio: introduce packed ring defines
  vhost: packed ring support
  vhost: event suppression for packed ring

 drivers/vhost/net.c| 138 ++-
 drivers/vhost/scsi.c   |  62 +--
 drivers/vhost/vhost.c  | 818 ++---
 drivers/vhost/vhost.h  |  46 ++-
 drivers/vhost/vsock.c  |  42 +-
 include/uapi/linux/virtio_config.h |   9 +
 include/uapi/linux/virtio_ring.h   |  32 ++
 7 files changed, 921 insertions(+), 226 deletions(-)

-- 
2.7.4



RE: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules

2018-03-25 Thread Wang, Wei W
On Monday, March 26, 2018 10:40 AM, Wang, Wei W wrote:
> Subject: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled
> to kernel modules
> 
> In some usages, e.g. virtio-balloon, a kernel module needs to know if page
> poisoning is in use. This patch exposes the page_poisoning_enabled function
> to kernel modules.
> 
> Signed-off-by: Wei Wang 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Michael S. Tsirkin 
> ---
>  mm/page_poison.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/page_poison.c b/mm/page_poison.c index e83fd44..762b472
> 100644
> --- a/mm/page_poison.c
> +++ b/mm/page_poison.c
> @@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf)
>  }
> early_param("page_poison", early_page_poison_param);
> 
> +/**
> + * page_poisoning_enabled - check if page poisoning is enabled
> + *
> + * Return true if page poisoning is enabled, or false if not.
> + */
>  bool page_poisoning_enabled(void)
>  {
>   /*
> @@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
> 
>   (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
>   debug_pagealloc_enabled()));
>  }
> +EXPORT_SYMBOL_GPL(page_poisoning_enabled);
> 
>  static void poison_page(struct page *page)
>  {
> --
> 2.7.4
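
For context, a consumer module would use the export roughly like this
minimal sketch (the helper name here is made up for illustration;
virtio-balloon is the use case named above):

    #include <linux/mm.h>

    static bool can_discard_free_pages(void)
    {
            /* poisoned free pages carry a pattern, so don't discard them */
            return !page_poisoning_enabled();
    }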


Could we get a review of this patch? We've reviewed other parts, and this one 
seems to be the last part of this feature. Thanks.

Best,
Wei


Re: [PATCH v7 6/6] typec: tcpm: Add support for sink PPS related messages

2018-03-25 Thread Guenter Roeck

On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds sink side support for Get_Status, Status,
Get_PPS_Status and PPS_Status handling. As there's the potential for
a partner to respond with Not_Supported, handling of this message is
also added. Sending of Not_Supported is added to handle messages
received but not yet handled.

Signed-off-by: Adam Thomson 
Acked-by: Heikki Krogerus 


Reviewed-by: Guenter Roeck 


---
  drivers/usb/typec/tcpm.c | 143 ---
  1 file changed, 134 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c
index 57a7d1a..7025a16 100644
--- a/drivers/usb/typec/tcpm.c
+++ b/drivers/usb/typec/tcpm.c
@@ -19,7 +19,9 @@
  #include 
  #include 
  #include 
+#include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -113,6 +115,11 @@
S(SNK_TRYWAIT_VBUS),\
S(BIST_RX), \
\
+   S(GET_STATUS_SEND), \
+   S(GET_STATUS_SEND_TIMEOUT), \
+   S(GET_PPS_STATUS_SEND), \
+   S(GET_PPS_STATUS_SEND_TIMEOUT), \
+   \
S(ERROR_RECOVERY),  \
S(PORT_RESET),  \
S(PORT_RESET_WAIT_OFF)
@@ -143,6 +150,7 @@ enum pd_msg_request {
PD_MSG_NONE = 0,
PD_MSG_CTRL_REJECT,
PD_MSG_CTRL_WAIT,
+   PD_MSG_CTRL_NOT_SUPP,
PD_MSG_DATA_SINK_CAP,
PD_MSG_DATA_SOURCE_CAP,
  };
@@ -1398,10 +1406,42 @@ static int tcpm_validate_caps(struct tcpm_port *port, 
const u32 *pdo,
  /*
   * PD (data, control) command handling functions
   */
+static inline enum tcpm_state ready_state(struct tcpm_port *port)
+{
+   if (port->pwr_role == TYPEC_SOURCE)
+   return SRC_READY;
+   else
+   return SNK_READY;
+}
  
  static int tcpm_pd_send_control(struct tcpm_port *port,

enum pd_ctrl_msg_type type);
  
+static void tcpm_handle_alert(struct tcpm_port *port, const __le32 *payload,

+ int cnt)
+{
+   u32 p0 = le32_to_cpu(payload[0]);
+   unsigned int type = usb_pd_ado_type(p0);
+
+   if (!type) {
+   tcpm_log(port, "Alert message received with no type");
+   return;
+   }
+
+   /* Just handling non-battery alerts for now */
+   if (!(type & USB_PD_ADO_TYPE_BATT_STATUS_CHANGE)) {
+   switch (port->state) {
+   case SRC_READY:
+   case SNK_READY:
+   tcpm_set_state(port, GET_STATUS_SEND, 0);
+   break;
+   default:
+   tcpm_queue_message(port, PD_MSG_CTRL_WAIT);
+   break;
+   }
+   }
+}
+
  static void tcpm_pd_data_request(struct tcpm_port *port,
 const struct pd_message *msg)
  {
@@ -1489,6 +1529,14 @@ static void tcpm_pd_data_request(struct tcpm_port *port,
tcpm_set_state(port, BIST_RX, 0);
}
break;
+   case PD_DATA_ALERT:
+   tcpm_handle_alert(port, msg->payload, cnt);
+   break;
+   case PD_DATA_BATT_STATUS:
+   case PD_DATA_GET_COUNTRY_INFO:
+   /* Currently unsupported */
+   tcpm_queue_message(port, PD_MSG_CTRL_NOT_SUPP);
+   break;
default:
tcpm_log(port, "Unhandled data message type %#x", type);
break;
@@ -1571,6 +1619,7 @@ static void tcpm_pd_ctrl_request(struct tcpm_port *port,
break;
case PD_CTRL_REJECT:
case PD_CTRL_WAIT:
+   case PD_CTRL_NOT_SUPP:
switch (port->state) {
case SNK_NEGOTIATE_CAPABILITIES:
/* USB PD specification, Figure 8-43 */
@@ -1690,12 +1739,75 @@ static void tcpm_pd_ctrl_request(struct tcpm_port *port,
break;
}
break;
+   case PD_CTRL_GET_SOURCE_CAP_EXT:
+   case PD_CTRL_GET_STATUS:
+   case PD_CTRL_FR_SWAP:
+   case PD_CTRL_GET_PPS_STATUS:
+   case PD_CTRL_GET_COUNTRY_CODES:
+   /* Currently not supported */
+   tcpm_queue_message(port, PD_MSG_CTRL_NOT_SUPP);
+   break;
default:
tcpm_log(port, "Unhandled ctrl message type %#x", type);
break;
}
  }
  
+static void tcpm_pd_ext_msg_request(struct tcpm_port *port,

+   const struct pd_message *msg)
+{
+   enum pd_ext_msg_type type = pd_header_type_le(msg->header);
+   unsigned int data_size = 
pd_ext_header_data_size_le(msg->ext_msg.header);
+
+   if (!(msg->ext_msg.header && PD_EXT_HDR_CHUNKED)) {
+   tcpm_log(port, "Unchunked extended messages 

Re: [PATCH v7 5/6] typec: tcpm: Represent source supply through power_supply

2018-03-25 Thread Guenter Roeck

On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds a power_supply class instance to represent a
PD source's voltage and current properties. This provides an
interface for reading these properties from user-space or other
drivers.

For PPS enabled Sources, this also provides write access to set
the current and voltage and allows for swapping between standard
PDO and PPS APDO.

As this represents a superset of the information provided in the
fusb302 driver, the power_supply instance in that code is removed
as part of this change, effectively reverting the commit titled
'typec: tcpm: Represent source supply through power_supply class'
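
As an aside, another driver could read the new supply roughly like
this (sketch; the power_supply name string is illustrative, not taken
from this patch):

    struct power_supply *psy;
    union power_supply_propval val;

    psy = power_supply_get_by_name("tcpm-source-psy-usbc0");
    if (psy && !power_supply_get_property(psy,
                    POWER_SUPPLY_PROP_VOLTAGE_NOW, &val))
            pr_info("source voltage: %d uV\n", val.intval);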

Signed-off-by: Adam Thomson 


Reviewed-by: Guenter Roeck 


---
  drivers/usb/typec/Kconfig   |   1 +
  drivers/usb/typec/fusb302/Kconfig   |   2 +-
  drivers/usb/typec/fusb302/fusb302.c |  63 +-
  drivers/usb/typec/tcpm.c| 245 +++-
  4 files changed, 248 insertions(+), 63 deletions(-)

diff --git a/drivers/usb/typec/Kconfig b/drivers/usb/typec/Kconfig
index bcb2744..1ef606d 100644
--- a/drivers/usb/typec/Kconfig
+++ b/drivers/usb/typec/Kconfig
@@ -48,6 +48,7 @@ if TYPEC
  config TYPEC_TCPM
tristate "USB Type-C Port Controller Manager"
depends on USB
+   select POWER_SUPPLY
help
  The Type-C Port Controller Manager provides a USB PD and USB Type-C
  state machine for use with Type-C Port Controllers.
diff --git a/drivers/usb/typec/fusb302/Kconfig 
b/drivers/usb/typec/fusb302/Kconfig
index 48a4f2f..fce099f 100644
--- a/drivers/usb/typec/fusb302/Kconfig
+++ b/drivers/usb/typec/fusb302/Kconfig
@@ -1,6 +1,6 @@
  config TYPEC_FUSB302
tristate "Fairchild FUSB302 Type-C chip driver"
-   depends on I2C && POWER_SUPPLY
+   depends on I2C
help
  The Fairchild FUSB302 Type-C chip driver that works with
  Type-C Port Controller Manager to provide USB PD and USB
diff --git a/drivers/usb/typec/fusb302/fusb302.c 
b/drivers/usb/typec/fusb302/fusb302.c
index 06794c0..6a8f279 100644
--- a/drivers/usb/typec/fusb302/fusb302.c
+++ b/drivers/usb/typec/fusb302/fusb302.c
@@ -18,7 +18,6 @@
  #include 
  #include 
  #include 
-#include <linux/power_supply.h>
  #include 
  #include 
  #include 
@@ -99,11 +98,6 @@ struct fusb302_chip {
/* lock for sharing chip states */
struct mutex lock;
  
-	/* psy + psy status */

-   struct power_supply *psy;
-   u32 current_limit;
-   u32 supply_voltage;
-
/* chip status */
enum toggling_mode toggling_mode;
enum src_current_status src_current_status;
@@ -861,13 +855,11 @@ static int tcpm_set_vbus(struct tcpc_dev *dev, bool on, 
bool charge)
chip->vbus_on = on;
fusb302_log(chip, "vbus := %s", on ? "On" : "Off");
}
-   if (chip->charge_on == charge) {
+   if (chip->charge_on == charge)
fusb302_log(chip, "charge is already %s",
charge ? "On" : "Off");
-   } else {
+   else
chip->charge_on = charge;
-   power_supply_changed(chip->psy);
-   }
  
  done:

mutex_unlock(&chip->lock);
@@ -883,11 +875,6 @@ static int tcpm_set_current_limit(struct tcpc_dev *dev, 
u32 max_ma, u32 mv)
fusb302_log(chip, "current limit: %d ma, %d mv (not implemented)",
max_ma, mv);
  
-	chip->supply_voltage = mv;

-   chip->current_limit = max_ma;
-
-   power_supply_changed(chip->psy);
-
return 0;
  }
  
@@ -1686,43 +1673,6 @@ static irqreturn_t fusb302_irq_intn(int irq, void *dev_id)

return IRQ_HANDLED;
  }
  
-static int fusb302_psy_get_property(struct power_supply *psy,

-   enum power_supply_property psp,
-   union power_supply_propval *val)
-{
-   struct fusb302_chip *chip = power_supply_get_drvdata(psy);
-
-   switch (psp) {
-   case POWER_SUPPLY_PROP_ONLINE:
-   val->intval = chip->charge_on;
-   break;
-   case POWER_SUPPLY_PROP_VOLTAGE_NOW:
-   val->intval = chip->supply_voltage * 1000; /* mV -> µV */
-   break;
-   case POWER_SUPPLY_PROP_CURRENT_MAX:
-   val->intval = chip->current_limit * 1000; /* mA -> µA */
-   break;
-   default:
-   return -ENODATA;
-   }
-
-   return 0;
-}
-
-static enum power_supply_property fusb302_psy_properties[] = {
-   POWER_SUPPLY_PROP_ONLINE,
-   POWER_SUPPLY_PROP_VOLTAGE_NOW,
-   POWER_SUPPLY_PROP_CURRENT_MAX,
-};
-
-static const struct power_supply_desc fusb302_psy_desc = {
-   .name   = "fusb302-typec-source",
-   .type   = POWER_SUPPLY_TYPE_USB_TYPE_C,
-   .properties = fusb302_psy_properties,
-   .num_properties = ARRAY_SIZE(fusb302_psy_properties),
-   .get_property   = fusb302_psy_get_property,
-};
-
  static int init_gpio(struct fusb302_chip *chip)
  {

Re: [PATCH v7 1/6] typec: tcpm: Add core support for sink side PPS

2018-03-25 Thread Guenter Roeck

On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds code to handle requesting of PPS APDOs. Switching
between standard PDOs and APDOs, and re-requesting an APDO to modify
the operating voltage/current, is triggered by an external call into
TCPM.
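
For readers unfamiliar with APDOs, below is a stand-alone sketch of the
PPS APDO field layout from the USB PD 3.0 specification. The helper names
are illustrative, not the exact pdo_pps_apdo_*() accessors this patch uses:

#include <stdint.h>

/* PPS APDO layout (USB PD 3.0): B31..30 = 11b (APDO), B29..28 = 00b (PPS),
 * B24..17 = max voltage in 100mV units, B15..8 = min voltage in 100mV
 * units, B6..0 = max current in 50mA units. */
static inline unsigned int pps_max_voltage_mv(uint32_t pdo)
{
	return ((pdo >> 17) & 0xff) * 100;
}

static inline unsigned int pps_min_voltage_mv(uint32_t pdo)
{
	return ((pdo >> 8) & 0xff) * 100;
}

static inline unsigned int pps_max_current_ma(uint32_t pdo)
{
	return (pdo & 0x7f) * 50;
}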

Signed-off-by: Adam Thomson 
Acked-by: Heikki Krogerus 


Reviewed-by: Guenter Roeck 


---
  drivers/usb/typec/tcpm.c | 517 ++-
  include/linux/usb/pd.h   |   4 +-
  include/linux/usb/tcpm.h |   1 +
  3 files changed, 509 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c
index 4c0fc54..1a66c9e 100644
--- a/drivers/usb/typec/tcpm.c
+++ b/drivers/usb/typec/tcpm.c
@@ -47,6 +47,7 @@
S(SNK_DISCOVERY_DEBOUNCE_DONE), \
S(SNK_WAIT_CAPABILITIES),   \
S(SNK_NEGOTIATE_CAPABILITIES),  \
+   S(SNK_NEGOTIATE_PPS_CAPABILITIES),  \
S(SNK_TRANSITION_SINK), \
S(SNK_TRANSITION_SINK_VBUS),\
S(SNK_READY),   \
@@ -166,6 +167,16 @@ struct pd_mode_data {
struct typec_altmode_desc altmode_desc[SVID_DISCOVERY_MAX];
  };
  
+struct pd_pps_data {

+   u32 min_volt;
+   u32 max_volt;
+   u32 max_curr;
+   u32 out_volt;
+   u32 op_curr;
+   bool supported;
+   bool active;
+};
+
  struct tcpm_port {
struct device *dev;
  
@@ -233,6 +244,7 @@ struct tcpm_port {

struct completion swap_complete;
int swap_status;
  
+	unsigned int negotiated_rev;

unsigned int message_id;
unsigned int caps_count;
unsigned int hard_reset_count;
@@ -259,6 +271,7 @@ struct tcpm_port {
unsigned int max_snk_ma;
unsigned int max_snk_mw;
unsigned int operating_snk_mw;
+   bool update_sink_caps;
  
  	/* Requested current / voltage */

u32 current_limit;
@@ -275,8 +288,13 @@ struct tcpm_port {
/* VDO to retry if UFP responder replied busy */
u32 vdo_retry;
  
-	/* Alternate mode data */

+   /* PPS */
+   struct pd_pps_data pps_data;
+   struct completion pps_complete;
+   bool pps_pending;
+   int pps_status;
  
+	/* Alternate mode data */

struct pd_mode_data mode_data;
struct typec_altmode *partner_altmode[SVID_DISCOVERY_MAX];
struct typec_altmode *port_altmode[SVID_DISCOVERY_MAX];
@@ -494,6 +512,16 @@ static void tcpm_log_source_caps(struct tcpm_port *port)
  pdo_max_voltage(pdo),
  pdo_max_power(pdo));
break;
+   case PDO_TYPE_APDO:
+   if (pdo_apdo_type(pdo) == APDO_TYPE_PPS)
+   scnprintf(msg, sizeof(msg),
+ "%u-%u mV, %u mA",
+ pdo_pps_apdo_min_voltage(pdo),
+ pdo_pps_apdo_max_voltage(pdo),
+ pdo_pps_apdo_max_current(pdo));
+   else
+   strcpy(msg, "undefined APDO");
+   break;
default:
strcpy(msg, "undefined");
break;
@@ -777,11 +805,13 @@ static int tcpm_pd_send_source_caps(struct tcpm_port 
*port)
msg.header = PD_HEADER_LE(PD_CTRL_REJECT,
  port->pwr_role,
  port->data_role,
+ port->negotiated_rev,
  port->message_id, 0);
} else {
msg.header = PD_HEADER_LE(PD_DATA_SOURCE_CAP,
  port->pwr_role,
  port->data_role,
+ port->negotiated_rev,
  port->message_id,
  port->nr_src_pdo);
}
@@ -802,11 +832,13 @@ static int tcpm_pd_send_sink_caps(struct tcpm_port *port)
msg.header = PD_HEADER_LE(PD_CTRL_REJECT,
  port->pwr_role,
  port->data_role,
+ port->negotiated_rev,
  port->message_id, 0);
} else {
msg.header = PD_HEADER_LE(PD_DATA_SINK_CAP,
  port->pwr_role,
  port->data_role,
+ port->negotiated_rev,
  port->message_id,
  port->nr_snk_pdo);
}
@@ -1173,6 +1205,7 @@ static void vdm_run_state_machine(struct tcpm_port *port)
msg.header = 

Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free

2018-03-25 Thread Aaron Lu
On Thu, Mar 22, 2018 at 08:17:19AM -0700, Matthew Wilcox wrote:
> On Tue, Mar 13, 2018 at 11:34:53AM +0800, Aaron Lu wrote:
> > I wish there were a data structure that had the flexibility of a list
> > while at the same time letting us locate the Nth element without the
> > need to iterate. That's what I was looking for when developing clustered
> > allocation for order 0 pages. In the end, I had to use another place to
> > record where the Nth element is. I hope to send out v2 of that RFC
> > series soon but I'm still collecting data for it. I would appreciate it
> > if people could take a look then :-)
> 
> Sorry, I missed this.  There is such a data structure -- the IDR, or
> possibly a bare radix tree, or we can build a better data structure on
> top of the radix tree (I talked about one called the XQueue a while ago).
> 
> The IDR will automatically grow to whatever needed size, it stores
> pointers, you can find out quickly where the last allocated index is,
> you can remove from the middle of the array.  Disadvantage is that it
> requires memory allocation to store the array of pointers, *but* it
> can always hold at least one entry.  So if you have no memory, you can
> always return the one element in your IDR to the free pool and allocate
> from that page.

Thanks for the pointer, will take a look later.
Currently I'm focusing on finding real workloads that have zone lock
contention issues.
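
For reference, a minimal sketch of the IDR usage Matthew describes
(kernel context assumed, error handling trimmed):

#include <linux/idr.h>
#include <linux/mm.h>

static DEFINE_IDR(page_idr);

/* Store a page and return its index; may allocate radix-tree nodes,
 * but the IDR can always hold at least one entry. */
static int stash_page(struct page *page)
{
	return idr_alloc(&page_idr, page, 0, 0, GFP_KERNEL);
}

/* Remove from the middle of the "array" by index. */
static struct page *unstash_page(int id)
{
	return idr_remove(&page_idr, id);
}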


[PATCH v3 4/5] arm64: introduce pfn_valid_region()

2018-03-25 Thread Jia He
This is preparation for further optimizing early_pfn_valid() on arm64.

Signed-off-by: Jia He 
---
 arch/arm64/include/asm/page.h |  3 ++-
 arch/arm64/mm/init.c  | 25 -
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 60d02c8..da2cba3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,7 +38,8 @@ extern void clear_page(void *to);
 typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
+extern int pfn_valid(unsigned long pfn);
+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
 #endif
 
 #include 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..06433d5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -290,7 +290,30 @@ int pfn_valid(unsigned long pfn)
return memblock_is_map_memory(pfn << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(pfn_valid);
-#endif
+
+int pfn_valid_region(unsigned long pfn, int *last_idx)
+{
+   unsigned long start_pfn, end_pfn;
+   struct memblock_type *type = &memblock.memory;
+
+   if (*last_idx != -1) {
+   start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+   end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+   type->regions[*last_idx].size);
+
+   if (pfn >= start_pfn && pfn < end_pfn)
+   return !memblock_is_nomap(
+   &type->regions[*last_idx]);
+   }
+
+   *last_idx = memblock_search_pfn_regions(pfn);
+   if (*last_idx == -1)
+   return false;
+
+   return !memblock_is_nomap(&type->regions[*last_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
 #ifndef CONFIG_SPARSEMEM
 static void __init arm64_memory_present(void)
-- 
2.7.4



[PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

2018-03-25 Thread Jia He
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check whether pfn+1 is still in that region.

Currently this only improves performance on arm64 and has no impact on
other arches.
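
The intended calling pattern, matching the memmap_init_zone() hunk below,
keeps the region index alive across iterations so that consecutive pfns
in the same region skip the binary search. A condensed sketch:

	int idx = -1;	/* no memblock region cached yet */
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* hits the cached region in O(1) on the common path */
		if (!early_pfn_valid(pfn, &idx))
			continue;
		/* ... initialise the struct page for pfn ... */
	}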

Signed-off-by: Jia He 
---
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/mmzone.h   | 12 +---
 mm/page_alloc.c  |  2 +-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73d8dd1..329d3ba 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
return 0;
 }
 
-#define early_pfn_valid(pfn)   pfn_valid((pfn))
+#define early_pfn_valid(pfn, last_region_idx)  pfn_valid((pfn))
 
 #endif /* CONFIG_DISCONTIGMEM */
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..3a686af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
 })
 #else
 #define pfn_to_nid(pfn)(0)
-#endif
+#endif /*CONFIG_NUMA*/
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+#define early_pfn_valid(pfn, last_region_idx) \
+   pfn_valid_region(pfn, last_region_idx)
+#else
+#define early_pfn_valid(pfn, last_region_idx)  pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
-#define early_pfn_valid(pfn)   pfn_valid(pfn)
 void sparse_init(void);
 #else
 #define sparse_init()  do {} while (0)
@@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
 };
 
 #ifndef early_pfn_valid
-#define early_pfn_valid(pfn)   (1)
+#define early_pfn_valid(pfn, last_region_idx)  (1)
 #endif
 
 void memory_present(int nid, unsigned long start, unsigned long end);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0bb0274..debccf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5484,7 +5484,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;
 
-   if (!early_pfn_valid(pfn)) {
+   if (!early_pfn_valid(pfn, &idx)) {
 #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
/*
 * Skip to the pfn preceding the next valid one (or
-- 
2.7.4



[PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions()

2018-03-25 Thread Jia He
This API is preparation for further optimizing early_pfn_valid().

Signed-off-by: Jia He 
---
 include/linux/memblock.h | 2 ++
 mm/memblock.c| 9 +
 2 files changed, 11 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a8fb2ab..104bca6 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -207,6 +207,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long 
*out_start_pfn,
 unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
+int memblock_search_pfn_regions(unsigned long pfn);
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index 06c1a08..15fcde2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1661,6 +1661,15 @@ static int __init_memblock memblock_search(struct 
memblock_type *type, phys_addr
return -1;
 }
 
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+   struct memblock_type *type = &memblock.memory;
+   int mid = memblock_search(type, PFN_PHYS(pfn));
+
+   return mid;
+}
+
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
return memblock_search(&memblock.reserved, addr) != -1;
-- 
2.7.4



[PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()

2018-03-25 Thread Jia He
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply do pfn++ instead of the binary search
in memblock_next_valid_pfn(). This patch only takes effect when
CONFIG_HAVE_ARCH_PFN_VALID is enabled.
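
With the new index parameter, the memmap_init_zone() call site from
patch 1/5 presumably becomes something along these lines:

	if (!early_pfn_valid(pfn)) {
#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
		/* skip to the pfn preceding the next valid one */
		pfn = memblock_next_valid_pfn(pfn, &idx) - 1;
#endif
		continue;
	}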

Signed-off-by: Jia He 
---
 include/linux/memblock.h |  2 +-
 mm/memblock.c| 73 +---
 mm/page_alloc.c  |  3 +-
 3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index efbbe4b..a8fb2ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long 
*out_start_pfn,
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long memblock_next_valid_pfn(unsigned long pfn);
+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
 /**
diff --git a/mm/memblock.c b/mm/memblock.c
index bea5a9c..06c1a08 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int 
nid,
*out_nid = r->nid;
 }
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
-{
-   struct memblock_type *type = &memblock.memory;
-   unsigned int right = type->cnt;
-   unsigned int mid, left = 0;
-   phys_addr_t addr = PFN_PHYS(++pfn);
-
-   do {
-   mid = (right + left) / 2;
-
-   if (addr < type->regions[mid].base)
-   right = mid;
-   else if (addr >= (type->regions[mid].base +
- type->regions[mid].size))
-   left = mid + 1;
-   else {
-   /* addr is within the region, so pfn is valid */
-   return pfn;
-   }
-   } while (left < right);
-
-   if (right == type->cnt)
-   return -1UL;
-   else
-   return PHYS_PFN(type->regions[right].base);
-}
-#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
@@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, 
phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
+   int *last_idx)
+{
+   struct memblock_type *type = &memblock.memory;
+   unsigned int right = type->cnt;
+   unsigned int mid, left = 0;
+   unsigned long start_pfn, end_pfn;
+   phys_addr_t addr = PFN_PHYS(++pfn);
+
+   /* fast path, return pfn+1 if next pfn is in the same region */
+   if (*last_idx != -1) {
+   start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+   end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+   type->regions[*last_idx].size);
+
+   if (pfn < end_pfn && pfn > start_pfn)
+   return pfn;
+   }
+
+   /* slow path, do the binary searching */
+   do {
+   mid = (right + left) / 2;
+
+   if (addr < type->regions[mid].base)
+   right = mid;
+   else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+   left = mid + 1;
+   else {
+   *last_idx = mid;
+   return pfn;
+   }
+   } while (left < right);
+
+   if (right == type->cnt)
+   return -1UL;
+
+   *last_idx = right;
+
+   return PHYS_PFN(type->regions[*last_idx].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2a967f7..0bb0274 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
unsigned long end_pfn = start_pfn + size;
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long pfn;
+   int idx = -1;
unsigned long nr_initialised = 0;
struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
 * end_pfn), such that we hit a valid pfn (or end_pfn)
 * on our next iteration of the loop.
 */
-  

[PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable

2018-03-25 Thread Jia He
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic bug. So Daniel Vacek reverted it later.

But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
enable. And as verified by Eugeniu Rosca, arm can benifit from this
commit. So remain the memblock_next_valid_pfn.

Signed-off-by: Jia He 
---
 include/linux/memblock.h |  4 
 mm/memblock.c| 29 +
 mm/page_alloc.c  | 11 ++-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0257aee..efbbe4b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long 
*out_start_pfn,
 i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#endif
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index ba7c878..bea5a9c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int 
nid,
*out_nid = r->nid;
 }
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+   struct memblock_type *type = &memblock.memory;
+   unsigned int right = type->cnt;
+   unsigned int mid, left = 0;
+   phys_addr_t addr = PFN_PHYS(++pfn);
+
+   do {
+   mid = (right + left) / 2;
+
+   if (addr < type->regions[mid].base)
+   right = mid;
+   else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+   left = mid + 1;
+   else {
+   /* addr is within the region, so pfn is valid */
+   return pfn;
+   }
+   } while (left < right);
+
+   if (right == type->cnt)
+   return -1UL;
+   else
+   return PHYS_PFN(type->regions[right].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c19f5ac..2a967f7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
if (context != MEMMAP_EARLY)
goto not_early;
 
-   if (!early_pfn_valid(pfn))
+   if (!early_pfn_valid(pfn)) {
+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
+   /*
+* Skip to the pfn preceding the next valid one (or
+* end_pfn), such that we hit a valid pfn (or end_pfn)
+* on our next iteration of the loop.
+*/
+   pfn = memblock_next_valid_pfn(pfn) - 1;
+#endif
continue;
+   }
if (!early_pfn_in_nid(pfn, nid))
continue;
if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
-- 
2.7.4



[PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid

2018-03-25 Thread Jia He
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 retains memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID
is enabled.
Patch 2 optimizes memblock_next_valid_pfn().
Patches 3~5 optimize early_pfn_valid(); I had to split this into parts
because the changes span several subsystems.

I tested the pfn loop in memmap_init() the same way as before.
As for the performance improvement, with this set the time overhead of
memmap_init() is reduced from 41313 us to 24345 us on my armv8a server
(QDF2400 with 96G memory).

Attached is the memblock region information from my server.
[   86.956758] Zone ranges:
[   86.959452]   DMA  [mem 0x0020-0x]
[   86.966041]   Normal   [mem 0x0001-0x0017]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0020-0x0021]
[   86.987666]   node   0: [mem 0x0082-0x0307]
[   86.994348]   node   0: [mem 0x0308-0x0308]
[   87.001029]   node   0: [mem 0x0309-0x031f]
[   87.007710]   node   0: [mem 0x0320-0x033f]
[   87.014392]   node   0: [mem 0x0341-0x0563]
[   87.021073]   node   0: [mem 0x0564-0x0567]
[   87.027754]   node   0: [mem 0x0568-0x056d]
[   87.034435]   node   0: [mem 0x056e-0x086f]
[   87.041117]   node   0: [mem 0x0870-0x0871]
[   87.047798]   node   0: [mem 0x0872-0x0894]
[   87.054479]   node   0: [mem 0x0895-0x08ba]
[   87.061161]   node   0: [mem 0x08bb-0x08bc]
[   87.067842]   node   0: [mem 0x08bd-0x08c4]
[   87.074524]   node   0: [mem 0x08c5-0x08e2]
[   87.081205]   node   0: [mem 0x08e3-0x08e4]
[   87.087886]   node   0: [mem 0x08e5-0x08fc]
[   87.094568]   node   0: [mem 0x08fd-0x0910]
[   87.101249]   node   0: [mem 0x0911-0x092e]
[   87.107930]   node   0: [mem 0x092f-0x0930]
[   87.114612]   node   0: [mem 0x0931-0x0963]
[   87.121293]   node   0: [mem 0x0964-0x0e61]
[   87.127975]   node   0: [mem 0x0e62-0x0e64]
[   87.134657]   node   0: [mem 0x0e65-0x0fff]
[   87.141338]   node   0: [mem 0x1080-0x17fe]
[   87.148019]   node   0: [mem 0x1c00-0x1c00]
[   87.154701]   node   0: [mem 0x1c01-0x1c7f]
[   87.161383]   node   0: [mem 0x1c81-0x7efb]
[   87.168064]   node   0: [mem 0x7efc-0x7efd]
[   87.174746]   node   0: [mem 0x7efe-0x7efe]
[   87.181427]   node   0: [mem 0x7eff-0x7eff]
[   87.188108]   node   0: [mem 0x7f00-0x0017]
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0020-0x0017]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0020-0x0017]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Changelog:
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to mmotm latest
- remain memblock_next_valid_pfn on arm64
- refine memblock_search_pfn_regions and pfn_valid_region

Jia He (5):
  mm: page_alloc: remain memblock_next_valid_pfn() when
CONFIG_HAVE_ARCH_PFN_VALID is enable
  mm: page_alloc: reduce unnecessary binary search in
memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm64/include/asm/page.h|  3 ++-
 arch/arm64/mm/init.c | 25 ++-
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/memblock.h |  6 +
 include/linux/mmzone.h   

[PATCH v29 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

2018-03-25 Thread Wei Wang
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate whether
the guest is using page poisoning. The guest writes to the poison_val
config field to tell the host which page poisoning value is in use.
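
For clarity, the probe-path memset() in the diff below fills every byte
of the 32-bit poison_val with the poison byte; an equivalent construction
would be:

/* memset(&poison_val, PAGE_POISON, sizeof(poison_val)) replicates the
 * single poison byte into all four bytes, i.e.: */
__u32 poison_val = (__u32)PAGE_POISON * 0x01010101U;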

Signed-off-by: Wei Wang 
Suggested-by: Michael S. Tsirkin 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 10 ++
 include/uapi/linux/virtio_balloon.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 18d24a4..6de9339 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -699,6 +699,7 @@ static struct file_system_type balloon_fs = {
 static int virtballoon_probe(struct virtio_device *vdev)
 {
struct virtio_balloon *vb;
+   __u32 poison_val;
int err;
 
if (!vdev->config->get) {
@@ -744,6 +745,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->stop_cmd_id = cpu_to_virtio32(vb->vdev,
VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+   memset(&poison_val, PAGE_POISON, sizeof(poison_val));
+   virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, &poison_val);
+   }
}
 
vb->nb.notifier_call = virtballoon_oom_notify;
@@ -862,6 +868,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
 
 static int virtballoon_validate(struct virtio_device *vdev)
 {
+   if (!page_poisoning_enabled())
+   __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
 }
@@ -871,6 +880,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+   VIRTIO_BALLOON_F_PAGE_POISON,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h 
b/include/uapi/linux/virtio_balloon.h
index b2d86c2..8b93581 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+   /* Stores PAGE_POISON if page poisoning is in use */
+   __u32 poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4



[PATCH v29 1/4] mm: support reporting free page blocks

2018-03-25 Thread Wei Wang
This patch adds support to walk through the free page blocks in the
system and report them via a callback function. Some page blocks may
leave the free list after zone->lock is released, so it is the caller's
responsibility to either detect or prevent the use of such pages.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are reported as free pages but are written after the report function
returns will be captured by the hypervisor, and they will be added to the
next round of memory transfer.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Acked-by: Michal Hocko 
---
 include/linux/mm.h |  6 
 mm/page_alloc.c| 96 ++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..c72b5a9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1944,6 +1944,12 @@ extern void free_area_init_node(int nid, unsigned long * 
zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+extern int walk_free_mem_block(void *opaque,
+  int min_order,
+  int (*report_pfn_range)(void *opaque,
+  unsigned long pfn,
+  unsigned long num));
+
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 635d7dd..d58de87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4912,6 +4912,102 @@ void show_free_areas(unsigned int filter, nodemask_t 
*nodemask)
show_swap_cache_info();
 }
 
+/*
+ * Walk through a free page list and report the found pfn range via the
+ * callback.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+static int walk_free_page_list(void *opaque,
+  struct zone *zone,
+  int order,
+  enum migratetype mt,
+  int (*report_pfn_range)(void *,
+  unsigned long,
+  unsigned long))
+{
+   struct page *page;
+   struct list_head *list;
+   unsigned long pfn, flags;
+   int ret = 0;
+
+   spin_lock_irqsave(>lock, flags);
+   list = >free_area[order].free_list[mt];
+   list_for_each_entry(page, list, lru) {
+   pfn = page_to_pfn(page);
+   ret = report_pfn_range(opaque, pfn, 1 << order);
+   if (ret)
+   break;
+   }
+   spin_unlock_irqrestore(>lock, flags);
+
+   return ret;
+}
+
+/**
+ * walk_free_mem_block - Walk through the free page blocks in the system
+ * @opaque: the context passed from the caller
+ * @min_order: the minimum order of free lists to check
+ * @report_pfn_range: the callback to report the pfn range of the free pages
+ *
+ * If the callback returns a non-zero value, stop iterating the list of free
+ * page blocks. Otherwise, continue to report.
+ *
+ * Please note that there are no locking guarantees for the callback and
+ * that the reported pfn range might be freed or disappear after the
+ * callback returns so the caller has to be very careful how it is used.
+ *
+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.
+ *
+ * There is no guarantee that each free range will be reported only once
+ * during one walk_free_mem_block invocation.
+ *
+ * pfn_to_page on the given range is strongly discouraged and if there is
+ * an absolute need for that make sure to contact MM people to discuss
+ * potential problems.
+ *
+ * The function itself might sleep so it cannot be called from atomic
+ * contexts.
+ *
+ * In general low orders tend to be very volatile and so it makes more
+ * sense to query larger ones first for various optimizations which like
+ * ballooning etc... This will reduce the overhead as well.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+int walk_free_mem_block(void *opaque,
+   int min_order,
+   int (*report_pfn_range)(void *opaque,
+   unsigned 
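
The archive truncates the function body here, but the interface is fully
given by the declaration above. A minimal sketch of a caller could look
like the following; the callback and the context structure are
hypothetical, not part of the patch:

struct hint_ctx {
	unsigned long seen;	/* pages reported so far */
	unsigned long budget;	/* stop the walk after this many pages */
};

static int note_free_range(void *opaque, unsigned long pfn, unsigned long num)
{
	struct hint_ctx *ctx = opaque;

	/* Must not sleep or allocate here; just record and return. */
	ctx->seen += num;

	/* A non-zero return value stops the walk. */
	return ctx->seen >= ctx->budget;
}

static void count_large_free_blocks(void)
{
	struct hint_ctx ctx = { .seen = 0, .budget = 1UL << 18 };

	/* Query only the largest blocks; low orders are too volatile. */
	walk_free_mem_block(&ctx, MAX_ORDER - 1, note_free_range);
}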

[PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules

2018-03-25 Thread Wei Wang
In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.

Signed-off-by: Wei Wang 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
---
 mm/page_poison.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_poison.c b/mm/page_poison.c
index e83fd44..762b472 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf)
 }
 early_param("page_poison", early_page_poison_param);
 
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
 bool page_poisoning_enabled(void)
 {
/*
@@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
debug_pagealloc_enabled()));
 }
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
 
 static void poison_page(struct page *page)
 {
-- 
2.7.4
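
Using the newly exported symbol from a module is a single call. The module
below is a hypothetical sketch, not part of the series:

#include <linux/mm.h>
#include <linux/module.h>

static int __init poison_check_init(void)
{
	/* Reports whether freed pages are filled with the poison pattern. */
	if (page_poisoning_enabled())
		pr_info("page poisoning is in use\n");
	else
		pr_info("page poisoning is not in use\n");

	return 0;
}
module_init(poison_check_init);

/* EXPORT_SYMBOL_GPL: only GPL-compatible modules may use the symbol. */
MODULE_LICENSE("GPL");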



[PATCH v29 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-25 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates
support for reporting hints of guest free pages to the host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after adding an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c | 257 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..18d24a4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+   /* Buffer to store the stop sign */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(&vb->stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  &vb->update_balloon_size_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, &vb->cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(&vb->stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  &vb->report_free_page_work);
+   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct *work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+   
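
The archive truncates the message here. The start/stop framing on the free
page vq can still be sketched as below; the helper and the field names
mirror this patch, but the code is a simplified illustration under those
assumptions, not the exact driver implementation:

static int send_cmd_id(struct virtio_balloon *vb, __virtio32 *cmd_id)
{
	struct scatterlist sg;

	sg_init_one(&sg, cmd_id, sizeof(*cmd_id));
	/* Host polls the vq after sending the start cmd id, so no kick. */
	return virtqueue_add_outbuf(vb->free_page_vq, &sg, 1, vb, GFP_KERNEL);
}

static void report_free_page_func(struct work_struct *work)
{
	struct virtio_balloon *vb = container_of(work, struct virtio_balloon,
						 report_free_page_work);

	/* The first element added to the vq is the cmd id given by host. */
	vb->cmd_id_use = cpu_to_virtio32(vb->vdev, vb->cmd_id_received);
	if (send_cmd_id(vb, &vb->cmd_id_use))
		return;

	/* ... the free page hints would be added to the vq here ... */

	/*
	 * A separate stop buffer (pre-set to
	 * VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) tells host the report is
	 * done; see the v29 changelog note on using distinct buffers.
	 */
	send_cmd_id(vb, &vb->stop_cmd_id);
}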

[PATCH v29 0/4] Virtio-balloon: support free page reporting

2018-03-25 Thread Wei Wang
This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.

This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It does not matter if the memory
pages are used after they are given to the hypervisor as free page hints,
because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.

* Tests
- Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Guest: 8G RAM, 4 vCPU
Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
  - Idle Guest Live Migration Time (results are averaged over 10 runs)
    - Optimization vs. Legacy = 261ms vs. 1769ms --> ~86% reduction
  - Guest with Linux Compilation Workload (make bzImage -j4):
    - Live Migration Time (average)
      Optimization vs. Legacy = 1260ms vs. 2634ms --> ~51% reduction
    - Linux Compilation Time
      Optimization vs. Legacy = 4min58s vs. 5min3s
      --> no obvious difference

ChangeLog:
v28->v29:
- mm/page_poison: only expose page_poisoning_enabled(), rather than the
  more extensive changes done in v28, as we are not 100% confident about
  them for now.
- virtio-balloon: use a separate buffer for the stop cmd, instead of
  having the start and stop cmd use the same buffer. This avoids the
  corner case that the start cmd is overridden by the stop cmd when
  the host has a delay in reading the start cmd.
v27->v28:
- mm/page_poison: Move PAGE_POISON to page_poison.c and add a function
  to expose page poison val to kernel modules.
v26->v27:
- add a new patch to expose page_poisoning_enabled to kernel modules
- virtio-balloon: set poison_val to 0x, instead of 0xaa
v25->v26: virtio-balloon changes only
- remove kicking free page vq since the host now polls the vq after
  initiating the reporting
- report_free_page_func: detach all the used buffers after sending
  the stop cmd id. This avoids leaving the detaching burden (i.e.
  overhead) to the next cmd id. Detaching here isn't considered
  overhead since the stop cmd id has been sent, and host has already
  moved forward.
v24->v25:
- mm: change walk_free_mem_block to return 0 (instead of true) on
  completing the report, and return a non-zero value from the
  callback, which stops the reporting.
- virtio-balloon:
- use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
- avoid __virtio_clear_bit when bailing out;
- a new method to avoid reporting the same cmd id to host twice
- destroy_workqueue can cancel free page work when the feature is
  negotiated;
- fail probe when the free page vq size is less than 2.
v23->v24:
- change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
  VIRTIO_BALLOON_F_FREE_PAGE_HINT
- kick when vq->num_free < half full, instead of "= half full"
- replace BUG_ON with bailing out
- check vb->balloon_wq in probe(), if null, bail out
- add a new feature bit for page poisoning
- solve the corner case of one cmd id being sent to host twice
v22->v23:
- change to kick the device when the vq is half-way full;
- open-code batch_free_page_sg into add_one_sg;
- change cmd_id from "uint32_t" to "__virtio32";
- reserve one entry in the vq for the driver to send cmd_id, instead
  of busywaiting for an available entry;
- add "stop_update" check before queue_work for prudence purpose for
  now, will have a separate patch to discuss this flag check later;
- init_vqs: change to put some variables on stack to have simpler
  implementation;
- add destroy_workqueue(vb->balloon_wq);

v21->v22:
- add_one_sg: some code and comment re-arrangement
- send_cmd_id: handle a cornercase

For the previous ChangeLog, please see
https://lwn.net/Articles/743660/

Wei Wang (4):
  mm: support reporting free page blocks
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c | 267 +++-
 include/linux/mm.h  |   6 +
 

Re: [PATCH RFC] xfs, memcg: Call xfs_fs_nr_cached_objects() only in case of global reclaim

2018-03-25 Thread Dave Chinner
On Fri, Mar 23, 2018 at 03:39:53PM +0300, Kirill Tkhai wrote:
> On 23.03.2018 02:46, Dave Chinner wrote:
> > On Thu, Mar 22, 2018 at 07:52:37PM +0300, Kirill Tkhai wrote:
> >> Here is the problem I'm solving: https://lkml.org/lkml/2018/3/21/365.
> > 
> > Oh, finally you tell me what the problem is that you're trying to
> > solve. I *asked this several times* and got no response. Thank you
> > for wasting so much of my time.
> > 
> >> Current shrinker is not scalable. When there are many memcgs and mounts,
> >> iterating shrink_slab() in the case of global reclaim can
> >> take much time. Timings of shrink_slab() are given at the link. A node
> >> with 200 containers may waste 4 seconds on global reclaim just to
> >> iterate over all shrinkers for all cgroups, call shrinker::count_objects()
> >> and receive 0 objects.
> > 
> > So, your problem is the way the memcgs were tacked onto the side
> > of the list_lru infrastructure and are iterated, which has basically
> > nothing to do with the way the low level XFS inode shrinker behaves.
> > 
> > /me looks at the patches
> > 
> > /me shudders at the thought of external "cache has freeable items"
> > control for the shrinking of vfs caches.
> > 
> > Biggest problem I see with this is the scope for coherency bugs in
> > the "memcg shrinker has freeable items" tracking. If that happens,
> > there's no way of getting that memcg to run its shrinker ever
> > again. That seems very, very fragile and likely to be an endless
> > source of OOM bugs. The whole point of the shrinker polling
> > infrastructure is that it is not susceptible to this sort of bug.
> > 
> > Really, the problem is that there's no separate list of memcg aware
> > shrinkers, so every registered shrinker has to be iterated just to
> > find the one shrinker that is memcg aware.
> 
> I don't think the logic is difficult. There are generic rules,
> and the only task is to teach them to memcg-aware shrinkers. Currently,
> these are only the super block and workingset shrinkers, and both of
> them are based on the generic list_lru infrastructure. The shrinker-related
> bit is also cleared in generic code (shrink_slab()) only, and
> the algorithm doesn't allow clearing it without a double check.
> The only principal modification I'm thinking about is that we should
> clear the bit only when the shrinker is called with maximum
> parameters: priority and GFP.

Lots of "simple logic" combined together makes for a complex mass of
difficult to understand and debug code.

And, really, you're not suffering from a memcg problem - you're
suffering from a "there are thousands of shrinkers" scalability
issue because superblocks have per-superblock shrinker contexts and
you have thousands of mounted filesystems.

> There are a lot of performance-improving synchronizations in the kernel,
> and had they been refused, the kernel would have remained in the age
> of the big kernel lock.

That's false equivalence and hyperbole. The shrinkers are not
limiting what every Linux user can do with their hardware. It's not
a fundamental architectural limitation.  These sorts of arguments
are not convincing - this is the second time I've told you this, so
please stick to technical arguments and drop the dramatic
"anti-progress" conspiracy theory bullshit.

> > So why not just do the simple thing which is create a separate
> > "memcg-aware" shrinker list (i.e. create shrinker_list_memcg
> > alongside shrinker_list) and only iterate the shrinker_list_memcg
> > when a memcg is passed to shrink_slab()?
> > 
> > That means we'll only run 2 shrinkers per memcg at most (superblock
> > and working set) per memcg reclaim call. That's a simple 10-20 line
> > change, not a whole mess of new code and issues...
> 
> It was the first optimization that came to my mind, but there is almost
> no performance profit, since memcg-aware shrinkers will still be called
> for every memcg, and they are the biggest part of shrinkers in the system.

Sure, but a polling algorithm is not a fundamental performance
limitation.

The problem here is that the memcg infrastructure has caused an
exponential explosion in shrinker scanning.

> >> Can't we call shrink of shared objects only for top memcg? Something like
> >> this:
> > 
> > What's a "shared object", and how is it any different to a normal
> > slab cache object?
> 
> Sorry, it's an erratum. I'm speaking about cached objects. I mean something
> like the below. The patch makes cached objects be cleared outside the memcg
> iteration cycle (it makes no sense to call them for every memcg since the
> cached object logic just ignores memcg).

The cached flag seems like a hack to me. It does nothing to address
the number of shrinker callout calls (it actually increases them!),
just tries to hack around something you want a specific shrinker to
avoid doing.

I've asked *repeatedly* for a description of the actual workload
problems the XFS shrinker behaviour is causing you. In the absence
of any workload description, I'm simply going to 
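
The separate memcg-aware shrinker list suggested earlier in this thread
could look roughly like the sketch below; the names are illustrative, not
taken from any tree:

static LIST_HEAD(shrinker_list);	/* shrinkers without memcg awareness */
static LIST_HEAD(shrinker_list_memcg);	/* SHRINKER_MEMCG_AWARE shrinkers */
static DECLARE_RWSEM(shrinker_rwsem);

static void shrinker_list_add(struct shrinker *shrinker)
{
	struct list_head *list = &shrinker_list;

	/* Memcg-aware shrinkers go on their own list at registration. */
	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
		list = &shrinker_list_memcg;

	down_write(&shrinker_rwsem);
	list_add_tail(&shrinker->list, list);
	up_write(&shrinker_rwsem);
}

shrink_slab() would then walk only shrinker_list_memcg when it is passed a
memcg, and both lists for global reclaim, so memcg reclaim visits just the
two memcg-aware shrinkers.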


Re: [PATCH v1 08/16] rtc: mediatek: remove unnecessary irq_dispose_mapping

2018-03-25 Thread Sean Wang
On Fri, 2018-03-23 at 11:38 +0100, Alexandre Belloni wrote:
> On 23/03/2018 at 17:15:05 +0800, sean.w...@mediatek.com wrote:
> > From: Sean Wang 
> > 
> > It's unnecessary to do irq_dispose_mapping as a reverse operation for
> > platform_get_irq.
> > 
> > Usually, irq_dispose_mapping should be called in the error path or on
> > module removal to release the resources that irq_of_parse_and_map requested.
> > 
> > Signed-off-by: Sean Wang 
> > ---
> >  drivers/rtc/rtc-mt6397.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/rtc/rtc-mt6397.c b/drivers/rtc/rtc-mt6397.c
> > index b62eaa8..cefb83b 100644
> > --- a/drivers/rtc/rtc-mt6397.c
> > +++ b/drivers/rtc/rtc-mt6397.c
> > @@ -17,7 +17,6 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -336,7 +335,7 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> > if (ret) {
> > dev_err(&pdev->dev, "Failed to request alarm IRQ: %d: %d\n",
> > rtc->irq, ret);
> > -   goto out_dispose_irq;
> > +   return ret;
> > }
> >  
> > device_init_wakeup(&pdev->dev, 1);
> > @@ -353,8 +352,7 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >  
> >  out_free_irq:
> > free_irq(rtc->irq, rtc->rtc_dev);
> > -out_dispose_irq:
> > -   irq_dispose_mapping(rtc->irq);
> > +
> 
> Don't you still have a irq_create_mapping?
> 

Sorry that I didn't mention at the beginning that the series depends
on another patch [1]. With that patch, the irq_create_mapping job has
been moved from rtc to mfd, so here it is better to clean up
irq_dispose_mapping in all paths.

[1] https://patchwork.kernel.org/patch/9954643/

> > return ret;
> >  }
> >  
> > @@ -364,7 +362,6 @@ static int mtk_rtc_remove(struct platform_device *pdev)
> >  
> > rtc_device_unregister(rtc->rtc_dev);
> > free_irq(rtc->irq, rtc->rtc_dev);
> > -   irq_dispose_mapping(rtc->irq);
> >  
> > return 0;
> >  }
> > -- 
> > 2.7.4
> > 
> 
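
The pairing rule under discussion can be summarized in a short sketch; the
probe body and the handler are hypothetical:

static irqreturn_t example_irq(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int example_probe(struct platform_device *pdev)
{
	int irq = platform_get_irq(pdev, 0);

	if (irq < 0)
		return irq;

	/*
	 * platform_get_irq() returns a virq whose mapping is owned by the
	 * platform core, so the error path and module removal need no
	 * irq_dispose_mapping(). Only a driver that called
	 * irq_of_parse_and_map() itself must dispose of the mapping.
	 */
	return devm_request_irq(&pdev->dev, irq, example_irq, 0,
				dev_name(&pdev->dev), NULL);
}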




[no subject]

2018-03-25 Thread Dennis Aberilla
hi Linux https://goo.gl/BDc7Jv

Dennis Aberilla


linux-next: manual merge of the i2c tree with the asm-generic tree

2018-03-25 Thread Stephen Rothwell
Hi Wolfram,

Today's linux-next merge of the i2c tree got a conflict in:

  arch/blackfin/mach-bf561/boards/acvilon.c

between commit:

  120090af2745 ("arch: remove blackfin port")

from the asm-generic tree and commit:

  eb49778c8c6c ("i2c: pca-platform: drop gpio from platform data")

from the i2c tree.

I fixed it up (I removed the file) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non-trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v2] scripts/kconfig: cleanup symbol handling code

2018-03-25 Thread Joey Pabalinas
On Mon, Mar 26, 2018 at 01:52:26AM +0900, Masahiro Yamada wrote:
> I want to see Kconfig improvements in a bigger picture.
> 
> The changes below are noise.

That's understandable; I do agree that nothing here is _fundamentally_
broken at all, so no worries.

-- 
Cheers,
Joey Pabalinas



