Re: [QUESTION] Mainline support for B43_PHY_AC wifi cards
On 24/03/18 00:01, Rafał Miłecki wrote:
> On 23 March 2018 at 15:09, Juri Lelli wrote:
> > On 23/03/18 14:43, Rafał Miłecki wrote:
> >> Hi,
> >>
> >> On 23 March 2018 at 10:47, Juri Lelli wrote:
> >> > I've got a Dell XPS 13 9343/0TM99H (BIOS A15 01/23/2018) mounting a
> >> > BCM4352 802.11ac (rev 03) wireless card and so far I've been using it on
> >> > Fedora with the broadcom-wl package (which I believe installs Broadcom's STA
> >> > driver?). It works well apart from occasional hiccups after suspend.
> >> >
> >> > I'd like to get rid of that dependency (you can understand that it's
> >> > particularly annoying when testing mainline kernels), but I found out
> >> > that support for my card is BROKEN in mainline [1]. Just to see what
> >> > happens, I forcibly enabled it, witnessing that it indeed crashes like
> >> > below, as Kconfig warns. :)
> >> >
> >> > bcma: bus0: Found chip with id 0x4352, rev 0x03 and package 0x00
> >> > bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x2B, class 0x0)
> >> > bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x2A, class 0x0)
> >> > bcma: bus0: Core 2 found: ARM CR4 (manuf 0x4BF, id 0x83E, rev 0x02, class 0x0)
> >> > bcma: bus0: Core 3 found: PCIe Gen2 (manuf 0x4BF, id 0x83C, rev 0x01, class 0x0)
> >> > bcma: bus0: Core 4 found: USB 2.0 Device (manuf 0x4BF, id 0x81A, rev 0x11, class 0x0)
> >> > bcma: Unsupported SPROM revision: 11
> >> > bcma: bus0: Invalid SPROM read from the PCIe card, trying to use fallback SPROM
> >> > bcma: bus0: Using fallback SPROM failed (err -2)
> >> > bcma: bus0: No SPROM available
> >> > bcma: bus0: Bus registered
> >> > b43-phy0: Broadcom 4352 WLAN found (core revision 42)
> >> > b43-phy0: Found PHY: Analog 12, Type 11 (AC), Revision 1
> >> > b43-phy0: Found Radio: Manuf 0x17F, ID 0x2069, Revision 4, Version 0
> >> > BUG: unable to handle kernel NULL pointer dereference at
> >>
> >> This isn't really useful without a full backtrace.
> >
> > Sure. I cut it here because I didn't expect people to debug what is
> > already known to be broken (but still it seemed to carry useful
> > information about the hw). :)
>
> Please paste the remaining part if you still got it.

Sure, please find it below. Thanks!

- Juri

--->8---

[   60.732180] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   60.733048] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   60.733303] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   60.733305] cfg80211: failed to load regulatory.db
[   61.047277] bcma: bus0: Found chip with id 0x4352, rev 0x03 and package 0x00
[   61.047302] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x2B, class 0x0)
[   61.047316] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x2A, class 0x0)
[   61.047340] bcma: bus0: Core 2 found: ARM CR4 (manuf 0x4BF, id 0x83E, rev 0x02, class 0x0)
[   61.047366] bcma: bus0: Core 3 found: PCIe Gen2 (manuf 0x4BF, id 0x83C, rev 0x01, class 0x0)
[   61.047380] bcma: bus0: Core 4 found: USB 2.0 Device (manuf 0x4BF, id 0x81A, rev 0x11, class 0x0)
[   61.107321] bcma: Unsupported SPROM revision: 11
[   61.107325] bcma: bus0: Invalid SPROM read from the PCIe card, trying to use fallback SPROM
[   61.107326] bcma: bus0: Using fallback SPROM failed (err -2)
[   61.107327] bcma: bus0: No SPROM available
[   61.109830] bcma: bus0: Bus registered
[   61.242068] b43-phy0: Broadcom 4352 WLAN found (core revision 42)
[   61.242481] b43-phy0: Found PHY: Analog 12, Type 11 (AC), Revision 1
[   61.242487] b43-phy0: Found Radio: Manuf 0x17F, ID 0x2069, Revision 4, Version 0
[   61.242909] BUG: unable to handle kernel NULL pointer dereference at
[   61.242916] IP: (null)
[   61.242919] PGD 0 P4D 0
[   61.242924] Oops: 0010 [#1] PREEMPT SMP PTI
[   61.242926] Modules linked in: b43(+) bcma mac80211 cfg80211 ssb mmc_core rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel btusb snd_hda_codec uvcvideo btrtl btbcm btintel snd_hda_core bluetooth snd_hwdep videobuf2_vmalloc snd_seq
[   61.242989]  videobuf2_memops videobuf2_v4l2 iTCO_wdt snd_seq_device snd_pcm videobuf2_common iTCO_vendor_support dell_laptop dell_wmi irqbypass videodev wmi_bmof sparse_keymap dell_smbios intel_cstate
Re: [PATCH v2] PCI / PM: Always check PME wakeup capability for runtime wakeup support
Hi Bjorn, Rafael,

On Mar 19, 2018, at 10:09 PM, Kai-Heng Feng wrote:

USB controller ASM1042 stops working after commit de3ef1eb1cd0 ("PM / core: Drop run_wake flag from struct dev_pm_info"). The device in question is not power managed by platform firmware; furthermore, it only supports PME# from D3cold:

Capabilities: [78] Power Management version 3
	Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
	Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

Before commit de3ef1eb1cd0, the device never gets runtime suspended. After that commit, the device gets runtime suspended, so it does not respond to any PME#.

usb_hcd_pci_probe() unconditionally calls device_wakeup_enable(), hence device_can_wakeup() in pci_dev_run_wake() always returns true. So pci_dev_run_wake() needs to check PME wakeup capability as its first condition.

Fixes: de3ef1eb1cd0 ("PM / core: Drop run_wake flag from struct dev_pm_info")
Cc: sta...@vger.kernel.org # 4.13+
Signed-off-by: Kai-Heng Feng

Is there any improvement I can address? Or do you have any concern about this patch?

Kai-Heng

---
v2: Explicitly check dev->pme_support.

 drivers/pci/pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..52821a21fc07 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2125,16 +2125,16 @@ bool pci_dev_run_wake(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->bus;
 
-	if (device_can_wakeup(&dev->dev))
-		return true;
-
 	if (!dev->pme_support)
 		return false;
 
 	/* PME-capable in principle, but not from the target power state */
-	if (!pci_pme_capable(dev, pci_target_state(dev, false)))
+	if (!pci_pme_capable(dev, pci_target_state(dev, true)))
 		return false;
 
+	if (device_can_wakeup(&dev->dev))
+		return true;
+
 	while (bus->parent) {
 		struct pci_dev *bridge = bus->self;
--
2.15.1
Re: [PATCH v3] x86: i8237: Register based on FADT legacy boot flag
On Sun, Mar 25, 2018 at 01:50:40PM +0200, Thomas Gleixner wrote:
> On Thu, 22 Mar 2018, Anshuman Gupta wrote:
> > From: Rajneesh Bhardwaj
> >
> > From Skylake onwards, the platform controller hub (Sunrisepoint PCH) does
> > not support legacy DMA operations to IO ports 81h-83h, 87h, 89h-8Bh, 8Fh.
> > Currently this driver registers as syscore ops and its resume function is
> > called on every resume from S3. On Skylake and Kabylake, this causes a
> > resume delay of around 100ms due to port IO operations, which is a problem.
> >
> > This change allows the driver to load only when the platform BIOS
> > explicitly supports such devices or has a cut-off date earlier than 2017.
>
> Please explain WHY 2017 is the cut-off date. I still have no clue how that
> is decided aside of being a random number.

Hello Thomas,

We tested on a few Intel platforms such as Skylake, Kabylake, Geminilake, etc. and realized that the BIOS always sets the FADT flag to true even though the device may not be physically present on the SoC. This is a BIOS bug. To keep the impact minimal, we decided to add a cut-off date, since we are not aware of any BIOS (other than the coreboot link provided in the commit msg) that properly sets this field. SoCs released after Skylake will not have this DMA device on the PCH. Because of these two reasons, we decided on 2017 as the cut-off date. Please let us know if you feel strongly about it and we can change or remove it.

Ideally, we didn't want to add this BIOS check at all and only wanted to use the inb() approach, but unfortunately that too was broken for port 0x81.

@Rafael / Alan / Andy - Please add more or correct me in case anything was missed or not communicated fully.

> Thanks,
> tglx

--
Best Regards,
Rajneesh
Re: [PATCH] ext4 : fix comments in ext4_swap_extents
On Sat, Mar 24, 2018 at 03:28:24PM +0800, zhenwei.pi wrote:
> "mark_unwritten" in the comment and "unwritten" in the variable
> argument list do not match.
>
> Signed-off-by: zhenwei.pi

Applied, thanks.

- Ted
linux-next: manual merge of the kvm tree with the kvm-fixes tree
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  arch/x86/kvm/vmx.c

between commit:

  9d1887ef3252 ("KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0")

from the kvm-fixes tree and commit:

  2bb8cafea80b ("KVM: vVMX: signal failure for nested VMEntry if emulation_required")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc arch/x86/kvm/vmx.c
index 92496b9b5f2b,b4d8da6c62c8..
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@@ -10952,6 -11010,19 +11021,14 @@@ static int prepare_vmcs02(struct kvm_vc
  	/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
  	vmx_set_efer(vcpu, vcpu->arch.efer);
 
- 	if (vmx->nested.dirty_vmcs12) {
- 		prepare_vmcs02_full(vcpu, vmcs12, from_vmentry);
- 		vmx->nested.dirty_vmcs12 = false;
- 	}
- 
+ 	/*
+ 	 * Guest state is invalid and unrestricted guest is disabled,
+ 	 * which means L1 attempted VMEntry to L2 with invalid state.
+ 	 * Fail the VMEntry.
+ 	 */
+ 	if (vmx->emulation_required)
+ 		return 1;
+ 
  	/* Shadow page tables on either EPT or shadow page tables. */
  	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
  				entry_failure_code))
[PATCH] ALSA: aloop: Mark paused device as inactive
Show a paused ALSA aloop device as inactive, i.e. the control "PCM Slave Active" set as false. A notification is sent upon state change.

This makes it possible for a client capturing from an aloop device to know whether data is expected. Without it the client expects data even if playback is paused.

Signed-off-by: Robert Rosengren
---
 sound/drivers/aloop.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sound/drivers/aloop.c b/sound/drivers/aloop.c
index 0333143a1fa7..5404ab11132d 100644
--- a/sound/drivers/aloop.c
+++ b/sound/drivers/aloop.c
@@ -291,6 +291,8 @@ static int loopback_trigger(struct snd_pcm_substream *substream, int cmd)
 		cable->pause |= stream;
 		loopback_timer_stop(dpcm);
 		spin_unlock(&cable->lock);
+		if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
+			loopback_active_notify(dpcm);
 		break;
 	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
 	case SNDRV_PCM_TRIGGER_RESUME:
@@ -299,6 +301,8 @@ static int loopback_trigger(struct snd_pcm_substream *substream, int cmd)
 		cable->pause &= ~stream;
 		loopback_timer_start(dpcm);
 		spin_unlock(&cable->lock);
+		if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
+			loopback_active_notify(dpcm);
 		break;
 	default:
 		return -EINVAL;
@@ -879,9 +883,11 @@ static int loopback_active_get(struct snd_kcontrol *kcontrol,
 		[kcontrol->id.subdevice][kcontrol->id.device ^ 1];
 	unsigned int val = 0;
 
-	if (cable != NULL)
-		val = (cable->running & (1 << SNDRV_PCM_STREAM_PLAYBACK)) ?
-							1 : 0;
+	if (cable != NULL) {
+		unsigned int running = cable->running ^ cable->pause;
+
+		val = (running & (1 << SNDRV_PCM_STREAM_PLAYBACK)) ? 1 : 0;
+	}
 	ucontrol->value.integer.value[0] = val;
 	return 0;
 }
--
2.11.0
Re: [PATCH v2 1/4] clk: qcom: Clear hardware clock control bit of RCG
On 3/20/2018 4:25 AM, Stephen Boyd wrote:
> Quoting Amit Nischal (2018-03-07 23:18:12)
> > For upcoming targets like sdm845, the POR value of the hardware clock
> > control bit is set for most root clocks, and it needs to be cleared for
> > software to be able to control them. For older targets like MSM8996 this
> > bit is reserved with a POR value of 0, so this patch works for the older
> > targets too. Update the configuration mask accordingly to clear the
> > hardware clock control bit.
> >
> > Signed-off-by: Amit Nischal
> > ---
> >  drivers/clk/qcom/clk-rcg2.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
> > index bbeaf9c..e63db10 100644
> > --- a/drivers/clk/qcom/clk-rcg2.c
> > +++ b/drivers/clk/qcom/clk-rcg2.c
> > @@ -1,5 +1,5 @@
> >  /*
> > - * Copyright (c) 2013, The Linux Foundation. All rights reserved.
> > + * Copyright (c) 2013, 2018, The Linux Foundation. All rights reserved.
>
> It would be nice if lawyers over there could avoid forcing copyright date
> updates when less than half the file changes.

Thanks for the review. I will address the above in the next patch series.

> >   *
> >   * This software is licensed under the terms of the GNU General Public
> >   * License version 2, as published by the Free Software Foundation, and
> > @@ -42,6 +42,7 @@
> >  #define CFG_MODE_SHIFT		12
> >  #define CFG_MODE_MASK		(0x3 << CFG_MODE_SHIFT)
> >  #define CFG_MODE_DUAL_EDGE	(0x2 << CFG_MODE_SHIFT)
> > +#define CFG_HW_CLK_CTRL_MASK	BIT(20)
> >
> >  #define M_REG			0x8
> >  #define N_REG			0xc
> > @@ -276,7 +277,7 @@ static int clk_rcg2_configure(struct clk_rcg2 *rcg, const struct freq_tbl *f)
> >  	}
> >
> >  	mask = BIT(rcg->hid_width) - 1;
> > -	mask |= CFG_SRC_SEL_MASK | CFG_MODE_MASK;
> > +	mask |= CFG_SRC_SEL_MASK | CFG_MODE_MASK | CFG_HW_CLK_CTRL_MASK;
> >  	cfg = f->pre_div << CFG_SRC_DIV_SHIFT;
> >  	cfg |= rcg->parent_map[index].cfg << CFG_SRC_SEL_SHIFT;
> >  	if (rcg->mnd_width && f->n && (f->m != f->n))
>
> Is there going to be a future patch to update the RCGs to indicate they
> support hardware control or not?
As of now, there will not be any patch to update the RCGs to support HW control.
Re: [Bug 199003] console stalled, cause Hard LOCKUP.
Cc-ing the kernel list and printk people.

Wen Yang, any chance we can switch to email? Bugzilla is not very handy.

On (03/26/18 02:40), bugzilla-dae...@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=199003
>
> --- Comment #11 from Wen Yang (wen.yan...@zte.com.cn) ---
> Hello Steven,
>
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
>
> It depends on this kernel parameter (printk.synchronous), but this
> parameter is read-only.
> So we must change the grub files, and we also need to restart the server
> for changes to take effect.
> If we could configure it dynamically, it would be more useful.

So you are testing printk_kthread now, right? May I ask why? Did Steven's patch help?

-ss
Re: [Bug 199003] console stalled, cause Hard LOCKUP.
On (03/23/18 14:16), Petr Mladek wrote:
[..]
> If I get it correctly, the reporter of this bug has not tried
> Steven's patches yet.

It's not immediately clear. It's not even completely clear whether we are looking at the "X CPUs printk, 1 CPU prints it all" scenario, and it's not clear whether hand off will be helpful here. I'll try to explain. What we see is:

CPU0 locked up on blkg->q->queue_lock:

 [] cfqg_print_rwstat_recursive+0x36/0x40
 [] cgroup_seqfile_show+0x73/0x80
 [] ? seq_buf_alloc+0x17/0x40
 [] seq_read+0x10a/0x3b0
 [] vfs_read+0x9e/0x170
 [] SyS_read+0x7f/0xe0
 [] system_call_fastpath+0x16/0x1b

blkg->q->queue_lock was held by CPU7, which was spinning in wait_for_xmitr():

 #5 [881ffb0b7548] __const_udelay at 81326678
 #6 [881ffb0b7558] wait_for_xmitr at 814056e0
 #7 [881ffb0b7580] serial8250_console_putchar at 814058ac
 #8 [881ffb0b75a0] uart_console_write at 8140035a
 #9 [881ffb0b75d0] serial8250_console_write at 814057fe
#10 [881ffb0b7618] call_console_drivers.constprop.17 at 81087011
#11 [881ffb0b7640] console_unlock at 810889e9
#12 [881ffb0b7680] vprintk_emit at 81088df4
#13 [881ffb0b76f0] dev_vprintk_emit at 81428e72
#14 [881ffb0b77a8] dev_printk_emit at 81428eee
#15 [881ffb0b7808] __dev_printk at 8142937e
#16 [881ffb0b7818] dev_printk at 8142942d
#17 [881ffb0b7888] sdev_prefix_printk at 81463771
#18 [881ffb0b7918] scsi_prep_state_check at 814598e4
#19 [881ffb0b7928] scsi_prep_fn at 8145992d
#20 [881ffb0b7960] blk_peek_request at 812f0826
#21 [881ffb0b7988] scsi_request_fn at 8145b588
#22 [881ffb0b79f0] __blk_run_queue at 812ebd63
#23 [881ffb0b7a08] blk_queue_bio at 812f1013    <<< acquired q->queue_lock
#24 [881ffb0b7a50] generic_make_request at 812ef209
#25 [881ffb0b7a98] submit_bio at 812ef351
#26 [881ffb0b7af0] xfs_submit_ioend_bio at a0146a63 [xfs]
#27 [881ffb0b7b00] xfs_submit_ioend at a0146b31 [xfs]
#28 [881ffb0b7b40] xfs_vm_writepages at a0146e18 [xfs]
#29 [881ffb0b7bb8] do_writepages at 8118da6e
#30 [881ffb0b7bc8] __writeback_single_inode at 812293a0
#31 [881ffb0b7c08] writeback_sb_inodes at 8122a08e
#32 [881ffb0b7cb0] __writeback_inodes_wb at 8122a2ef
#33 [881ffb0b7cf8] wb_writeback at 8122ab33
#34 [881ffb0b7d70] bdi_writeback_workfn at 8122cb2b
#35 [881ffb0b7e20] process_one_work at 810a851b
#36 [881ffb0b7e68] worker_thread at 810a9356
#37 [881ffb0b7ec8] kthread at 810b0b6f
#38 [881ffb0b7f50] ret_from_fork at 81697a18

Given how slow serial8250_console_putchar()->wait_for_xmitr() can be - 10ms of delay for every char - it's possible that we had no concurrent printk()-s from other CPUs. So maybe we had just one printing CPU, and several CPUs spinning on a spin_lock which was owned by the printing CPU.

So that's why printk_deferred() helped here. It simply detached 8250 and made the spin_lock critical section as fast as printk->log_store().

But here comes the tricky part. Suppose that we:

a) have at least two CPUs that call printk concurrently
b) have hand off enabled

Now, what will happen if we have something like this:

CPU0                            CPU1            CPU2
                                printk          printk
spin_lock(queue_lock)           serial8250
cfqg_print_rwstat_recursive()                   serial8250
  spin_lock(queue_lock)         printk          serial8250
                                serial8250      printk
                                                serial8250

I suspect that hand off may not be very helpful. CPU1 and CPU2 will wait for each other to finish serial8250 and to hand off printing to each other. So CPU1 will do 2 serial8250 invocations to printk its messages, and in between it will spin waiting for CPU2 to do its printk->serial8250 and to hand off printing to CPU1. The problem is that CPU1 will be under spin_lock() all that time, so CPU0 is going to suffer just like before.

Opinions?

-ss
Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
On Sat, Mar 24, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
>On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
>> On Fri, Mar 23, 2018 at 06:23:02PM +, Luis R. Rodriguez wrote:
>> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
>> > > On Fri, Mar 23, 2018 at 05:08:13PM +, Luis R. Rodriguez wrote:
>> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
>> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +, Luis R. Rodriguez wrote:
>> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
>> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> > > > > > > index 61d1cb7..8012741 100644
>> > > > > > > --- a/fs/xfs/xfs_inode.c
>> > > > > > > +++ b/fs/xfs/xfs_inode.c
>> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
>> > > > > > >  }
>> > > > > > >
>> > > > > > >  /*
>> > > > > > > + * Free any local-format buffers sitting around before we reset to
>> > > > > > > + * extents format.
>> > > > > > > + */
>> > > > > > > +static inline void
>> > > > > > > +xfs_ifree_local_data(
>> > > > > > > +	struct xfs_inode	*ip,
>> > > > > > > +	int			whichfork)
>> > > > > > > +{
>> > > > > > > +	struct xfs_ifork	*ifp;
>> > > > > > > +
>> > > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
>> > > > > > > +		return;
>> > > > > >
>> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused
>> > > > > > with how commit 43518812d2 ("xfs: remove support for inlining
>> > > > > > data/extents into the inode fork") exacerbated the leak, isn't
>> > > > > > that commit about XFS_DINODE_FMT_EXTENTS?
>> > > > >
>> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
>> > > > > incore data was small enough to fit in if_inline_data.
>> > > >
>> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
>> > > >
>> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
>> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
>> > > > >
>> > > > > An empty directory is 6 bytes, which is what you get with a fresh
>> > > > > mkdir or after deleting everything in the directory. Prior to the
>> > > > > 43518812d2 patch we could get away with not even checking if we had
>> > > > > to free if_data when deleting a directory because it fit within
>> > > > > if_inline_data.
>> > > >
>> > > > Ah got it. So your fix *is* also applicable even prior to commit
>> > > > 43518812d2.
>> > >
>> > > You'd have to modify the patch so that it doesn't try to kmem_free
>> > > if_data if if_data == if_inline_data, but otherwise (in theory) I think
>> > > that the concept applies to pre-4.15 kernels.
>> > >
>> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong,
>> > > etc...)
>> >
>> > Well... so we need a resolution, and we'd better get testing this
>> > already, given that *I believe* the new auto-selection algorithm used to
>> > cherry-pick patches onto stable for linux-4.14.y (covered in a paper [0];
>> > when used, stable patches are prefixed with AUTOSEL, and a recent
>> > discussion covered this in November 2017 [1]) recommended merging your
>> > commit 98c4f78dcdd8 ("xfs: always free inline data before resetting
>> > inode fork during ifree") as stable commit 1eccdbd4836a41 on v4.14.17
>> > *without* merging commit 43518812d2 ("xfs: remove support for inlining
>> > data/extents into the inode fork").
>> >
>> > Sasha, Greg,
>> >
>> > Can you confirm if the algorithm was used in this case?
>>
>> No idea.
>>
>> I think xfs should just be added to the "blacklist" so that it is not
>> even looked at for these types of auto-selected patches. Much like the
>> i915 driver currently is handled (it too is ignored for these patches
>> due to objections from the maintainers of it.)
>
>Just out of curiosity, how does this autoselection mechanism work today?
>If it's smart enough to cherry pick patches, apply them to a kernel,
>build the kernel and run xfstests, and propose the patches if nothing
>weird happened, then I'd be interested in looking further. I've nothing
>against algorithmic selection per se, but I'd want to know more about
>the data sets and parameters that feed the algorithm.

It won't go beyond build testing.

>I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
>figure out what automated regression testing, if any, had been done; or
>whether the patch submission was asking if we wanted it put into 4.14
>or if it was a declaration that they were on their way in. Excuse me

There would be (at least) 3 different mails involved in this process:

1. You'd get a mail from me, proposing this patch for stable. We give at
least 1 week (but usually closer to 2) to comment on whether this patch
should or should not go in
Re: [PATCH v2 00/13] Major code reorganization to make all i2c transfers working
On 2018-03-24 17:52, Wolfram Sang wrote:
> On Mon, Mar 12, 2018 at 06:44:49PM +0530, Abhishek Sahu wrote:
> > * v2:
> >   1. Address review comments in v1
> >   2. Changed the license to SPDX
> >   3. Changed commit messages for some of the patches to have more detail
> >   4. Removed event-based completion and changed transfer completion
> >      detection logic in interrupt handler
> >   5. Removed dma_threshold and blk_mode_threshold from global structure
> >   6. Improved determine-mode logic for QUP v2 transfers
> >   7. Fixed function comments
> >   8. Fixed auto build test WARNING: 'idx' may be used uninitialized in
> >      this function
> >   9. Renamed tx/rx_buf to tx/rx_cnt
> >
> > * v1: The current driver is failing in the following test cases:
> >   1. Handling of failure cases is not working in the long run for BAM
> >      mode. It sometimes generates the error message "bam-dma-engine
> >      7884000.dma: Cannot free busy channel".
> >   2. The following I2C transfers are failing:
> >      a. Single transfer with multiple read messages
> >      b. Single transfer with multiple read/write messages with the
> >         maximum allowed length per message (65K) in BAM mode
> >      c. Single transfer with a write greater than 32 bytes in QUP v1
> >         and a write greater than 64 bytes in QUP v2 for non-DMA mode
> >   3. No handling is present for Block/FIFO interrupts. Any non-error
> >      interrupt is being treated as the transfer completion, and then
> >      polling is done for available/free bytes in the FIFO.
> >
> >   To fix all these issues, major code changes are required. This patch
> >   series fixes all the above issues and makes the driver interrupt
> >   based instead of polling based. After these changes, all the
> >   mentioned test cases are working properly. The code changes have
> >   been tested for QUP v1 (IPQ8064) and QUP v2 (IPQ8074) with a sample
> >   application written over i2c-dev.
> > Abhishek Sahu (13):
> >   i2c: qup: fix copyrights and update to SPDX identifier
> >   i2c: qup: fixed releasing dma without flush operation completion
> >   i2c: qup: minor code reorganization for use_dma
> >   i2c: qup: remove redundant variables for BAM SG count
> >   i2c: qup: schedule EOT and FLUSH tags at the end of transfer
> >   i2c: qup: fix the transfer length for BAM RX EOT FLUSH tags
> >   i2c: qup: proper error handling for i2c error in BAM mode
> >   i2c: qup: use the complete transfer length to choose DMA mode
> >   i2c: qup: change completion timeout according to transfer length
> >   i2c: qup: fix buffer overflow for multiple msg of maximum xfer len
> >   i2c: qup: send NACK for last read sub transfers
> >   i2c: qup: reorganization of driver code to remove polling for qup v1
> >   i2c: qup: reorganization of driver code to remove polling for qup v2
>
> Applied to for-next, thanks! Also thanks to the reviewers!

Thanks Wolfram for your help in getting this big patch series applied
to for-next. Thanks to Andy, Sricharan, Austin and the other reviewers
for reviewing/testing the patches.

Regards,
Abhishek
Re: [PATCH v6 05/21] tracing: probeevent: Cleanup print argument functions
On Fri, 23 Mar 2018 12:36:47 -0400
Steven Rostedt wrote:

> On Sat, 17 Mar 2018 21:41:12 +0900
> Masami Hiramatsu wrote:
>
> > Current print argument functions prints the argument
> > name too. It is not good for printing out multiple
> > values for one argument. This change it to just print
> > out the value.
>
> Hi Masami,
>
> This is a confusing change log, as I have no idea what this patch does.
> Can you add a "before" and "after" of what you mean. Some examples of
> what it currently does to show why it looks bad, and then an example of
> what it looks like after the patch.

OK, this is actually just a cleanup patch. No functional difference
between "before" and "after". For more flexible arguments, like array
types, we need to decouple argument-name printing from value printing.

Is the below clearer to you?

  Cleanup the argument-printing functions to decouple them into
  name-printing and value-printing, so that they can support more
  flexible argument expressions, like array types.

Thanks,

> Thanks!
>
> -- Steve
>
> > Signed-off-by: Masami Hiramatsu

-- 
Masami Hiramatsu
Re: [PATCH v1 09/16] rtc: mediatek: convert to use device managed functions
On Fri, 2018-03-23 at 11:50 +0100, Alexandre Belloni wrote:
> On 23/03/2018 at 17:15:06 +0800, sean.w...@mediatek.com wrote:
> > From: Sean Wang
> >
> > Use device managed operations to simplify error handling, reduce source
> > code size, reduce the likelihood of bugs, and remove our removal
> > callback, which contains nothing not already done by device managed
> > functions.
> >
> > Signed-off-by: Sean Wang
> > ---
> >  drivers/rtc/rtc-mt6397.c | 31 ++++++-----------------------
> >  1 file changed, 8 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/rtc/rtc-mt6397.c b/drivers/rtc/rtc-mt6397.c
> > index cefb83b..bfc5d6f 100644
> > --- a/drivers/rtc/rtc-mt6397.c
> > +++ b/drivers/rtc/rtc-mt6397.c
> > @@ -14,6 +14,7 @@
> >
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -328,10 +329,10 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >
> >  	platform_set_drvdata(pdev, rtc);
> >
> > -	ret = request_threaded_irq(rtc->irq, NULL,
> > -				   mtk_rtc_irq_handler_thread,
> > -				   IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > -				   "mt6397-rtc", rtc);
> > +	ret = devm_request_threaded_irq(&pdev->dev, rtc->irq, NULL,
> > +					mtk_rtc_irq_handler_thread,
> > +					IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > +					"mt6397-rtc", rtc);
> >  	if (ret) {
> >  		dev_err(&pdev->dev, "Failed to request alarm IRQ: %d: %d\n",
> >  			rtc->irq, ret);
> > @@ -340,30 +341,15 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >
> >  	device_init_wakeup(&pdev->dev, 1);
> >
> > -	rtc->rtc_dev = rtc_device_register("mt6397-rtc", &pdev->dev,
> > -					   &mtk_rtc_ops, THIS_MODULE);
> > +	rtc->rtc_dev = devm_rtc_device_register(&pdev->dev, "mt6397-rtc",
> > +						&mtk_rtc_ops, THIS_MODULE);
>
> You should probably switch to devm_rtc_allocate_device() and
> rtc_register_device() instead of devm_rtc_device_register().
>

Just to clarify the details: it seems you are encouraging me to switch to
the new registration method, and the devm_rtc_device_register() I
currently use in the driver shouldn't cause any harm, right?
> >  	if (IS_ERR(rtc->rtc_dev)) {
> >  		dev_err(&pdev->dev, "register rtc device failed\n");
> >  		ret = PTR_ERR(rtc->rtc_dev);
> > -		goto out_free_irq;
> > +		return ret;
>
> ret doesn't seem necessary anymore here.

Okay, it'll be removed.
Re: [PATCH v6 21/21] perf-probe: Add array argument support
On Thu, 22 Mar 2018 16:19:46 +0530
Ravi Bangoria wrote:

> Hi Masami :)
>
> On 03/22/2018 03:53 PM, Masami Hiramatsu wrote:
> > On Mon, 19 Mar 2018 13:29:59 +0530
> > Ravi Bangoria wrote:
> >
> >> Is it okay to allow user to specify array size with type field?
> >
> > For this patch, yes.
>
> So IIUC, the perf _tool_ will allow the user to record an array either
> with "name[range]" or with "name:type[length]". Please correct me if
> that's wrong.

Yes, it is correct.

> And if the perf tool will allow an array length along with the TYPE
> field, I guess we should document that in man perf-probe?

Ah, right. OK, I'll add it. Thanks!

> Otherwise,
>
> Acked-by: Ravi Bangoria
>
> Thanks,
> Ravi
>
> > The availability of the type is checked only when
> > it is automatically generated.
> > IMO, it should be done in another patch, something like
> > a "Validate user specified type casting" patch. Would you need it?
> >
> > Thank you,

-- 
Masami Hiramatsu
Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers
On Mon, Mar 26, 2018 at 01:40:17AM +0100, Al Viro wrote:
> Kinda-sorta part:
> 	* asmlinkage_protect is taken out for now, so m68k has problems.
> 	* syscalls that run out of 6 slots barf violently. For mips it's
> wrong (there we have 8 slots); for stuff like arm and ppc it's right, but
> it means that things like e.g. compat sync_file_range() should not even
> be compiled on those. __ARCH_WANT_SYS_SYNC_FILE_RANGE, presumably...
> In any case, we *can't* do pt_regs-based wrappers for those syscalls on
> such architectures, so ifdefs around those puppies are probably the right
> thing to do.
> 	* s390 macrology in compat_wrapper.c not even touched; it needs
> a trivial update to keep working (__MAP callbacks take an extra argument,
> unused for those users).
> 	* sys_... and compat_sys_... aliases are unchanged; if we kill
> direct callers, we can trivially rename SyS##name and compat_SyS##name
> to sys##name and compat_sys##name and get rid of aliases.

	* mips n32 and x86 x32 can become an extra source of headache.
That actually applies to any plans of passing struct pt_regs *. As it
is, e.g. syscall 515 on amd64 is compat_sys_readv(). Dispatched via
this:

	/*
	 * NB: Native and x32 syscalls are dispatched from the same
	 * table.  The only functional difference is the x32 bit in
	 * regs->orig_ax, which changes the behavior of some syscalls.
	 */
	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
		nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
		regs->ax = sys_call_table[nr](
			regs->di, regs->si, regs->dx,
			regs->r10, regs->r8, regs->r9);
	}

Now, syscall 145 via a 32bit call is *also* compat_sys_readv(),
dispatched via

	nr = array_index_nospec(nr, IA32_NR_syscalls);
	/*
	 * It's possible that a 32-bit syscall implementation
	 * takes a 64-bit parameter but nonetheless assumes that
	 * the high bits are zero.  Make sure we zero-extend all
	 * of the args.
	 */
	regs->ax = ia32_sys_call_table[nr](
		(unsigned int)regs->bx, (unsigned int)regs->cx,
		(unsigned int)regs->dx, (unsigned int)regs->si,
		(unsigned int)regs->di, (unsigned int)regs->bp);

Right now it works - we call the same function, passing it arguments
picked from a different set of registers (di/si/dx in the x32 case,
bx/cx/dx in the i386 one). But if we switch to passing struct pt_regs *
and have the wrapper fetch regs->{bx,cx,dx}, we have a problem: it won't
work for both entry points. IMO it's a good reason to have the
dispatcher(s) handle extraction from pt_regs and let the wrapper deal
with the resulting 6 u64 or 6 u32, normalizing them and arranging them
into the arguments expected by the syscall body.

Linus, Dominik - how do you plan on dealing with that fun? Regardless of
the way we generate the glue, the issue remains. We can't get the same
struct pt_regs *-taking function for both; we either need to produce a
separate chunk of glue for each compat_sys_... involved (either making
COMPAT_SYSCALL_DEFINE generate both, or having a duplicate
X32_SYSCALL_DEFINE for each of those COMPAT_SYSCALL_DEFINE - with an
identical body, at that), or we need to have the registers-to-slots
mapping done in the dispatcher...
Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers
On Mon, Mar 26, 2018 at 01:40:17AM +0100, Al Viro wrote:
> Kinda-sorta part:
> 	* asmlinkage_protect is taken out for now, so m68k has problems.
> 	* syscalls that run out of 6 slots barf violently.  For mips it's
> wrong (there we have 8 slots); for stuff like arm and ppc it's right, but
> it means that things like e.g. compat sync_file_range() should not even
> be compiled on those.  __ARCH_WANT_SYS_SYNC_FILE_RANGE, presumably...
> In any case, we *can't* do pt_regs-based wrappers for those syscalls on
> such architectures, so ifdefs around those puppies are probably the right
> thing to do.
> 	* s390 macrology in compat_wrapper.c not even touched; it needs
> a trivial update to keep working (__MAP callbacks take an extra argument,
> unused for those users).
> 	* sys_... and compat_sys_... aliases are unchanged; if we kill
> direct callers, we can trivially rename SyS##name and compat_SyS##name
> to sys##name and compat_sys##name and get rid of aliases.

	* mips n32 and x86 x32 can become an extra source of headache.
That actually applies to any plans of passing struct pt_regs *.  As it is,
e.g. syscall 515 on amd64 is compat_sys_readv().  Dispatched via this:

	/*
	 * NB: Native and x32 syscalls are dispatched from the same
	 * table.  The only functional difference is the x32 bit in
	 * regs->orig_ax, which changes the behavior of some syscalls.
	 */
	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
		nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
		regs->ax = sys_call_table[nr](
			regs->di, regs->si, regs->dx,
			regs->r10, regs->r8, regs->r9);
	}

Now, syscall 145 via a 32bit call is *also* compat_sys_readv(), dispatched via

	nr = array_index_nospec(nr, IA32_NR_syscalls);
	/*
	 * It's possible that a 32-bit syscall implementation
	 * takes a 64-bit parameter but nonetheless assumes that
	 * the high bits are zero.  Make sure we zero-extend all
	 * of the args.
	 */
	regs->ax = ia32_sys_call_table[nr](
		(unsigned int)regs->bx, (unsigned int)regs->cx,
		(unsigned int)regs->dx, (unsigned int)regs->si,
		(unsigned int)regs->di, (unsigned int)regs->bp);

Right now it works - we call the same function, passing it arguments picked
from different sets of registers (di/si/dx in the x32 case, bx/cx/dx in the
i386 one).  But if we switch to passing struct pt_regs * and have the
wrapper fetch regs->{bx,cx,dx}, we have a problem: it won't work for both
entry points.  IMO that's a good reason to have the dispatcher(s) handle
extraction from pt_regs and let the wrapper deal with the resulting 6 u64
or 6 u32, normalizing them and arranging them into the arguments expected
by the syscall body.

Linus, Dominik - how do you plan on dealing with that fun?  Regardless of
the way we generate the glue, the issue remains.  We can't get the same
struct pt_regs *-taking function for both; we either need to produce a
separate chunk of glue for each compat_sys_... involved (either making
COMPAT_SYSCALL_DEFINE generate both, or having a duplicate
X32_SYSCALL_DEFINE for each of those COMPAT_SYSCALL_DEFINE - with an
identical body, at that), or we need to have the registers-to-slots mapping
done in the dispatcher...
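Al's point about the two dispatch paths can be made concrete with a toy model. Everything below is illustrative userspace C (fake_pt_regs, sys_example and the dispatchers are invented names, not kernel code): each entry path's dispatcher knows its own register-to-argument mapping and hands the common syscall body six normalized slots, which is the arrangement Al argues for.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the scheme Al proposes: the per-ABI dispatcher, not the
 * per-syscall wrapper, knows which registers carry arguments (x32 uses
 * di/si/dx/r10/r8/r9, ia32 uses bx/cx/dx/si/di/bp).  The wrapper then
 * only ever sees six normalized 64-bit slots, so one body serves both
 * tables. */

struct fake_pt_regs {
	uint64_t bx, cx, dx, si, di, bp, r8, r9, r10;
};

typedef long (*syscall_fn)(uint64_t, uint64_t, uint64_t,
			   uint64_t, uint64_t, uint64_t);

/* Common syscall body: works on normalized slots, never on pt_regs. */
static long sys_example(uint64_t fd, uint64_t buf, uint64_t len,
			uint64_t a4, uint64_t a5, uint64_t a6)
{
	(void)buf; (void)a4; (void)a5; (void)a6;
	return (long)(fd + len);
}

/* 64-bit/x32 dispatch: args come from di, si, dx, r10, r8, r9. */
static long dispatch64(struct fake_pt_regs *regs, syscall_fn fn)
{
	return fn(regs->di, regs->si, regs->dx,
		  regs->r10, regs->r8, regs->r9);
}

/* ia32 dispatch: args come from bx, cx, dx, si, di, bp, zero-extended. */
static long dispatch32(struct fake_pt_regs *regs, syscall_fn fn)
{
	return fn((uint32_t)regs->bx, (uint32_t)regs->cx,
		  (uint32_t)regs->dx, (uint32_t)regs->si,
		  (uint32_t)regs->di, (uint32_t)regs->bp);
}
```

The same `sys_example` body is reachable from both dispatchers even though the argument registers differ, which is exactly what a single pt_regs-fetching wrapper could not do.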
Re: possible deadlock in handle_rx
On 2018-03-26 08:01, syzbot wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> cb6416592bc2a8b731dabcec0d63cda270764fc6 (Sun Mar 25 17:45:10 2018 +)
> Merge tag 'dmaengine-fix-4.16-rc7' of
> git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma
>
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=7f073540b1384a614e09
>
> So far this crash happened 4 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6506789075943424
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5716250550337536
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5142038655795200
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5034017172441945317
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7f073540b1384a614...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed.  See footer for
> details.  If you forward the report, please keep this part and the footer.
>
> WARNING: possible recursive locking detected
> 4.16.0-rc6+ #366 Not tainted
> vhost-4248/4760 is trying to acquire lock:
>  (&vq->mutex){+.+.}, at: [<3482bddc>] vhost_net_rx_peek_head_len
> drivers/vhost/net.c:633 [inline]
>  (&vq->mutex){+.+.}, at: [<3482bddc>] handle_rx+0xeb1/0x19c0
> drivers/vhost/net.c:784
>
> but task is already holding lock:
>  (&vq->mutex){+.+.}, at: [<4de72f44>] handle_rx+0x1f5/0x19c0
> drivers/vhost/net.c:766
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>   lock(&vq->mutex);
>   lock(&vq->mutex);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation

Yes, it's missing the lock nesting notation.  Will post a patch soon.

Thanks
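For context: handle_rx() legitimately holds the RX virtqueue's mutex while taking the TX virtqueue's mutex inside vhost_net_rx_peek_head_len(); since both mutexes belong to the same lockdep class (struct vhost_virtqueue), lockdep reports the second acquisition as recursion. The usual shape of a fix for a "missing lock nesting notation" report is a mutex_lock_nested() annotation on the inner acquisition. The fragment below is a non-runnable sketch of that pattern only, not Jason's actual patch:

```c
/* Illustrative kernel-code sketch only -- not the actual fix.
 * Annotating the inner lock with a distinct subclass tells lockdep
 * this is a hierarchical acquisition, not recursion on one lock. */
static void handle_rx_annotated(struct vhost_net *net)
{
	struct vhost_virtqueue *rx_vq = &net->vqs[VHOST_NET_VQ_RX].vq;
	struct vhost_virtqueue *tx_vq = &net->vqs[VHOST_NET_VQ_TX].vq;

	mutex_lock(&rx_vq->mutex);
	/* inner lock gets subclass 1, a separate lockdep nesting level */
	mutex_lock_nested(&tx_vq->mutex, 1);
	/* ... peek head length on the TX ring, then receive ... */
	mutex_unlock(&tx_vq->mutex);
	mutex_unlock(&rx_vq->mutex);
}
```

The subclass number only affects lockdep's bookkeeping; it does not change locking behavior at runtime.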
Re: linux-next: manual merge of the drm tree with Linus' tree
Hi all,

On Thu, 22 Mar 2018 17:37:22 +1100 Stephen Rothwell wrote:
>
> Today's linux-next merge of the drm tree got conflicts in several amdgpu
> files because there are a set of (mostly identical) patches that appear in
> Linus' tree and the drm tree.  In each case I just used the version of
> the file from the drm tree.
>
> You should do a test merge between your tree and Linus' tree and see what
> you want to do about the resolution (either do the back merge (I think
> with v4.16-rc6), or provide Linus with a branch that has the merge done).
> It's a bit of a mess :-(

I got a few more of these today.

-- 
Cheers,
Stephen Rothwell
Re: [RFC PATCH V2 0/8] Packed ring for vhost
cc Jens, Tiwei and Wei

Thanks

On 2018-03-26 11:38, Jason Wang wrote:
> Hi all:
>
> This RFC implements the packed ring layout.  The code was tested with
> the pmd implementation by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html.  A minor
> change was needed in the pmd code to kick the virtqueue, since it
> assumes a busy polling backend.
>
> Tests were done between localhost and guest.  Testpmd (rxonly) in the
> guest reports 2.4Mpps.  Testpmd (txonly) reports about 2.1Mpps.
>
> Notes: the event suppression/indirect descriptor support is
> compile-tested only because of lacking driver support.
>
> Changes from V1:
> - Refactor vhost used elem code to avoid open coding on used elem
> - Event suppression support (compile test only).
> - Indirect descriptor support (compile test only).
> - Zerocopy support.
> - vIOMMU support.
> - SCSI/VSOCK support (compile test only).
> - Fix several bugs
>
> For simplicity, I don't implement batching or other optimizations.
>
> Please review.
>
> Thanks
>
> Jason Wang (8):
>   vhost: move get_rx_bufs to vhost.c
>   vhost: hide used ring layout from device
>   vhost: do not use vring_used_elem
>   vhost_net: do not explicitly manipulate vhost_used_elem
>   vhost: vhost_put_user() can accept metadata type
>   virtio: introduce packed ring defines
>   vhost: packed ring support
>   vhost: event suppression for packed ring
>
>  drivers/vhost/net.c                | 138 ++-
>  drivers/vhost/scsi.c               |  62 +--
>  drivers/vhost/vhost.c              | 818 ++---
>  drivers/vhost/vhost.h              |  46 ++-
>  drivers/vhost/vsock.c              |  42 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  32 ++
>  7 files changed, 921 insertions(+), 226 deletions(-)
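For readers unfamiliar with the packed layout: unlike the split ring, the packed ring has a single descriptor array, and availability is signalled by two flag bits checked against wrap counters rather than by a separate avail ring. A minimal userspace model of that handshake, following the virtio 1.1 packed-ring proposal (the struct and helpers here are illustrative, not the patchset's kernel structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packed-ring descriptor flag bits, per the virtio 1.1 proposal. */
#define VRING_DESC_F_AVAIL (1 << 7)
#define VRING_DESC_F_USED  (1 << 15)

struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Driver publishes a descriptor: AVAIL bit is set to the driver's
 * wrap counter, USED bit to its inverse. */
static void driver_publish(struct packed_desc *d, bool wrap)
{
	d->flags = wrap ? VRING_DESC_F_AVAIL : VRING_DESC_F_USED;
}

/* Device-side availability check, mirroring what a vhost backend must
 * do: the descriptor is available when AVAIL matches the device's wrap
 * counter and USED does not. */
static bool desc_is_avail(const struct packed_desc *d, bool wrap)
{
	bool avail = d->flags & VRING_DESC_F_AVAIL;
	bool used  = d->flags & VRING_DESC_F_USED;

	return avail == wrap && used != wrap;
}
```

Because the two sides compare against their own wrap counters, the same descriptor slot can be reused across ring laps without an index exchange.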
Re: KASAN: use-after-free Read in __list_del_entry_valid (4)
syzbot has found a reproducer for the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
Linux 4.16-rc7

syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=29ee8f76017ce6cf03da

So far this crash happened 4 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4763014771245056
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=5870647779524608
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=4652258302099456
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2340295454854568752
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+29ee8f76017ce6cf0...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed.

IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
8021q: adding VLAN 0 to HW filter on device bond0
IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0x144/0x150
lib/list_debug.c:54
Read of size 8 at addr 8801b6022fa0 by task syzkaller871713/4346

CPU: 1 PID: 4346 Comm: syzkaller871713 Not tainted 4.16.0-rc7+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __list_del_entry_valid+0x144/0x150 lib/list_debug.c:54
 __list_del_entry include/linux/list.h:117 [inline]
 list_del include/linux/list.h:125 [inline]
 cma_cancel_listens drivers/infiniband/core/cma.c:1569 [inline]
 cma_cancel_operation+0x455/0xd60 drivers/infiniband/core/cma.c:1597
 rdma_destroy_id+0xff/0xda0 drivers/infiniband/core/cma.c:1661
 ucma_close+0x100/0x2f0 drivers/infiniband/core/ucma.c:1728
 __fput+0x327/0x7e0 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ad0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 get_signal+0x73a/0x16d0 kernel/signal.c:2469
 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x447529
RSP: 002b:7f782c2b0cf8 EFLAGS: 0202 ORIG_RAX: 00ca
RAX: 0001 RBX: 006ddc5c RCX: 00447529
RDX: 00447529 RSI: 0001 RDI: 006ddc5c
RBP: 006ddc58 R08: R09: R10:
R11: 0202 R12: R13: 7fff8bb3d8cf
R14: 7f782c2b19c0 R15: 0005

Allocated by task 4343:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552
 kmem_cache_alloc_trace+0x136/0x740 mm/slab.c:3607
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 rdma_create_id+0xd0/0x630 drivers/infiniband/core/cma.c:787
 ucma_create_id+0x35f/0x920 drivers/infiniband/core/ucma.c:480
 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1649
 __vfs_write+0xef/0x970 fs/read_write.c:480
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 4346:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
 __cache_free mm/slab.c:3485 [inline]
 kfree+0xd9/0x260 mm/slab.c:3800
 rdma_destroy_id+0x821/0xda0 drivers/infiniband/core/cma.c:1691
 ucma_close+0x100/0x2f0 drivers/infiniband/core/ucma.c:1728
 __fput+0x327/0x7e0 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ad0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 get_signal+0x73a/0x16d0 kernel/signal.c:2469
 do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
 exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ec/0x940
[RFC PATCH V2 8/8] vhost: event suppression for packed ring
This patch introduces basic support for event suppression aka the driver
and device areas. Compile-tested only.

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c            | 169 ---
 drivers/vhost/vhost.h            |  10 ++-
 include/uapi/linux/virtio_ring.h |  19 +
 3 files changed, 183 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6177e4d..ff83a2e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1143,10 +1143,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
 			       struct vring_used __user *used)
 {
 	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+	struct vring_packed_desc_event *driver_event =
+		(struct vring_packed_desc_event *)avail;
+	struct vring_packed_desc_event *device_event =
+		(struct vring_packed_desc_event *)used;
 
-	/* FIXME: check device area and driver area */
 	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
-	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
+	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
+	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
 }
 
 static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
@@ -1222,14 +1227,27 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
 	return true;
 }
 
-int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
+int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
+{
+	int num = vq->num;
+
+	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->driver_event,
+			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
+			       (u64)(uintptr_t)vq->device_event,
+			       sizeof(*vq->device_event), VHOST_ADDR_USED);
+}
+
+int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
 {
 	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 	unsigned int num = vq->num;
 
-	if (!vq->iotlb)
-		return 1;
-
 	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
@@ -1241,6 +1259,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
 			       num * sizeof(*vq->used->ring) + s,
 			       VHOST_ADDR_USED);
 }
+
+int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
+{
+	if (!vq->iotlb)
+		return 1;
+
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_iotlb_prefetch_packed(vq);
+	else
+		return vq_iotlb_prefetch_split(vq);
+}
 EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
 
 /* Can we log writes? */
@@ -1756,6 +1785,29 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 	return 0;
 }
 
+static int vhost_update_device_flags(struct vhost_virtqueue *vq,
+				     __virtio16 device_flags)
+{
+	void __user *flags;
+
+	if (vhost_put_user(vq, cpu_to_vhost16(vq, device_flags),
+			   &vq->device_event->desc_event_flags,
+			   VHOST_ADDR_DESC) < 0)
+		return -EFAULT;
+	if (unlikely(vq->log_used)) {
+		/* Make sure the flag is seen before log. */
+		smp_wmb();
+		/* Log used flag write. */
+		flags = &vq->device_event->desc_event_flags;
+		log_write(vq->log_base, vq->log_addr +
+			  (flags - (void __user *)vq->device_event),
+			  sizeof(vq->used->flags));
+		if (vq->log_ctx)
+			eventfd_signal(vq->log_ctx, 1);
+	}
+	return 0;
+}
+
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 {
 	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
@@ -2667,16 +2719,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
-static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_notify_split(struct vhost_dev *dev,
+			       struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
 
-	/*
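The driver and device areas this patch validates are each a single vring_packed_desc_event. Under the virtio 1.1 draft semantics, the device consults the driver area to decide whether to send an interrupt at all, never, or only once a given descriptor is used. A hedged userspace sketch of that decision (wrap-counter handling omitted for brevity; the names and constants are illustrative, not the uapi definitions added by this patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Driver-area flag values, per the virtio 1.1 packed-ring proposal. */
#define RING_EVENT_FLAGS_ENABLE  0x0 /* always notify */
#define RING_EVENT_FLAGS_DISABLE 0x1 /* never notify */
#define RING_EVENT_FLAGS_DESC    0x2 /* notify at the off_wrap index */

struct desc_event {
	uint16_t off_wrap; /* bit 15: wrap counter, bits 0..14: offset */
	uint16_t flags;
};

/* Device-side decision: should we interrupt the driver after making
 * descriptor new_used_idx used?  Wrap-counter comparison is elided. */
static bool device_should_notify(const struct desc_event *drv,
				 uint16_t new_used_idx)
{
	switch (drv->flags) {
	case RING_EVENT_FLAGS_ENABLE:
		return true;
	case RING_EVENT_FLAGS_DISABLE:
		return false;
	case RING_EVENT_FLAGS_DESC:
		/* notify only when the requested offset is reached */
		return new_used_idx == (uint16_t)(drv->off_wrap & 0x7fff);
	default:
		return false;
	}
}
```

This is why the patch must make the driver area readable and the device area writable: each side publishes its suppression state in its own area and polls the other's.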
[RFC PATCH V2 7/8] vhost: packed ring support
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   |   5 +-
 drivers/vhost/vhost.c | 530 ++
 drivers/vhost/vhost.h |   7 +-
 3 files changed, 505 insertions(+), 37 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7be8b55..84905d5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -67,7 +67,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_IOMMU_PLATFORM)
+			 (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+			 (1ULL << VIRTIO_F_RING_PACKED)
 };
 
 enum {
@@ -706,6 +707,8 @@ static void handle_rx(struct vhost_net *net)
 	vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+	/* FIXME: workaround for current dpdk prototype */
+	mergeable = false;
 
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dcac4d4..6177e4d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -324,6 +324,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
+	vq->used_wrap_counter = true;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
 	__vhost_vq_meta_reset(vq);
@@ -1136,10 +1137,22 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
 	return 0;
 }
 
-static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-			struct vring_desc __user *desc,
-			struct vring_avail __user *avail,
-			struct vring_used __user *used)
+static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
+			       struct vring_desc __user *desc,
+			       struct vring_avail __user *avail,
+			       struct vring_used __user *used)
+{
+	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+
+	/* FIXME: check device area and driver area */
+	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
+	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+}
+
+static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
+			      struct vring_desc __user *desc,
+			      struct vring_avail __user *avail,
+			      struct vring_used __user *used)
 {
 	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
@@ -1151,6 +1164,17 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			 sizeof *used + num * sizeof *used->ring + s);
 }
 
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+			struct vring_desc __user *desc,
+			struct vring_avail __user *avail,
+			struct vring_used __user *used)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_access_ok_packed(vq, num, desc, avail, used);
+	else
+		return vq_access_ok_split(vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
 				 const struct vhost_umem_node *node,
 				 int type)
@@ -1763,6 +1787,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 
 	vhost_init_is_le(vq);
 
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return 0;
+
 	r = vhost_update_used_flags(vq);
 	if (r)
 		goto err;
@@ -1836,7 +1863,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain,
  * or -1U if we're at the end. */
-static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
+static unsigned next_desc_split(struct vhost_virtqueue *vq,
+				struct vring_desc *desc)
 {
 	unsigned int next;
 
@@ -1849,11 +1877,17 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
 	return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-			struct iovec iov[], unsigned int iov_size,
-			unsigned int *out_num, unsigned int *in_num,
-			struct vhost_log *log, unsigned int *log_num,
-			struct vring_desc *indirect)
+static unsigned next_desc_packed(struct vhost_virtqueue *vq,
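The vq->used_wrap_counter field added in vhost_vq_reset() above is the device-side half of the packed-ring protocol: a cursor that walks the descriptor array linearly and flips each time it passes the end of the ring. A minimal userspace model of that bookkeeping (struct and helper names are illustrative, assuming the virtio 1.1 draft semantics rather than the kernel's types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the device-side packed-ring cursor. */
struct device_cursor {
	uint16_t idx;  /* next descriptor slot to process */
	bool wrap;     /* starts true, as vhost_vq_reset() does above */
};

/* Advance one descriptor; toggle the wrap counter on wrap-around.
 * The wrap counter is what lets flag-based availability checks
 * distinguish a fresh descriptor from one left over from the
 * previous lap around the ring. */
static void cursor_advance(struct device_cursor *c, uint16_t ring_size)
{
	if (++c->idx >= ring_size) {
		c->idx = 0;
		c->wrap = !c->wrap;
	}
}
```

Resetting the counter to true on vq reset matters because the driver starts with its own wrap counter at 1, so both sides begin a ring lap in agreement.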
[RFC PATCH V2 7/8] vhost: packed ring support
Signed-off-by: Jason Wang --- drivers/vhost/net.c | 5 +- drivers/vhost/vhost.c | 530 ++ drivers/vhost/vhost.h | 7 +- 3 files changed, 505 insertions(+), 37 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 7be8b55..84905d5 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -67,7 +67,8 @@ enum { VHOST_NET_FEATURES = VHOST_FEATURES | (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | (1ULL << VIRTIO_NET_F_MRG_RXBUF) | -(1ULL << VIRTIO_F_IOMMU_PLATFORM) +(1ULL << VIRTIO_F_IOMMU_PLATFORM) | +(1ULL << VIRTIO_F_RING_PACKED) }; enum { @@ -706,6 +707,8 @@ static void handle_rx(struct vhost_net *net) vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ? vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF); + /* FIXME: workaround for current dpdk prototype */ + mergeable = false; while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) { sock_len += sock_hlen; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index dcac4d4..6177e4d 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -324,6 +324,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, vhost_reset_is_le(vq); vhost_disable_cross_endian(vq); vq->busyloop_timeout = 0; + vq->used_wrap_counter = true; vq->umem = NULL; vq->iotlb = NULL; __vhost_vq_meta_reset(vq); @@ -1136,10 +1137,22 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) return 0; } -static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, - struct vring_desc __user *desc, - struct vring_avail __user *avail, - struct vring_used __user *used) +static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num, + struct vring_desc __user *desc, + struct vring_avail __user *avail, + struct vring_used __user *used) +{ + struct vring_desc_packed *packed = (struct vring_desc_packed *)desc; + + /* FIXME: check device area and driver area */ + return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) && + access_ok(VERIFY_WRITE, 
packed, num * sizeof(*packed)); +} + +static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num, + struct vring_desc __user *desc, + struct vring_avail __user *avail, + struct vring_used __user *used) { size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0; @@ -1151,6 +1164,17 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, sizeof *used + num * sizeof *used->ring + s); } +static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, + struct vring_desc __user *desc, + struct vring_avail __user *avail, + struct vring_used __user *used) +{ + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) + return vq_access_ok_packed(vq, num, desc, avail, used); + else + return vq_access_ok_split(vq, num, desc, avail, used); +} + static void vhost_vq_meta_update(struct vhost_virtqueue *vq, const struct vhost_umem_node *node, int type) @@ -1763,6 +1787,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq) vhost_init_is_le(vq); + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) + return 0; + r = vhost_update_used_flags(vq); if (r) goto err; @@ -1836,7 +1863,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, /* Each buffer in the virtqueues is actually a chain of descriptors. This * function returns the next descriptor in the chain, * or -1U if we're at the end. */ -static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc) +static unsigned next_desc_split(struct vhost_virtqueue *vq, + struct vring_desc *desc) { unsigned int next; @@ -1849,11 +1877,17 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc) return next; } -static int get_indirect(struct vhost_virtqueue *vq, - struct iovec iov[], unsigned int iov_size, - unsigned int *out_num, unsigned int *in_num, - struct vhost_log *log, unsigned int *log_num, - struct vring_desc *indirect) +static unsigned next_desc_packed(struct vhost_virtqueue *vq, +
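The `used_wrap_counter` field added in `vhost_vq_reset()` above is what a packed-ring consumer checks descriptor flags against. A minimal userspace sketch of that check (helper names are hypothetical; the AVAIL/USED bit positions follow the packed ring proposal, not this patch's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_AVAIL (1u << 7)   /* flags bit 7, per VRING_DESC_F_AVAIL */
#define DESC_F_USED  (1u << 15)  /* flags bit 15, per VRING_DESC_F_USED */

/* A descriptor is available to the device when its AVAIL bit matches the
 * device's wrap counter and its USED bit does not. */
static bool desc_is_avail(uint16_t flags, bool wrap_counter)
{
    bool avail = flags & DESC_F_AVAIL;
    bool used = flags & DESC_F_USED;

    return avail == wrap_counter && used != wrap_counter;
}

/* Marking a descriptor used: make both bits equal to the wrap counter
 * the device consumed it under. */
static uint16_t mark_used(uint16_t flags, bool wrap_counter)
{
    flags &= (uint16_t)~(DESC_F_AVAIL | DESC_F_USED);
    if (wrap_counter)
        flags |= DESC_F_AVAIL | DESC_F_USED;
    return flags;
}
```

After `mark_used()`, `desc_is_avail()` fails for the same wrap counter value, which is exactly the hand-off the kernel tracks with `vq->used_wrap_counter`.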
[RFC PATCH V2 6/8] virtio: introduce packed ring defines
Signed-off-by: Jason Wang --- include/uapi/linux/virtio_config.h | 9 + include/uapi/linux/virtio_ring.h | 13 + 2 files changed, 22 insertions(+) diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h index 308e209..5903d51 100644 --- a/include/uapi/linux/virtio_config.h +++ b/include/uapi/linux/virtio_config.h @@ -71,4 +71,13 @@ * this is for compatibility with legacy systems. */ #define VIRTIO_F_IOMMU_PLATFORM 33 + +#define VIRTIO_F_RING_PACKED 34 + +/* + * This feature indicates that all buffers are used by the device in + * the same order in which they have been made available. + */ +#define VIRTIO_F_IN_ORDER 35 + #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */ diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index 6d5d5fa..e297580 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h @@ -43,6 +43,8 @@ #define VRING_DESC_F_WRITE 2 /* This means the buffer contains a list of buffer descriptors. */ #define VRING_DESC_F_INDIRECT 4 +#define VRING_DESC_F_AVAIL 7 +#define VRING_DESC_F_USED 15 /* The Host uses this in used->flags to advise the Guest: don't kick me when * you add a buffer. It's unreliable, so it's simply an optimization. Guest @@ -62,6 +64,17 @@ * at the end of the used ring. Guest should ignore the used->flags field. */ #define VIRTIO_RING_F_EVENT_IDX 29 +struct vring_desc_packed { + /* Buffer Address. */ + __virtio64 addr; + /* Buffer Length. */ + __virtio32 len; + /* Buffer ID. */ + __virtio16 id; + /* The flags depending on descriptor type. */ + __virtio16 flags; +}; + /* Virtio ring descriptors: 16 bytes. These can chain together via "next". */ struct vring_desc { /* Address (guest-physical). */ -- 2.7.4
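Because `VIRTIO_F_RING_PACKED` and `VIRTIO_F_IN_ORDER` are bit numbers 34 and 35, any feature mask carrying them must be built from 64-bit constants. A small sketch of the negotiation check (the helper is hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the RFC defines above. */
#define VIRTIO_F_RING_PACKED 34
#define VIRTIO_F_IN_ORDER    35

/* Bits >= 32 overflow a 32-bit constant, so the mask must use 1ULL --
 * exactly why the vhost patch writes (1ULL << VIRTIO_F_RING_PACKED). */
static int has_feature(uint64_t features, unsigned int bit)
{
    return (features & (1ULL << bit)) != 0;
}
```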
linux-next: manual merge of the drm tree with Linus' tree
Hi all, Today's linux-next merge of the drm tree got conflicts in: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h drivers/gpu/drm/vmwgfx/vmwgfx_kms.c between commit: 140bcaa23a1c ("drm/vmwgfx: Fix black screen and device errors when running without fbdev") from Linus' tree and commit: c3b9b1657344 ("drm/vmwgfx: Improve on hibernation") from the drm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 9116fe8baebc,9e60de95b863.. --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@@ -938,7 -947,8 +947,9 @@@ int vmw_kms_present(struct vmw_private int vmw_kms_update_layout_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); void vmw_kms_legacy_hotspot_clear(struct vmw_private *dev_priv); +void vmw_kms_lost_device(struct drm_device *dev); + int vmw_kms_suspend(struct drm_device *dev); + int vmw_kms_resume(struct drm_device *dev); int vmw_dumb_create(struct drm_file *file_priv, struct drm_device *dev, diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index 3c824fd7cbf3,3628a9fe705f.. 
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@@ -2561,11 -2551,10 +2557,12 @@@ int vmw_kms_helper_resource_prepare(str if (res->backup) { ret = vmw_kms_helper_buffer_prepare(res->dev_priv, res->backup, interruptible, - res->dev_priv->has_mob); + res->dev_priv->has_mob, + false); if (ret) goto out_unreserve; + + ctx->buf = vmw_dmabuf_reference(res->backup); } ret = vmw_resource_validate(res); if (ret) @@@ -2863,12 -2850,49 +2860,59 @@@ int vmw_kms_set_config(struct drm_mode_ } +/** + * vmw_kms_lost_device - Notify kms that modesetting capabilities will be lost + * + * @dev: Pointer to the drm device + */ +void vmw_kms_lost_device(struct drm_device *dev) +{ + drm_atomic_helper_shutdown(dev); +} ++ + /** + * vmw_kms_suspend - Save modesetting state and turn modesetting off. + * + * @dev: Pointer to the drm device + * Return: 0 on success. Negative error code on failure. + */ + int vmw_kms_suspend(struct drm_device *dev) + { + struct vmw_private *dev_priv = vmw_priv(dev); + + dev_priv->suspend_state = drm_atomic_helper_suspend(dev); + if (IS_ERR(dev_priv->suspend_state)) { + int ret = PTR_ERR(dev_priv->suspend_state); + + DRM_ERROR("Failed kms suspend: %d\n", ret); + dev_priv->suspend_state = NULL; + + return ret; + } + + return 0; + } + + + /** + * vmw_kms_resume - Re-enable modesetting and restore state + * + * @dev: Pointer to the drm device + * Return: 0 on success. Negative error code on failure. + * + * State is resumed from a previous vmw_kms_suspend(). It's illegal + * to call this function without a previous vmw_kms_suspend(). + */ + int vmw_kms_resume(struct drm_device *dev) + { + struct vmw_private *dev_priv = vmw_priv(dev); + int ret; + + if (WARN_ON(!dev_priv->suspend_state)) + return 0; + + ret = drm_atomic_helper_resume(dev, dev_priv->suspend_state); + dev_priv->suspend_state = NULL; + + return ret; + } pgpZ5ofp8Ayc4.pgp Description: OpenPGP digital signature
[RFC PATCH V2 5/8] vhost: vhost_put_user() can accept metadata type
In the past we assumed that the used ring update was the only user of vhost_put_user(). This may not be the case for the incoming packed ring, which may update the descriptor ring for used descriptors. So introduce a new type parameter. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 65954d6..dcac4d4 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -847,7 +847,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, return __vhost_get_user_slow(vq, addr, size, type); } -#define vhost_put_user(vq, x, ptr) \ +#define vhost_put_user(vq, x, ptr, type) \ ({ \ int ret = -EFAULT; \ if (!vq->iotlb) { \ @@ -855,7 +855,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, } else { \ __typeof__(ptr) to = \ (__typeof__(ptr)) __vhost_get_user(vq, ptr, \ - sizeof(*ptr), VHOST_ADDR_USED); \ + sizeof(*ptr), type); \ if (to != NULL) \ ret = __put_user(x, to); \ else \ @@ -1716,7 +1716,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq) { void __user *used; if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags), - &vq->used->flags) < 0) + &vq->used->flags, VHOST_ADDR_USED) < 0) return -EFAULT; if (unlikely(vq->log_used)) { /* Make sure the flag is seen before log.
*/ @@ -1735,7 +1735,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq) static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event) { if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx), - vhost_avail_event(vq))) + vhost_avail_event(vq), VHOST_ADDR_USED)) return -EFAULT; if (unlikely(vq->log_used)) { void __user *used; @@ -2218,12 +2218,12 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, used = vq->used->ring + start; for (i = 0; i < count; i++) { if (unlikely(vhost_put_user(vq, heads[i].elem.id, - &used[i].id))) { + &used[i].id, VHOST_ADDR_USED))) { vq_err(vq, "Failed to write used id"); return -EFAULT; } if (unlikely(vhost_put_user(vq, heads[i].elem.len, - &used[i].len))) { + &used[i].len, VHOST_ADDR_USED))) { vq_err(vq, "Failed to write used len"); return -EFAULT; } @@ -2269,7 +2269,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vhost_used_elem *heads, /* Make sure buffer is written before we update index. */ smp_wmb(); if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), - &vq->used->idx)) { + &vq->used->idx, VHOST_ADDR_USED)) { vq_err(vq, "Failed to increment used idx"); return -EFAULT; } -- 2.7.4
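The change threads an explicit address-type argument through the macro instead of hard-coding `VHOST_ADDR_USED`. The same pattern in a standalone userspace sketch (the enum names and the counter array are invented for illustration; the statement-expression form mirrors the kernel macro and needs GCC/Clang):

```c
#include <assert.h>

enum addr_type { ADDR_DESC, ADDR_AVAIL, ADDR_USED, ADDR_NUM };

/* Stand-in for per-area address translation: just count accesses. */
static int accesses[ADDR_NUM];

/* Like the patched vhost_put_user(): the caller says which ring area the
 * store targets instead of the macro hard-coding ADDR_USED. */
#define put_user_typed(x, ptr, type) \
({ \
	accesses[(type)]++; \
	*(ptr) = (x); \
	0; \
})
```

With this, a packed-ring caller can direct writes at the descriptor area (`ADDR_DESC`) through the very same macro, which is the point of the patch.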
[RFC PATCH V2 3/8] vhost: do not use vring_used_elem
Instead of depending on the exported vring_used_elem, this patch switches to a new internal structure, vhost_used_elem, which embeds vring_used_elem. This lets vhost record extra metadata for the incoming packed ring layout. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 19 ++- drivers/vhost/scsi.c | 10 +- drivers/vhost/vhost.c | 33 - drivers/vhost/vhost.h | 18 +++--- drivers/vhost/vsock.c | 6 +++--- 5 files changed, 45 insertions(+), 41 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f821fcd..7ea2aee 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -337,10 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, int j = 0; for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) { - if (vq->heads[i].len == VHOST_DMA_FAILED_LEN) + if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN) vhost_net_tx_err(net); - if (VHOST_DMA_IS_DONE(vq->heads[i].len)) { - vq->heads[i].len = VHOST_DMA_CLEAR_LEN; + if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) { + vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN; ++j; } else break; @@ -363,7 +363,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) rcu_read_lock_bh(); /* set len to mark this desc buffers done DMA */ - vq->heads[ubuf->desc].len = success ? + vq->heads[ubuf->desc].elem.len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN; cnt = vhost_net_ubuf_put(ubufs); @@ -422,7 +422,7 @@ static int vhost_net_enable_vq(struct vhost_net *n, static int vhost_net_tx_get_vq_desc(struct vhost_net *net, struct vhost_virtqueue *vq, - struct vring_used_elem *used_elem, + struct vhost_used_elem *used_elem, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num) { @@ -473,7 +473,7 @@ static void handle_tx(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); - struct vring_used_elem used; + struct vhost_used_elem used; bool zcopy, zcopy_used; mutex_lock(>mutex); @@ -537,9 +537,10 @@ static void handle_tx(struct vhost_net *net) struct ubuf_info *ubuf; ubuf = nvq->ubuf_info + nvq->upend_idx; - vq->heads[nvq->upend_idx].id = - cpu_to_vhost32(vq, used.id); - vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; + vq->heads[nvq->upend_idx].elem.id = + cpu_to_vhost32(vq, used.elem.id); + vq->heads[nvq->upend_idx].elem.len = + VHOST_DMA_IN_PROGRESS; ubuf->callback = vhost_zerocopy_callback; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 654c71f..ac11412 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -67,7 +67,7 @@ struct vhost_scsi_inflight { struct vhost_scsi_cmd { /* Descriptor from vhost_get_vq_desc() for virt_queue segment */ - struct vring_used_elem tvc_vq_used; + struct vhost_used_elem tvc_vq_used; /* virtio-scsi initiator task attribute */ int tvc_task_attr; /* virtio-scsi response incoming iovecs */ @@ -441,7 +441,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt) struct vhost_virtqueue *vq = >vqs[VHOST_SCSI_VQ_EVT].vq; struct virtio_scsi_event *event = >event; struct virtio_scsi_event __user *eventp; - struct vring_used_elem used; + struct vhost_used_elem used; unsigned out, in; int ret; @@ -785,7 +785,7 @@ static void vhost_scsi_submission_work(struct 
work_struct *work) static void vhost_scsi_send_bad_target(struct vhost_scsi *vs, struct vhost_virtqueue *vq, - struct vring_used_elem *used, unsigned out) + struct vhost_used_elem *used, unsigned out) { struct virtio_scsi_cmd_resp __user *resp; struct virtio_scsi_cmd_resp rsp; @@ -808,7 +808,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) struct virtio_scsi_cmd_req v_req; struct virtio_scsi_cmd_req_pi v_req_pi; struct vhost_scsi_cmd *cmd; - struct vring_used_elem used; +
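The wrapper-struct idea can be sketched outside the kernel: embed the exported ABI struct in a private one and hang metadata off it. The `count` field below is a hypothetical example of such metadata (e.g. how many packed descriptors one used element covers):

```c
#include <assert.h>
#include <stdint.h>

/* Exported ABI layout (mirrors struct vring_used_elem). */
struct vring_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Internal wrapper: embeds the ABI struct and leaves room for
 * layout-specific metadata that never crosses the ABI boundary. */
struct vhost_used_elem {
	struct vring_used_elem elem;
	uint16_t count; /* hypothetical packed-ring metadata */
};

static struct vhost_used_elem make_used(uint32_t id, uint32_t len, uint16_t count)
{
	struct vhost_used_elem used = {
		.elem = { .id = id, .len = len },
		.count = count,
	};
	return used;
}
```

Callers now go through `used.elem.id` / `used.elem.len`, so the exported layout stays intact while the internal one can grow, which is exactly the `vq->heads[i].elem.len` churn visible in the diff.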
[RFC PATCH V2 2/8] vhost: hide used ring layout from device
We used to return the descriptor head from vhost_get_vq_desc() to the device and pass it back to vhost_add_used() and its friends. This exposes the internal used ring layout to the device, which makes it hard to extend for, e.g., the packed ring layout. So this patch tries to hide the used ring layout by - letting vhost_get_vq_desc() return a pointer to struct vring_used_elem - accepting a pointer to struct vring_used_elem in vhost_add_used() and vhost_add_used_and_signal() This could help to hide the used ring layout and make it easier to implement the packed ring on top. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 46 +- drivers/vhost/scsi.c | 62 +++ drivers/vhost/vhost.c | 52 +- drivers/vhost/vhost.h | 9 +--- drivers/vhost/vsock.c | 42 +- 5 files changed, 112 insertions(+), 99 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 57dfa63..f821fcd 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -422,22 +422,24 @@ static int vhost_net_enable_vq(struct vhost_net *n, static int vhost_net_tx_get_vq_desc(struct vhost_net *net, struct vhost_virtqueue *vq, + struct vring_used_elem *used_elem, struct iovec iov[], unsigned int iov_size, unsigned int *out_num, unsigned int *in_num) { unsigned long uninitialized_var(endtime); - int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), + int r = vhost_get_vq_desc(vq, used_elem, vq->iov, ARRAY_SIZE(vq->iov), out_num, in_num, NULL, NULL); - if (r == vq->num && vq->busyloop_timeout) { + if (r == -ENOSPC && vq->busyloop_timeout) { preempt_disable(); endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(vq->dev, endtime) && vhost_vq_avail_empty(vq->dev, vq)) cpu_relax(); preempt_enable(); - r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), - out_num, in_num, NULL, NULL); + r = vhost_get_vq_desc(vq, used_elem, vq->iov, + ARRAY_SIZE(vq->iov), out_num, in_num, + NULL, NULL); } return r; @@ -459,7 +461,6 @@ static void handle_tx(struct vhost_net *net) struct vhost_net_virtqueue *nvq = 
>vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = >vq; unsigned out, in; - int head; struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0, @@ -472,6 +473,7 @@ static void handle_tx(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); + struct vring_used_elem used; bool zcopy, zcopy_used; mutex_lock(>mutex); @@ -494,20 +496,20 @@ static void handle_tx(struct vhost_net *net) vhost_zerocopy_signal_used(net, vq); - head = vhost_net_tx_get_vq_desc(net, vq, vq->iov, - ARRAY_SIZE(vq->iov), - , ); - /* On error, stop handling until the next kick. */ - if (unlikely(head < 0)) - break; + err = vhost_net_tx_get_vq_desc(net, vq, , vq->iov, + ARRAY_SIZE(vq->iov), + , ); /* Nothing new? Wait for eventfd to tell us they refilled. */ - if (head == vq->num) { + if (err == -ENOSPC) { if (unlikely(vhost_enable_notify(>dev, vq))) { vhost_disable_notify(>dev, vq); continue; } break; } + /* On error, stop handling until the next kick. */ + if (unlikely(err < 0)) + break; if (in) { vq_err(vq, "Unexpected descriptor format for TX: " "out %d, int %d\n", out, in); @@ -535,7 +537,8 @@ static void handle_tx(struct vhost_net *net) struct ubuf_info *ubuf; ubuf = nvq->ubuf_info + nvq->upend_idx; - vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head); + vq->heads[nvq->upend_idx].id = + cpu_to_vhost32(vq, used.id); vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
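Alongside hiding the layout, the patch changes the calling convention from "return the head index, with `vq->num` meaning empty" to "fill a structure and return 0 or -ENOSPC". A toy sketch of the new convention (simplified, hypothetical helper):

```c
#include <assert.h>
#include <errno.h>

struct used_elem { unsigned int id; unsigned int len; };

/* New-style convention, as in the patched handle_tx(): fill *used and
 * return 0 on success, or -ENOSPC when nothing is available, instead of
 * a magic ring-size sentinel the caller must know about. */
static int get_desc(struct used_elem *used, int avail)
{
	if (!avail)
		return -ENOSPC;
	used->id = 7; /* hypothetical head */
	used->len = 0;
	return 0;
}
```

The benefit is visible in the diff: the `head == vq->num` comparison becomes an `err == -ENOSPC` check, which no longer leaks the ring size into device code.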
[RFC PATCH V2 1/8] vhost: move get_rx_bufs to vhost.c
Move get_rx_bufs() to vhost.c and rename it to vhost_get_rx_bufs(). This helps to hide the vring internal layout from specific device implementations. The packed ring implementation will benefit from this. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 83 ++- drivers/vhost/vhost.c | 78 +++ drivers/vhost/vhost.h | 7 + 3 files changed, 88 insertions(+), 80 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8139bc7..57dfa63 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -658,83 +658,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk) return len; } -/* This is a multi-buffer version of vhost_get_desc, that works if - * vq has read descriptors only. - * @vq - the relevant virtqueue - * @datalen - data length we'll be reading - * @iovcount - returned count of io vectors we fill - * @log - vhost log - * @log_num - log offset - * @quota - headcount quota, 1 for big buffer - * returns number of buffer heads allocated, negative on error - */ -static int get_rx_bufs(struct vhost_virtqueue *vq, - struct vring_used_elem *heads, - int datalen, - unsigned *iovcount, - struct vhost_log *log, - unsigned *log_num, - unsigned int quota) -{ - unsigned int out, in; - int seg = 0; - int headcount = 0; - unsigned d; - int r, nlogs = 0; - /* len is always initialized before use since we are always called with - * datalen > 0. 
-*/ - u32 uninitialized_var(len); - - while (datalen > 0 && headcount < quota) { - if (unlikely(seg >= UIO_MAXIOV)) { - r = -ENOBUFS; - goto err; - } - r = vhost_get_vq_desc(vq, vq->iov + seg, - ARRAY_SIZE(vq->iov) - seg, , - , log, log_num); - if (unlikely(r < 0)) - goto err; - - d = r; - if (d == vq->num) { - r = 0; - goto err; - } - if (unlikely(out || in <= 0)) { - vq_err(vq, "unexpected descriptor format for RX: " - "out %d, in %d\n", out, in); - r = -EINVAL; - goto err; - } - if (unlikely(log)) { - nlogs += *log_num; - log += *log_num; - } - heads[headcount].id = cpu_to_vhost32(vq, d); - len = iov_length(vq->iov + seg, in); - heads[headcount].len = cpu_to_vhost32(vq, len); - datalen -= len; - ++headcount; - seg += in; - } - heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen); - *iovcount = seg; - if (unlikely(log)) - *log_num = nlogs; - - /* Detect overrun */ - if (unlikely(datalen > 0)) { - r = UIO_MAXIOV + 1; - goto err; - } - return headcount; -err: - vhost_discard_vq_desc(vq, headcount); - return r; -} - /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) @@ -784,9 +707,9 @@ static void handle_rx(struct vhost_net *net) while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) { sock_len += sock_hlen; vhost_len = sock_len + vhost_hlen; - headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len, - , vq_log, , - likely(mergeable) ? UIO_MAXIOV : 1); + headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len, + , vq_log, , + likely(mergeable) ? UIO_MAXIOV : 1); /* On error, stop handling until the next kick. 
*/ if (unlikely(headcount < 0)) goto out; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 1b3e8d2d..c57df71 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2098,6 +2098,84 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); +/* This is a multi-buffer version of vhost_get_desc, that works if + * vq has read descriptors only. + * @vq - the relevant virtqueue + * @datalen- data length we'll be reading + * @iovcount - returned count of io vectors we fill + * @log- vhost log + * @log_num- log offset + *
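The core of get_rx_bufs()/vhost_get_bufs() is a gather loop: keep taking buffer heads until datalen is covered or the quota runs out, and report an overrun if data remains. A simplified sketch (the fixed 4096-byte buffer size is an assumption for illustration; the real code takes each buffer's length from its iovecs):

```c
#include <assert.h>

/* Gather heads until datalen bytes are covered or quota is hit.
 * Returns the headcount, or -1 on overrun (datalen could not be
 * covered within the quota), mirroring the loop moved into vhost.c. */
static int gather_bufs(int datalen, int quota, int *headcount)
{
	const int buf_len = 4096; /* assumed per-buffer capacity */

	*headcount = 0;
	while (datalen > 0 && *headcount < quota) {
		datalen -= buf_len;
		(*headcount)++;
	}
	return datalen > 0 ? -1 : *headcount;
}
```

The quota parameter is what lets the same loop serve both mergeable buffers (quota UIO_MAXIOV) and big buffers (quota 1), as the `likely(mergeable) ? UIO_MAXIOV : 1` call site shows.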
[RFC PATCH V2 4/8] vhost_net: do not explicitly manipulate vhost_used_elem
Two helpers for setting/getting the used len were introduced to avoid explicitly manipulating vhost_used_elem in the zerocopy code. This will be used to hide used_elem internals and simplify the packed ring implementation. Signed-off-by: Jason Wang --- drivers/vhost/net.c | 11 +-- drivers/vhost/vhost.c | 12 ++-- drivers/vhost/vhost.h | 5 + 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 7ea2aee..7be8b55 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -337,9 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, int j = 0; for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) { - if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN) + if (vhost_get_used_len(vq, &vq->heads[i]) == + VHOST_DMA_FAILED_LEN) vhost_net_tx_err(net); - if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) { + if (VHOST_DMA_IS_DONE(vhost_get_used_len(vq, &vq->heads[i]))) { vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN; ++j; } else @@ -537,10 +538,8 @@ static void handle_tx(struct vhost_net *net) struct ubuf_info *ubuf; ubuf = nvq->ubuf_info + nvq->upend_idx; - vq->heads[nvq->upend_idx].elem.id = - cpu_to_vhost32(vq, used.elem.id); - vq->heads[nvq->upend_idx].elem.len = - VHOST_DMA_IN_PROGRESS; + vhost_set_used_len(vq, &used, VHOST_DMA_IN_PROGRESS); + vq->heads[nvq->upend_idx] = used; ubuf->callback = vhost_zerocopy_callback; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 8744dae..65954d6 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2100,11 +2100,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); -static void vhost_set_used_len(struct vhost_virtqueue *vq, - struct vhost_used_elem *used, int len) +void vhost_set_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used, int len) { used->elem.len = cpu_to_vhost32(vq, len); } +EXPORT_SYMBOL_GPL(vhost_set_used_len); + +int 
vhost_get_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used) +{ + return vhost32_to_cpu(vq, used->elem.len); +} +EXPORT_SYMBOL_GPL(vhost_get_used_len); /* This is a multi-buffer version of vhost_get_desc, that works if * vq has read descriptors only. diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 8399887..d57c875 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -198,6 +198,11 @@ int vhost_get_bufs(struct vhost_virtqueue *vq, unsigned *log_num, unsigned int quota, s16 *count); +void vhost_set_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used, + int len); +int vhost_get_used_len(struct vhost_virtqueue *vq, + struct vhost_used_elem *used); void vhost_discard_vq_desc(struct vhost_virtqueue *, int n); int vhost_vq_init_access(struct vhost_virtqueue *); -- 2.7.4
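The accessor pair keeps callers ignorant of where the length actually lives, which is the point of the patch. A minimal userspace analogue (simplified struct, hypothetical names mirroring vhost_set_used_len()/vhost_get_used_len()):

```c
#include <assert.h>
#include <stdint.h>

struct used_elem { uint32_t id; uint32_t len; };

/* Callers never touch ->len directly, so a packed-ring variant could
 * keep the length elsewhere without changing any caller. */
static void set_used_len(struct used_elem *used, uint32_t len)
{
	used->len = len;
}

static uint32_t get_used_len(const struct used_elem *used)
{
	return used->len;
}
```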
linux-next: manual merge of the drm tree with Linus' tree
Hi all, Today's linux-next merge of the drm tree got conflicts in: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h drivers/gpu/drm/vmwgfx/vmwgfx_kms.c between commit: 140bcaa23a1c ("drm/vmwgfx: Fix black screen and device errors when running without fbdev") from Linus' tree and commit: c3b9b1657344 ("drm/vmwgfx: Improve on hibernation") from the drm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 9116fe8baebc,9e60de95b863.. --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@@ -938,7 -947,8 +947,9 @@@ int vmw_kms_present(struct vmw_private int vmw_kms_update_layout_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); void vmw_kms_legacy_hotspot_clear(struct vmw_private *dev_priv); +void vmw_kms_lost_device(struct drm_device *dev); + int vmw_kms_suspend(struct drm_device *dev); + int vmw_kms_resume(struct drm_device *dev); int vmw_dumb_create(struct drm_file *file_priv, struct drm_device *dev, diff --cc drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index 3c824fd7cbf3,3628a9fe705f.. 
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@@ -2561,11 -2551,10 +2557,12 @@@ int vmw_kms_helper_resource_prepare(str if (res->backup) { ret = vmw_kms_helper_buffer_prepare(res->dev_priv, res->backup, interruptible, - res->dev_priv->has_mob); + res->dev_priv->has_mob, + false); if (ret) goto out_unreserve; + + ctx->buf = vmw_dmabuf_reference(res->backup); } ret = vmw_resource_validate(res); if (ret) @@@ -2863,12 -2850,49 +2860,59 @@@ int vmw_kms_set_config(struct drm_mode_ } +/** + * vmw_kms_lost_device - Notify kms that modesetting capabilities will be lost + * + * @dev: Pointer to the drm device + */ +void vmw_kms_lost_device(struct drm_device *dev) +{ + drm_atomic_helper_shutdown(dev); +} ++ + /** + * vmw_kms_suspend - Save modesetting state and turn modesetting off. + * + * @dev: Pointer to the drm device + * Return: 0 on success. Negative error code on failure. + */ + int vmw_kms_suspend(struct drm_device *dev) + { + struct vmw_private *dev_priv = vmw_priv(dev); + + dev_priv->suspend_state = drm_atomic_helper_suspend(dev); + if (IS_ERR(dev_priv->suspend_state)) { + int ret = PTR_ERR(dev_priv->suspend_state); + + DRM_ERROR("Failed kms suspend: %d\n", ret); + dev_priv->suspend_state = NULL; + + return ret; + } + + return 0; + } + + + /** + * vmw_kms_resume - Re-enable modesetting and restore state + * + * @dev: Pointer to the drm device + * Return: 0 on success. Negative error code on failure. + * + * State is resumed from a previous vmw_kms_suspend(). It's illegal + * to call this function without a previous vmw_kms_suspend(). + */ + int vmw_kms_resume(struct drm_device *dev) + { + struct vmw_private *dev_priv = vmw_priv(dev); + int ret; + + if (WARN_ON(!dev_priv->suspend_state)) + return 0; + + ret = drm_atomic_helper_resume(dev, dev_priv->suspend_state); + dev_priv->suspend_state = NULL; + + return ret; + } pgpZ5ofp8Ayc4.pgp Description: OpenPGP digital signature
[RFC PATCH V2 5/8] vhost: vhost_put_user() can accept metadata type
In the past we assumed that the used ring update was the only user of
vhost_put_user(). This may not be the case for the incoming packed ring,
which may update the descriptor ring for used entries. So introduce a new
type parameter.

Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 65954d6..dcac4d4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -847,7 +847,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	return __vhost_get_user_slow(vq, addr, size, type);
 }

-#define vhost_put_user(vq, x, ptr) \
+#define vhost_put_user(vq, x, ptr, type) \
 ({ \
 	int ret = -EFAULT; \
 	if (!vq->iotlb) { \
@@ -855,7 +855,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	} else { \
 		__typeof__(ptr) to = \
 			(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
-					   sizeof(*ptr), VHOST_ADDR_USED); \
+					   sizeof(*ptr), type); \
 		if (to != NULL) \
 			ret = __put_user(x, to); \
 		else \
@@ -1716,7 +1716,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 {
 	void __user *used;
 	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
-			   &vq->used->flags) < 0)
+			   &vq->used->flags, VHOST_ADDR_USED) < 0)
 		return -EFAULT;
 	if (unlikely(vq->log_used)) {
 		/* Make sure the flag is seen before log. */
@@ -1735,7 +1735,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 {
 	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
-			   vhost_avail_event(vq)))
+			   vhost_avail_event(vq), VHOST_ADDR_USED))
 		return -EFAULT;
 	if (unlikely(vq->log_used)) {
 		void __user *used;
@@ -2218,12 +2218,12 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	used = vq->used->ring + start;
 	for (i = 0; i < count; i++) {
 		if (unlikely(vhost_put_user(vq, heads[i].elem.id,
-					    &used[i].id))) {
+					    &used[i].id, VHOST_ADDR_USED))) {
 			vq_err(vq, "Failed to write used id");
 			return -EFAULT;
 		}
 		if (unlikely(vhost_put_user(vq, heads[i].elem.len,
-					    &used[i].len))) {
+					    &used[i].len, VHOST_ADDR_USED))) {
 			vq_err(vq, "Failed to write used len");
 			return -EFAULT;
 		}
@@ -2269,7 +2269,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vhost_used_elem *heads,
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
 	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
-			   &vq->used->idx)) {
+			   &vq->used->idx, VHOST_ADDR_USED)) {
 		vq_err(vq, "Failed to increment used idx");
 		return -EFAULT;
 	}
--
2.7.4
[RFC PATCH V2 3/8] vhost: do not use vring_used_elem
Instead of depending on the exported vring_used_elem, this patch switches
to a new internal structure, vhost_used_elem, which embeds vring_used_elem.
This can be used to let vhost record extra metadata for the incoming packed
ring layout.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   | 19 ++-
 drivers/vhost/scsi.c  | 10 +-
 drivers/vhost/vhost.c | 33 -
 drivers/vhost/vhost.h | 18 +++---
 drivers/vhost/vsock.c |  6 +++---
 5 files changed, 45 insertions(+), 41 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f821fcd..7ea2aee 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,10 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	int j = 0;

 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+		if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
 			vhost_net_tx_err(net);
-		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+		if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+			vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
 			++j;
 		} else
 			break;
@@ -363,7 +363,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_lock_bh();

 	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = success ?
+	vq->heads[ubuf->desc].elem.len = success ?
 		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
 	cnt = vhost_net_ubuf_put(ubufs);
@@ -422,7 +422,7 @@ static int vhost_net_enable_vq(struct vhost_net *n,

 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
-				    struct vring_used_elem *used_elem,
+				    struct vhost_used_elem *used_elem,
 				    struct iovec iov[], unsigned int iov_size,
 				    unsigned int *out_num, unsigned int *in_num)
 {
@@ -473,7 +473,7 @@ static void handle_tx(struct vhost_net *net)
 	size_t hdr_size;
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
-	struct vring_used_elem used;
+	struct vhost_used_elem used;
 	bool zcopy, zcopy_used;

 	mutex_lock(&vq->mutex);
@@ -537,9 +537,10 @@ static void handle_tx(struct vhost_net *net)
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;

-			vq->heads[nvq->upend_idx].id =
-				cpu_to_vhost32(vq, used.id);
-			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+			vq->heads[nvq->upend_idx].elem.id =
+				cpu_to_vhost32(vq, used.elem.id);
+			vq->heads[nvq->upend_idx].elem.len =
+				VHOST_DMA_IN_PROGRESS;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 654c71f..ac11412 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -67,7 +67,7 @@ struct vhost_scsi_inflight {
 struct vhost_scsi_cmd {
 	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-	struct vring_used_elem tvc_vq_used;
+	struct vhost_used_elem tvc_vq_used;
 	/* virtio-scsi initiator task attribute */
 	int tvc_task_attr;
 	/* virtio-scsi response incoming iovecs */
@@ -441,7 +441,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	struct virtio_scsi_event *event = &evt->event;
 	struct virtio_scsi_event __user *eventp;
-	struct vring_used_elem used;
+	struct vhost_used_elem used;
 	unsigned out, in;
 	int ret;
@@ -785,7 +785,7 @@ static void vhost_scsi_submission_work(struct work_struct *work)

 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 			   struct vhost_virtqueue *vq,
-			   struct vring_used_elem *used, unsigned out)
+			   struct vhost_used_elem *used, unsigned out)
 {
 	struct virtio_scsi_cmd_resp __user *resp;
 	struct virtio_scsi_cmd_resp rsp;
@@ -808,7 +808,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 	struct virtio_scsi_cmd_req v_req;
 	struct virtio_scsi_cmd_req_pi v_req_pi;
 	struct vhost_scsi_cmd *cmd;
-	struct vring_used_elem used;
+	struct
[RFC PATCH V2 2/8] vhost: hide used ring layout from device
We used to return the descriptor head from vhost_get_vq_desc() to the
device and pass it back to vhost_add_used() and its friends. This exposes
the internal used ring layout to the device, which makes it hard to extend
for e.g. the packed ring layout. So this patch tries to hide the used ring
layout by:

- letting vhost_get_vq_desc() return a pointer to struct vring_used_elem
- accepting a pointer to struct vring_used_elem in vhost_add_used() and
  vhost_add_used_and_signal()

This helps to hide the used ring layout and makes it easier to implement
the packed ring on top.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   |  46 +-
 drivers/vhost/scsi.c  |  62 +++
 drivers/vhost/vhost.c |  52 +-
 drivers/vhost/vhost.h |   9 +---
 drivers/vhost/vsock.c |  42 +-
 5 files changed, 112 insertions(+), 99 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 57dfa63..f821fcd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -422,22 +422,24 @@ static int vhost_net_enable_vq(struct vhost_net *n,

 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_virtqueue *vq,
+				    struct vring_used_elem *used_elem,
 				    struct iovec iov[], unsigned int iov_size,
 				    unsigned int *out_num, unsigned int *in_num)
 {
 	unsigned long uninitialized_var(endtime);
-	int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+	int r = vhost_get_vq_desc(vq, used_elem, vq->iov, ARRAY_SIZE(vq->iov),
 				  out_num, in_num, NULL, NULL);

-	if (r == vq->num && vq->busyloop_timeout) {
+	if (r == -ENOSPC && vq->busyloop_timeout) {
 		preempt_disable();
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(vq->dev, endtime) &&
 		       vhost_vq_avail_empty(vq->dev, vq))
 			cpu_relax();
 		preempt_enable();
-		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-				      out_num, in_num, NULL, NULL);
+		r = vhost_get_vq_desc(vq, used_elem, vq->iov,
+				      ARRAY_SIZE(vq->iov), out_num, in_num,
+				      NULL, NULL);
 	}

 	return r;
@@ -459,7 +461,6 @@ static void handle_tx(struct vhost_net *net)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 	unsigned out, in;
-	int head;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -472,6 +473,7 @@ static void handle_tx(struct vhost_net *net)
 	size_t hdr_size;
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
+	struct vring_used_elem used;
 	bool zcopy, zcopy_used;

 	mutex_lock(&vq->mutex);
@@ -494,20 +496,20 @@ static void handle_tx(struct vhost_net *net)
 			vhost_zerocopy_signal_used(net, vq);

-		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
-						ARRAY_SIZE(vq->iov),
-						&out, &in);
-		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
-			break;
+		err = vhost_net_tx_get_vq_desc(net, vq, &used, vq->iov,
+					       ARRAY_SIZE(vq->iov),
+					       &out, &in);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (err == -ENOSPC) {
 			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
 				vhost_disable_notify(&net->dev, vq);
 				continue;
 			}
 			break;
 		}
+		/* On error, stop handling until the next kick. */
+		if (unlikely(err < 0))
+			break;
 		if (in) {
 			vq_err(vq, "Unexpected descriptor format for TX: "
 			       "out %d, int %d\n", out, in);
@@ -535,7 +537,8 @@ static void handle_tx(struct vhost_net *net)
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;

-			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
+			vq->heads[nvq->upend_idx].id =
+				cpu_to_vhost32(vq, used.id);
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->callback =
[RFC PATCH V2 1/8] vhost: move get_rx_bufs to vhost.c
Move get_rx_bufs() to vhost.c and rename it to vhost_get_bufs(). This helps
to hide the vring internal layout from specific device implementations. The
packed ring implementation will benefit from this.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   | 83 ++-
 drivers/vhost/vhost.c | 78 +++
 drivers/vhost/vhost.h |  7 +
 3 files changed, 88 insertions(+), 80 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc7..57dfa63 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -658,83 +658,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 	return len;
 }

-/* This is a multi-buffer version of vhost_get_desc, that works if
- *	vq has read descriptors only.
- * @vq		- the relevant virtqueue
- * @datalen	- data length we'll be reading
- * @iovcount	- returned count of io vectors we fill
- * @log		- vhost log
- * @log_num	- log offset
- * @quota	- headcount quota, 1 for big buffer
- *	returns number of buffer heads allocated, negative on error
- */
-static int get_rx_bufs(struct vhost_virtqueue *vq,
-		       struct vring_used_elem *heads,
-		       int datalen,
-		       unsigned *iovcount,
-		       struct vhost_log *log,
-		       unsigned *log_num,
-		       unsigned int quota)
-{
-	unsigned int out, in;
-	int seg = 0;
-	int headcount = 0;
-	unsigned d;
-	int r, nlogs = 0;
-	/* len is always initialized before use since we are always called with
-	 * datalen > 0.
-	 */
-	u32 uninitialized_var(len);
-
-	while (datalen > 0 && headcount < quota) {
-		if (unlikely(seg >= UIO_MAXIOV)) {
-			r = -ENOBUFS;
-			goto err;
-		}
-		r = vhost_get_vq_desc(vq, vq->iov + seg,
-				      ARRAY_SIZE(vq->iov) - seg, &out,
-				      &in, log, log_num);
-		if (unlikely(r < 0))
-			goto err;
-
-		d = r;
-		if (d == vq->num) {
-			r = 0;
-			goto err;
-		}
-		if (unlikely(out || in <= 0)) {
-			vq_err(vq, "unexpected descriptor format for RX: "
-				"out %d, in %d\n", out, in);
-			r = -EINVAL;
-			goto err;
-		}
-		if (unlikely(log)) {
-			nlogs += *log_num;
-			log += *log_num;
-		}
-		heads[headcount].id = cpu_to_vhost32(vq, d);
-		len = iov_length(vq->iov + seg, in);
-		heads[headcount].len = cpu_to_vhost32(vq, len);
-		datalen -= len;
-		++headcount;
-		seg += in;
-	}
-	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
-	*iovcount = seg;
-	if (unlikely(log))
-		*log_num = nlogs;
-
-	/* Detect overrun */
-	if (unlikely(datalen > 0)) {
-		r = UIO_MAXIOV + 1;
-		goto err;
-	}
-	return headcount;
-err:
-	vhost_discard_vq_desc(vq, headcount);
-	return r;
-}
-
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_rx(struct vhost_net *net)
@@ -784,9 +707,9 @@ static void handle_rx(struct vhost_net *net)
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len,
-					&in, vq_log, &log_num,
-					likely(mergeable) ? UIO_MAXIOV : 1);
+		headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
+					   &in, vq_log, &log_num,
+					   likely(mergeable) ? UIO_MAXIOV : 1);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
 			goto out;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d..c57df71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2098,6 +2098,84 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);

+/* This is a multi-buffer version of vhost_get_desc, that works if
+ *	vq has read descriptors only.
+ * @vq		- the relevant virtqueue
+ * @datalen	- data length we'll be reading
+ * @iovcount	- returned count of io vectors we fill
+ * @log		- vhost log
+ * @log_num	- log offset
+ * @quota	-
[RFC PATCH V2 4/8] vhost_net: do not explicitly manipulate vhost_used_elem
Two helpers for setting/getting the used length are introduced to avoid
explicitly manipulating vhost_used_elem in the zerocopy code. This will be
used to hide used_elem internals and simplify the packed ring
implementation.

Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   | 11 +--
 drivers/vhost/vhost.c | 12 ++--
 drivers/vhost/vhost.h |  5 +
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7ea2aee..7be8b55 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,9 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	int j = 0;

 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
+		if (vhost_get_used_len(vq, &vq->heads[i]) ==
+		    VHOST_DMA_FAILED_LEN)
 			vhost_net_tx_err(net);
-		if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+		if (VHOST_DMA_IS_DONE(vhost_get_used_len(vq, &vq->heads[i]))) {
 			vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
 			++j;
 		} else
@@ -537,10 +538,8 @@ static void handle_tx(struct vhost_net *net)
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;

-			vq->heads[nvq->upend_idx].elem.id =
-				cpu_to_vhost32(vq, used.elem.id);
-			vq->heads[nvq->upend_idx].elem.len =
-				VHOST_DMA_IN_PROGRESS;
+			vhost_set_used_len(vq, &used, VHOST_DMA_IN_PROGRESS);
+			vq->heads[nvq->upend_idx] = used;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8744dae..65954d6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2100,11 +2100,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);

-static void vhost_set_used_len(struct vhost_virtqueue *vq,
-			       struct vhost_used_elem *used, int len)
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+			struct vhost_used_elem *used, int len)
 {
 	used->elem.len = cpu_to_vhost32(vq, len);
 }
+EXPORT_SYMBOL_GPL(vhost_set_used_len);
+
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+		       struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->elem.len);
+}
+EXPORT_SYMBOL_GPL(vhost_get_used_len);

 /* This is a multi-buffer version of vhost_get_desc, that works if
  * vq has read descriptors only.
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8399887..d57c875 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -198,6 +198,11 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
 		   unsigned *log_num,
 		   unsigned int quota,
 		   s16 *count);
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+			struct vhost_used_elem *used,
+			int len);
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+		       struct vhost_used_elem *used);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 int vhost_vq_init_access(struct vhost_virtqueue *);
--
2.7.4
[RFC PATCH V2 0/8] Packed ring for vhost
Hi all:

This RFC implements the packed ring layout.

The code was tested with the pmd implementation by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor change
was needed in the pmd code to kick the virtqueue, since it assumes a busy
polling backend.

Tests were done between localhost and guest. Testpmd (rxonly) in the guest
reports 2.4Mpps. Testpmd (txonly) reports about 2.1Mpps.

Notes: the event suppression/indirect descriptor support is compile-tested
only because of lacking driver support.

Changes from V1:
- Refactor vhost used elem code to avoid open coding on used elem
- Event suppression support (compile test only).
- Indirect descriptor support (compile test only).
- Zerocopy support.
- vIOMMU support.
- SCSI/VSOCK support (compile test only).
- Fix several bugs

For simplicity, I don't implement batching or other optimizations.

Please review.

Thanks

Jason Wang (8):
  vhost: move get_rx_bufs to vhost.c
  vhost: hide used ring layout from device
  vhost: do not use vring_used_elem
  vhost_net: do not explicitly manipulate vhost_used_elem
  vhost: vhost_put_user() can accept metadata type
  virtio: introduce packed ring defines
  vhost: packed ring support
  vhost: event suppression for packed ring

 drivers/vhost/net.c                | 138 ++-
 drivers/vhost/scsi.c               |  62 +--
 drivers/vhost/vhost.c              | 818 ++---
 drivers/vhost/vhost.h              |  46 ++-
 drivers/vhost/vsock.c              |  42 +-
 include/uapi/linux/virtio_config.h |   9 +
 include/uapi/linux/virtio_ring.h   |  32 ++
 7 files changed, 921 insertions(+), 226 deletions(-)

--
2.7.4
RE: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules
On Monday, March 26, 2018 10:40 AM, Wang, Wei W wrote:
> Subject: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled
> to kernel modules
>
> In some usages, e.g. virtio-balloon, a kernel module needs to know if page
> poisoning is in use. This patch exposes the page_poisoning_enabled function
> to kernel modules.
>
> Signed-off-by: Wei Wang
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Michael S. Tsirkin
> ---
>  mm/page_poison.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/page_poison.c b/mm/page_poison.c
> index e83fd44..762b472 100644
> --- a/mm/page_poison.c
> +++ b/mm/page_poison.c
> @@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf)
>  }
>  early_param("page_poison", early_page_poison_param);
>
> +/**
> + * page_poisoning_enabled - check if page poisoning is enabled
> + *
> + * Return true if page poisoning is enabled, or false if not.
> + */
>  bool page_poisoning_enabled(void)
>  {
>  	/*
> @@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
>  		(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
>  		debug_pagealloc_enabled()));
>  }
> +EXPORT_SYMBOL_GPL(page_poisoning_enabled);
>
>  static void poison_page(struct page *page)
>  {
> --
> 2.7.4

Could we get a review of this patch? We've reviewed other parts, and this
one seems to be the last part of this feature. Thanks.

Best,
Wei
Re: [PATCH v7 6/6] typec: tcpm: Add support for sink PPS related messages
On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds sink side support for Get_Status, Status, Get_PPS_Status
and PPS_Status handling. As there's the potential for a partner to respond
with Not_Supported, handling of this message is also added. Sending of
Not_Supported is added to handle messages received but not yet handled.

Signed-off-by: Adam Thomson
Acked-by: Heikki Krogerus

Reviewed-by: Guenter Roeck

---
 drivers/usb/typec/tcpm.c | 143 ---
 1 file changed, 134 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c
index 57a7d1a..7025a16 100644
--- a/drivers/usb/typec/tcpm.c
+++ b/drivers/usb/typec/tcpm.c
@@ -19,7 +19,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -113,6 +115,11 @@
 	S(SNK_TRYWAIT_VBUS),			\
 	S(BIST_RX),				\
 						\
+	S(GET_STATUS_SEND),			\
+	S(GET_STATUS_SEND_TIMEOUT),		\
+	S(GET_PPS_STATUS_SEND),			\
+	S(GET_PPS_STATUS_SEND_TIMEOUT),		\
+						\
 	S(ERROR_RECOVERY),			\
 	S(PORT_RESET),				\
 	S(PORT_RESET_WAIT_OFF)
@@ -143,6 +150,7 @@ enum pd_msg_request {
 	PD_MSG_NONE = 0,
 	PD_MSG_CTRL_REJECT,
 	PD_MSG_CTRL_WAIT,
+	PD_MSG_CTRL_NOT_SUPP,
 	PD_MSG_DATA_SINK_CAP,
 	PD_MSG_DATA_SOURCE_CAP,
 };
@@ -1398,10 +1406,42 @@ static int tcpm_validate_caps(struct tcpm_port *port, const u32 *pdo,
 /*
  * PD (data, control) command handling functions
  */
+static inline enum tcpm_state ready_state(struct tcpm_port *port)
+{
+	if (port->pwr_role == TYPEC_SOURCE)
+		return SRC_READY;
+	else
+		return SNK_READY;
+}
+
 static int tcpm_pd_send_control(struct tcpm_port *port,
 				enum pd_ctrl_msg_type type);

+static void tcpm_handle_alert(struct tcpm_port *port, const __le32 *payload,
+			      int cnt)
+{
+	u32 p0 = le32_to_cpu(payload[0]);
+	unsigned int type = usb_pd_ado_type(p0);
+
+	if (!type) {
+		tcpm_log(port, "Alert message received with no type");
+		return;
+	}
+
+	/* Just handling non-battery alerts for now */
+	if (!(type & USB_PD_ADO_TYPE_BATT_STATUS_CHANGE)) {
+		switch (port->state) {
+		case SRC_READY:
+		case SNK_READY:
+			tcpm_set_state(port, GET_STATUS_SEND, 0);
+			break;
+		default:
+			tcpm_queue_message(port, PD_MSG_CTRL_WAIT);
+			break;
+		}
+	}
+}
+
 static void tcpm_pd_data_request(struct tcpm_port *port,
 				 const struct pd_message *msg)
 {
@@ -1489,6 +1529,14 @@ static void tcpm_pd_data_request(struct tcpm_port *port,
 			tcpm_set_state(port, BIST_RX, 0);
 		}
 		break;
+	case PD_DATA_ALERT:
+		tcpm_handle_alert(port, msg->payload, cnt);
+		break;
+	case PD_DATA_BATT_STATUS:
+	case PD_DATA_GET_COUNTRY_INFO:
+		/* Currently unsupported */
+		tcpm_queue_message(port, PD_MSG_CTRL_NOT_SUPP);
+		break;
 	default:
 		tcpm_log(port, "Unhandled data message type %#x", type);
 		break;
@@ -1571,6 +1619,7 @@ static void tcpm_pd_ctrl_request(struct tcpm_port *port,
 		break;
 	case PD_CTRL_REJECT:
 	case PD_CTRL_WAIT:
+	case PD_CTRL_NOT_SUPP:
 		switch (port->state) {
 		case SNK_NEGOTIATE_CAPABILITIES:
 			/* USB PD specification, Figure 8-43 */
@@ -1690,12 +1739,75 @@ static void tcpm_pd_ctrl_request(struct tcpm_port *port,
 			break;
 		}
 		break;
+	case PD_CTRL_GET_SOURCE_CAP_EXT:
+	case PD_CTRL_GET_STATUS:
+	case PD_CTRL_FR_SWAP:
+	case PD_CTRL_GET_PPS_STATUS:
+	case PD_CTRL_GET_COUNTRY_CODES:
+		/* Currently not supported */
+		tcpm_queue_message(port, PD_MSG_CTRL_NOT_SUPP);
+		break;
 	default:
 		tcpm_log(port, "Unhandled ctrl message type %#x", type);
 		break;
 	}
 }

+static void tcpm_pd_ext_msg_request(struct tcpm_port *port,
+				    const struct pd_message *msg)
+{
+	enum pd_ext_msg_type type = pd_header_type_le(msg->header);
+	unsigned int data_size = pd_ext_header_data_size_le(msg->ext_msg.header);
+
+	if (!(msg->ext_msg.header && PD_EXT_HDR_CHUNKED)) {
+		tcpm_log(port, "Unchunked extended messages
Re: [PATCH v7 5/6] typec: tcpm: Represent source supply through power_supply
On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds a power_supply class instance to represent a PD source's
voltage and current properties. This provides an interface for reading
these properties from user-space or other drivers.

For PPS enabled Sources, this also provides write access to set the
current and voltage and allows for swapping between standard PDO and PPS
APDO.

As this represents a superset of the information provided in the fusb302
driver, the power_supply instance in that code is removed as part of this
change, so reverting the commit titled 'typec: tcpm: Represent source
supply through power_supply class'

Signed-off-by: Adam Thomson

Reviewed-by: Guenter Roeck

---
 drivers/usb/typec/Kconfig           |   1 +
 drivers/usb/typec/fusb302/Kconfig   |   2 +-
 drivers/usb/typec/fusb302/fusb302.c |  63 +-
 drivers/usb/typec/tcpm.c            | 245 +++-
 4 files changed, 248 insertions(+), 63 deletions(-)

diff --git a/drivers/usb/typec/Kconfig b/drivers/usb/typec/Kconfig
index bcb2744..1ef606d 100644
--- a/drivers/usb/typec/Kconfig
+++ b/drivers/usb/typec/Kconfig
@@ -48,6 +48,7 @@ if TYPEC
 config TYPEC_TCPM
 	tristate "USB Type-C Port Controller Manager"
 	depends on USB
+	select POWER_SUPPLY
 	help
 	  The Type-C Port Controller Manager provides a USB PD and USB Type-C
 	  state machine for use with Type-C Port Controllers.
diff --git a/drivers/usb/typec/fusb302/Kconfig b/drivers/usb/typec/fusb302/Kconfig
index 48a4f2f..fce099f 100644
--- a/drivers/usb/typec/fusb302/Kconfig
+++ b/drivers/usb/typec/fusb302/Kconfig
@@ -1,6 +1,6 @@
 config TYPEC_FUSB302
 	tristate "Fairchild FUSB302 Type-C chip driver"
-	depends on I2C && POWER_SUPPLY
+	depends on I2C
 	help
 	  The Fairchild FUSB302 Type-C chip driver that works with
 	  Type-C Port Controller Manager to provide USB PD and USB
diff --git a/drivers/usb/typec/fusb302/fusb302.c b/drivers/usb/typec/fusb302/fusb302.c
index 06794c0..6a8f279 100644
--- a/drivers/usb/typec/fusb302/fusb302.c
+++ b/drivers/usb/typec/fusb302/fusb302.c
@@ -18,7 +18,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -99,11 +98,6 @@ struct fusb302_chip {
 	/* lock for sharing chip states */
 	struct mutex lock;

-	/* psy + psy status */
-	struct power_supply *psy;
-	u32 current_limit;
-	u32 supply_voltage;
-
 	/* chip status */
 	enum toggling_mode toggling_mode;
 	enum src_current_status src_current_status;
@@ -861,13 +855,11 @@ static int tcpm_set_vbus(struct tcpc_dev *dev, bool on, bool charge)
 		chip->vbus_on = on;
 		fusb302_log(chip, "vbus := %s", on ? "On" : "Off");
 	}
-	if (chip->charge_on == charge) {
+	if (chip->charge_on == charge)
 		fusb302_log(chip, "charge is already %s",
 			    charge ? "On" : "Off");
-	} else {
+	else
 		chip->charge_on = charge;
-		power_supply_changed(chip->psy);
-	}

 done:
 	mutex_unlock(&chip->lock);
@@ -883,11 +875,6 @@ static int tcpm_set_current_limit(struct tcpc_dev *dev, u32 max_ma, u32 mv)
 	fusb302_log(chip, "current limit: %d ma, %d mv (not implemented)",
 		    max_ma, mv);

-	chip->supply_voltage = mv;
-	chip->current_limit = max_ma;
-
-	power_supply_changed(chip->psy);
-
 	return 0;
 }

@@ -1686,43 +1673,6 @@ static irqreturn_t fusb302_irq_intn(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }

-static int fusb302_psy_get_property(struct power_supply *psy,
-				    enum power_supply_property psp,
-				    union power_supply_propval *val)
-{
-	struct fusb302_chip *chip = power_supply_get_drvdata(psy);
-
-	switch (psp) {
-	case POWER_SUPPLY_PROP_ONLINE:
-		val->intval = chip->charge_on;
-		break;
-	case POWER_SUPPLY_PROP_VOLTAGE_NOW:
-		val->intval = chip->supply_voltage * 1000; /* mV -> µV */
-		break;
-	case POWER_SUPPLY_PROP_CURRENT_MAX:
-		val->intval = chip->current_limit * 1000; /* mA -> µA */
-		break;
-	default:
-		return -ENODATA;
-	}
-
-	return 0;
-}
-
-static enum power_supply_property fusb302_psy_properties[] = {
-	POWER_SUPPLY_PROP_ONLINE,
-	POWER_SUPPLY_PROP_VOLTAGE_NOW,
-	POWER_SUPPLY_PROP_CURRENT_MAX,
-};
-
-static const struct power_supply_desc fusb302_psy_desc = {
-	.name		= "fusb302-typec-source",
-	.type		= POWER_SUPPLY_TYPE_USB_TYPE_C,
-	.properties	= fusb302_psy_properties,
-	.num_properties	= ARRAY_SIZE(fusb302_psy_properties),
-	.get_property	= fusb302_psy_get_property,
-};
-
Re: [PATCH v7 5/6] typec: tcpm: Represent source supply through power_supply
On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds a power_supply class instance to represent a PD source's
voltage and current properties. This provides an interface for reading these
properties from user-space or other drivers.

For PPS enabled Sources, this also provides write access to set the current
and voltage and allows for swapping between standard PDO and PPS APDO.

As this represents a superset of the information provided in the fusb302
driver, the power_supply instance in that code is removed as part of this
change, so reverting the commit titled 'typec: tcpm: Represent source supply
through power_supply class'

Signed-off-by: Adam Thomson
Reviewed-by: Guenter Roeck
---
 drivers/usb/typec/Kconfig           |   1 +
 drivers/usb/typec/fusb302/Kconfig   |   2 +-
 drivers/usb/typec/fusb302/fusb302.c |  63 +-
 drivers/usb/typec/tcpm.c            | 245 +++-
 4 files changed, 248 insertions(+), 63 deletions(-)

diff --git a/drivers/usb/typec/Kconfig b/drivers/usb/typec/Kconfig
index bcb2744..1ef606d 100644
--- a/drivers/usb/typec/Kconfig
+++ b/drivers/usb/typec/Kconfig
@@ -48,6 +48,7 @@ if TYPEC
 config TYPEC_TCPM
 	tristate "USB Type-C Port Controller Manager"
 	depends on USB
+	select POWER_SUPPLY
 	help
 	  The Type-C Port Controller Manager provides a USB PD and USB
 	  Type-C state machine for use with Type-C Port Controllers.

diff --git a/drivers/usb/typec/fusb302/Kconfig b/drivers/usb/typec/fusb302/Kconfig
index 48a4f2f..fce099f 100644
--- a/drivers/usb/typec/fusb302/Kconfig
+++ b/drivers/usb/typec/fusb302/Kconfig
@@ -1,6 +1,6 @@
 config TYPEC_FUSB302
 	tristate "Fairchild FUSB302 Type-C chip driver"
-	depends on I2C && POWER_SUPPLY
+	depends on I2C
 	help
 	  The Fairchild FUSB302 Type-C chip driver that works with
 	  Type-C Port Controller Manager to provide USB PD and USB

diff --git a/drivers/usb/typec/fusb302/fusb302.c b/drivers/usb/typec/fusb302/fusb302.c
index 06794c0..6a8f279 100644
--- a/drivers/usb/typec/fusb302/fusb302.c
+++ b/drivers/usb/typec/fusb302/fusb302.c
@@ -18,7 +18,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -99,11 +98,6 @@ struct fusb302_chip {
 	/* lock for sharing chip states */
 	struct mutex lock;
 
-	/* psy + psy status */
-	struct power_supply *psy;
-	u32 current_limit;
-	u32 supply_voltage;
-
 	/* chip status */
 	enum toggling_mode toggling_mode;
 	enum src_current_status src_current_status;
@@ -861,13 +855,11 @@ static int tcpm_set_vbus(struct tcpc_dev *dev, bool on, bool charge)
 		chip->vbus_on = on;
 		fusb302_log(chip, "vbus := %s", on ? "On" : "Off");
 	}
-	if (chip->charge_on == charge) {
+	if (chip->charge_on == charge)
 		fusb302_log(chip, "charge is already %s",
 			    charge ? "On" : "Off");
-	} else {
+	else
 		chip->charge_on = charge;
-		power_supply_changed(chip->psy);
-	}
 
 done:
 	mutex_unlock(&chip->lock);
@@ -883,11 +875,6 @@ static int tcpm_set_current_limit(struct tcpc_dev *dev, u32 max_ma, u32 mv)
 	fusb302_log(chip, "current limit: %d ma, %d mv (not implemented)",
 		    max_ma, mv);
 
-	chip->supply_voltage = mv;
-	chip->current_limit = max_ma;
-
-	power_supply_changed(chip->psy);
-
 	return 0;
 }
@@ -1686,43 +1673,6 @@ static irqreturn_t fusb302_irq_intn(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static int fusb302_psy_get_property(struct power_supply *psy,
-				    enum power_supply_property psp,
-				    union power_supply_propval *val)
-{
-	struct fusb302_chip *chip = power_supply_get_drvdata(psy);
-
-	switch (psp) {
-	case POWER_SUPPLY_PROP_ONLINE:
-		val->intval = chip->charge_on;
-		break;
-	case POWER_SUPPLY_PROP_VOLTAGE_NOW:
-		val->intval = chip->supply_voltage * 1000; /* mV -> µV */
-		break;
-	case POWER_SUPPLY_PROP_CURRENT_MAX:
-		val->intval = chip->current_limit * 1000; /* mA -> µA */
-		break;
-	default:
-		return -ENODATA;
-	}
-
-	return 0;
-}
-
-static enum power_supply_property fusb302_psy_properties[] = {
-	POWER_SUPPLY_PROP_ONLINE,
-	POWER_SUPPLY_PROP_VOLTAGE_NOW,
-	POWER_SUPPLY_PROP_CURRENT_MAX,
-};
-
-static const struct power_supply_desc fusb302_psy_desc = {
-	.name = "fusb302-typec-source",
-	.type = POWER_SUPPLY_TYPE_USB_TYPE_C,
-	.properties = fusb302_psy_properties,
-	.num_properties = ARRAY_SIZE(fusb302_psy_properties),
-	.get_property = fusb302_psy_get_property,
-};
-
 static int init_gpio(struct fusb302_chip *chip)
 {
Re: [PATCH v7 1/6] typec: tcpm: Add core support for sink side PPS
On 03/23/2018 03:12 AM, Adam Thomson wrote:

This commit adds code to handle requesting of PPS APDOs. Switching
between standard PDOs and APDOs, and re-requesting an APDO to modify
operating voltage/current will be triggered by an external call into
TCPM.

Signed-off-by: Adam Thomson
Acked-by: Heikki Krogerus
Reviewed-by: Guenter Roeck
---
 drivers/usb/typec/tcpm.c | 517 ++-
 include/linux/usb/pd.h   |   4 +-
 include/linux/usb/tcpm.h |   1 +
 3 files changed, 509 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c
index 4c0fc54..1a66c9e 100644
--- a/drivers/usb/typec/tcpm.c
+++ b/drivers/usb/typec/tcpm.c
@@ -47,6 +47,7 @@
 	S(SNK_DISCOVERY_DEBOUNCE_DONE),		\
 	S(SNK_WAIT_CAPABILITIES),		\
 	S(SNK_NEGOTIATE_CAPABILITIES),		\
+	S(SNK_NEGOTIATE_PPS_CAPABILITIES),	\
 	S(SNK_TRANSITION_SINK),			\
 	S(SNK_TRANSITION_SINK_VBUS),		\
 	S(SNK_READY),				\
@@ -166,6 +167,16 @@ struct pd_mode_data {
 	struct typec_altmode_desc altmode_desc[SVID_DISCOVERY_MAX];
 };
 
+struct pd_pps_data {
+	u32 min_volt;
+	u32 max_volt;
+	u32 max_curr;
+	u32 out_volt;
+	u32 op_curr;
+	bool supported;
+	bool active;
+};
+
 struct tcpm_port {
 	struct device *dev;
@@ -233,6 +244,7 @@ struct tcpm_port {
 	struct completion swap_complete;
 	int swap_status;
 
+	unsigned int negotiated_rev;
 	unsigned int message_id;
 	unsigned int caps_count;
 	unsigned int hard_reset_count;
@@ -259,6 +271,7 @@ struct tcpm_port {
 	unsigned int max_snk_ma;
 	unsigned int max_snk_mw;
 	unsigned int operating_snk_mw;
+	bool update_sink_caps;
 
 	/* Requested current / voltage */
 	u32 current_limit;
@@ -275,8 +288,13 @@ struct tcpm_port {
 	/* VDO to retry if UFP responder replied busy */
 	u32 vdo_retry;
 
-	/* Alternate mode data */
+	/* PPS */
+	struct pd_pps_data pps_data;
+	struct completion pps_complete;
+	bool pps_pending;
+	int pps_status;
+
+	/* Alternate mode data */
 	struct pd_mode_data mode_data;
 	struct typec_altmode *partner_altmode[SVID_DISCOVERY_MAX];
 	struct typec_altmode *port_altmode[SVID_DISCOVERY_MAX];
@@ -494,6 +512,16 @@ static void tcpm_log_source_caps(struct tcpm_port *port)
 			  pdo_max_voltage(pdo), pdo_max_power(pdo));
 		break;
+	case PDO_TYPE_APDO:
+		if (pdo_apdo_type(pdo) == APDO_TYPE_PPS)
+			scnprintf(msg, sizeof(msg),
+				  "%u-%u mV, %u mA",
+				  pdo_pps_apdo_min_voltage(pdo),
+				  pdo_pps_apdo_max_voltage(pdo),
+				  pdo_pps_apdo_max_current(pdo));
+		else
+			strcpy(msg, "undefined APDO");
+		break;
 	default:
 		strcpy(msg, "undefined");
 		break;
@@ -777,11 +805,13 @@ static int tcpm_pd_send_source_caps(struct tcpm_port *port)
 		msg.header = PD_HEADER_LE(PD_CTRL_REJECT,
 					  port->pwr_role,
 					  port->data_role,
+					  port->negotiated_rev,
 					  port->message_id, 0);
 	} else {
 		msg.header = PD_HEADER_LE(PD_DATA_SOURCE_CAP,
 					  port->pwr_role,
 					  port->data_role,
+					  port->negotiated_rev,
 					  port->message_id,
 					  port->nr_src_pdo);
 	}
@@ -802,11 +832,13 @@ static int tcpm_pd_send_sink_caps(struct tcpm_port *port)
 		msg.header = PD_HEADER_LE(PD_CTRL_REJECT,
 					  port->pwr_role,
 					  port->data_role,
+					  port->negotiated_rev,
 					  port->message_id, 0);
 	} else {
 		msg.header = PD_HEADER_LE(PD_DATA_SINK_CAP,
 					  port->pwr_role,
 					  port->data_role,
+					  port->negotiated_rev,
 					  port->message_id,
 					  port->nr_snk_pdo);
 	}
@@ -1173,6 +1205,7 @@ static void
Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
On Thu, Mar 22, 2018 at 08:17:19AM -0700, Matthew Wilcox wrote:
> On Tue, Mar 13, 2018 at 11:34:53AM +0800, Aaron Lu wrote:
> > I wish there is a data structure that has the flexibility of list while
> > at the same time we can locate the Nth element in the list without the
> > need to iterate. That's what I'm looking for when developing clustered
> > allocation for order 0 pages. In the end, I had to use another place to
> > record where the Nth element is. I hope to send out v2 of that RFC
> > series soon but I'm still collecting data for it. I would appreciate if
> > people could take a look then :-)
>
> Sorry, I missed this. There is such a data structure -- the IDR, or
> possibly a bare radix tree, or we can build a better data structure on
> top of the radix tree (I talked about one called the XQueue a while ago).
>
> The IDR will automatically grow to whatever needed size, it stores
> pointers, you can find out quickly where the last allocated index is,
> you can remove from the middle of the array. Disadvantage is that it
> requires memory allocation to store the array of pointers, *but* it
> can always hold at least one entry. So if you have no memory, you can
> always return the one element in your IDR to the free pool and allocate
> from that page.

Thanks for the pointer, will take a look later. Currently I'm focusing
on finding real workloads that have zone lock contention issue.
[PATCH v3 4/5] arm64: introduce pfn_valid_region()
This is the preparation for further optimizing early_pfn_valid() on arm64.

Signed-off-by: Jia He
---
 arch/arm64/include/asm/page.h |  3 ++-
 arch/arm64/mm/init.c          | 25 -
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 60d02c8..da2cba3 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,7 +38,8 @@ extern void clear_page(void *to);
 typedef struct page *pgtable_t;
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-extern int pfn_valid(unsigned long);
+extern int pfn_valid(unsigned long pfn);
+extern int pfn_valid_region(unsigned long pfn, int *last_idx);
 #endif
 
 #include
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..06433d5 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -290,7 +290,30 @@ int pfn_valid(unsigned long pfn)
 	return memblock_is_map_memory(pfn << PAGE_SHIFT);
 }
 EXPORT_SYMBOL(pfn_valid);
-#endif
+
+int pfn_valid_region(unsigned long pfn, int *last_idx)
+{
+	unsigned long start_pfn, end_pfn;
+	struct memblock_type *type = &memblock.memory;
+
+	if (*last_idx != -1) {
+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+				type->regions[*last_idx].size);
+
+		if (pfn >= start_pfn && pfn < end_pfn)
+			return !memblock_is_nomap(
+					&type->regions[*last_idx]);
+	}
+
+	*last_idx = memblock_search_pfn_regions(pfn);
+	if (*last_idx == -1)
+		return false;
+
+	return !memblock_is_nomap(&type->regions[*last_idx]);
+}
+EXPORT_SYMBOL(pfn_valid_region);
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
 #ifndef CONFIG_SPARSEMEM
 static void __init arm64_memory_present(void)
-- 
2.7.4
[PATCH v3 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. in early_pfn_valid(), if pfn and
pfn+1 are in the same memblock region, we can record the last returned
memblock region index and check whether pfn+1 is still in the same region.

Currently it only improves the performance on arm64 and has no impact on
other arches.

Signed-off-by: Jia He
---
 arch/x86/include/asm/mmzone_32.h |  2 +-
 include/linux/mmzone.h           | 12 +---
 mm/page_alloc.c                  |  2 +-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73d8dd1..329d3ba 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -49,7 +49,7 @@ static inline int pfn_valid(int pfn)
 	return 0;
 }
 
-#define early_pfn_valid(pfn)	pfn_valid((pfn))
+#define early_pfn_valid(pfn, last_region_idx)	pfn_valid((pfn))
 
 #endif /* CONFIG_DISCONTIGMEM */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d797716..3a686af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1267,9 +1267,15 @@ static inline int pfn_present(unsigned long pfn)
 })
 #else
 #define pfn_to_nid(pfn)		(0)
-#endif
+#endif /*CONFIG_NUMA*/
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+#define early_pfn_valid(pfn, last_region_idx) \
+	pfn_valid_region(pfn, last_region_idx)
+#else
+#define early_pfn_valid(pfn, last_region_idx) pfn_valid(pfn)
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
 
-#define early_pfn_valid(pfn)	pfn_valid(pfn)
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1288,7 +1294,7 @@ struct mminit_pfnnid_cache {
 };
 
 #ifndef early_pfn_valid
-#define early_pfn_valid(pfn)	(1)
+#define early_pfn_valid(pfn, last_region_idx)	(1)
 #endif
 
 void memory_present(int nid, unsigned long start, unsigned long end);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0bb0274..debccf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5484,7 +5484,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn)) {
+		if (!early_pfn_valid(pfn, &idx)) {
 #if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
 			/*
 			 * Skip to the pfn preceding the next valid one (or
-- 
2.7.4
[PATCH v3 3/5] mm/memblock: introduce memblock_search_pfn_regions()
This API is the preparation for further optimizing early_pfn_valid().

Signed-off-by: Jia He
---
 include/linux/memblock.h | 2 ++
 mm/memblock.c            | 9 +
 2 files changed, 11 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index a8fb2ab..104bca6 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -207,6 +207,8 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
+int memblock_search_pfn_regions(unsigned long pfn);
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index 06c1a08..15fcde2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1661,6 +1661,15 @@ static int __init_memblock memblock_search(struct memblock_type *type, phys_addr
 	return -1;
 }
 
+/* search memblock with the input pfn, return the region idx */
+int __init_memblock memblock_search_pfn_regions(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	int mid = memblock_search(type, PFN_PHYS(pfn));
+
+	return mid;
+}
+
 bool __init memblock_is_reserved(phys_addr_t addr)
 {
 	return memblock_search(&memblock.reserved, addr) != -1;
-- 
2.7.4
[PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But there is
still some room for improvement. E.g. if pfn and pfn+1 are in the same
memblock region, we can simply pfn++ instead of doing the binary search
in memblock_next_valid_pfn.

This patch only works when CONFIG_HAVE_ARCH_PFN_VALID is enabled.

Signed-off-by: Jia He
---
 include/linux/memblock.h |  2 +-
 mm/memblock.c            | 73 +---
 mm/page_alloc.c          |  3 +-
 3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index efbbe4b..a8fb2ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long memblock_next_valid_pfn(unsigned long pfn);
+unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx);
 #endif
 
 /**
diff --git a/mm/memblock.c b/mm/memblock.c
index bea5a9c..06c1a08 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 	*out_nid = r->nid;
 }
 
-#ifdef CONFIG_HAVE_ARCH_PFN_VALID
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
-{
-	struct memblock_type *type = &memblock.memory;
-	unsigned int right = type->cnt;
-	unsigned int mid, left = 0;
-	phys_addr_t addr = PFN_PHYS(++pfn);
-
-	do {
-		mid = (right + left) / 2;
-
-		if (addr < type->regions[mid].base)
-			right = mid;
-		else if (addr >= (type->regions[mid].base +
-				  type->regions[mid].size))
-			left = mid + 1;
-		else {
-			/* addr is within the region, so pfn is valid */
-			return pfn;
-		}
-	} while (left < right);
-
-	if (right == type->cnt)
-		return -1UL;
-	else
-		return PHYS_PFN(type->regions[right].base);
-}
-#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
-
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
@@ -1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
+			int *last_idx)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	unsigned long start_pfn, end_pfn;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	/* fast path, return pfn+1 if next pfn is in the same region */
+	if (*last_idx != -1) {
+		start_pfn = PFN_DOWN(type->regions[*last_idx].base);
+		end_pfn = PFN_DOWN(type->regions[*last_idx].base +
+				type->regions[*last_idx].size);
+
+		if (pfn < end_pfn && pfn > start_pfn)
+			return pfn;
+	}
+
+	/* slow path, do the binary searching */
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			*last_idx = mid;
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+
+	*last_idx = right;
+
+	return PHYS_PFN(type->regions[*last_idx].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
					phys_addr_t align, phys_addr_t start,
					phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2a967f7..0bb0274 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	unsigned long end_pfn = start_pfn + size;
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
+	int idx = -1;
 	unsigned long nr_initialised = 0;
 	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
@@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
 			 * on our next iteration of the loop.
[PATCH v3 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it caused
a possible panic, so Daniel Vacek later reverted it.

However, memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID
is enabled. And as verified by Eugeniu Rosca, arm can benefit from this
commit. So retain memblock_next_valid_pfn.

Signed-off-by: Jia He
---
 include/linux/memblock.h |  4 
 mm/memblock.c            | 29 +
 mm/page_alloc.c          | 11 ++-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0257aee..efbbe4b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#endif
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff --git a/mm/memblock.c b/mm/memblock.c
index ba7c878..bea5a9c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 	*out_nid = r->nid;
 }
 
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
+
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c19f5ac..2a967f7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn))
+		if (!early_pfn_valid(pfn)) {
+#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
+			/*
+			 * Skip to the pfn preceding the next valid one (or
+			 * end_pfn), such that we hit a valid pfn (or end_pfn)
+			 * on our next iteration of the loop.
+			 */
+			pfn = memblock_next_valid_pfn(pfn) - 1;
+#endif
 			continue;
+		}
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
-- 
2.7.4
[PATCH v3 2/5] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") optimized the loop in memmap_init_zone(). But there is still some room for improvement. E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++ instead of doing the binary search in memblock_next_valid_pfn. This patch only works when CONFIG_HAVE_ARCH_PFN_VALID is enable. Signed-off-by: Jia He --- include/linux/memblock.h | 2 +- mm/memblock.c| 73 +--- mm/page_alloc.c | 3 +- 3 files changed, 47 insertions(+), 31 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index efbbe4b..a8fb2ab 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -204,7 +204,7 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn, #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ #ifdef CONFIG_HAVE_ARCH_PFN_VALID -unsigned long memblock_next_valid_pfn(unsigned long pfn); +unsigned long memblock_next_valid_pfn(unsigned long pfn, int *idx); #endif /** diff --git a/mm/memblock.c b/mm/memblock.c index bea5a9c..06c1a08 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1102,35 +1102,6 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid, *out_nid = r->nid; } -#ifdef CONFIG_HAVE_ARCH_PFN_VALID -unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn) -{ - struct memblock_type *type = - unsigned int right = type->cnt; - unsigned int mid, left = 0; - phys_addr_t addr = PFN_PHYS(++pfn); - - do { - mid = (right + left) / 2; - - if (addr < type->regions[mid].base) - right = mid; - else if (addr >= (type->regions[mid].base + - type->regions[mid].size)) - left = mid + 1; - else { - /* addr is within the region, so pfn is valid */ - return pfn; - } - } while (left < right); - - if (right == type->cnt) - return -1UL; - else - return PHYS_PFN(type->regions[right].base); -} -#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/ - /** * memblock_set_node - set node ID on memblock regions * @base: base of area to set node ID for @@ 
-1162,6 +1133,50 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size, } #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ +#ifdef CONFIG_HAVE_ARCH_PFN_VALID +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn, + int *last_idx) +{ + struct memblock_type *type = &memblock.memory; + unsigned int right = type->cnt; + unsigned int mid, left = 0; + unsigned long start_pfn, end_pfn; + phys_addr_t addr = PFN_PHYS(++pfn); + + /* fast path, return pfn+1 if next pfn is in the same region */ + if (*last_idx != -1) { + start_pfn = PFN_DOWN(type->regions[*last_idx].base); + end_pfn = PFN_DOWN(type->regions[*last_idx].base + + type->regions[*last_idx].size); + + if (pfn < end_pfn && pfn > start_pfn) + return pfn; + } + + /* slow path, fall back to the binary search */ + do { + mid = (right + left) / 2; + + if (addr < type->regions[mid].base) + right = mid; + else if (addr >= (type->regions[mid].base + + type->regions[mid].size)) + left = mid + 1; + else { + *last_idx = mid; + return pfn; + } + } while (left < right); + + if (right == type->cnt) + return -1UL; + + *last_idx = right; + + return PHYS_PFN(type->regions[*last_idx].base); +} +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/ + static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size, phys_addr_t align, phys_addr_t start, phys_addr_t end, int nid, ulong flags) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2a967f7..0bb0274 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5459,6 +5459,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, unsigned long end_pfn = start_pfn + size; pg_data_t *pgdat = NODE_DATA(nid); unsigned long pfn; + int idx = -1; unsigned long nr_initialised = 0; struct page *page; #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP @@ -5490,7 +5491,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, * end_pfn), such that we hit a valid pfn (or end_pfn) * on our next iteration of the loop. */ -
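The core idea of this patch can be sketched in plain userspace C: cache the index of the region that satisfied the last lookup, and only fall back to the binary search when the incremented pfn leaves that region. This is an illustrative model, not the kernel code; `struct region` and the `[start, end)` pfn representation are simplifications of memblock's base/size regions.

```c
#include <assert.h>

/* valid pfns are those inside some [start, end) region; regions are
 * sorted ascending and non-overlapping, mirroring memblock.memory */
struct region { unsigned long start, end; };

static unsigned long next_valid_pfn(const struct region *r, int cnt,
				    unsigned long pfn, int *last_idx)
{
	int left = 0, right = cnt, mid;

	pfn++;
	/* fast path: pfn+1 still falls inside the previously hit region,
	 * so no search is needed at all (the common, contiguous case) */
	if (*last_idx >= 0 && pfn > r[*last_idx].start && pfn < r[*last_idx].end)
		return pfn;

	/* slow path: binary search over the sorted regions */
	do {
		mid = (left + right) / 2;
		if (pfn < r[mid].start)
			right = mid;
		else if (pfn >= r[mid].end)
			left = mid + 1;
		else {
			*last_idx = mid;	/* remember for next call */
			return pfn;		/* pfn itself is valid */
		}
	} while (left < right);

	if (right == cnt)
		return (unsigned long)-1;	/* no valid pfn left */
	*last_idx = right;
	return r[right].start;	/* first pfn of the next region */
}
```

Note the fast path deliberately mirrors the patch's strict `pfn > start_pfn` comparison; a pfn equal to the region's first pfn takes the slow path.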
[PATCH v3 1/5] mm: page_alloc: retain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enabled
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") optimized the loop in memmap_init_zone(), but it introduced a possible panic, so Daniel Vacek later reverted it. However, memblock_next_valid_pfn() is still valid when CONFIG_HAVE_ARCH_PFN_VALID is enabled, and as verified by Eugeniu Rosca, arm can benefit from this commit. So retain memblock_next_valid_pfn(). Signed-off-by: Jia He --- include/linux/memblock.h | 4 mm/memblock.c | 29 + mm/page_alloc.c | 11 ++- 3 files changed, 43 insertions(+), 1 deletion(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 0257aee..efbbe4b 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn, i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid)) #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ +#ifdef CONFIG_HAVE_ARCH_PFN_VALID +unsigned long memblock_next_valid_pfn(unsigned long pfn); +#endif + /** * for_each_free_mem_range - iterate through free memblock areas * @i: u64 used as loop variable diff --git a/mm/memblock.c b/mm/memblock.c index ba7c878..bea5a9c 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid, *out_nid = r->nid; } +#ifdef CONFIG_HAVE_ARCH_PFN_VALID +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn) +{ + struct memblock_type *type = &memblock.memory; + unsigned int right = type->cnt; + unsigned int mid, left = 0; + phys_addr_t addr = PFN_PHYS(++pfn); + + do { + mid = (right + left) / 2; + + if (addr < type->regions[mid].base) + right = mid; + else if (addr >= (type->regions[mid].base + - type->regions[mid].size)) + left = mid + 1; + else { + /* addr is within the region, so pfn is valid */ + return pfn; + } + } while (left < right); + + if (right == type->cnt) + return -1UL; + else + return PHYS_PFN(type->regions[right].base); +} +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/ + /** *
memblock_set_node - set node ID on memblock regions * @base: base of area to set node ID for diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c19f5ac..2a967f7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, if (context != MEMMAP_EARLY) goto not_early; - if (!early_pfn_valid(pfn)) + if (!early_pfn_valid(pfn)) { +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID) + /* + * Skip to the pfn preceding the next valid one (or + * end_pfn), such that we hit a valid pfn (or end_pfn) + * on our next iteration of the loop. + */ + pfn = memblock_next_valid_pfn(pfn) - 1; +#endif continue; + } if (!early_pfn_in_nid(pfn, nid)) continue; if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised)) -- 2.7.4
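The skip performed by this hunk can be modeled in userspace: when pfn falls into a hole, jump straight to the pfn preceding the next valid one so the loop's pfn++ lands on a valid pfn, instead of testing every pfn in the hole. This sketch uses illustrative names (`init_loop`, linear lookups) rather than the kernel's, and counts loop iterations to show the saving.

```c
#include <assert.h>

struct region { unsigned long start, end; };	/* valid pfns: [start, end) */

static int pfn_is_valid(const struct region *r, int cnt, unsigned long pfn)
{
	int i;

	for (i = 0; i < cnt; i++)
		if (pfn >= r[i].start && pfn < r[i].end)
			return 1;
	return 0;
}

/* stand-in for memblock_next_valid_pfn(): first valid pfn above an
 * invalid @pfn, or @end if there is none (regions sorted ascending) */
static unsigned long next_valid(const struct region *r, int cnt,
				unsigned long pfn, unsigned long end)
{
	int i;

	for (i = 0; i < cnt; i++)
		if (pfn < r[i].start)
			return r[i].start;
	return end;
}

/* returns how many iterations the init loop executes */
static unsigned long init_loop(const struct region *r, int cnt,
			       unsigned long start, unsigned long end, int skip)
{
	unsigned long pfn, iters = 0;

	for (pfn = start; pfn < end; pfn++) {
		iters++;
		if (!pfn_is_valid(r, cnt, pfn)) {
			if (skip)	/* pfn preceding the next valid one */
				pfn = next_valid(r, cnt, pfn, end) - 1;
			continue;
		}
		/* the kernel would initialize the struct page here */
	}
	return iters;
}
```

With a 90-pfn hole between two regions, the skipping variant touches each hole only once instead of pfn by pfn.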
[PATCH v3 0/5] optimize memblock_next_valid_pfn and early_pfn_valid
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") tried to optimize the loop in memmap_init_zone(), but there is still some room for improvement. Patch 1 retains memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enabled. Patch 2 optimizes memblock_next_valid_pfn(). Patches 3~5 optimize early_pfn_valid(); I had to split them into parts because the changes are located across subsystems. I tested the pfn loop process in memmap_init(), which behaves the same as before. As for the performance improvement, after this set I can see the time overhead of memmap_init() reduced from 41313 us to 24345 us on my armv8a server (QDF2400 with 96G memory). Attached is the memblock region information from my server. [ 86.956758] Zone ranges: [ 86.959452] DMA [mem 0x0020-0x] [ 86.966041] Normal [mem 0x0001-0x0017] [ 86.972631] Movable zone start for each node [ 86.977179] Early memory node ranges [ 86.980985] node 0: [mem 0x0020-0x0021] [ 86.987666] node 0: [mem 0x0082-0x0307] [ 86.994348] node 0: [mem 0x0308-0x0308] [ 87.001029] node 0: [mem 0x0309-0x031f] [ 87.007710] node 0: [mem 0x0320-0x033f] [ 87.014392] node 0: [mem 0x0341-0x0563] [ 87.021073] node 0: [mem 0x0564-0x0567] [ 87.027754] node 0: [mem 0x0568-0x056d] [ 87.034435] node 0: [mem 0x056e-0x086f] [ 87.041117] node 0: [mem 0x0870-0x0871] [ 87.047798] node 0: [mem 0x0872-0x0894] [ 87.054479] node 0: [mem 0x0895-0x08ba] [ 87.061161] node 0: [mem 0x08bb-0x08bc] [ 87.067842] node 0: [mem 0x08bd-0x08c4] [ 87.074524] node 0: [mem 0x08c5-0x08e2] [ 87.081205] node 0: [mem 0x08e3-0x08e4] [ 87.087886] node 0: [mem 0x08e5-0x08fc] [ 87.094568] node 0: [mem 0x08fd-0x0910] [ 87.101249] node 0: [mem 0x0911-0x092e] [ 87.107930] node 0: [mem 0x092f-0x0930] [ 87.114612] node 0: [mem 0x0931-0x0963] [ 87.121293] node 0: [mem 0x0964-0x0e61] [ 87.127975] node 0: [mem 0x0e62-0x0e64] [ 87.134657] node 0: [mem 0x0e65-0x0fff] [ 87.141338] node 0: [mem 0x1080-0x17fe] [ 87.148019] node 0: [mem 0x1c00-0x1c00] [
87.154701] node 0: [mem 0x1c01-0x1c7f] [ 87.161383] node 0: [mem 0x1c81-0x7efb] [ 87.168064] node 0: [mem 0x7efc-0x7efd] [ 87.174746] node 0: [mem 0x7efe-0x7efe] [ 87.181427] node 0: [mem 0x7eff-0x7eff] [ 87.188108] node 0: [mem 0x7f00-0x0017] [ 87.194791] Initmem setup node 0 [mem 0x0020-0x0017] Without this patchset: [ 117.106153] Initmem setup node 0 [mem 0x0020-0x0017] [ 117.113677] before memmap_init [ 117.118195] after memmap_init >>> memmap_init takes 4518 us [ 117.121446] before memmap_init [ 117.154992] after memmap_init >>> memmap_init takes 33546 us [ 117.158241] before memmap_init [ 117.161490] after memmap_init >>> memmap_init takes 3249 us >>> totally takes 41313 us With this patchset: [ 87.194791] Initmem setup node 0 [mem 0x0020-0x0017] [ 87.202314] before memmap_init [ 87.206164] after memmap_init >>> memmap_init takes 3850 us [ 87.209416] before memmap_init [ 87.226662] after memmap_init >>> memmap_init takes 17246 us [ 87.229911] before memmap_init [ 87.233160] after memmap_init >>> memmap_init takes 3249 us >>> totally takes 24345 us Changelog: V3: - fix 2 issues reported by kbuild test robot V2: - rebase to mmotm latest - retain memblock_next_valid_pfn on arm64 - refine memblock_search_pfn_regions and pfn_valid_region Jia He (5): mm: page_alloc: retain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enabled mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() mm/memblock: introduce memblock_search_pfn_regions() arm64: introduce pfn_valid_region() mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() arch/arm64/include/asm/page.h | 3 ++- arch/arm64/mm/init.c | 25 ++- arch/x86/include/asm/mmzone_32.h | 2 +- include/linux/memblock.h | 6 + include/linux/mmzone.h
[PATCH v29 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the guest is using page poisoning. The guest writes to the poison_val config field to tell the host about the page poisoning value in use. Signed-off-by: Wei Wang Suggested-by: Michael S. Tsirkin Cc: Michael S. Tsirkin Cc: Michal Hocko Cc: Andrew Morton --- drivers/virtio/virtio_balloon.c | 10 ++ include/uapi/linux/virtio_balloon.h | 3 +++ 2 files changed, 13 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 18d24a4..6de9339 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -699,6 +699,7 @@ static struct file_system_type balloon_fs = { static int virtballoon_probe(struct virtio_device *vdev) { struct virtio_balloon *vb; + __u32 poison_val; int err; if (!vdev->config->get) { @@ -744,6 +745,11 @@ static int virtballoon_probe(struct virtio_device *vdev) vb->stop_cmd_id = cpu_to_virtio32(vb->vdev, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID); INIT_WORK(&vb->report_free_page_work, report_free_page_func); + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) { + memset(&poison_val, PAGE_POISON, sizeof(poison_val)); + virtio_cwrite(vb->vdev, struct virtio_balloon_config, + poison_val, &poison_val); + } } vb->nb.notifier_call = virtballoon_oom_notify; @@ -862,6 +868,9 @@ static int virtballoon_restore(struct virtio_device *vdev) static int virtballoon_validate(struct virtio_device *vdev) { + if (!page_poisoning_enabled()) + __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON); + __virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM); return 0; } @@ -871,6 +880,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, VIRTIO_BALLOON_F_FREE_PAGE_HINT, + VIRTIO_BALLOON_F_PAGE_POISON, }; static struct virtio_driver virtio_balloon_driver = { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index b2d86c2..8b93581 100644 --- a/include/uapi/linux/virtio_balloon.h +++
b/include/uapi/linux/virtio_balloon.h @@ -35,6 +35,7 @@ #define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */ +#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */ /* Size of a PFN in the balloon interface. */ #define VIRTIO_BALLOON_PFN_SHIFT 12 @@ -47,6 +48,8 @@ struct virtio_balloon_config { __u32 actual; /* Free page report command id, readonly by guest */ __u32 free_page_report_cmd_id; + /* Stores PAGE_POISON if page poisoning is in use */ + __u32 poison_val; }; #define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */ -- 2.7.4
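The probe hunk fills the 32-bit poison_val config field by memset()ing it with the one-byte PAGE_POISON pattern, replicating the byte across all four bytes. A standalone sketch (the helper name is illustrative):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* What the driver's memset(&poison_val, PAGE_POISON, 4) produces: a 32-bit
 * value whose every byte is the poison pattern, e.g. 0xaa -> 0xaaaaaaaa. */
static uint32_t poison_val_for(uint8_t page_poison)
{
	uint32_t v;

	memset(&v, page_poison, sizeof(v));	/* replicate the byte 4 times */
	return v;
}
```

The host can then compare any page it reads against this repeated pattern regardless of access width.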
[PATCH v29 1/4] mm: support reporting free page blocks
This patch adds support to walk through the free page blocks in the system and report them via a callback function. Some page blocks may leave the free list after zone->lock is released, so it is the caller's responsibility to either detect or prevent the use of such pages. One use example of this patch is to accelerate live migration by skipping the transfer of free pages reported from the guest. A popular method used by the hypervisor to track which part of memory is written during live migration is to write-protect all the guest memory. So, those pages that are reported as free pages but are written after the report function returns will be captured by the hypervisor, and they will be added to the next round of memory transfer. Signed-off-by: Wei Wang Signed-off-by: Liang Li Cc: Michal Hocko Cc: Michael S. Tsirkin Acked-by: Michal Hocko --- include/linux/mm.h | 6 mm/page_alloc.c | 96 ++ 2 files changed, 102 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index ad06d42..c72b5a9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1944,6 +1944,12 @@ extern void free_area_init_node(int nid, unsigned long * zones_size, unsigned long zone_start_pfn, unsigned long *zholes_size); extern void free_initmem(void); +extern int walk_free_mem_block(void *opaque, + int min_order, + int (*report_pfn_range)(void *opaque, + unsigned long pfn, + unsigned long num)); + /* * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK) * into the buddy system. The freed pages will be poisoned with pattern diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 635d7dd..d58de87 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4912,6 +4912,102 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) show_swap_cache_info(); } +/* + * Walk through a free page list and report the found pfn range via the + * callback. + * + * Return 0 if it completes the reporting. Otherwise, return the non-zero + * value returned from the callback.
+ */ +static int walk_free_page_list(void *opaque, + struct zone *zone, + int order, + enum migratetype mt, + int (*report_pfn_range)(void *, + unsigned long, + unsigned long)) +{ + struct page *page; + struct list_head *list; + unsigned long pfn, flags; + int ret = 0; + + spin_lock_irqsave(&zone->lock, flags); + list = &zone->free_area[order].free_list[mt]; + list_for_each_entry(page, list, lru) { + pfn = page_to_pfn(page); + ret = report_pfn_range(opaque, pfn, 1 << order); + if (ret) + break; + } + spin_unlock_irqrestore(&zone->lock, flags); + + return ret; +} + +/** + * walk_free_mem_block - Walk through the free page blocks in the system + * @opaque: the context passed from the caller + * @min_order: the minimum order of free lists to check + * @report_pfn_range: the callback to report the pfn range of the free pages + * + * If the callback returns a non-zero value, stop iterating the list of free + * page blocks. Otherwise, continue to report. + * + * Please note that there are no locking guarantees for the callback and + * that the reported pfn range might be freed or disappear after the + * callback returns so the caller has to be very careful how it is used. + * + * The callback itself must not sleep or perform any operations which would + * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC) + * or via any lock dependency. It is generally advisable to implement + * the callback as simple as possible and defer any heavy lifting to a + * different context. + * + * There is no guarantee that each free range will be reported only once + * during one walk_free_mem_block invocation. + * + * pfn_to_page on the given range is strongly discouraged and if there is + * an absolute need for that make sure to contact MM people to discuss + * potential problems. + * + * The function itself might sleep so it cannot be called from atomic + * contexts.
+ * + * In general low orders tend to be very volatile and so it makes more + * sense to query larger ones first for various optimizations which like + * ballooning etc... This will reduce the overhead as well. + * + * Return 0 if it completes the reporting. Otherwise, return the non-zero + * value returned from the callback. + */ +int walk_free_mem_block(void *opaque, + int
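The callback contract documented above (report (pfn, num) ranges, stop when the callback returns non-zero) can be exercised with a small userspace model. The walker below iterates a static stand-in "free list" instead of real zone free lists; all names are illustrative, not the kernel's.

```c
#include <assert.h>

struct range { unsigned long pfn, num; };
struct ctx { struct range out[8]; int n, limit; };

/* example consumer callback: collect up to limit ranges, then ask to stop */
static int report_cb(void *opaque, unsigned long pfn, unsigned long num)
{
	struct ctx *c = opaque;

	if (c->n == c->limit)
		return 1;	/* non-zero return tells the walker to stop */
	c->out[c->n].pfn = pfn;
	c->out[c->n].num = num;
	c->n++;
	return 0;
}

/* stand-in walker with the same contract as walk_free_mem_block():
 * report each range, propagate the first non-zero callback value */
static int walk_free_ranges(const struct range *free_list, int cnt,
			    int (*cb)(void *, unsigned long, unsigned long),
			    void *opaque)
{
	int i, ret;

	for (i = 0; i < cnt; i++) {
		ret = cb(opaque, free_list[i].pfn, free_list[i].num);
		if (ret)
			return ret;
	}
	return 0;
}
```

A real consumer (e.g. the virtio-balloon free-page reporter) would queue the ranges to a virtqueue in the callback rather than storing them.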
[PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules
In some usages, e.g. virtio-balloon, a kernel module needs to know if page poisoning is in use. This patch exposes the page_poisoning_enabled function to kernel modules. Signed-off-by: Wei Wang Cc: Andrew Morton Cc: Michal Hocko Cc: Michael S. Tsirkin --- mm/page_poison.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/mm/page_poison.c b/mm/page_poison.c index e83fd44..762b472 100644 --- a/mm/page_poison.c +++ b/mm/page_poison.c @@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf) } early_param("page_poison", early_page_poison_param); +/** + * page_poisoning_enabled - check if page poisoning is enabled + * + * Return true if page poisoning is enabled, or false if not. + */ bool page_poisoning_enabled(void) { /* @@ -29,6 +34,7 @@ bool page_poisoning_enabled(void) (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && debug_pagealloc_enabled())); } +EXPORT_SYMBOL_GPL(page_poisoning_enabled); static void poison_page(struct page *page) { -- 2.7.4
[PATCH v29 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates support for reporting hints of guest free pages to the host via virtio-balloon. The host requests the guest to report free page hints by sending a new cmd id to the guest via the free_page_report_cmd_id configuration register. When the guest starts to report, the first element added to the free page vq is the cmd id given by the host. When the guest finishes reporting all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added to the vq to tell the host that the reporting is done. The host polls the free page vq after sending the starting cmd id, so the guest doesn't need to kick after filling an element to the vq. The host may also request the guest to stop the reporting in advance by sending the stop cmd id to the guest via the configuration register. Signed-off-by: Wei Wang Signed-off-by: Liang Li Cc: Michael S. Tsirkin Cc: Michal Hocko --- drivers/virtio/virtio_balloon.c | 257 +++- include/uapi/linux/virtio_balloon.h | 4 + 2 files changed, 225 insertions(+), 36 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index dfe5684..18d24a4 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); static struct vfsmount *balloon_mnt; #endif +enum virtio_balloon_vq { + VIRTIO_BALLOON_VQ_INFLATE, + VIRTIO_BALLOON_VQ_DEFLATE, + VIRTIO_BALLOON_VQ_STATS, + VIRTIO_BALLOON_VQ_FREE_PAGE, + VIRTIO_BALLOON_VQ_MAX +}; + struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; + + /* Balloon's own wq for cpu-intensive work items */ + struct workqueue_struct *balloon_wq; + /* The free page reporting work item submitted to the balloon wq */ + struct work_struct report_free_page_work; /* The balloon servicing is delegated to a freezable workqueue.
*/ struct work_struct update_balloon_stats_work; @@ -63,6 +76,13 @@ struct virtio_balloon { spinlock_t stop_update_lock; bool stop_update; + /* The new cmd id received from host */ + uint32_t cmd_id_received; + /* The cmd id that is in use */ + __virtio32 cmd_id_use; + /* Buffer to store the stop sign */ + __virtio32 stop_cmd_id; + /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; @@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb) virtqueue_kick(vq); } -static void virtballoon_changed(struct virtio_device *vdev) -{ - struct virtio_balloon *vb = vdev->priv; - unsigned long flags; - - spin_lock_irqsave(&vb->stop_update_lock, flags); - if (!vb->stop_update) - queue_work(system_freezable_wq, &vb->update_balloon_size_work); - spin_unlock_irqrestore(&vb->stop_update_lock, flags); -} - static inline s64 towards_target(struct virtio_balloon *vb) { s64 target; @@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb) return target - vb->num_pages; } +static void virtballoon_changed(struct virtio_device *vdev) +{ + struct virtio_balloon *vb = vdev->priv; + unsigned long flags; + s64 diff = towards_target(vb); + + if (diff) { + spin_lock_irqsave(&vb->stop_update_lock, flags); + if (!vb->stop_update) + queue_work(system_freezable_wq, + &vb->update_balloon_size_work); + spin_unlock_irqrestore(&vb->stop_update_lock, flags); + } + + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { + virtio_cread(vdev, struct virtio_balloon_config, + free_page_report_cmd_id, &vb->cmd_id_received); + if (vb->cmd_id_received != + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) { + spin_lock_irqsave(&vb->stop_update_lock, flags); + if (!vb->stop_update) + queue_work(vb->balloon_wq, + &vb->report_free_page_work); + spin_unlock_irqrestore(&vb->stop_update_lock, flags); + } + } +} + static void update_balloon_size(struct virtio_balloon *vb) { u32 actual = vb->num_pages; @@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct *work)
static int init_vqs(struct virtio_balloon *vb) { - struct virtqueue *vqs[3]; - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request }; - static const char * const names[] = { "inflate", "deflate", "stats" }; - int err,
[PATCH v29 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, the first element added to the free page vq is
the cmd id given by host. When the guest finishes the reporting of all the
free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added to the vq to
tell host that the reporting is done. Host polls the free page vq after
sending the starting cmd id, so the guest doesn't need to kick after
filling an element to the vq.

Host may also request the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang
Signed-off-by: Liang Li
Cc: Michael S. Tsirkin
Cc: Michal Hocko
---
 drivers/virtio/virtio_balloon.c     | 257 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..18d24a4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif

+enum virtio_balloon_vq {
+	VIRTIO_BALLOON_VQ_INFLATE,
+	VIRTIO_BALLOON_VQ_DEFLATE,
+	VIRTIO_BALLOON_VQ_STATS,
+	VIRTIO_BALLOON_VQ_FREE_PAGE,
+	VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+	/* Balloon's own wq for cpu-intensive work items */
+	struct workqueue_struct *balloon_wq;
+	/* The free page reporting work item submitted to the balloon wq */
+	struct work_struct report_free_page_work;

 	/* The balloon servicing is delegated to a freezable workqueue.
 	 */
 	struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
 	spinlock_t stop_update_lock;
 	bool stop_update;

+	/* The new cmd id received from host */
+	uint32_t cmd_id_received;
+	/* The cmd id that is in use */
+	__virtio32 cmd_id_use;
+	/* Buffer to store the stop sign */
+	__virtio32 stop_cmd_id;
+
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	virtqueue_kick(vq);
 }

-static void virtballoon_changed(struct virtio_device *vdev)
-{
-	struct virtio_balloon *vb = vdev->priv;
-	unsigned long flags;
-
-	spin_lock_irqsave(&vb->stop_update_lock, flags);
-	if (!vb->stop_update)
-		queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-	spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
 	s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
 	return target - vb->num_pages;
 }

+static void virtballoon_changed(struct virtio_device *vdev)
+{
+	struct virtio_balloon *vb = vdev->priv;
+	unsigned long flags;
+	s64 diff = towards_target(vb);
+
+	if (diff) {
+		spin_lock_irqsave(&vb->stop_update_lock, flags);
+		if (!vb->stop_update)
+			queue_work(system_freezable_wq,
+				   &vb->update_balloon_size_work);
+		spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+	}
+
+	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+		virtio_cread(vdev, struct virtio_balloon_config,
+			     free_page_report_cmd_id, &vb->cmd_id_received);
+		if (vb->cmd_id_received !=
+		    VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+			spin_lock_irqsave(&vb->stop_update_lock, flags);
+			if (!vb->stop_update)
+				queue_work(vb->balloon_wq,
+					   &vb->report_free_page_work);
+			spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+		}
+	}
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
 	u32 actual = vb->num_pages;
@@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct *work)
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
-	int err, nvqs;
+	struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+
[PATCH v29 0/4] Virtio-balloon: support free page reporting
This patch series is separated from the previous "Virtio-balloon Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT, implemented by this series enables the virtio-balloon driver to report hints of guest free pages to the host. It can be used to accelerate live migration of VMs. Here is an introduction of this usage:

Live migration needs to transfer the VM's memory from the source machine to the destination round by round. For the 1st round, all the VM's memory is transferred. From the 2nd round, only the pieces of memory that were written by the guest (after the 1st round) are transferred. One method that is popularly used by the hypervisor to track which part of memory is written is to write-protect all the guest memory.

This feature enables the optimization of skipping the transfer of guest free pages during VM live migration. It is not a concern that the memory pages are used after they are given to the hypervisor as a hint of the free pages, because they will be tracked by the hypervisor and transferred in the subsequent round if they are used and written.

* Tests
- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second

- Test Results (results are averaged over 10 runs)
    - Idle Guest Live Migration Time:
      Optimization vs. Legacy = 261ms vs. 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
      - Live Migration Time (average):
        Optimization vs. Legacy = 1260ms vs. 2634ms --> ~51% reduction
      - Linux Compilation Time:
        Optimization vs. Legacy = 4min58s vs. 5min3s --> no obvious difference

ChangeLog:
v28->v29:
- mm/page_poison: only expose page_poison_enabled(), rather than the
  larger set of changes made in v28, as we are not 100% confident about
  that for now.
- virtio-balloon: use a separate buffer for the stop cmd, instead of
  having the start and stop cmd use the same buffer. This avoids the
  corner case that the start cmd is overridden by the stop cmd when the
  host has a delay in reading the start cmd.
v27->v28:
- mm/page_poison: move PAGE_POISON to page_poison.c and add a function
  to expose the page poison val to kernel modules.
v26->v27:
- add a new patch to expose page_poisoning_enabled to kernel modules
- virtio-balloon: set poison_val to 0x, instead of 0xaa
v25->v26: virtio-balloon changes only
- remove kicking the free page vq since the host now polls the vq after
  initiating the reporting
- report_free_page_func: detach all the used buffers after sending the
  stop cmd id. This avoids leaving the detaching burden (i.e. overhead)
  to the next cmd id. Detaching here isn't considered overhead since the
  stop cmd id has been sent, and the host has already moved forward.
v24->v25:
- mm: change walk_free_mem_block to return 0 (instead of true) on
  completing the report, and return a non-zero value from the callback,
  which stops the reporting.
- virtio-balloon:
  - use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
  - avoid __virtio_clear_bit when bailing out
  - a new method to avoid reporting the same cmd id to the host twice
  - destroy_workqueue can cancel the free page work when the feature is
    negotiated
  - fail probe when the free page vq size is less than 2
v23->v24:
- change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
  VIRTIO_BALLOON_F_FREE_PAGE_HINT
- kick when vq->num_free < half full, instead of "= half full"
- replace BUG_ON with bailing out
- check vb->balloon_wq in probe(); if null, bail out
- add a new feature bit for page poisoning
- solve the corner case of one cmd id being sent to the host twice
v22->v23:
- change to kick the device when the vq is half-way full
- open-code batch_free_page_sg into add_one_sg
- change cmd_id from "uint32_t" to "__virtio32"
- reserve one entry in the vq for the driver to send cmd_id, instead of
  busy-waiting for an available entry
- add a "stop_update" check before queue_work for prudence for now; will
  have a separate patch to discuss this flag check later
- init_vqs: change to put some variables on the stack for a simpler
  implementation
- add destroy_workqueue(vb->balloon_wq)
v21->v22:
- add_one_sg: some code and comment re-arrangement
- send_cmd_id: handle a corner case

For previous ChangeLogs, please reference https://lwn.net/Articles/743660/

Wei Wang (4):
  mm: support reporting free page blocks
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

 drivers/virtio/virtio_balloon.c | 267 +++-
 include/linux/mm.h              |   6 +
Re: [PATCH RFC] xfs, memcg: Call xfs_fs_nr_cached_objects() only in case of global reclaim
On Fri, Mar 23, 2018 at 03:39:53PM +0300, Kirill Tkhai wrote:
> On 23.03.2018 02:46, Dave Chinner wrote:
> > On Thu, Mar 22, 2018 at 07:52:37PM +0300, Kirill Tkhai wrote:
> >> Here is the problem I'm solving: https://lkml.org/lkml/2018/3/21/365.
> >
> > Oh, finally you tell me what the problem is that you're trying to
> > solve. I *asked this several times* and got no response. Thank you
> > for wasting so much of my time.
> >
> >> The current shrinker is not scalable. When there are many memcgs and
> >> mounts, the iteration of shrink_slab() in the case of global reclaim
> >> can take much time. There are timings of shrink_slab() at the link.
> >> A node with 200 containers may waste 4 seconds on global reclaim just
> >> to iterate over all shrinkers for all cgroups, call
> >> shrinker::count_objects() and receive zero objects.
> >
> > So, your problem is the way the memcgs were tacked onto the side
> > of the list_lru infrastructure and are iterated, which has basically
> > nothing to do with the way the low level XFS inode shrinker behaves.
> >
> > /me looks at the patches
> >
> > /me shudders at the thought of external "cache has freeable items"
> > control for the shrinking of vfs caches.
> >
> > Biggest problem I see with this is the scope for coherency bugs in
> > the "memcg shrinker has freeable items" tracking. If that happens,
> > there's no way of getting that memcg to run its shrinker ever
> > again. That seems very, very fragile and likely to be an endless
> > source of OOM bugs. The whole point of the shrinker polling
> > infrastructure is that it is not susceptible to this sort of bug.
> >
> > Really, the problem is that there's no separate list of memcg aware
> > shrinkers, so every registered shrinker has to be iterated just to
> > find the one shrinker that is memcg aware.
>
> I don't think the logic is difficult. There are generic rules, and the
> only task is to teach them to memcg-aware shrinkers. Currently, those
> are only the super block and workingset shrinkers, and both of them
> are based on the generic list_lru infrastructure. The shrinker-related
> bit is also cleared in generic code (shrink_slab()) only, and the
> algorithm doesn't allow clearing it without a double check. The only
> principal modification I'm thinking about is that we should clear the
> bit only when the shrinker is called with maximum parameters: priority
> and GFP.

Lots of "simple logic" combined together makes for a complex mass of
difficult to understand and debug code. And, really, you're not
suffering from a memcg problem - you're suffering from a "there are
thousands of shrinkers" scalability issue because superblocks have
per-superblock shrinker contexts and you have thousands of mounted
filesystems.

> There are a lot of performance-improving synchronizations in the
> kernel, and had they been refused, the kernel would have remained in
> the age of the big kernel lock.

That's false equivalence and hyperbole. The shrinkers are not limiting
what every Linux user can do with their hardware. It's not a fundamental
architectural limitation. These sorts of arguments are not convincing -
this is the second time I've told you this, so please stick to technical
arguments and drop the dramatic "anti-progress" conspiracy theory
bullshit.

> > So why not just do the simple thing which is create a separate
> > "memcg-aware" shrinker list (i.e. create shrinker_list_memcg
> > alongside shrinker_list) and only iterate the shrinker_list_memcg
> > when a memcg is passed to shrink_slab()?
> >
> > That means we'll only run 2 shrinkers per memcg at most (superblock
> > and working set) per memcg reclaim call. That's a simple 10-20 line
> > change, not a whole mess of new code and issues...
>
> It was the first optimization that came to my head, but there is
> almost no performance profit, since memcg-aware shrinkers are still
> called for every memcg, and they are the biggest part of the shrinkers
> in the system.

Sure, but a polling algorithm is not a fundamental performance
limitation. The problem here is that the memcg infrastructure has caused
an exponential explosion in shrinker scanning.

> >> Can't we call shrink of shared objects only for the top memcg?
> >> Something like this:
> >
> > What's a "shared object", and how is it any different to a normal
> > slab cache object?
>
> Sorry, it's an erratum. I'm speaking about cached objects. I mean
> something like the below. The patch makes cached objects be cleared
> outside the memcg iteration cycle (it makes no sense to call them for
> every memcg since the cached object logic just ignores the memcg).

The cached flag seems like a hack to me. It does nothing to address the
number of shrinker callouts (it actually increases them!), just tries to
hack around something you want a specific shrinker to avoid doing.

I've asked *repeatedly* for a description of the actual workload
problems the XFS shrinker behaviour is causing you. In the absence of
any workload description, I'm simply going to
Re: [PATCH v1 08/16] rtc: mediatek: remove unnecessary irq_dispose_mapping
On Fri, 2018-03-23 at 11:38 +0100, Alexandre Belloni wrote:
> On 23/03/2018 at 17:15:05 +0800, sean.w...@mediatek.com wrote:
> > From: Sean Wang
> >
> > It's unnecessary to do irq_dispose_mapping as a reverse operation for
> > platform_get_irq.
> >
> > Usually, irq_dispose_mapping should be called in the error path or on
> > module removal to release the resources that irq_of_parse_and_map
> > requested.
> >
> > Signed-off-by: Sean Wang
> > ---
> >  drivers/rtc/rtc-mt6397.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/rtc/rtc-mt6397.c b/drivers/rtc/rtc-mt6397.c
> > index b62eaa8..cefb83b 100644
> > --- a/drivers/rtc/rtc-mt6397.c
> > +++ b/drivers/rtc/rtc-mt6397.c
> > @@ -17,7 +17,6 @@
> >  #include
> >  #include
> >  #include
> > -#include
> >  #include
> >  #include
> >  #include
> > @@ -336,7 +335,7 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >  	if (ret) {
> >  		dev_err(&pdev->dev, "Failed to request alarm IRQ: %d: %d\n",
> >  			rtc->irq, ret);
> > -		goto out_dispose_irq;
> > +		return ret;
> >  	}
> >
> >  	device_init_wakeup(&pdev->dev, 1);
> > @@ -353,8 +352,7 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> >
> >  out_free_irq:
> >  	free_irq(rtc->irq, rtc->rtc_dev);
> > -out_dispose_irq:
> > -	irq_dispose_mapping(rtc->irq);
> > +
>
> Don't you still have a irq_create_mapping?
>

Sorry that I didn't mention at the beginning that the series must depend
on another patch [1]. With that patch, the irq_create_mapping job has
been moved from rtc to mfd, so it should be better here to clean up
irq_dispose_mapping in all paths.

[1] https://patchwork.kernel.org/patch/9954643/

> >  	return ret;
> >  }
> >
> > @@ -364,7 +362,6 @@ static int mtk_rtc_remove(struct platform_device *pdev)
> >
> >  	rtc_device_unregister(rtc->rtc_dev);
> >  	free_irq(rtc->irq, rtc->rtc_dev);
> > -	irq_dispose_mapping(rtc->irq);
> >
> >  	return 0;
> >  }
> > --
> > 2.7.4
> >
>
[no subject]
hi Linux https://goo.gl/BDc7Jv
Dennis Aberilla
linux-next: manual merge of the i2c tree with the asm-generic tree
Hi Wolfram,

Today's linux-next merge of the i2c tree got a conflict in:

  arch/blackfin/mach-bf561/boards/acvilon.c

between commit:

  120090af2745 ("arch: remove blackfin port")

from the asm-generic tree and commit:

  eb49778c8c6c ("i2c: pca-platform: drop gpio from platform data")

from the i2c tree.

I fixed it up (I removed the file) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating with
the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell
Re: [PATCH v2] scripts/kconfig: cleanup symbol handling code
On Mon, Mar 26, 2018 at 01:52:26AM +0900, Masahiro Yamada wrote:
> I want to see Kconfig improvements in a bigger picture.
>
> The changes below are noise.

That's understandable; I do agree that nothing here is _fundamentally_
broken at all, so no worries.

-- 
Cheers,
Joey Pabalinas