[PATCH] pci: Add a acs_disable option for pci kernel parameter

2017-10-26 Thread sbates
From: Stephen Bates 

On some servers the BIOS sets up ACS on any valid pci_dev in the
system. The kernel has no way of backing this out since the kernel
only turns ACS capabilities on.

This patch adds a new boot option to the pci kernel parameter called
"acs_disable" that will disable ACS. This is useful for PCI peer to
peer communication but can cause problems when IOVA isolation is
required and an IOMMU is enabled. Use with care.

Signed-off-by: Stephen Bates 
---
 Documentation/admin-guide/kernel-parameters.txt |  4 
 drivers/pci/pci.c   | 23 ---
 drivers/pci/pci.h   |  2 +-
 drivers/pci/probe.c |  4 ++--
 4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 0549662..695eb12 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2907,6 +2907,10 @@
earlydump   [X86] dump PCI config space before the kernel
changes anything
off [X86] don't probe for the PCI bus
+   acs_disable [PCIE] disable access control services. Note
+   this can interfere with IOVA isolation if an 
IOMMU
+   is enabled but can be necessary when doing PCI
+   peer to peer communication. Use with care.
bios[X86-32] force use of PCI BIOS, don't access
the hardware directly. Use this if your machine
has a non-standard PCI host bridge.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6078dfc..ce33608 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -110,6 +110,9 @@ unsigned int pcibios_max_latency = 255;
 /* If set, the PCIe ARI capability will not be used. */
 static bool pcie_ari_disabled;
 
+/* If set, the PCIe ACS capability will be disabled. */
+static bool pci_acs_disable;
+
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
 /* Force bridge_d3 for all PCIe ports */
@@ -1182,7 +1185,7 @@ void pci_restore_state(struct pci_dev *dev)
pci_restore_msi_state(dev);
 
/* Restore ACS and IOV configuration state */
-   pci_enable_acs(dev);
+   pci_config_acs(dev);
pci_restore_iov_state(dev);
 
dev->state_saved = false;
@@ -2821,11 +2824,23 @@ static void pci_std_enable_acs(struct pci_dev *dev)
 }
 
 /**
- * pci_enable_acs - enable ACS if hardware support it
+ * pci_config_acs - configure ACS
  * @dev: the PCI device
  */
-void pci_enable_acs(struct pci_dev *dev)
+void pci_config_acs(struct pci_dev *dev)
 {
+   int pos;
+
+   if (pci_acs_disable) {
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return;
+
+   dev_warn_ratelimited(>dev,
+"disabling ACS via pci_acs_disable\n");
+   pci_write_config_word(dev, pos + PCI_ACS_CTRL, 0);
+   }
+
if (!pci_acs_enable)
return;
 
@@ -5471,6 +5486,8 @@ static int __init pci_setup(char *str)
if (*str && (str = pcibios_setup(str)) && *str) {
if (!strcmp(str, "nomsi")) {
pci_no_msi();
+   } else if (!strcmp(str, "acs_disable")) {
+   pci_acs_disable = true;
} else if (!strcmp(str, "noaer")) {
pci_no_aer();
} else if (!strncmp(str, "realloc=", 8)) {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a6560c9..16d94d3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -338,7 +338,7 @@ static inline resource_size_t pci_resource_alignment(struct 
pci_dev *dev,
return resource_alignment(res);
 }
 
-void pci_enable_acs(struct pci_dev *dev);
+void pci_config_acs(struct pci_dev *dev);
 
 #ifdef CONFIG_PCIE_PTM
 void pci_ptm_init(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ff94b69..86c3299 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2020,8 +2020,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
/* Address Translation Services */
pci_ats_init(dev);
 
-   /* Enable ACS P2P upstream forwarding */
-   pci_enable_acs(dev);
+   /* Configure ACS P2P upstream forwarding */
+   pci_config_acs(dev);
 
/* Precision Time Measurement */
pci_ptm_init(dev);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management

2017-10-26 Thread Rafael J. Wysocki
On Thursday, October 26, 2017 10:41:40 PM CEST Wolfram Sang wrote:
> On Mon, Oct 16, 2017 at 03:31:17AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> > 
> > Optimize the power management in i2c-designware-platdrv by making it
> > set the DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED which
> > allows some code to be dropped from its PM callbacks.
> > 
> > First, setting DPM_FLAG_SMART_SUSPEND causes the intel-lpss driver
> > to avoid resuming i2c-designware-platdrv devices in its ->prepare
> > callback, so they can stay in runtime suspend after that point even
> > if the direct-complete feature is not used for them.
> > 
> > It also causes the PM core to avoid invoking "late" and "noirq"
> > suspend callbacks for these devices if they are in runtime suspend
> > at the beginning of the "late" phase of device suspend during
> > system suspend.  That guarantees dw_i2c_plat_suspend() to be
> > called for a device only if it is not in runtime suspend.
> > Moreover, it also causes the PM core to set the device's runtime
> > PM status to "active" after calling dw_i2c_plat_resume() for
> > it, so the driver doesn't need internal flags to avoid invoking
> > either dw_i2c_plat_suspend() or dw_i2c_plat_resume() twice in
> > a row.
> > 
> > Second, setting DPM_FLAG_LEAVE_SUSPENDED enables the optimization
> > allowing the device to stay suspended after system resume under
> > suitable conditions, so again the driver doesn't need to take
> > care of that by itself.
> > 
> > Accordingly, the internal "suspended" and "skip_resume" flags
> > used by the driver are not necessary any more, so drop them and
> > simplify the driver's PM callbacks.
> > 
> > Additionally, notice that dw_i2c_plat_complete() only needs
> > to schedule runtime PM for the device if platform firmware
> > has been involved in resuming the system, so make it call
> > pm_resume_via_firmware() to check that.
> > 
> > Signed-off-by: Rafael J. Wysocki 
> 
> So, if the designware maintainers ack it, I will, too.

Thanks!

I need to post a new revision of the core patches, so I'll send this one
again later.  Likely during the next cycle.

Thanks,
Rafael


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-26 Thread David Rientjes
On Thu, 26 Oct 2017, Johannes Weiner wrote:

> > The nack is for three reasons:
> > 
> >  (1) unfair comparison of root mem cgroup usage to bias against that mem 
> >  cgroup from oom kill in system oom conditions,
> > 
> >  (2) the ability of users to completely evade the oom killer by attaching
> >  all processes to child cgroups either purposefully or unpurposefully,
> >  and
> > 
> >  (3) the inability of userspace to effectively control oom victim  
> >  selection.
> 
> My apologies if my summary was too reductionist.
> 
> That being said, the arguments you repeat here have come up in
> previous threads and been responded to. This doesn't change my
> conclusion that your NAK is bogus.
> 

They actually haven't been responded to, Roman was working through v11 and 
made a change on how the root mem cgroup usage was calculated that was 
better than previous iterations but still not an apples to apples 
comparison with other cgroups.  The problem is that it the calculation for 
leaf cgroups includes additional memory classes, so it biases against 
processes that are moved to non-root mem cgroups.  Simply creating mem 
cgroups and attaching processes should not independently cause them to 
become more preferred: it should be a fair comparison between the root mem 
cgroup and the set of leaf mem cgroups as implemented.  That is very 
trivial to do with hierarchical oom cgroup scoring.

Since the ability of userspace to control oom victim selection is not 
addressed whatsoever by this patchset, and the suggested method cannot be 
implemented on top of this patchset as you have argued because it requires 
a change to the heuristic itself, the patchset needs to become complete 
before being mergeable.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] cpuset: Enable cpuset controller in default hierarchy

2017-10-26 Thread Christian Brauner
On Thu, Oct 26, 2017 at 02:12:01PM -0400, Waiman Long wrote:
> On 10/26/2017 10:39 AM, Tejun Heo wrote:
> > Hello, Waiman.
> >
> > On Wed, Oct 25, 2017 at 11:50:34AM -0400, Waiman Long wrote:
> >> Ping! Any comment on this patch?

Fwiw, I just saw this patch today for some weird reason.

> > Sorry about the lack of response.  Here are my two thoughts.
> >
> > 1. I'm not really sure about the memory part.  Mostly because of the
> >way it's configured and enforced is completely out of step with how
> >mm behaves in general.  I'd like to get more input from mm folks on
> >this.
> 
> Yes, I also have doubt about which of the additional features are being
> actively used. That is why the current patch exposes only the memory_migrate
> flag in addition to the core *cpus and *mems control files. All the
> other v1 features are not exposed waiting for further investigation and
> feedback. One way to get more feedback is to have something that people
> can play with. Maybe we could somehow tag it as experimental so that we
> can change the interface later on, when necessary, if you have concern
> about setting the APIs in stone.

This sounds like a reasonable approach to me. The cpuset controller is quite
important from a userspace (especially container) perspective. So making this
an experimental feature for a while to gather feedback seems worth it. I'd be
happy to carry/receive some experimental patches in a liblxc branch for cgroup
v2 to see where the current cpuset controller implementation currently gets us
and send/discuss patches where needed.

> 
> > 2. I want to think more about how we expose the effective settings.
> >Not that anything is wrong with what cpuset does, but more that I
> >wanna ensure that it's something we can follow in other cases where
> >we have similar hierarchical property propagation.
> 
> Currently, the effective setting is exposed via the effective_cpus and
> effective_mems control files. Unlike other controllers that control
> resources, cpuset is unique in the sense that it is propagating
> hierarchical constraints on CPUs and memory nodes down the tree. I
> understand your desire to have a unified framework that can be applied
> to most controllers, but I doubt cpuset is a good model in this regard.
> 
> Cheers,
> Longman
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management

2017-10-26 Thread Wolfram Sang
On Mon, Oct 16, 2017 at 03:31:17AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Optimize the power management in i2c-designware-platdrv by making it
> set the DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED which
> allows some code to be dropped from its PM callbacks.
> 
> First, setting DPM_FLAG_SMART_SUSPEND causes the intel-lpss driver
> to avoid resuming i2c-designware-platdrv devices in its ->prepare
> callback, so they can stay in runtime suspend after that point even
> if the direct-complete feature is not used for them.
> 
> It also causes the PM core to avoid invoking "late" and "noirq"
> suspend callbacks for these devices if they are in runtime suspend
> at the beginning of the "late" phase of device suspend during
> system suspend.  That guarantees dw_i2c_plat_suspend() to be
> called for a device only if it is not in runtime suspend.
> Moreover, it also causes the PM core to set the device's runtime
> PM status to "active" after calling dw_i2c_plat_resume() for
> it, so the driver doesn't need internal flags to avoid invoking
> either dw_i2c_plat_suspend() or dw_i2c_plat_resume() twice in
> a row.
> 
> Second, setting DPM_FLAG_LEAVE_SUSPENDED enables the optimization
> allowing the device to stay suspended after system resume under
> suitable conditions, so again the driver doesn't need to take
> care of that by itself.
> 
> Accordingly, the internal "suspended" and "skip_resume" flags
> used by the driver are not necessary any more, so drop them and
> simplify the driver's PM callbacks.
> 
> Additionally, notice that dw_i2c_plat_complete() only needs
> to schedule runtime PM for the device if platform firmware
> has been involved in resuming the system, so make it call
> pm_resume_via_firmware() to check that.
> 
> Signed-off-by: Rafael J. Wysocki 

So, if the designware maintainers ack it, I will, too.



signature.asc
Description: PGP signature


Re: [PATCH v3] cpuset: Enable cpuset controller in default hierarchy

2017-10-26 Thread Waiman Long
On 10/26/2017 10:39 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Wed, Oct 25, 2017 at 11:50:34AM -0400, Waiman Long wrote:
>> Ping! Any comment on this patch?
> Sorry about the lack of response.  Here are my two thoughts.
>
> 1. I'm not really sure about the memory part.  Mostly because of the
>way it's configured and enforced is completely out of step with how
>mm behaves in general.  I'd like to get more input from mm folks on
>this.

Yes, I also have doubt about which of the additional features are being
actively used. That is why the current patch exposes only the memory_migrate
flag in addition to the core *cpus and *mems control files. All the
other v1 features are not exposed waiting for further investigation and
feedback. One way to get more feedback is to have something that people
can play with. Maybe we could somehow tag it as experimental so that we
can change the interface later on, when necessary, if you have concern
about setting the APIs in stone.

> 2. I want to think more about how we expose the effective settings.
>Not that anything is wrong with what cpuset does, but more that I
>wanna ensure that it's something we can follow in other cases where
>we have similar hierarchical property propagation.

Currently, the effective setting is exposed via the effective_cpus and
effective_mems control files. Unlike other controllers that control
resources, cpuset is unique in the sense that it is propagating
hierarchical constraints on CPUs and memory nodes down the tree. I
understand your desire to have a unified framework that can be applied
to most controllers, but I doubt cpuset is a good model in this regard.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/5] gpio: gpiolib: Add chardev support for maintaining GPIO values on reset

2017-10-26 Thread Charles Keepax
On Thu, Oct 26, 2017 at 10:35:39AM +1030, Andrew Jeffery wrote:
> On Wed, 2017-10-25 at 09:14 +0100, Charles Keepax wrote:
> > On Fri, Oct 20, 2017 at 07:32:53PM +1030, Andrew Jeffery wrote:
> > > On Fri, 2017-10-20 at 09:27 +0200, Linus Walleij wrote:
> > > > I don't see it as helpful to give userspace control over whether the 
> > > > line
> > > > is persistent or not. It is more reasonable to assume persistance for
> > > > userspace use cases, don't you think? Whether the system goes to sleep
> > > > or the gpiochip resets should not make a door suddenly close or the
> > > > lights in the christmas tree go out, right? I think if the gpiochip 
> > > > supports
> > > > persistance of any kind, we should try to use it and not have userspace
> > > > provide flags for that.
> > > 
> > > Right. I guess the counter argument to your examples is if the gpio is
> > > controlling any active process that we don't want to continue if we've
> > > lost the capacity to monitor some other inputs (some kind of dead-man's 
> > > switch). But maybe the argument is that should be implemented in the
> > > kernel anyway?
> > > 
> > 
> > To me it certainly feels like decisions like this should live in
> > the kernel, your talking about things that could cause very weird
> > hardware behaviour if set wrong, so it makes sense to me to have
> > that responsibility guarded in the kernel.
> 
> I feel that taking this argument to its logical conclusion leads to
> never exporting any GPIOs to userspace and doing everything in the
> kernel. If userspace has exported the GPIO and is managing its state,
> then it can *already* cause very weird hardware behaviour if set wrong.
> The fact that userspace is controlling the GPIO state and not the
> kernel already says that the kernel doesn't know how to manage it, so
> why not expose the option for userspace to set the persistence, given
> that it should know what it's doing?

Admittedly yes, I guess it really comes down to use-cases.  There
are fairly strong use-cases to control GPIOs from user-space
that justify the risks. The use-cases for being able to set
non-persistent GPIOs from user-space seem less clear to me, but
if they exist I certainly don't have any objection.

Thanks,
Charles
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html