date:20160315

Re: [PATCH v18 19/22] richacl: Add richacl xattr handler

2016-03-15 Thread J. Bruce Fields

On Tue, Mar 15, 2016 at 12:10:14AM -0700, Christoph Hellwig wrote:
> On Fri, Mar 11, 2016 at 09:19:05AM -0500, J. Bruce Fields wrote:
> > On Fri, Mar 11, 2016 at 06:17:35AM -0800, Christoph Hellwig wrote:
> > > On Mon, Feb 29, 2016 at 09:17:24AM +0100, Andreas Gruenbacher wrote:
> > > > Add richacl xattr handler implementing the xattr operations based on the
> > > > get_richacl and set_richacl inode operations.
> > > 
> > > Given all the issues with Posix ACLs and selinux attributes these really
> > > should be proper syscalls instead of abusing the xattr interface.
> > 
> > What are those problems exactly?
> 
> That people get confused between the attr used by the xattr syscall
> interface and the attr used to store things on disk or the protocol.
> This has happened every time we have non-native support, e.g. XFS, NFS,
> CIFS, ntfs, etc.  And it's only going to become worse.

How has that confusion caused problems in practice?

--b.

[RFC v2 -next 2/2] virtio_net: Read the advised MTU

2016-03-15 Thread Aaron Conole

This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
exists, read the advised MTU and use it.

No proper error handling is provided for the case where a user changes the
negotiated MTU. A future commit will add proper error handling. Instead, a
warning is emitted if the guest changes the device MTU after previously
being given advice.

Signed-off-by: Aaron Conole 
---
v2:
* Whitespace cleanup in the last hunk
* Code style change around the pr_warn
* Additional test for mtu change before printing warning

 drivers/net/virtio_net.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 767ab11..429fe01 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -146,6 +146,7 @@ struct virtnet_info {
 	virtio_net_ctrl_ack ctrl_status;
 	u8 ctrl_promisc;
 	u8 ctrl_allmulti;
+	bool negotiated_mtu;
 };
 
 struct padded_vnet_hdr {
@@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
 
 static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 {
+	struct virtnet_info *vi = netdev_priv(dev);
 	if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
 		return -EINVAL;
+	if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
+		pr_warn("changing mtu while the advised mtu bit exists.");
 	dev->mtu = new_mtu;
 	return 0;
 }
@@ -1836,6 +1840,13 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
 		vi->has_cvq = true;
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
+		vi->negotiated_mtu = true;
+		dev->mtu = virtio_cread16(vdev,
+					  offsetof(struct virtio_net_config,
+						   mtu));
+	}
+
 	if (vi->any_header_sg)
 		dev->needed_headroom = vi->hdr_len;
 
@@ -2019,6 +2030,7 @@ static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
 	VIRTIO_NET_F_CTRL_MAC_ADDR,
 	VIRTIO_F_ANY_LAYOUT,
+	VIRTIO_NET_F_MTU,
 };
 
 static struct virtio_driver virtio_net_driver = {
-- 
2.5.0
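
Note on the patch above: the behaviour of the patched virtnet_change_mtu() can be
tried out in userspace with a small model. This is only a sketch under stated
assumptions, not driver code: struct model_dev and model_change_mtu() are
hypothetical names, the MIN_MTU/MAX_MTU values are assumed (the patch only
references them by name), and a counter stands in for pr_warn().

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed bounds; the patch only refers to MIN_MTU and MAX_MTU by name. */
#define MIN_MTU 68
#define MAX_MTU 65535

/* Hypothetical stand-in for the net_device / virtnet_info pair. */
struct model_dev {
	int mtu;
	bool negotiated_mtu;	/* set when the device offered VIRTIO_NET_F_MTU */
	int warnings;		/* counts warnings instead of calling pr_warn() */
};

/* Mirrors the control flow of virtnet_change_mtu() in the v2 patch. */
static int model_change_mtu(struct model_dev *dev, int new_mtu)
{
	if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
		return -1;	/* the driver returns -EINVAL here */
	/* Warn only when an MTU was advised and the value really changes. */
	if (dev->negotiated_mtu && dev->mtu != new_mtu) {
		dev->warnings++;
		fprintf(stderr, "changing mtu while the advised mtu bit exists.\n");
	}
	dev->mtu = new_mtu;
	return 0;
}
```

Setting the MTU to its current value stays silent, which is the effect of the
additional "dev->mtu != new_mtu" test introduced in v2.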

[RFC v2 -next 0/2] virtio-net: Advised MTU feature

2016-03-15 Thread Aaron Conole

The following series adds the ability for a hypervisor to set an MTU on the
guest during feature negotiation phase. This is useful for VM orchestration
when, for instance, tunneling is involved and the MTU of the various systems
should be homogeneous.

The first patch adds the feature bit as described in the proposed virtio spec
addition found at
https://lists.oasis-open.org/archives/virtio-dev/201603/msg1.html

The second patch adds a user of the bit, and a warning when the guest changes
the MTU from the hypervisor advised MTU. Future patches may add more thorough
error handling.

v2:
* Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
* Additional test before printing a warning

Aaron Conole (2):
  virtio: Start feature MTU support
  virtio_net: Read the advised MTU

 drivers/net/virtio_net.c        | 12 ++++++++++++
 include/uapi/linux/virtio_net.h |  3 +++
 2 files changed, 15 insertions(+)

-- 
2.5.0

[RFC v2 -next 1/2] virtio: Start feature MTU support

2016-03-15 Thread Aaron Conole

This commit adds the feature bit and associated mtu device entry for the
virtio network device. Future commits will make use of these bits to support
negotiated MTU.

Signed-off-by: Aaron Conole 
---
v2:
* No change

 include/uapi/linux/virtio_net.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index ec32293..41a6a01 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -55,6 +55,7 @@
 #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
+#define VIRTIO_NET_F_MTU 25	/* Device supports Default MTU Negotiation */
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -73,6 +74,8 @@ struct virtio_net_config {
 	 * Legal values are between 1 and 0x8000
 	 */
 	__u16 max_virtqueue_pairs;
+	/* Default maximum transmit unit advice */
+	__u16 mtu;
 } __attribute__((packed));
 
 /*
-- 
2.5.0
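
Note on the patch above: the two additions (feature bit 25 and a trailing __u16
in the packed config structure) can be checked with a standalone sketch. The
names prefixed with model_ are hypothetical, and the fields before
max_virtqueue_pairs are an assumption taken from the upstream header rather than
from this hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_F_MTU 25	/* bit number added by the patch */

/*
 * Local copy of the config layout after the patch. The mac and status
 * fields are assumed from the upstream header; only the last two fields
 * appear in the hunk itself.
 */
struct model_virtio_net_config {
	uint8_t mac[6];
	uint16_t status;
	uint16_t max_virtqueue_pairs;
	uint16_t mtu;		/* default maximum transmit unit advice */
} __attribute__((packed));

/* Test one bit in a 64-bit feature mask. */
static bool model_has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}

/* Byte offset a driver would pass to a 16-bit config-space read. */
static size_t model_mtu_offset(void)
{
	return offsetof(struct model_virtio_net_config, mtu);
}
```

Because the structure is packed, the new field sits directly behind
max_virtqueue_pairs with no padding, so existing offsets are unchanged.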

Re: [RFC -next 2/2] virtio_net: Read and use the advised MTU

2016-03-15 Thread Aaron Conole

Sergei Shtylyov  writes:

> Hello.

Hi Sergei,

> On 03/10/2016 05:28 PM, Aaron Conole wrote:
>
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously being
>> given advice.
>>
>> Signed-off-by: Aaron Conole 
>> ---
>>   drivers/net/virtio_net.c | 15 ++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..7175563 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
> [...]
>> @@ -1390,8 +1391,12 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>
>>   static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>   {
>> +struct virtnet_info *vi = netdev_priv(dev);
>>  if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>>  return -EINVAL;
>> +if (vi->negotiated_mtu == true) {
>> +pr_warn("changing mtu from negotiated mtu.");
>> +}
>
>{} not needed, see Documentation/CodingStyle.

Okay, I'll make sure to fix this with v2.

> [...]
>
> MBR, Sergei

Thanks so much for the review!

-Aaron

Re: [RFC -next 2/2] virtio_net: Read and use the advised MTU

2016-03-15 Thread Aaron Conole

Paolo Abeni  writes:

> On Thu, 2016-03-10 at 09:28 -0500, Aaron Conole wrote:
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>> 
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously being
>> given advice.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>>  drivers/net/virtio_net.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..7175563 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -146,6 +146,7 @@ struct virtnet_info {
>>  virtio_net_ctrl_ack ctrl_status;
>>  u8 ctrl_promisc;
>>  u8 ctrl_allmulti;
>> +bool negotiated_mtu;
>>  };
>>  
>>  struct padded_vnet_hdr {
>> @@ -1390,8 +1391,12 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>  
>>  static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>>  {
>> +struct virtnet_info *vi = netdev_priv(dev);
>>  if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>>  return -EINVAL;
>> +if (vi->negotiated_mtu == true) {
>
> why don't:
>
> if ((vi->negotiated_mtu == true) && (dev->mtu != new_mtu))
>
> ?

Okay, I'll put this test in.

>> +pr_warn("changing mtu from negotiated mtu.");
>> +}
>>  dev->mtu = new_mtu;
>>  return 0;
>>  }
>> @@ -1836,6 +1841,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>>  if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>>  vi->has_cvq = true;
>>  
>> +if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>> +vi->negotiated_mtu = true;
>> +dev->mtu = virtio_cread16(vdev,
>> +  offsetof(struct virtio_net_config,
>> +   mtu));
>> +}
>> +
>>  if (vi->any_header_sg)
>>  dev->needed_headroom = vi->hdr_len;
>>  
>> @@ -2017,8 +2029,9 @@ static unsigned int features[] = {
>>  VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>>  VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>>  VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
>> -VIRTIO_NET_F_CTRL_MAC_ADDR,
>> +VIRTIO_NET_F_CTRL_MAC_ADDR, 
>
> Here a trailing white space slipped-in.
>
> Otherwise LGTM.
>
> Paolo

D'oh! Okay, v2 will have this fixed.

>>  VIRTIO_F_ANY_LAYOUT,
>> +VIRTIO_NET_F_MTU,
>>  };
>>  
>>  static struct virtio_driver virtio_net_driver = {

Thanks so much for the review, Paolo!

-Aaron

Re: [PATCH 0/5] Modularize PCI_DW related drivers.

2016-03-15 Thread Murali Karicheri

On 03/01/2016 04:35 PM, Arnd Bergmann wrote:
> On Monday 29 February 2016 14:59:35 Kishon Vijay Abraham I wrote:
>>> }
>>>
>>> You just need to pass the same resource in here that you pass into
>>> pci_remap_iospace().
>>
>> I still seem to get the abort in ioremap_page_range().
>>
>> Here's the patch I used [3] and here's the kernel log [4].
>>
>> [3] -> http://pastebin.ubuntu.com/15241614/
>> [4] -> http://pastebin.ubuntu.com/15241637/
>>
>>
> 
> Sorry, I'm out of ideas here. The patch looks right to me, but the problem
> looks unchanged.
> 
>   Arnd
> 

Has there been any progress since Arnd's last response, or is this still TBD?

-- 
Murali Karicheri
Linux Kernel, Keystone

Re: [PATCH] kbuild: drop FORCE from PHONY targets

2016-03-15 Thread Andy Lutomirski

On Tue, Mar 15, 2016 at 1:45 PM, Michal Marek  wrote:
> Dne 15.3.2016 v 19:27 Andy Lutomirski napsal(a):
>> Fair enough, although I'm curious why this happens.  It might be worth
>> changing the docs to say that .PHONY is *not* a substitute for FORCE
>> in that context, then.
>
> These two are unrelated, except that FORCE is redundant for a .PHONY
> target. FORCE is our idiom to tell make to always remake the target and
> let us handle the dependencies manually. Listing a target as .PHONY
> tells make that the target will not produce a file of the same name
> (typically, "all", "install", etc).
>

Except that apparently if-changed doesn't work on .PHONY targets that
don't specify FORCE, which confuses me.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

[PATCH 1/2] btrfs: cleaner_kthread() doesn't need explicit freeze

2016-03-15 Thread Jiri Kosina

cleaner_kthread() is not marked freezable, and therefore calling 
try_to_freeze() in its context is a pointless no-op.

In addition to that, as has been clearly demonstrated by 80ad623edd2d 
("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"), it's perfectly 
valid / legal for cleaner_kthread() to stay scheduled out in an arbitrary 
place during suspend (in that particular example that was waiting for 
reading of extent pages), so there is no need to leave any traces of 
freezer in this kthread.

Fixes: 80ad623edd2d ("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()")
Fixes: 696249132158 ("btrfs: clear PF_NOFREEZE in cleaner_kthread()")
Signed-off-by: Jiri Kosina 
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4545e2e..d8d68af 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1830,7 +1830,7 @@ static int cleaner_kthread(void *arg)
 		 */
 		btrfs_delete_unused_bgs(root->fs_info);
 sleep:
-		if (!try_to_freeze() && !again) {
+		if (!again) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (!kthread_should_stop())
 				schedule();
-- 
Jiri Kosina
SUSE Labs
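
Note on the patch above: the claim that try_to_freeze() is a no-op for a thread
not marked freezable can be illustrated with a userspace model. This is a sketch
only; every model_ name and the flag value are hypothetical, and the real
freezer has more conditions than the single flag check shown here.

```c
#include <stdbool.h>

/* Hypothetical flag value for this model; the kernel defines its own. */
#define MODEL_PF_NOFREEZE 0x1u

struct model_task {
	unsigned int flags;
};

/* Set while a suspend-style freeze is in progress. */
static bool model_system_freezing;

/* A task is asked to freeze only if it has not opted out. */
static bool model_freezing(const struct model_task *t)
{
	if (!model_system_freezing)
		return false;
	return !(t->flags & MODEL_PF_NOFREEZE);
}

/* Returns true only if the task actually entered the freezer. */
static bool model_try_to_freeze(const struct model_task *t)
{
	return model_freezing(t);
}

/* Sleep condition in cleaner_kthread() before the patch. */
static bool sleep_cond_before(const struct model_task *t, bool again)
{
	return !model_try_to_freeze(t) && !again;
}

/* Sleep condition after the patch. */
static bool sleep_cond_after(bool again)
{
	return !again;
}
```

For a task that keeps the no-freeze flag set, both conditions evaluate
identically for every value of again, which is why the call can be dropped.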

[PATCH 2/2] btrfs: transaction_kthread() is not freezable

2016-03-15 Thread Jiri Kosina

transaction_kthread() is calling try_to_freeze(), but that's just an
expensive no-op given the fact that the thread is not marked freezable.

After removing this, disk-io.c is now independent of the freezer API.

Signed-off-by: Jiri Kosina 
---
 fs/btrfs/disk-io.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d8d68af..4c7361a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include <linux/freezer.h>
 #include 
 #include 
 #include 
@@ -1920,14 +1919,12 @@ sleep:
 		if (unlikely(test_bit(BTRFS_FS_STATE_ERROR,
 				      &root->fs_info->fs_state)))
 			btrfs_cleanup_transaction(root);
-		if (!try_to_freeze()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			if (!kthread_should_stop() &&
-			    (!btrfs_transaction_blocked(root->fs_info) ||
-			     cannot_commit))
-				schedule_timeout(delay);
-			__set_current_state(TASK_RUNNING);
-		}
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (!kthread_should_stop() &&
+		    (!btrfs_transaction_blocked(root->fs_info) ||
+		     cannot_commit))
+			schedule_timeout(delay);
+		__set_current_state(TASK_RUNNING);
 	} while (!kthread_should_stop());
 	return 0;
 }

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH 8/8] sched: prefer cpufreq_scale_freq_capacity

2016-03-15 Thread Michael Turquette

Quoting Dietmar Eggemann (2016-03-15 12:13:58)
> On 14/03/16 05:22, Michael Turquette wrote:
> > arch_scale_freq_capacity is weird. It specifies an arch hook for an
> > implementation that could easily vary within an architecture or even a
> > chip family.
> > 
> > This patch helps to mitigate this weirdness by defaulting to the
> > cpufreq-provided implementation, which should work for all cases where
> > CONFIG_CPU_FREQ is set.
> > 
> > If CONFIG_CPU_FREQ is not set, then try to use an implementation
> > provided by the architecture. Failing that, fall back to
> > SCHED_CAPACITY_SCALE.
> > 
> > It may be desirable for cpufreq drivers to specify their own
> > implementation of arch_scale_freq_capacity in the future. The same is
> > true for platform code within an architecture. In both cases an
> > efficient implementation selector will need to be created and this patch
> > adds a comment to that effect.
> 
> For me this independence of the scheduler code towards the actual
> implementation of the Frequency Invariant Engine (FIE) was actually a
> feature.

I do not agree that it is a strength; I think it is confusing. My
opinion is that cpufreq drivers should implement
arch_scale_freq_capacity. Having a sane fallback
(cpufreq_scale_freq_capacity) simply means that you can remove the
boilerplate from the arm32 and arm64 code, which is a win.

Furthermore, if we have multiple competing implementations of
arch_scale_freq_invariance, wouldn't it be better for all of them to
live in cpufreq drivers? This means we would only need to implement a
single run-time "selector".

On the other hand, if the implementation lives in arch code and we have
various implementations of arch_scale_freq_capacity within an
architecture, then each arch would need to implement this selector
function. Even worse then if we have a split where some implementations
live in drivers/cpufreq (e.g. intel_pstate) and others in arch/arm and
others in arch/arm64 ... now we have three selectors.

Note that this has nothing to do with cpu microarch invariance. I'm
happy for that to stay in arch code because we can have heterogeneous
cpus that do not scale frequency, and thus would not enable cpufreq.
But if your platform scales cpu frequency, then really cpufreq should be
in the loop.

> 
> In EAS RFC5.2 (linux-arm.org/linux-power.git energy_model_rfc_v5.2 ,
> which hasn't been posted to LKML) we establish the link in the ARCH code
> (arch/arm64/include/asm/topology.h).

Right, sorry again about preemptively posting the patch. Total brainfart
on my part.

> 
> #ifdef CONFIG_CPU_FREQ
> #define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> ...
> +#endif

The above is no longer necessary with this patch. Same question as
above: why insist on the arch boilerplate?

Regards,
Mike

> 
> > 
> > Signed-off-by: Michael Turquette 
> > ---
> >  kernel/sched/sched.h | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 469d11d..37502ea 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1368,7 +1368,21 @@ static inline int hrtick_enabled(struct rq *rq)
> >  #ifdef CONFIG_SMP
> >  extern void sched_avg_update(struct rq *rq);
> >  
> > -#ifndef arch_scale_freq_capacity
> > +/*
> > + * arch_scale_freq_capacity can be implemented by cpufreq, platform code or
> > + * arch code. We select the cpufreq-provided implementation first. If it
> > + * doesn't exist then we default to any other implementation provided from
> > + * platform/arch code. If those do not exist then we use the default
> > + * SCHED_CAPACITY_SCALE value below.
> > + *
> > + * Note that if cpufreq drivers or platform/arch code have competing
> > + * implementations it is up to those subsystems to select one at runtime with
> > + * an efficient solution, as we cannot tolerate the overhead of indirect
> > + * functions (e.g. function pointers) in the scheduler fast path
> > + */
> > +#ifdef CONFIG_CPU_FREQ
> > +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> > +#elif !defined(arch_scale_freq_capacity)
> >  static __always_inline
> >  unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
> >  {
> >

Re: [PATCH 2/2] nouveau: use new vga_switcheroo power domain.

2016-03-15 Thread Lukas Wunner

Hi Dave,

On Thu, Mar 10, 2016 at 08:04:26AM +1000, Dave Airlie wrote:
> On 10 March 2016 at 00:40, Lukas Wunner  wrote:
> > On Wed, Mar 09, 2016 at 04:14:05PM +1000, Dave Airlie wrote:
> >> From: Dave Airlie 
> >>
> >> This fixes GPU auto powerdown on the Lenovo W541,
> >> since we advertise Windows 2013 to the ACPI layer.
> >>
> >> Signed-off-by: Dave Airlie 
> >> ---
> >>  drivers/gpu/drm/nouveau/nouveau_vga.c | 10 +++---
> >>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c
> >> index af89c36..b987427f 100644
> >> --- a/drivers/gpu/drm/nouveau/nouveau_vga.c
> >> +++ b/drivers/gpu/drm/nouveau/nouveau_vga.c
> >> @@ -101,8 +101,12 @@ nouveau_vga_init(struct nouveau_drm *drm)
> >>   runtime = true;
> >>   vga_switcheroo_register_client(dev->pdev, &nouveau_switcheroo_ops, runtime);
> >>
> >> - if (runtime && nouveau_is_v1_dsm() && !nouveau_is_optimus())
> >> - vga_switcheroo_init_domain_pm_ops(drm->dev->dev, &drm->vga_pm_domain);
> >> + if (runtime) {
> >> + if (nouveau_is_v1_dsm() && !nouveau_is_optimus())
> >
> > The " && !nouveau_is_optimus()" can be dropped because a machine cannot
> > have both. Note the "else" in nouveau_dsm_detect():
> 
> I'm pretty sure I've seen a machine with both in my past, back in the
> Vista/Win7 crossover days.

Yes, but the code in nouveau_dsm_detect() is such that you'll never have
both nouveau_is_v1_dsm() and nouveau_is_optimus() return true.

So you can drop the " && !nouveau_is_optimus()".
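
The argument above rests on the detection code setting at most one of the
two flags. A minimal sketch of that shape follows. It is a simplified
illustration, not the driver's code: struct dsm_state, detect() and the two
helper names are hypothetical, and the real nouveau_dsm_detect() checks more
conditions than two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's private detection state. */
struct dsm_state {
	bool dsm_detected;	/* v1 _DSM mux method found */
	bool optimus_detected;	/* Optimus _DSM found */
};

/*
 * Because the second branch is an "else", at most one flag is ever set,
 * even when the firmware exposes both methods.
 */
static struct dsm_state detect(bool has_optimus, bool has_v1_dsm)
{
	struct dsm_state s = { false, false };

	if (has_optimus)
		s.optimus_detected = true;
	else if (has_v1_dsm)
		s.dsm_detected = true;

	return s;
}

static bool is_v1_dsm(const struct dsm_state *s)
{
	return s->dsm_detected;
}

static bool is_optimus(const struct dsm_state *s)
{
	return s->optimus_detected;
}
```

With this structure, is_v1_dsm() being true already implies is_optimus() is
false, so testing both in the caller is redundant.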

Best regards,

Lukas

> 
> > You're calling this unconditionally for all Optimus machines yet
> > I assume pre Windows 10 machines lack the PR3 hooks.
> >
> 
> Yes and I've confirmed on my older machine that nothing bad happens
> doing it unconditionally,
> and I couldn't find any bits in the _DSM flags to tell me if I should
> do something different.
> 
> Dave.

Re: [PATCH 8/8] sched: prefer cpufreq_scale_freq_capacity

2016-03-15 Thread Michael Turquette

Quoting Dietmar Eggemann (2016-03-15 12:13:58)
> On 14/03/16 05:22, Michael Turquette wrote:
> > arch_scale_freq_capacity is weird. It specifies an arch hook for an
> > implementation that could easily vary within an architecture or even a
> > chip family.
> > 
> > This patch helps to mitigate this weirdness by defaulting to the
> > cpufreq-provided implementation, which should work for all cases where
> > CONFIG_CPU_FREQ is set.
> > 
> > If CONFIG_CPU_FREQ is not set, then try to use an implementation
> > provided by the architecture. Failing that, fall back to
> > SCHED_CAPACITY_SCALE.
> > 
> > It may be desirable for cpufreq drivers to specify their own
> > implementation of arch_scale_freq_capacity in the future. The same is
> > true for platform code within an architecture. In both cases an
> > efficient implementation selector will need to be created and this patch
> > adds a comment to that effect.
> 
> For me this independence of the scheduler code towards the actual
> implementation of the Frequency Invariant Engine (FIE) was actually a
> feature.

I do not agree that it is a strength; I think it is confusing. My
opinion is that cpufreq drivers should implement
arch_scale_freq_capacity. Having a sane fallback
(cpufreq_scale_freq_capacity) simply means that you can remove the
boilerplate from the arm32 and arm64 code, which is a win.

Furthermore, if we have multiple competing implementations of
arch_scale_freq_capacity, wouldn't it be better for all of them to
live in cpufreq drivers? This means we would only need to implement a
single run-time "selector".

On the other hand, if the implementation lives in arch code and we have
various implementations of arch_scale_freq_capacity within an
architecture, then each arch would need to implement this selector
function. Even worse then if we have a split where some implementations
live in drivers/cpufreq (e.g. intel_pstate) and others in arch/arm and
others in arch/arm64 ... now we have three selectors.

Note that this has nothing to do with cpu microarch invariance. I'm
happy for that to stay in arch code because we can have heterogeneous
cpus that do not scale frequency, and thus would not enable cpufreq.
But if your platform scales cpu frequency, then really cpufreq should be
in the loop.
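
The compile-time selection being argued for can be sketched in a standalone
form. This is only an illustration under stated assumptions: the body of
cpufreq_scale_freq_capacity(), the demo_freqs table and the choice to define
CONFIG_CPU_FREQ in the file itself are hypothetical and exist only to make
the fragment build outside the kernel. The #ifdef/#elif ladder is the part
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Assumption: pretend this build has cpufreq enabled. */
#define CONFIG_CPU_FREQ 1

struct sched_domain;	/* opaque here; only passed through */

/* Hypothetical per-CPU frequency table standing in for cpufreq state. */
struct demo_freq {
	unsigned long cur_khz;
	unsigned long max_khz;
};

static const struct demo_freq demo_freqs[] = {
	{  600000, 1200000 },	/* cpu 0 running at half speed */
	{ 1200000, 1200000 },	/* cpu 1 running at full speed */
};

/* Hypothetical body: current/max frequency scaled to SCHED_CAPACITY_SCALE. */
static inline unsigned long
cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	(void)sd;
	return (demo_freqs[cpu].cur_khz << SCHED_CAPACITY_SHIFT) /
		demo_freqs[cpu].max_khz;
}

/* Resolved by the preprocessor, so callers pay no indirect-call cost. */
#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
#elif !defined(arch_scale_freq_capacity)
static inline unsigned long
arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	(void)sd;
	(void)cpu;
	return SCHED_CAPACITY_SCALE;
}
#endif
```

Removing the CONFIG_CPU_FREQ define exercises the fallback, which returns
SCHED_CAPACITY_SCALE for every cpu. Either way the choice is made at build
time, which is why competing implementations would need a separate run-time
selector.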

> 
> In EAS RFC5.2 (linux-arm.org/linux-power.git energy_model_rfc_v5.2 ,
> which hasn't been posted to LKML) we establish the link in the ARCH code
> (arch/arm64/include/asm/topology.h).

Right, sorry again about preemptively posting the patch. Total brainfart
on my part.

> 
> #ifdef CONFIG_CPU_FREQ
> #define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> ...
> +#endif

The above is no longer necessary with this patch. Same question as
above: why insist on the arch boilerplate?

Regards,
Mike

> 
> > 
> > Signed-off-by: Michael Turquette 
> > ---
> >  kernel/sched/sched.h | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 469d11d..37502ea 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1368,7 +1368,21 @@ static inline int hrtick_enabled(struct rq *rq)
> >  #ifdef CONFIG_SMP
> >  extern void sched_avg_update(struct rq *rq);
> >  
> > -#ifndef arch_scale_freq_capacity
> > +/*
> > + * arch_scale_freq_capacity can be implemented by cpufreq, platform code or
> > + * arch code. We select the cpufreq-provided implementation first. If it
> > + * doesn't exist then we default to any other implementation provided from
> > + * platform/arch code. If those do not exist then we use the default
> > + * SCHED_CAPACITY_SCALE value below.
> > + *
> > + * Note that if cpufreq drivers or platform/arch code have competing
> > + * implementations it is up to those subsystems to select one at runtime with
> > + * an efficient solution, as we cannot tolerate the overhead of indirect
> > + * functions (e.g. function pointers) in the scheduler fast path
> > + */
> > +#ifdef CONFIG_CPU_FREQ
> > +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> > +#elif !defined(arch_scale_freq_capacity)
> >  static __always_inline
> >  unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
> >  {
> >

Re: [PATCH] kbuild: drop FORCE from PHONY targets

2016-03-15 Thread Michal Marek

On 15.3.2016 at 19:27, Andy Lutomirski wrote:
> Fair enough, although I'm curious why this happens.  It might be worth
> changing the docs to say that .PHONY is *not* a substitute for FORCE
> in that context, then.

These two are unrelated, except that FORCE is redundant for a .PHONY
target. FORCE is our idiom to tell make to always remake the target and
let us handle the dependencies manually. Listing a target as .PHONY
tells make that the target will not produce a file of the same name
(typically, "all", "install", etc).

Michal

Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

2016-03-15 Thread Linus Torvalds

On Tue, Mar 15, 2016 at 1:14 PM, Dave Chinner  wrote:
>
> Root can still change the group id of a file that has exposed stale
> data and hence make it visible outside of the group based
> containment wall.

Ok, Dave, now you're just being ridiculous.

The issue has never been - and *should* never be - that stale data
cannot get out.

The only issue is that we shouldn't make it ridiculously easy to make
silly mistakes.

There's no "group based containment wall" that is some kind of
absolute protection border.

Put another way: this is not about theoretical leaks - because those
are totally irrelevant (in theory, the original discard writer had
access to all that stale data anyway). This is about making it a
practical interface that doesn't have serious hidden gotchas.

So stop making silly theoretical arguments that make no sense.

We should make sure that we have _practical_ rules that are sensible,
but also not painful enough for the people who want to use this in
_practice_.

Reality trumps everything else.

If google is already using this kind of interface, then that is
_reality_. Take that into account.

 Linus

Re: [PATCH v18 00/22] Richacls (Core and Ext4)

2016-03-15 Thread Volker Lendecke

On Tue, Mar 15, 2016 at 08:45:14AM -0700, Jeremy Allison wrote:
> On Tue, Mar 15, 2016 at 12:11:03AM -0700, Christoph Hellwig wrote:
> > People have long learned that we only have 'allow' permissions.  Any
> > model that mixes allow and deny ACE is a mistake.
> 
> People can also learn and change though :-). One of the
> biggest complaints people deploying Samba on Linux have is the
> incompatible ACL models.

Just to confirm: I see this a lot in the field. NFSv4 ACLs, while not a
perfect match for NTFS ACLs, are a lot closer and much more usable to people
who want to serve Windows clients.

Also in the pure Linux world there is a lot that you cannot express
with just rwx, sgid, sticky bits and friends. If you want the additional
functionality of the richacl bits, I would call it a big mistake to
omit negative ACEs, if only to avoid creating yet another
ACL flavor.

> Whilst I have sympathy with your intense dislike of the
> Windows ACL model, this comes down to the core of "who
> do we serve ?"

The world has enough confusion around ACL semantics, please do not add
more to it by creating your own model of the day.

Volker

[PATCH] Staging: speakup: Clear hi font bit from attributes

2016-03-15 Thread Samuel Thibault

Previously, speakup would see the hi-font bit in attributes.
Since this bit has nothing to do with attributes, we need to clear it.

Signed-off-by: Samuel Thibault 

--- a/drivers/staging/speakup/main.c
+++ b/drivers/staging/speakup/main.c
@@ -267,7 +267,7 @@ static struct notifier_block vt_notifier
 static unsigned char get_attributes(struct vc_data *vc, u16 *pos)
 {
pos = screen_pos(vc, pos - (u16 *)vc->vc_origin, 1);
-   return (u_char) (scr_readw(pos) >> 8);
+   return (scr_readw(pos) & ~vc->vc_hi_font_mask) >> 8;
 }
 
 static void speakup_date(struct vc_data *vc)
@@ -477,8 +477,10 @@ static u16 get_char(struct vc_data *vc,
w = scr_readw(pos);
c = w & 0xff;
 
-   if (w & vc->vc_hi_font_mask)
+   if (w & vc->vc_hi_font_mask) {
+   w &= ~vc->vc_hi_font_mask;
c |= 0x100;
+   }
 
ch = inverse_translate(vc, c, 0);
*attribs = (w & 0xff00) >> 8;
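
The effect of the two hunks can be shown on a single 16-bit screen cell.
This is a standalone sketch, not the speakup code: cell_attributes(),
cell_glyph() and DEMO_HI_FONT_MASK are hypothetical names, and the mask value
0x0800 is an assumption for a console using a 512-glyph font. In the driver
the mask comes from vc->vc_hi_font_mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 11 carries the ninth glyph bit (512-glyph font). */
#define DEMO_HI_FONT_MASK 0x0800u

/* Attribute byte with the hi-font bit cleared, as in the patched code. */
static uint8_t cell_attributes(uint16_t w, uint16_t hi_font_mask)
{
	return (uint8_t)((w & (uint16_t)~hi_font_mask) >> 8);
}

/* Glyph index: low byte, plus 0x100 when the hi-font bit is set. */
static uint16_t cell_glyph(uint16_t w, uint16_t hi_font_mask)
{
	uint16_t c = w & 0xff;

	if (w & hi_font_mask)
		c |= 0x100;
	return c;
}
```

For the cell 0x1F41 the unpatched expression (w >> 8) yields 0x1F, which
wrongly includes the font bit. The masked version yields the attribute 0x17,
and the bit is accounted for in the glyph index 0x141 instead.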

[PATCH] regulator: Don't print error in devm_regulator_bulk_get() on -EPROBE_DEFER

2016-03-15 Thread Javier Martinez Canillas

The regulators may not be available just because their driver's probe was
not executed and the regulators were not registered yet. So don't print an
error in this case, to avoid polluting the kernel log and confusing the users.

Signed-off-by: Javier Martinez Canillas 

---

 drivers/regulator/devres.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/devres.c b/drivers/regulator/devres.c
index 6ad8ab4c578d..3d023422e228 100644
--- a/drivers/regulator/devres.c
+++ b/drivers/regulator/devres.c
@@ -171,8 +171,9 @@ int devm_regulator_bulk_get(struct device *dev, int num_consumers,
NORMAL_GET);
if (IS_ERR(consumers[i].consumer)) {
ret = PTR_ERR(consumers[i].consumer);
-   dev_err(dev, "Failed to get supply '%s': %d\n",
-   consumers[i].supply, ret);
+   if (ret != -EPROBE_DEFER)
+   dev_err(dev, "Failed to get supply '%s': %d\n",
+   consumers[i].supply, ret);
consumers[i].consumer = NULL;
goto err;
}
-- 
2.5.0
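
The logging rule the patch introduces can be sketched outside the kernel as
below. This is an illustration only: report_get_failure() is a hypothetical
helper, fprintf() stands in for dev_err(), and EPROBE_DEFER is defined
locally (517, the kernel-internal value) because userspace errno.h does not
provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Kernel-internal errno, not exported to userspace headers. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * Report a failed supply lookup unless the failure is a probe deferral,
 * which only means the provider has not registered yet and the probe
 * will be retried later. Returns 1 if a message was printed, 0 otherwise.
 */
static int report_get_failure(const char *supply, int ret)
{
	if (ret == -EPROBE_DEFER)
		return 0;

	fprintf(stderr, "Failed to get supply '%s': %d\n", supply, ret);
	return 1;
}
```

The error code is still propagated to the caller in both cases. Only the
log message is suppressed for the deferral.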

Re: [PATCH v6 5/8] kbuild: add fine grained build dependencies for exported symbols

2016-03-15 Thread Michal Marek

On 15.3.2016 at 21:33, Michal Marek wrote:
> On 14.3.2016 at 03:42, Nicolas Pitre wrote:
>> +# Filter out exported kernel symbol names from the preprocessor output.
>> +# See also __KSYM_DEPS__ in include/linux/export.h.
>> +# We disable the depfile generation here, so as not to overwrite the existing
>> +# depfile while fixdep is parsing it
>> +flags_nodeps = $(filter-out -Wp$(comma)-M%, $($(1)))
>> +ksym_dep_filter = \
>> +case "$(1)" in   \
>> +cc_*_c) $(CPP) $(call flags_nodeps,c_flags) -D__KSYM_DEPS__ $< ;;\
>> +as_*_S) $(CPP) $(call flags_nodeps,a_flags) -D__KSYM_DEPS__ $< ;;\
>> +cpp_lds_S) : ;;  \
>> +*) echo "Don't know how to preprocess $(1)" >&2; false ;;\
>> +esac | sed -rn 's/^.*=== __KSYM_(.*) ===.*$$/KSYM_\1/p'
>> +
>> +cmd_and_fixdep = \
>> +$(echo-cmd) $(cmd_$(1)); \
>> +$(ksym_dep_filter) | \
>> +scripts/basic/fixdep -e $(depfile) $@ '$(make-cmd)'  \
>> +> $(dot-target).tmp; \
>> +rm -f $(depfile);\
>> +mv -f $(dot-target).tmp $(dot-target).cmd;
>> +
>> +endif
> 
> Not sure what happened this time, but I got
> 
> drivers/md/.dm-round-robin.mod.o.cmd:5: *** unterminated call to
> function 'wildcard': missing ')'.  Stop.

Forgot to add: This was an allmodconfig build without
CONFIG_TRIM_UNUSED_SYMS.

Michal

Re: [PATCH v6 5/8] kbuild: add fine grained build dependencies for exported symbols

2016-03-15 Thread Michal Marek

On 14.3.2016 at 03:42, Nicolas Pitre wrote:
> +# Filter out exported kernel symbol names from the preprocessor output.
> +# See also __KSYM_DEPS__ in include/linux/export.h.
> +# We disable the depfile generation here, so as not to overwrite the existing
> +# depfile while fixdep is parsing it
> +flags_nodeps = $(filter-out -Wp$(comma)-M%, $($(1)))
> +ksym_dep_filter = \
> + case "$(1)" in   \
> + cc_*_c) $(CPP) $(call flags_nodeps,c_flags) -D__KSYM_DEPS__ $< ;;\
> + as_*_S) $(CPP) $(call flags_nodeps,a_flags) -D__KSYM_DEPS__ $< ;;\
> + cpp_lds_S) : ;;  \
> + *) echo "Don't know how to preprocess $(1)" >&2; false ;;\
> + esac | sed -rn 's/^.*=== __KSYM_(.*) ===.*$$/KSYM_\1/p'
> +
> +cmd_and_fixdep = \
> + $(echo-cmd) $(cmd_$(1)); \
> + $(ksym_dep_filter) | \
> + scripts/basic/fixdep -e $(depfile) $@ '$(make-cmd)'  \
> + > $(dot-target).tmp; \
> + rm -f $(depfile);\
> + mv -f $(dot-target).tmp $(dot-target).cmd;
> +
> +endif

Not sure what happened this time, but I got

drivers/md/.dm-round-robin.mod.o.cmd:5: *** unterminated call to
function 'wildcard': missing ')'.  Stop.

The depfile is indeed corrupt (truncated):
tail drivers/md/.dm-round-robin.mod.o.cmd
  arch/x86/include/asm/disabled-features.h \
$(wildcard include/config/x86/intel/mpx.h) \
  arch/x86/include/asm/rmwcc.h \
  arch/x86/include/asm/barrier.h \
$(wildcard include/config/x86/ppro/fence.h) \
  arch/x86/include/asm/nops.h \
$(wildcard include/config/mk7.h) \
  include/asm-generic/barrier.h \
  include/asm-generic/bitops/find.h \
$(wildcard include/config/generic/

The file is exactly 8kB long:
$ ls -l drivers/md/.dm-round-robin.mod.o.cmd
-rw-r--r-- 1 mmarek users 8192 Mar  8 13:33 drivers/md/.dm-round-robin.mod.o.cmd

Michal

[RESEND PATCH v2] ARM64: ACPI: Update documentation for latest specification version

2016-03-15 Thread Al Stone

The ACPI 6.1 specification was recently released at the end of January 2016,
but the arm64 kernel documentation for the use of ACPI was written for the
5.1 version of the spec.  There were significant additions to the spec that
had not yet been mentioned -- for example, the 6.0 mechanisms added to make
it easier to define processors and low power idle states, as well as the
6.1 addition allowing regular interrupts (not just from GPIO) be used to
signal ACPI general purpose events.

This patch reflects going back through and examining the specs in detail
and updating content appropriately.  Whilst there, a few odds and ends of
typos were caught as well.  This brings the documentation up to date with
ACPI 6.1 for arm64.

RESEND:
   -- Corrected From: header and added missing Cc's

Changes for v2:
   -- Clean up white space (Harb Abdulhamid)
   -- Clarification on _CCA usage (Harb Abdulhamid)
   -- IORT moved to required from recommended (Hanjun Guo)
   -- Clarify IORT description (Hanjun Guo)

Signed-off-by: Al Stone 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Jonathan Corbet 
---
 Documentation/arm64/acpi_object_usage.txt | 445 ++
 Documentation/arm64/arm-acpi.txt  |  28 +-
 2 files changed, 356 insertions(+), 117 deletions(-)

diff --git a/Documentation/arm64/acpi_object_usage.txt b/Documentation/arm64/acpi_object_usage.txt
index a6e1a18..29bc1a1 100644
--- a/Documentation/arm64/acpi_object_usage.txt
+++ b/Documentation/arm64/acpi_object_usage.txt
@@ -11,15 +11,16 @@ outside of the UEFI Forum (see Section 5.2.6 of the specification).
 
 For ACPI on arm64, tables also fall into the following categories:
 
-   -- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT
+   -- Required: DSDT, FADT, GTDT, IORT, MADT, MCFG, RSDP, SPCR, XSDT
 
-   -- Recommended: BERT, EINJ, ERST, HEST, SSDT
+   -- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
 
-   -- Optional: BGRT, CPEP, CSRT, DRTM, ECDT, FACS, FPDT, MCHI, MPST,
-  MSCT, RASF, SBST, SLIT, SPMI, SRAT, TCPA, TPM2, UEFI
+   -- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, MCHI,
+  MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO, TCPA,
+  TPM2, UEFI, XENV
 
-   -- Not supported: BOOT, DBG2, DBGP, DMAR, ETDT, HPET, IBFT, IVRS,
-  LPIT, MSDM, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
+   -- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT, 
+  MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
 
 
 Table  Usage for ARMv8 Linux
@@ -50,7 +51,8 @@ CSRT   Signature Reserved (signature == "CSRT")
 
 DBG2   Signature Reserved (signature == "DBG2")
== DeBuG port table 2 ==
-   Microsoft only table, will not be supported.
+   License has changed and should be usable.  Optional if used instead
+   of earlycon= on the command line.
 
 DBGP   Signature Reserved (signature == "DBGP")
== DeBuG Port table ==
@@ -133,10 +135,11 @@ GTDT   Section 5.2.24 (signature == "GTDT")
 
 HEST   Section 18.3.2 (signature == "HEST")
== Hardware Error Source Table ==
-   Until further error source types are defined, use only types 6 (AER
-   Root Port), 7 (AER Endpoint), 8 (AER Bridge), or 9 (Generic Hardware
-   Error Source).  Firmware first error handling is possible if and only
-   if Trusted Firmware is being used on arm64.
+   ARM-specific error sources have been defined; please use those or the
+   PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER 
+   Bridge), or use type 9 (Generic Hardware Error Source).  Firmware first
+   error handling is possible if and only if Trusted Firmware is being 
+   used on arm64.
 
Must be supplied if RAS support is provided by the platform.  It
is recommended this table be supplied.
@@ -149,20 +152,27 @@ IBFT   Signature Reserved (signature == "IBFT")
== iSCSI Boot Firmware Table ==
Microsoft defined table, support TBD.
 
+IORT   Signature Reserved (signature == "IORT")
+   == Input Output Remapping Table ==
+   arm64 only table, required in order to describe IO topology, SMMUs,
+   and GIC ITSs, and how those various components are connected together, 
+   such as identifying which components are behind which SMMUs/ITSs.
+
 IVRS   Signature Reserved (signature == "IVRS")
== I/O Virtualization Reporting Structure ==
x86_64 (AMD) only table, will not be supported.
 
 LPIT   Signature Reserved (signature == "LPIT")
== Low Power Idle Table ==
-   x86 only table as of ACPI 5.1; future versions have been adapted for
-   use with ARM and will be recommended in order to support ACPI power
-   management.
+   x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
+   descriptions and power states on ARM platforms should use the DSDT
+   and define processor container devices (_HID ACPI0010, Section 8.4,
+   and more

Re: [PATCH 7/8] cpufreq: Frequency invariant scheduler load-tracking support

2016-03-15 Thread Michael Turquette

Quoting Dietmar Eggemann (2016-03-15 12:13:46)
> Hi Mike,
> 
> On 14/03/16 05:22, Michael Turquette wrote:
> > From: Dietmar Eggemann 
> > 
> > Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
> > frequency scaling correction factor for more accurate load-tracking.
> > 
> > The factor is:
> > 
> >   current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
> > 
> > In fact, freq_scale should be a struct cpufreq_policy data member. But
> > this would require that the scheduler hot path (__update_load_avg()) would
> > have to grab the cpufreq lock. This can be avoided by using per-cpu data
> > initialized to SCHED_CAPACITY_SCALE for freq_scale.
> > 
> > Signed-off-by: Dietmar Eggemann 
> > Signed-off-by: Michael Turquette 
> > ---
> > I'm not as sure about patches 7 & 8, but I included them since I needed
> > frequency invariance while testing.
> > 
> > As mentioned by myself in 2014 and Rafael last month, the
> > arch_scale_freq_capacity hook is awkward, because this behavior may vary
> > within an architecture.
> > 
> > I re-introduce Dietmar's generic cpufreq implementation of the frequency
> > invariance hook in this patch,  and change the preprocessor magic in
> > sched.h to favor the cpufreq implementation over arch- or
> > platform-specific ones in the next patch.
> 
> Maybe it is worth mentioning that this patch is from EAS RFC5.2
> (linux-arm.org/linux-power.git energy_model_rfc_v5.2) which hasn't been
> posted to LKML. The last EAS RFCv5 has the Frequency Invariant Engine
> (FEI) based on the cpufreq notifier calls (cpufreq_callback,
> cpufreq_policy_callback) in the ARM arch code.

Oops, my apologies. I got a little mixed up while developing these
patches and I should have at least asked you about this one before
posting.

I'm really quite happy to drop #7 and #8 if they are too contentious or
if patch #7 is deemed as not-ready by you.

> 
> > If run-time selection of ops is needed then someone will need to write
> > that code.
> 
> Right now I see 3 different implementations of the FEI. 1) The X86
> aperf/mperf based one (https://lkml.org/lkml/2016/3/3/589), 2) This one
> in cpufreq.c and 3) the one based on cpufreq notifiers in ARCH (ARM,
> ARM64) code.
> 
> I guess with sched_util we do need a solution for all platforms
> (different archs, x86 w/ and w/o X86_FEATURE_APERFMPERF, ...).
> 
> > I think that this negates the need for the arm arch hooks[0-2], and
> > hopefully Morten and Dietmar can weigh in on this.
> 
> It's true that we tried to get rid of the usage of the cpufreq callbacks
> (cpufreq_callback, cpufreq_policy_callback) with this patch. Plus we
> didn't want to implement it twice (for ARM and ARM64).
> 
> But 2) would have to work for other ARCHs as well. Maybe as a fall-back
> for X86 w/o X86_FEATURE_APERFMPERF feature?

That's what I had in mind. I guess that some day there will be a need to
select implementations at run-time for both cpufreq (e.g. different
cpufreq drivers might implement arch_scale_freq_capacity) and for the
!CONFIG_CPU_FREQ case (e.g. different platforms might implement
arch_scale_freq_capacity within the same arch).

The cpufreq approach seems the most generic, hence patch #8 to make it
the default.

Regards,
Mike

> 
> [...]
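
For readers following the thread, the correction factor quoted above can be
sketched in plain C. This is only an illustration under stated assumptions,
not the code from the patch: freq_scale_factor() and scale_delta() are
hypothetical names, and SCHED_CAPACITY_SHIFT is assumed to be 10 (a scale of
1024). Note that in C the shift binds more loosely than the division, so the
formula needs explicit parentheses when written as code.

```c
#include <stdint.h>

/* Assumption: fixed-point scale of 1024, as SCHED_CAPACITY_SCALE suggests. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper: returns cur_khz / max_khz as a fixed-point value
 * in the range [0, SCHED_CAPACITY_SCALE]. The shift is done first, in
 * 64 bits, so the division does not throw away the fractional part.
 */
static unsigned long freq_scale_factor(unsigned long cur_khz,
				       unsigned long max_khz)
{
	if (max_khz == 0)
		return SCHED_CAPACITY_SCALE;	/* no data: assume full speed */

	return (unsigned long)(((uint64_t)cur_khz << SCHED_CAPACITY_SHIFT) /
			       max_khz);
}

/*
 * Hypothetical helper: scale a load-tracking time delta by the factor,
 * so time spent at half the maximum frequency counts for half as much.
 */
static unsigned long scale_delta(unsigned long delta, unsigned long factor)
{
	return (unsigned long)(((uint64_t)delta * factor) >>
			       SCHED_CAPACITY_SHIFT);
}
```

With these definitions a CPU running at 1 GHz out of a 2 GHz maximum gets a
factor of 512, and a delta of 1000 is scaled down to 500.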

[PATCH] kvm-pr: manage illegal instructions

2016-03-15 Thread Laurent Vivier

While writing some instruction tests for kvm-unit-tests for powerpc,
I've found that illegal instructions are not managed correctly with kvm-pr,
while it is fine with kvm-hv.

When an illegal instruction (like ".long 0") is processed by kvm-pr,
the kernel logs are filled with:

 Couldn't emulate instruction 0x (op 0 xop 0)
 kvmppc_handle_exit_pr: emulation at 700 failed ()

Meanwhile, the exception handler receives an interrupt for each instruction
executed after the illegal instruction.

Signed-off-by: Laurent Vivier 
---
 arch/powerpc/kvm/book3s_emulate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 2afdb9c..4ee969d 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -99,7 +99,6 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 
switch (get_op(inst)) {
case 0:
-   emulated = EMULATE_FAIL;
if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
(inst == swab32(inst_sc))) {
/*
@@ -112,6 +111,9 @@ int kvmppc_core_emulate_op_pr(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
kvmppc_set_gpr(vcpu, 3, EV_UNIMPLEMENTED);
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
emulated = EMULATE_DONE;
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   emulated = EMULATE_AGAIN;
}
break;
case 19:
-- 
2.5.0
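
The control flow this patch sets up for primary opcode 0 can be sketched as a
small standalone C model. This is a toy under stated assumptions, not the
kernel code: struct toy_vcpu, toy_swab32() and toy_emulate_op0() are
hypothetical stand-ins, and the boolean flag only models the effect of
kvmppc_core_queue_program().

```c
#include <stdbool.h>
#include <stdint.h>

enum emulation_result { EMULATE_DONE, EMULATE_FAIL, EMULATE_AGAIN };

/* Encoding of the PowerPC "sc" instruction, as inst_sc in the patch context. */
#define TOY_INST_SC 0x44000002u

/* Hypothetical stand-in for the vcpu state the patch touches. */
struct toy_vcpu {
	bool little_endian;		/* models the MSR_LE bit */
	uint64_t pc;
	bool program_irq_queued;	/* models kvmppc_core_queue_program() */
};

/* Byte-swap a 32-bit word. */
static uint32_t toy_swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Model of the "case 0:" branch after the patch: a byte-swapped sc in
 * little-endian mode is stepped over, anything else is reported to the
 * guest as an illegal instruction instead of failing emulation.
 */
static enum emulation_result toy_emulate_op0(struct toy_vcpu *vcpu,
					     uint32_t inst)
{
	if (vcpu->little_endian && inst == toy_swab32(TOY_INST_SC)) {
		vcpu->pc += 4;
		return EMULATE_DONE;
	}

	vcpu->program_irq_queued = true;
	return EMULATE_AGAIN;
}
```

In this model ".long 0" no longer ends in EMULATE_FAIL; the guest would
instead see a program interrupt with the PC left on the faulting instruction.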

Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

2016-03-15 Thread Dave Chinner

On Mon, Mar 14, 2016 at 10:46:03AM -0400, Theodore Ts'o wrote:
> On Mon, Mar 14, 2016 at 06:34:00AM -0400, Ric Wheeler wrote:
> > I think that once we enter this mode, the local file system has effectively
> > ceded its role to prevent stale data exposure to the upper layer. In effect,
> > this ceases to become a normal file system for any enabled process if we
> > control this through fallocate() or for all processes if we do the brute
> > force mount option that would be file system wide.
> 
> Or we do this via group id, such that we are ceding responsibility for
> preventing stale data exposure to the processes running under that
> group id.  That process has the responsibility for making sure that it
> doesn't return any data from that file unless it has been written, and
> also to make sure the permissions of that file are not readable by
> processes that aren't in that group.  (For example, owned by user
> ceph, group ceph, with permissions 640).

Root can still change the group id of a file that has exposed stale
data and hence make it visible outside of the group based
containment wall. i.e. external actors can still unintentionally
expose stale data, even though the application might be correctly
contained and safe.

What we are missing is actual numbers that show that exposing stale
data is a /significant/ win for these applications that are
demanding it. And then we need evidence proving that the problem is
actually systemic and not just a hack around a bad implementation of
a feature...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v5 1/4] i2c: add a protocol parameter to the alert callback

2016-03-15 Thread Benjamin Tissoires

On Mar 15 2016 or thereabouts, Guenter Roeck wrote:
> On Tue, Mar 15, 2016 at 03:53:41PM +0100, Benjamin Tissoires wrote:
> > .alert() is meant to be generic, but there is currently no way
> > for the device driver to know which protocol generated the alert.
> > Add a parameter in .alert() to help the device driver to understand
> > what is given in data.
> > 
> > This patch is required to have the support of SMBus Host Notify protocol
> > through .alert().
> > 
> > Signed-off-by: Benjamin Tissoires 
> > ---
> >  drivers/char/ipmi/ipmi_ssif.c | 6 +-
> >  drivers/hwmon/lm90.c  | 3 ++-
> >  drivers/i2c/i2c-smbus.c   | 3 ++-
> >  include/linux/i2c.h   | 7 ++-
> >  4 files changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c
> > index 5f1c3d0..84d07bc 100644
> > --- a/drivers/char/ipmi/ipmi_ssif.c
> > +++ b/drivers/char/ipmi/ipmi_ssif.c
> > @@ -568,12 +568,16 @@ static void retry_timeout(unsigned long data)
> >  }
> >  
> >  
> > -static void ssif_alert(struct i2c_client *client, unsigned int data)
> > +static void ssif_alert(struct i2c_client *client,
> > +  enum i2c_alert_protocol protocol, unsigned int data)
> >  {
> > struct ssif_info *ssif_info = i2c_get_clientdata(client);
> > unsigned long oflags, *flags;
> > bool do_get = false;
> >  
> > +   if (protocol != I2C_PROTOCOL_SMBUS_ALERT)
> > +   return;
> > +
> > ssif_inc_stat(ssif_info, alerts);
> >  
> > flags = ipmi_ssif_lock_cond(ssif_info, &oflags);
> > diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
> > index c9ff08d..2b77dbd 100644
> > --- a/drivers/hwmon/lm90.c
> > +++ b/drivers/hwmon/lm90.c
> > @@ -1624,7 +1624,8 @@ static int lm90_remove(struct i2c_client *client)
> > return 0;
> >  }
> >  
> > -static void lm90_alert(struct i2c_client *client, unsigned int flag)
> > +static void lm90_alert(struct i2c_client *client, enum i2c_alert_protocol 
> > type,
> > +  unsigned int flag)
> >  {
> 
> For IPMI you added a check for the alert type. It would seem to be prudent to
> add one here as well, unless you expect the driver to be able to handle all
> alert types without modification. In that case, though, I would expect some
> note that this is the case.

Hi Guenter,

Yes, this is an oversight on my side. I'll update the series tomorrow.
Hopefully by then, the kbuild bot will find the remaining issues (2 in a
row is kind of shameful).

Cheers,
Benjamin

> 
> Thanks,
> Guenter
> 
> > u16 alarms;
> >  
> > diff --git a/drivers/i2c/i2c-smbus.c b/drivers/i2c/i2c-smbus.c
> > index abb55d3..3b6765a 100644
> > --- a/drivers/i2c/i2c-smbus.c
> > +++ b/drivers/i2c/i2c-smbus.c
> > @@ -56,7 +56,8 @@ static int smbus_do_alert(struct device *dev, void *addrp)
> > if (client->dev.driver) {
> > driver = to_i2c_driver(client->dev.driver);
> > if (driver->alert)
> > -   driver->alert(client, data->flag);
> > +   driver->alert(client, I2C_PROTOCOL_SMBUS_ALERT,
> > + data->flag);
> > else
> > dev_warn(&client->dev, "no driver alert()!\n");
> > } else
> > diff --git a/include/linux/i2c.h b/include/linux/i2c.h
> > index 200cf13b..baae02a 100644
> > --- a/include/linux/i2c.h
> > +++ b/include/linux/i2c.h
> > @@ -126,6 +126,10 @@ i2c_smbus_read_i2c_block_data_or_emulated(const struct 
> > i2c_client *client,
> >   u8 command, u8 length, u8 *values);
> >  #endif /* I2C */
> >  
> > +enum i2c_alert_protocol {
> > +   I2C_PROTOCOL_SMBUS_ALERT,
> > +};
> > +
> >  /**
> >   * struct i2c_driver - represent an I2C device driver
> >   * @class: What kind of i2c device we instantiate (for detect)
> > @@ -181,7 +185,8 @@ struct i2c_driver {
> >  * For the SMBus alert protocol, there is a single bit of data passed
> >  * as the alert response's low bit ("event flag").
> >  */
> > -   void (*alert)(struct i2c_client *, unsigned int data);
> > +   void (*alert)(struct i2c_client *, enum i2c_alert_protocol protocol,
> > + unsigned int data);
> >  
> > /* a ioctl like command that can be used to perform specific functions
> >  * with the device.
> > -- 
> > 2.5.0
> >

Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events

2016-03-15 Thread Steven Rostedt

On Wed,  9 Mar 2016 21:46:41 +0100
Jiri Olsa  wrote:

> Currently we check sample type for ftrace:function event
> even if it's not created as sampling event. That prevents
> creating ftrace_function event in counting mode.
> 
> Making sure we check sample types only for sampling events.
> 
> Before:
>   $ sudo perf stat -e ftrace:function ls
>   ...
> 
>Performance counter stats for 'ls':
> 
>ftrace:function
> 
>  0.001983662 seconds time elapsed
> 
> After:
>   $ sudo perf stat -e ftrace:function ls

I'm assuming you gave yourself admin capabilities, and that not just any
normal user may sample function tracing, right?

-- Steve

[PATCH v2] devpts: Make ptmx be owned by the userns owner as a fallback

2016-03-15 Thread Andy Lutomirski

New devpts instances have ptmx owned by the inner uid and gid 0.

For container-style namespaces (LXC, etc.), this is fine, and the change
below should have no effect there.

For sandbox-style namespaces (xdg-app and similar), this is
problematic -- there may not be an inner 0:0.  If that happens,
devpts mounts will fail.

Fix it by adding a fallback: if 0:0 is not mapped but the userns
owner and group are mapped, then ptmx will be owned by the namespace
owner.

This won't change behavior except in cases where mount would
currently return -EINVAL.

Cc: Alexander Larsson 
Cc: mcla...@redhat.com
Cc: "Eric W. Biederman" 
Cc: Linux Containers 
Signed-off-by: Andy Lutomirski 
---

Changes from v1:
 - Reversed the preference order (Serge)
 - Fixed misuse of uid_valid on userns->owner

fs/devpts/inode.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 655f21f99160..42b1e04d8334 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DEVPTS_DEFAULT_MODE 0600
 /*
@@ -247,13 +248,33 @@ static int mknod_ptmx(struct super_block *sb)
struct dentry *root = sb->s_root;
struct pts_fs_info *fsi = DEVPTS_SB(sb);
struct pts_mount_opts *opts = &fsi->mount_opts;
+   struct user_namespace *userns = current_user_ns();
kuid_t root_uid;
kgid_t root_gid;
 
-   root_uid = make_kuid(current_user_ns(), 0);
-   root_gid = make_kgid(current_user_ns(), 0);
-   if (!uid_valid(root_uid) || !gid_valid(root_gid))
-   return -EINVAL;
+   /*
+* For a new devpts instance, ptmx is owned by 0:0 if that uid
+* and gid are mapped in the creating namespace.
+*/
+   root_uid = make_kuid(userns, 0);
+   root_gid = make_kgid(userns, 0);
+
+   if (!uid_valid(root_uid) || !gid_valid(root_gid)) {
+   /*
+* If the creating namespace does not have 0:0 mapped
+* but does have the owner mapped (this is rare in
+* container-style namespaces but common in
+* sandbox-style namespaces), then let ptmx be owned by
+* the namespace owner.
+*/
+   root_uid = userns->owner;
+   root_gid = userns->group;
+
+   /* If this still doesn't work, give up. */
+   if (!kuid_has_mapping(userns, root_uid) ||
+   !kgid_has_mapping(userns, root_gid))
+   return -EINVAL;
+   }
 
inode_lock(d_inode(root));
 
-- 
2.5.0
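
The fallback order in this patch can be modelled in a few lines of userspace
C. This is a simplified sketch under stated assumptions, not the kernel
implementation: it tracks only uids (the patch applies the same rule to gids),
a namespace has a single mapping extent, and every toy_* name is hypothetical.

```c
#include <stdbool.h>

#define TOY_INVALID_ID ((unsigned int)-1)

/*
 * Hypothetical namespace model: inner ids [first, first + count) map to
 * outer ids starting at lower_first. "owner" is an outer id.
 */
struct toy_userns {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
	unsigned int owner;
};

/* Inner id to outer id, in the spirit of make_kuid(). */
static unsigned int toy_make_kuid(const struct toy_userns *ns,
				  unsigned int inner)
{
	if (inner >= ns->first && inner - ns->first < ns->count)
		return ns->lower_first + (inner - ns->first);
	return TOY_INVALID_ID;
}

/* Does an outer id have an inner name, in the spirit of kuid_has_mapping()? */
static bool toy_kuid_has_mapping(const struct toy_userns *ns,
				 unsigned int outer)
{
	return outer >= ns->lower_first &&
	       outer - ns->lower_first < ns->count;
}

/*
 * Pick the ptmx owner: prefer inner uid 0, fall back to the namespace
 * owner, and fail (-1 stands in for -EINVAL) if neither is mapped.
 */
static int toy_pick_ptmx_owner(const struct toy_userns *ns, unsigned int *out)
{
	unsigned int uid = toy_make_kuid(ns, 0);

	if (uid == TOY_INVALID_ID) {
		uid = ns->owner;
		if (!toy_kuid_has_mapping(ns, uid))
			return -1;
	}

	*out = uid;
	return 0;
}
```

A container-style mapping that includes inner uid 0 takes the first branch, so
its behaviour is unchanged. A sandbox-style mapping that only maps the owner
takes the fallback instead of failing.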

Re: [PATCH V7 03/12] thermal: tegra: get rid of PDIV/HOTSPOT hack

2016-03-15 Thread Eduardo Valentin

On Tue, Mar 15, 2016 at 02:21:53PM +0800, Wei Ni wrote:
> 
> 
> On 2016-03-15 04:05, Eduardo Valentin wrote:
> > * PGP Signed by an unknown key
> > 
> > On Fri, Mar 11, 2016 at 11:09:14AM +0800, Wei Ni wrote:
> >> Get rid of T124-specific PDIV/HOTSPOT hack.
> >> tegra-soctherm.c contained a hack to set the SENSOR_PDIV and
> >> SENSOR_HOTSPOT_OFFSET registers - it just did two writes of
> >> T124-specific opaque values.  Convert these into a form that can be
> >> substituted on a per-chip basis, and into structure fields that have
> >> at least some independent meaning.
> >>
> >> Signed-off-by: Wei Ni 
> >> ---
> >>  drivers/thermal/tegra/tegra-soctherm.c | 18 ++
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/thermal/tegra/tegra-soctherm.c 
> >> b/drivers/thermal/tegra/tegra-soctherm.c
> >> index b3ec0faa2bee..b4b791ebfbb6 100644
> >> --- a/drivers/thermal/tegra/tegra-soctherm.c
> >> +++ b/drivers/thermal/tegra/tegra-soctherm.c
> >> @@ -48,14 +48,12 @@
> >>  #define SENSOR_CONFIG2_THERMB_SHIFT   0
> >>  
> >>  #define SENSOR_PDIV   0x1c0
> >> -#define SENSOR_PDIV_T124  0x
> >>  #define SENSOR_PDIV_CPU_MASK  (0xf << 12)
> >>  #define SENSOR_PDIV_GPU_MASK  (0xf << 8)
> >>  #define SENSOR_PDIV_MEM_MASK  (0xf << 4)
> >>  #define SENSOR_PDIV_PLLX_MASK (0xf << 0)
> >>  
> >>  #define SENSOR_HOTSPOT_OFF0x1c4
> >> -#define SENSOR_HOTSPOT_OFF_T124   0x00060600
> >>  #define SENSOR_HOTSPOT_CPU_MASK   (0xff << 16)
> >>  #define SENSOR_HOTSPOT_GPU_MASK   (0xff << 8)
> >>  #define SENSOR_HOTSPOT_MEM_MASK   (0xff << 0)
> >> @@ -436,6 +434,7 @@ static int tegra_soctherm_probe(struct platform_device 
> >> *pdev)
> >>struct resource *res;
> >>unsigned int i;
> >>int err;
> >> +  u32 pdiv, hotspot;
> >>  
> >>const struct tegra_tsensor *tsensors = t124_tsensors;
> >>const struct tegra_tsensor_group **ttgs = tegra124_tsensor_groups;
> >> @@ -493,8 +492,19 @@ static int tegra_soctherm_probe(struct 
> >> platform_device *pdev)
> >>goto disable_clocks;
> >>}
> >>  
> >> -  writel(SENSOR_PDIV_T124, tegra->regs + SENSOR_PDIV);
> >> -  writel(SENSOR_HOTSPOT_OFF_T124, tegra->regs + SENSOR_HOTSPOT_OFF);
> >> +  /* Program pdiv and hotspot offsets per THERM */
> >> +  pdiv = readl(tegra->regs + SENSOR_PDIV);
> >> +  hotspot = readl(tegra->regs + SENSOR_HOTSPOT_OFF);
> >> +  for (i = 0; i < TEGRA124_SOCTHERM_SENSOR_NUM; ++i) {
> >> +  pdiv = REG_SET_MASK(pdiv, ttgs[i]->pdiv_mask,
> >> +  ttgs[i]->pdiv);
> >> +  if (ttgs[i]->id != TEGRA124_SOCTHERM_SENSOR_PLLX)
> >> +  hotspot =  REG_SET_MASK(hotspot,
> >> +  ttgs[i]->pllx_hotspot_mask,
> >> +  ttgs[i]->pllx_hotspot_diff);
> >> +  }
> >> +  writel(pdiv, tegra->regs + SENSOR_PDIV);
> >> +  writel(hotspot, tegra->regs + SENSOR_HOTSPOT_OFF);
> > 
> > Is the above logic the same for all supported chips? e.g. do we always
> > skip pllx for hotspot configuration?
> 
> Yes, this logic supports Tegra124, Tegra210, and Tegra132, which I will send
> out in the next series.


Ok. Could you please add a comment then explaining why pllx is not
needed for the hotspot configuration?

> 
> > 
> > 
> >>  
> >>/* Initialize thermctl sensors */
> >>  
> >> -- 
> >> 1.9.1
> >>
> > 
> > * Unknown Key
> > * 0x7DA4E256
> > 


signature.asc
Description: Digital signature
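
For readers following the review, the read-modify-write loop in the quoted patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: the group table, the pdiv value of 8, and the assignment of hotspot diffs to groups are made up for the example (the diffs were picked so that the result matches the removed SENSOR_HOTSPOT_OFF_T124 value, 0x00060600). The helper reg_set_mask() mirrors the REG_SET_MASK macro from the driver, and the names program_pdiv() and program_hotspot() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit of a non-zero mask (same role as ffs(m) - 1). */
static unsigned int mask_shift(uint32_t mask)
{
    unsigned int shift = 0;

    while (!(mask & 1u)) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/* Replace the field selected by mask in reg with value; mask must be non-zero. */
static uint32_t reg_set_mask(uint32_t reg, uint32_t mask, uint32_t value)
{
    unsigned int shift = mask_shift(mask);

    return (reg & ~mask) | ((value & (mask >> shift)) << shift);
}

enum sensor_id { SENSOR_CPU, SENSOR_GPU, SENSOR_MEM, SENSOR_PLLX };

/* Hypothetical per-group data; field names follow the quoted patch. */
struct sensor_group {
    enum sensor_id id;
    uint32_t pdiv_mask, pdiv;
    uint32_t pllx_hotspot_mask, pllx_hotspot_diff;
};

static const struct sensor_group groups[] = {
    { SENSOR_CPU,  0xfu << 12, 8, 0xffu << 16, 6 },
    { SENSOR_GPU,  0xfu << 8,  8, 0xffu << 8,  6 },
    { SENSOR_MEM,  0xfu << 4,  8, 0xffu << 0,  0 },
    { SENSOR_PLLX, 0xfu << 0,  8, 0,           0 }, /* no hotspot field here */
};

#define NGROUPS (sizeof(groups) / sizeof(groups[0]))

/* Fold every group's pdiv field into the current register value. */
uint32_t program_pdiv(uint32_t pdiv)
{
    size_t i;

    for (i = 0; i < NGROUPS; i++)
        pdiv = reg_set_mask(pdiv, groups[i].pdiv_mask, groups[i].pdiv);
    return pdiv;
}

/* Fold the hotspot offsets in, skipping the PLLX group as the patch does. */
uint32_t program_hotspot(uint32_t hotspot)
{
    size_t i;

    for (i = 0; i < NGROUPS; i++) {
        if (groups[i].id == SENSOR_PLLX)
            continue;
        hotspot = reg_set_mask(hotspot, groups[i].pllx_hotspot_mask,
                               groups[i].pllx_hotspot_diff);
    }
    return hotspot;
}
```

In this sketch the PLLX entry carries a zero hotspot mask, so it has to be skipped: the shift computation is only defined for a non-zero mask. Whether that is also the reason in the real driver is exactly what the review comment above asks the author to document.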

Re: [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-15 Thread Dr. David Alan Gilbert

* Li, Liang Z (liang.z...@intel.com) wrote:
> > On Mon, Mar 14, 2016 at 05:03:34PM +, Dr. David Alan Gilbert wrote:
> > > * Li, Liang Z (liang.z...@intel.com) wrote:
> > > > >
> > > > > Hi,
> > > > >   I'm just catching back up on this thread; so without reference
> > > > > to any particular previous mail in the thread.
> > > > >
> > > > >   1) How many of the free pages do we tell the host about?
> > > > >  Your main change is telling the host about all the
> > > > >  free pages.
> > > >
> > > > Yes, all the guest's free pages.
> > > >
> > > > >  If we tell the host about all the free pages, then we might
> > > > >  end up needing to allocate more pages and update the host
> > > > >  with pages we now want to use; that would have to wait for the
> > > > >  host to acknowledge that use of these pages, since if we don't
> > > > >  wait for it then it might have skipped migrating a page we
> > > > >  just started using (I don't understand how your series solves 
> > > > > that).
> > > > >  So the guest probably needs to keep some free pages - how many?
> > > >
> > > > > Actually, there is no need to care about whether the free pages will be
> > > > > used by the host.
> > > > > We only care about whether some of the free pages get reused by the
> > > > > guest, right?
> > > > >
> > > > > Dirty page logging can be used to solve this: start dirty page
> > > > > logging before getting the free page information from the guest.
> > > > > Even if some of the free pages are modified by the guest while the
> > > > > free page information is being collected, these modified pages will
> > > > > be tracked by the dirty page logging mechanism. So, in the following
> > > > > migration_bitmap_sync() call, pages that are in the free pages bitmap
> > > > > but were later modified will be reset to dirty. We won't omit any
> > > > > dirtied pages.
> > > > >
> > > > > So, the guest doesn't need to keep any free pages.
> > >
> > > OK, yes, that works; so we do:
> > >   * enable dirty logging
> > >   * ask guest for free pages
> > >   * initialise the migration bitmap as everything-free
> > >   * then later we do the normal sync-dirty bitmap stuff and it all just 
> > > works.
> > >
> > > That's nice and simple.
> > 
> > > This works once, sure. But there's an issue: you have to defer migration
> > > until you get the free page list, and this only works once. So you end up
> > > with heuristics about how long to wait.
> > 
> > Instead I propose:
> > 
> > - mark all pages dirty as we do now.
> > 
> > - at start of migration, start tracking dirty
> >   pages in kvm, and tell guest to start tracking free pages
> > 
> > we can now introduce any kind of delay, for example wait for ack from guest,
> > or do whatever else, or even just start migrating pages
> > 
> > - repeatedly:
> > - get list of free pages from guest
> > - clear them in migration bitmap
> > - get dirty list from kvm
> > 
> > - at end of migration, stop tracking writes in kvm,
> >   and tell guest to stop tracking free pages
> 
> I had thought of filtering out the free pages in each migration bitmap
> synchronization.
> The advantage is that we can skip processing as many free pages as possible,
> not just once.
> The disadvantage is that we would have to change the current memory
> management code to track the free pages, instead of traversing the free page
> list to construct the free pages bitmap, in order to reduce the overhead of
> getting the free pages bitmap.
> I am not sure if the kernel people would like it.
> 
> If we keep the traversing mechanism, then because of the overhead it may not
> be worth filtering out the free pages repeatedly.

Well, Michael's idea of not waiting for the dirty
bitmap to be filled does make that idea of constantly
using the free-bitmap better.

In that case, is it easier if something (guest/host?)
allocates some memory in the guest's physical RAM space
and just points the host to it, rather than having an
explicit 'send'?

Dave

> Liang
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
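
The bitmap bookkeeping discussed in this thread can be sketched as follows. This is a minimal userspace illustration, not QEMU code: sync_word() and sync_bitmap() are hypothetical names, and the bitmaps are plain arrays of 64-bit words in which a set bit means "page still has to be sent". The one point the sketch tries to show is the ordering: free pages are cleared first and the dirty log is applied afterwards, so a page that was reported free but written to in the meantime ends up marked for migration again.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine one word of the three bitmaps.
 * - migration:  pages that still have to be sent
 * - free_pages: pages the guest reported as free
 * - dirty:      pages written since dirty logging was enabled
 * Dirty bits are ORed in last so they override a stale "free" report.
 */
uint64_t sync_word(uint64_t migration, uint64_t free_pages, uint64_t dirty)
{
    return (migration & ~free_pages) | dirty;
}

/* Apply sync_word() across whole bitmaps of nwords 64-bit words. */
void sync_bitmap(uint64_t *migration, const uint64_t *free_pages,
                 const uint64_t *dirty, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        migration[i] = sync_word(migration[i], free_pages[i], dirty[i]);
}
```

With the "mark all pages dirty, then filter repeatedly" proposal, sync_bitmap() would simply be called on every iteration with a fresh free-page report; with the one-shot approach it would run once after the first report arrives.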

Re: [PATCH V7 05/12] thermal: tegra: add Tegra210 specific SOC_THERM driver

2016-03-15 Thread Eduardo Valentin

On Tue, Mar 15, 2016 at 02:59:52PM +0800, Wei Ni wrote:
> 
> 
> On 2016年03月15日 02:57, Eduardo Valentin wrote:
> > * PGP Signed by an unknown key
> > 
> > On Fri, Mar 11, 2016 at 11:10:25AM +0800, Wei Ni wrote:
> >> Add Tegra210 specific SOC_THERM driver.
> >>
> >> Signed-off-by: Wei Ni 
> >> ---
> >>  drivers/thermal/tegra/Makefile|   1 +
> >>  drivers/thermal/tegra/soctherm-fuse.c |  11 ++
> >>  drivers/thermal/tegra/soctherm.c  |   6 ++
> >>  drivers/thermal/tegra/soctherm.h  |   4 +
> >>  drivers/thermal/tegra/tegra210-soctherm.c | 173 
> >> ++
> > 
> > No Kconfig change?
> 
> Yes, we don't need a Kconfig change.
> 
> As discussed with Thierry in [V1,03/10] thermal: tegra: split tegra_soctherm
> driver, he said:
> "I'd like to do this differently to reduce the number of Kconfig symbols.
> The alternate proposal would be for the TEGRA_SOCTHERM symbol to remain
> as it is and then build in driver support depending on the selected
> ARCH_TEGRA_*_SOC options."
> 
> So we only have "config TEGRA_SOCTHERM" in the Kconfig, and in the Makefile
> we add:
> tegra-soctherm-$(CONFIG_ARCH_TEGRA_124_SOC)   += tegra124-soctherm.o
> tegra-soctherm-$(CONFIG_ARCH_TEGRA_210_SOC)   += tegra210-soctherm.o


makes sense to me.



signature.asc
Description: Digital signature

[PATCH] Staging: wlan-ng: moved memset() calls after copy_from_user() call

2016-03-15 Thread Claudiu Beznea

This patch moves the memset() calls in p80211netdev_ethtool()
after the copy_from_user() call, to avoid executing unnecessary
instructions when copy_from_user() fails.

Signed-off-by: Claudiu Beznea 
---
 drivers/staging/wlan-ng/p80211netdev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/wlan-ng/p80211netdev.c 
b/drivers/staging/wlan-ng/p80211netdev.c
index 88255ce..3723b8c 100644
--- a/drivers/staging/wlan-ng/p80211netdev.c
+++ b/drivers/staging/wlan-ng/p80211netdev.c
@@ -465,12 +465,12 @@ static int p80211netdev_ethtool(wlandevice_t *wlandev, void __user *useraddr)
struct ethtool_drvinfo info;
struct ethtool_value edata;
 
-   memset(&info, 0, sizeof(info));
-   memset(&edata, 0, sizeof(edata));
-
if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd)))
return -EFAULT;
 
+   memset(&info, 0, sizeof(info));
+   memset(&edata, 0, sizeof(edata));
+
switch (ethcmd) {
case ETHTOOL_GDRVINFO:
info.cmd = ethcmd;
-- 
1.9.1
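
The control flow after this patch can be sketched in userspace C. Everything below is a stand-in for illustration only: mock_copy_from_user() imitates the kernel helper's convention of returning non-zero when the copy fails, and struct drvinfo and ethtool_sketch() are hypothetical simplifications, not the driver's types.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct ethtool_drvinfo. */
struct drvinfo {
    uint32_t cmd;
    char driver[32];
};

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
static unsigned long mock_copy_from_user(void *to, const void *from,
                                         unsigned long n)
{
    if (!from)
        return n;       /* simulate a faulting user pointer */
    memcpy(to, from, n);
    return 0;
}

int ethtool_sketch(const void *useraddr, struct drvinfo *out)
{
    uint32_t ethcmd;
    struct drvinfo info;

    /* Fail fast: nothing has been zeroed yet on this path. */
    if (mock_copy_from_user(&ethcmd, useraddr, sizeof(ethcmd)))
        return -EFAULT;

    /* Only clear the reply buffer once we know it will be used. */
    memset(&info, 0, sizeof(info));
    info.cmd = ethcmd;
    *out = info;
    return 0;
}
```

The saving is small (two memset() calls on the error path), so this is a tidiness change rather than a functional one.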

Re: [PATCH V7 08/12] of: add notes of critical trips for soctherm

2016-03-15 Thread Eduardo Valentin

On Tue, Mar 15, 2016 at 04:14:15PM +0800, Wei Ni wrote:
> 
> 
> On 2016年03月15日 15:49, Wei Ni wrote:
> > 
> > 
> > On 2016年03月15日 05:00, Eduardo Valentin wrote:
> >> * PGP Signed by an unknown key
> >>
> >> On Fri, Mar 11, 2016 at 11:11:00AM +0800, Wei Ni wrote:
> >>> The "critical" type trip in thermal zone can be
> >>> set to SOC_THERM hardware, it can trigger shut down
> >>> or reset event from hardware.
> >>>
> >>> Signed-off-by: Wei Ni 
> >>> Acked-by: Rob Herring 
> >>> ---
> >>>  Documentation/devicetree/bindings/thermal/tegra-soctherm.txt | 12 
> >>> 
> >>>  1 file changed, 12 insertions(+)
> >>
> >> I did not see in your patch set an update on the compatible string for
> >> the new chip. Did I miss something?
> > 
> > As I said in the previous [00/12], commit 193c9d23a0f0 already added the
> > compatible string. At that time, it just used the current tegra_soctherm.c
> > driver to support Tegra210; it can work, but can't show temperatures
> > correctly.
> 
> Oh, sorry, I made a mistake: commit 193c9d23a0f0 just added the
> compatible string; it did not use the current tegra_soctherm driver to
> support Tegra210.

OK, got it. The binding is done, but the driver is only being refactored
now to add support for this chip version.

> 
> Wei.
> 
> > 
> >>
> >> * Unknown Key
> >> * 0x7DA4E256
> >>


signature.asc
Description: Digital signature

Re: [PATCH V7 09/12] thermal: tegra: add thermtrip function

2016-03-15 Thread Eduardo Valentin

On Tue, Mar 15, 2016 at 05:12:12PM +0800, Wei Ni wrote:
> 
> 
> On 2016年03月15日 03:16, Eduardo Valentin wrote:
> > * PGP Signed by an unknown key
> > 
> > On Fri, Mar 11, 2016 at 11:11:12AM +0800, Wei Ni wrote:
> >> Add support for hardware critical thermal limits to the
> >> SOC_THERM driver. It uses the Linux thermal framework to
> >> create critical trip temp, and set it to SOC_THERM hardware.
> >> If these limits are breached, the chip will reset, and if
> >> appropriately configured, will turn off the PMIC.
> >>
> >> This support is critical for safe usage of the chip.
> >>
> >> Signed-off-by: Wei Ni 
> >> ---
> >>  drivers/thermal/tegra/soctherm.c  | 166 
> >> +-
> >>  drivers/thermal/tegra/soctherm.h  |   7 ++
> >>  drivers/thermal/tegra/tegra124-soctherm.c |  24 +
> >>  drivers/thermal/tegra/tegra210-soctherm.c |  24 +
> >>  4 files changed, 216 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/thermal/tegra/soctherm.c 
> >> b/drivers/thermal/tegra/soctherm.c
> >> index 02ac6d2e5a20..dbaab160baba 100644
> >> --- a/drivers/thermal/tegra/soctherm.c
> >> +++ b/drivers/thermal/tegra/soctherm.c
> >> @@ -73,9 +73,14 @@
> >>  #define REG_SET_MASK(r, m, v) (((r) & ~(m)) | \
> >> (((v) & (m >> (ffs(m) - 1))) << (ffs(m) - 1)))
> >>  
> >> +static const int min_low_temp = -127000;
> >> +static const int max_high_temp = 127000;
> >> +
> >>  struct tegra_thermctl_zone {
> >>void __iomem *reg;
> >> -  u32 mask;
> >> +  struct device *dev;
> >> +  struct thermal_zone_device *tz;
> > 
> > 
> > Why not using tz->dev for the *dev above?
> 
> The tz is a thermal_zone_device; this structure doesn't have *dev.
> It only has the member "struct device device;", but this device is created
> for the thermal class, not for this tegra_soctherm device.
> 
> > 
> >> +  const struct tegra_tsensor_group *sg;
> >>  };
> >>  
> >>  struct tegra_soctherm {
> >> @@ -145,22 +150,158 @@ static int tegra_thermctl_get_temp(void *data, int 
> >> *out_temp)
> >>u32 val;
> >>  
> >>val = readl(zone->reg);
> >> -  val = REG_GET_MASK(val, zone->mask);
> >> +  val = REG_GET_MASK(val, zone->sg->sensor_temp_mask);
> >>*out_temp = translate_temp(val);
> >>  
> >>return 0;
> >>  }
> >>  
> >> +static int
> >> +thermtrip_program(struct device *dev, const struct tegra_tsensor_group 
> >> *sg,
> >> +int trip_temp);
> >> +
> >> +static int tegra_thermctl_set_trip_temp(void *data, int trip, int temp)
> >> +{
> >> +  struct tegra_thermctl_zone *zone = data;
> >> +  struct thermal_zone_device *tz = zone->tz;
> >> +  const struct tegra_tsensor_group *sg = zone->sg;
> >> +  struct device *dev = zone->dev;
> >> +  enum thermal_trip_type type;
> >> +  int ret;
> >> +
> >> +  if (!tz)
> >> +  return -EINVAL;
> > 
> > 
> > Is the above check needed? If you saw a case in which your function is
> > called without tz, would it be the case we have a bug in the probe (or
> > even worse, in thermal-core)?
> 
> This tz isn't from thermal-core, it's from the "void *data".
> This *data is the private structure "struct tegra_thermctl_zone *zone = data;".
> It is registered in devm_thermal_zone_of_sensor_register(*dev, sensor_id,
> *data, *ops). When the registration succeeds, I set zone->tz = z; here, the
> zone is the private data.
> Consider a special case: thermal_zone_of_sensor_register has succeeded but
> "zone->tz = z" has not run yet, and thermal_core then calls .set_trip(). That
> may cause problems here, although this case is difficult to hit. So I think
> we need to do this check.


Can you be more specific? I don't recall a case that core would call any
driver callbacks before setting up the data structures properly.

> > 


signature.asc
Description: Digital signature
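
The guard being debated here can be illustrated with a small userspace mock. This is a sketch under assumptions, not the driver: struct mock_zone, struct mock_tz, and mock_set_trip_temp() are hypothetical stand-ins for the private zone data and the .set_trip_temp callback, and the sketch takes no position on whether the thermal core can really invoke the callback before zone->tz is assigned.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct thermal_zone_device. */
struct mock_tz {
    int id;
};

/* Stand-in for the driver's private per-zone data. */
struct mock_zone {
    struct mock_tz *tz;     /* assigned only after registration returns */
    int last_trip_temp;     /* what would be programmed into hardware */
};

/* Callback shape used in the patch: private data arrives as void *. */
int mock_set_trip_temp(void *data, int trip, int temp)
{
    struct mock_zone *zone = data;

    (void)trip;

    /* Reject calls that arrive before zone->tz has been filled in. */
    if (!zone->tz)
        return -EINVAL;

    zone->last_trip_temp = temp;
    return 0;
}
```

If the core guarantees that no callback runs before registration has returned to the driver, the check is dead code; if it does not, the check is the only thing between the callback and a NULL dereference of tz.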

Re: [REGRESSION] Headphones no longer working on MacPro6,1 with 4.4

2016-03-15 Thread Takashi Iwai

On Tue, 15 Mar 2016 20:38:41 +0100,
Takashi Iwai wrote:
> 
> On Tue, 15 Mar 2016 20:23:09 +0100,
> Laura Abbott wrote:
> > 
> > Hi,
> > 
> > We received a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1316119
> > that the headphone jack on a MacPro6,1 stopped working on an upgrade from
> > 4.3 to 4.4.
> > 
> > The bugzilla has the alsainfo, diffing shows that the Amp-Out vals are
> > different. I tried a revert of 9f660a1c4 (" ALSA: hda/realtek - Fix silent
> > headphone output on MacPro 4,1 (v2)") but that didn't help.
> > 
> > Any ideas before asking for a bisect? Does this hardware version need to
> > have the vref fixup as well?
> 
> The obvious difference is the power state of each node.  The recent
> kernel has the finer power saving mode, and this might be the cause --
> Mac has some secret that requires some node to be powered up.
> 
> Try to power on each node via hda-verb.  For example, to power up the
> node 0x05, run like:
>   hda-verb /dev/snd/hwC0D0 0x05 SET_POWER 0x01

Oops, a typo: the last argument must be 0x00, corresponding to D0:
hda-verb /dev/snd/hwC0D0 0x05 SET_POWER 0x00


Takashi

Re: [PATCH] iommu/vt-d: Ratelimit fault handler

2016-03-15 Thread David Woodhouse

On Tue, 2016-03-15 at 10:35 -0600, Alex Williamson wrote:
> Fault rates can easily overwhelm the console and make the system
> unresponsive.  Ratelimit to allow an opportunity for maintenance.
> 
> Signed-off-by: Alex Williamson 

Rather than just rate-limiting the printk, I'd prefer to handle this
explicitly. There's a bit in the context-entry which can tell the IOMMU
not to bother raising an interrupt at all. And then we can re-enable it
if/when the driver recovers the device. (Or perhaps just when it next
does a mapping).

We really ought to be reporting faults to drivers too, FWIW. I keep
meaning to take a look at that.

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation

smime.p7s
Description: S/MIME cryptographic signature
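
For context, the general interval/burst idea behind rate-limiting a noisy message can be sketched in a few lines of userspace C. This is an illustration of the technique only, not the kernel's ratelimit implementation and not the quoted patch: struct ratelimit, ratelimit_allow(), and the millisecond clock passed in by the caller are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow at most `burst` messages per `interval_ms` window. */
struct ratelimit {
    uint64_t interval_ms;       /* length of one window */
    unsigned int burst;         /* messages allowed per window */
    uint64_t window_start_ms;   /* start of the current window */
    unsigned int printed;       /* messages let through in this window */
    unsigned int missed;        /* messages suppressed so far */
};

/* Returns true if a message may be emitted at time now_ms. */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    /* Open a new window once the current one has expired. */
    if (now_ms - rl->window_start_ms >= rl->interval_ms) {
        rl->window_start_ms = now_ms;
        rl->printed = 0;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }

    rl->missed++;
    return false;
}
```

A limiter like this only reduces console output; the faults and their interrupts still occur at full rate, which is the motivation for the suggestion above to suppress fault reporting in the context entry instead.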

Re: [PATCH 1/2] Staging: wlan-ng: removed prototype of p80211_stt_findproto() from this file.

2016-03-15 Thread kbuild test robot

Hi Claudiu,

[auto build test ERROR on staging/staging-testing]
[also build test ERROR on v4.5 next-20160315]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]

url:    https://github.com/0day-ci/linux/commits/Claudiu-Beznea/Staging-wlan-ng-removed-prototype-of-p80211_stt_findproto-from-this-file/20160316-032739
config: xtensa-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

Note: the linux-review/Claudiu-Beznea/Staging-wlan-ng-removed-prototype-of-p80211_stt_findproto-from-this-file/20160316-032739 HEAD 5518803294d6e4fcd6c066edb94cc30d04949bdb builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers/staging/wlan-ng/p80211conv.c: In function 'skb_ether_to_p80211':
>> drivers/staging/wlan-ng/p80211conv.c:156:8: error: implicit declaration of function 'p80211_stt_findproto' [-Werror=implicit-function-declaration]
   p80211_stt_findproto(proto)) {
   ^
   drivers/staging/wlan-ng/p80211conv.c: At top level:
>> drivers/staging/wlan-ng/p80211conv.c:528:5: error: conflicting types for 'p80211_stt_findproto'
int p80211_stt_findproto(u16 proto)
^
   drivers/staging/wlan-ng/p80211conv.c:529:1: note: an argument type that has a default promotion can't match an empty parameter name list declaration
{
^
   drivers/staging/wlan-ng/p80211conv.c:156:8: note: previous implicit declaration of 'p80211_stt_findproto' was here
   p80211_stt_findproto(proto)) {
   ^
   cc1: some warnings being treated as errors

vim +/p80211_stt_findproto +156 drivers/staging/wlan-ng/p80211conv.c

00b3ed16 Greg Kroah-Hartman 2008-10-02  150  /* tack on SNAP */
82eaca7d Moritz Muehlenhoff 2009-02-08  151  e_snap =
4eb28f71 Johan Meiring      2010-11-06  152      (struct wlan_snap *)skb_push(skb,
4eb28f71 Johan Meiring      2010-11-06  153      sizeof(struct wlan_snap));
00b3ed16 Greg Kroah-Hartman 2008-10-02  154  e_snap->type = htons(proto);
25845388 Pranjal Bhor       2016-01-19  155  if (ethconv == WLAN_ETHCONV_8021h &&
25845388 Pranjal Bhor       2016-01-19 @156      p80211_stt_findproto(proto)) {
82eaca7d Moritz Muehlenhoff 2009-02-08  157  memcpy(e_snap->oui, oui_8021h,
82eaca7d Moritz Muehlenhoff 2009-02-08  158         WLAN_IEEE_OUI_LEN);
00b3ed16 Greg Kroah-Hartman 2008-10-02  159  } else {

:: The code at line 156 was first introduced by commit
:: 2584538807926344e713548e5210bded8ed22011 staging: wlan-ng: Logical continuation fixes

:: TO: Pranjal Bhor <bhor.pran...@gmail.com>
:: CC: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH V7 11/12] arm64: tegra: add soctherm node for Tegra210

2016-03-15 Thread Eduardo Valentin

On Tue, Mar 15, 2016 at 06:43:00PM +0800, Wei Ni wrote:
> 
> 
> On 2016-03-15 03:25, Eduardo Valentin wrote:
> > * PGP Signed by an unknown key
> > 
> > On Fri, Mar 11, 2016 at 11:11:34AM +0800, Wei Ni wrote:
> >> Adds soctherm node for Tegra210, and add cpu,
> >> gpu, mem, pllx as thermal-zones. Set critical
> >> trip temp for cpu and gpu thermal zone.
> >>
> >> Signed-off-by: Wei Ni 
> >> ---
> >>  arch/arm64/boot/dts/nvidia/tegra210.dtsi | 60 
> >> 
> >>  1 file changed, 60 insertions(+)
> >>
> >> diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi 
> >> b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> >> index cd4f45ccd6a7..c7ef500a347e 100644
> >> --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> >> +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> >> @@ -3,6 +3,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  / {
> >>compatible = "nvidia,tegra210";
> >> @@ -802,4 +803,63 @@
> >>(GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>;
> >>interrupt-parent = <>;
> >>};
> >> +
> >> +  soctherm: thermal-sensor@0,700e2000 {
> >> +  compatible = "nvidia,tegra210-soctherm";
> >> +  reg = <0x0 0x700e2000 0x0 0x1000>;
> >> +  interrupts = ;
> >> +  clocks = <_car TEGRA210_CLK_TSENSOR>,
> >> +  <_car TEGRA210_CLK_SOC_THERM>;
> >> +  clock-names = "tsensor", "soctherm";
> >> +  resets = <_car 78>;
> >> +  reset-names = "soctherm";
> >> +  #thermal-sensor-cells = <1>;
> >> +  };
> >> +
> >> +  thermal-zones {
> >> +  cpu {
> >> +  polling-delay-passive = <1000>;
> >> +  polling-delay = <0>;
> >> +
> >> +  thermal-sensors =
> >> +  < TEGRA124_SOCTHERM_SENSOR_CPU>;
> >> +
> >> +  trips {
> >> +  cpu_shutdown_trip: shutdown-trip {
> >> +  temperature = <102500>;
> >> +  hysteresis = <1000>;
> >> +  type = "critical";
> >> +  };
> >> +  };
> >> +  };
> >> +  mem {
> >> +  polling-delay-passive = <0>;
> >> +  polling-delay = <0>;
> >> +
> >> +  thermal-sensors =
> >> +  < TEGRA124_SOCTHERM_SENSOR_MEM>;
> > 
> > 
> > Why no trips for mem? Why should we  care ?
> 
> The critical trip temperature will be programmed into HW for critical
> shutdown. Normally, we only take care of the CPU and GPU temperatures.
> In HW, MEM uses the same critical trip as the GPU. For PLLX, we just
> keep the default critical trip in HW. So I didn't configure MEM and
> PLLX. I can add critical trips for them.

Ok. Please add them.

> 
> > 
> > Please have a look on the binding to check for mandatory properties and
> > sub nodes.
> 
> Hmm, yes, the trips and cooling-maps are required properties. How about
> adding a dummy-cool-dev, so that it is compatible with the binding?
> 

Yeah, what people are doing, when the cooling devices are not ready to
be linked, is to add an empty section of cooling-maps.


> Wei.
> 
> > 
> >> +  };
> >> +  gpu {
> >> +  polling-delay-passive = <1000>;
> >> +  polling-delay = <0>;
> >> +
> >> +  thermal-sensors =
> >> +  < TEGRA124_SOCTHERM_SENSOR_GPU>;
> >> +
> >> +  trips {
> >> +  gpu_shutdown_trip: shutdown-trip {
> >> +  temperature = <103000>;
> >> +  hysteresis = <1000>;
> >> +  type = "critical";
> >> +  };
> >> +  };
> >> +  };
> >> +  pllx {
> >> +  polling-delay-passive = <0>;
> >> +  polling-delay = <0>;
> >> +
> >> +  thermal-sensors =
> >> +  < TEGRA124_SOCTHERM_SENSOR_PLLX>;
> > 
> > ditto
> > 
> >> +  };
> >> +  };
> >>  };
> >> -- 
> >> 1.9.1
> >>
> > 
> > * Unknown Key
> > * 0x7DA4E256
> > 



Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool

2016-03-15 Thread Steven Rostedt

On Sat, 12 Mar 2016 17:35:02 +0900
Namhyung Kim  wrote:

> Hi Jiri,
> 
> On Fri, Mar 11, 2016 at 07:15:06PM +0100, Jiri Olsa wrote:
> > On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
> > 
> > SNIP
> >   
> > > > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > > if (inc) {
> > > > rec->flags++;
> > > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > > > -   return;
> > > > +   return false;
> > > >  
> > > > /*
> > > >  * If there's only a single callback registered to a
> > > > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > > rec->flags |= FTRACE_FL_REGS;
> > > > } else {
> > > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > > > -   return;
> > > > +   return false;
> > > > rec->flags--;
> > > >  
> > > > /*
> > > > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > >  */
> > > > }
> > > > count++;
> > > > +
> > > > +   update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;

Yeah, this is confusing. Mind adding a comment above this:

/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */

That way others will know why this is a '1'.

-- Steve

> > > 
> > > Shouldn't it use 'inc' instead of 1 for the second argument of
> > > the ftrace_test_record()?  
> > 
> > I dont think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
> > check ftrace_modify_all_code:
> > 
> > if (command & FTRACE_UPDATE_CALLS)
> > ftrace_replace_code(1);
> > else if (command & FTRACE_DISABLE_CALLS)
> > ftrace_replace_code(0);
> > 
> > both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS  
> 
> Ah, ok.  So the second argument of the ftrace_test_record() is not
> 'enable' actually..  :-/
> 
> > 
> > you'd use 0 only to disable all, check ftrace_check_record comments:
> > 
> > /*
> >  * If we are updating calls:
> >  *
> >  *   If the record has a ref count, then we need to enable it
> >  *   because someone is using it.
> >  *
> >  *   Otherwise we make sure its disabled.
> >  *
> >  * If we are disabling calls, then disable all records that
> >  * are enabled.
> >  */
> > if (enable && ftrace_rec_count(rec))
> > flag = FTRACE_FL_ENABLED;
> > 
> > 
> > used by ftrace_shutdown_sysctl  
> 
> I got it.  Thank you for the explanation!
> 
> Thanks,
> Namhyung
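
The point settled in this thread (the second argument means "update calls", not "enable this record") can be illustrated with a deliberately simplified user-space model. `struct rec_model`, `enum update_model` and `test_record_model()` are hypothetical stand-ins written for this illustration; the real ftrace_test_record() handles more flags than the single enabled bit modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the update kinds discussed above. */
enum update_model { UPDATE_IGNORE, UPDATE_MAKE_CALL, UPDATE_MAKE_NOP };

/* Hypothetical record: a reference count plus an "enabled" bit. */
struct rec_model {
	unsigned int refcount;
	bool enabled;
};

/*
 * With update_calls set, a record should be enabled exactly when it has
 * users. With update_calls clear, every record should end up disabled.
 */
static enum update_model test_record_model(const struct rec_model *rec,
					   bool update_calls)
{
	bool want_enabled;

	assert(rec != NULL);
	want_enabled = update_calls && rec->refcount > 0;

	if (want_enabled == rec->enabled)
		return UPDATE_IGNORE;
	return want_enabled ? UPDATE_MAKE_CALL : UPDATE_MAKE_NOP;
}
```

In this model, a record that is still referenced after a decrement yields UPDATE_IGNORE when `update_calls` is true, but UPDATE_MAKE_NOP when the caller passes `inc` (false on the decrement path) instead, which is the spurious update the constant 1 avoids.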

[PATCH v7 0/3] fallocate for block devices to provide zero-out

2016-03-15 Thread Darrick J. Wong

Hi,

This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code.  BLKZEROOUT2 is gone.

The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.

The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size.  Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
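
As a rough user-space illustration of the alignment rule described above, the sketch below checks a byte range against a power-of-two logical block size with a single mask test. `range_is_block_aligned()` is a hypothetical helper written for this example, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both start and len are multiples of block_size (a power of two). */
static bool range_is_block_aligned(uint64_t start, uint64_t len,
				   uint32_t block_size)
{
	uint64_t mask;

	/* The mask trick below is only valid for power-of-two sizes. */
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	mask = (uint64_t)block_size - 1;

	/* OR-ing lets one test catch a stray low bit in either value. */
	return ((start | len) & mask) == 0;
}
```

A request starting at byte 512 passes the old 512-byte check but fails this one on a device with 4096-byte logical blocks, which is the case the second patch is meant to reject.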

The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.

Test cases for the new block device fallocate have been submitted to
the xfstests list as generic/70[5-7], though the numbering will change
to a lower number when the API and the tests are accepted upstream.
Look for the v2 testcase patch, which reflects v7 of this patchset.

Comments and questions are, as always, welcome.  Patches are against
4.5.

v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.

--D

[PATCH 3/3] block: implement (some of) fallocate for block devices

2016-03-15 Thread Darrick J. Wong

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
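
The mode and range rules stated above can be summarised in a small user-space model. This is a sketch under stated assumptions: the `M_*` flag values are arbitrary stand-ins for the FALLOC_FL_* constants, `check_bdev_falloc()` is a hypothetical name, the errno chosen for "punch without keep-size" is an assumption, and the alignment check is left out. The model compares lengths rather than an inclusive end offset, which is a choice of this sketch rather than a description of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag bits for this model only (values are arbitrary). */
#define M_KEEP_SIZE  0x1u
#define M_PUNCH_HOLE 0x2u
#define M_ZERO_RANGE 0x4u

/*
 * Validate a request against a device of dev_size bytes and clamp *len
 * if allowed. Returns 0 on success or a negative errno value.
 */
static int check_bdev_falloc(unsigned int mode, uint64_t start,
			     uint64_t *len, uint64_t dev_size)
{
	assert(len != NULL);

	/* Only zero-range and punch-hole (plus keep-size) are modelled. */
	if (mode & ~(M_KEEP_SIZE | M_PUNCH_HOLE | M_ZERO_RANGE))
		return -EOPNOTSUPP;

	/* Plain allocation, with no operation flag, is not supported. */
	if (!(mode & (M_PUNCH_HOLE | M_ZERO_RANGE)))
		return -EOPNOTSUPP;

	/* Punching a hole requires keep-size. */
	if ((mode & M_PUNCH_HOLE) && !(mode & M_KEEP_SIZE))
		return -EOPNOTSUPP;

	if (start >= dev_size)
		return -EINVAL;

	/* Range runs past the end: clamp with keep-size, else reject. */
	if (*len > dev_size - start) {
		if (!(mode & M_KEEP_SIZE))
			return -EINVAL;
		*len = dev_size - start;
	}

	return 0;
}
```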

Signed-off-by: Darrick J. Wong 
---
 fs/block_dev.c |   69 
 fs/open.c  |3 ++
 2 files changed, 71 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..6137c6e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 struct bdev_inode {
@@ -1786,6 +1787,73 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define BLKDEV_FALLOC_FL_SUPPORTED                                     \
+   (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |   \
+FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+   struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct address_space *mapping;
+   loff_t end = start + len - 1;
+   loff_t bs_mask, isize;
+   int error;
+
+   /* We only support zero range and punch hole. */
+   if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+   return -EOPNOTSUPP;
+
+   /* We haven't a primitive for "ensure space exists" right now. */
+   if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+   return -EOPNOTSUPP;
+
+   /* Only punch if the device can do zeroing discard. */
+   if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+   (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+   return -EOPNOTSUPP;
+
+   /* Don't go off the end of the device */
+   isize = i_size_read(bdev->bd_inode);
+   if (start >= isize)
+   return -EINVAL;
+   if (end > isize) {
+   if (mode & FALLOC_FL_KEEP_SIZE) {
+   len = isize - start;
+   end = start + len - 1;
+   } else
+   return -EINVAL;
+   }
+
+   /* Don't allow IO that isn't aligned to logical block size */
+   bs_mask = bdev_logical_block_size(bdev) - 1;
+   if ((start | len) & bs_mask)
+   return -EINVAL;
+
+   /* Invalidate the page cache, including dirty pages. */
+   mapping = bdev->bd_inode->i_mapping;
+   truncate_inode_pages_range(mapping, start, end);
+
+   error = -EINVAL;
+   if (mode & FALLOC_FL_ZERO_RANGE)
+   error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+   GFP_KERNEL, false);
+   else if (mode & FALLOC_FL_PUNCH_HOLE)
+   error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+GFP_KERNEL, 0);
+   if (error)
+   return error;
+
+   /*
+* Invalidate again; if someone wandered in and dirtied a page,
+* the caller will be given -EBUSY;
+*/
+   return invalidate_inode_pages2_range(mapping,
+start >> PAGE_CACHE_SHIFT,
+end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
 const struct file_operations def_blk_fops = {
.open   = blkdev_open,
.release= blkdev_close,
@@ -1800,6 +1868,7 @@ const struct file_operations def_blk_fops = {
 #endif
.splice_read= generic_file_splice_read,
.splice_write   = iter_file_splice_write,
+   .fallocate  = blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 * Let individual file system decide if it supports preallocation
 * for directories or not.
 */
-   if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+   if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+   !S_ISBLK(inode->i_mode))
return -ENODEV;
 
/* Check for wrap through zero too */

[PATCH 2/3] block: require write_same and discard requests align to logical block size

2016-03-15 Thread Darrick J. Wong

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
---
 block/blk-lib.c |   15 +++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..9dca6bb 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio *bio;
int ret = 0;
struct blk_plug plug;
+   sector_t bs_mask;
 
if (!q)
return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q->limits.discard_granularity >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
unsigned int max_write_same_sectors;
+   sector_t bs_mask;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if (!q)
return -ENXIO;
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
/* Ensure that max_write_same_sectors doesn't overflow bi_size */
max_write_same_sectors = UINT_MAX >> 9;
 
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int ret;
struct bio *bio;
struct bio_batch bb;
+   sector_t bs_mask;
unsigned int sz;
DECLARE_COMPLETION_ONSTACK(wait);
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
atomic_set(, 1);
bb.error = 0;
bb.wait =

[PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.

2016-03-15 Thread Darrick J. Wong

Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
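
The stale-read problem this patch addresses can be seen in a toy user-space model in which one byte stands in for a page. Everything here (`struct bdev_model`, `model_read()`, `model_zeroout()`, the `invalidate` switch) is hypothetical and written only for this illustration; it does not model dirty pages or the -EBUSY race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGES 8

struct bdev_model {
	unsigned char disk[MODEL_PAGES];  /* one byte stands in for a page */
	unsigned char cache[MODEL_PAGES]; /* cached copy, valid if cached[i] */
	bool cached[MODEL_PAGES];
};

/* Read through the cache, filling it from disk on a miss. */
static unsigned char model_read(struct bdev_model *m, size_t page)
{
	assert(m != NULL && page < MODEL_PAGES);

	if (!m->cached[page]) {
		m->cache[page] = m->disk[page];
		m->cached[page] = true;
	}
	return m->cache[page];
}

/* Zero pages [first, last] on disk; optionally drop cached copies too. */
static void model_zeroout(struct bdev_model *m, size_t first, size_t last,
			  bool invalidate)
{
	size_t i;

	assert(m != NULL && first <= last && last < MODEL_PAGES);

	/* The zeroing goes straight to the media, bypassing the cache. */
	memset(&m->disk[first], 0, last - first + 1);

	/* Without this step, later reads keep returning the old data. */
	if (invalidate) {
		for (i = first; i <= last; i++)
			m->cached[i] = false;
	}
}
```

With `invalidate` false, a page read before the zero-out still returns its old contents afterwards; with it true, the next read goes back to the zeroed media.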

v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.

Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
---
 block/ioctl.c |   29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..c6eb462 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
+	int ret;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
+	if (ret)
+		return ret;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_CACHE_SHIFT,
+					     end >> PAGE_CACHE_SHIFT);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
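
The range checks in the patched blk_ioctl_zeroout() are easy to exercise in isolation. The following is a userspace sketch under stated assumptions: zeroout_range_check() is a hypothetical name, and dev_size stands in for the value the kernel reads with i_size_read(bdev->bd_inode). It mirrors the order of the checks in the diff and shows why the final end < start comparison is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Userspace sketch of the byte-range checks in the patched
 * blk_ioctl_zeroout(). dev_size stands in for the device size in bytes;
 * the function name is hypothetical.
 * Returns 0 if the range may be zeroed, -EINVAL otherwise.
 */
static int zeroout_range_check(uint64_t start, uint64_t len, uint64_t dev_size)
{
	/* Inclusive last byte; wraps around if len is 0 or the sum overflows. */
	uint64_t end = start + len - 1;

	if (start & 511)
		return -EINVAL;
	if (len & 511)
		return -EINVAL;
	if (end >= dev_size)
		return -EINVAL;
	/* A wrapped end lands below start: catches len == 0 and overflow. */
	if (end < start)
		return -EINVAL;

	return 0;
}
```

Because end is inclusive, a request that finishes exactly at the end of the device passes, while a zero-length request or one whose start + len wraps past 2^64 is rejected.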

Re: [GIT PULL] RCU changes for v4.6

2016-03-15 Thread Paul E. McKenney

On Tue, Mar 15, 2016 at 12:10:03PM -0700, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 11:48 AM, Paul E. McKenney
>  wrote:
> >
> > On the quick quizzes, if you want me to get rid of them, they are gone.
> 
> You don't have to remove them (but I do think cartoons etc should be).
> 
> But dammit, you don't need to duplicate a big file or use a
> non-standard format for something as trivial as a quiz.
> 
> There are *trivial* solutions to this:
> 
>  - just move the answer in the html file (or to another html file past a 
> link).
> 
>  - or just make the answer be in text using the background color (have
> people select the text to see it)
> 
>  - or make the answer be in a tiny font and make people use "ctrl-+"
> or whatever.
> 
> None of these require a non-standard format and munging and duplication.

Will fix!

Thanx, Paul

Re: [REGRESSION] Headphones no longer working on MacPro6,1 with 4.4

2016-03-15 Thread Takashi Iwai

On Tue, 15 Mar 2016 20:23:09 +0100,
Laura Abbott wrote:
> 
> Hi,
> 
> We received a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1316119
> that the headphone jack on a MacPro6,1 stopped working on an upgrade from 4.3
> to 4.4.
> 
> The bugzilla has the alsainfo, diffing shows that the Amp-Out vals are
> different. I tried a revert of 9f660a1c4 (" ALSA: hda/realtek - Fix silent
> headphone output on MacPro 4,1 (v2)") but that didn't help.
> 
> Any ideas before asking for a bisect? Does this hardware version need to have
> the vref fixup as well?

The obvious difference is the power state of each node.  The recent
kernel has the finer power saving mode, and this might be the cause --
Mac has some secret that requires some node to be powered up.

Try to power on each node via hda-verb.  For example, to power up the
node 0x05, run like:
  hda-verb /dev/snd/hwC0D0 0x05 SET_POWER 0x01

And check whether it makes any difference.
Similarly, try for nodes 0x06, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d,
0x0f, 0x10, 0x11, 0x14, 0x15, 0x16, 0x17, 0x18.

thanks,

Takashi

Re: [PATCH 1/1] KVM: don't allow irq_fpu_usable when the VCPU's XCR0 is loaded

2016-03-15 Thread Paolo Bonzini



On 15/03/2016 19:27, Andy Lutomirski wrote:
> On Mon, Mar 14, 2016 at 6:17 AM, Paolo Bonzini  wrote:
>>
>>
>> On 11/03/2016 22:33, David Matlack wrote:
>>>> Is this better than just always keeping the host's XCR0 loaded outside
>>>> of the KVM interrupts-disabled region?
>>>
>>> Probably not. AFAICT KVM does not rely on it being loaded outside that
>>> region. xsetbv isn't insanely expensive, is it? Maybe to minimize the
>>> time spent with interrupts disabled it was put outside.
>>>
>>> I do like that your solution would be contained to KVM.
>>
>> I agree with Andy.  We do want a fix for recent kernels because of the
>> !eager_fpu case that Guangrong mentioned.
>>
>> Paolo
>>
>> ps: while Andy is planning to kill lazy FPU, I want to benchmark it with
>> KVM...  Remember that with a single pre-xsave host in your cluster, your
>> virt management might happily default your VMs to a Westmere or Nehalem
>> CPU model.  GCC might be a pretty good testbench for this (e.g. a kernel
>> compile with very high make -j), because outside of the lexer (which
>> plays SIMD games) it never uses the FPU.
> 
> Aren't pre-xsave CPUs really, really old?  A brief search suggests
> that Intel Core added it somewhere in the middle of the cycle.

I am fairly sure it was added in Sandy Bridge, together with AVX. But
what really matters for eager FPU is not xsave, it's xsaveopt, and I
think AMD has never even produced a microprocessor that supports it.

> For pre-xsave, it could indeed hurt performance a tiny bit under
> workloads that use the FPU and then stop completely because the
> xsaveopt and init optimizations aren't available.  But even that is
> probably a very small effect, especially because pre-xsave CPUs have
> smaller FPU state sizes.

It's still a few cache lines.  Benchmarks will tell.

Paolo

[PATCH 1/2] Staging: wlan-ng: removed prototype of p80211_stt_findproto() from this file.

2016-03-15 Thread Claudiu Beznea

This patch removes the prototype of p80211_stt_findproto()
from p80211conv.h since global scope is not necessary.

Signed-off-by: Claudiu Beznea 
---
 drivers/staging/wlan-ng/p80211conv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/wlan-ng/p80211conv.h b/drivers/staging/wlan-ng/p80211conv.h
index 8c10357..6caba9a 100644
--- a/drivers/staging/wlan-ng/p80211conv.h
+++ b/drivers/staging/wlan-ng/p80211conv.h
@@ -155,6 +155,4 @@ int skb_ether_to_p80211(struct wlandevice *wlandev, u32 ethconv,
struct sk_buff *skb, union p80211_hdr *p80211_hdr,
struct p80211_metawep *p80211_wep);
 
-int p80211_stt_findproto(u16 proto);
-
 #endif
-- 
1.9.1

[PATCH 2/2] Staging: wlan-ng: convert p80211_stt_findproto() to static inline functions

2016-03-15 Thread Claudiu Beznea

This patch converts p80211_stt_findproto() to "static inline"
since it is used only in the p80211conv.c file and has only a
few instructions. After the scope was changed to static,
the function definition was moved to the beginning of the
file to avoid undefined references.

Signed-off-by: Claudiu Beznea 
---
 drivers/staging/wlan-ng/p80211conv.c | 64 ++--
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/wlan-ng/p80211conv.c b/drivers/staging/wlan-ng/p80211conv.c
index 0a8f396..d3f62329 100644
--- a/drivers/staging/wlan-ng/p80211conv.c
+++ b/drivers/staging/wlan-ng/p80211conv.c
@@ -79,6 +79,38 @@ static u8 oui_rfc1042[] = { 0x00, 0x00, 0x00 };
 static u8 oui_8021h[] = { 0x00, 0x00, 0xf8 };
 
 /*
+* p80211_stt_findproto
+*
+* Searches the 802.1h Selective Translation Table for a given
+* protocol.
+*
+* Arguments:
+*  proto   protocol number (in host order) to search for.
+*
+* Returns:
+*  1 - if the table is empty or a match is found.
+*  0 - if the table is non-empty and a match is not found.
+*
+* Call context:
+*  May be called in interrupt or non-interrupt context
+*
+*/
+static inline int p80211_stt_findproto(u16 proto)
+{
+   /* Always return found for now.  This is the behavior used by the */
+   /* Zoom Win95 driver when 802.1h mode is selected */
+   /* TODO: If necessary, add an actual search we'll probably
+* need this to match the CMAC's way of doing things.
+* Need to do some testing to confirm.
+*/
+
+   if (proto == ETH_P_AARP)/* APPLETALK */
+   return 1;
+
+   return 0;
+}
+
+/*
 * p80211pb_ether_to_80211
 *
 * Uses the contents of the ether frame and the etherconv setting
@@ -509,38 +541,6 @@ int skb_p80211_to_ether(wlandevice_t *wlandev, u32 ethconv,
 }
 
 /*
-* p80211_stt_findproto
-*
-* Searches the 802.1h Selective Translation Table for a given
-* protocol.
-*
-* Arguments:
-*  proto   protocol number (in host order) to search for.
-*
-* Returns:
-*  1 - if the table is empty or a match is found.
-*  0 - if the table is non-empty and a match is not found.
-*
-* Call context:
-*  May be called in interrupt or non-interrupt context
-*
-*/
-int p80211_stt_findproto(u16 proto)
-{
-   /* Always return found for now.  This is the behavior used by the */
-   /* Zoom Win95 driver when 802.1h mode is selected */
-   /* TODO: If necessary, add an actual search we'll probably
-* need this to match the CMAC's way of doing things.
-* Need to do some testing to confirm.
-*/
-
-   if (proto == ETH_P_AARP)/* APPLETALK */
-   return 1;
-
-   return 0;
-}
-
-/*
 * p80211skb_rxmeta_detach
 *
 * Disconnects the frmmeta and rxmeta from an skb.
-- 
1.9.1

[REGRESSION] Headphones no longer working on MacPro6,1 with 4.4

2016-03-15 Thread Laura Abbott


Hi,

We received a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1316119
that the headphone jack on a MacPro6,1 stopped working on an upgrade from 4.3
to 4.4.

The bugzilla has the alsainfo, diffing shows that the Amp-Out vals are
different. I tried a revert of 9f660a1c4 (" ALSA: hda/realtek - Fix silent
headphone output on MacPro 4,1 (v2)") but that didn't help.

Any ideas before asking for a bisect? Does this hardware version need to have
the vref fixup as well?

Thanks,
Laura

Re: [PATCH 8/8] sched: prefer cpufreq_scale_freq_capacity

2016-03-15 Thread Dietmar Eggemann

On 14/03/16 05:22, Michael Turquette wrote:
> arch_scale_freq_capacity is weird. It specifies an arch hook for an
> implementation that could easily vary within an architecture or even a
> chip family.
> 
> This patch helps to mitigate this weirdness by defaulting to the
> cpufreq-provided implementation, which should work for all cases where
> CONFIG_CPU_FREQ is set.
> 
> If CONFIG_CPU_FREQ is not set, then try to use an implementation
> provided by the architecture. Failing that, fall back to
> SCHED_CAPACITY_SCALE.
> 
> It may be desirable for cpufreq drivers to specify their own
> implementation of arch_scale_freq_capacity in the future. The same is
> true for platform code within an architecture. In both cases an
> efficient implementation selector will need to be created and this patch
> adds a comment to that effect.

For me, this independence of the scheduler code from the actual
implementation of the Frequency Invariant Engine (FIE) was actually a
feature.

In EAS RFC5.2 (linux-arm.org/linux-power.git energy_model_rfc_v5.2 ,
which hasn't been posted to LKML) we establish the link in the ARCH code
(arch/arm64/include/asm/topology.h).

#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
...
+#endif
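
The selection order under discussion (the cpufreq implementation first, then an arch-provided definition, then the SCHED_CAPACITY_SCALE default) can be sketched in plain C. This is a standalone illustration, not the kernel header: CONFIG_CPU_FREQ is defined by hand here, the cpufreq function body is a made-up stand-in that pretends the CPU runs at half speed, the struct sched_domain argument of the real hook is dropped for brevity, and scaled_delta() is a hypothetical caller.

```c
#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Stand-in for the Kconfig symbol; remove it to exercise the fallback. */
#define CONFIG_CPU_FREQ 1

/* Hypothetical stand-in for the cpufreq-provided implementation. */
static inline unsigned long cpufreq_scale_freq_capacity(int cpu)
{
	(void)cpu;
	return SCHED_CAPACITY_SCALE / 2;	/* pretend: half of max frequency */
}

/* Same preprocessor selection as in the quoted hunk. */
#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
#elif !defined(arch_scale_freq_capacity)
static inline unsigned long arch_scale_freq_capacity(int cpu)
{
	(void)cpu;
	return SCHED_CAPACITY_SCALE;	/* no scaling information available */
}
#endif

/* Hypothetical caller: scale a time delta by the current frequency factor. */
static inline unsigned long scaled_delta(unsigned long delta, int cpu)
{
	return (delta * arch_scale_freq_capacity(cpu)) >> SCHED_CAPACITY_SHIFT;
}
```

The choice is made entirely at compile time, so the caller pays no indirect-call cost; the price is that competing implementations cannot be chosen at run time without extra code.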

> 
> Signed-off-by: Michael Turquette 
> ---
>  kernel/sched/sched.h | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 469d11d..37502ea 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1368,7 +1368,21 @@ static inline int hrtick_enabled(struct rq *rq)
>  #ifdef CONFIG_SMP
>  extern void sched_avg_update(struct rq *rq);
>  
> -#ifndef arch_scale_freq_capacity
> +/*
> + * arch_scale_freq_capacity can be implemented by cpufreq, platform code or
> + * arch code. We select the cpufreq-provided implementation first. If it
> + * doesn't exist then we default to any other implementation provided from
> + * platform/arch code. If those do not exist then we use the default
> + * SCHED_CAPACITY_SCALE value below.
> + *
> + * Note that if cpufreq drivers or platform/arch code have competing
> + * implementations it is up to those subsystems to select one at runtime with
> + * an efficient solution, as we cannot tolerate the overhead of indirect
> + * functions (e.g. function pointers) in the scheduler fast path
> + */
> +#ifdef CONFIG_CPU_FREQ
> +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> +#elif !defined(arch_scale_freq_capacity)
>  static __always_inline
>  unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
>  {
>

Re: [PATCH 7/8] cpufreq: Frequency invariant scheduler load-tracking support

2016-03-15 Thread Dietmar Eggemann

Hi Mike,

On 14/03/16 05:22, Michael Turquette wrote:
> From: Dietmar Eggemann 
> 
> Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
> frequency scaling correction factor for more accurate load-tracking.
> 
> The factor is:
> 
>   current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
> 
> In fact, freq_scale should be a struct cpufreq_policy data member. But
> this would require that the scheduler hot path (__update_load_avg()) would
> have to grab the cpufreq lock. This can be avoided by using per-cpu data
> initialized to SCHED_CAPACITY_SCALE for freq_scale.
> 
> Signed-off-by: Dietmar Eggemann 
> Signed-off-by: Michael Turquette 
> ---
> I'm not as sure about patches 7 & 8, but I included them since I needed
> frequency invariance while testing.
> 
> As mentioned by myself in 2014 and Rafael last month, the
> arch_scale_freq_capacity hook is awkward, because this behavior may vary
> within an architecture.
> 
> I re-introduce Dietmar's generic cpufreq implementation of the frequency
> invariance hook in this patch,  and change the preprocessor magic in
> sched.h to favor the cpufreq implementation over arch- or
> platform-specific ones in the next patch.

Maybe it is worth mentioning that this patch is from EAS RFC5.2
(linux-arm.org/linux-power.git energy_model_rfc_v5.2) which hasn't been
posted to LKML. The last EAS RFCv5 has the Frequency Invariant Engine
(FIE) based on the cpufreq notifier calls (cpufreq_callback,
cpufreq_policy_callback) in the ARM arch code.

> If run-time selection of ops is needed then someone will need to write
> that code.

Right now I see 3 different implementations of the FIE. 1) The X86
aperf/mperf based one (https://lkml.org/lkml/2016/3/3/589), 2) This one
in cpufreq.c and 3) the one based on cpufreq notifiers in ARCH (ARM,
ARM64) code.

I guess with sched_util we do need a solution for all platforms
(different archs, x86 w/ and w/o X86_FEATURE_APERFMPERF, ...).

> I think that this negates the need for the arm arch hooks[0-2], and
> hopefully Morten and Dietmar can weigh in on this.

It's true that we tried to get rid of the usage of the cpufreq callbacks
(cpufreq_callback, cpufreq_policy_callback) with this patch. Plus we
didn't want to implement it twice (for ARM and ARM64).

But 2) would have to work for other ARCHs as well. Maybe as a fall-back
for X86 w/o X86_FEATURE_APERFMPERF feature?

[...]
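
For reference, the correction factor quoted above needs explicit parentheses when written as C, because the division operator binds more tightly than the shift. The following is a userspace sketch under stated assumptions: freq_scale_factor() is a hypothetical name, frequencies are taken in kHz, SCHED_CAPACITY_SHIFT is assumed to be 10, and falling back to full scale when the maximum frequency is unknown is a choice made for this example only.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/*
 * Sketch of the factor (current_freq << SCHED_CAPACITY_SHIFT) / max_freq.
 * The shift is done in 64 bits so large kHz values cannot overflow.
 * The function name and the max_freq == 0 fallback are assumptions.
 */
static unsigned long freq_scale_factor(unsigned long cur_freq_khz,
				       unsigned long max_freq_khz)
{
	if (max_freq_khz == 0)
		return SCHED_CAPACITY_SCALE;

	return (unsigned long)(((uint64_t)cur_freq_khz << SCHED_CAPACITY_SHIFT) /
			       max_freq_khz);
}
```

A CPU running at half its maximum frequency yields 512, and one at full speed yields 1024, that is SCHED_CAPACITY_SCALE.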

Re: [PATCH 8/8] sched: prefer cpufreq_scale_freq_capacity

2016-03-15 Thread Dietmar Eggemann

On 14/03/16 05:22, Michael Turquette wrote:
> arch_scale_freq_capacity is weird. It specifies an arch hook for an
> implementation that could easily vary within an architecture or even a
> chip family.
> 
> This patch helps to mitigate this weirdness by defaulting to the
> cpufreq-provided implementation, which should work for all cases where
> CONFIG_CPU_FREQ is set.
> 
> If CONFIG_CPU_FREQ is not set, then try to use an implementation
> provided by the architecture. Failing that, fall back to
> SCHED_CAPACITY_SCALE.
> 
> It may be desirable for cpufreq drivers to specify their own
> implementation of arch_scale_freq_capacity in the future. The same is
> true for platform code within an architecture. In both cases an
> efficient implementation selector will need to be created and this patch
> adds a comment to that effect.

For me this independence of the scheduler code towards the actual
implementation of the Frequency Invariant Engine (FEI) was actually a
feature.

In EAS RFC5.2 (linux-arm.org/linux-power.git energy_model_rfc_v5.2 ,
which hasn't been posted to LKML) we establish the link in the ARCH code
(arch/arm64/include/asm/topology.h).

#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
...
+#endif

> 
> Signed-off-by: Michael Turquette 
> ---
>  kernel/sched/sched.h | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 469d11d..37502ea 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1368,7 +1368,21 @@ static inline int hrtick_enabled(struct rq *rq)
>  #ifdef CONFIG_SMP
>  extern void sched_avg_update(struct rq *rq);
>  
> -#ifndef arch_scale_freq_capacity
> +/*
> + * arch_scale_freq_capacity can be implemented by cpufreq, platform code or
> + * arch code. We select the cpufreq-provided implementation first. If it
> + * doesn't exist then we default to any other implementation provided from
> + * platform/arch code. If those do not exist then we use the default
> + * SCHED_CAPACITY_SCALE value below.
> + *
> + * Note that if cpufreq drivers or platform/arch code have competing
> + * implementations it is up to those subsystems to select one at runtime with
> + * an efficient solution, as we cannot tolerate the overhead of indirect
> + * functions (e.g. function pointers) in the scheduler fast path
> + */
> +#ifdef CONFIG_CPU_FREQ
> +#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
> +#elif !defined(arch_scale_freq_capacity)
>  static __always_inline
>  unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
>  {
>
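
The selection order described in the patch comment can be tried outside the
kernel with a small standalone sketch. This is only an illustration under
stated assumptions: the body of cpufreq_scale_freq_capacity() is a made-up
stub (it pretends every CPU runs at half its maximum frequency), and
scale_delta() is a hypothetical caller, not a function from the patch.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

struct sched_domain;	/* opaque here; the sketch never dereferences it */

/*
 * Hypothetical stand-in for the cpufreq-provided hook. It pretends every
 * CPU currently runs at half of its maximum frequency.
 */
static inline unsigned long
cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	(void)sd;
	(void)cpu;
	return SCHED_CAPACITY_SCALE / 2;
}

/*
 * Same selection order as in the patch: cpufreq first, then anything the
 * arch already defined, then the SCHED_CAPACITY_SCALE default. Everything
 * is resolved by the preprocessor, so no function pointer is involved.
 */
#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
#elif !defined(arch_scale_freq_capacity)
static inline unsigned long
arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	(void)sd;
	(void)cpu;
	return SCHED_CAPACITY_SCALE;
}
#endif

/* Hypothetical caller: scale a time delta by the current frequency ratio. */
static inline unsigned long scale_delta(unsigned long delta, int cpu)
{
	return (delta * arch_scale_freq_capacity(NULL, cpu)) >> SCHED_CAPACITY_SHIFT;
}
```

Built as is, scale_delta() leaves the delta unchanged because the default
returns SCHED_CAPACITY_SCALE. Built with -DCONFIG_CPU_FREQ, the macro
redirects the call to the stub and the delta is halved.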

Re: [GIT PULL] RCU changes for v4.6

2016-03-15 Thread Linus Torvalds

On Tue, Mar 15, 2016 at 11:48 AM, Paul E. McKenney wrote:
>
> On the quick quizzes, if you want me to get rid of them, they are gone.

You don't have to remove them (but I do think cartoons etc should be).

But dammit, you don't need to duplicate a big file or use a
non-standard format for something as trivial as a quiz.

There are *trivial* solutions to this:

 - just move the answer in the html file (or to another html file past a link).

 - or just make the answer be in text using the background color (have
people select the text to see it)

 - or make the answer be in a tiny font and make people use "ctrl-+"
or whatever.

None of these require a non-standard format and munging and duplication.

   Linus
