date:20160829

Re: Support for configurable PCIe endpoint

2016-08-29 Thread Roy Zang

On 08/18/2016 07:25 AM, Kishon Vijay Abraham I wrote:
> Hi,
>
> On Wednesday 17 August 2016 03:19 PM, Mingkai Hu wrote:
>>
>>> -Original Message-
>>> From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
>>> Sent: Thursday, August 04, 2016 6:02 PM
>>> To: Joao Pinto ; bhelg...@google.com; linux-
>>> p...@vger.kernel.org; a...@arndb.de; Jingoo Han ;
>>> Pratyush Anand 
>>> Cc: Ley Foon Tan ; Rob Herring ;
>>> Tanmay Inamdar ; Roy Zang >> fei.z...@freescale.com>; Mingkai Hu ;
>>> Minghuan Lian ; Richard Zhu
>>> ; Lucas Stach ;
>>> Murali Karicheri ; Thomas Petazzoni
>>> ; Jason Cooper
>>> ; Thierry Reding ;
>>> Simon Horman ; Zhou Wang
>>> ; Gabriele Paoloni
>>> ; Stanimir Varbanov >> sol.com>; David Daney ; linux-
>>> ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
>>> o...@vger.kernel.org; Carlos Palminha
>>> 
>>> Subject: Re: Support for configurable PCIe endpoint
>>>
>>> Hi,
>>>
>>> On Wednesday 03 August 2016 07:09 PM, Joao Pinto wrote:
 Hi Kishon,

 On 8/3/2016 7:03 AM, Kishon Vijay Abraham I wrote:
> Hi,
>
> The PCIe controller present in TI's DRA7 SoC is capable of operating
> either in Root Complex mode or Endpoint mode. (It uses Synopsys
> Designware Core). I'd assume most of the PCIe controllers on other
> platforms that use Designware core should also be capable to operate
> in endpoint mode. But linux kernel right now supports only RC mode.
>
> PCIe endpoint support discussion came up briefly before [1] but it
> was felt the practical use case will find firmware more suitable and
> endpoint support in kernel can be used only for validation or demo.
>
> *) Modify platform driver to support EP mode (in my case pci-dra7xx.c).
>
> *) dt binding specific to EP mode should be created.
>
> Once I complete the implementation and start posting RFC patches, a
> lot of these will become clear. But I want to check if this sounds
> okay to you guys before starting the implementation.
>
> Let me know if you have some other ideas too.
>
> Cheers
> Kishon
>
> [1] -> http://www.spinics.net/lists/linux-pci/msg26026.html
>
 You are raising a topic that we are also addressing in Synopsys.

 For the PCIe RC hardware validation we are currently using the
 standard pcie-designware and pcie-designware-plat drivers.

 For the Endpoint we have to use an internal software package. Its main
 purpose is to initialize the IP registers and eDMA channels and to transfer
 data to prove that everything is working properly. This is done at 2
 levels: a custom driver that is built and loaded, and an application that
 issues ioctls to the driver to check the Endpoint status and exchange data.
>>> hmm.. the platform I have doesn't have a DMA in PCIe IP
>>> (http://www.ti.com/lit/ug/spruhz6g/spruhz6g.pdf). So in your testing does
>>> the EP access RC memory? i.e. the driver in the RC allocates memory from its
>>> DDR and gives its DDR address to the EP. The EP then transfers data to this
>>> address. (This is a typical use case with ethernet PCIe cards.) IIUC that's
>>> not simple with configurable EPs. I'd like to know more about your testing
>>> though.
>>>
>> Hi Kishon,
>>
>> This is a typical use case: the EP uses DMA to transfer data to/from RC
>> memory. In our case, we implement a ring (like a BD ring) or a register in
>> the EP to communicate the address allocated in RC memory; the EP can then
>> transfer data to/from RC memory.
> Initially I had some confusion w.r.t. this because the address allocated in RC
> memory can also be an address in the EP system. For example, let's assume we
> connect two similar systems, one configured as RC and the other configured as
> EP. The PCI driver in the RC allocates memory in its DDR (say 0x8000) and
> programs this address in the EP. Since it's a similar system, 0x8000 will also
> be an address in the EP's DDR. This will result in the EP transferring data to
> its own DDR (at 0x8000) instead of the same address in the RC.
>
> But I later realized that instead of directly using the DDR address given by
> the RC, this address should only be used to program the outbound window. That
> way the target of the outbound window can be an address given by the RC and
> the source should be an address from the address space in the EP's system.
>
> Do you also use the RC memory address to program the outbound window?
>

When the EP accesses RC memory, from the EP's perspective an offset should be
added to 0x8 to match the PCIe outbound access window.
Thanks.
Roy
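
The outbound-window translation discussed above can be sketched as follows.
This is only an illustration under assumed names and values: struct ob_window,
ep_addr_for_rc_buffer() and the window base 0x20000000 are hypothetical and do
not come from any driver or from the thread; 0x8000 is the example RC address
as it appears in the quoted text.

```c
#include <stdint.h>

/* Hypothetical description of one EP outbound window (illustrative only). */
struct ob_window {
	uint64_t cpu_base;   /* start of the window in the EP's own address space */
	uint64_t pci_target; /* RC-side address the window is programmed to reach */
	uint64_t size;       /* window length in bytes */
};

/*
 * Return the EP-local address the EP must use to reach the RC buffer at
 * rc_addr. Returns 0 when the buffer is not fully covered by the window
 * (a simplification for this sketch; 0 is treated as "no mapping").
 */
static uint64_t ep_addr_for_rc_buffer(const struct ob_window *w,
				      uint64_t rc_addr, uint64_t len)
{
	uint64_t off;

	if (rc_addr < w->pci_target)
		return 0;

	off = rc_addr - w->pci_target;
	if (off >= w->size || len > w->size - off)
		return 0;

	/* The EP never dereferences rc_addr directly, only window base + offset. */
	return w->cpu_base + off;
}
```

With a window of { cpu_base = 0x20000000, pci_target = 0x8000, size = 0x1000 },
an RC buffer at 0x8010 is reached by the EP at 0x20000010, so the EP's own DDR
at 0x8010 is never touched.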

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Aaron Lu

On 08/30/2016 12:44 PM, Anshuman Khandual wrote:
> On 08/30/2016 09:09 AM, Andrew Morton wrote:
>> On Tue, 30 Aug 2016 11:09:15 +0800 Aaron Lu  wrote:
>>
> Case used for test on Haswell EP:
> usemem -n 72 --readonly -j 0x20 100G
> Which spawns 72 processes and each will mmap 100G anonymous space and
> then do read only access to that space sequentially with a step of 2MB.
>
> perf report for base commit:
> 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
> perf report for this commit:
>  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page

 Does this mean that overall usemem runtime halved?
>>>
>>> Sorry for the confusion, the above line is extracted from perf report.
>>> It shows the percent of CPU cycles executed in a specific function.
>>>
>>> The above two perf lines are used to show get_huge_zero_page doesn't
>>> consume that much CPU cycles after applying the patch.
>>>

 Do we have any numbers for something which is more real-worldly?
>>>
>>> Unfortunately, no real world numbers.
>>>
>>> We think the global atomic counter could be an issue for performance
>>> so I'm trying to solve the problem.
>>
>> So, umm, we don't actually know if the patch is useful to anyone?
> 
> On a POWER system it improves the CPU consumption of the above-mentioned
> function a little bit. I don't think it's going to improve the actual
> throughput of the workload substantially.
> 
> 0.07%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page

I guess this is the base commit? But there shouldn't be the new
mm_get_huge_zero_page symbol before this patch. A typo perhaps?

Regards,
Aaron

> to
> 
> 0.01%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page
>
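
For readers following the thread, one way a per-mm helper such as
mm_get_huge_zero_page could avoid the global atomic counter is sketched below
in userspace C11. This is an assumption about the approach, not the patch
itself: the names ending in _sketch, the struct, and the flag are all
hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the global reference count on the huge zero page. */
static atomic_long huge_zero_refcount;

/* Hypothetical, stripped-down "mm": one flag that remembers whether this
 * address space already holds a reference. */
struct mm_sketch {
	atomic_bool holds_huge_zero_ref;
};

/* Slow path: every call touches the shared counter. */
static void get_huge_zero_page_sketch(void)
{
	atomic_fetch_add(&huge_zero_refcount, 1);
}

static void put_huge_zero_page_sketch(void)
{
	atomic_fetch_sub(&huge_zero_refcount, 1);
}

/* Fast path: after the first fault in this mm only the per-mm flag is read. */
static void mm_get_huge_zero_page_sketch(struct mm_sketch *mm)
{
	if (atomic_load(&mm->holds_huge_zero_ref))
		return;

	get_huge_zero_page_sketch();

	/* If another thread of the same mm won the race, drop the extra ref. */
	if (atomic_exchange(&mm->holds_huge_zero_ref, true))
		put_huge_zero_page_sketch();
}

/* Called once when the mm is torn down. */
static void mm_put_huge_zero_page_sketch(struct mm_sketch *mm)
{
	if (atomic_load(&mm->holds_huge_zero_ref))
		put_huge_zero_page_sketch();
}
```

In this sketch, repeated read faults in one address space modify the shared
counter once instead of once per fault, which is the kind of contention the
perf lines in the thread point at.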

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Aaron Lu

On 08/30/2016 11:39 AM, Andrew Morton wrote:
> On Tue, 30 Aug 2016 11:09:15 +0800 Aaron Lu  wrote:
> 
 Case used for test on Haswell EP:
 usemem -n 72 --readonly -j 0x20 100G
 Which spawns 72 processes and each will mmap 100G anonymous space and
 then do read only access to that space sequentially with a step of 2MB.

 perf report for base commit:
 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
 perf report for this commit:
  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
>>>
>>> Does this mean that overall usemem runtime halved?
>>
>> Sorry for the confusion, the above line is extracted from perf report.
>> It shows the percent of CPU cycles executed in a specific function.
>>
>> The above two perf lines are used to show get_huge_zero_page doesn't
>> consume that much CPU cycles after applying the patch.
>>
>>>
>>> Do we have any numbers for something which is more real-worldly?
>>
>> Unfortunately, no real world numbers.
>>
>> We think the global atomic counter could be an issue for performance
>> so I'm trying to solve the problem.
> 
> So, umm, we don't actually know if the patch is useful to anyone?

It should help when multiple processes are doing read only anonymous
page faults with THP enabled.

> 
> Some more measurements would help things along, please.
 
In addition to the perf cycles drop in the get_huge_zero_page function,
the throughput for the above workload also increased a lot.

usemem -n 72 --readonly -j 0x20 100G

base commit
$ cat 7289420fc8e98999c8b7c1c2c888549ccc9aa96f/0/vm-scalability.json 
{
  "vm-scalability.throughput": [
1784430792
  ],
}

this patch
$ cat a57acb91d1a29efc4cf34ffee09e1cebe93dcd24/0/vm-scalability.json 
{
  "vm-scalability.throughput": [
4726928591
  ],
}

Throughput wise, it's a 164% gain.
Runtime wise, it's reduced from 707592 usecs to 303970 usecs, 50%+ drop.
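
The two percentages can be rechecked from the figures quoted above with a small
helper. The function name percent_change is made up for this note.

```c
/* Relative change from 'before' to 'after', in percent.
 * Positive means an increase, negative means a decrease. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

percent_change(1784430792.0, 4726928591.0) gives roughly +164.9 (the throughput
gain), and percent_change(707592.0, 303970.0) gives roughly -57.0 (the runtime
drop).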

Granted, real world use case may not encounter such an extreme case so
the gain would be much smaller.

Thanks,
Aaron

[PATCH v9 1/2] ASoC: sun4i-codec: Distinguish sun4i from sun7i

2016-08-29 Thread Danny Milosavljevic

This distinguishes sun4i from sun7i. It is necessary because they use
different registers for the audio mixer.
---
 sound/soc/sunxi/sun4i-codec.c | 44 +--
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/sound/soc/sunxi/sun4i-codec.c b/sound/soc/sunxi/sun4i-codec.c
index 0e19c50..30f4ea2 100644
--- a/sound/soc/sunxi/sun4i-codec.c
+++ b/sound/soc/sunxi/sun4i-codec.c
@@ -96,8 +96,9 @@
 /* Other various ADC registers */
 #define SUN4I_CODEC_DAC_TXCNT  (0x30)
 #define SUN4I_CODEC_ADC_RXCNT  (0x34)
-#define SUN4I_CODEC_AC_SYS_VERI (0x38)
-#define SUN4I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
+
+#define SUN7I_CODEC_AC_DAC_CAL (0x38)
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
 
 struct sun4i_codec {
struct device   *dev;
@@ -509,10 +510,17 @@ static const struct snd_kcontrol_new sun4i_codec_pa_mute =
 
 static DECLARE_TLV_DB_SCALE(sun4i_codec_pa_volume_scale, -6300, 100, 1);
 
-static const struct snd_kcontrol_new sun4i_codec_widgets[] = {
-   SOC_SINGLE_TLV("Power Amplifier Volume", SUN4I_CODEC_DAC_ACTL,
-  SUN4I_CODEC_DAC_ACTL_PA_VOL, 0x3F, 0,
-  sun4i_codec_pa_volume_scale),
+#define SUN4I_COMMON_CODEC_CONTROLS \
+   SOC_SINGLE_TLV("Power Amplifier Volume", SUN4I_CODEC_DAC_ACTL,\
+  SUN4I_CODEC_DAC_ACTL_PA_VOL, 0x3F, 0,\
+  sun4i_codec_pa_volume_scale)
+
+static const struct snd_kcontrol_new sun4i_codec_controls[] = {
+   SUN4I_COMMON_CODEC_CONTROLS,
+};
+
+static const struct snd_kcontrol_new sun7i_codec_controls[] = {
+   SUN4I_COMMON_CODEC_CONTROLS,
 };
 
 static const struct snd_kcontrol_new sun4i_codec_left_mixer_controls[] = {
@@ -629,8 +637,18 @@ static const struct snd_soc_dapm_route sun4i_codec_codec_dapm_routes[] = {
 
 static struct snd_soc_codec_driver sun4i_codec_codec = {
.component_driver = {
-   .controls   = sun4i_codec_widgets,
-   .num_controls   = ARRAY_SIZE(sun4i_codec_widgets),
+   .controls   = sun4i_codec_controls,
+   .num_controls   = ARRAY_SIZE(sun4i_codec_controls),
+   .dapm_widgets   = sun4i_codec_codec_dapm_widgets,
+   .num_dapm_widgets   = ARRAY_SIZE(sun4i_codec_codec_dapm_widgets),
+   .dapm_routes = sun4i_codec_codec_dapm_routes,
+   .num_dapm_routes = ARRAY_SIZE(sun4i_codec_codec_dapm_routes),
+   },
+};
+static struct snd_soc_codec_driver sun7i_codec_codec = {
+   .component_driver = {
+   .controls   = sun7i_codec_controls,
+   .num_controls   = ARRAY_SIZE(sun7i_codec_controls),
.dapm_widgets   = sun4i_codec_codec_dapm_widgets,
.num_dapm_widgets   = ARRAY_SIZE(sun4i_codec_codec_dapm_widgets),
.dapm_routes = sun4i_codec_codec_dapm_routes,
@@ -682,7 +700,7 @@ static const struct regmap_config sun4i_codec_regmap_config = {
.reg_bits   = 32,
.reg_stride = 4,
.val_bits   = 32,
-   .max_register   = SUN4I_CODEC_AC_MIC_PHONE_CAL,
+   .max_register   = SUN7I_CODEC_AC_MIC_PHONE_CAL,
 };
 
 static const struct of_device_id sun4i_codec_of_match[] = {
@@ -760,6 +778,7 @@ static int sun4i_codec_probe(struct platform_device *pdev)
 {
struct snd_soc_card *card;
struct sun4i_codec *scodec;
+   struct snd_soc_codec_driver *codec;
struct resource *res;
void __iomem *base;
int ret;
@@ -822,7 +841,12 @@ static int sun4i_codec_probe(struct platform_device *pdev)
scodec->capture_dma_data.maxburst = 4;
scodec->capture_dma_data.addr_width = DMA_SLAVE_BUSWIDTH_2_BYTES;
 
-   ret = snd_soc_register_codec(&pdev->dev, &sun4i_codec_codec,
+   if (of_device_is_compatible(pdev->dev.of_node,
+   "allwinner,sun7i-a20-codec"))
+   codec = &sun7i_codec_codec;
+   else
+   codec = &sun4i_codec_codec;
+   ret = snd_soc_register_codec(&pdev->dev, codec,
 &sun4i_codec_dai, 1);
if (ret) {
dev_err(&pdev->dev, "Failed to register our codec\n");
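
As a side note on the probe logic above: the patch picks the codec driver with
an if/else on the compatible string. A table-driven variant of the same choice
is sketched below in plain userspace C. It is not what the patch does; the
_sketch types, pick_codec(), and the "allwinner,sun4i-a10-codec" string are
assumptions made for this illustration (only the sun7i string appears in the
patch).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct snd_soc_codec_driver. */
struct codec_driver_sketch {
	const char *controls_name;
};

static const struct codec_driver_sketch sun4i_codec_sketch = {
	"sun4i_codec_controls"
};
static const struct codec_driver_sketch sun7i_codec_sketch = {
	"sun7i_codec_controls"
};

/* One entry per compatible string, pairing the string with its driver data. */
struct codec_match_sketch {
	const char *compatible;
	const struct codec_driver_sketch *driver;
};

static const struct codec_match_sketch codec_matches[] = {
	{ "allwinner,sun4i-a10-codec", &sun4i_codec_sketch }, /* assumed string */
	{ "allwinner,sun7i-a20-codec", &sun7i_codec_sketch },
};

/* Return the driver for a compatible string, or NULL if it is unknown.
 * (The patch instead falls back to the sun4i driver in its else branch.) */
static const struct codec_driver_sketch *pick_codec(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(codec_matches) / sizeof(codec_matches[0]); i++)
		if (strcmp(codec_matches[i].compatible, compatible) == 0)
			return codec_matches[i].driver;
	return NULL;
}
```

A table like this keeps the per-SoC differences in data, so supporting another
SoC would mean adding a row rather than another branch in probe().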

[PATCH v9 2/2] Add mixer controls: Line-In, FM-In, Mic 2, Capture Source, Differential Line-In.

2016-08-29 Thread Danny Milosavljevic

Note: Mic1 Capture Volume is in a different register on A20 than on A10.
Note: Mic2 Capture Volume is in a different register on A20 than on A10.
---
 sound/soc/sunxi/sun4i-codec.c | 256 ++
 1 file changed, 236 insertions(+), 20 deletions(-)

diff --git a/sound/soc/sunxi/sun4i-codec.c b/sound/soc/sunxi/sun4i-codec.c
index 30f4ea2..f510e6d 100644
--- a/sound/soc/sunxi/sun4i-codec.c
+++ b/sound/soc/sunxi/sun4i-codec.c
@@ -59,9 +59,20 @@
 #define SUN4I_CODEC_DAC_ACTL_DACAENR   (31)
 #define SUN4I_CODEC_DAC_ACTL_DACAENL   (30)
 #define SUN4I_CODEC_DAC_ACTL_MIXEN (29)
+#define SUN4I_CODEC_DAC_ACTL_LNG   (26)
+#define SUN4I_CODEC_DAC_ACTL_FMG   (23)
+#define SUN4I_CODEC_DAC_ACTL_MICG  (20)
+#define SUN4I_CODEC_DAC_ACTL_LLNS  (19)
+#define SUN4I_CODEC_DAC_ACTL_RLNS  (18)
+#define SUN4I_CODEC_DAC_ACTL_LFMS  (17)
+#define SUN4I_CODEC_DAC_ACTL_RFMS  (16)
 #define SUN4I_CODEC_DAC_ACTL_LDACLMIXS (15)
 #define SUN4I_CODEC_DAC_ACTL_RDACRMIXS (14)
 #define SUN4I_CODEC_DAC_ACTL_LDACRMIXS (13)
+#define SUN4I_CODEC_DAC_ACTL_MIC1LS (12)
+#define SUN4I_CODEC_DAC_ACTL_MIC1RS (11)
+#define SUN4I_CODEC_DAC_ACTL_MIC2LS (10)
+#define SUN4I_CODEC_DAC_ACTL_MIC2RS (9)
 #define SUN4I_CODEC_DAC_ACTL_DACPAS (8)
 #define SUN4I_CODEC_DAC_ACTL_MIXPAS (7)
 #define SUN4I_CODEC_DAC_ACTL_PA_MUTE   (6)
@@ -87,8 +98,12 @@
 #define SUN4I_CODEC_ADC_ACTL_PREG1EN   (29)
 #define SUN4I_CODEC_ADC_ACTL_PREG2EN   (28)
 #define SUN4I_CODEC_ADC_ACTL_VMICEN (27)
-#define SUN4I_CODEC_ADC_ACTL_VADCG (20)
+#define SUN4I_CODEC_ADC_ACTL_PREG1 (25)
+#define SUN4I_CODEC_ADC_ACTL_PREG2 (23)
+#define SUN4I_CODEC_ADC_ACTL_ADCG  (20)
 #define SUN4I_CODEC_ADC_ACTL_ADCIS (17)
+#define SUN4I_CODEC_ADC_ACTL_LNRDF (16)
+#define SUN4I_CODEC_ADC_ACTL_LNPREG (13)
 #define SUN4I_CODEC_ADC_ACTL_PA_EN (4)
 #define SUN4I_CODEC_ADC_ACTL_DDE   (3)
 #define SUN4I_CODEC_ADC_DEBUG  (0x2c)
@@ -100,6 +115,9 @@
 #define SUN7I_CODEC_AC_DAC_CAL (0x38)
 #define SUN7I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
 
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL_PREG1  (29)
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL_PREG2  (26)
+
 struct sun4i_codec {
struct device   *dev;
struct regmap   *regmap;
@@ -509,23 +527,142 @@ static const struct snd_kcontrol_new sun4i_codec_pa_mute =
SUN4I_CODEC_DAC_ACTL_PA_MUTE, 1, 0);
 
 static DECLARE_TLV_DB_SCALE(sun4i_codec_pa_volume_scale, -6300, 100, 1);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_linein_loopback_gain_scale,
+   -150,
+   150,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_linein_preamp_gain_scale,
+   -1200,
+   300,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_fmin_loopback_gain_scale,
+   -450,
+   150,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_micin_loopback_gain_scale,
+   -450,
+   150,
+   0);
+static DECLARE_TLV_DB_RANGE(sun4i_codec_micin_preamp_gain_scale,
+   0, 0, TLV_DB_SCALE_ITEM(0, 0, 0),
+   1, 7, TLV_DB_SCALE_ITEM(3500, 300, 0));
+static DECLARE_TLV_DB_SCALE(sun4i_codec_adc_gain_scale, -450, 150, 0);
+static DECLARE_TLV_DB_RANGE(sun7i_codec_micin_preamp_gain_scale,
+   0, 0, TLV_DB_SCALE_ITEM(0, 0, 0),
+   1, 7, TLV_DB_SCALE_ITEM(2400, 300, 0)
+);
+
+static const char * const sun4i_codec_capture_source[] = {
+   "Line",
+   "FM",
+   "Mic1",
+   "Mic2",
+   "Mic1,Mic2",
+   "Mic1+Mic2",
+   "Output Mixer",
+   "Line,Mic1",
+};
+static SOC_ENUM_SINGLE_DECL(sun4i_codec_enum_capture_source,
+   SUN4I_CODEC_ADC_ACTL,
+   SUN4I_CODEC_ADC_ACTL_ADCIS,
+   sun4i_codec_capture_source);
+
+static const struct snd_kcontrol_new sun4i_codec_capture_source_controls =
+   SOC_DAPM_ENUM("Capture Source", sun4i_codec_enum_capture_source);
+
+static const char * const sun4i_codec_difflinein_capture_source[] = {
+   "Non-Differential",
+   "Differential",
+};
+static SOC_ENUM_SINGLE_DECL(sun4i_codec_enum_difflinein_capture_source,
+   SUN4I_CODEC_ADC_ACTL,
+
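
A note on reading the TLV tables in the patch above: the min and step
arguments of DECLARE_TLV_DB_SCALE and TLV_DB_SCALE_ITEM are in units of
0.01 dB, and an item inside DECLARE_TLV_DB_RANGE counts from the first
register value of its range. The helpers below are a hypothetical userspace
sketch of that arithmetic, not ALSA code; their names are made up.

```c
/* Linear scale as declared with DECLARE_TLV_DB_SCALE(name, min, step, mute).
 * Returns the gain for register value 'val' in units of 0.01 dB. */
static int tlv_scale_centi_db(int min, int step, unsigned int val)
{
	return min + step * (int)val;
}

/* Two-segment range shaped like the mic preamp tables in the patch:
 * value 0 is 0 dB, values 1..7 start at first_cdb and rise by step_cdb
 * per register step. Result is in units of 0.01 dB. */
static int mic_preamp_centi_db(int first_cdb, int step_cdb, unsigned int val)
{
	if (val == 0)
		return 0;
	return first_cdb + step_cdb * (int)(val - 1);
}
```

For example, the power amplifier scale (-6300, 100) reaches 0 dB at the
maximum register value 0x3F, and the sun4i mic preamp range (3500, 300) maps
register value 7 to 53 dB, while the sun7i range (2400, 300) maps it to 42 dB.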

[PATCH v9 0/2] sun4i-codec: Add Line-In, FM-In, Mic 2, Capture Source, Differential Line-In

2016-08-29 Thread Danny Milosavljevic

This patch adds support for some mixer controls:
 - Line-In
 - FM-In
 - Mic 2
 - Capture Source
 - Differential Line-In

v9 changes compared to v8 are:
 - added Line Differential Capture Switch.
 - split Capture Source into Left Capture Select, Right Capture Select.
 - added Line Capture Volume.
 - rename "sun4i_codec_widgets" to "sun4i_codec_controls" for 
   consistency with the struct field it's used in.
 - rename "Line-In" to "Line".
 - rename "Power Amplifier Playback Volume" to "Headphone Playback Volume".

v8 changes compared to v7 are:
 - fixed the routes for line and mic capturing.

v7 changes compared to v6 are:
 - preparation for different A20, A10 controls is now in an extra patch.
 - all register definitions are now at the top.
 - sun7i-specific things (A20-specific things) are now less grouped-together.
 - rename "Power Amplifier Volume" to "Power Amplifier Playback Volume".

v6 changes compared to v5 are:
 - Mic preamplifier special cases for A20 and A10 are now not icky:
   There are two different _widget arrays and the probe() function now 
   selects the right one to pass to snd_soc_register_codec() unmodified.
 - sun7i-specific things (A20-specific things) are now grouped together.

v5 changes compared to v4 are:
 - Mic preamplifier controls have more common names now.
 - Mic preamplifier scale has a 0 dB entry as well now, as documented in the 
   A20 user manual.
 - Mic preamplifier has special cases for A20 and A10 now.
 - Gain controls have "Gain" in the name now.

v4 changes compared to v3 are:
 - names of the input are not uppercase anymore.
 - bit index constants are now named as in the A20 user manual v1.4.
 - added Mic1-In, Mic2-In.
 - added Mic1 and Mic2 Pre-Amplifiers.

v3 changes compared to v2 are:
 - added DAPM routes.

v2 changes compared to v1 are:
 - moved Line-In and FM-In playback switches to their respective 
   sun4i_codec_*_mixer_controls.

v1 changes:
 - added linein, fmin output volumes and switches.

Danny Milosavljevic (2):
  ASoC: sun4i-codec: Distinguish sun4i from sun7i
  Add mixer controls: Line-In, FM-In, Mic 2, Capture Source,
Differential Line-In.

 sound/soc/sunxi/sun4i-codec.c | 278 +++---
 1 file changed, 259 insertions(+), 19 deletions(-)

[PATCH v9 1/2] ASoC: sun4i-codec: Distinguish sun4i from sun7i

2016-08-29 Thread Danny Milosavljevic

This distinguishes sun4i from sun7i. It is necessary because they use
different registers for the audio mixer.
---
 sound/soc/sunxi/sun4i-codec.c | 44 +--
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/sound/soc/sunxi/sun4i-codec.c b/sound/soc/sunxi/sun4i-codec.c
index 0e19c50..30f4ea2 100644
--- a/sound/soc/sunxi/sun4i-codec.c
+++ b/sound/soc/sunxi/sun4i-codec.c
@@ -96,8 +96,9 @@
 /* Other various ADC registers */
 #define SUN4I_CODEC_DAC_TXCNT  (0x30)
 #define SUN4I_CODEC_ADC_RXCNT  (0x34)
-#define SUN4I_CODEC_AC_SYS_VERI(0x38)
-#define SUN4I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
+
+#define SUN7I_CODEC_AC_DAC_CAL (0x38)
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
 
 struct sun4i_codec {
struct device   *dev;
@@ -509,10 +510,17 @@ static const struct snd_kcontrol_new sun4i_codec_pa_mute =
 
 static DECLARE_TLV_DB_SCALE(sun4i_codec_pa_volume_scale, -6300, 100, 1);
 
-static const struct snd_kcontrol_new sun4i_codec_widgets[] = {
-   SOC_SINGLE_TLV("Power Amplifier Volume", SUN4I_CODEC_DAC_ACTL,
-  SUN4I_CODEC_DAC_ACTL_PA_VOL, 0x3F, 0,
-  sun4i_codec_pa_volume_scale),
+#define SUN4I_COMMON_CODEC_CONTROLS \
+   SOC_SINGLE_TLV("Power Amplifier Volume", SUN4I_CODEC_DAC_ACTL,\
+  SUN4I_CODEC_DAC_ACTL_PA_VOL, 0x3F, 0,\
+  sun4i_codec_pa_volume_scale)
+
+static const struct snd_kcontrol_new sun4i_codec_controls[] = {
+   SUN4I_COMMON_CODEC_CONTROLS,
+};
+
+static const struct snd_kcontrol_new sun7i_codec_controls[] = {
+   SUN4I_COMMON_CODEC_CONTROLS,
 };
 
 static const struct snd_kcontrol_new sun4i_codec_left_mixer_controls[] = {
@@ -629,8 +637,18 @@ static const struct snd_soc_dapm_route 
sun4i_codec_codec_dapm_routes[] = {
 
 static struct snd_soc_codec_driver sun4i_codec_codec = {
.component_driver = {
-   .controls   = sun4i_codec_widgets,
-   .num_controls   = ARRAY_SIZE(sun4i_codec_widgets),
+   .controls   = sun4i_codec_controls,
+   .num_controls   = ARRAY_SIZE(sun4i_codec_controls),
+   .dapm_widgets   = sun4i_codec_codec_dapm_widgets,
+   .num_dapm_widgets   = 
ARRAY_SIZE(sun4i_codec_codec_dapm_widgets),
+   .dapm_routes= sun4i_codec_codec_dapm_routes,
+   .num_dapm_routes= 
ARRAY_SIZE(sun4i_codec_codec_dapm_routes),
+   },
+};
+static struct snd_soc_codec_driver sun7i_codec_codec = {
+   .component_driver = {
+   .controls   = sun7i_codec_controls,
+   .num_controls   = ARRAY_SIZE(sun7i_codec_controls),
.dapm_widgets   = sun4i_codec_codec_dapm_widgets,
.num_dapm_widgets   = 
ARRAY_SIZE(sun4i_codec_codec_dapm_widgets),
.dapm_routes= sun4i_codec_codec_dapm_routes,
@@ -682,7 +700,7 @@ static const struct regmap_config sun4i_codec_regmap_config 
= {
.reg_bits   = 32,
.reg_stride = 4,
.val_bits   = 32,
-   .max_register   = SUN4I_CODEC_AC_MIC_PHONE_CAL,
+   .max_register   = SUN7I_CODEC_AC_MIC_PHONE_CAL,
 };
 
 static const struct of_device_id sun4i_codec_of_match[] = {
@@ -760,6 +778,7 @@ static int sun4i_codec_probe(struct platform_device *pdev)
 {
struct snd_soc_card *card;
struct sun4i_codec *scodec;
+   struct snd_soc_codec_driver *codec;
struct resource *res;
void __iomem *base;
int ret;
@@ -822,7 +841,12 @@ static int sun4i_codec_probe(struct platform_device *pdev)
scodec->capture_dma_data.maxburst = 4;
scodec->capture_dma_data.addr_width = DMA_SLAVE_BUSWIDTH_2_BYTES;
 
-   ret = snd_soc_register_codec(>dev, _codec_codec,
+   if (of_device_is_compatible(pdev->dev.of_node,
+   "allwinner,sun7i-a20-codec"))
+   codec = _codec_codec;
+   else
+   codec = _codec_codec;
+   ret = snd_soc_register_codec(>dev, codec,
 _codec_dai, 1);
if (ret) {
dev_err(>dev, "Failed to register our codec\n");

[PATCH v9 2/2] Add mixer controls: Line-In, FM-In, Mic 2, Capture Source, Differential Line-In.

2016-08-29 Thread Danny Milosavljevic

Note: Mic1 Capture Volume is in a different register on A20 than on A10.
Note: Mic2 Capture Volume is in a different register on A20 than on A10.
---
 sound/soc/sunxi/sun4i-codec.c | 256 ++
 1 file changed, 236 insertions(+), 20 deletions(-)

diff --git a/sound/soc/sunxi/sun4i-codec.c b/sound/soc/sunxi/sun4i-codec.c
index 30f4ea2..f510e6d 100644
--- a/sound/soc/sunxi/sun4i-codec.c
+++ b/sound/soc/sunxi/sun4i-codec.c
@@ -59,9 +59,20 @@
 #define SUN4I_CODEC_DAC_ACTL_DACAENR   (31)
 #define SUN4I_CODEC_DAC_ACTL_DACAENL   (30)
 #define SUN4I_CODEC_DAC_ACTL_MIXEN (29)
+#define SUN4I_CODEC_DAC_ACTL_LNG   (26)
+#define SUN4I_CODEC_DAC_ACTL_FMG   (23)
+#define SUN4I_CODEC_DAC_ACTL_MICG  (20)
+#define SUN4I_CODEC_DAC_ACTL_LLNS  (19)
+#define SUN4I_CODEC_DAC_ACTL_RLNS  (18)
+#define SUN4I_CODEC_DAC_ACTL_LFMS  (17)
+#define SUN4I_CODEC_DAC_ACTL_RFMS  (16)
 #define SUN4I_CODEC_DAC_ACTL_LDACLMIXS (15)
 #define SUN4I_CODEC_DAC_ACTL_RDACRMIXS (14)
 #define SUN4I_CODEC_DAC_ACTL_LDACRMIXS (13)
+#define SUN4I_CODEC_DAC_ACTL_MIC1LS(12)
+#define SUN4I_CODEC_DAC_ACTL_MIC1RS(11)
+#define SUN4I_CODEC_DAC_ACTL_MIC2LS(10)
+#define SUN4I_CODEC_DAC_ACTL_MIC2RS(9)
 #define SUN4I_CODEC_DAC_ACTL_DACPAS(8)
 #define SUN4I_CODEC_DAC_ACTL_MIXPAS(7)
 #define SUN4I_CODEC_DAC_ACTL_PA_MUTE   (6)
@@ -87,8 +98,12 @@
 #define SUN4I_CODEC_ADC_ACTL_PREG1EN   (29)
 #define SUN4I_CODEC_ADC_ACTL_PREG2EN   (28)
 #define SUN4I_CODEC_ADC_ACTL_VMICEN(27)
-#define SUN4I_CODEC_ADC_ACTL_VADCG (20)
+#define SUN4I_CODEC_ADC_ACTL_PREG1 (25)
+#define SUN4I_CODEC_ADC_ACTL_PREG2 (23)
+#define SUN4I_CODEC_ADC_ACTL_ADCG  (20)
 #define SUN4I_CODEC_ADC_ACTL_ADCIS (17)
+#define SUN4I_CODEC_ADC_ACTL_LNRDF (16)
+#define SUN4I_CODEC_ADC_ACTL_LNPREG(13)
 #define SUN4I_CODEC_ADC_ACTL_PA_EN (4)
 #define SUN4I_CODEC_ADC_ACTL_DDE   (3)
 #define SUN4I_CODEC_ADC_DEBUG  (0x2c)
@@ -100,6 +115,9 @@
 #define SUN7I_CODEC_AC_DAC_CAL (0x38)
 #define SUN7I_CODEC_AC_MIC_PHONE_CAL   (0x3c)
 
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL_PREG1  (29)
+#define SUN7I_CODEC_AC_MIC_PHONE_CAL_PREG2  (26)
+
 struct sun4i_codec {
struct device   *dev;
struct regmap   *regmap;
@@ -509,23 +527,142 @@ static const struct snd_kcontrol_new sun4i_codec_pa_mute =
SUN4I_CODEC_DAC_ACTL_PA_MUTE, 1, 0);
 
 static DECLARE_TLV_DB_SCALE(sun4i_codec_pa_volume_scale, -6300, 100, 1);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_linein_loopback_gain_scale,
+   -150,
+   150,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_linein_preamp_gain_scale,
+   -1200,
+   300,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_fmin_loopback_gain_scale,
+   -450,
+   150,
+   0);
+static DECLARE_TLV_DB_SCALE(sun4i_codec_micin_loopback_gain_scale,
+   -450,
+   150,
+   0);
+static DECLARE_TLV_DB_RANGE(sun4i_codec_micin_preamp_gain_scale,
+   0, 0, TLV_DB_SCALE_ITEM(0, 0, 0),
+   1, 7, TLV_DB_SCALE_ITEM(3500, 300, 0));
+static DECLARE_TLV_DB_SCALE(sun4i_codec_adc_gain_scale, -450, 150, 0);
+static DECLARE_TLV_DB_RANGE(sun7i_codec_micin_preamp_gain_scale,
+   0, 0, TLV_DB_SCALE_ITEM(0, 0, 0),
+   1, 7, TLV_DB_SCALE_ITEM(2400, 300, 0)
+);
+
+static const char * const sun4i_codec_capture_source[] = {
+   "Line",
+   "FM",
+   "Mic1",
+   "Mic2",
+   "Mic1,Mic2",
+   "Mic1+Mic2",
+   "Output Mixer",
+   "Line,Mic1",
+};
+static SOC_ENUM_SINGLE_DECL(sun4i_codec_enum_capture_source,
+   SUN4I_CODEC_ADC_ACTL,
+   SUN4I_CODEC_ADC_ACTL_ADCIS,
+   sun4i_codec_capture_source);
+
+static const struct snd_kcontrol_new sun4i_codec_capture_source_controls =
+   SOC_DAPM_ENUM("Capture Source", sun4i_codec_enum_capture_source);
+
+static const char * const sun4i_codec_difflinein_capture_source[] = {
+   "Non-Differential",
+   "Differential",
+};
+static SOC_ENUM_SINGLE_DECL(sun4i_codec_enum_difflinein_capture_source,
+   SUN4I_CODEC_ADC_ACTL,
+

[PATCH v9 0/2] sun4i-codec: Add Line-In, FM-In, Mic 2, Capture Source, Differential Line-In

2016-08-29 Thread Danny Milosavljevic

This patch adds support for some mixer controls:
 - Line-In
 - FM-In
 - Mic 2
 - Capture Source
 - Differential Line-In

v9 changes compared to v8 are:
 - added Line Differential Capture Switch.
 - split Capture Source into Left Capture Select, Right Capture Select.
 - added Line Capture Volume.
 - rename "sun4i_codec_widgets" to "sun4i_codec_controls" for 
   consistency with the struct field it's used in.
 - rename "Line-In" to "Line".
 - rename "Power Amplifier Playback Volume" to "Headphone Playback Volume".

v8 changes compared to v7 are:
 - fixed the routes for line and mic capturing.

v7 changes compared to v6 are:
 - preparation for different A20, A10 controls is now in an extra patch.
 - all register definitions are now at the top.
 - sun7i-specific things (A20-specific things) are now less grouped-together.
 - rename "Power Amplifier Volume" to "Power Amplifier Playback Volume".

v6 changes compared to v5 are:
 - Mic preamplifier special cases for A20 and A10 are now not icky:
   There are two different _widget arrays and the probe() function now 
   selects the right one to pass to snd_soc_register_codec() unmodified.
 - sun7i-specific things (A20-specific things) are now grouped together.

v5 changes compared to v4 are:
 - Mic preamplifier controls have more common names now.
 - Mic preamplifier scale has a 0 dB entry as well now, as documented in the 
   A20 user manual.
 - Mic preamplifier has special cases for A20 and A10 now.
 - Gain controls have "Gain" in the name now.

v4 changes compared to v3 are:
 - names of the input are not uppercase anymore.
 - bit index constants are now named as in the A20 user manual v1.4.
 - added Mic1-In, Mic2-In.
 - added Mic1 and Mic2 Pre-Amplifiers.

v3 changes compared to v2 are:
 - added DAPM routes.

v2 changes compared to v1 are:
 - moved Line-In and FM-In playback switches to their respective 
   sun4i_codec_*_mixer_controls.

v1 changes:
 - added linein, fmin output volumes and switches.

Danny Milosavljevic (2):
  ASoC: sun4i-codec: Distinguish sun4i from sun7i
  Add mixer controls: Line-In, FM-In, Mic 2, Capture Source,
Differential Line-In.

 sound/soc/sunxi/sun4i-codec.c | 278 +++---
 1 file changed, 259 insertions(+), 19 deletions(-)

[patch v3.18+ regression fix] sched: Further improve spurious CPU_IDLE active migrations

2016-08-29 Thread Mike Galbraith


43f4d666 partially cured spurious migrations, but when there are
completely idle groups on a lightly loaded processor, and there is
a buddy pair occupying the busiest group, we will not attempt to
migrate due to select_idle_sibling() buddy placement, leaving the
busiest queue with one task.  We skip balancing, but increment
nr_balance_failed until we kick active balancing, and bounce a
buddy pair endlessly, demolishing throughput.

Regression detected on X5472 box, which has 4 MC groups of 2 cores.

netperf -l 60 -H 127.0.0.1 -t UDP_STREAM -i5,1 -I 95,5
pre:
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput  : 66.421%
!!!   Local CPU util  : 0.000%
!!!   Remote CPU util : 0.000%

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00     1779143      0    15539.49
212992           60.00     1773551           15490.65

post:
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00     3719377      0    32486.01
212992           60.00     3717492           32469.54

Signed-off-by: Mike Galbraith 
Fixes: caeb178c sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd
Cc:  # v3.18+
---
 kernel/sched/fair.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7249,11 +7249,12 @@ static struct sched_group *find_busiest_
 * This cpu is idle. If the busiest group is not overloaded
 * and there is no imbalance between this and busiest group
 * wrt idle cpus, it is balanced. The imbalance becomes
-* significant if the diff is greater than 1 otherwise we
-* might end up to just move the imbalance on another group
+* significant if the diff is greater than 2 otherwise we
+* may end up merely moving the imbalance to another group,
+* or bouncing a buddy pair needlessly.
 */
if ((busiest->group_type != group_overloaded) &&
-   (local->idle_cpus <= (busiest->idle_cpus + 1)))
+   (local->idle_cpus <= (busiest->idle_cpus + 2)))
goto out_balanced;
} else {
/*

Re: [PATCH] ALSA: hda - Enable subwoofer on Dell Inspiron 7559

2016-08-29 Thread Takashi Iwai

On Tue, 30 Aug 2016 07:27:41 +0200,
Kai-Heng Feng wrote:
> 
> The subwoofer on Inspiron 7559 does not work originally.
> Applying a pin fixup can make it work.
> 
> Signed-off-by: Kai-Heng Feng 
> ---
>  sound/pci/hda/patch_realtek.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> index 7100f05..2a3dd18 100644
> --- a/sound/pci/hda/patch_realtek.c
> +++ b/sound/pci/hda/patch_realtek.c
> @@ -4855,6 +4855,7 @@ enum {
>   ALC221_FIXUP_HP_FRONT_MIC,
>   ALC292_FIXUP_TPT460,
>   ALC298_FIXUP_SPK_VOLUME,
> + ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER,
>  };
>  
>  static const struct hda_fixup alc269_fixups[] = {
> @@ -5516,6 +5517,15 @@ static const struct hda_fixup alc269_fixups[] = {
>   .chained = true,
>   .chain_id = ALC298_FIXUP_DELL1_MIC_NO_PRESENCE,
>   },
> + [ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER] = {
> + .type = HDA_FIXUP_PINS,
> + .v.pins = (const struct hda_pintbl[]) {
> + { 0x1b, 0x90170151 },

What's the original value of this pin?


Takashi

[PATCH] ALSA: hda - Enable subwoofer on Dell Inspiron 7559

2016-08-29 Thread Kai-Heng Feng

The subwoofer on Inspiron 7559 does not work originally.
Applying a pin fixup can make it work.

Signed-off-by: Kai-Heng Feng 
---
 sound/pci/hda/patch_realtek.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 7100f05..2a3dd18 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -4855,6 +4855,7 @@ enum {
ALC221_FIXUP_HP_FRONT_MIC,
ALC292_FIXUP_TPT460,
ALC298_FIXUP_SPK_VOLUME,
+   ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER,
 };
 
 static const struct hda_fixup alc269_fixups[] = {
@@ -5516,6 +5517,15 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC298_FIXUP_DELL1_MIC_NO_PRESENCE,
},
+   [ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER] = {
+   .type = HDA_FIXUP_PINS,
+   .v.pins = (const struct hda_pintbl[]) {
+   { 0x1b, 0x90170151 },
+   { }
+   },
+   .chained = true,
+   .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE
+   },
 };
 
 static const struct snd_pci_quirk alc269_fixup_tbl[] = {
@@ -5560,6 +5570,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1028, 0x06df, "Dell", ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK),
SND_PCI_QUIRK(0x1028, 0x06e0, "Dell", ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK),
SND_PCI_QUIRK(0x1028, 0x0704, "Dell XPS 13 9350", ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
+   SND_PCI_QUIRK(0x1028, 0x0706, "Dell Inspiron 7559", ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER),
SND_PCI_QUIRK(0x1028, 0x0725, "Dell Inspiron 3162", ALC255_FIXUP_DELL_SPK_NOISE),
SND_PCI_QUIRK(0x1028, 0x075b, "Dell XPS 13 9360", ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE),
SND_PCI_QUIRK(0x1028, 0x075d, "Dell AIO", ALC298_FIXUP_SPK_VOLUME),
-- 
2.9.3

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Michael S. Tsirkin

On Mon, Aug 29, 2016 at 10:53:04PM -0600, Alex Williamson wrote:
> On Mon, 29 Aug 2016 21:52:20 -0600
> Alex Williamson  wrote:
> 
> > On Mon, 29 Aug 2016 21:23:25 -0600
> > Alex Williamson  wrote:
> > 
> > > On Tue, 30 Aug 2016 05:27:17 +0300
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > > > to signal they are safe to use with an IOMMU.
> > > > 
> > > > Without this bit, exposing the device to userspace is unsafe, so probe
> > > > and fail VFIO initialization unless noiommu is enabled.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> > > >  drivers/vfio/pci/vfio_pci.c |  14 
> > > >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > > > 
> > > >  drivers/vfio/pci/Makefile   |   1 +
> > > >  4 files changed, 156 insertions(+)
> > > >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > > > 
> > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > > > b/drivers/vfio/pci/vfio_pci_private.h
> > > > index 2128de8..2bd5616 100644
> > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > > > vfio_pci_device *vdev)
> > > > return -ENODEV;
> > > >  }
> > > >  #endif
> > > > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > > > noiommu);
> > > >  #endif /* VFIO_PCI_PRIVATE_H */
> > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > index d624a52..e93bf0c 100644
> > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > > > const struct pci_device_id *id)
> > > > return ret;
> > > > }
> > > >  
> > > > +   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
> > > 
> > > Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> > > ID range initially as well, this test raised a big red flag for me
> > > whether all devices within this vendor ID were virtio.
> > >   
> > > > +   bool noiommu = vfio_is_noiommu_group_dev(>dev);   
> > > >  
> > > 
> > > I think you can use iommu_present() for this and avoid patch 1of2.
> > > noiommu is mutually exclusive to an iommu being present.  Seems like
> > > all of this logic should be in the quirk itself, I'm not sure what it
> > > buys to get the value here but wait until later to use it.  Using
> > > iommu_present() could also move this test much earlier in
> > > vfio_pci_probe() making the exit path easier.  
> > 
> > Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
> > iommu_present() assumes an IOMMU API based device.  I'll try to think if
> > there's another way to avoid adding the is_noiommu function.  Thanks,
> 
> I think something like this would do it.
> 
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1214,6 +1214,22 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> str
> if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> return -EINVAL;
>  
> +   /*
> +* Filter out virtio devices that do not honor the iommu,
> +* but only for real iommu groups.
> +*/
> +   if (vfio_pci_is_virtio(pdev)) {
> +   struct iommu_group *tmp = iommu_group_get(&pdev->dev);
> +
> +   if (tmp) {
> +   iommu_group_put(tmp);
> +
> +   ret = vfio_pci_virtio_quirk(pdev);
> +   if (ret)
> +   return ret;
> +   }
> +   }
> +
> group = vfio_iommu_group_get(&pdev->dev);
> if (!group)
> return -EINVAL;
> 
> Thanks,
> Alex

Yes but I think this will also prevent binding
a vfio-noiommu to this device.

Arguably this is a separate bug as it's already impossible ...
but now that we are disabling regular vfio the noiommu
fallback becomes more important.

Any hints on how to fix?



> > > > +
> > > > +   ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > > > +   if (ret) {
> > > > +   dev_warn(>pdev->dev,
> > > > +"Failed to setup Virtio for VFIO\n");
> > > > +   vfio_del_group_dev(>dev);
> > > > +   vfio_iommu_group_put(group, >dev);
> > > > +   kfree(vdev);
> > > > +   return ret;
> > > > +   }
> > > > +   }
> > > > +
> > > > if (vfio_pci_is_vga(pdev)) {
> > > > vga_client_register(pdev, vdev, NULL, 
> > > > vfio_pci_set_vga_decode);
> > > > vga_set_legacy_decoding(pdev,
> > > > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > > > b/drivers/vfio/pci/vfio_pci_virtio.c
> > > > new file mode 100644
> > > >

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Andrew Morton

On Tue, 30 Aug 2016 10:44:21 +0530 Anshuman Khandual 
 wrote:

> On 08/30/2016 04:20 AM, Andrew Morton wrote:
> > On Mon, 29 Aug 2016 14:31:20 +0800 Aaron Lu  wrote:
> > 
> >> > 
> >> > The global zero page is used to satisfy an anonymous read fault. If
> >> > THP(Transparent HugePage) is enabled then the global huge zero page is 
> >> > used.
> >> > The global huge zero page uses an atomic counter for reference counting
> >> > and is allocated/freed dynamically according to its counter value.
> >> > 
> >> > CPU time spent on that counter will greatly increase if there are
> >> > a lot of processes doing anonymous read faults. This patch proposes a
> >> > way to reduce the access to the global counter so that the CPU load
> >> > can be reduced accordingly.
> >> > 
> >> > To do this, a new flag of the mm_struct is introduced: 
> >> > MMF_USED_HUGE_ZERO_PAGE.
> >> > With this flag, the process only need to touch the global counter in
> >> > two cases:
> >> > 1 The first time it uses the global huge zero page;
> >> > 2 The time when mm_user of its mm_struct reaches zero.
> >> > 
> >> > Note that right now, the huge zero page is eligible to be freed as soon
> >> > as its last use goes away.  With this patch, the page will not be
> >> > eligible to be freed until the exit of the last process from which it
> >> > was ever used.
> >> > 
> >> > And with the use of mm_user, the kthread is not eligible to use huge
> >> > zero page either. Since no kthread is using huge zero page today, there
> >> > is no difference after applying this patch. But if that is not desired,
> >> > I can change it to when mm_count reaches zero.
> 
> > I suppose we could simply never free the zero huge page - if some
> > process has used it in the past, others will probably use it in the
> > future.  One wonders how useful this optimization is...
> 
> Yeah, what prevents us from doing away with this lock altogether and
> keep one zero filled huge page (after a process has used it once) for
> ever to be mapped across all the read faults ? A 16MB / 2MB huge page
> is too much of memory loss on a THP enabled system ? We can also save
> on allocation time.

Sounds OK to me.  But only if it makes a useful performance benefit to
something that someone cares about!

otoh, that patch is simple enough...

Re: [Patch v4 9/9] arm64: Update device tree for Layerscape SoCs

2016-08-29 Thread Borislav Petkov

On Mon, Aug 29, 2016 at 02:39:21PM -0700, Olof Johansson wrote:
> DT changes need to go through arm-soc. It's how we've been operating
> for several years now.

Ok ok, we've wasted enough time with this.

So you guys pick up this one, I'll take the rest.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Anshuman Khandual

On 08/30/2016 04:20 AM, Andrew Morton wrote:
> On Mon, 29 Aug 2016 14:31:20 +0800 Aaron Lu  wrote:
> 
>> > 
>> > The global zero page is used to satisfy an anonymous read fault. If
>> > THP(Transparent HugePage) is enabled then the global huge zero page is 
>> > used.
>> > The global huge zero page uses an atomic counter for reference counting
>> > and is allocated/freed dynamically according to its counter value.
>> > 
>> > CPU time spent on that counter will greatly increase if there are
>> > a lot of processes doing anonymous read faults. This patch proposes a
>> > way to reduce the access to the global counter so that the CPU load
>> > can be reduced accordingly.
>> > 
>> > To do this, a new flag of the mm_struct is introduced: 
>> > MMF_USED_HUGE_ZERO_PAGE.
>> > With this flag, the process only need to touch the global counter in
>> > two cases:
>> > 1 The first time it uses the global huge zero page;
>> > 2 The time when mm_user of its mm_struct reaches zero.
>> > 
>> > Note that right now, the huge zero page is eligible to be freed as soon
>> > as its last use goes away.  With this patch, the page will not be
>> > eligible to be freed until the exit of the last process from which it
>> > was ever used.
>> > 
>> > And with the use of mm_user, the kthread is not eligible to use huge
>> > zero page either. Since no kthread is using huge zero page today, there
>> > is no difference after applying this patch. But if that is not desired,
>> > I can change it to when mm_count reaches zero.

> I suppose we could simply never free the zero huge page - if some
> process has used it in the past, others will probably use it in the
> future.  One wonders how useful this optimization is...

Yeah, what prevents us from doing away with this lock altogether and
keeping one zero-filled huge page (once a process has used it) mapped
forever across all read faults? Is a 16MB / 2MB huge page too much
memory to lose on a THP-enabled system? We would also save on
allocation time.
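
For readers following the quoted patch description, the per-mm flag idea can be sketched in userspace C. This is only a model under stated assumptions: `struct mm_model`, `huge_zero_refcount` and the `*_model` functions are hypothetical names, C11 atomics stand in for the kernel's primitives, and the actual allocation and freeing of the page is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global huge zero page reference counter. */
static atomic_int huge_zero_refcount;

/* Hypothetical stand-in for mm_struct; the flag models MMF_USED_HUGE_ZERO_PAGE. */
struct mm_model {
    atomic_bool used_huge_zero_page;
};

static void mm_model_init(struct mm_model *mm)
{
    atomic_init(&mm->used_huge_zero_page, false);
}

/* Returns true only when the global counter had to be touched. */
static bool mm_get_huge_zero_page_model(struct mm_model *mm)
{
    /* Fast path: this mm already holds its single reference. */
    if (atomic_load(&mm->used_huge_zero_page))
        return false;

    /* First use: test-and-set so that racing threads count only once. */
    if (atomic_exchange(&mm->used_huge_zero_page, true))
        return false;

    atomic_fetch_add(&huge_zero_refcount, 1);
    return true;
}

/* Models the point where the last user of the mm goes away. */
static void mm_put_huge_zero_page_model(struct mm_model *mm)
{
    if (atomic_exchange(&mm->used_huge_zero_page, false))
        atomic_fetch_sub(&huge_zero_refcount, 1);
}
```

In this model every read fault after the first one stays on a per-mm cache line, which is the effect the patch description is after. The "never free" alternative discussed above would drop the put side entirely.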

Re: PROBLEM: DWC3 USB 3.0 not working on Odroid-XU4 with Exynos 5422

2016-08-29 Thread Anand Moon

Hi All

Adding Vivek Gautam.

On 29 August 2016 at 16:35, Michael Niewöhner  wrote:
> Hi Mathias,
> On Mo, 2016-08-29 at 13:59 +0300, Mathias Nyman wrote:
>> On 29.08.2016 10:28, Felipe Balbi wrote:
>> >
>> >
>> > Hi,
>> >
>> > Michael Niewöhner  writes:
>> > >
>> > > [1.] One line summary of the problem:
>> > > DWC3 USB 3.0 not working on Odroid-XU4 with Exynos 5422
>> > >
>> > > [2.] Full description of the problem/report:
>> > > No usb 3.0 devices are being detected when attached while USB 2.0
>> > > devices work on the same port.
>> > > USB 3.0 works after applying patches [9.1] and [9.2], but seems
>> > > to be
>> > > buggy. The usb hub is redetected every time an usb device is
>> > > attached.
>> >
>> > dwc3 is host, which means it's actually XHCI :-)
>> >
>> > Adding Mathias
>> >
>> > >
>> > > dmesg:
>> > > [  192.287080] usb 3-1.2: USB disconnect, device number 7
>> > > [  210.370699] hub 3-1:1.0: hub_ext_port_status failed (err = -71)
>>
>> Looks like the hub GetPortStatus request fails with protocol error.
>>
>> Reading xhci root hub port status is mostly just register reads and
>> writes. It shouldn't include any actual transfers that could return
>> -EPROTO.
>>
>> So this is not the root hub, but an external or integrated hub on your
>> board, right?
>>
>> The protocol error -71 is returned at transfer errors or if device
>> stalled.
>>
>> Adding more xhci debugging options could show something:
>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>>
>> >
>> > >
>> > > [9.] Other notes, patches, fixes, workarounds:
>> > > [9.1] https://lkml.org/lkml/2014/4/28/234
>> > > [9.2] https://lkml.org/lkml/2015/2/2/259
>>
>> The additional patches that makes things somehow work involve tuning
>> the PHY,
>> this is an area I'm not familiar with
>>
>> -Mathias
>>
>
>
> I'm sorry, I should have said that this is the dmesg output with the
> patches applied. Without them there is no output at all when I attach
> an usb 3.0 device.
>
> Michael
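
As a small aside on the "err = -71" in the quoted dmesg line: kernel code reports failures as negative errno values, and on Linux's generic errno table EPROTO is 71. A minimal sketch of that check follows; `is_usb_protocol_error` is a hypothetical helper name, not a kernel or libusb function.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true if a kernel-style return code means EPROTO. */
static bool is_usb_protocol_error(int err)
{
    /* Kernel-style errors are negative errno values, e.g. -EPROTO. */
    return err == -EPROTO;
}
```

On a typical Linux build, `is_usb_protocol_error(-71)` would be true, matching the hub_ext_port_status failure above.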

There are two dwc3 ports in the SoC: one for Gbit Ethernet and another
for the on-board GL3521 USB 3.0 hub controller.

3.10.x kernel
odroid@odroid:~$ lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=s5p-ehci/3p, 480M

4.x kernel
odroid@odroid:~$ lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=exynos-ohci/3p, 12M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=exynos-ehci/3p, 480M

I feel that the Ethernet driver r8152 is not getting registered to
xhci-hcd on bus 06, which leads to the other ports being misconfigured;
sometimes the Ethernet port gets registered on bus 04.

There is also a possibility that the ports are not all getting proper
power from the S2MPS11 PMIC, and possibly some reset of the PHY is
needed to reconfigure the Exynos USB HSIC controller.

Best Regards
-Anand Moon
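
To compare the two `lsusb -t` listings above mechanically (r8152 at 5000M on 3.10.x versus 480M on 4.x), a small parser for the trailing speed field can help. This is only a sketch: the helper names are hypothetical, and it assumes the speed is the last comma-separated field in the form `<number>M`, as in the output shown.

```c
#include <stdio.h>
#include <string.h>

/* Returns the trailing link speed in Mbit/s from one `lsusb -t` line, or -1. */
static int lsusb_line_speed_mbps(const char *line)
{
    const char *comma = strrchr(line, ',');
    int speed;
    char unit;

    if (!comma)
        return -1;
    /* Expect something like " 5000M" after the last comma. */
    if (sscanf(comma + 1, " %d%c", &speed, &unit) != 2 || unit != 'M')
        return -1;
    return speed;
}

/* Returns the speed only if the line belongs to the r8152 driver, else -1. */
static int r8152_speed_mbps(const char *line)
{
    if (!strstr(line, "Driver=r8152,"))
        return -1;
    return lsusb_line_speed_mbps(line);
}
```

Feeding each line of both listings through `r8152_speed_mbps()` and comparing the non-negative results flags the drop from SuperSpeed to high speed without reading the tree by eye.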

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Andrew Morton

On Tue, 30 Aug 2016 10:14:25 +0530 Anshuman Khandual 
 wrote:

> On 08/30/2016 09:09 AM, Andrew Morton wrote:
> > On Tue, 30 Aug 2016 11:09:15 +0800 Aaron Lu  wrote:
> > 
> >>>> Case used for test on Haswell EP:
> >>>> usemem -n 72 --readonly -j 0x200000 100G
> >>>> Which spawns 72 processes and each will mmap 100G anonymous space and
> >>>> then do read only access to that space sequentially with a step of 2MB.
> >>>>
> >>>> perf report for base commit:
> >>>> 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
> >>>> perf report for this commit:
> >>>>  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
> >>>
> >>> Does this mean that overall usemem runtime halved?
> >>
> >> Sorry for the confusion, the above line is extracted from perf report.
> >> It shows the percent of CPU cycles executed in a specific function.
> >>
> >> The above two perf lines are used to show get_huge_zero_page doesn't
> >> consume that much CPU cycles after applying the patch.
> >>
> >>>
> >>> Do we have any numbers for something which is more real-worldly?
> >>
> >> Unfortunately, no real world numbers.
> >>
> >> We think the global atomic counter could be an issue for performance
> >> so I'm trying to solve the problem.
> > 
> > So, umm, we don't actually know if the patch is useful to anyone?
> 
> On a POWER system it improves the CPU consumption of the above mentioned
> function a little bit. I don't think it's going to improve the actual
> throughput of the workload substantially.
> 
> 0.07%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page
> 
> to
> 
> 0.01%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page

I can't say I'm surprised really.  A huge page is, ahem, huge.  The
computational cost of actually writing stuff into that page will swamp
the cost of the locking to acquire it.

Is the patch really worth the additional complexity?
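
For anyone wanting to reproduce the access pattern of the quoted test case without usemem, the sketch below maps an anonymous region and reads one byte per step. It is a single-process approximation under stated assumptions: `touch_readonly` is a hypothetical name, the sizes are whatever the caller passes, and whether the reads are actually served by the huge zero page depends on the THP settings of the machine.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and read one byte every `step` bytes.
 * Returns the number of reads performed (0 if mmap fails); *nonzero is set
 * to how many of those reads saw a non-zero byte, which should be none.
 */
static size_t touch_readonly(size_t len, size_t step, size_t *nonzero)
{
    unsigned char *p;
    size_t touched = 0;

    assert(step > 0);
    *nonzero = 0;

    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;

    for (size_t off = 0; off < len; off += step) {
        /* volatile keeps the compiler from dropping the read fault. */
        const volatile unsigned char *q = p + off;

        if (*q != 0)
            (*nonzero)++;
        touched++;
    }

    munmap(p, len);
    return touched;
}
```

Calling it with a 2MB step (0x200000) and running many copies in parallel approximates what the quoted usemem invocation does; each read is a read fault on a fresh 2MB-aligned range.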

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson

On Mon, 29 Aug 2016 21:52:20 -0600
Alex Williamson  wrote:

> On Mon, 29 Aug 2016 21:23:25 -0600
> Alex Williamson  wrote:
> 
> > On Tue, 30 Aug 2016 05:27:17 +0300
> > "Michael S. Tsirkin"  wrote:
> >   
> > > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > > to signal they are safe to use with an IOMMU.
> > > 
> > > Without this bit, exposing the device to userspace is unsafe, so probe
> > > and fail VFIO initialization unless noiommu is enabled.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> > >  drivers/vfio/pci/vfio_pci.c |  14 
> > >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > > 
> > >  drivers/vfio/pci/Makefile   |   1 +
> > >  4 files changed, 156 insertions(+)
> > >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > > b/drivers/vfio/pci/vfio_pci_private.h
> > > index 2128de8..2bd5616 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > > vfio_pci_device *vdev)
> > >   return -ENODEV;
> > >  }
> > >  #endif
> > > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > > noiommu);
> > >  #endif /* VFIO_PCI_PRIVATE_H */
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index d624a52..e93bf0c 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > > const struct pci_device_id *id)
> > >   return ret;
> > >   }
> > >  
> > > + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
> > 
> > Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> > ID range initially as well, this test raised a big red flag for me
> > whether all devices within this vendor ID were virtio.
> >   
> > > > + bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);
> > 
> > I think you can use iommu_present() for this and avoid patch 1of2.
> > noiommu is mutually exclusive to an iommu being present.  Seems like
> > all of this logic should be in the quirk itself, I'm not sure what it
> > buys to get the value here but wait until later to use it.  Using
> > iommu_present() could also move this test much earlier in
> > vfio_pci_probe() making the exit path easier.  
> 
> Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
> iommu_present() assumes an IOMMU API based device.  I'll try to think if
> there's another way to avoid adding the is_noiommu function.  Thanks,

I think something like this would do it.

--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1214,6 +1214,22 @@ static int vfio_pci_probe(struct pci_dev *pdev, const str
 	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
 		return -EINVAL;
 
+	/*
+	 * Filter out virtio devices that do not honor the iommu,
+	 * but only for real iommu groups.
+	 */
+	if (vfio_pci_is_virtio(pdev)) {
+		struct iommu_group *tmp = iommu_group_get(&pdev->dev);
+
+		if (tmp) {
+			iommu_group_put(tmp);
+
+			ret = vfio_pci_virtio_quirk(pdev);
+			if (ret)
+				return ret;
+		}
+	}
+
 	group = vfio_iommu_group_get(&pdev->dev);
 	if (!group)
 		return -EINVAL;

Thanks,
Alex
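
The vfio_pci_is_virtio() helper suggested above does not exist in the posted patch, so here is one way it might look, written as a standalone model rather than kernel code. The struct and the `_model` suffix are hypothetical; the ID range assumes the virtio specification's rule that vendor 0x1AF4 with device IDs 0x1000 through 0x107F identifies a virtio device.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4
/* Illustrative names for the virtio PCI device ID range. */
#define VIRTIO_PCI_DEVICE_ID_MIN      0x1000
#define VIRTIO_PCI_DEVICE_ID_MAX      0x107f

/* Hypothetical stand-in for the two struct pci_dev fields used here. */
struct pci_dev_model {
    uint16_t vendor;
    uint16_t device;
};

/* True only for the vendor ID *and* the virtio device ID range. */
static bool vfio_pci_is_virtio_model(const struct pci_dev_model *pdev)
{
    return pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET &&
           pdev->device >= VIRTIO_PCI_DEVICE_ID_MIN &&
           pdev->device <= VIRTIO_PCI_DEVICE_ID_MAX;
}
```

Checking the device ID range as well as the vendor addresses the concern raised earlier in the review that not every device under this vendor ID is necessarily virtio.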

> > > +
> > > + ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > > + if (ret) {
> > > > + dev_warn(&vdev->pdev->dev,
> > > > +  "Failed to setup Virtio for VFIO\n");
> > > > + vfio_del_group_dev(&pdev->dev);
> > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > + kfree(vdev);
> > > + return ret;
> > > + }
> > > + }
> > > +
> > >   if (vfio_pci_is_vga(pdev)) {
> > >   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > >   vga_set_legacy_decoding(pdev,
> > > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > > b/drivers/vfio/pci/vfio_pci_virtio.c
> > > new file mode 100644
> > > index 000..e1ecffd
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > > @@ -0,0 +1,140 @@
> > > +/*
> > > + * VFIO PCI Intel Graphics support
> >   ^^^  
> > > + *
> > > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > > + *   Author: Alex Williamson 
> > 
> > Update.
> >   
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + * Register a device specific region through which to provide read-only
> > > + * access

Re: [PATCH] pwm: pwm-tipwmss: Remove all pm_runtime gets and puts from the driver

2016-08-29 Thread Vignesh R



On Monday 08 August 2016 03:39 PM, Vignesh R wrote:
> From: Jyri Sarha 
> 
> Remove all pm_runtime gets and puts, and dummy pm_ops, from the
> pwm-tipwmss driver as there is no direct hardware access. The runtime PM
> needs to be enabled, so that the runtime PM framework takes care of
> enabling/disabling of PWMSS clock when submodules of PWMSS (ECAP or
> EHRPWM) call pm_runtime APIs. With this change PWMSS clock goes to
> idle when none of the submodules are in use.
> 
> Signed-off-by: Jyri Sarha 
> Signed-off-by: Vignesh R 
> ---


Gentle ping

>  drivers/pwm/pwm-tipwmss.c | 19 ---
>  1 file changed, 19 deletions(-)
> 
> diff --git a/drivers/pwm/pwm-tipwmss.c b/drivers/pwm/pwm-tipwmss.c
> index 829f4991c96f..7fa85a1604da 100644
> --- a/drivers/pwm/pwm-tipwmss.c
> +++ b/drivers/pwm/pwm-tipwmss.c
> @@ -34,7 +34,6 @@ static int pwmss_probe(struct platform_device *pdev)
>   struct device_node *node = pdev->dev.of_node;
>  
>   pm_runtime_enable(&pdev->dev);
> - pm_runtime_get_sync(&pdev->dev);
>  
>   /* Populate all the child nodes here... */
>   ret = of_platform_populate(node, NULL, NULL, &pdev->dev);
> @@ -46,31 +45,13 @@ static int pwmss_probe(struct platform_device *pdev)
>  
>  static int pwmss_remove(struct platform_device *pdev)
>  {
> - pm_runtime_put_sync(&pdev->dev);
>   pm_runtime_disable(&pdev->dev);
>   return 0;
>  }
>  
> -#ifdef CONFIG_PM_SLEEP
> -static int pwmss_suspend(struct device *dev)
> -{
> - pm_runtime_put_sync(dev);
> - return 0;
> -}
> -
> -static int pwmss_resume(struct device *dev)
> -{
> - pm_runtime_get_sync(dev);
> - return 0;
> -}
> -#endif
> -
> -static SIMPLE_DEV_PM_OPS(pwmss_pm_ops, pwmss_suspend, pwmss_resume);
> -
>  static struct platform_driver pwmss_driver = {
>   .driver = {
>   .name   = "pwmss",
> - .pm = &pwmss_pm_ops,
>   .of_match_table = pwmss_of_match,
>   },
>   .probe  = pwmss_probe,
> 

-- 
Regards
Vignesh

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Anshuman Khandual

On 08/30/2016 09:09 AM, Andrew Morton wrote:
> On Tue, 30 Aug 2016 11:09:15 +0800 Aaron Lu  wrote:
> 
>>>> Case used for test on Haswell EP:
>>>> usemem -n 72 --readonly -j 0x200000 100G
>>>> Which spawns 72 processes and each will mmap 100G anonymous space and
>>>> then do read only access to that space sequentially with a step of 2MB.
>>>>
>>>> perf report for base commit:
>>>> 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
>>>> perf report for this commit:
>>>>  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
>>>
>>> Does this mean that overall usemem runtime halved?
>>
>> Sorry for the confusion, the above line is extracted from perf report.
>> It shows the percent of CPU cycles executed in a specific function.
>>
>> The above two perf lines are used to show get_huge_zero_page doesn't
>> consume that much CPU cycles after applying the patch.
>>
>>>
>>> Do we have any numbers for something which is more real-worldly?
>>
>> Unfortunately, no real world numbers.
>>
>> We think the global atomic counter could be an issue for performance
>> so I'm trying to solve the problem.
> 
> So, umm, we don't actually know if the patch is useful to anyone?

On a POWER system it improves the CPU consumption of the above mentioned
function a little bit. I don't think it's going to improve the actual
throughput of the workload substantially.

0.07%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page

to

0.01%  usemem  [kernel.vmlinux]  [k] mm_get_huge_zero_page

[ima] 8078f3035b: BUG: spinlock bad magic on CPU#1, swapper/0/1

2016-08-29 Thread kernel test robot


FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git next-restore-kexec
commit 8078f3035b9dc488ed4e896635f491cdd79e9239 ("ima: store the builtin/custom template definitions in a list")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M

caused below changes:


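+----------------+------------+------------+
|                | 379c7b5248 | 8078f3035b |
+----------------+------------+------------+
| boot_successes | 0          | 0          |
+----------------+------------+------------+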



[   17.855906] kAFS: Red Hat AFS client v0.1 registering.
[   17.860556] Key type trusted registered
[   17.862727] Key type encrypted registered
[   17.863678] BUG: spinlock bad magic on CPU#1, swapper/0/1
[   17.864810]  lock: template_list+0x0/0x38, .magic: , .owner: 
/-1, .owner_cpu: 0
[   17.866700] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.8.0-rc1-00027-g8078f30 #1
[   17.868375] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[   17.870267]   88001ceefdd0 a3c29e13 

[   17.872268]  a7b29180 88001ceefdf0 a35a6547 
a7b29180
[   17.874302]  0003 88001ceefe10 a35a658a 
a7b29180
[   17.876716] Call Trace:
[   17.877559]  [] dump_stack+0xc5/0x128
[   17.878578]  [] spin_dump+0x9f/0xa4
[   17.879729]  [] spin_bug+0x3e/0x40
[   17.881509]  [] do_raw_spin_lock+0x5b/0x20c
[   17.883350]  [] ? do_early_param+0xbb/0xbb
[   17.884465]  [] _raw_spin_lock+0x3a/0x41
[   17.885514]  [] ? ima_init_template_list+0x10/0x5b
[   17.886662]  [] ? hash_setup+0x113/0x113
[   17.887710]  [] ima_init_template_list+0x10/0x5b
[   17.60]  [] init_ima+0xa/0x49
[   17.890116]  [] do_one_initcall+0xaf/0x1b9
[   17.891212]  [] ? do_early_param+0xbb/0xbb
[   17.892288]  [] kernel_init_freeable+0x131/0x1fb
[   17.893385]  [] kernel_init+0xe/0x16a
[   17.894442]  [] ret_from_fork+0x1f/0x40
[   17.895453]  [] ? rest_init+0x15d/0x15d
[   17.924479] ima: No TPM chip found, activating TPM-bypass!
[   17.926167] evm: HMAC attrs: 0x0






Thanks,
Kernel Test Robot
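
The "bad magic" message above comes from the kernel's spinlock debugging,
which stamps every initialised lock with a magic value and checks it before
the lock is taken. A minimal userspace sketch of that check follows. Only the
SPINLOCK_MAGIC value mirrors the kernel; struct dbg_lock, dbg_lock_init() and
dbg_lock_acquire() are hypothetical names used for illustration, and the sketch
does no real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same magic constant the kernel's spinlock debugging uses. */
#define SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical userspace stand-in for a debug-enabled spinlock. */
struct dbg_lock {
	uint32_t magic;     /* SPINLOCK_MAGIC once initialised */
	int owner_cpu;      /* -1 while nobody holds the lock */
	bool locked;
};

/* Stamp the lock so later sanity checks can recognise it. */
static void dbg_lock_init(struct dbg_lock *l)
{
	assert(l != NULL);
	l->magic = SPINLOCK_MAGIC;
	l->owner_cpu = -1;
	l->locked = false;
}

/*
 * Take the lock on behalf of @cpu.
 * Returns false (the "bad magic" case) if the lock was never initialised.
 */
static bool dbg_lock_acquire(struct dbg_lock *l, int cpu)
{
	assert(l != NULL);
	if (l->magic != SPINLOCK_MAGIC)
		return false;
	l->locked = true;
	l->owner_cpu = cpu;
	return true;
}
```

A zero-filled struct dbg_lock fails dbg_lock_acquire(), while one that went
through dbg_lock_init() succeeds.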
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.8.0-rc1 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

[ima] ee47739931: BUG: spinlock bad magic on CPU#0, swapper/0/1

2016-08-29 Thread kernel test robot


FYI, we noticed the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git next-restore-kexec
commit ee47739931a22b314879daedba9299c2834a05e1 ("ima: store the builtin/custom template definitions in a list")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M

caused below changes:


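+----------------+------------+------------+
|                | 732ca1f3c9 | ee47739931 |
+----------------+------------+------------+
| boot_successes | 0          | 0          |
+----------------+------------+------------+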



[6.698532] cryptomgr_probe (168) used greatest stack depth: 14200 bytes left
[6.701158] Key type trusted registered
[6.702067] Key type encrypted registered
[6.702721] BUG: spinlock bad magic on CPU#0, swapper/0/1
[6.703564]  lock: template_list+0x0/0x38, .magic: , .owner: /-1, .owner_cpu: 0
[6.704863] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00027-gee47739 #3
[6.706003] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[6.720399]   91fa0025bdc0 ba7b7b2c 

[6.721615]  bc8ffcd0 91fa0025bde0 ba572615 
bc8ffcd0
[6.722829]  0001 91fa0025be00 ba572645 
bc8ffcd0
[6.724044] Call Trace:
[6.724443]  [] dump_stack+0x85/0xbe
[6.725234]  [] spin_dump+0x90/0x95
[6.726019]  [] spin_bug+0x2b/0x2d
[6.726795]  [] do_raw_spin_lock+0x45/0x191
[6.727689]  [] _raw_spin_lock+0x3d/0x41
[6.728542]  [] ? ima_init_template_list+0x18/0x51
[6.729534]  [] ima_init_template_list+0x18/0x51
[6.730483]  [] ? hash_setup+0xb3/0xb3
[6.731296]  [] init_ima+0xa/0x36
[6.732057]  [] do_one_initcall+0x8b/0x153
[6.732942]  [] ? parse_args+0x17e/0x29f
[6.733789]  [] kernel_init_freeable+0x1dc/0x264
[6.734738]  [] ? set_debug_rodata+0x12/0x12
[6.735631]  [] kernel_init+0xe/0xfd
[6.736392]  [] ret_from_fork+0x1f/0x40
[6.737190]  [] ? rest_init+0x13c/0x13c
[6.738170] ima: No TPM chip found, activating TPM-bypass!
[6.739101] [ cut here ]
[6.739867] WARNING: CPU: 0 PID: 1 at init/main.c:790 do_one_initcall+0x12b/0x153
[6.741307] initcall init_ima+0x0/0x36 returned with preemption imbalance 
[6.742385] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00027-gee47739 #3
[6.743530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[6.744870]   91fa0025bde8 ba7b7b2c 
91fa0025be38
[6.746084]   91fa0025be28 ba518a10 
031620b97089
[6.747308]  bbabb155 0001  
0001
[6.748524] Call Trace:
[6.748915]  [] dump_stack+0x85/0xbe
[6.749710]  [] __warn+0xc5/0xe0
[6.750455]  [] ? hash_setup+0xb3/0xb3
[6.751263]  [] warn_slowpath_fmt+0x4f/0x51
[6.752141]  [] ? hash_setup+0x8a/0xb3
[6.752937]  [] ? hash_setup+0xb3/0xb3
[6.753721]  [] ? hash_setup+0xb3/0xb3
[6.754505]  [] do_one_initcall+0x12b/0x153
[6.755346]  [] kernel_init_freeable+0x1dc/0x264
[6.756243]  [] ? set_debug_rodata+0x12/0x12
[6.757101]  [] kernel_init+0xe/0xfd
[6.757879]  [] ret_from_fork+0x1f/0x40
[6.758757]  [] ? rest_init+0x13c/0x13c
[6.759593] ---[ end trace 0a42846a83c9d229 ]---
[6.760298] evm: HMAC attrs: 0x1






Thanks,
Kernel Test Robot
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.8.0-rc1 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set

{standard input}:199: Error: unknown opcode

2016-08-29 Thread kbuild test robot

Hi Rich,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   b8927721ae9d5ac0582d29d7b8c267d465ad5f00
commit: b4214e41b7152b1964a3421a40251d202ae2d2c0 sh: add SMP support for J2
date:   4 weeks ago
config: sh-j2_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout b4214e41b7152b1964a3421a40251d202ae2d2c0
# save the attached .config to linux build tree
make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   {standard input}: Assembler messages:
>> {standard input}:199: Error: unknown opcode
   {standard input}:1242: Error: unknown opcode
--
   {standard input}: Assembler messages:
   {standard input}:173: Error: unknown opcode
>> {standard input}:199: Error: unknown opcode
   {standard input}:698: Error: unknown opcode
   {standard input}:714: Error: unknown opcode
   {standard input}:840: Error: unknown opcode
   {standard input}:903: Error: unknown opcode
--
   In file included from fs/ext4/inline.c:17:0:
   fs/ext4/ext4_jbd2.h: In function 'ext4_inode_journal_mode':
   fs/ext4/ext4_jbd2.h:428:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
   {standard input}: Assembler messages:
   {standard input}:143: Error: unknown opcode
   {standard input}:186: Error: unknown opcode
>> {standard input}:199: Error: unknown opcode
   {standard input}:213: Error: unknown opcode
   {standard input}:417: Error: unknown opcode
   {standard input}:431: Error: unknown opcode
   {standard input}:667: Error: unknown opcode
   {standard input}:681: Error: unknown opcode
   {standard input}:707: Error: unknown opcode
   {standard input}:729: Error: unknown opcode
   {standard input}:1345: Error: unknown opcode
   {standard input}:1353: Error: unknown opcode
   {standard input}:1475: Error: unknown opcode
   {standard input}:1488: Error: unknown opcode
   {standard input}:2060: Error: unknown opcode
   {standard input}:2172: Error: unknown opcode
   {standard input}:2229: Error: unknown opcode
   {standard input}:2337: Error: unknown opcode
   {standard input}:2346: Error: unknown opcode
   {standard input}:2378: Error: unknown opcode
   {standard input}:2403: Error: unknown opcode
   {standard input}:2635: Error: unknown opcode
   {standard input}:2981: Error: unknown opcode
   {standard input}:3136: Error: unknown opcode
   {standard input}:3149: Error: unknown opcode
   {standard input}:3190: Error: unknown opcode
   {standard input}:3360: Error: unknown opcode
   {standard input}:3467: Error: unknown opcode
   {standard input}:3666: Error: unknown opcode
   {standard input}:3707: Error: unknown opcode
   {standard input}:3821: Error: unknown opcode
   {standard input}:4138: Error: unknown opcode
   {standard input}:4358: Error: unknown opcode
   {standard input}:4372: Error: unknown opcode
   {standard input}:4391: Error: unknown opcode
   {standard input}:4451: Error: unknown opcode
   {standard input}:4858: Error: unknown opcode
   {standard input}:4891: Error: unknown opcode
   {standard input}:4902: Error: unknown opcode
   {standard input}:4914: Error: unknown opcode
   {standard input}:4929: Error: unknown opcode
   {standard input}:4995: Error: unknown opcode
   {standard input}:5057: Error: unknown opcode
   {standard input}:5142: Error: unknown opcode
   {standard input}:5289: Error: unknown opcode
   {standard input}:5344: Error: unknown opcode
   {standard input}:5456: Error: unknown opcode
   {standard input}:5464: Error: unknown opcode
   {standard input}:5493: Error: unknown opcode
   {standard input}:7673: Error: unknown opcode
   {standard input}:7681: Error: unknown opcode
   {standard input}:8403: Error: unknown opcode
--
   {standard input}: Assembler messages:
   {standard input}:126: Error: unknown opcode
>> {standard input}:199: Error: unknown opcode
   {standard input}:355: Error: unknown opcode
   {standard input}:454: Error: unknown opcode
   {standard input}:579: Error: unknown opcode

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data

[PATCH] generic: Add the exception case checking routine for ppi interrupt

2016-08-29 Thread MaJun

From: Ma Jun 

During system boot, if an interrupt that has no action registered is
triggered, the system panics when the handler tries to access the
action's members.

Signed-off-by: Ma Jun 
---
 kernel/irq/chip.c |   20 
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8114d06..9a0e872 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -766,11 +766,23 @@ handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
  */
 void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 {
-   struct irq_chip *chip = irq_desc_get_chip(desc);
-   struct irqaction *action = desc->action;
-   void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
+   struct irq_chip *chip = NULL;
+   struct irqaction *action;
+   void *dev_id;
irqreturn_t res;
 
+   action = desc->action;
+
+   /* Unexpected interrupt in some exception case
+* we just send eoi to end this interrupt
+*/
+   if (unlikely(!action)) {
+   mask_irq(desc);
+   goto out;
+   }
+   dev_id = raw_cpu_ptr(action->percpu_dev_id);
+
+   chip = irq_desc_get_chip(desc);
kstat_incr_irqs_this_cpu(irq, desc);
 
if (chip->irq_ack)
@@ -779,7 +791,7 @@ void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
trace_irq_handler_entry(irq, action);
res = action->handler(irq, dev_id);
trace_irq_handler_exit(irq, action, res);
-
+out:
if (chip->irq_eoi)
chip->irq_eoi(&desc->irq_data);
 }
-- 
1.7.1
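
The guard this patch describes (do not touch the action when none is
registered, just mask and end the interrupt) can be sketched in userspace C as
below. This is a simplified illustration, not the kernel code: the structures
are cut-down stand-ins, and handle_devid_irq() and the count_* stubs are
hypothetical names. In the hunk as posted, chip is assigned only after the
early "goto out", so the sketch fetches it first to keep the EOI path safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct irqaction {
	int (*handler)(int irq, void *dev_id);
	void *dev_id;
};

struct irq_chip {
	void (*irq_mask)(int irq);
	void (*irq_eoi)(int irq);
};

struct irq_desc {
	struct irq_chip *chip;
	struct irqaction *action;   /* NULL until a handler is registered */
};

/* Counters and stub callbacks so the sketch can be exercised. */
static int mask_calls, eoi_calls, handler_calls;

static void count_mask(int irq) { (void)irq; mask_calls++; }
static void count_eoi(int irq) { (void)irq; eoi_calls++; }
static int count_handler(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	handler_calls++;
	return 1;
}

/* Returns 1 if a registered handler ran, 0 if the interrupt was unexpected. */
static int handle_devid_irq(int irq, struct irq_desc *desc)
{
	struct irq_chip *chip;
	struct irqaction *action;
	int handled = 0;

	assert(desc != NULL && desc->chip != NULL);
	chip = desc->chip;      /* fetched first so the EOI below is always valid */
	action = desc->action;

	if (action == NULL) {
		/* Unexpected interrupt: mask it instead of dereferencing NULL. */
		if (chip->irq_mask)
			chip->irq_mask(irq);
	} else {
		action->handler(irq, action->dev_id);
		handled = 1;
	}

	/* Both paths end the interrupt. */
	if (chip->irq_eoi)
		chip->irq_eoi(irq);
	return handled;
}
```

Calling handle_devid_irq() on a descriptor whose action is NULL masks and ends
the interrupt without running any handler. Once an action is set, the handler
runs and only the EOI is sent.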

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Michael S. Tsirkin

On Mon, Aug 29, 2016 at 09:52:20PM -0600, Alex Williamson wrote:
> On Mon, 29 Aug 2016 21:23:25 -0600
> Alex Williamson  wrote:
> 
> > On Tue, 30 Aug 2016 05:27:17 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > > to signal they are safe to use with an IOMMU.
> > > 
> > > Without this bit, exposing the device to userspace is unsafe, so probe
> > > and fail VFIO initialization unless noiommu is enabled.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> > >  drivers/vfio/pci/vfio_pci.c |  14 
> > >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > > 
> > >  drivers/vfio/pci/Makefile   |   1 +
> > >  4 files changed, 156 insertions(+)
> > >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > > b/drivers/vfio/pci/vfio_pci_private.h
> > > index 2128de8..2bd5616 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > > vfio_pci_device *vdev)
> > >   return -ENODEV;
> > >  }
> > >  #endif
> > > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > > noiommu);
> > >  #endif /* VFIO_PCI_PRIVATE_H */
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index d624a52..e93bf0c 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > > const struct pci_device_id *id)
> > >   return ret;
> > >   }
> > >  
> > > + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {  
> > 
> > Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> > ID range initially as well, this test raised a big red flag for me
> > whether all devices within this vendor ID were virtio.
> > 
> > > > + bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);  
> > 
> > I think you can use iommu_present() for this and avoid patch 1of2.
> > noiommu is mutually exclusive to an iommu being present.  Seems like
> > all of this logic should be in the quirk itself, I'm not sure what it
> > buys to get the value here but wait until later to use it.  Using
> > iommu_present() could also move this test much earlier in
> > vfio_pci_probe() making the exit path easier.
> 
> Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
> iommu_present() assumes an IOMMU API based device.  I'll try to think if
> there's another way to avoid adding the is_noiommu function.  Thanks,
> 
> Alex

FWIW I'm only too happy if you take over this patch.
You need Jason's recent patchset to QEMU to test,
but otherwise no special hardware is required.

> > 
> > > +
> > > + ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > > + if (ret) {
> > > > + dev_warn(&vdev->pdev->dev,
> > > > +  "Failed to setup Virtio for VFIO\n");
> > > > + vfio_del_group_dev(&pdev->dev);
> > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > + kfree(vdev);
> > > + return ret;
> > > + }
> > > + }
> > > +
> > >   if (vfio_pci_is_vga(pdev)) {
> > >   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > >   vga_set_legacy_decoding(pdev,
> > > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > > b/drivers/vfio/pci/vfio_pci_virtio.c
> > > new file mode 100644
> > > > index 0000000..e1ecffd
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > > @@ -0,0 +1,140 @@
> > > +/*
> > > + * VFIO PCI Intel Graphics support  
> >   ^^^
> > > + *
> > > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > > + *   Author: Alex Williamson   
> > 
> > Update.
> > 
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + * Register a device specific region through which to provide read-only
> > > + * access to the Intel IGD opregion.  The register defining the opregion
> > > + * address is also virtualized to prevent user modification.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include   
> > 
> > Are io.h and uaccess.h needed?
> > 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "vfio_pci_private.h"
> > > +
> > > +/**
> > > + * virtio_pci_find_capability - walk capabilities to find device info.
> > > + * @dev: the pci device
> > > + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> > > + *
> > > + * Returns offset of the capability, or 0.
> > > + */
> > > +static inline int

[PATCH] [bugfix] replace unnecessary ldax with common ldr

2016-08-29 Thread Kenneth Lee

Signed-off-by: Kenneth Lee 
---
 arch/arm64/include/asm/spinlock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/spinlock.h 
b/arch/arm64/include/asm/spinlock.h
index c85e96d..3334c4f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -63,7 +63,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
  */
 "  sevl\n"
 "2:  wfe\n"
-"  ldaxrh  %w2, %4\n"
+"  ldrh  %w2, %4\n"
 "  eor  %w1, %w2, %w0, lsr #16\n"
 "  cbnz  %w1, 2b\n"
  /* We got the lock. Critical section starts here. */
-- 
1.9.1

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson

On Mon, 29 Aug 2016 21:23:25 -0600
Alex Williamson  wrote:

> On Tue, 30 Aug 2016 05:27:17 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > to signal they are safe to use with an IOMMU.
> > 
> > Without this bit, exposing the device to userspace is unsafe, so probe
> > and fail VFIO initialization unless noiommu is enabled.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> >  drivers/vfio/pci/vfio_pci.c |  14 
> >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > 
> >  drivers/vfio/pci/Makefile   |   1 +
> >  4 files changed, 156 insertions(+)
> >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > b/drivers/vfio/pci/vfio_pci_private.h
> > index 2128de8..2bd5616 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > vfio_pci_device *vdev)
> > return -ENODEV;
> >  }
> >  #endif
> > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > noiommu);
> >  #endif /* VFIO_PCI_PRIVATE_H */
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index d624a52..e93bf0c 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > const struct pci_device_id *id)
> > return ret;
> > }
> >  
> > +   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {  
> 
> Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> ID range initially as well, this test raised a big red flag for me
> whether all devices within this vendor ID were virtio.
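
For illustration, the check suggested above could look roughly like the
sketch below. It is not code from the patch: vfio_pci_is_virtio() does not
exist yet, struct fake_pci_dev is a hypothetical userspace stand-in for the
vendor and device fields of struct pci_dev, and the 0x1000-0x107f window is
the device ID range the virtio specification assigns to virtio-pci.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor ID used by virtio, but also by other Red Hat/Qumranet devices. */
#define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4

/* Device ID window the virtio specification reserves for virtio-pci. */
#define VIRTIO_PCI_DEVICE_ID_MIN 0x1000
#define VIRTIO_PCI_DEVICE_ID_MAX 0x107f

/* Hypothetical stand-in for the two struct pci_dev fields used here. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical helper: true only for IDs inside the virtio-pci window. */
static bool vfio_pci_is_virtio(const struct fake_pci_dev *pdev)
{
	assert(pdev != NULL);
	return pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET &&
	       pdev->device >= VIRTIO_PCI_DEVICE_ID_MIN &&
	       pdev->device <= VIRTIO_PCI_DEVICE_ID_MAX;
}
```

With this shape, a device that shares the vendor ID but sits outside the
window (device ID 0x1110, say) is simply skipped by the quirk.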
> 
> > +   bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);
> 
> I think you can use iommu_present() for this and avoid patch 1of2.
> noiommu is mutually exclusive to an iommu being present.  Seems like
> all of this logic should be in the quirk itself, I'm not sure what it
> buys to get the value here but wait until later to use it.  Using
> iommu_present() could also move this test much earlier in
> vfio_pci_probe() making the exit path easier.

Except then I'm reintroducing the bug fixed by 16ab8a5cbea4 since
iommu_present() assumes an IOMMU API based device.  I'll try to think if
there's another way to avoid adding the is_noiommu function.  Thanks,

Alex

> 
> > +
> > +   ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > +   if (ret) {
> > +   dev_warn(&vdev->pdev->dev,
> > +"Failed to setup Virtio for VFIO\n");
> > +   vfio_del_group_dev(&pdev->dev);
> > +   vfio_iommu_group_put(group, &pdev->dev);
> > +   kfree(vdev);
> > +   return ret;
> > +   }
> > +   }
> > +
> > if (vfio_pci_is_vga(pdev)) {
> > vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > vga_set_legacy_decoding(pdev,
> > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > b/drivers/vfio/pci/vfio_pci_virtio.c
> > new file mode 100644
> > index 0000000..e1ecffd
> > --- /dev/null
> > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > @@ -0,0 +1,140 @@
> > +/*
> > + * VFIO PCI Intel Graphics support  
>   ^^^
> > + *
> > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > + * Author: Alex Williamson   
> 
> Update.
> 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Register a device specific region through which to provide read-only
> > + * access to the Intel IGD opregion.  The register defining the opregion
> > + * address is also virtualized to prevent user modification.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include   
> 
> Are io.h and uaccess.h needed?
> 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vfio_pci_private.h"
> > +
> > +/**
> > + * virtio_pci_find_capability - walk capabilities to find device info.
> > + * @dev: the pci device
> > + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> > + *
> > + * Returns offset of the capability, or 0.
> > + */
> > +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> > cfg_type)  
> 
> Does inlining this really make sense?
> 
> > +{
> > +   int pos;
> > +
> > +   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> > +pos > 0;
> > +pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > +   u8 type;
> > +   pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> > +
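
The quoted helper is cut off above, so here is a minimal userspace sketch of
the same idea, assuming a 256-byte snapshot of config space instead of a live
device. find_virtio_cap() and struct virtio_pci_cap_hdr are hypothetical
names; the field order follows struct virtio_pci_cap in the virtio
specification, and the register offsets are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS          0x06 /* low byte of the 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x10 /* set when a capability list is present */
#define PCI_CAPABILITY_LIST 0x34 /* holds the offset of the first capability */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */

/* Leading fields of the virtio vendor capability (hypothetical name). */
struct virtio_pci_cap_hdr {
	uint8_t cap_vndr; /* PCI_CAP_ID_VNDR */
	uint8_t cap_next; /* offset of the next capability, 0 ends the list */
	uint8_t cap_len;
	uint8_t cfg_type; /* the VIRTIO_PCI_CAP_* value */
};

/*
 * Hypothetical analogue of virtio_pci_find_capability(): walk the
 * capability list in a config-space snapshot and return the offset of
 * the vendor capability whose cfg_type matches, or 0 if there is none.
 */
static int find_virtio_cap(const uint8_t cfg[256], uint8_t cfg_type)
{
	int ttl = 48; /* bound the walk so a looping list still terminates */
	uint8_t pos;

	assert(cfg != NULL);
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	/* Capability pointers are dword aligned; the low two bits are reserved. */
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_VNDR &&
		    cfg[pos + offsetof(struct virtio_pci_cap_hdr, cfg_type)] == cfg_type)
			return pos;
		pos = cfg[pos + offsetof(struct virtio_pci_cap_hdr, cap_next)] & 0xfc;
	}
	return 0;
}
```

The kernel version gets the iteration from pci_find_capability() and
pci_find_next_capability(); the sketch spells the walk out only so it can run
without a PCI device.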

[PATCH v2] arm64: KVM: Save four instructions in __guest_enter/exit()

2016-08-29 Thread Shanker Donthineni

We are doing an unnecessary stack push/pop operation when restoring
the guest registers x0-x18 in __guest_enter(). This patch saves those
two instructions by using x18 as a base register. There is no need to
store the vcpu context pointer on the stack because it is redundant:
the same information is available in tpidr_el2. The __guest_exit()
prototype is simplified, and the caller pushes regs x0-x1 to the stack
instead of regs x0-x3.

Signed-off-by: Shanker Donthineni 
---
Changes since v1:
  Incorporated Christoffer's suggestions.
  __guest_exit prototype is changed to 'void __guest_exit(u64 reason, struct 
kvm_vcpu *vcpu)'.

 arch/arm64/kvm/hyp/entry.S | 101 +
 arch/arm64/kvm/hyp/hyp-entry.S |  11 +++--
 2 files changed, 57 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index ce9e5e5..f70489a 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -55,75 +55,76 @@
  */
 ENTRY(__guest_enter)
// x0: vcpu
-   // x1: host/guest context
-   // x2-x18: clobbered by macros
+   // x1: host context
+   // x2-x17: clobbered by macros
+   // x18: guest context
 
// Store the host regs
save_callee_saved_regs x1
 
-   // Preserve vcpu & host_ctxt for use at exit time
-   stp x0, x1, [sp, #-16]!
+   // Store the host_ctxt for use at exit time
+   str x1, [sp, #-16]!
 
-   add x1, x0, #VCPU_CONTEXT
+   add x18, x0, #VCPU_CONTEXT
 
-   // Prepare x0-x1 for later restore by pushing them onto the stack
-   ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)]
-   stp x2, x3, [sp, #-16]!
+   // Restore guest regs x0-x17
+   ldp x0, x1,   [x18, #CPU_XREG_OFFSET(0)]
+   ldp x2, x3,   [x18, #CPU_XREG_OFFSET(2)]
+   ldp x4, x5,   [x18, #CPU_XREG_OFFSET(4)]
+   ldp x6, x7,   [x18, #CPU_XREG_OFFSET(6)]
+   ldp x8, x9,   [x18, #CPU_XREG_OFFSET(8)]
+   ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
+   ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
+   ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
+   ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
 
-   // x2-x18
-   ldp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
-   ldp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
-   ldp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
-   ldp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
-   ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
-   ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
-   ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
-   ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
-   ldr x18,  [x1, #CPU_XREG_OFFSET(18)]
+   // Restore guest regs x19-x29, lr
+   restore_callee_saved_regs x18
 
-   // x19-x29, lr
-   restore_callee_saved_regs x1
-
-   // Last bits of the 64bit state
-   ldp x0, x1, [sp], #16
+   // Restore guest reg x18
+   ldr x18,  [x18, #CPU_XREG_OFFSET(18)]
 
// Do not touch any register after this!
eret
 ENDPROC(__guest_enter)
 
+/*
+ * void __guest_exit(u64 exit_reason, struct kvm_vcpu *vcpu);
+ */
 ENTRY(__guest_exit)
-   // x0: vcpu
-   // x1: return code
-   // x2-x3: free
-   // x4-x29,lr: vcpu regs
-   // vcpu x0-x3 on the stack
-
-   add x2, x0, #VCPU_CONTEXT
-
-   stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
-   stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
-   stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
-   stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
-   stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
-   stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
-   stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
-   str x18,  [x2, #CPU_XREG_OFFSET(18)]
-
-   ldp x6, x7, [sp], #16   // x2, x3
-   ldp x4, x5, [sp], #16   // x0, x1
-
-   stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
-   stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
+   // x0: return code
+   // x1: vcpu
+   // x2-x29,lr: vcpu regs
+   // vcpu x0-x1 on the stack
+
+   add x1, x1, #VCPU_CONTEXT
+
+   // Store the guest regs x2 and x3
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
+
+   // Retrieve the guest regs x0-x1 from the stack
+   ldp x2, x3, [sp], #16   // x0, x1
+
+   // Store the guest regs x0-x1 and x4-x18
+   stp x2, x3,   [x1, #CPU_XREG_OFFSET(0)]
+   stp x4, x5,   [x1, #CPU_XREG_OFFSET(4)]
+   stp x6, x7,   [x1, #CPU_XREG_OFFSET(6)]
+   stp x8, x9,   [x1, #CPU_XREG_OFFSET(8)]
+   stp x10, x11, [x1, #CPU_XREG_OFFSET(10)]
+   stp x12, x13, [x1, #CPU_XREG_OFFSET(12)]
+   stp x14, x15, [x1, #CPU_XREG_OFFSET(14)]
+   stp x16, x17, [x1, #CPU_XREG_OFFSET(16)]
+   str x18,  [x1, #CPU_XREG_OFFSET(18)]
+
+   // Store the guest regs x19-x29, lr
+   save_callee_saved_regs x1
 
-

Re: [Qemu-devel] [PATCH v2 0/2] vfio: blacklist legacy virtio devices

2016-08-29 Thread Michael S. Tsirkin

On Tue, Aug 30, 2016 at 11:16:25AM +0800, Jason Wang wrote:
> 
> 
> On 2016-08-30 10:27, Michael S. Tsirkin wrote:
> > Legacy virtio devices always bypassed an IOMMU, so using them with vfio was
> > never safe.
> 
> And it actually won't work since GPA is assumed in the device. So I'm not
> sure this is a must since we should get an IOMMU fault in this case.

We won't get an IOMMU fault for legacy systems since they
bypass the IOMMU. Instead guest userspace will get full
access to all of guest memory through the device.
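
To make the intended policy concrete, a minimal sketch follows, assuming the
quirk only needs the 64-bit device feature mask and the group's noiommu
state. virtio_vfio_check() and the -ENODEV return value are illustrative
choices, not taken from the patch; VIRTIO_F_IOMMU_PLATFORM being feature
bit 33 comes from the virtio specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

static_assert(VIRTIO_F_IOMMU_PLATFORM < 64,
	      "feature bit must fit in a 64-bit mask");

/*
 * Hypothetical policy helper, not the patch's actual function.
 * Returns 0 if the device may be exposed through VFIO, or a negative
 * errno (value picked for illustration) if it must be refused.
 */
static int virtio_vfio_check(uint64_t device_features, bool noiommu)
{
	/* noiommu mode promises no isolation, so there is nothing to enforce. */
	if (noiommu)
		return 0;

	/* The device declares that it goes through the platform IOMMU. */
	if (device_features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
		return 0;

	/* Legacy behaviour: the device bypasses the IOMMU, so refuse it. */
	return -ENODEV;
}
```

The only case that is refused is the dangerous one described above: an IOMMU
backed group holding a device that does not offer the feature bit.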


> >   This adds a quirk detecting these and disabling VFIO unless the
> > noiommu mode is used.  At the moment, this only applies to virtio-pci 
> > devices.
> > 
> > The patch might make sense on stable as well.
> > 
> > Michael S. Tsirkin (2):
> >vfio: report group noiommu status
> >vfio: add virtio pci quirk
> > 
> >   drivers/vfio/pci/vfio_pci_private.h |   1 +
> >   include/linux/vfio.h|   2 +
> >   drivers/vfio/pci/vfio_pci.c |  14 
> >   drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > 
> >   drivers/vfio/vfio.c |  12 
> >   drivers/vfio/pci/Makefile   |   1 +
> >   6 files changed, 170 insertions(+)
> >   create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> >

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Michael S. Tsirkin

On Mon, Aug 29, 2016 at 09:23:25PM -0600, Alex Williamson wrote:
> On Tue, 30 Aug 2016 05:27:17 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> > to signal they are safe to use with an IOMMU.
> > 
> > Without this bit, exposing the device to userspace is unsafe, so probe
> > and fail VFIO initialization unless noiommu is enabled.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/vfio/pci/vfio_pci_private.h |   1 +
> >  drivers/vfio/pci/vfio_pci.c |  14 
> >  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> > 
> >  drivers/vfio/pci/Makefile   |   1 +
> >  4 files changed, 156 insertions(+)
> >  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> > b/drivers/vfio/pci/vfio_pci_private.h
> > index 2128de8..2bd5616 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> > vfio_pci_device *vdev)
> > return -ENODEV;
> >  }
> >  #endif
> > +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool 
> > noiommu);
> >  #endif /* VFIO_PCI_PRIVATE_H */
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index d624a52..e93bf0c 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, 
> > const struct pci_device_id *id)
> > return ret;
> > }
> >  
> > +   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
> 
> Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
> ID range initially as well, this test raised a big red flag for me
> whether all devices within this vendor ID were virtio.
> 
> > +   bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);
> 
> I think you can use iommu_present() for this and avoid patch 1of2.

So in the presence of an IOMMU in the system, is it impossible to bind the
noiommu device?

I have not tested this yet.

If so, this is something we'll need to fix as well:
we do want to allow binding noiommu to a legacy virtio device.



> noiommu is mutually exclusive to an iommu being present.  Seems like
> all of this logic should be in the quirk itself, I'm not sure what it
> buys to get the value here but wait until later to use it.  Using
> iommu_present() could also move this test much earlier in
> vfio_pci_probe() making the exit path easier.
> 
> > +
> > +   ret = vfio_pci_virtio_quirk(vdev, noiommu);
> > +   if (ret) {
> > +   dev_warn(&vdev->pdev->dev,
> > +"Failed to setup Virtio for VFIO\n");
> > +   vfio_del_group_dev(&pdev->dev);
> > +   vfio_iommu_group_put(group, &pdev->dev);
> > +   kfree(vdev);
> > +   return ret;
> > +   }
> > +   }
> > +
> > if (vfio_pci_is_vga(pdev)) {
> > vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
> > vga_set_legacy_decoding(pdev,
> > diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> > b/drivers/vfio/pci/vfio_pci_virtio.c
> > new file mode 100644
> > index 0000000..e1ecffd
> > --- /dev/null
> > +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> > @@ -0,0 +1,140 @@
> > +/*
> > + * VFIO PCI Intel Graphics support
>   ^^^
> > + *
> > + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> > + * Author: Alex Williamson 
> 
> Update.
> 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Register a device specific region through which to provide read-only
> > + * access to the Intel IGD opregion.  The register defining the opregion
> > + * address is also virtualized to prevent user modification.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> 
> Are io.h and uaccess.h needed?
> 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "vfio_pci_private.h"
> > +
> > +/**
> > + * virtio_pci_find_capability - walk capabilities to find device info.
> > + * @dev: the pci device
> > + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> > + *
> > + * Returns offset of the capability, or 0.
> > + */
> > +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> > cfg_type)
> 
> Does inlining this really make sense?
> 
> > +{
> > +   int pos;
> > +
> > +   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> > +pos > 0;
> > +pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > +   u8 type;
> > +   pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> > +

Re: [PATCH] tpm: fix invalid constant expressions in tpm.h

2016-08-29 Thread Jason Gunthorpe

On Tue, Aug 30, 2016 at 04:28:17AM +0300, Jarkko Sakkinen wrote:
> The enums tpm_capabilities and tpm_sub_capabilities do not contain legit
> constant expressions. This commit makes cap_id a separate parameter
> in

I wonder if this is a bug in sparse? The macro uses gcc magic to
expand to a constant expression.

You could also use __constant_cpu_to_be32 and similar instead.

But I admit I never liked the use of non-host endianness in the constants.

>  #define TPM_ORD_STARTUP cpu_to_be32(153)
>  #define TPM_ST_CLEAR cpu_to_be16(1)

Would be nice to see these fixed into an enum someday too
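
The two suggestions above can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: CONST_SWAB32() is a hypothetical stand-in for __constant_cpu_to_be32() that always swaps (so it assumes a little-endian host), and the enum and function names are made up for illustration. It keeps host-order values in a plain enum and converts at the point of use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's __constant_cpu_to_be32().
 * It is built only from shifts and masks, so the result is an integer
 * constant expression. It always swaps, i.e. it assumes a little-endian
 * host; on a big-endian host the conversion would be the identity.
 */
#define CONST_SWAB32(x) ((uint32_t)(			\
	(((uint32_t)(x) & 0x000000ffu) << 24) |		\
	(((uint32_t)(x) & 0x0000ff00u) <<  8) |		\
	(((uint32_t)(x) & 0x00ff0000u) >>  8) |		\
	(((uint32_t)(x) & 0xff000000u) >> 24)))

/* Host-order values fit in a plain enum without any byte-order tricks. */
enum example_tpm_ordinals {
	EX_TPM_ORD_STARTUP = 153,
};

/* The swap folds at compile time, so it can be checked statically. */
static_assert(CONST_SWAB32(153) == 0x99000000u, "swap is a constant expression");

/* Convert where the value goes on the wire, not where it is defined. */
static inline uint32_t example_ordinal_to_wire(uint32_t ordinal)
{
	return CONST_SWAB32(ordinal);
}
```

With this split, a checker only ever sees ordinary integer constants in the enum, and the byte-order conversion stays in one place.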

> +enum tpm1_capabilities {
> + TPM1_CAP_FLAG   = 0x04,
> + TPM1_CAP_PROP   = 0x05,
> + TPM1_CAP_VERSION_1_1= 0x06,
> + TPM1_CAP_VERSION_1_2= 0x1A,

I usually discourage the extra horizontal whitespace; it just causes patch churn to
keep it up (and clang-format won't do it automatically). Not sure if
there is a consensus on that though.

But looks fine to me.

Reviewed-by: Jason Gunthorpe 

Jason

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Andrew Morton

On Tue, 30 Aug 2016 11:09:15 +0800 Aaron Lu  wrote:

> >> Case used for test on Haswell EP:
> >> usemem -n 72 --readonly -j 0x20 100G
> >> Which spawns 72 processes and each will mmap 100G anonymous space and
> >> then do read only access to that space sequentially with a step of 2MB.
> >>
> >> perf report for base commit:
> >> 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
> >> perf report for this commit:
> >>  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
> > 
> > Does this mean that overall usemem runtime halved?
> 
> Sorry for the confusion, the above lines are extracted from the perf report.
> They show the percentage of CPU cycles spent in a specific function.
> 
> The above two perf lines are used to show that get_huge_zero_page doesn't
> consume that many CPU cycles after applying the patch.
> 
> > 
> > Do we have any numbers for something which is more real-worldly?
> 
> Unfortunately, no real world numbers.
> 
> We think the global atomic counter could be an issue for performance
> so I'm trying to solve the problem.

So, umm, we don't actually know if the patch is useful to anyone?

Some more measurements would help things along, please.

Re: [PATCH v3 1/2] input: misc: Add generic input driver to read encoded GPIO lines

2016-08-29 Thread Dmitry Torokhov

On Mon, Aug 29, 2016 at 09:50:28AM +0530, Vignesh R wrote:
> 
> 
> On Thursday 25 August 2016 10:26 PM, Dmitry Torokhov wrote:
> > On Wed, Aug 24, 2016 at 01:28:58PM +0530, Vignesh R wrote:
> >> Add a driver to read a group of GPIO lines and provide its status as a
> >> numerical value as an input event to the system. This will help in
> >> interfacing devices that can be connected over GPIOs and that provide
> >> input to the system by driving the GPIO lines connected to them, like a
> >> rotary dial or a switch.
> >>
> >> For example, a rotary switch can be connected to four GPIO lines. The
> >> status of the GPIO lines reflects the actual position of the rotary
> >> switch dial. For example, if the dial points to 9, then the four GPIO lines
> >> connected to the switch will read HLLH (0b1001 = 9). This value
> >> can be reported as an ABS_* event to the input subsystem.
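
The MSB-first decoding described in the quoted commit message can be sketched in plain C. This is an illustration only: the helper name and signature are assumptions and are not taken from the driver, and the line levels are passed in as an array rather than read from real GPIOs.

```c
#include <stddef.h>

/*
 * Illustrative helper (hypothetical name): fold an array of line levels
 * into one number, treating levels[0] as the most significant bit.
 * Any non-zero level counts as "high".
 */
static unsigned int example_decode_lines(const int *levels, size_t count)
{
	unsigned int value = 0;
	size_t i;

	for (i = 0; i < count; i++)
		value = (value << 1) | (levels[i] ? 1u : 0u);

	return value;
}
```

For the rotary switch example, the levels H, L, L, H passed in that order decode to 9, which matches the 0b1001 reading given in the description.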
> >>
> >> Signed-off-by: Vignesh R 
> >> Acked-by: Rob Herring 
> >> ---
> >>
> >> v3: Fix comments by Andrew and Dmitry
> >> Link to v2: https://lkml.org/lkml/2016/8/23/79
> >>
> >>  .../devicetree/bindings/input/gpio-decoder.txt |  23 
> >>  drivers/input/misc/Kconfig |  12 ++
> >>  drivers/input/misc/Makefile|   1 +
> >>  drivers/input/misc/gpio_decoder.c  | 134 
> >> +
> >>  4 files changed, 170 insertions(+)
> >>  create mode 100644 
> >> Documentation/devicetree/bindings/input/gpio-decoder.txt
> >>  create mode 100644 drivers/input/misc/gpio_decoder.c
> >>
> >> diff --git a/Documentation/devicetree/bindings/input/gpio-decoder.txt 
> >> b/Documentation/devicetree/bindings/input/gpio-decoder.txt
> >> new file mode 100644
> >> index ..14a77fb96cf0
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/input/gpio-decoder.txt
> >> @@ -0,0 +1,23 @@
> >> +* GPIO Decoder DT bindings
> >> +
> >> +Required Properties:
> >> +- compatible: should be "gpio-decoder"
> >> +- gpios: a spec of gpios (at least two) to be decoded to a number with
> >> +  first entry representing the MSB.
> >> +
> >> +Optional Properties:
> >> +- decoder-max-value: Maximum possible value that can be reported by
> >> +  the gpios.
> >> +- linux,axis: the input subsystem axis to map to (ABS_X/ABS_Y).
> >> +  Defaults to 0 (ABS_X).
> >> +
> >> +Example:
> >> +  gpio-decoder0 {
> >> +  compatible = "gpio-decoder";
> >> +  gpios = < 3 GPIO_ACTIVE_HIGH>,
> >> +  < 2 GPIO_ACTIVE_HIGH>,
> >> +  < 1 GPIO_ACTIVE_HIGH>,
> >> +  < 0 GPIO_ACTIVE_HIGH>;
> >> +  linux,axis = <0>; /* ABS_X */
> >> +  decoder-max-value = <9>;
> >> +  };
> >> diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
> >> index efb0ca871327..7cdb89397d18 100644
> >> --- a/drivers/input/misc/Kconfig
> >> +++ b/drivers/input/misc/Kconfig
> >> @@ -292,6 +292,18 @@ config INPUT_GPIO_TILT_POLLED
> >>  To compile this driver as a module, choose M here: the
> >>  module will be called gpio_tilt_polled.
> >>  
> >> +config INPUT_GPIO_DECODER
> >> +  tristate "Polled GPIO Decoder Input driver"
> >> +  depends on GPIOLIB || COMPILE_TEST
> >> +  select INPUT_POLLDEV
> >> +  help
> >> +   Say Y here if you want driver to read status of multiple GPIO
> >> +   lines and report the encoded value as an absolute integer to
> >> +   input subsystem.
> >> +
> >> +   To compile this driver as a module, choose M here: the module
> >> +   will be called gpio_decoder.
> >> +
> >>  config INPUT_IXP4XX_BEEPER
> >>tristate "IXP4XX Beeper support"
> >>depends on ARCH_IXP4XX
> >> diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile
> >> index 6a1e5e20fc1c..0b6d025f0487 100644
> >> --- a/drivers/input/misc/Makefile
> >> +++ b/drivers/input/misc/Makefile
> >> @@ -35,6 +35,7 @@ obj-$(CONFIG_INPUT_DRV2667_HAPTICS)  += drv2667.o
> >>  obj-$(CONFIG_INPUT_GP2A)  += gp2ap002a00f.o
> >>  obj-$(CONFIG_INPUT_GPIO_BEEPER)   += gpio-beeper.o
> >>  obj-$(CONFIG_INPUT_GPIO_TILT_POLLED)  += gpio_tilt_polled.o
> >> +obj-$(CONFIG_INPUT_GPIO_DECODER)  += gpio_decoder.o
> >>  obj-$(CONFIG_INPUT_HISI_POWERKEY) += hisi_powerkey.o
> >>  obj-$(CONFIG_HP_SDC_RTC)  += hp_sdc_rtc.o
> >>  obj-$(CONFIG_INPUT_IMS_PCU)   += ims-pcu.o
> >> diff --git a/drivers/input/misc/gpio_decoder.c 
> >> b/drivers/input/misc/gpio_decoder.c
> >> new file mode 100644
> >> index ..1c2191d4b143
> >> --- /dev/null
> >> +++ b/drivers/input/misc/gpio_decoder.c
> >> @@ -0,0 +1,134 @@
> >> +/*
> >> + * Copyright (C) 2016 Texas Instruments Incorporated - http://www.ti.com/
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> + * modify it under the terms of the GNU General Public License as
> >> + * published by the Free Software Foundation version 2.
> >> + *
> >> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> >> + * kind,

Re: [PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Alex Williamson

On Tue, 30 Aug 2016 05:27:17 +0300
"Michael S. Tsirkin"  wrote:

> Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
> to signal they are safe to use with an IOMMU.
> 
> Without this bit, exposing the device to userspace is unsafe, so probe
> and fail VFIO initialization unless noiommu is enabled.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vfio/pci/vfio_pci_private.h |   1 +
>  drivers/vfio/pci/vfio_pci.c |  14 
>  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
> 
>  drivers/vfio/pci/Makefile   |   1 +
>  4 files changed, 156 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c
> 
> diff --git a/drivers/vfio/pci/vfio_pci_private.h 
> b/drivers/vfio/pci/vfio_pci_private.h
> index 2128de8..2bd5616 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct 
> vfio_pci_device *vdev)
>   return -ENODEV;
>  }
>  #endif
> +extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu);
>  #endif /* VFIO_PCI_PRIVATE_H */
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index d624a52..e93bf0c 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
> struct pci_device_id *id)
>   return ret;
>   }
>  
> + if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {

Perhaps a vfio_pci_is_virtio() like vga below?  Let's test the device
ID range initially as well, this test raised a big red flag for me
whether all devices within this vendor ID were virtio.
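
A helper along those lines might look roughly like the sketch below. It is written as stand-alone C, so the struct and all EX_* names are hypothetical stand-ins for struct pci_dev and the kernel constants. The device ID range 0x1000 through 0x107f is the range the virtio specification assigns to virtio devices under vendor ID 0x1af4; whether the quirk should cover exactly that range is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names; the kernel has its own vendor ID define. */
#define EX_PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4
#define EX_VIRTIO_PCI_DEVICE_ID_MIN      0x1000
#define EX_VIRTIO_PCI_DEVICE_ID_MAX      0x107f

/* Minimal stand-in for the two struct pci_dev fields used here. */
struct example_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* True only for devices in the virtio range of this vendor ID. */
static bool example_pci_is_virtio(const struct example_pci_id *id)
{
	return id->vendor == EX_PCI_VENDOR_ID_REDHAT_QUMRANET &&
	       id->device >= EX_VIRTIO_PCI_DEVICE_ID_MIN &&
	       id->device <= EX_VIRTIO_PCI_DEVICE_ID_MAX;
}
```

Checking the range as well as the vendor keeps non-virtio devices that share the vendor ID out of the quirk.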

> + bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);

I think you can use iommu_present() for this and avoid patch 1of2.
noiommu is mutually exclusive to an iommu being present.  Seems like
all of this logic should be in the quirk itself, I'm not sure what it
buys to get the value here but wait until later to use it.  Using
iommu_present() could also move this test much earlier in
vfio_pci_probe() making the exit path easier.

> +
> + ret = vfio_pci_virtio_quirk(vdev, noiommu);
> + if (ret) {
> + dev_warn(&vdev->pdev->dev,
> +  "Failed to setup Virtio for VFIO\n");
> + vfio_del_group_dev(&pdev->dev);
> + vfio_iommu_group_put(group, &pdev->dev);
> + kfree(vdev);
> + return ret;
> + }
> + }
> +
>   if (vfio_pci_is_vga(pdev)) {
>   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
>   vga_set_legacy_decoding(pdev,
> diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
> b/drivers/vfio/pci/vfio_pci_virtio.c
> new file mode 100644
> index 000..e1ecffd
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_virtio.c
> @@ -0,0 +1,140 @@
> +/*
> + * VFIO PCI Intel Graphics support
  ^^^
> + *
> + * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
> + *   Author: Alex Williamson 

Update.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Register a device specific region through which to provide read-only
> + * access to the Intel IGD opregion.  The register defining the opregion
> + * address is also virtualized to prevent user modification.
> + */
> +
> +#include 
> +#include 
> +#include 

Are io.h and uaccess.h needed?

> +#include 
> +#include 
> +#include 
> +
> +#include "vfio_pci_private.h"
> +
> +/**
> + * virtio_pci_find_capability - walk capabilities to find device info.
> + * @dev: the pci device
> + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> + *
> + * Returns offset of the capability, or 0.
> + */
> +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 
> cfg_type)

Does inlining this really make sense?

> +{
> + int pos;
> +
> + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +  pos > 0;
> +  pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> + u8 type;
> + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +  cfg_type),
> +  &type);
> +
> + if (type != cfg_type)
> + continue;
> +
> + /* Ignore structures with reserved BAR values */
> + if (type != VIRTIO_PCI_CAP_PCI_CFG) {
> + u8 bar;
> +
> + pci_read_config_byte(dev, pos +
> +  offsetof(struct virtio_pci_cap,
> +   bar),
> +
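
The capability walk in the quoted function can be pictured with a user-space sketch that operates on a copy of the 256-byte config space instead of calling the kernel's PCI accessors. All names below are hypothetical. The offsets follow the standard PCI capability list layout (pointer at 0x34, ID at byte 0, next pointer at byte 1) and the virtio_pci_cap layout (cfg_type at byte 3, bar at byte 4); the loop bound and the bounds check are simplifications made for this sketch.

```c
#include <stdint.h>

#define EX_CFG_SIZE               256
#define EX_PCI_CAPABILITY_LIST    0x34 /* offset of the first capability pointer */
#define EX_PCI_CAP_ID_VNDR        0x09 /* vendor-specific capability ID */
#define EX_CAP_NEXT               1    /* offset of the next pointer in a capability */
#define EX_VIRTIO_CAP_CFG_TYPE    3    /* offset of cfg_type in virtio_pci_cap */
#define EX_VIRTIO_CAP_BAR         4    /* offset of bar in virtio_pci_cap */
#define EX_VIRTIO_PCI_CAP_PCI_CFG 5

/* Returns the offset of the first matching virtio capability, or 0. */
static int example_find_virtio_cap(const uint8_t cfg[EX_CFG_SIZE], uint8_t cfg_type)
{
	int pos = cfg[EX_PCI_CAPABILITY_LIST];
	int ttl = 48; /* bound the walk in case the list loops */

	/* Stop at the end of the list or if the fields would not fit. */
	while (pos >= 0x40 && pos + EX_VIRTIO_CAP_BAR < EX_CFG_SIZE && ttl-- > 0) {
		if (cfg[pos] == EX_PCI_CAP_ID_VNDR &&
		    cfg[pos + EX_VIRTIO_CAP_CFG_TYPE] == cfg_type) {
			/* Ignore structures with reserved BAR values (above 5). */
			if (cfg_type == EX_VIRTIO_PCI_CAP_PCI_CFG ||
			    cfg[pos + EX_VIRTIO_CAP_BAR] <= 0x5)
				return pos;
		}
		pos = cfg[pos + EX_CAP_NEXT];
	}

	return 0;
}
```

Capabilities with a different ID or a different cfg_type are simply stepped over, which mirrors the `continue` in the quoted loop.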

Re: [Qemu-devel] [PATCH v2 0/2] vfio: blacklist legacy virtio devices

2016-08-29 Thread Jason Wang

On 2016-08-30 10:27, Michael S. Tsirkin wrote:
> Legacy virtio devices always bypassed an IOMMU, so using them with vfio was
> never safe.

And it actually won't work, since GPA is assumed in the device. So I'm
not sure this is a must, since we should get an IOMMU fault in this case.

> This adds a quirk detecting these and disabling VFIO unless the
> noiommu mode is used.  At the moment, this only applies to virtio-pci devices.
>
> The patch might make sense on stable as well.
>
> Michael S. Tsirkin (2):
>   vfio: report group noiommu status
>   vfio: add virtio pci quirk
>
>  drivers/vfio/pci/vfio_pci_private.h |   1 +
>  include/linux/vfio.h|   2 +
>  drivers/vfio/pci/vfio_pci.c |  14 
>  drivers/vfio/pci/vfio_pci_virtio.c  | 140 
>  drivers/vfio/vfio.c |  12 
>  drivers/vfio/pci/Makefile   |   1 +
>  6 files changed, 170 insertions(+)
>  create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c

Re: [PATCH] thp: reduce usage of huge zero page's atomic counter

2016-08-29 Thread Aaron Lu

On 08/30/2016 06:50 AM, Andrew Morton wrote:
> On Mon, 29 Aug 2016 14:31:20 +0800 Aaron Lu  wrote:
> 
>>
>> The global zero page is used to satisfy an anonymous read fault. If
>> THP(Transparent HugePage) is enabled then the global huge zero page is used.
>> The global huge zero page uses an atomic counter for reference counting
>> and is allocated/freed dynamically according to its counter value.
>>
>> CPU time spent on that counter will greatly increase if there are
>> a lot of processes doing anonymous read faults. This patch proposes a
>> way to reduce the access to the global counter so that the CPU load
>> can be reduced accordingly.
>>
>> To do this, a new flag of the mm_struct is introduced:
>> MMF_USED_HUGE_ZERO_PAGE.
>> With this flag, the process only needs to touch the global counter in
>> two cases:
>> 1. The first time it uses the global huge zero page;
>> 2. The time when mm_users of its mm_struct reaches zero.
>>
>> Note that right now, the huge zero page is eligible to be freed as soon
>> as its last use goes away.  With this patch, the page will not be
>> eligible to be freed until the exit of the last process from which it
>> was ever used.
>>
>> And with the use of mm_users, the kthread is not eligible to use huge
>> zero page either. Since no kthread is using huge zero page today, there
>> is no difference after applying this patch. But if that is not desired,
>> I can change it to when mm_count reaches zero.
> 
> I suppose we could simply never free the zero huge page - if some
> process has used it in the past, others will probably use it in the
> future.  One wonders how useful this optimization is...
>
> But the patch is simple enough.
> 
>> Case used for test on Haswell EP:
>> usemem -n 72 --readonly -j 0x20 100G
>> Which spawns 72 processes and each will mmap 100G anonymous space and
>> then do read only access to that space sequentially with a step of 2MB.
>>
>> perf report for base commit:
>> 54.03%  usemem   [kernel.kallsyms]   [k] get_huge_zero_page
>> perf report for this commit:
>>  0.11%  usemem   [kernel.kallsyms]   [k] mm_get_huge_zero_page
> 
> Does this mean that overall usemem runtime halved?

Sorry for the confusion, the above line is extracted from perf report.
It shows the percent of CPU cycles executed in a specific function.

The above two perf lines are used to show get_huge_zero_page doesn't
consume that many CPU cycles after applying the patch.

> 
> Do we have any numbers for something which is more real-worldly?

Unfortunately, no real world numbers.

We think the global atomic counter could be an issue for performance
so I'm trying to solve the problem.

Thanks,
Aaron
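
A minimal user-space sketch of the counting scheme described in the patch, assuming C11 atomics in place of the kernel's primitives. The names `huge_zero_refcount`, `struct mm_sketch` and the `_sketch` functions are hypothetical stand-ins, not the patch's code, and page allocation and freeing are left out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global huge zero page refcount. */
atomic_int huge_zero_refcount;

/*
 * Hypothetical per-process state; the flag plays the role of
 * MMF_USED_HUGE_ZERO_PAGE in mm_struct.
 */
struct mm_sketch {
	atomic_bool used_huge_zero_page;
};

/*
 * Read-fault path. Returns true only when this call had to touch the
 * global counter, i.e. on the first use by this mm.
 */
bool mm_get_huge_zero_page_sketch(struct mm_sketch *mm)
{
	assert(mm != NULL);
	/* atomic_exchange returns the previous value of the flag. */
	if (atomic_exchange(&mm->used_huge_zero_page, true))
		return false;	/* already accounted for: counter untouched */
	atomic_fetch_add(&huge_zero_refcount, 1);
	return true;
}

/* Teardown path, meant to run once when the last user of the mm goes away. */
void mm_put_huge_zero_page_sketch(struct mm_sketch *mm)
{
	assert(mm != NULL);
	if (atomic_load(&mm->used_huge_zero_page))
		atomic_fetch_sub(&huge_zero_refcount, 1);
}
```

In this model, however many read faults a process takes, only its first fault and its final teardown reach the shared counter; every other fault stops at the process-local flag.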

Re: [PATCH v2] pwm-regulator: Add support for a fixed delay after duty cycle changes

2016-08-29 Thread Doug Anderson

Hi,

On Fri, Aug 26, 2016 at 5:20 PM, Matthias Kaehlcke  wrote:
> A change of the duty cycle doesn't necessarily cause an immediate switch
> to the target voltage. The voltage change may be gradual and complete
> with a certain delay. This change introduces the device tree properties
> "settle-time-up-us" and "settle-time-down-us", which allow to specify a
> fixed delay after a voltage increase or decrease. Often it is not
> strictly necessary for a voltage decrease to complete, therefore the
> delays may be asymmetric. For regulators with a ramp delay the
> corresponding settle time is added to the ramp delay.
>
> Signed-off-by: Matthias Kaehlcke 
> ---
>  .../bindings/regulator/pwm-regulator.txt   | 10 +
>  drivers/regulator/pwm-regulator.c  | 25 
> ++
>  2 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt 
> b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
> index 3aeba9f..42b6819 100644
> --- a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
> +++ b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
> @@ -29,6 +29,16 @@ Required properties:
>
>  - pwms:PWM specification (See: ../pwm/pwm.txt)
>
> +Optional properties:
> +
> +- settle-time-up-us:   Time to settle down after a voltage increase
> +   (unit: us). For regulators with a ramp delay
> +   the two values are added.
> +
> +- settle-time-down-us: Time to settle down after a voltage decrease
> +   (unit: us). For regulators with a ramp delay
> +   the two values are added.

Based on investigations that we've been doing recently, it might make
sense to leave "settle-time-down-us" out of this patch for now.  From
how our EEs appear to have designed our hardware:

* There's no "ramp up" time that's in terms of uV / uS, just a
"settle-time-up-us" like you have here.

* For going down there's not really a settle time or a ramp down time.
The regulator will just fall at a rate that depends on the current
load.  ...and it's kinda slow / not worth it to block waiting for the
regulator to go down.

* We may want to introduce some other properties related to the
downward ramp to keep the regulator from falling too fast.  On our
board if you drop down too fast you can trigger the PWM regulator over
voltage protection, so on downward ramps we might have to break into
additional steps.

To say it another way:

- settle-time-up-us looks great.

- settle-time-down-us isn't useful on our board and it's probably
better to wait until there's a user before adding this property.

- future properties will come to help break the downward transition
into multiple steps.

Does that sound sane?

-Doug
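
As a rough illustration of the delay rule in the quoted commit message (the settle time is added to the ramp delay, with separate values for up and down), here is a small sketch. The function name and parameters are hypothetical, and it assumes the ramp rate is given in uV per us with 0 meaning "no ramp delay"; it is not the driver's code:

```c
#include <assert.h>

/*
 * Total wait in microseconds after a voltage change.
 * ramp_uV_per_us == 0 means the regulator has no ramp delay.
 */
unsigned int pwm_settle_delay_us(int old_uV, int new_uV,
				 unsigned int ramp_uV_per_us,
				 unsigned int settle_up_us,
				 unsigned int settle_down_us)
{
	unsigned int delta_uV, delay_us = 0;

	assert(old_uV >= 0 && new_uV >= 0);

	if (new_uV == old_uV)
		return 0;

	delta_uV = new_uV > old_uV ? (unsigned int)(new_uV - old_uV)
				   : (unsigned int)(old_uV - new_uV);

	/* Ramp time, rounded up to a whole microsecond. */
	if (ramp_uV_per_us)
		delay_us = (delta_uV + ramp_uV_per_us - 1) / ramp_uV_per_us;

	/* Add the fixed settle time for the direction of the change. */
	delay_us += new_uV > old_uV ? settle_up_us : settle_down_us;
	return delay_us;
}
```

Passing 0 for `settle_down_us` models the suggestion above to leave the downward settle time out for now: a voltage decrease then waits only for the ramp delay, if there is one.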

Re: [PATCH v1 00/10] Optimize sched avgs computation and implement flat util hierarchy

2016-08-29 Thread Yuyang Du

On Wed, Aug 24, 2016 at 09:54:35AM +0100, Morten Rasmussen wrote:
> As Dietmar mentioned already, the 'disconnect' is a feature of the PELT
> rewrite. Paul and Ben's original implementation had full propagation up
> and down the hierarchy. IIRC, one of the key points of the rewrite was
> more 'stable' signals, which we would lose by re-introducing immediate
> updates throughout hierarchy.

As I mentioned earlier, no essential change! A feature perhaps is: the
rewrite takes into account the runnable ratio.

E.g., take a group having one task with share 1024, where the task
sticks to one CPU and is runnable 50% of the time.

With the old implementation, the group_entity_load_avg is 1024; but with
the rewritten implementation, the group_entity_load_avg is 512. Isn't this
good?

If the task migrates, the old implementation will still be 1024 on the new
CPU, but the rewritten implementation will transition to 512, albeit taking
0.1+ seconds, which we are now addressing. Isn't this good?

> It is a significant change to group scheduling, so I'm a bit surprised
> that nobody has observed any problems post the rewrite. But maybe most
> users don't care about the load-balance being slightly off when tasks
> have migrated or new tasks are added to a group.

I don't understand what you are saying.

> If we want to re-introduce propagation of both load and utilization I
> would suggest that we just look at the original implementation. It
> seemed to work.
>
> Handling utilization and load differently will inevitably result in more
> code. The 'flat hierarchy' approach seems slightly less complicated, but
> it prevents us from using group utilization later should we wish to do
> so. It might for example become useful for the schedutil cpufreq
> governor should it ever consider selecting frequencies differently based
> on whether the current task is in a (specific) group or not.

I understand group util may have some usage should you attempt to do so, I'm
not sure how realistic it is.

Nothing prevents you from knowing which (specific) group, if any, the
current task belongs to.
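
The 1024 versus 512 example above can be worked through with a small sketch. It is a simplification under stated assumptions: floating point instead of the scheduler's fixed-point arithmetic, a decay factor y with y^32 = 0.5, and periods of roughly 1 ms. The function names are hypothetical:

```c
#include <assert.h>

/* Steady-state load: the weight scaled by how often the entity is runnable. */
unsigned long steady_state_load(unsigned long weight, unsigned int runnable_pct)
{
	assert(runnable_pct <= 100);
	return weight * runnable_pct / 100;
}

/*
 * Fraction of its steady-state value that a geometric average starting
 * from zero has reached after n periods of constant input: 1 - y^n.
 */
double ramp_fraction(unsigned int n_periods)
{
	const double y = 0.97857206;	/* approximately 0.5^(1/32) */
	double decay = 1.0;

	for (unsigned int i = 0; i < n_periods; i++)
		decay *= y;
	return 1.0 - decay;
}
```

Under these assumptions `steady_state_load(1024, 50)` gives the 512 of the rewritten implementation, and `ramp_fraction(100)` shows a signal that starts from zero is still more than a tenth short of its final value after about 0.1 second, which is the kind of lag the migration case refers to.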

Re: xfstests xfs fuzzers fail with DAX

2016-08-29 Thread Darrick J. Wong

On Mon, Aug 29, 2016 at 06:50:05PM -0700, Dan Williams wrote:
> [ Adding Darrick on the off chance that this triggers an "aha, of
> course it does!" ]

Aha!  Of course it does!!! :)

> Darrick these corruption tests you added to xfstests last year all
> fail the same way with DAX enabled.  They spew:
> 
> "pwrite64: Structure needs cleaning"
> 
> ...reports that are cleaned up by running without "-o dax".

I think this happens because in non-dax mode, the pwrite is a buffered
write and so long as we can create a delalloc reservation, everything
is ok and nothing fails.  Whereas for dax we have to allocate the
blocks for the pwrite immediately, thereby triggering the cntbt
verifier error.

Proceeding from the assumption "DAX behaves a lot like DIO", all the
tests that rely on buffered mode semantics are going to choke if DAX
is turned on without them knowing about it.

> Alternatively you could sit back and watch me try to figure it out,
> that should be quite entertaining... as a start I'll try to pin down a
> stack trace when the error is returned.

As for how to fix this, probably the best option is to change line 98
to 'pwrite -W -S 0x62...' and update the output to include the
'structure needs cleaning' message.

Or get rid of the mount option and require explicitly turning on DAX
on a per-inode basis, which I think is where Dave is already going.

--D

> 
> 
> On Wed, Aug 3, 2016 at 7:45 PM, Xiong Zhou  wrote:
> > Hi,
> >
> > A few xfs fuzzers in xfstests fail with dax mount option, pass without dax.
> > They are xfs/086 xfs/088 xfs/089 xfs/091.
> >
> > xfstests to commit 4470ad4c7e  (Jul 26)
> > kernel   to commit dd95069545  (Jul 24)
> >
> > + ./check xfs/091
> > FSTYP -- xfs (non-debug)
> > PLATFORM  -- Linux/x86_64 rhel73 4.7.0+
> > MKFS_OPTIONS  -- -f -bsize=4096 /dev/pmem1
> > MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/pmem1 /daxsch
> >
> > xfs/091  104s
> > Ran: xfs/091
> > Passed all 1 tests
> >
> > + echo 'MOUNT_OPTIONS="-o dax"'
> > + ./check xfs/091
> > FSTYP -- xfs (non-debug)
> > PLATFORM  -- Linux/x86_64 rhel73 4.7.0+
> > MKFS_OPTIONS  -- -f -bsize=4096 /dev/pmem1
> > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem1 
> > /daxsch
> >
> > xfs/091 104s ...  - output mismatch (see 
> > /root/xfstests/results//xfs/091.out.bad)
> > --- tests/xfs/091.out   2016-07-18 02:57:47.67000 -0400
> > +++ /root/xfstests/results//xfs/091.out.bad 2016-08-03 
> > 22:38:14.94800 -0400
> > @@ -6,6 +6,70 @@
> >  + corrupt image
> >  + mount image
> >  + modify files
> > +pwrite64: Structure needs cleaning
> > +pwrite64: Structure needs cleaning
> > +pwrite64: Structure needs cleaning
> > +pwrite64: Structure needs cleaning
> > ...
> > (Run 'diff -u tests/xfs/091.out 
> > /root/xfstests/results//xfs/091.out.bad'  to see the entire diff)
> > Ran: xfs/091
> > Failures: xfs/091
> > Failed 1 of 1 tests
> >
> > # diff -u xfstests/tests/xfs/091.out /root/xfstests/results//xfs/091.out.bad
> > --- xfstests/tests/xfs/091.out  2016-07-18 02:57:47.67000 -0400
> > +++ /root/xfstests/results//xfs/091.out.bad 2016-08-03 
> > 22:38:14.94800 -0400
> > @@ -6,6 +6,70 @@
> >  + corrupt image
> >  + mount image
> >  + modify files
> > +pwrite64: Structure needs cleaning
> > 
> > +pwrite64: Structure needs cleaning
> >  + repair fs
> >  + mount image
> >  + chattr -R -i
> >
> >
> > Thanks,
> > Xiong
> >
> > ___
> > Linux-nvdimm mailing list
> > linux-nvd...@lists.01.org
> > https://lists.01.org/mailman/listinfo/linux-nvdimm
> 
> ___
> xfs mailing list
> x...@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

Re: [LKP] [lkp] [f2fs] ec795418c4: fsmark.files_per_sec -36.3% regression

2016-08-29 Thread Jaegeuk Kim

Hello,

On Sat, Aug 27, 2016 at 10:13:34AM +0800, Fengguang Wu wrote:
> Hi Jaegeuk,
> 
> > > >> > - [lkp] [f2fs] b93f771286: aim7.jobs-per-min -81.2% regression
> > > >> >
> > > >> > The disk is 4 12G ram disk, and setup RAID0 on them via mdadm.  The
> > > >> > steps for aim7 is,
> > > >> >
> > > >> > cat > workfile <<EOF
> > > >> > FILESIZE: 1M
> > > >> > POOLSIZE: 10M
> > > >> > 10 sync_disk_rw
> > > >> > EOF
> > > >> >
> > > >> > (
> > > >> > echo $HOSTNAME
> > > >> > echo sync_disk_rw
> > > >> >
> > > >> > echo 1
> > > >> > echo 600
> > > >> > echo 2
> > > >> > echo 600
> > > >> > echo 1
> > > >> > ) | ./multitask -t &
> > > >>
> > > >> Any update on these 2 regressions?  Is the information enough for you
> > > >> to reproduce?
> > > >
> > > > Sorry, I've had no time to dig this due to business travel now.
> > > > I'll check that when back to US.
> > > 
> > > Any update?
> > 
> > Sorry, how can I get multitask binary?
> 
> It's part of aim7, which can be downloaded here:
> 
> http://nchc.dl.sourceforge.net/project/aimbench/aim-suite7/Initial%20release/s7110.tar.Z

Thank you for the code.

I've run this workload on the latest f2fs and compared performance with and
without the reported patch. (1TB nvme SSD, 16 cores, 16GB DRAM)
Interestingly, I found a slight performance improvement rather than a
regression. :(
Not sure how to reproduce this.

Thanks,

Re: [PATCH net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-08-29 Thread Brendan Gregg

On Mon, Aug 29, 2016 at 5:19 AM, Peter Zijlstra  wrote:
>
> On Fri, Aug 26, 2016 at 07:31:18PM -0700, Alexei Starovoitov wrote:
> > Hi Peter, Dave,
> >
> > this patch set is a follow up to the discussion:
> > https://lkml.org/lkml/2016/8/4/304
> > It turned out to be simpler than what we discussed.
> >
> > Patches 1-3 is a bpf-side prep for the main patch 4
> > that adds bpf program as an overflow_handler to sw and hw perf_events.
> > Peter, please review.
> >
> > Patches 5 and 6 are tests/examples from myself and Brendan.
>
> Brendan, so this works for you without extra hacks required?

Yes, thanks for checking, I've done both IP and stack sampling so far with it.

Brendan

[PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Michael S. Tsirkin

Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
to signal they are safe to use with an IOMMU.

Without this bit, exposing the device to userspace is unsafe, so probe
and fail VFIO initialization unless noiommu is enabled.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 drivers/vfio/pci/vfio_pci.c |  14 
 drivers/vfio/pci/vfio_pci_virtio.c  | 140 
 drivers/vfio/pci/Makefile   |   1 +
 4 files changed, 156 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c

diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 2128de8..2bd5616 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device 
*vdev)
return -ENODEV;
 }
 #endif
+extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu);
 #endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d624a52..e93bf0c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
return ret;
}
 
+   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
+   bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);
+
+   ret = vfio_pci_virtio_quirk(vdev, noiommu);
+   if (ret) {
+   dev_warn(&vdev->pdev->dev,
+"Failed to setup Virtio for VFIO\n");
+   vfio_del_group_dev(&pdev->dev);
+   vfio_iommu_group_put(group, &pdev->dev);
+   kfree(vdev);
+   return ret;
+   }
+   }
+
if (vfio_pci_is_vga(pdev)) {
vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
vga_set_legacy_decoding(pdev,
diff --git a/drivers/vfio/pci/vfio_pci_virtio.c 
b/drivers/vfio/pci/vfio_pci_virtio.c
new file mode 100644
index 000..e1ecffd
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_virtio.c
@@ -0,0 +1,140 @@
+/*
+ * VFIO PCI Intel Graphics support
+ *
+ * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Register a device specific region through which to provide read-only
+ * access to the Intel IGD opregion.  The register defining the opregion
+ * address is also virtualized to prevent user modification.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_pci_private.h"
+
+/**
+ * virtio_pci_find_capability - walk capabilities to find device info.
+ * @dev: the pci device
+ * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
+ *
+ * Returns offset of the capability, or 0.
+ */
+static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type)
+{
+   int pos;
+
+   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+pos > 0;
+pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+   u8 type;
+   pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+cfg_type),
+&type);
+
+   if (type != cfg_type)
+   continue;
+
+   /* Ignore structures with reserved BAR values */
+   if (type != VIRTIO_PCI_CAP_PCI_CFG) {
+   u8 bar;
+
+   pci_read_config_byte(dev, pos +
+offsetof(struct virtio_pci_cap,
+ bar),
+&bar);
+   if (bar > 0x5)
+   continue;
+   }
+
+   return pos;
+   }
+   return 0;
+}
+
+
+int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu)
+{
+   struct pci_dev *dev = vdev->pdev;
+   int common, cfg;
+   u32 features;
+   u32 offset;
+   u8 bar;
+
+   /* Without an IOMMU, we don't care */
+   if (noiommu)
+   return 0;
+
+/* Virtio only owns devices >= 0x1000 and <= 0x107f: leave the rest. */
+if (dev->device < 0x1000 || dev->device > 0x107f)
+return 0;
+
+   /* Check whether device enforces the IOMMU correctly */
+
+   /*
+* All modern devices must have common and cfg capabilities. We use cfg
+* capability for access so that we don't need to worry about resource
+* availability. Slow but sure.
+*

[PATCH v2 1/2] vfio: report group noiommu status

2016-08-29 Thread Michael S. Tsirkin

When using vfio, callers might want to know whether a device is added
to a regular group or a non-iommu group.

Report this status from vfio_is_noiommu_group_dev.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/vfio.h |  2 ++
 drivers/vfio/vfio.c  | 12 
 2 files changed, 14 insertions(+)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0ecae0b..584757b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -51,6 +51,8 @@ extern int vfio_add_group_dev(struct device *dev,
  const struct vfio_device_ops *ops,
  void *device_data);
 
+extern bool vfio_is_noiommu_group_dev(struct device *dev);
+
 extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..22f279f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -872,6 +872,18 @@ static bool vfio_dev_present(struct vfio_group *group, 
struct device *dev)
 }
 
 /*
+ * Is device part of a noiommu group?
+ * Note: must call vfio_add_group_dev first.
+ */
+bool vfio_is_noiommu_group_dev(struct device *dev)
+{
+   struct vfio_device *device = dev_get_drvdata(dev);
+   struct vfio_group *group = device->group;
+
+   return group->noiommu;
+}
+
+/*
  * Decrement the device reference count and wait for the device to be
  * removed.  Open file descriptors for the device... */
 void *vfio_del_group_dev(struct device *dev)
-- 
MST

[PATCH v2 0/2] vfio: blacklist legacy virtio devices

2016-08-29 Thread Michael S. Tsirkin

Legacy virtio devices always bypassed an IOMMU, so using them with vfio was
never safe.  This adds a quirk detecting these and disabling VFIO unless the
noiommu mode is used.  At the moment, this only applies to virtio-pci devices.

The patch might make sense on stable as well.

Michael S. Tsirkin (2):
  vfio: report group noiommu status
  vfio: add virtio pci quirk

 drivers/vfio/pci/vfio_pci_private.h |   1 +
 include/linux/vfio.h|   2 +
 drivers/vfio/pci/vfio_pci.c |  14 
 drivers/vfio/pci/vfio_pci_virtio.c  | 140 
 drivers/vfio/vfio.c |  12 
 drivers/vfio/pci/Makefile   |   1 +
 6 files changed, 170 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c

-- 
MST
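
As a reading aid for the cover letter above, the policy it describes can be modelled in a few lines of userspace C. This is only a sketch, not code from the series: vfio_virtio_allowed() and its parameters are hypothetical names, the device ID range 0x1000-0x107f is taken from the quirk in patch 2/2, and bit 33 is the position the virtio specification assigns to VIRTIO_F_IOMMU_PLATFORM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number assigned by the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Device ID range that the quirk in patch 2/2 treats as virtio. */
#define VIRTIO_DEV_ID_FIRST 0x1000
#define VIRTIO_DEV_ID_LAST  0x107f

/*
 * Hypothetical model of the quirk's decision (not the kernel function).
 * Returns true when exposing the device through VFIO is considered safe.
 */
static bool vfio_virtio_allowed(uint16_t device_id, bool noiommu,
                                bool has_modern_caps, uint64_t features)
{
    /* The shift below must stay inside the 64-bit feature word. */
    assert(VIRTIO_F_IOMMU_PLATFORM < 64);

    /* In noiommu mode there is no isolation to bypass. */
    if (noiommu)
        return true;

    /* IDs outside the virtio range are not covered by the quirk. */
    if (device_id < VIRTIO_DEV_ID_FIRST || device_id > VIRTIO_DEV_ID_LAST)
        return true;

    /* Legacy-only devices cannot offer the feature bit at all. */
    if (!has_modern_caps)
        return false;

    /* Modern device: safe only if it promises to go through the IOMMU. */
    return ((features >> VIRTIO_F_IOMMU_PLATFORM) & 1) != 0;
}
```

Under this model, vfio_virtio_allowed(0x1000, false, false, 0) is false (a legacy device behind an IOMMU), while the same device with noiommu set to true is accepted.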

[PATCH v2 2/2] vfio: add virtio pci quirk

2016-08-29 Thread Michael S. Tsirkin

Modern virtio pci devices can set VIRTIO_F_IOMMU_PLATFORM
to signal they are safe to use with an IOMMU.

Without this bit, exposing the device to userspace is unsafe, so probe
and fail VFIO initialization unless noiommu is enabled.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vfio/pci/vfio_pci_private.h |   1 +
 drivers/vfio/pci/vfio_pci.c |  14 
 drivers/vfio/pci/vfio_pci_virtio.c  | 140 
 drivers/vfio/pci/Makefile   |   1 +
 4 files changed, 156 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_virtio.c

diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 2128de8..2bd5616 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -139,4 +139,5 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
return -ENODEV;
 }
 #endif
+extern int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu);
 #endif /* VFIO_PCI_PRIVATE_H */
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d624a52..e93bf0c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1236,6 +1236,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return ret;
}
 
+   if (pdev->vendor == PCI_VENDOR_ID_REDHAT_QUMRANET) {
+   bool noiommu = vfio_is_noiommu_group_dev(&pdev->dev);
+
+   ret = vfio_pci_virtio_quirk(vdev, noiommu);
+   if (ret) {
+   dev_warn(&vdev->pdev->dev,
+"Failed to setup Virtio for VFIO\n");
+   vfio_del_group_dev(&pdev->dev);
+   vfio_iommu_group_put(group, &pdev->dev);
+   kfree(vdev);
+   return ret;
+   }
+   }
+
if (vfio_pci_is_vga(pdev)) {
vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
vga_set_legacy_decoding(pdev,
diff --git a/drivers/vfio/pci/vfio_pci_virtio.c b/drivers/vfio/pci/vfio_pci_virtio.c
new file mode 100644
index 0000000..e1ecffd
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_virtio.c
@@ -0,0 +1,140 @@
+/*
+ * VFIO PCI Intel Graphics support
+ *
+ * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Register a device specific region through which to provide read-only
+ * access to the Intel IGD opregion.  The register defining the opregion
+ * address is also virtualized to prevent user modification.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_pci_private.h"
+
+/**
+ * virtio_pci_find_capability - walk capabilities to find device info.
+ * @dev: the pci device
+ * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
+ *
+ * Returns offset of the capability, or 0.
+ */
+static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type)
+{
+   int pos;
+
+   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+pos > 0;
+pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+   u8 type;
+   pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+cfg_type),
+&type);
+
+   if (type != cfg_type)
+   continue;
+
+   /* Ignore structures with reserved BAR values */
+   if (type != VIRTIO_PCI_CAP_PCI_CFG) {
+   u8 bar;
+
+   pci_read_config_byte(dev, pos +
+offsetof(struct virtio_pci_cap,
+ bar),
+&bar);
+   if (bar > 0x5)
+   continue;
+   }
+
+   return pos;
+   }
+   return 0;
+}
+
+
+int vfio_pci_virtio_quirk(struct vfio_pci_device *vdev, bool noiommu)
+{
+   struct pci_dev *dev = vdev->pdev;
+   int common, cfg;
+   u32 features;
+   u32 offset;
+   u8 bar;
+
+   /* Without an IOMMU, we don't care */
+   if (noiommu)
+   return 0;
+
+/* Virtio only owns devices >= 0x1000 and <= 0x107f: leave the rest. */
+if (dev->device < 0x1000 || dev->device > 0x107f)
+return 0;
+
+   /* Check whether device enforces the IOMMU correctly */
+
+   /*
+* All modern devices must have common and cfg capabilities. We use cfg
+* capability for access so that we don't need to worry about resource
+* availability. Slow but sure.
+* Note that all vendor-specific fields we
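
The capability walk in virtio_pci_find_capability() above can be tried outside the kernel against a plain byte array. The following is a simplified sketch under stated assumptions: find_virtio_cap() and struct virtio_cap_hdr are hypothetical names, the 256-byte array stands in for PCI configuration space, and the walk is a reduced version of what pci_find_capability() does (first pointer at offset 0x34, capability ID at byte 0, next pointer at byte 1). The cfg_type and bar offsets follow the struct virtio_pci_cap layout from the virtio specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST    0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR        0x09 /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_PCI_CFG 5    /* cfg_type of the PCI cfg access capability */

/* Leading fields of a virtio vendor capability (hypothetical local copy). */
struct virtio_cap_hdr {
    uint8_t cap_vndr; /* PCI_CAP_ID_VNDR */
    uint8_t cap_next; /* offset of the next capability, 0 at list end */
    uint8_t cap_len;
    uint8_t cfg_type; /* which virtio structure this capability describes */
    uint8_t bar;      /* BAR holding the structure; values above 5 are reserved */
};

/*
 * Return the offset of the first virtio capability of the requested type
 * in a 256-byte config-space image, or 0 if there is none.
 */
static int find_virtio_cap(const uint8_t *cfg, uint8_t cfg_type)
{
    int ttl = 48; /* bound the walk in case the list loops */
    uint8_t pos;

    assert(cfg != NULL);
    pos = cfg[PCI_CAPABILITY_LIST];

    /* Capabilities live above the 64-byte standard header. */
    while (pos >= 0x40 && ttl-- > 0) {
        uint8_t next;

        /* Stop if the header would run past the end of the image. */
        if (pos > 256 - sizeof(struct virtio_cap_hdr))
            break;
        next = cfg[pos + offsetof(struct virtio_cap_hdr, cap_next)];

        if (cfg[pos] == PCI_CAP_ID_VNDR) {
            uint8_t type = cfg[pos + offsetof(struct virtio_cap_hdr, cfg_type)];
            uint8_t bar = cfg[pos + offsetof(struct virtio_cap_hdr, bar)];

            /* Skip reserved BAR values, except for the PCI cfg capability. */
            if (type == cfg_type &&
                (type == VIRTIO_PCI_CAP_PCI_CFG || bar <= 5))
                return pos;
        }
        pos = next;
    }
    return 0;
}
```

The reserved-BAR check mirrors the quoted code: a capability whose bar field is above 5 is skipped unless it is the PCI cfg capability, for which the quoted patch does not check the BAR.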

Re: [PATCH 2/2] f2fs: add roll-forward recovery process for encrypted dentry

2016-08-29 Thread Jaegeuk Kim

On Tue, Aug 30, 2016 at 12:06:09AM +0800, Chao Yu wrote:
> Hi all,
> 
> On 2016/8/29 11:27, Shuoran Liu wrote:
> > Add roll-forward recovery process for encrypted dentry, so the first fsync
> > issued to an encrypted file does not need writing checkpoint.
> > 
> > This improves the performance of the following test at thousands of small
> > files: open -> write -> fsync -> close
> 
> I didn't find any functionality regression until now, from aspect of improving
> performance, I think it's worth to use this method.
> 
> To Jaegeuk, how do you think of this modification?

I added one change in this patch to show the encrypted file name.

But, I've got several bugs from xfstests. (e.g., 177, ...)
Please adjust the below patch that I just sent.

  f2fs: fix lost xattrs of directories

The issue was that we must guarantee parent's key information stored in xattr.

Thanks,

> 
> Thanks,
> 
> > 
> > Signed-off-by: Shuoran Liu 
> > ---
> >  fs/f2fs/dir.c  | 75 
> > ++
> >  fs/f2fs/f2fs.h |  4 +++
> >  fs/f2fs/file.c |  2 --
> >  fs/f2fs/recovery.c | 16 +---
> >  4 files changed, 58 insertions(+), 39 deletions(-)
> > 
> > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > index 9054aea..8eca6dd 100644
> > --- a/fs/f2fs/dir.c
> > +++ b/fs/f2fs/dir.c
> > @@ -212,31 +212,17 @@ static struct f2fs_dir_entry *find_in_level(struct 
> > inode *dir,
> > return de;
> >  }
> >  
> > -/*
> > - * Find an entry in the specified directory with the wanted name.
> > - * It returns the page where the entry was found (as a parameter - 
> > res_page),
> > - * and the entry itself. Page is returned mapped and unlocked.
> > - * Entry is guaranteed to be valid.
> > - */
> > -struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
> > -   const struct qstr *child, struct page **res_page)
> > +struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir,
> > +   struct fscrypt_name *fname, struct page **res_page)
> >  {
> > unsigned long npages = dir_blocks(dir);
> > struct f2fs_dir_entry *de = NULL;
> > unsigned int max_depth;
> > unsigned int level;
> > -   struct fscrypt_name fname;
> > -   int err;
> > -
> > -   err = fscrypt_setup_filename(dir, child, 1, &fname);
> > -   if (err) {
> > -   *res_page = ERR_PTR(err);
> > -   return NULL;
> > -   }
> >  
> > if (f2fs_has_inline_dentry(dir)) {
> > *res_page = NULL;
> > -   de = find_in_inline_dir(dir, &fname, res_page);
> > +   de = find_in_inline_dir(dir, fname, res_page);
> > goto out;
> > }
> >  
> > @@ -256,11 +242,35 @@ struct f2fs_dir_entry *f2fs_find_entry(struct inode 
> > *dir,
> >  
> > for (level = 0; level < max_depth; level++) {
> > *res_page = NULL;
> > -   de = find_in_level(dir, level, &fname, res_page);
> > +   de = find_in_level(dir, level, fname, res_page);
> > if (de || IS_ERR(*res_page))
> > break;
> > }
> >  out:
> > +   return de;
> > +}
> > +
> > +/*
> > + * Find an entry in the specified directory with the wanted name.
> > + * It returns the page where the entry was found (as a parameter - 
> > res_page),
> > + * and the entry itself. Page is returned mapped and unlocked.
> > + * Entry is guaranteed to be valid.
> > + */
> > +struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
> > +   const struct qstr *child, struct page **res_page)
> > +{
> > +   struct f2fs_dir_entry *de = NULL;
> > +   struct fscrypt_name fname;
> > +   int err;
> > +
> > +   err = fscrypt_setup_filename(dir, child, 1, &fname);
> > +   if (err) {
> > +   *res_page = ERR_PTR(err);
> > +   return NULL;
> > +   }
> > +
> > +   de = __f2fs_find_entry(dir, &fname, res_page);
> > +
> > fscrypt_free_filename(&fname);
> > return de;
> >  }
> > @@ -599,6 +609,24 @@ fail:
> > return err;
> >  }
> >  
> > +int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname,
> > +   struct inode *inode, nid_t ino, umode_t mode)
> > +{
> > +   struct qstr new_name;
> > +   int err = -EAGAIN;
> > +
> > +   new_name.name = fname_name(fname);
> > +   new_name.len = fname_len(fname);
> > +
> > +   if (f2fs_has_inline_dentry(dir))
> > +   err = f2fs_add_inline_entry(dir, &new_name, inode, ino, mode);
> > +   if (err == -EAGAIN)
> > +   err = f2fs_add_regular_entry(dir, &new_name, inode, ino, mode);
> > +
> > +   f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
> > +   return err;
> > +}
> > +
> >  /*
> >   * Caller should grab and release a rwsem by calling f2fs_lock_op() and
> >   * f2fs_unlock_op().
> > @@ -607,24 +635,15 @@ int __f2fs_add_link(struct inode *dir, const struct 
> > qstr *name,
> > struct inode *inode, nid_t ino, umode_t mode)
> >  {
> > struct fscrypt_name fname;
> > -   struct qstr new_name;
> > int err;
> >  
> > err =

[PATCH] f2fs: fix lost xattrs of directories

2016-08-29 Thread Jaegeuk Kim

This patch improves the xattr consistency of directories across sudden power cuts.

Possible scenario would be:
1. dir->setxattr used by per-file encryption
2. file->setxattr goes into inline_xattr
3. file->fsync

In that case, we should do checkpoint for #1.
Otherwise we'd lose dir's key information for the file given #2.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 1 +
 fs/f2fs/f2fs.h   | 1 +
 fs/f2fs/file.c   | 2 ++
 fs/f2fs/xattr.c  | 2 ++
 4 files changed, 6 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 64a685d..727e97e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1152,6 +1152,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 
clear_prefree_segments(sbi, cpc);
clear_sbi_flag(sbi, SBI_IS_DIRTY);
+   clear_sbi_flag(sbi, SBI_NEED_CP);
 
return 0;
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c2478a1..b9611d4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -756,6 +756,7 @@ enum {
SBI_NEED_FSCK,  /* need fsck.f2fs to fix */
SBI_POR_DOING,  /* recovery is doing or not */
SBI_NEED_SB_WRITE,  /* need to recover superblock */
+   SBI_NEED_CP,/* need to checkpoint */
 };
 
 enum {
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c6ee7e..21aa99b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -135,6 +135,8 @@ static inline bool need_do_checkpoint(struct inode *inode)
 
if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
need_cp = true;
+   else if (is_sbi_flag_set(sbi, SBI_NEED_CP))
+   need_cp = true;
else if (file_enc_name(inode) && need_dentry_mark(sbi, inode->i_ino))
need_cp = true;
else if (file_wrong_pino(inode))
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index c8898b5..d39a792 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -548,6 +548,8 @@ static int __f2fs_setxattr(struct inode *inode, int index,
!strcmp(name, F2FS_XATTR_NAME_ENCRYPTION_CONTEXT))
f2fs_set_encrypted_inode(inode);
f2fs_mark_inode_dirty_sync(inode);
+   if (!error && S_ISDIR(inode->i_mode))
+   set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP);
 exit:
kzfree(base_addr);
return error;
-- 
2.8.3
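
The lifecycle of the new flag in the patch above can be followed with a small userspace model. It is only a sketch: struct sb_model, struct inode_model and the model_*() helpers are hypothetical stand-ins for the f2fs structures and functions, and every condition in need_do_checkpoint() other than the new flag is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for f2fs_sb_info and the inode. */
struct sb_model {
    bool need_cp; /* models the SBI_NEED_CP flag */
};

struct inode_model {
    struct sb_model *sb;
    bool is_dir;
};

/* Step 1: a successful setxattr on a directory requests a checkpoint. */
static void model_setxattr(struct inode_model *inode, int error)
{
    assert(inode != NULL && inode->sb != NULL);
    if (!error && inode->is_dir)
        inode->sb->need_cp = true;
}

/* Step 3: fsync asks whether a full checkpoint is needed. */
static bool model_need_do_checkpoint(const struct inode_model *inode)
{
    /* The other conditions of the real function are omitted here. */
    return inode->sb->need_cp;
}

/* A completed checkpoint clears the request again. */
static void model_do_checkpoint(struct sb_model *sb)
{
    sb->need_cp = false;
}
```

Because the flag lives in the superblock model rather than in the directory, an fsync on any file sees it, which matches the scenario in the commit message where the file being synced is not the inode whose xattr changed.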

Re: [RFC PATCH v8 1/9] Restartable sequences system call

2016-08-29 Thread Boqun Feng

On Mon, Aug 29, 2016 at 03:16:52PM +0000, Mathieu Desnoyers wrote:
> - On Aug 27, 2016, at 12:22 AM, Josh Triplett j...@joshtriplett.org wrote:
> 
> > On Thu, Aug 25, 2016 at 05:56:25PM +0000, Ben Maurer wrote:
> >> rseq opens up a whole world of algorithms to userspace – algorithms
> >> that are O(num CPUs) and where one can have an extremely fast fastpath
> >> at the cost of a slower slow path. Many of these algorithms are in use
> >> in the kernel today – per-cpu allocators, RCU, light-weight reader
> >> writer locks, etc. Even in cases where these APIs can be implemented
> >> today, a rseq implementation is often superior in terms of
> >> predictability and usability (eg per-thread counters consume more
> >> memory and are more expensive to read than per-cpu counters).
> >>
> >> Isn’t the large number of uses of rseq-like algorithms in the kernel a
> >> pretty substantial sign that there would be demand for similar
> >> algorithms by user-space systems programmers?
> > 
> > Yes and no.  It provides a substantial sign that such algorithms could
> > and should exist; however "someone should do this" doesn't demonstrate
> > that someone *will*.  I do think we need a concrete example of a
> > userspace user with benchmark numbers that demonstrate the value of this
> > approach.
> > 
> > Mathieu, do you have a version of URCU that can use rseq to work per-CPU
> > rather than per-thread?  URCU's data structures would work as a
> > benchmark.
> 
> I currently don't have a per-cpu flavor of liburcu. All the flavors are
> per-thread, because currently the alternative requires atomic operations
> on the fast-path. We could indeed re-implement something similar to SRCU
> (although under LGPLv2.1 license). I've looked at what would be required
> over the weekend, and it seems feasible, but in the short term my customers
> expect me to focus my work on speeding up the LTTng-UST tracer per-cpu
> ring buffer by adapting it to rseq. Completing the liburcu per-cpu flavor
> will be in my spare time for now.
> 

Just for your information.

I have been working on the new SRCU-like flavor of liburcu since last
week, but it took me a while to understand the directory architecture of
urcu...

So far I have only written the implementation of rcu_read_{un}lock() and
synchronize_rcu(), and it can only run the simplest multiflavor test case.
My plan is to post the code and some numbers (on x86 and ppc) by the end
of this week.

Regards,
Boqun

> I expect liburcu per-cpu flavor to improve the slow path in many-threads
> use-cases (smaller grace period overhead), but not the fast path much,
> except perhaps by allowing faster memory reclaim in update-heavy workloads,
> which could then lead to better use of the cache even for reads.
> 

[...]


signature.asc
Description: PGP signature
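
To make the trade-off discussed in this thread concrete, here is a rough sketch of a per-CPU counter as it can be written without rseq. It is not code from liburcu or from the rseq series: struct percpu_counter, percpu_inc(), percpu_sum() and MAX_CPUS are hypothetical names, and the caller is assumed to pass in a CPU number, for example from glibc's sched_getcpu(). Because the thread may migrate between reading the CPU number and updating the slot, the fast path has to use an atomic add, which is the cost the thread attributes to the non-rseq alternative.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_CPUS 64 /* assumed upper bound for this sketch */

struct percpu_counter {
    /* One slot per CPU, aligned so slots do not share a cache line. */
    struct {
        _Alignas(64) atomic_long value;
    } slot[MAX_CPUS];
};

/*
 * Fast path: bump the slot of the CPU the caller believes it runs on.
 * The add must be atomic because another thread may hit the same slot
 * if the caller migrated after looking up its CPU number.
 */
static void percpu_inc(struct percpu_counter *c, int cpu)
{
    assert(cpu >= 0);
    atomic_fetch_add_explicit(&c->slot[cpu % MAX_CPUS].value, 1,
                              memory_order_relaxed);
}

/* Slow path: O(number of CPUs) walk over all slots. */
static long percpu_sum(struct percpu_counter *c)
{
    long sum = 0;

    for (int i = 0; i < MAX_CPUS; i++)
        sum += atomic_load_explicit(&c->slot[i].value,
                                    memory_order_relaxed);
    return sum;
}
```

The shape matches the description quoted above: a cheap increment and a read that costs O(number of CPUs). The storage is bounded by the CPU count rather than by the number of threads, which is the memory argument made for per-CPU over per-thread counters.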

[GIT PULL] hwmon fixes for v4.8-rc5

2016-08-29 Thread Guenter Roeck

Hi Linus,

Please pull hwmon fixes for Linux v4.8-rc5 from signed tag:

git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git 
hwmon-for-linus-v4.8-rc5

Thanks,
Guenter
--

The following changes since commit fa8410b355251fd30341662a40ac6b22d3e38468:

  Linux 4.8-rc3 (2016-08-21 16:14:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git 
tags/hwmon-for-linus-v4.8-rc5

for you to fetch changes up to 3c3292634fc2de1ab97b6aa3222fee647f737adb:

  hwmon: (it87) Add missing sysfs attribute group terminator (2016-08-29 
05:31:31 -0700)


hwmon patch for v4.8-rc5

Add missing sysfs attribute group terminator to it87 driver.


Jean Delvare (1):
  hwmon: (it87) Add missing sysfs attribute group terminator

 drivers/hwmon/it87.c | 1 +
 1 file changed, 1 insertion(+)

Re: [PATCH v9 0/2] sun4i-codec: Add Line-In, FM-In, Mic 2, Capture Source, Differential Line-In

2016-08-29 Thread Chen-Yu Tsai

On Tue, Aug 30, 2016 at 2:03 AM, Danny Milosavljevic
 wrote:
> Hi Maxime,
>
> On Sat, 27 Aug 2016 14:02:51 +0200
> Maxime Ripard  wrote:
>
>> Please send your patches using send-email.
>
> If you are referring to "git send-email", I did send the patch series using 
> that (with almost the same setup I use for many other projects). It probably 
> barfed because my description enumeration contained a dash like "- Line In". 
> Sigh.

The problem is that your patches are being treated as attachments.
Try using '--no-attach' with git format-patch, regenerate the patches,
and check that they are not multi-part before sending them again.

ChenYu

>
> Anyway, I'll send the E-Mail again.
>
> Thanks,
>Danny

Re: 4.8-rc4 spews "BUG: sleeping function called from invalid context at fs/dcache.c:757"

2016-08-29 Thread Ian Kent

On Tue, 2016-08-30 at 09:37 +0800, Ian Kent wrote:
> On Mon, 2016-08-29 at 16:18 +0100, Al Viro wrote:
> > On Mon, Aug 29, 2016 at 04:35:46PM +0200, Takashi Iwai wrote:
> > 
> > >   [] dput+0x46/0x400
> > ... which should not be called in atomic contexts
> > >   [] follow_down_one+0x27/0x60
> > ... and neither should this
> > >   [] autofs4_mount_busy+0x32/0x110
> > ... nor that (for fsck sake, there's full-blown path_put() in it!)
> > >   [] should_expire+0x51/0x3d0
> > ... so that would better not be called in atomic either (incidentally,
> > it also calls dput() directly)
> > >   [] autofs4_expire_indirect+0x190/0x2d0
> > ... while here it is called under sbi->fs_lock.
> > 
> > > I don't remember of a similar stack trace in the past, so if any, it
> > > can be a regression in 4.8 kernel.  But I cannot say it in 100%, as
> > > this looks spontaneous, nor I would be able to reproduce it at the
> > > next boot...
> > 
> > It's old; the race is narrow, but it's been there for quite a while, by
> > the look of it.
> 
> Right, I missed that when the rcu-walk concurrency changes went in, mmm 

Umm ... no, the problem has been there much longer than that ...

Re: xfstests xfs fuzzers fail with DAX

2016-08-29 Thread Dan Williams

[ Adding Darrick on the off chance that this triggers an "aha, of
course it does!" ]

Darrick these corruption tests you added to xfstests last year all
fail the same way with DAX enabled.  They spew:

"pwrite64: Structure needs cleaning"

...reports that are cleaned up by running without "-o dax".

Alternatively you could sit back and watch me try to figure it out,
that should be quite entertaining... as a start I'll try to pin down a
stack trace when the error is returned.


On Wed, Aug 3, 2016 at 7:45 PM, Xiong Zhou  wrote:
> Hi,
>
> A few xfs fuzzers in xfstests fail with dax mount option, pass without dax.
> They are xfs/086 xfs/088 xfs/089 xfs/091.
>
> xfstests to commit 4470ad4c7e  (Jul 26)
> kernel   to commit dd95069545  (Jul 24)
>
> + ./check xfs/091
> FSTYP -- xfs (non-debug)
> PLATFORM  -- Linux/x86_64 rhel73 4.7.0+
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/pmem1
> MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/pmem1 /daxsch
>
> xfs/091  104s
> Ran: xfs/091
> Passed all 1 tests
>
> + echo 'MOUNT_OPTIONS="-o dax"'
> + ./check xfs/091
> FSTYP -- xfs (non-debug)
> PLATFORM  -- Linux/x86_64 rhel73 4.7.0+
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/pmem1
> MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem1 /daxsch
>
> xfs/091 104s ...  - output mismatch (see /root/xfstests/results//xfs/091.out.bad)
> --- tests/xfs/091.out   2016-07-18 02:57:47.67000 -0400
> +++ /root/xfstests/results//xfs/091.out.bad 2016-08-03 22:38:14.94800 -0400
> @@ -6,6 +6,70 @@
>  + corrupt image
>  + mount image
>  + modify files
> +pwrite64: Structure needs cleaning
> +pwrite64: Structure needs cleaning
> +pwrite64: Structure needs cleaning
> +pwrite64: Structure needs cleaning
> ...
> (Run 'diff -u tests/xfs/091.out /root/xfstests/results//xfs/091.out.bad' to see the entire diff)
> Ran: xfs/091
> Failures: xfs/091
> Failed 1 of 1 tests
>
> # diff -u xfstests/tests/xfs/091.out /root/xfstests/results//xfs/091.out.bad
> --- xfstests/tests/xfs/091.out  2016-07-18 02:57:47.67000 -0400
> +++ /root/xfstests/results//xfs/091.out.bad 2016-08-03 22:38:14.94800 -0400
> @@ -6,6 +6,70 @@
>  + corrupt image
>  + mount image
>  + modify files
> +pwrite64: Structure needs cleaning
> 
> +pwrite64: Structure needs cleaning
>  + repair fs
>  + mount image
>  + chattr -R -i
>
>
> Thanks,
> Xiong
>
> ___
> Linux-nvdimm mailing list
> linux-nvd...@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
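
[ Archive note on the error text above. "Structure needs cleaning" is the
string glibc's strerror() returns for EUCLEAN, and XFS defines its
EFSCORRUPTED code as EUCLEAN, so each pwrite64 failure in the log is the
filesystem reporting metadata it considers corrupt. The small C sketch below
shows where the message comes from; the helper name sketch_euclean_text is
hypothetical, and the #ifdef is there because EUCLEAN is Linux-specific. ]

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xfstests or the kernel): return the
 * C library's message for EUCLEAN, the errno value XFS reports when it
 * detects corrupted metadata.
 */
static const char *sketch_euclean_text(void)
{
#ifdef EUCLEAN
	return strerror(EUCLEAN);
#else
	/* EUCLEAN is a Linux errno; other platforms may not define it. */
	return "EUCLEAN is not defined on this platform";
#endif
}
```

Printing the returned string with puts() on a glibc system should reproduce
the text seen in 091.out.bad.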

Re: [PATCH] ftrace: Access ret_stack->subtime only in the function profiler

2016-08-29 Thread Namhyung Kim

Hi Steve,

On Mon, Aug 29, 2016 at 04:07:00PM -0400, Steven Rostedt wrote:
> On Mon, 29 Aug 2016 12:05:18 +0900
> Namhyung Kim  wrote:
> 
> > The subtime is used only for function profiler with function graph
> > tracer enabled.  Move the definition of subtime under
> > CONFIG_FUNCTION_PROFILER to reduce the memory usage.  Also move the
> > initialization of subtime into the graph entry callback.
> 
> Hmm, I think documentation needs to be updated. Although it was never
> implemented, I believe I added the subtime to not only work with the
> profiler, but also with the normal tracing (to have the time of the
> internal functions subtracted from the upper level functions). But it
> appears that part was never implemented.
> 
> I'm fine with the patch, or actually implementing what graph-time
> states in Documentation/ftrace.txt. If we take this patch, that comment
> needs to be made to only mention the profiler (and the option should
> only be shown when the profiler is enabled).

Ah, missed the documentation part.  To implement it in the normal
tracing, I think we need to add 'subtime' field to struct
ftrace_graph_ret which will increase disk size.  Are you ok with this?

Thanks,
Namhyung


> 
> > 
> > Cc: Josh Poimboeuf 
> > Signed-off-by: Namhyung Kim 
> > ---
> >  include/linux/ftrace.h   | 2 ++
> >  kernel/trace/ftrace.c| 6 ++
> >  kernel/trace/trace_functions_graph.c | 1 -
> >  3 files changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 6f93ac46e7f0..b3d34d3e0e7e 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -794,7 +794,9 @@ struct ftrace_ret_stack {
> > unsigned long ret;
> > unsigned long func;
> > unsigned long long calltime;
> > +#ifdef CONFIG_FUNCTION_PROFILER
> > unsigned long long subtime;
> > +#endif
> >  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > unsigned long fp;
> >  #endif
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 84752c8e28b5..2050a7652a86 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -872,7 +872,13 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >  static int profile_graph_entry(struct ftrace_graph_ent *trace)
> >  {
> > +   int index = trace->depth;
> > +
> > function_profile_call(trace->func, 0, NULL, NULL);
> > +
> > +   if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
> > +   current->ret_stack[index].subtime = 0;
> > +
> > return 1;
> >  }
> >  
> > diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> > index 0cbe38a844fa..9c7ffa4df5a8 100644
> > --- a/kernel/trace/trace_functions_graph.c
> > +++ b/kernel/trace/trace_functions_graph.c
> > @@ -170,7 +170,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
> > current->ret_stack[index].ret = ret;
> > current->ret_stack[index].func = func;
> > current->ret_stack[index].calltime = calltime;
> > -   current->ret_stack[index].subtime = 0;
> >  #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
> > current->ret_stack[index].fp = frame_pointer;
> >  #endif
>
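
[ Archive note on what subtime is for. The thread describes subtime as the
time spent in called functions, to be subtracted from the caller's total so
that each function is charged only for its own time. The user-space C sketch
below illustrates that bookkeeping under simplifying assumptions: timestamps
are passed in by the caller, there is a single stack, and the names
sketch_enter, sketch_exit, struct sketch_stack and SKETCH_DEPTH_MAX are
hypothetical stand-ins, not the kernel's API. ]

```c
#include <assert.h>

#define SKETCH_DEPTH_MAX 50	/* hypothetical bound, standing in for FTRACE_RETFUNC_DEPTH */

struct sketch_entry {
	unsigned long long calltime;	/* timestamp at function entry */
	unsigned long long subtime;	/* total time spent in direct callees */
};

struct sketch_stack {
	struct sketch_entry entry[SKETCH_DEPTH_MAX];
	int depth;			/* number of live entries */
};

/* Record a function entry; returns 0 on success, -1 if the stack is full. */
static int sketch_enter(struct sketch_stack *s, unsigned long long now)
{
	if (s->depth >= SKETCH_DEPTH_MAX)
		return -1;
	s->entry[s->depth].calltime = now;
	s->entry[s->depth].subtime = 0;	/* reset at entry, as the patch does in the entry callback */
	s->depth++;
	return 0;
}

/* Record a function return; returns the time spent in the function itself. */
static unsigned long long sketch_exit(struct sketch_stack *s, unsigned long long now)
{
	struct sketch_entry *e;
	unsigned long long total;

	assert(s->depth > 0);
	e = &s->entry[--s->depth];
	total = now - e->calltime;

	/* Charge this call's full duration to the caller's subtime. */
	if (s->depth > 0)
		s->entry[s->depth - 1].subtime += total;

	/* Self time is the total minus callee time, clamped at zero. */
	return total > e->subtime ? total - e->subtime : 0;
}
```

For example, if an outer function is entered at t=0, an inner one runs from
t=10 to t=30, and the outer one returns at t=50, sketch_exit reports 20 for
the inner function and 30 for the outer one (50 total minus 20 of callee
time).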
