[PULL] vhost: cleanups and fixes

2015-12-20 Thread Michael S. Tsirkin
The following changes since commit 9f9499ae8e6415cefc4fe0a96ad0e27864353c89:

  Linux 4.4-rc5 (2015-12-13 17:42:58 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 74a599f09bec7419b2490039f0fb33bc8581ef7c:

  virtio/s390: handle error values in irb (2015-12-17 10:37:33 +0200)


virtio: fixes on top of 4.4-rc5

This includes a single fix for virtio ccw error handling.

Signed-off-by: Michael S. Tsirkin 


Cornelia Huck (1):
  virtio/s390: handle error values in irb

 drivers/s390/virtio/virtio_ccw.c | 62 
 1 file changed, 37 insertions(+), 25 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] trace-cmd: Print relate stacktrace at once

2015-12-20 Thread Namhyung Kim
From: Namhyung Kim 

Currently trace-cmd prints ring buffer events in strict time order.
But it is sometimes annoying that a stacktrace from one cpu can be
intermixed with events from other cpus.  This patch peeks at the next
event when printing the last record and prints it as well if it is a
stacktrace (from the same cpu).

Requested-by: Joonsoo Kim 
Signed-off-by: Namhyung Kim 
---
 trace-read.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/trace-read.c b/trace-read.c
index aec8532..10edf5d 100644
--- a/trace-read.c
+++ b/trace-read.c
@@ -1155,9 +1155,20 @@ static void read_data_info(struct list_head *handle_list, enum output_type otype
}
}
if (last_record) {
+   int last_cpu = last_record->cpu;
+
print_handle_file(last_handle);
trace_show_data(last_handle->handle, last_record, profile);
free_handle_record(last_handle);
+
+   /* print related stacktrace at once */
+   record = tracecmd_peek_data(last_handle->handle, last_cpu);
+   if (record && pevent_data_type(pevent, record) == stacktrace_id &&
+   test_stacktrace(last_handle, record, 1)) {
+   print_handle_file(last_handle);
+   trace_show_data(last_handle->handle, record, profile);
+   tracecmd_read_data(last_handle->handle, last_cpu);
+   }
+   }
}
} while (last_record);
 
-- 
2.6.4
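
The peek-then-consume idea in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: `struct rec`, `struct cpu_queue`, `queue_peek()`, `queue_read()` and `emit_with_stacktrace()` are hypothetical stand-ins, not trace-cmd's real types or API, and the per-CPU queue is modeled as a plain array.

```c
#include <stddef.h>

/* Hypothetical record type; trace-cmd's real record struct differs. */
enum rec_type { REC_EVENT, REC_STACKTRACE };

struct rec {
	int cpu;
	enum rec_type type;
};

/* Hypothetical per-CPU queue of pending records, oldest first. */
struct cpu_queue {
	const struct rec *recs;
	size_t len;
	size_t pos;
};

/* Look at the next record without consuming it (NULL if empty). */
static const struct rec *queue_peek(const struct cpu_queue *q)
{
	return q->pos < q->len ? &q->recs[q->pos] : NULL;
}

/* Consume and return the next record (NULL if empty). */
static const struct rec *queue_read(struct cpu_queue *q)
{
	return q->pos < q->len ? &q->recs[q->pos++] : NULL;
}

/*
 * Emit the next record of this CPU and, if the record right after it is a
 * stack trace, emit that one too so the pair is never split by records
 * from other CPUs. The emit callback is optional. Returns the number of
 * records consumed.
 */
static int emit_with_stacktrace(struct cpu_queue *q,
				void (*emit)(const struct rec *))
{
	const struct rec *r = queue_read(q);
	const struct rec *next;
	int n = 0;

	if (!r)
		return 0;
	if (emit)
		emit(r);
	n++;

	next = queue_peek(q);
	if (next && next->type == REC_STACKTRACE) {
		next = queue_read(q);
		if (emit)
			emit(next);
		n++;
	}
	return n;
}
```

Peeking first matters because a record that is not a stack trace must stay in the queue, so that it is still merged in time order with the other CPUs on the next pass.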



Re: [PATCH] ARM: dts: imx28: add alternate auart4 pinmux

2015-12-20 Thread Shawn Guo
On Fri, Dec 11, 2015 at 01:36:26PM +, Mans Rullgard wrote:
> Add auart4 2-pin configuration on auart0 rts/cts pads.
> 
> Signed-off-by: Mans Rullgard 

Applied, thanks.


Re: [RFC PATCH v1 1/2] drm: rockchip/hdmi: add Innosilicon HDMI support

2015-12-20 Thread Yakir Yang

Hi Mark,

On 12/21/2015 03:31 PM, Mark yao wrote:
> On 2015-11-11 15:46, Yakir Yang wrote:
>> +    hdmi->connector.polled = DRM_CONNECTOR_POLL_HPD;
>> +
>> +    drm_connector_helper_add(&hdmi->connector,
>> +                             &inno_hdmi_connector_helper_funcs);
>> +    drm_connector_init(drm, &hdmi->connector, &inno_hdmi_connector_funcs,
>> +                       DRM_MODE_CONNECTOR_HDMIA);
>> +
>> +    hdmi->connector.encoder = encoder;
>
> Please remove "hdmi->connector.encoder = encoder;"
>
> It would cause an error dump at atomic drm runtime:
>
> WARNING: CPU: 3 PID: 74 at drivers/gpu/drm/drm_atomic_helper.c:682
> drm_atomic_helper_update_legacy_modeset_state+0x6c/0x200()
>
> This issue is explained in the following thread:
> https://lkml.org/lkml/2015/11/16/498

Done, thanks

- Yakir

> Thanks.






Re: [RFC PATCH v1 0/2] Introduce Innosilicon HDMI driver on Rockchip platforms

2015-12-20 Thread Yakir Yang

Hi Mark

On 12/21/2015 03:25 PM, Mark yao wrote:
> Hi Yakir
> I want to convert drm/rockchip to support the atomic API, and
> I'd like you to make some modifications to adapt to it.

Sure, would update as soon as possible.

Thanks,
- Yakir

> - Mark
>
> On 2015-11-11 15:45, Yakir Yang wrote:
>> Hi guys:
>>
>> Here is a brief introduction to the Innosilicon HDMI IP:
>>   - Supports an HDMI 1.4a, HDCP 1.2 and DVI 1.0 standard compliant transmitter
>>   - Supports the HDMI 1.4 a/b 3D function defined in the HDMI 1.4 a/b spec
>>   - Digital video interface supports a pixel size of 24, 30, 36, 48 bits color
>>     depth in RGB
>>   - S/PDIF output supports PCM, Dolby Digital, DTS digital audio transmission
>>     (32-192kHz Fs) using IEC 60958 and IEC 61937
>>   - The EDID and CEC functions are also supported by the Innosilicon HDMI
>>     Transmitter Controller
>>
>> This IP has been integrated on some Rockchip CPUs (like RK3036/RK312x). Since
>> those CPUs haven't landed in the mainline kernel yet, I created a branch to
>> verify this series [https://github.com/rockchip-linux/kernel].
>>
>> - Yakir
>>
>>
>> Yakir Yang (2):
>>    drm: rockchip/hdmi: add Innosilicon HDMI support
>>    dt-bindings: add document for Innosilicon HDMI on Rockchip platform
>>
>>   .../display/rockchip/inno_hdmi-rockchip.txt|   50 +
>>   drivers/gpu/drm/rockchip/Kconfig   |8 +
>>   drivers/gpu/drm/rockchip/Makefile  |1 +
>>   drivers/gpu/drm/rockchip/inno_hdmi.c   | 1008 
>>   drivers/gpu/drm/rockchip/inno_hdmi.h   |  364 +++
>>   5 files changed, 1431 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/display/rockchip/inno_hdmi-rockchip.txt
>>   create mode 100644 drivers/gpu/drm/rockchip/inno_hdmi.c
>>   create mode 100644 drivers/gpu/drm/rockchip/inno_hdmi.h









Re: [PATCH V3 4/4] scsi: storvsc: Tighten up the interrupt path

2015-12-20 Thread Hannes Reinecke

On 12/19/2015 03:28 AM, KY Srinivasan wrote:

[ .. ]

>> Could you?  You're making what you describe as an optimisation but
>> there are two reasons why this might not be so.  The first is that the
>> compiler is entitled to inline static functions.  If it did, likely it
>> picked up the optimisation anyway as Hannes suggested.  However, the
>> other reason this might not be an optimisation (assuming the compiler
>> doesn't inline the function) is you're passing an argument which can be
>> offset computed.  On all architectures, you have a fixed number of
>> registers for passing function arguments, then we have to use the
>> stack.  Using the stack comes in far more expensive than computing an
>> offset to an existing pointer.  Even if you're still in registers, the
>> offset now has to be computed and stored and the compiler loses track
>> of the relation.
>>
>> The bottom line is that adding an extra argument for a value which can
>> be offset computed is rarely a win.
>
> James,
> When I did this, I was mostly concerned about the cost of reestablishing
> state that was already known. So, even with the function being in-lined,
> I felt the cost of reestablishing state that was already known is
> unnecessary. In this particular case, I did not change the number of
> arguments that were being passed; I just changed the type of one of them -
> instead of passing struct hv_device *, I am now passing
> struct storvsc_device *. In the current code, we are using
> struct hv_device * to establish a pointer to struct storvsc_device *
> via the function get_in_stor_device(). This pattern currently exists in
> the call chain from the interrupt handler - storvsc_on_channel_callback().
>
> While the compiler is smart enough to inline both get_in_stor_device() as
> well as many of the static functions in the call chain from
> storvsc_on_channel_callback(), looking at the assembled code, the compiler
> is repeatedly inlining the call to get_in_stor_device() and this clearly
> is less than optimal.
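
The two calling styles debated above can be sketched side by side. This is a minimal illustration, assuming the derived pointer is a plain compile-time offset from the one passed in; that may not match what get_in_stor_device() actually does. `struct inner_dev`, `struct outer_dev`, `outer_of()` and both functions are hypothetical names, not the Hyper-V driver's structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real Hyper-V structures differ. */
struct inner_dev {
	int id;
};

struct outer_dev {
	int outstanding;
	struct inner_dev dev;	/* embedded, so its offset is a constant */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define outer_of(ptr) \
	((struct outer_dev *)((char *)(ptr) - offsetof(struct outer_dev, dev)))

/* Variant A: take the inner pointer and recompute the outer one by offset. */
static int pending_from_inner(struct inner_dev *d)
{
	return outer_of(d)->outstanding;
}

/* Variant B: take the already-known outer pointer; same argument count. */
static int pending_from_outer(struct outer_dev *o)
{
	return o->outstanding;
}
```

Both variants take a single argument, which is the point made in the reply: the change swaps the type of an existing argument rather than adding one. Whether variant B is cheaper depends on how expensive the re-derivation really is, which is why the compiler output is the deciding evidence.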

Which means you actually checked the compiler output, and it made a
difference.

That's all I wanted to know, as it's not immediately clear from the
patch.


So:

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] ASoC: fsl-asoc-card: use different route map for AC'97 mode

2015-12-20 Thread Nicolin Chen
On Sun, Dec 20, 2015 at 09:34:29PM +0100, Maciej S. Szmigiero wrote:
> fsl_ssi uses different stream names ("AC97 Playback" / "AC97 Capture")
> in AC'97 mode so in this case fsl-asoc-card route map should
> also be using them.
> 
> Signed-off-by: Maciej S. Szmigiero 

Acked-by: Nicolin Chen 

> ---
>  sound/soc/fsl/fsl-asoc-card.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
> index c63d89da51f1..562b3bd22d9a 100644
> --- a/sound/soc/fsl/fsl-asoc-card.c
> +++ b/sound/soc/fsl/fsl-asoc-card.c
> @@ -107,6 +107,13 @@ static const struct snd_soc_dapm_route audio_map[] = {
>   {"CPU-Capture",  NULL, "Capture"},
>  };
>  
> +static const struct snd_soc_dapm_route audio_map_ac97[] = {
> + {"AC97 Playback",  NULL, "ASRC-Playback"},
> + {"Playback",  NULL, "AC97 Playback"},
> + {"ASRC-Capture",  NULL, "AC97 Capture"},
> + {"AC97 Capture",  NULL, "Capture"},
> +};
> +
>  /* Add all possible widgets into here without being redundant */
>  static const struct snd_soc_dapm_widget fsl_asoc_card_dapm_widgets[] = {
>   SND_SOC_DAPM_LINE("Line Out Jack", NULL),
> @@ -579,7 +586,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
>   priv->card.dev = &pdev->dev;
>   priv->card.name = priv->name;
>   priv->card.dai_link = priv->dai_link;
> - priv->card.dapm_routes = audio_map;
> + priv->card.dapm_routes = fsl_asoc_card_is_ac97(priv) ?
> +  audio_map_ac97 : audio_map;
>   priv->card.late_probe = fsl_asoc_card_late_probe;
>   priv->card.num_dapm_routes = ARRAY_SIZE(audio_map);
>   priv->card.dapm_widgets = fsl_asoc_card_dapm_widgets;
> 


Re: [PATCH] lightnvm: unlock rq and free ppa_list after submission failed

2015-12-20 Thread Wenwei Tao
This patch is based on [PATCH] lightnvm: fix bio submission issue
 https://lkml.org/lkml/2015/12/9/394

2015-12-21 15:32 GMT+08:00 Wenwei Tao :
> After I/O submission fails, delete the rq from rrpc's inflight
> list before freeing it, so that no stale item is left in the list.
> Also free the rq's ppa_list to avoid a memory leak.
>
> Signed-off-by: Wenwei Tao 
> ---
>  drivers/lightnvm/rrpc.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/lightnvm/rrpc.c b/drivers/lightnvm/rrpc.c
> index a1e7488..c0886a8 100644
> --- a/drivers/lightnvm/rrpc.c
> +++ b/drivers/lightnvm/rrpc.c
> @@ -843,6 +843,12 @@ static int rrpc_submit_io(struct rrpc *rrpc, struct bio *bio,
> if (err) {
> pr_err("rrpc: I/O submission failed: %d\n", err);
> bio_put(bio);
> +   if (!(flags & NVM_IOTYPE_GC)) {
> +   rrpc_unlock_rq(rrpc, rqd);
> +   if (rqd->nr_pages > 1)
> +   nvm_dev_dma_free(rrpc->dev,
> +   rqd->ppa_list, rqd->dma_ppa_list);
> +   }
> return NVM_IO_ERR;
> }
>
> --
> 1.8.3.1
>


Re: [PATCH 1/4] ASoC: imx-pcm-dma: add NULL test

2015-12-20 Thread Nicolin Chen
On Sun, Dec 20, 2015 at 12:15:50PM +0100, Julia Lawall wrote:
> Add NULL test on call to devm_kzalloc.
> 
> The semantic match that finds this problem is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression x;
> @@
> 
> * x = devm_kzalloc(...);
>   ... when != x == NULL
>   *x
> // </smpl>
> 
> Signed-off-by: Julia Lawall 

Acked-by: Nicolin Chen 

Thank you

> 
> ---
>  sound/soc/fsl/imx-pcm-dma.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c
> index 1fc01ed..f3d3d1f 100644
> --- a/sound/soc/fsl/imx-pcm-dma.c
> +++ b/sound/soc/fsl/imx-pcm-dma.c
> @@ -62,6 +62,8 @@ int imx_pcm_dma_init(struct platform_device *pdev, size_t size)
>  
>   config = devm_kzalloc(&pdev->dev,
>   sizeof(struct snd_dmaengine_pcm_config), GFP_KERNEL);
> + if (!config)
> + return -ENOMEM;
>   *config = imx_dmaengine_pcm_config;
>   if (size)
>   config->prealloc_buffer_size = size;
> 


[PATCH] lightnvm: unlock rq and free ppa_list after submission failed

2015-12-20 Thread Wenwei Tao
After I/O submission fails, delete the rq from rrpc's inflight
list before freeing it, so that no stale item is left in the list.
Also free the rq's ppa_list to avoid a memory leak.

Signed-off-by: Wenwei Tao 
---
 drivers/lightnvm/rrpc.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/lightnvm/rrpc.c b/drivers/lightnvm/rrpc.c
index a1e7488..c0886a8 100644
--- a/drivers/lightnvm/rrpc.c
+++ b/drivers/lightnvm/rrpc.c
@@ -843,6 +843,12 @@ static int rrpc_submit_io(struct rrpc *rrpc, struct bio *bio,
if (err) {
pr_err("rrpc: I/O submission failed: %d\n", err);
bio_put(bio);
+   if (!(flags & NVM_IOTYPE_GC)) {
+   rrpc_unlock_rq(rrpc, rqd);
+   if (rqd->nr_pages > 1)
+   nvm_dev_dma_free(rrpc->dev,
+   rqd->ppa_list, rqd->dma_ppa_list);
+   }
return NVM_IO_ERR;
}
 
-- 
1.8.3.1
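
The error-path cleanup described in this patch can be sketched outside the kernel. This is a simplified model, not the lightnvm code: `struct req`, its fields, `submit_io()` and the `IO_OK`/`IO_ERR` values are hypothetical, and `free()` stands in for the DMA pool release done by `nvm_dev_dma_free()`.

```c
#include <stdlib.h>

/* Hypothetical request; the real struct nvm_rq has different fields. */
struct req {
	int nr_pages;
	int locked;	/* 1 while the request sits on the in-flight list */
	void *ppa_list;	/* only allocated for multi-page requests */
};

#define IO_OK	0
#define IO_ERR	1

/* On submission failure, undo what was set up before returning the error. */
static int submit_io(struct req *rq, int submit_err, int is_gc)
{
	if (submit_err) {
		if (!is_gc) {
			/* Drop the request from the in-flight list. */
			rq->locked = 0;
			/* Release the address list so it is not leaked. */
			if (rq->nr_pages > 1) {
				free(rq->ppa_list);
				rq->ppa_list = NULL;
			}
		}
		return IO_ERR;
	}
	return IO_OK;
}
```

As in the patch, the cleanup is skipped for garbage-collection requests, presumably because their caller owns that state; the sketch takes that split as given rather than explaining it.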



Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 10:23, Linus Torvalds wrote:
> On Sun, Dec 20, 2015 at 8:47 PM, Linus Torvalds
>  wrote:
>>
>> That said, we obviously need to figure out this current problem
>> regardless first..
>
> ... although maybe it *would* be interesting to hear what happens if
> you just compile a 64-bit kernel instead?
>
> Do you still see the problem? Because if not, then we should look very
> specifically for some 32-bit PAE issue.
>
> For example, maybe we use "unsigned long" somewhere where we should
> use "phys_addr_t". On x86-64, they obviously end up being the same. On
> normal non-PAE x86-32, they are also the same. But ..
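
The kind of truncation hinted at above can be shown with a small user-space sketch, assuming a 32-bit "unsigned long" and a 64-bit "phys_addr_t" as on a PAE kernel. The typedefs and function names are hypothetical, and nothing here claims this is the actual cause of the reported corruption.

```c
#include <stdint.h>

/*
 * Assumption: model a 32-bit kernel's "unsigned long" with uint32_t and a
 * PAE-sized phys_addr_t with uint64_t. Both typedefs are illustrative.
 */
typedef uint32_t ulong32_t;
typedef uint64_t phys64_t;

/* Storing a physical address in the narrow type silently drops bits 32+. */
static ulong32_t store_in_ulong(phys64_t addr)
{
	return (ulong32_t)addr;
}

/* Nonzero when the address survives a round trip through the narrow type. */
static int fits_in_ulong(phys64_t addr)
{
	return (phys64_t)store_in_ulong(addr) == addr;
}
```

Under these assumptions an address just above 4 GiB, such as 0x100001000, comes back as 0x1000, so a write meant for high memory would land in low memory instead.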



Let's wait for what Tejun Heo might say - I've applied his debugging
patch and sent back the results.

Building an x86_64 kernel here involves installing a 64bit Linux VM, so
I'd like it to be the last resort.



Re: [RFC PATCH v1 1/2] drm: rockchip/hdmi: add Innosilicon HDMI support

2015-12-20 Thread Mark yao

On 2015-11-11 15:46, Yakir Yang wrote:
> +   hdmi->connector.polled = DRM_CONNECTOR_POLL_HPD;
> +
> +   drm_connector_helper_add(&hdmi->connector,
> +                            &inno_hdmi_connector_helper_funcs);
> +   drm_connector_init(drm, &hdmi->connector, &inno_hdmi_connector_funcs,
> +                      DRM_MODE_CONNECTOR_HDMIA);
> +
> +   hdmi->connector.encoder = encoder;

Please remove "hdmi->connector.encoder = encoder;"

It would cause an error dump at atomic drm runtime:

WARNING: CPU: 3 PID: 74 at drivers/gpu/drm/drm_atomic_helper.c:682
drm_atomic_helper_update_legacy_modeset_state+0x6c/0x200()

This issue is explained in the following thread:
https://lkml.org/lkml/2015/11/16/498

Thanks.

--
Mark Yao




Re: [RFC PATCH v1 0/2] Introduce Innosilicon HDMI driver on Rockchip platforms

2015-12-20 Thread Mark yao

Hi Yakir
I want to convert drm/rockchip to support the atomic API, and
I'd like you to make some modifications to adapt to it.

- Mark

On 2015-11-11 15:45, Yakir Yang wrote:
> Hi guys:
>
> Here is a brief introduction to the Innosilicon HDMI IP:
>   - Supports an HDMI 1.4a, HDCP 1.2 and DVI 1.0 standard compliant transmitter
>   - Supports the HDMI 1.4 a/b 3D function defined in the HDMI 1.4 a/b spec
>   - Digital video interface supports a pixel size of 24, 30, 36, 48 bits color
>     depth in RGB
>   - S/PDIF output supports PCM, Dolby Digital, DTS digital audio transmission
>     (32-192kHz Fs) using IEC 60958 and IEC 61937
>   - The EDID and CEC functions are also supported by the Innosilicon HDMI
>     Transmitter Controller
>
> This IP has been integrated on some Rockchip CPUs (like RK3036/RK312x). Since
> those CPUs haven't landed in the mainline kernel yet, I created a branch to
> verify this series [https://github.com/rockchip-linux/kernel].
>
> - Yakir
>
>
> Yakir Yang (2):
>    drm: rockchip/hdmi: add Innosilicon HDMI support
>    dt-bindings: add document for Innosilicon HDMI on Rockchip platform
>
>   .../display/rockchip/inno_hdmi-rockchip.txt|   50 +
>   drivers/gpu/drm/rockchip/Kconfig   |8 +
>   drivers/gpu/drm/rockchip/Makefile  |1 +
>   drivers/gpu/drm/rockchip/inno_hdmi.c   | 1008 
>   drivers/gpu/drm/rockchip/inno_hdmi.h   |  364 +++
>   5 files changed, 1431 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/display/rockchip/inno_hdmi-rockchip.txt
>   create mode 100644 drivers/gpu/drm/rockchip/inno_hdmi.c
>   create mode 100644 drivers/gpu/drm/rockchip/inno_hdmi.h




--
Mark Yao




Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 11:55, Tejun Heo wrote:
> Artem, can you please reproduce the issue with the following patch
> applied and attach the kernel log?
>
> Thanks.

I've applied this patch on top of a vanilla 4.3.3 kernel (without Linus's
revert). Hopefully that's how you intended it to be.

Here's the result (I skipped the beginning of dmesg - it's the same as
always - see bugzilla).

[   60.387407] Corrupted low memory at c0001000 (1000 phys) = cba3d25f
[   60.387411] Corrupted low memory at c0001004 (1004 phys) = e8f17ba7
[   60.387413] Corrupted low memory at c0001008 (1008 phys) = 61cfa79a
[   60.387415] Corrupted low memory at c000100c (100c phys) = dc4d5d71
[   60.387417] Corrupted low memory at c0001010 (1010 phys) = adbdc15b
[   60.387418] Corrupted low memory at c0001014 (1014 phys) = dee76bdc
[   60.387420] Corrupted low memory at c0001018 (1018 phys) = 827dee31
[   60.387422] Corrupted low memory at c000101c (101c phys) = ef70cf7b
[   60.387423] Corrupted low memory at c0001020 (1020 phys) = 82fdee4d
[   60.387425] Corrupted low memory at c0001024 (1024 phys) = 77533c7b
[   60.387427] Corrupted low memory at c0001028 (1028 phys) = ddd4cf35
[   60.387428] Corrupted low memory at c000102c (102c phys) = 7beea149
[   60.387430] Corrupted low memory at c0001030 (1030 phys) = 798fe878
[   60.387432] Corrupted low memory at c0001034 (1034 phys) = 4283a7a8
[   60.387434] Corrupted low memory at c0001038 (1038 phys) = 4dee093d
[   60.387435] Corrupted low memory at c000103c (103c phys) = ee21ef73
[   60.387437] Corrupted low memory at c0001040 (1040 phys) = fe3dc93d
[   60.387439] Corrupted low memory at c0001044 (1044 phys) = b8e7cf0d
[   60.387440] Corrupted low memory at c0001048 (1048 phys) = af3c9977
[   60.387442] Corrupted low memory at c000104c (104c phys) = b80b7b8b
[   60.387444] Corrupted low memory at c0001050 (1050 phys) = b6f73d77
[   60.387445] Corrupted low memory at c0001054 (1054 phys) = f7276f70
[   60.387447] Corrupted low memory at c0001058 (1058 phys) = c62f70f6
[   60.387449] Corrupted low memory at c000105c (105c phys) = 3ef734bd
[   60.387451] Corrupted low memory at c0001060 (1060 phys) = 1ef79f40
[   60.387452] Corrupted low memory at c0001064 (1064 phys) = f1cf9f65
[   60.387454] Corrupted low memory at c0001068 (1068 phys) = 297a5390
[   60.387456] Corrupted low memory at c000106c (106c phys) = a7f14fbc
[   60.387457] Corrupted low memory at c0001070 (1070 phys) = 57ef71af
[   60.387459] Corrupted low memory at c0001074 (1074 phys) = 219d15e4
[   60.387461] Corrupted low memory at c0001078 (1078 phys) = 7b99a2af
[   60.387462] Corrupted low memory at c000107c (107c phys) = c56d281b
[   60.387464] Corrupted low memory at c0001080 (1080 phys) = 3c84de6e
[   60.387466] Corrupted low memory at c0001084 (1084 phys) = edee56ec
[   60.387468] Corrupted low memory at c0001088 (1088 phys) = 49b557a7
[   60.387469] Corrupted low memory at c000108c (108c phys) = 01baeb6a
[   60.387471] Corrupted low memory at c0001090 (1090 phys) = b775acde
[   60.387473] Corrupted low memory at c0001094 (1094 phys) = 30dd6851
[   60.387474] Corrupted low memory at c0001098 (1098 phys) = f328fd0f
[   60.387476] Corrupted low memory at c000109c (109c phys) = 17ad185c
[   60.387478] Corrupted low memory at c00010a0 (10a0 phys) = b83985f5
[   60.387479] Corrupted low memory at c00010a4 (10a4 phys) = 775b8af5
[   60.387481] Corrupted low memory at c00010a8 (10a8 phys) = 3d35e4bc
[   60.387483] Corrupted low memory at c00010ac (10ac phys) = bf4d7b90
[   60.387485] Corrupted low memory at c00010b0 (10b0 phys) = 1db6fd99
[   60.387486] Corrupted low memory at c00010b4 (10b4 phys) = 3b94bf2f
[   60.387488] Corrupted low memory at c00010b8 (10b8 phys) = 5f447e55
[   60.387490] Corrupted low memory at c00010bc (10bc phys) = dcfe6395
[   60.387491] Corrupted low memory at c00010c0 (10c0 phys) = fc0b7a23
[   60.387493] Corrupted low memory at c00010c4 (10c4 phys) = 32fa23aa
[   60.387495] Corrupted low memory at c00010c8 (10c8 phys) = e88ef3f8
[   60.387496] Corrupted low memory at c00010cc (10cc phys) = 1ed7e14b
[   60.387498] Corrupted low memory at c00010d0 (10d0 phys) = 9fc3d7d1
[   60.387500] Corrupted low memory at c00010d4 (10d4 phys) = 015f447f
[   60.387501] Corrupted low memory at c00010d8 (10d8 phys) = 7d11c17f
[   60.387503] Corrupted low memory at c00010dc (10dc phys) = 4785fc2d
[   60.387505] Corrupted low memory at c00010e0 (10e0 phys) = 5fe16bf4
[   60.387507] Corrupted low memory at c00010e4 (10e4 phys) = 4de3fcc5
[   60.387508] Corrupted low memory at c00010e8 (10e8 phys) = 4f477297
[   60.387510] Corrupted low memory at c00010ec (10ec phys) = 59a47d35
[   60.387512] Corrupted low memory at c00010f0 (10f0 phys) = c97c78df
[   60.387513] Corrupted low memory at c00010f4 (10f4 phys) = e3aafa4b
[   60.387515] Corrupted low memory at c00010f8 (10f8 phys) = 658bd8cb
[   60.387517] Corrupted low memory at c00010fc (10fc phys) = 6f5eb91f
[   60.387518] Corrupted low memory at c0001100 (1100 phys) = ca66ce3a
[   

[PATCH] ARC: Fix linking errors with CONFIG_MODULE + CONFIG_CC_OPTIMIZE_FOR_SIZE

2015-12-20 Thread Vineet Gupta
At -Os, ARC gcc generates millicode thunks for function prologue/epilogue,
which are provided by libgcc.

Modules historically are NOT linked with libgcc to avoid code bloat, reduce
runtime relocation fixups etc. I even once tried doing that but got lost
in makefile intricacies.

However, CONFIG_MODULE + CONFIG_CC_OPTIMIZE_FOR_SIZE causes link errors:

| MODPOST 5 modules
| ERROR: "__ld_r13_to_r18" [crypto/sha256_generic.ko] undefined!
| ERROR: "__ld_r13_to_r18_ret" [crypto/sha256_generic.ko] undefined!
| ERROR: "__st_r13_to_r18" [crypto/sha256_generic.ko] undefined!
| ERROR: "__ld_r13_to_r17_ret" [crypto/sha256_generic.ko] undefined!
| ERROR: "__st_r13_to_r17" [crypto/sha256_generic.ko] undefined!
| ERROR: "__ld_r13_to_r16_ret" [crypto/sha256_generic.ko] undefined!
| ERROR: "__st_r13_to_r16" [crypto/sha256_generic.ko] undefined!
|
|

Work around that by inhibiting millicode thunks for loadable modules.

Fixes STAR 9000641864:
("Linux built with optimizations for size emits errors for modules")

Reported-by: Anton Kolesov 
Cc: Michal Marek 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta 
---
 arch/arc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index cf0cf34eeb24..aeb19021099e 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -81,7 +81,7 @@ endif
 LIBGCC := $(shell $(CC) $(cflags-y) --print-libgcc-file-name)
 
 # Modules with short calls might break for calls into builtin-kernel
-KBUILD_CFLAGS_MODULE   += -mlong-calls
+KBUILD_CFLAGS_MODULE   += -mlong-calls -mno-millicode
 
 # Finally dump eveything into kernel build system
 KBUILD_CFLAGS  += $(cflags-y)
-- 
1.9.1



[PATCH RFC] smp_store_mb should use smp_mb

2015-12-20 Thread Michael S. Tsirkin
On some architectures smp_store_mb() calls mb() which is stronger
than implied by both the name and the documentation.

smp_store_mb is only used by core kernel code at the moment, so
we know no one mis-uses it for an MMIO barrier.
Make it call smp_mb consistently before some arch-specific
code uses it as such by mistake.

Signed-off-by: Michael S. Tsirkin 

---

Note: I'm guessing an ack from arch maintainers will be needed, but
I'm working on a bigger cleanup moving a bunch of duplicated code
into asm-generic/barrier.h which depends on this, so not Cc'ing
widely yet.

Please Ack if appropriate but do not merge yet.

diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index df896a1..425552b 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -77,7 +77,7 @@ do {  \
___p1;  \
 })
 
-#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 /*
  * The group barrier in front of the rsm & ssm are necessary to ensure
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 0eca6ef..4f0169e 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,7 +34,7 @@
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
 
-#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 #ifdef __SUBARCH_HAS_LWSYNC
 #define SMPWMB  LWSYNC
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index d68e11e..6c1d8b5 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -36,7 +36,7 @@
 #define smp_mb__before_atomic()smp_mb()
 #define smp_mb__after_atomic() smp_mb()
 
-#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)   do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 #define smp_store_release(p, v)  \
 do {   \
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index b42afad..0f45f93 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -93,7 +93,7 @@
 #endif /* CONFIG_SMP */
 
 #ifndef smp_store_mb
-#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 #endif
 
 #ifndef smp_mb__before_atomic
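
What the patched macro is meant to provide can be approximated in user space with C11 atomics. This is a rough analogy only: the kernel's WRITE_ONCE() and smp_mb() are not C11 atomics, the function name `store_then_full_fence()` is hypothetical, and the sketch assumes only CPU-to-CPU ordering is wanted, not ordering against device (MMIO) accesses.

```c
#include <stdatomic.h>

/*
 * User-space analogy of "store, then full SMP barrier": a relaxed store
 * followed by a sequentially consistent fence. Later loads and stores by
 * this thread are ordered after the store as seen by other threads.
 */
static inline void store_then_full_fence(atomic_int *var, int value)
{
	atomic_store_explicit(var, value, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}
```

The distinction the patch relies on is that an SMP barrier only has to order accesses between CPUs, while mb() must also order accesses to devices, so the weaker barrier is enough for a helper whose name and documentation promise only the former.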


Re: [RFC] theoretical race between memory hotplug and pfn iterator

2015-12-20 Thread Joonsoo Kim
On Mon, Dec 21, 2015 at 03:00:08PM +0800, Zhu Guihua wrote:
> 
> On 12/21/2015 11:15 AM, Joonsoo Kim wrote:
> >Hello, memory-hotplug folks.
> >
> >I found theoretical problems between memory hotplug and pfn iterator.
> >For example, pfn iterator works something like below.
> >
> >for (pfn = zone_start_pfn; pfn < zone_end_pfn; pfn++) {
> > if (!pfn_valid(pfn))
> > continue;
> >
> > page = pfn_to_page(pfn);
> > /* Do whatever we want */
> >}
> >
> >Sequence of hotplug is something like below.
> >
> >1) add memmap (after then, pfn_valid will return valid)
> >2) memmap_init_zone()
> >
> >So, if pfn iterator runs between 1) and 2), it could access
> >uninitialized page information.
> >
> >This problem could be solved by re-ordering initialization steps.
> >
> >Hot-remove also has a problem. If memory is hot-removed after
> >pfn_valid() succeeds in a pfn iterator, access to the page would cause
> >a NULL dereference because hot-remove frees the corresponding memmap.
> >There is no guard against the free in any pfn iterator.
> >
> >This problem can be solved by inserting get_online_mems() in all pfn
> >iterators, but this looks error-prone for future usage. Another idea is
> >to delay freeing the corresponding memmap until a synchronization point
> >such as system suspend. That would guarantee that there is no running
> >pfn iterator. Does anyone have a better idea?
> >
> >Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
> >followed sequences in doc/memory-hotplug. Do you have any comment on this?
> 
> I tried memory hot remove with qemu 2.5.5 and RHEL 7, it works well.
> Maybe you can provide more details, such as guest version, err log.

I'm testing with qemu 2.5.5 and linux-next-20151209 with the
following two patches reverted.

"mm/memblock.c: use memblock_insert_region() for the empty array"
"mm-memblock-use-memblock_insert_region-for-the-empty-array-checkpatch-fixes"

When I type "device_del dimm1" in the qemu monitor, there is no error log
in the kernel and it looks like the command has no effect. I added a log
message to acpi_memory_device_remove() but it is never printed either. Is
there another way to check that the device_del event is actually
transmitted to the kernel?

I launch the qemu with following command.
./qemu-system-x86_64-recent -enable-kvm -smp 8 -m 4096,slots=16,maxmem=8G ...

Thanks.


linux-next: manual merge of the akpm tree with the cgroup tree

2015-12-20 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  init/Kconfig

between commit:

  6bf024e69333 ("cgroup: put controller Kconfig options in meaningful order")

from the cgroup tree and commit:

  "mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM"

from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc init/Kconfig
index faa4d087d69e,8185e8de04a1..
--- a/init/Kconfig
+++ b/init/Kconfig
@@@ -1010,43 -1072,39 +1013,48 @@@ config MEMCG_KME
  the kmem extension can use it to guarantee that no group of processes
  will ever exhaust kernel resources alone.
  
+ This option affects the ORIGINAL cgroup interface. The cgroup2 memory
+ controller includes important in-kernel memory consumers per default.
+ 
+ If you're using cgroup2, say N.
+ 
 -config CGROUP_HUGETLB
 -  bool "HugeTLB Resource Controller for Control Groups"
 -  depends on HUGETLB_PAGE
 -  select PAGE_COUNTER
 +config BLK_CGROUP
 +  bool "IO controller"
 +  depends on BLOCK
default n
 -  help
 -Provides a cgroup Resource Controller for HugeTLB pages.
 -When you enable this, you can put a per cgroup limit on HugeTLB usage.
 -The limit is enforced during page fault. Since HugeTLB doesn't
 -support page reclaim, enforcing the limit at page fault time implies
 -that, the application will get SIGBUS signal if it tries to access
 -HugeTLB pages beyond its limit. This requires the application to know
 -beforehand how much HugeTLB pages it would require for its use. The
 -control group is tracked in the third page lru pointer. This means
 -that we cannot use the controller with huge page less than 3 pages.
 +  ---help---
 +  Generic block IO controller cgroup interface. This is the common
 +  cgroup interface which should be used by various IO controlling
 +  policies.
  
 -config CGROUP_PERF
 -  bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
 -  depends on PERF_EVENTS && CGROUPS
 -  help
 -This option extends the per-cpu mode to restrict monitoring to
 -threads which belong to the cgroup specified and run on the
 -designated cpu.
 +  Currently, CFQ IO scheduler uses it to recognize task groups and
 +  control disk bandwidth allocation (proportional time slice allocation)
 +  to such task groups. It is also used by bio throttling logic in
 +  block layer to implement upper limit in IO rates on a device.
  
 -Say N if unsure.
 +  This option only enables generic Block IO controller infrastructure.
 +  One needs to also enable actual IO controlling logic/policy. For
 +  enabling proportional weight division of disk bandwidth in CFQ, set
 +  CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set
 +  CONFIG_BLK_DEV_THROTTLING=y.
 +
 +  See Documentation/cgroups/blkio-controller.txt for more information.
 +
 +config DEBUG_BLK_CGROUP
 +  bool "IO controller debugging"
 +  depends on BLK_CGROUP
 +  default n
 +  ---help---
 +  Enable some debugging help. Currently it exports additional stat
 +  files in a cgroup which can be useful for debugging.
 +
 +config CGROUP_WRITEBACK
 +  bool
 +  depends on MEMCG && BLK_CGROUP
 +  default y
  
  menuconfig CGROUP_SCHED
 -  bool "Group CPU scheduler"
 +  bool "CPU controller"
default n
help
  This feature lets CPU scheduler recognize task groups and control CPU
t


Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

2015-12-20 Thread Michael S. Tsirkin
On Sun, Dec 20, 2015 at 08:59:44PM +0100, Peter Zijlstra wrote:
> On Sun, Dec 20, 2015 at 05:07:19PM +, Andrew Cooper wrote:
> > 
> > Very much +1 for fixing this.
> > 
> > Those names would be fine, but they do add yet another set of options in
> > an already-complicated area.
> > 
> > An alternative might be to have the regular smp_{w,r,}mb() not revert
> > back to nops if CONFIG_PARAVIRT, or perhaps if pvops have detected a
> > non-native environment.  (I don't know how feasible this suggestion is,
> > however.)
> 
> So a regular SMP kernel emits the LOCK prefix and will patch it out with
> a DS prefix (iirc) when it finds but a single CPU. So for those you
> could easily do this.
> 
> However an UP kernel will not emit the LOCK and do no patching.
> 
> So if you're willing to make CONFIG_PARAVIRT depend on CONFIG_SMP or
> similar, this is doable.

One of the uses for virtio is to allow testing an existing kernel on
kvm just by loading a module, and this will break this usecase.

> I don't see people going to allow emitting the LOCK prefix (and growing
> the kernel text size) for UP kernels.

Thinking about this more, maybe __smp_*mb is a better set of names.

The nice thing about it is that we can then have generic code
that does basically

#ifdef CONFIG_SMP
#define smp_mb() __smp_mb()
#else
#define smp_mb() barrier()
#endif 

and reuse this on all architectures.

So instead of a maintenance burden, we are actually
removing code duplication.
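
A minimal user-space sketch of the layering described above, assuming GCC
or Clang. The body chosen for __smp_mb() (__sync_synchronize()) and the
helpers publish(), consume() and barrier_selfcheck() are stand-ins for
illustration only; in the kernel each architecture would supply its own
__smp_mb().

```c
/* User-space model for illustration only; not kernel code. */
#include <assert.h>

/* Compiler-only barrier: forbids compiler reordering, emits no instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the architecture's real SMP barrier (assumption). */
#define __smp_mb() __sync_synchronize()

/* Generic selection, written once for all architectures. */
#ifdef CONFIG_SMP
#define smp_mb() __smp_mb()
#else
#define smp_mb() barrier()
#endif

int shared_data;
int shared_flag;

/* Store the data, then the flag. __smp_mb() stays a real barrier even when
 * CONFIG_SMP is unset, which is what a UP guest talking to an SMP host needs. */
void publish(int value)
{
	shared_data = value;
	__smp_mb();
	shared_flag = 1;
}

/* Return the published value, or -1 if nothing has been published yet. */
int consume(void)
{
	if (!shared_flag)
		return -1;
	__smp_mb();
	return shared_data;
}

/* Single-threaded sanity check of the helpers above. */
void barrier_selfcheck(void)
{
	assert(consume() == -1);
	publish(42);
	smp_mb();
	assert(consume() == 42);
}
```

Building with -DCONFIG_SMP only changes what smp_mb() expands to; the
explicit __smp_mb() calls in publish() and consume() are unaffected.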

-- 
MST


Re: [PATCH v5] arm64: Add support for PTE contiguous bit.

2015-12-20 Thread Steve Capper
On 17 December 2015 at 19:31, David Woods  wrote:
> The arm64 MMU supports a Contiguous bit which is a hint that the TTE
> is one of a set of contiguous entries which can be cached in a single
> TLB entry.  Supporting this bit adds new intermediate huge page sizes.
>
> The set of huge page sizes available depends on the base page size.
> Without using contiguous pages the huge page sizes are as follows.
>
>  4KB:   2MB  1GB
> 64KB: 512MB
>
> With a 4KB granule, the contiguous bit groups together sets of 16 pages
> and with a 64KB granule it groups sets of 32 pages.  This enables two new
> huge page sizes in each case, so that the full set of available sizes
> is as follows.
>
>  4KB:  64KB   2MB  32MB  1GB
> 64KB:   2MB 512MB  16GB
>
> If a 16KB granule is used then the contiguous bit groups 128 pages
> at the PTE level and 32 pages at the PMD level.
>
> If the base page size is set to 64KB then 2MB pages are enabled by
> default.  It is possible in the future to make 2MB the default huge
> page size for both 4KB and 64KB granules.
>
> Signed-off-by: David Woods 
> Reviewed-by: Chris Metcalf 

Thanks for this David, this looks great to me. Please add:
Reviewed-by: Steve Capper 

...and have a great Christmas break.
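
As a rough check of the size tables quoted above, the arithmetic can be
sketched as follows. This is not code from the patch: the struct and
function names are hypothetical, 8-byte table descriptors are assumed, and
the PMD-level group counts for the 4KB and 64KB granules are inferred from
the tables (32MB / 2MB and 16GB / 512MB). The 1GB PUD size for the 4KB
granule is not modelled.

```c
/* Arithmetic sketch only; struct and function names are hypothetical. */
#include <assert.h>
#include <stdint.h>

struct granule {
	uint64_t page_size;     /* base page size in bytes */
	unsigned int cont_ptes; /* pages grouped by the contiguous bit at PTE level */
	unsigned int cont_pmds; /* blocks grouped by the contiguous bit at PMD level */
};

/* Assumes 8-byte descriptors: one table page holds page_size / 8 entries,
 * so one PMD entry maps page_size * (page_size / 8) bytes. */
uint64_t pmd_size(const struct granule *g)
{
	return g->page_size * (g->page_size / 8);
}

/* Huge page size obtained by setting the contiguous bit on PTEs. */
uint64_t cont_pte_size(const struct granule *g)
{
	return g->page_size * g->cont_ptes;
}

/* Huge page size obtained by setting the contiguous bit on PMDs. */
uint64_t cont_pmd_size(const struct granule *g)
{
	return pmd_size(g) * g->cont_pmds;
}

const struct granule granule_4k  = { 4096, 16, 16 };
const struct granule granule_64k = { 65536, 32, 32 };

/* Reproduces the entries of the quoted size tables. */
void hugepage_size_selfcheck(void)
{
	assert(cont_pte_size(&granule_4k) == 64ULL << 10);   /* 64KB  */
	assert(pmd_size(&granule_4k) == 2ULL << 20);         /* 2MB   */
	assert(cont_pmd_size(&granule_4k) == 32ULL << 20);   /* 32MB  */
	assert(cont_pte_size(&granule_64k) == 2ULL << 20);   /* 2MB   */
	assert(pmd_size(&granule_64k) == 512ULL << 20);      /* 512MB */
	assert(cont_pmd_size(&granule_64k) == 16ULL << 30);  /* 16GB  */
}
```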

> ---
>
> Version 5 cleans up issues building with STRICT_MM_TYPECHECKS defined
> pointed out by Steve Capper.
>
>  arch/arm64/Kconfig |   3 -
>  arch/arm64/include/asm/hugetlb.h   |  44 ++
>  arch/arm64/include/asm/pgtable-hwdef.h |  18 ++-
>  arch/arm64/include/asm/pgtable.h   |  10 +-
>  arch/arm64/mm/hugetlbpage.c| 274 
> -
>  include/linux/hugetlb.h|   2 -
>  6 files changed, 313 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4876459..ffa3c54 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -530,9 +530,6 @@ config HW_PERF_EVENTS
>  config SYS_SUPPORTS_HUGETLBFS
> def_bool y
>
> -config ARCH_WANT_GENERAL_HUGETLB
> -   def_bool y
> -
>  config ARCH_WANT_HUGE_PMD_SHARE
> def_bool y if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
>
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index bb4052e..bbc1e35 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -26,36 +26,7 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
> return *ptep;
>  }
>
> -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> -  pte_t *ptep, pte_t pte)
> -{
> -   set_pte_at(mm, addr, ptep, pte);
> -}
> -
> -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> -unsigned long addr, pte_t *ptep)
> -{
> -   ptep_clear_flush(vma, addr, ptep);
> -}
> -
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -  unsigned long addr, pte_t *ptep)
> -{
> -   ptep_set_wrprotect(mm, addr, ptep);
> -}
>
> -static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> -   unsigned long addr, pte_t *ptep)
> -{
> -   return ptep_get_and_clear(mm, addr, ptep);
> -}
> -
> -static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> -unsigned long addr, pte_t *ptep,
> -pte_t pte, int dirty)
> -{
> -   return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> -}
>
>  static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
>   unsigned long addr, unsigned long 
> end,
> @@ -97,4 +68,19 @@ static inline void arch_clear_hugepage_flags(struct page 
> *page)
> clear_bit(PG_dcache_clean, &page->flags);
>  }
>
> +extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
> +   struct page *page, int writable);
> +#define arch_make_huge_pte arch_make_huge_pte
> +extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> +   pte_t *ptep, pte_t pte);
> +extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + pte_t pte, int dirty);
> +extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +unsigned long addr, pte_t *ptep);
> +extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +   unsigned long addr, pte_t *ptep);
> +extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep);
> +
>  #endif /* __ASM_HUGETLB_H */
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
> b/arch/arm64/include/asm/pgtable-hwdef.h
> index d6739e8..5c25b83 100644
> --- 

Re: [RFC] theoretical race between memory hotplug and pfn iterator

2015-12-20 Thread Zhu Guihua


On 12/21/2015 11:15 AM, Joonsoo Kim wrote:

Hello, memory-hotplug folks.

I found theoretical problems between memory hotplug and pfn iterator.
For example, pfn iterator works something like below.

for (pfn = zone_start_pfn; pfn < zone_end_pfn; pfn++) {
 if (!pfn_valid(pfn))
 continue;

 page = pfn_to_page(pfn);
 /* Do whatever we want */
}

Sequence of hotplug is something like below.

1) add memmap (after then, pfn_valid will return valid)
2) memmap_init_zone()

So, if pfn iterator runs between 1) and 2), it could access
uninitialized page information.

This problem could be solved by re-ordering initialization steps.

Hot-remove also has a problem. If memory is hot-removed after
pfn_valid() succeeds in a pfn iterator, access to the page would cause a NULL
dereference because hot-remove frees the corresponding memmap. There is no
guard against the free in any pfn iterator.

This problem can be solved by inserting get_online_mems() in all pfn
iterators but this looks error-prone for future usage. Another idea is
to delay freeing the corresponding memmap until a synchronization point such
as system suspend. That would guarantee that there is no running pfn
iterator. Do any have a better idea?
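
The guarded-iterator idea can be sketched with a small single-threaded
user-space model. Everything here is hypothetical (the model_* names and
struct zone_model are not kernel APIs), and unlike the real
get_online_mems(), which makes hot-remove wait, this model simply refuses
the removal with -EBUSY while an iterator is active.

```c
/* Single-threaded model for illustration; all names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_PFNS 8

struct zone_model {
	int readers;          /* iterators currently inside the guarded section */
	int memmap_present;   /* becomes 0 once the memmap has been "freed" */
	int memmap[NR_PFNS];  /* stand-in for the struct page array */
};

/* Stand-ins for get_online_mems() / put_online_mems(). */
void model_get_online_mems(struct zone_model *z) { z->readers++; }
void model_put_online_mems(struct zone_model *z) { z->readers--; }

int model_pfn_valid(const struct zone_model *z, size_t pfn)
{
	return z->memmap_present && pfn < NR_PFNS;
}

/* Hot-remove: refused while any guarded iterator is running. */
int model_hot_remove(struct zone_model *z)
{
	if (z->readers > 0)
		return -EBUSY;
	z->memmap_present = 0;
	return 0;
}

/* Guarded pfn walk. With remove_midway set, a removal is attempted in the
 * middle of the walk to show that the guard keeps the memmap alive. */
long model_walk(struct zone_model *z, int remove_midway)
{
	long sum = 0;
	size_t pfn;

	model_get_online_mems(z);
	for (pfn = 0; pfn < NR_PFNS; pfn++) {
		if (!model_pfn_valid(z, pfn))
			continue;
		if (remove_midway && pfn == NR_PFNS / 2)
			(void)model_hot_remove(z); /* returns -EBUSY here */
		sum += z->memmap[pfn];
	}
	model_put_online_mems(z);
	return sum;
}
```

The error-prone part mentioned above is visible even in the model: every
iterator has to remember the get/put pair, and nothing enforces it.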

Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
followed sequences in doc/memory-hotplug. Do you have any comment on this?


I tried memory hot remove with qemu 2.5.5 and RHEL 7, it works well.
Maybe you can provide more details, such as guest version, err log.

Thanks,
Zhu



Thanks.


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Tejun Heo
Artem, can you please reproduce the issue with the following patch
applied and attach the kernel log?

Thanks.

---
 drivers/ata/libahci.c |   40 ++--
 drivers/ata/libata-eh.c   |4 
 drivers/ata/libata-scsi.c |1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2278,7 +2278,7 @@ static int ahci_port_start(struct ata_po
struct ahci_host_priv *hpriv = ap->host->private_data;
struct device *dev = ap->host->dev;
struct ahci_port_priv *pp;
-   void *mem;
+   void *mem, *base;
dma_addr_t mem_dma;
size_t dma_sz, rx_fis_sz;
 
@@ -2319,7 +2319,9 @@ static int ahci_port_start(struct ata_po
rx_fis_sz = AHCI_RX_FIS_SZ;
}
 
-   mem = dmam_alloc_coherent(dev, dma_sz, &mem_dma, GFP_KERNEL);
+   base = mem = dmam_alloc_coherent(dev, dma_sz, &mem_dma, GFP_KERNEL);
+   printk("XXX port %d dma_sz=%zu mem=%p mem_dma=%p",
+  ap->port_no, dma_sz, mem, (void *)mem_dma);
if (!mem)
return -ENOMEM;
memset(mem, 0, dma_sz);
@@ -2331,6 +2333,8 @@ static int ahci_port_start(struct ata_po
pp->cmd_slot = mem;
pp->cmd_slot_dma = mem_dma;
 
+   pr_cont(" cmd_slot=%zu", mem - base);
+
mem += AHCI_CMD_SLOT_SZ;
mem_dma += AHCI_CMD_SLOT_SZ;
 
@@ -2340,6 +2344,8 @@ static int ahci_port_start(struct ata_po
pp->rx_fis = mem;
pp->rx_fis_dma = mem_dma;
 
+   pr_cont(" rx_fis=%zu", mem - base);
+
mem += rx_fis_sz;
mem_dma += rx_fis_sz;
 
@@ -2350,6 +2356,8 @@ static int ahci_port_start(struct ata_po
pp->cmd_tbl = mem;
pp->cmd_tbl_dma = mem_dma;
 
+   pr_cont(" cmd_tbl=%zu\n", mem - base);
+
/*
 * Save off initial list of interrupts to be enabled.
 * This could be changed later
@@ -2540,6 +2548,34 @@ int ahci_host_activate(struct ata_host *
 }
 EXPORT_SYMBOL_GPL(ahci_host_activate);
 
+void ahci_dump_dma(struct ata_queued_cmd *qc)
+{
+   struct ata_port *ap = qc->ap;
+   struct ahci_port_priv *pp = ap->private_data;
+   struct ahci_cmd_hdr *cmd = &pp->cmd_slot[qc->tag];
+   void *cmd_tbl = pp->cmd_tbl + qc->tag * AHCI_CMD_TBL_SZ;
+   u32 *fis = cmd_tbl;
+   struct ahci_sg *ahci_sg = cmd_tbl + AHCI_CMD_TBL_HDR_SZ;
+   int prdtl = (cmd->opts & 0x) >> 16;
+   int i;
+
+   printk("XXX cmd=%p cmd_tbl=%p ahci_sg=%p\n", cmd, cmd_tbl, ahci_sg);
+   printk("XXX opts=%x st=%x addr=%x addr_hi=%x rsvd=%x:%x:%x:%x\n",
+  cmd->opts, cmd->status, cmd->tbl_addr, cmd->tbl_addr_hi,
+  cmd->reserved[0], cmd->reserved[1], cmd->reserved[2], 
cmd->reserved[3]);
+   printk("XXX fis=%08x:%08x:%08x:%08x %08x:%08x:%08x:%08x\n",
+  fis[0], fis[1], fis[2], fis[3],
+  fis[4], fis[5], fis[6], fis[7]);
+
+   printk("XXX qc->n_elem=%d fis_len=%d prdtl=%d\n",
+  qc->n_elem, cmd->opts & 0xf, prdtl);
+
+   for (i = 0; i < prdtl; i++)
+   printk("XXX sg[%d] = %x %x %x (%d)\n",
+  i, ahci_sg[i].addr, ahci_sg[i].addr_hi, 
ahci_sg[i].flags_size,
+  (ahci_sg[i].flags_size & 0x7fff) + 1);
+}
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Common AHCI SATA low-level routines");
 MODULE_LICENSE("GPL");
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1059,6 +1059,7 @@ static int ata_do_link_abort(struct ata_
 
if (qc && (!link || qc->dev->link == link)) {
qc->flags |= ATA_QCFLAG_FAILED;
+   qc->err_mask = AC_ERR_DEV;
ata_qc_complete(qc);
nr_aborted++;
}
@@ -2416,6 +2417,8 @@ const char *ata_get_cmd_descript(u8 comm
 }
 EXPORT_SYMBOL_GPL(ata_get_cmd_descript);
 
+void ahci_dump_dma(struct ata_queued_cmd *qc);
+
 /**
  * ata_eh_link_report - report error handling to user
  * @link: ATA link EH is going on
@@ -2590,6 +2593,7 @@ static void ata_eh_link_report(struct at
  res->feature & ATA_IDNF ? "IDNF " : "",
  res->feature & ATA_ABORTED ? "ABRT " : "");
 #endif
+   ahci_dump_dma(qc);
}
 }
 
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4035,6 +4035,7 @@ int ata_scsi_user_scan(struct Scsi_Host
}
 
if (rc == 0) {
+   ata_port_freeze(ap);
ata_port_schedule_eh(ap);
spin_unlock_irqrestore(ap->lock, flags);
ata_port_wait_eh(ap);


Re: [PATCH] pinctrl: mediatek: convert to arch_initcall

2015-12-20 Thread Daniel Kurtz
On Fri, Dec 18, 2015 at 11:06 PM, Yingjoe Chen
 wrote:
> On Fri, 2015-12-18 at 12:21 +0800, Daniel Kurtz wrote:
>> Move pinctrl initialization earlier in boot so that real devices can find
>> their pctldev without probe deferring.
>>
>> Signed-off-by: Daniel Kurtz 
>> ---
>>  drivers/pinctrl/mediatek/pinctrl-mt6397.c | 2 +-
>>  drivers/pinctrl/mediatek/pinctrl-mt8127.c | 2 +-
>>  drivers/pinctrl/mediatek/pinctrl-mt8135.c | 2 +-
>>  drivers/pinctrl/mediatek/pinctrl-mt8173.c | 2 +-
>>  4 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pinctrl/mediatek/pinctrl-mt6397.c 
>> b/drivers/pinctrl/mediatek/pinctrl-mt6397.c
>> index f9751ae..a3780d4 100644
>> --- a/drivers/pinctrl/mediatek/pinctrl-mt6397.c
>> +++ b/drivers/pinctrl/mediatek/pinctrl-mt6397.c
>> @@ -70,7 +70,7 @@ static int __init mtk_pinctrl_init(void)
>>   return platform_driver_register(_pinctrl_driver);
>>  }
>>
>> -module_init(mtk_pinctrl_init);
>> +arch_initcall(mtk_pinctrl_init);
>
>
> MT6397 is a PMIC, which depends on pwrap on the main AP to work. Since
> pmic-wrap itself is a module_platform_driver, I think it makes sense to
> keep this one as module_init. Maybe adding a comment to explain why it
> is different from the others will help.

I interpret this the other way - I think that since the PMIC wrapper
provides a bus required for the system PMIC it should also be a
builtin and use arch_initcall.

WDYT?

-Dan
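
For readers following the ordering argument: built-in initcalls run level
by level, arch_initcall() is level 3, and module_init() in built-in code
maps to device_initcall(), level 6. A minimal user-space sketch of that
ordering, where struct initcall and the helper functions are hypothetical
and only model the sequencing:

```c
/* Ordering model for illustration; not the kernel's initcall machinery. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LEVEL_ARCH = 3, LEVEL_DEVICE = 6 };

struct initcall {
	int level;        /* initcall level, lower runs earlier */
	const char *name; /* label for the registered function */
};

/* Stable insertion sort by level: calls on the same level keep their
 * registration (link) order, lower levels run first. */
void order_initcalls(struct initcall *calls, size_t n)
{
	size_t i, j;

	for (i = 1; i < n; i++) {
		struct initcall key = calls[i];

		j = i;
		while (j > 0 && calls[j - 1].level > key.level) {
			calls[j] = calls[j - 1];
			j--;
		}
		calls[j] = key;
	}
}

/* Position of a named initcall in the array, or -1 if it is absent. */
int initcall_index(const struct initcall *calls, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

With a pinctrl driver registered at LEVEL_ARCH and its consumers at
LEVEL_DEVICE, the pinctrl entry always sorts first, which is the effect the
patch is after. A driver that itself depends on a LEVEL_DEVICE bus gains
little from moving earlier, which is the point under discussion.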


Re: [PATCH] clk: let clk_disable() return immediately if clk is NULL or error

2015-12-20 Thread Wan ZongShun
2015-12-05 14:17 GMT+08:00 Masahiro Yamada :
> The clk_disable() in the common clock framework (drivers/clk/clk.c)
> returns immediately if the given clk is NULL or an error pointer.
> It allows drivers to call clk_disable() (and clk_disable_unprepare())
> with a clock that might be NULL or an error pointer as long as the
> drivers are only used along with the common clock framework.
>
> Unfortunately, NULL/error checking is missing from some of non-common
> clk_disable() implementations.  This prevents us from completely
> dropping NULL/error from callers.  Let's add IS_ERR_OR_NULL(clk)
> checks to all callees.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/arm/mach-ep93xx/clock.c |  2 +-
>  arch/arm/mach-lpc32xx/clock.c|  3 +++
>  arch/arm/mach-mmp/clock.c|  3 +++
>  arch/arm/mach-sa1100/clock.c | 15 ---
>  arch/arm/mach-w90x900/clock.c|  3 +++
>  arch/blackfin/mach-bf609/clock.c |  3 +++
>  arch/m68k/coldfire/clk.c |  4 
>  arch/mips/bcm63xx/clk.c  |  3 +++
>  arch/mips/lantiq/clk.c   |  3 +++
>  drivers/sh/clk/core.c|  2 +-
>  10 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm/mach-ep93xx/clock.c b/arch/arm/mach-ep93xx/clock.c
> index 39ef3b6..4e11f7d 100644
> --- a/arch/arm/mach-ep93xx/clock.c
> +++ b/arch/arm/mach-ep93xx/clock.c
> @@ -293,7 +293,7 @@ void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
>
> -   if (!clk)
> +   if (IS_ERR_OR_NULL(clk))
> return;
>
> spin_lock_irqsave(_lock, flags);
> diff --git a/arch/arm/mach-lpc32xx/clock.c b/arch/arm/mach-lpc32xx/clock.c
> index 661c8f4..07faac2 100644
> --- a/arch/arm/mach-lpc32xx/clock.c
> +++ b/arch/arm/mach-lpc32xx/clock.c
> @@ -1125,6 +1125,9 @@ void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
>
> +   if (IS_ERR_OR_NULL(clk))
> +   return;
> +
> spin_lock_irqsave(_clkregs_lock, flags);
> local_clk_disable(clk);
> spin_unlock_irqrestore(_clkregs_lock, flags);
> diff --git a/arch/arm/mach-mmp/clock.c b/arch/arm/mach-mmp/clock.c
> index 7c6f95f..7b33122 100644
> --- a/arch/arm/mach-mmp/clock.c
> +++ b/arch/arm/mach-mmp/clock.c
> @@ -67,6 +67,9 @@ void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
>
> +   if (IS_ERR_OR_NULL(clk))
> +   return;
> +
> WARN_ON(clk->enabled == 0);
>
> spin_lock_irqsave(_lock, flags);
> diff --git a/arch/arm/mach-sa1100/clock.c b/arch/arm/mach-sa1100/clock.c
> index cbf53bb..ea103fd 100644
> --- a/arch/arm/mach-sa1100/clock.c
> +++ b/arch/arm/mach-sa1100/clock.c
> @@ -85,13 +85,14 @@ void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
>
> -   if (clk) {
> -   WARN_ON(clk->enabled == 0);
> -   spin_lock_irqsave(_lock, flags);
> -   if (--clk->enabled == 0)
> -   clk->ops->disable(clk);
> -   spin_unlock_irqrestore(_lock, flags);
> -   }
> +   if (IS_ERR_OR_NULL(clk))
> +   return;
> +
> +   WARN_ON(clk->enabled == 0);
> +   spin_lock_irqsave(_lock, flags);
> +   if (--clk->enabled == 0)
> +   clk->ops->disable(clk);
> +   spin_unlock_irqrestore(_lock, flags);
>  }
>  EXPORT_SYMBOL(clk_disable);
>
> diff --git a/arch/arm/mach-w90x900/clock.c b/arch/arm/mach-w90x900/clock.c
> index 2c371ff..90ec250 100644
> --- a/arch/arm/mach-w90x900/clock.c
> +++ b/arch/arm/mach-w90x900/clock.c
> @@ -46,6 +46,9 @@ void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
>
> +   if (IS_ERR_OR_NULL(clk))
> +   return;

Looks good for w90x900 platform.
Acked-by: Wan Zongshun 
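
For reference, the guard added to each clk_disable() can be modelled in
user space as below. The ERR_PTR/IS_ERR helpers are simplified
re-implementations of the kernel's err.h idea (error codes encoded in the
top 4095 addresses), and struct clk here is a hypothetical one-field
stand-in, not any platform's real definition.

```c
/* User-space model; simplified stand-ins for the kernel's err.h helpers. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the last MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical minimal clock: just an enable count. */
struct clk {
	int enabled;
};

/* Safe to call with NULL or an error pointer, as the patch intends. */
void clk_disable(struct clk *clk)
{
	if (IS_ERR_OR_NULL(clk))
		return;

	if (clk->enabled > 0)
		clk->enabled--;
}
```

With the check in the callee, a driver can pass along whatever clk_get()
returned without testing it first.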

> +
> WARN_ON(clk->enabled == 0);
>
> spin_lock_irqsave(_lock, flags);
> diff --git a/arch/blackfin/mach-bf609/clock.c 
> b/arch/blackfin/mach-bf609/clock.c
> index 3783058..fed8015 100644
> --- a/arch/blackfin/mach-bf609/clock.c
> +++ b/arch/blackfin/mach-bf609/clock.c
> @@ -97,6 +97,9 @@ EXPORT_SYMBOL(clk_enable);
>
>  void clk_disable(struct clk *clk)
>  {
> +   if (IS_ERR_OR_NULL(clk))
> +   return;
> +
> if (clk->ops && clk->ops->disable)
> clk->ops->disable(clk);
>  }
> diff --git a/arch/m68k/coldfire/clk.c b/arch/m68k/coldfire/clk.c
> index fddfdcc..eb0e8c1 100644
> --- a/arch/m68k/coldfire/clk.c
> +++ b/arch/m68k/coldfire/clk.c
> @@ -101,6 +101,10 @@ EXPORT_SYMBOL(clk_enable);
>  void clk_disable(struct clk *clk)
>  {
> unsigned long flags;
> +
> +   if (IS_ERR_OR_NULL(clk))
> +   return;
> +
> spin_lock_irqsave(_lock, flags);
> if ((--clk->enabled == 0) && clk->clk_ops)
> clk->clk_ops->disable(clk);
> diff --git a/arch/mips/bcm63xx/clk.c b/arch/mips/bcm63xx/clk.c
> index 6375652..d6a39bf 100644
> --- a/arch/mips/bcm63xx/clk.c
> +++ b/arch/mips/bcm63xx/clk.c
> @@ -326,6 +326,9 @@ EXPORT_SYMBOL(clk_enable);
>
>  void 

Re: [PATCH v3 5/5] hisilicon/dts: Add hi655x pmic dts node

2015-12-20 Thread chenfeng


On 2015/12/21 11:01, chenfeng wrote:
> Mark,
> 
> On 2015/12/19 1:58, Mark Brown wrote:
>> On Thu, Dec 17, 2015 at 11:27:27AM +0800, chenfeng wrote:
>>
>>>  +- regulator-vset-regs: Voltage set register offset.
>>>  +- regulator-vset-mask: voltage set control mask.
>>>  +- regulator-n-vol: The num of support voltages.
>>>  +- regulator-vset-table: The table of support voltages.
>>
>>>> Why is this in the binding?  This is a binding for a specific device,
>>>> there is no point in putting all these data tables in the DT - it just
>>>> bloats the DT and makes it harder for us to enhance our support for this
>>>> device in the future.
>>
>>> You mentioned this in the previous version, and I have some questions about it.
>>
>>> This regulator-vset-regs etc are vendor specific describe. The hi655x PMIC
>>
>> There's nothing vendor specific about the way this is written...
>>
>>> is a series of chips. They all have this value, but the offset may be
>>> different. And we can generate the dts file from excel which is defined
>>> by SOC.
>>
>>> I think the dts is designed to distinguish different platforms. If we
>>> hard code this in files, it may also be difficult to reuse in the next
>>> chip version.
>>
>> If your tooling can generate DT files it can generate C code just as
>> well and it seems unlikely you're going to be able to build new boards
>> without being able to do firmware updates here.  Especially for the
>> sorts of systems that use DT the set of scenarios where you're able to
>> update the DT but not the kernel seems like it will be extremely
>> limited.  I don't really buy the argument that there's any practical
>> difference in the ability to update the kernel and DT and to the extent
>> there is one it seems better to keep the ABI we have to support smaller
>> by having the DT be minimal.
>>
>> This also allows us to map things more efficiently than we can with just
>> a table of voltages.  For example a good selection of the regulators in
>> your example DT appear to be linear ranges and so should be mapped as
>> such so we can do direct calculations rather than having to iterate
>> through a table to map voltages into selectors.  That gets especially
>> serious for higher resolution regulators like most DCDCs (and modern
>> LDOs for that matter).
>>
> Thanks,
> I see, I will change the table of voltages into driver.
> like this,
> static const unsigned int voltages[] = {
>   150, 180, 240, 250,
>   260, 270, 285, 300,
> };
> 
> And there will be two open-coded functions for is-enabled and disable in the
> regulator driver, since we need to use the status and disable registers on the
> PM chip. Only the enable reg is in the regulator desc.
> 
> Do you agree with this?
> 
While doing this in driver code, I found that it seems all the vendor chips have
a voltage table. So I am wondering whether we can add this to the regulator
framework.

We could add it to the function of_get_regulation_constraints to get the vset table.

I am not sure this is right or not.
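
To make the table-versus-linear-range point from earlier in the thread
concrete, here is a small sketch of both mappings. The function names,
struct linear_range and the microvolt values used with it are hypothetical
and not taken from the hi655x driver or the regulator core; the table
lookup assumes the table is sorted in ascending order.

```c
/* Sketch of two voltage-to-selector mappings; all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Table lookup: index of the first entry >= min_uV, or -1 if none.
 * Walks the table, so the cost grows with the number of voltages. */
int table_map_voltage(const unsigned int *table, size_t n, unsigned int min_uV)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i] >= min_uV)
			return (int)i;
	return -1;
}

/* A linear range: voltage(sel) = min_uV + sel * step_uV, sel < n_voltages. */
struct linear_range {
	unsigned int min_uV;
	unsigned int step_uV;
	unsigned int n_voltages;
};

/* Direct calculation: round the request up to the next step. Constant time,
 * regardless of how many selectors the regulator has. */
int linear_map_voltage(const struct linear_range *r, unsigned int min_uV)
{
	unsigned int sel;

	if (min_uV <= r->min_uV)
		return 0;

	sel = (min_uV - r->min_uV + r->step_uV - 1) / r->step_uV;
	return sel < r->n_voltages ? (int)sel : -1;
}
```

For a regulator whose voltages are evenly spaced, both functions return the
same selector; only irregular tables need the per-entry walk.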
> 
> 



[PATCH] PM / sleep: console flush during suspend if no_console_suspend enabled

2015-12-20 Thread Bibek Basu
On multicore CPUs, debug console logs are sometimes not flushed
if VT consoles are also enabled. The reason is that console_lock
is taken by secondary/nonboot cpus, which are disabled as part
of suspend. This patch flushes the console before disabling
nonboot cpus.

Signed-off-by: Bibek Basu 
---
 kernel/power/suspend.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index f9fe133..42c5912 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -352,6 +352,16 @@ static int suspend_enter(suspend_state_t state, bool 
*wakeup)
goto Platform_wake;
}
 
+   /*
+* Flush console buffer if console_suspend_enabled cleared.
+* This will enable console flush if console_lock is taken
+* by nonboot cpus which will soon be disabled below.
+*/
+   if (!console_suspend_enabled) {
+   console_lock();
+   console_unlock();
+   }
+
error = disable_nonboot_cpus();
if (error || suspend_test(TEST_CPUS))
goto Enable_cpus;
-- 
2.1.4



RE: linux-next: build failure after merge of the pinctrl tree

2015-12-20 Thread Pramod Kumar
Hi Stephen Rothwell,

This is the same error that we discussed in Friday's mail chain. Please see the
attachment to my reply.

Regards,
Pramod

> -Original Message-
> From: Stephen Rothwell [mailto:s...@canb.auug.org.au]
> Sent: 21 December 2015 10:28
> To: Linus Walleij
> Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Pramod Kumar;
> Ray Jui; Scott Branden
> Subject: linux-next: build failure after merge of the pinctrl tree
> 
> Hi Linus,
> 
> After merging the pinctrl tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:50: warning: 'struct cygnus_gpio'
> declared inside parameter list  static void 
> iproc_gpio_unregister_pinconf(struct
> cygnus_gpio *chip)
>   ^
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:50: warning: its scope is only 
> this
> definition or declaration, which is probably not what you want
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function
> 'iproc_gpio_unregister_pinconf':
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:642:25: error: dereferencing pointer 
> to
> incomplete type 'struct cygnus_gpio'
>   pinctrl_unregister(chip->pctl);
>  ^
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function 'iproc_gpio_probe':
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:738:32: warning: passing argument 1 
> of
> 'iproc_gpio_unregister_pinconf' from incompatible pointer type [-
> Wincompatible-pointer-types]
>   iproc_gpio_unregister_pinconf(chip);
> ^
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:13: note: expected 'struct
> cygnus_gpio *' but argument is of type 'struct iproc_gpio *'
>  static void iproc_gpio_unregister_pinconf(struct cygnus_gpio *chip)
>  ^
> 
> Caused by commit
> 
>  afc8c78d179d ("gpio: Rename func/macro/var to IP-block,iproc")
> 
> This does not look like it has even been build tested :-(
> 
> I have used the pinctrl tree from next-20151217 again as the previous
> (fixed) error was hiding this one.
> 
> --
> Cheers,
> Stephen Rothwell                    s...@canb.auug.org.au

--- Begin Message ---
Hi Stephen/Linus,

The patch "[PATCH v2 5/7] gpio: Rename func/macro/var to IP-block,iproc"
(https://lkml.org/lkml/2015/11/18/1004) in the "for-next" and "devel" branches
of the pinctrl tree (http://git.linaro.org/people/linus.walleij/linux-pinctrl.git)
has a small rebase issue.

Original patch has:

-static void cygnus_gpio_unregister_pinconf(struct cygnus_gpio *chip)
+static void iproc_gpio_unregister_pinconf(struct iproc_gpio *chip)


while "for-next" and "devel" branch of pinctrl tree has:

-static void cygnus_gpio_unregister_pinconf(struct cygnus_gpio *chip)
+static void iproc_gpio_unregister_pinconf(struct cygnus_gpio *chip) 


Please suggest how we could fix this issue.

Regards,
Pramod

> -Original Message-
> From: Stephen Rothwell [mailto:s...@canb.auug.org.au]
> Sent: 18 December 2015 09:15
> To: Linus Walleij
> Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Pramod Kumar;
> Ray Jui; Scott Branden
> Subject: linux-next: build failure after merge of the pinctrl tree
> 
> Hi Linus,
> 
> After merging the pinctrl tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function
> 'iproc_gpio_unregister_pinconf':
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:642:25: error: dereferencing pointer 
> to
> incomplete type 'struct cygnus_gpio'
>   pinctrl_unregister(chip->pctl);
>  ^
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function 'iproc_gpio_probe':
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:738:32: warning: passing argument 1 
> of
> 'iproc_gpio_unregister_pinconf' from incompatible pointer type [-
> Wincompatible-pointer-types]
>   iproc_gpio_unregister_pinconf(chip);
> ^
> drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:13: note: expected 'struct
> cygnus_gpio *' but argument is of type 'struct iproc_gpio *'
>  static void iproc_gpio_unregister_pinconf(struct cygnus_gpio *chip)
>  ^
> 
> Caused by commit
> 
>   616043d58a89 ("pinctrl: Rename gpio driver from cygnus to iproc")
> 
> I have used the pinctrl tree from next-20151217 for today.
> 
> --
> Cheers,
> Stephen Rothwell                    s...@canb.auug.org.au

--- End Message ---


Re: [Propose] Isolate core_pattern in mnt namespace.

2015-12-20 Thread Dongsheng Yang

On 12/20/2015 05:47 PM, Eric W. Biederman wrote:

Dongsheng Yang  writes:


On 12/20/2015 10:37 AM, Al Viro wrote:

On Sun, Dec 20, 2015 at 10:14:29AM +0800, Dongsheng Yang wrote:

On 12/17/2015 07:23 PM, Dongsheng Yang wrote:

Hi guys,
  We are working on making core dump behaviour isolated in
containers. But the problem is that /proc/sys/kernel/core_pattern
is a kernel-wide setting that does not belong to a container.

  So we want to add core_pattern into mnt namespace. What
do you think about it?


Hi Eric,
I found your patch about "net: Implement the per network namespace
sysctl infrastructure", I want to do the similar thing
in mnt namespace. Is that suggested way?


Why mnt namespace and not something else?


Hi Al,

Well, because core_pattern indicates the path to store core file.
In different mnt namespace, we would like to change the path with
different value.

In addition, let's consider the other namespaces:
UTS ns: contains information about the kernel and arch, not suitable for core_pattern.
IPC ns: communication information, not suitable for core_pattern.
PID ns: core_pattern is not related to pids.
net ns: obviously not.
user ns: not suitable either.

So I believe it's better to do this in the mnt namespace. Of course,
core_pattern is just one example. Once this infrastructure is finished,
we can implement more sysctls as per-mnt if necessary, I think.

Al, what do you think about this idea?


The hard part is not the sysctl.  The hard part is starting the usermode
helper, in an environment that it can deal with.  The mount namespace
really provides you with no help there.


Do you mean the core dump helper? I don't want to touch it
in my development. I think I can use the non-pipe way to get what I want.
Let me try to explain what I want here.

(1). Introduce a --core-path option in the docker run command to specify the
path on the host where core files from one container are stored.
E.g: docker run --core-path=/core/test --name=test IMAGE

(2). When the container starts, docker attaches a volume to it, similar
to "-v /core/test:/var/lib/docker/coredump". That means the path
/var/lib/docker/coredump in the container is a link to /core/test on the host.

(3). Set /proc/sys/kernel/core_pattern in the container to
"/var/lib/docker/coredump". This should not affect the core_pattern
on the host or in other containers.

Then I think I can collect the core files from each container and save
them in the paths I want.

Thanx
Yang


Eric









[PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous

2015-12-20 Thread Joonsoo Kim
There is a report of a performance drop due to hugepage allocation, in which
half of the cpu time is spent in pageblock_pfn_to_page() during compaction [1].
In that workload, compaction is triggered to make hugepages, but most
pageblocks are unavailable for compaction due to pageblock type and
skip bit, so compaction usually fails. The most costly operation in this case
is finding a valid pageblock while scanning the whole zone range. To check
whether a pageblock is valid to compact, a valid pfn within the pageblock is
required, and we can obtain it by calling pageblock_pfn_to_page(). This function
checks whether the pageblock is in a single zone and returns the struct page
of the start pfn if possible. The problem is that we need to check it every
time before scanning a pageblock, even if we re-visit it, and this turns out to
be very expensive in this workload.

Although we have no way to skip this pageblock check on a system
where holes exist at arbitrary positions, we can use a cached value for
zone continuity and just do pfn_to_page() on a system where no hole
exists. This optimization considerably speeds up the above workload.
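
The caching idea described above can be modelled in userspace C. This is
only a sketch under stated assumptions: struct zone_model, slow_block_ok(),
set_contiguous() and block_ok() are hypothetical names chosen for
illustration, not the functions added by this patch, and the valid[] array
stands in for the kernel's pfn_valid() and page_zone() checks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PFNS            64UL
#define PAGEBLOCK_NR_PAGES  8UL

/* Hypothetical userspace model of a zone; not the kernel's struct zone. */
struct zone_model {
	bool valid[NR_PFNS];       /* valid[pfn]: pfn is backed by this zone */
	bool contiguous;           /* cached result of checking every pageblock */
	unsigned long slow_checks; /* how often the expensive check ran */
};

/* Expensive check: look at the first and last pfn of [start_pfn, end_pfn). */
static bool slow_block_ok(struct zone_model *z, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	assert(start_pfn < end_pfn && end_pfn <= NR_PFNS);
	z->slow_checks++;
	return z->valid[start_pfn] && z->valid[end_pfn - 1];
}

/* Recompute the cached flag, e.g. after init or after a hotplug event. */
static void set_contiguous(struct zone_model *z)
{
	unsigned long pfn;

	z->contiguous = false;
	for (pfn = 0; pfn < NR_PFNS; pfn += PAGEBLOCK_NR_PAGES)
		if (!slow_block_ok(z, pfn, pfn + PAGEBLOCK_NR_PAGES))
			return;
	z->contiguous = true;
}

/* Per-scan check: skip the expensive part when the zone has no holes. */
static bool block_ok(struct zone_model *z, unsigned long start_pfn,
		     unsigned long end_pfn)
{
	if (z->contiguous)
		return true;
	return slow_block_ok(z, start_pfn, end_pfn);
}
```

In this model the expensive check runs once per pageblock when the flag is
computed, and never again while the zone stays hole-free. A zone with a
hole keeps paying for the check on every visit, which matches the
limitation stated above.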

      Before     vs  After
Max:  1096 MB/s  vs  1325 MB/s
Min:   635 MB/s  vs  1015 MB/s
Avg:   899 MB/s  vs  1194 MB/s

Avg is improved by roughly 30% [2].

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23

v2
o checking zone continuity after initialization
o handle memory-hotplug case

Reported and Tested-by: Aaron Lu 
Signed-off-by: Joonsoo Kim 
---
 include/linux/gfp.h|  6 ---
 include/linux/memory_hotplug.h |  3 ++
 include/linux/mmzone.h |  2 +
 mm/compaction.c| 43 -
 mm/internal.h  | 12 ++
 mm/memory_hotplug.c| 10 +
 mm/page_alloc.c| 85 +-
 7 files changed, 111 insertions(+), 50 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 91f74e7..6eb3eca 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct 
per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 void page_alloc_init_late(void);
-#else
-static inline void page_alloc_init_late(void)
-{
-}
-#endif
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2ea574f..18c2676 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -196,6 +196,9 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 68cc063..eb5d88e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -523,6 +523,8 @@ struct zone {
boolcompact_blockskip_flush;
 #endif
 
+   boolcontiguous;
+
ZONE_PADDING(_pad3_)
/* Zone statistics */
atomic_long_t   vm_stat[NR_VM_ZONE_STAT_ITEMS];
diff --git a/mm/compaction.c b/mm/compaction.c
index 56fa321..9c89d46 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }
 
-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration of free compaction scanner. The scanners then need to
- * use only pfn_valid_within() check for arches that allow holes within
- * pageblocks.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zones range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- */
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
-   unsigned long end_pfn, struct zone *zone)
-{
-   struct page *start_page;
-   struct page *end_page;
-
-   /* end_pfn is one past the range we are checking */
-   end_pfn--;
-
-   if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
-   return NULL;
-
-   start_page = pfn_to_page(start_pfn);
-
-   if (page_zone(start_page) != zone)
-   return NULL;

[PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn

2015-12-20 Thread Joonsoo Kim
free_pfn and compact_cached_free_pfn are the pointers that remember
the restart position of the freepage scanner. When they are reset or invalid,
we set them to zone_end_pfn because the freepage scanner works in the reverse
direction. But because the zone range is defined as [zone_start_pfn,
zone_end_pfn), zone_end_pfn is invalid to access. Therefore, we should
not store it in free_pfn and compact_cached_free_pfn. Instead, we need
to store zone_end_pfn - 1 in them. There is one more thing we should
consider. The freepage scanner scans in reverse, one pageblock at a time.
If free_pfn and compact_cached_free_pfn are set to the middle of a pageblock,
the scanner treats that situation as if it had already scanned the front part
of the pageblock, so we lose the opportunity to scan there. To fix this, this
patch does round_down() to guarantee that the reset position is pageblock
aligned.

Note that thanks to the current pageblock_pfn_to_page() implementation,
no actual access to zone_end_pfn has happened so far. But the following
patch will change pageblock_pfn_to_page(), so this patch is needed
from now on.
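
The arithmetic behind the new reset position can be checked with a small
userspace sketch. The helper names round_down_pow2() and
free_scanner_reset_pfn() are hypothetical stand-ins (the kernel uses its own
round_down() macro), and the pageblock size of 512 pages in the note below is
just an assumed example value.

```c
#include <assert.h>

/* Userspace stand-in for round_down(); 'align' must be a power of two. */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Reset position for the free scanner: the start of the last pageblock
 * that lies inside the half-open range [start_pfn, end_pfn).
 */
static unsigned long free_scanner_reset_pfn(unsigned long end_pfn,
					    unsigned long pageblock_nr_pages)
{
	assert(end_pfn > 0);
	return round_down_pow2(end_pfn - 1, pageblock_nr_pages);
}
```

With 512-page pageblocks, a pageblock-aligned end_pfn of 0x10000 gives a
reset position of 0xfe00, whereas the old expression
"end_pfn & ~(pageblock_nr_pages-1)" would give 0x10000, which is outside the
zone. For an unaligned end_pfn such as 0x10100, both expressions give
0x10000.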

Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/compaction.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 585de54..56fa321 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -200,7 +200,8 @@ static void reset_cached_positions(struct zone *zone)
 {
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
-   zone->compact_cached_free_pfn = zone_end_pfn(zone);
+   zone->compact_cached_free_pfn =
+   round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages);
 }
 
 /*
@@ -1371,11 +1372,11 @@ static int compact_zone(struct zone *zone, struct 
compact_control *cc)
 */
cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
cc->free_pfn = zone->compact_cached_free_pfn;
-   if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
-   cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
+   if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+   cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages);
zone->compact_cached_free_pfn = cc->free_pfn;
}
-   if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
+   if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
cc->migrate_pfn = start_pfn;
zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
-- 
1.9.1



[PATCH] kexec: Move some members and definitions within the scope of CONFIG_KEXEC_FILE

2015-12-20 Thread Xunlei Pang
Move the stuff currently only used by the kexec file code within
CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG).

Also move internal "struct kexec_sha_region" and "struct kexec_buf"
into "kexec_internal.h".

Signed-off-by: Xunlei Pang 
---
 arch/x86/kernel/machine_kexec_64.c |  2 ++
 include/linux/kexec.h  | 62 +++---
 kernel/kexec_file.c|  2 ++
 kernel/kexec_internal.h| 21 +
 4 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 819ab3f..ba7fbba 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -385,6 +385,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
return image->fops->cleanup(image->image_loader_data);
 }
 
+#ifdef CONFIG_KEXEC_VERIFY_SIG
 int arch_kexec_kernel_verify_sig(struct kimage *image, void *kernel,
 unsigned long kernel_len)
 {
@@ -395,6 +396,7 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void 
*kernel,
 
return image->fops->verify_sig(kernel, kernel_len);
 }
+#endif
 
 /*
  * Apply purgatory relocations.
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 7b68d27..2cc643c 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -109,11 +109,7 @@ struct compat_kexec_segment {
 };
 #endif
 
-struct kexec_sha_region {
-   unsigned long start;
-   unsigned long len;
-};
-
+#ifdef CONFIG_KEXEC_FILE
 struct purgatory_info {
/* Pointer to elf header of read only purgatory */
Elf_Ehdr *ehdr;
@@ -130,6 +126,28 @@ struct purgatory_info {
unsigned long purgatory_load_addr;
 };
 
+typedef int (kexec_probe_t)(const char *kernel_buf, unsigned long kernel_size);
+typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf,
+unsigned long kernel_len, char *initrd,
+unsigned long initrd_len, char *cmdline,
+unsigned long cmdline_len);
+typedef int (kexec_cleanup_t)(void *loader_data);
+
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+typedef int (kexec_verify_sig_t)(const char *kernel_buf,
+unsigned long kernel_len);
+#endif
+
+struct kexec_file_ops {
+   kexec_probe_t *probe;
+   kexec_load_t *load;
+   kexec_cleanup_t *cleanup;
+#ifdef CONFIG_KEXEC_VERIFY_SIG
+   kexec_verify_sig_t *verify_sig;
+#endif
+};
+#endif
+
 struct kimage {
kimage_entry_t head;
kimage_entry_t *entry;
@@ -161,6 +179,7 @@ struct kimage {
struct kimage_arch arch;
 #endif
 
+#ifdef CONFIG_KEXEC_FILE
/* Additional fields for file based kexec syscall */
void *kernel_buf;
unsigned long kernel_buf_len;
@@ -179,38 +198,7 @@ struct kimage {
 
/* Information for loading purgatory */
struct purgatory_info purgatory_info;
-};
-
-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-   struct kimage *image;
-   char *buffer;
-   unsigned long bufsz;
-   unsigned long mem;
-   unsigned long memsz;
-   unsigned long buf_align;
-   unsigned long buf_min;
-   unsigned long buf_max;
-   bool top_down;  /* allocate from top of memory hole */
-};
-
-typedef int (kexec_probe_t)(const char *kernel_buf, unsigned long kernel_size);
-typedef void *(kexec_load_t)(struct kimage *image, char *kernel_buf,
-unsigned long kernel_len, char *initrd,
-unsigned long initrd_len, char *cmdline,
-unsigned long cmdline_len);
-typedef int (kexec_cleanup_t)(void *loader_data);
-typedef int (kexec_verify_sig_t)(const char *kernel_buf,
-unsigned long kernel_len);
-
-struct kexec_file_ops {
-   kexec_probe_t *probe;
-   kexec_load_t *load;
-   kexec_cleanup_t *cleanup;
-   kexec_verify_sig_t *verify_sig;
+#endif
 };
 
 /* kexec interface functions */
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b70ada0..007b791 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -109,11 +109,13 @@ int __weak arch_kimage_file_post_load_cleanup(struct 
kimage *image)
return -EINVAL;
 }
 
+#ifdef CONFIG_KEXEC_VERIFY_SIG
 int __weak arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
unsigned long buf_len)
 {
return -EKEYREJECTED;
 }
+#endif
 
 /* Apply relocations of type RELA */
 int __weak
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index e4392a6..0a52315 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -15,6 +15,27 @@ int kimage_is_destination_range(struct kimage *image,
 extern struct mutex kexec_mutex;
 
 #ifdef CONFIG_KEXEC_FILE
+struct kexec_sha_region {
+   unsigned long 

WANTED new maintainer for Linux/md (and mdadm)

2015-12-20 Thread NeilBrown

hi,
 I became maintainer for md (Linux Software RAID) in late 2001 and on
 the whole it has been fun and a valuable experience.  But I have been
 losing interest in recent years (https://lwn.net/Articles/511073/) and
 as was mentioned at the kernel summit, I would like to resign.  Some
 years ago I managed to hand over nfsd to the excellent Bruce Fields,
 but I do not seem to have the gift that Linus has of attracting
 maintainers.  While there are a number of people who know quite a bit
 about md and/or have contributed to development, there is no obvious
 candidate for replacement maintainer - no one who has already been
 doing significant parts of the maintainer role.

 So I have decided to fall back on the mechanism by which I ended up
 being maintainer in the first place.  I will create a vacuum and hope
 someone fills it (yes: I was sucked-in).  So as of 1st February
 2016 I will be resigning.

 At the kernel summit in October Linus talked about the value of
 maintainership teams (https://lwn.net/Articles/662979/).  I think it
 would be great if a (small) team formed to continue to oversee md
 rather than just a single individual (or maybe the dm team could extend
 to include md??).  If I had managed to be part of a team rather than
 "going it alone" for so long, I might feel less tired of the whole
 thing now.

 I don't see it as my place to appoint that team or any individuals, or
 even to nominate any candidates.  A very important attribute of a
 maintainer is that they need to care about the code and the subsystem
 and I cannot tell other people to care (or even know if they do).  It
 is really up to individuals to volunteer.  A few people have been
 mentioned to me in earlier less-public conversations.  Any of them may
 well be suitable, but I would rather they named themselves if
 interested.

 So I'm hoping to get one or more volunteers to be maintainer:
   - to gather and manage patches and outstanding issues,
   - to review patches or get them reviewed
   - to follow up bug reports and get them resolved
   - to feed patches upstream, maybe directly to Linus,
 maybe through some other maintainer, depending on what
 relationships already exist or can be formed,
   - to guide the longer term direction (saying "no" is important
 sometimes),
   - to care,
 but also to be aware that maintainership takes real effort and time, as
 does anything that is really worthwhile.

 This all applies to mdadm as well as md (except you would ultimately
 *be* upstream for mdadm, not needing to send it anywhere).  Even if a
 clear team doesn't form it would be great if different people
 maintained mdadm and md.

 One part of the job that I have put a lot of time in to is following
 the linux-r...@vger.kernel.org list and providing support.  This makes
 people feel good about md and so more adventurous in using it.
 Consequently I tend to hear about bugs and usability issues nice and
 early (well before paying customers hit them in most cases) and that is
 a big win.
 In recent times I've been doing less of this and have been absolutely
 thrilled that the gap has been more than filled by other very competent
 community members.  Not developers in particular, but a number of md users
 have been providing excellent support.  I'd particularly like to
 highlight Phil Turmel who is very forthcoming with excellent advice,
 but he is certainly not the only one who deserves a lot of thanks.
 So "Thank you" to everyone who answers questions on linux-raid.

 This would be a good place for any future maintainer to hang out to
 receive wisdom as well as to provide support.

 I will still be around.  I can certainly help out in some sort of
 mentor role, and can probably be convinced to review patches and
 comment on designs.  But I really want to head towards spending less
 time on md (there are so many other interesting things to learn about).

 So: if anyone is interested - please announce yourself, ask questions
 and start doing things.  I have no clear idea about how a transition
 will happen.  That is really up to you (plural). Take the bull by the
 horns and start *being* a maintainer(team).  I won't get in your way
 and I'll help where I can.

Thanks,
NeilBrown

P.S. I'm committed to continue to work with the raid5-journal effort
from Facebook and the raid1-cluster effort from SUSE and the
line-in-the-sand of 1st February won't affect my support for those.




Re: livepatch: reuse module loader code to write relocations

2015-12-20 Thread Jessica Yu

+++ Petr Mladek [17/12/15 16:45 +0100]:

On Wed 2015-12-16 00:40:48, Jessica Yu wrote:

Turns out the string parsing stuff, even with the help of lib/string.c, doesn't
look very pretty. As I'm working on v3, I'm starting to think having
klp_write_object_relocations() loop simply through all the elf sections might
not be a good idea. Let me explain.

I don't like the amount of string manipulation code that would potentially come
with this change. Even with a string as simple as ".klp.rela.objname", we'll
end up with a bunch of kstrdup's/kmalloc's and kfree's (unless we modify and
chop the section name string in place, which I don't think we should do) that
are going to be required at every iteration of the loop, all just to be able to
call strcmp() and see if we're dealing with a klp rela section that belongs to
the object in question. This also leads to more complicated error handling.


I do not think that we need to allocate and free buffers every time
we compare a substring.

One possibility is to find the position of the substring using
strchr(). Then you could compare it using strncmp() and pass there
the pointer where the substring begins.


Hm, yes you're right. Specifically, it looks like strcspn() would also
be useful for this situation (i.e. calculate the length of a substring
that does not contain certain characters); combined with strncmp(),
this should make the string code much simpler, and no more buffer
allocating/freeing. :-)

Jessica
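
The strcspn() plus strncmp() approach discussed above might look roughly
like the following userspace sketch. It is not the livepatch code: the
function name klp_rela_sec_matches() is hypothetical, and treating anything
after a further '.' as an ignorable suffix is an assumption made only for
illustration. The thread itself only mentions names of the form
".klp.rela.objname".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define KLP_RELA_PREFIX ".klp.rela."

/*
 * Return true if secname looks like ".klp.rela.<objname>" (optionally
 * followed by ".<suffix>", an assumption) and <objname> equals objname.
 * No buffer is allocated or modified; only lengths and pointers are used.
 */
static bool klp_rela_sec_matches(const char *secname, const char *objname)
{
	size_t prefix_len = strlen(KLP_RELA_PREFIX);
	const char *sec_objname;
	size_t len;

	assert(secname != NULL && objname != NULL);

	if (strncmp(secname, KLP_RELA_PREFIX, prefix_len) != 0)
		return false;

	/* Object name runs from the end of the prefix to the next '.' or NUL. */
	sec_objname = secname + prefix_len;
	len = strcspn(sec_objname, ".");

	return len == strlen(objname) &&
	       strncmp(sec_objname, objname, len) == 0;
}
```

The length comparison matters: without it, a section for "ext4" would also
match an object named "ext".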


linux-next: build failure after merge of the l2-mtd tree

2015-12-20 Thread Stephen Rothwell
Hi Brian,

After merging the l2-mtd tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

drivers/mtd/nand/ndfc.c: In function 'ndfc_chip_init':
drivers/mtd/nand/ndfc.c:177:2: error: 'ppdata' undeclared (first use in this 
function)
  ppdata.of_node = flash_np;
  ^

Caused by commit

  a61ae81a1907 ("mtd: nand: drop unnecessary partition parser data")

I applied this fix patch for today:

From: Stephen Rothwell 
Date: Mon, 21 Dec 2015 16:44:57 +1100
Subject: [PATCH] mtd: nand: fix for drop unnecessary partition parser data

Signed-off-by: Stephen Rothwell 
---
 drivers/mtd/nand/ndfc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
index 0709ea9dd8ed..7d72f4fe06a1 100644
--- a/drivers/mtd/nand/ndfc.c
+++ b/drivers/mtd/nand/ndfc.c
@@ -174,7 +174,6 @@ static int ndfc_chip_init(struct ndfc_controller *ndfc,
return -ENODEV;
nand_set_flash_node(chip, flash_np);
 
-   ppdata.of_node = flash_np;
mtd->name = kasprintf(GFP_KERNEL, "%s.%s", dev_name(&ndfc->ofdev->dev),
  flash_np->name);
if (!mtd->name) {
-- 
2.6.2

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au


Re: module: preserve Elf information for livepatch modules

2015-12-20 Thread Jessica Yu

+++ Petr Mladek [17/12/15 17:26 +0100]:

On Mon 2015-11-30 23:21:15, Jessica Yu wrote:

For livepatch modules, copy Elf section, symbol, and string information
from the load_info struct in the module loader.

Livepatch uses special relocation sections in order to be able to patch
modules that are not yet loaded, as well as apply patches to the kernel
when the addresses of symbols cannot be determined at compile time (for
example, when kaslr is enabled). Livepatch modules must preserve Elf
information such as section indices in order to apply the remaining
relocation sections at the appropriate time (i.e. when the target module
loads).

Signed-off-by: Jessica Yu 
---
 include/linux/module.h |  9 +
 kernel/module.c| 98 --
 2 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 3a19c79..9b46256 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -425,6 +425,14 @@ struct module {

/* Notes attributes */
struct module_notes_attrs *notes_attrs;
+
+   /* Elf information (optionally saved) */
+   Elf_Ehdr *hdr;
+   Elf_Shdr *sechdrs;
+   char *secstrings;
+   struct {
+   unsigned int sym, str, mod, vers, info, pcpu;
+   } index;
 #endif


I would hide this into a structure. It is 3 pointers and 6 integers
that are mostly unused. I think about using a pointer to struct load_info
here. We could set the unused stuff to zero and NULL.

Any better idea how to share the definition with struct load_info
is welcome.


Sure. If we want to encapsulate all this information, we can perhaps
replace this with a single pointer to a copy of load_info, and maybe
move the struct definition of load_info to module.h.

Jessica


RE: [PATCH v11] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-12-20 Thread Bharat Kumar Gogada
Hi Bjorn, can you comment on this? Marc has also replied to the query on
irq_dispose_mapping().

> Subject: RE: [PATCH v11] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL
> PCIe Host Controller
> 
> > Subject: Re: [PATCH v11] PCI: Xilinx-NWL-PCIe: Added support for
> > Xilinx NWL PCIe Host Controller
> >
> > [+cc Marc for irq_dispose_mapping() question]
> >
> > On Thu, Dec 10, 2015 at 02:10:34PM +, Bharat Kumar Gogada wrote:
> > I'm trying to figure out what the difference is between these two
> > checks and why you have both of them:
> >
> > > + if (bus->number == pcie->root_busno && devfn > 0)
> > > + if (bus->primary == pcie->root_busno && devfn > 0)
> >
> > If I understand correctly, pcie->root_busno is the bus number of the
> > Root Port device (likely 00).  I think the "bus->number ==
> > pcie->root_busno && devfn > 0" check means that the Root Port, e.g.,
> > 00:00.0, is the only device allowed on bus 00.  Often a Root Complex
> > contains several Root Ports and other integrated devices that typically are
> on bus 00.
> > But in your case, I think you're saying there is only the single Root
> > Port and no other devices.
> >
> > I think that first check takes care of everything on bus 00, so I'm
> > trying to figure out what the second check is for.  Assume your Root
> > Port is device
> > 00:00.0 and it is a bridge to [bus 01-ff].  Then we have two pci_bus
> > structs with these values:
> >
> >   bus->number = 00
> >   bus->primary = 00
> >   bus->busn_res = [bus 00-ff]
> >
> >   bus->number = 01
> >   bus->primary = 00
> >   bus->busn_res = [bus 01-ff]
> >
> > Because of the first check, 00:00.0 is the only possible device on bus
> > 00, and because of the second check, 01:00.0 is the only possible device on
> bus 01.
> > Therefore, you don't support a multifunction device connected to the
> > Root Port.  Right?
> >
> We support multifunction devices also, so this check should not be there, will
> remove this check in next patch.
> 
> > > > > + return false;
> > > > > +
> > > > > + return true;
> > > > > +}
> > > > > + * nwl_setup_sspl - Set Slot Power limit
> > > > > + *
> > > > > + * @pcie: PCIe port information  */ static int
> > > > > +nwl_setup_sspl(struct nwl_pcie *pcie)
> > > >
> > > The Set_Slot_Power_Limit Message includes a one DW data payload. The
> > > data payload is copied from the Slot Capabilities register of the
> > > Downstream Port and is written into the Device Capabilities register
> > > of the Upstream Port on the other side of the Link. Bits 9:8 of the
> > > data payload map to the Slot Power Limit Scale field and bits 7:0
> > > map to the Slot Power Limit Value field. Bits 31:10 of the data
> > > payload must be set to all 0's by the Transmitter and ignored by the
> Receiver.
> >
> > > This Message is sent automatically by the Downstream Port (of a Root
> > > Complex or a Switch) when one of the following events occurs:
> > > -> On a Configuration Write to the Slot Capabilities register (see
> > > Section 7.8.9) when the Data Link Layer reports DL_Up status.
> >
> > I interpret this as meaning "the *hardware* automatically sends a
> > Set_Slot_Power_Limit Message."  There's no mention of software doing
> > anything other than the configuration write.
> >
> > If your hardware doesn't do that, I think it's a defect.  It's fine to
> > work around it, but we should have a comment to that effect so people
> > don't copy the code to new drivers that don't need it.
> 
> Our hardware is not capable of doing it, so we are doing it software. Yes I 
> will
> add some comments.
> 
> >
> > It's a little strange that 7.8.9 talks about writing to this register
> > when all of its fields are HwInit and supposedly read-only.  I had
> > assumed devices would use strapping or implementation-specific
> > registers to set the Slot Power values, but maybe some devices use direct
> writes to Slot Capabilities instead.
> >
> > BTW, I noticed a related lspci bug: it didn't decode the Capture Slot
> > Power Limit in Device Capabilities of Endpoints.  I posted a fix for that
> separately.
> >
> > The Slot Power Limit (in Slot Capabilities) indicates how much power
> > the slot can supply to a downstream device.  That's a function of the
> > platform design, so it seems like this is something you want to set
> > via DT or some other mechanism that knows about the platform.
> > Intercepting all config writes and updating it with whatever the
> > caller supplies doesn't sound wise.  The value might be coming from
> > setpci or some other source with no knowledge of the platform.
> 
> Agreed, but this is what can be done, it is difficult to determine who does
> what.
> >
> > > > > + status = nwl_bridge_readl(pcie,
> TX_PCIE_MSG)
> > > > > +   & MSG_DONE_BIT;
> > > > > + if (status) {
> > > > > + status = nwl_bridge_readl(pcie,
> > > > TX_PCIE_MSG)
> > > > > +  

Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Linus Torvalds
On Sun, Dec 20, 2015 at 8:47 PM, Linus Torvalds
 wrote:
>
> That said, we obviously need to figure out this current problem
> regardless first..

... although maybe it *would* be interesting to hear what happens if
you just compile a 64-bit kernel instead?

Do you still see the problem? Because if not, then we should look very
specifically for some 32-bit PAE issue.

For example, maybe we use "unsigned long" somewhere where we should
use "phys_addr_t". On x86-64, they obviously end up being the same. On
normal non-PAE x86-32, they are also the same. But ..
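
The kind of bug hinted at above can be illustrated in userspace with
fixed-width types. This is only a model: the typedefs ulong32 and phys64 are
hypothetical stand-ins for "unsigned long" on 32-bit x86 and for
phys_addr_t with PAE, and roundtrip_through_ulong() is not taken from any
kernel code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ulong32; /* models 'unsigned long' on 32-bit x86 */
typedef uint64_t phys64;  /* models phys_addr_t when PAE is enabled */

/* Buggy pattern: a physical address passes through a 32-bit variable. */
static phys64 roundtrip_through_ulong(phys64 phys)
{
	ulong32 tmp = (ulong32)phys; /* bits above 4GB are silently dropped */

	return tmp;
}
```

An address below 4GB such as 0x23456000 survives the round trip, but
0x123456000 comes back as 0x23456000. On x86-64 and on non-PAE x86-32 both
types have the same width, so this class of bug would stay hidden there.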

 Linus


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Linus Torvalds
On Sun, Dec 20, 2015 at 8:26 PM, Tejun Heo  wrote:
>
> I wonder whether ahci is screwing up command / sg table setup in a way
> that e.g. if there are too many segments the sg table overflows into
> the neighboring one which is now being exposed by upper layer being
> fixed to send down larger commands.  Looking into it.

That would explain the

  Corrupted low memory at c0001000 ...

that Artem also saw.

Anyway, it would be lovely to have some verification in the ATA
routines that the passed-on IO actually honors the limits it set.
Could you add a "WARN_ON_ONCE(check_io_limits())" or similar, and maybe
we could catch whatever causes the overflow red-handed?

On a totally separate issue:

Just looking at some of the merging code, and I have to say that it
strikes me as insane. This in particular:

  #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
(((addr1) | (mask)) == (((addr2) - 1) | (mask)))
  #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
__BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)) +
(b2)->bv_len, queue_segment_boundary((q)))

seems just *stupid*.

Why does it do that "bvec_to_phys((b2)) + (b2)->bv_len -1" on the
second bvec? That's the "physical address of the last byte of the
second bvec".

I understand the "round both addresses up by the mask, and we want to
make sure that they are in the same segment" part.

But since an individual bvec had better be fully inside one segment
(since we split at bvec boundaries anyway), why do all that
crap anyway? The end address doesn't matter, you could just use the
beginning.

So remove the "-1" and remove the "+bv_len".

At which it would become just

  #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
(((addr1) | (mask)) == ((addr2) | (mask)))
  #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
__BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)),
queue_segment_boundary((q)))

which seems simpler and more understandable. "Are the beginning
addresses within the same segment?"

Or are there ever bv_len == 0 things at the boundary that we want to
merge? Because then the "-1+bv_len" case might make sense.
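
To make the comparison concrete, here is a userspace sketch of the two
checks. The function names are hypothetical, plain integers replace
bvec_to_phys() and queue_segment_boundary(), and the 64K boundary mask in
the note below is an assumed example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a and b fall in the same segment; mask is segment size minus one. */
static bool same_segment(uint64_t a, uint64_t b, uint64_t mask)
{
	return (a | mask) == (b | mask);
}

/* Existing form: start of the first bvec vs last byte of the second bvec. */
static bool seg_boundary_orig(uint64_t phys1, uint64_t phys2, uint64_t len2,
			      uint64_t mask)
{
	return same_segment(phys1, phys2 + len2 - 1, mask);
}

/* Proposed form: compare the two start addresses only. */
static bool seg_boundary_simple(uint64_t phys1, uint64_t phys2, uint64_t mask)
{
	return same_segment(phys1, phys2, mask);
}
```

With a mask of 0xffff, both forms agree when the second bvec lies fully
inside the same 64K segment as the first. They differ in two cases. A
second bvec that starts at 0x1ff00 with length 0x200 crosses the boundary:
the existing form rejects it, the simplified form accepts it. A zero-length
second bvec that starts exactly at the boundary 0x20000 is accepted by the
existing form and rejected by the simplified one.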

Anyway, that shouldn't change the end result in any way, so that
doesn't all *matter*, but it worries me when things look more
complicated than I think they should be.

Is there something I'm missing?

   Linus


Re: linux-next: build failure after merge of the gpio tree

2015-12-20 Thread Stephen Rothwell
Hi Linus,

On Wed, 16 Dec 2015 14:37:55 +1100 Stephen Rothwell wrote:
>
> After merging the gpio tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> drivers/pinctrl/bcm/pinctrl-nsp-gpio.c: In function 'nsp_gpio_probe':
> drivers/pinctrl/bcm/pinctrl-nsp-gpio.c:699:4: error: implicit declaration of function 'set_irq_flags' [-Werror=implicit-function-declaration]
> set_irq_flags(irq, IRQF_VALID);
> ^
> drivers/pinctrl/bcm/pinctrl-nsp-gpio.c:699:23: error: 'IRQF_VALID' undeclared (first use in this function)
> set_irq_flags(irq, IRQF_VALID);
>                    ^
> 
> Caused by commit
> 
>   8bfcbbbcabe0 ("pinctrl: nsp: add gpio-a driver support for Broadcom NSP SoC")
> 
> set_irq_flags was removed before v4.3-rc2 ...
> 
> I have used the gpio tree from next-20151215 for today.

I am still getting this error.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


linux-next: build failure after merge of the pinctrl tree

2015-12-20 Thread Stephen Rothwell
Hi Linus,

After merging the pinctrl tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:50: warning: 'struct cygnus_gpio' declared inside parameter list
 static void iproc_gpio_unregister_pinconf(struct cygnus_gpio *chip)
                                                  ^
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:50: warning: its scope is only this definition or declaration, which is probably not what you want
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function 'iproc_gpio_unregister_pinconf':
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:642:25: error: dereferencing pointer to incomplete type 'struct cygnus_gpio'
  pinctrl_unregister(chip->pctl);
                         ^
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c: In function 'iproc_gpio_probe':
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:738:32: warning: passing argument 1 of 'iproc_gpio_unregister_pinconf' from incompatible pointer type [-Wincompatible-pointer-types]
  iproc_gpio_unregister_pinconf(chip);
                                ^
drivers/pinctrl/bcm/pinctrl-iproc-gpio.c:640:13: note: expected 'struct cygnus_gpio *' but argument is of type 'struct iproc_gpio *'
 static void iproc_gpio_unregister_pinconf(struct cygnus_gpio *chip)
             ^

Caused by commit

 afc8c78d179d ("gpio: Rename func/macro/var to IP-block,iproc")

This does not look like it has even been build tested :-(

I have used the pinctrl tree from next-20151217 again as the previous
(fixed) error was hiding this one.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


Re: linux-next: manual merge of the cgroup tree with the tip tree

2015-12-20 Thread Tejun Heo
On Mon, Dec 21, 2015 at 03:41:19PM +1100, Stephen Rothwell wrote:
> Hi Tejun,
> 
> Today's linux-next merge of the cgroup tree got a conflict in:
> 
>   init/Kconfig
> 
> between commits:
> 
>   257372262056 ("x86/intel_rdt: Add support for Cache Allocation detection")
>   5ad9144cdb9a ("x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation")

Ingo, can you please revert the rdt cgroup.  I still don't think this
is the right approach and tglx's new proposal seems a lot better.

Thanks.

-- 
tejun


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Linus Torvalds
On Sun, Dec 20, 2015 at 8:43 PM, Artem S. Tashkinov  wrote:
>
> In the past I happily ran an x86_64 bit kernel together with 32bit userland
> for quite some time but then I hit a wall: VirtualBox expects its kernel
> modules to have the same bitness as the application itself so I had to
> revert back to an i686 PAE setup.

Ugh, ok. That kind of forces your hand, yes.

Although:

> It's probably high time to try qemu however last time I looked at it a few
> years ago it lacked several crucial features I need from a VM.

kvm-qemu really ends up working pretty well.. Give it a try.

That said, we obviously need to figure out this current problem
regardless first..

 Linus


[PATCH v5] arm64: run-time detection for aarch32 support

2015-12-20 Thread Yury Norov
The kernel option COMPAT controls the ability to execute aarch32 binaries.
Some platforms do not support aarch32 mode, and so cannot execute those
binaries. But we cannot just disable COMPAT for them because the same
kernel binary may be used by multiple platforms.

In this patch, system_supports_aarch32_el0() is introduced to detect
aarch32 support at run-time.

v4: use new CPU Feature API.
v5: use cpuid_feature_extract_field.

Signed-off-by: Yury Norov 
---
 arch/arm64/include/asm/cpufeature.h | 7 +++++++
 arch/arm64/include/asm/elf.h        | 6 ++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 8f271b8..bcb0dbe 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -184,6 +184,13 @@ static inline bool system_supports_mixed_endian_el0(void)
    return id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1));
 }
 
+static inline bool system_supports_aarch32_el0(void)
+{
+   u64 pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
+   return cpuid_feature_extract_field(pfr0, ID_AA64PFR0_EL0_SHIFT)
+   != ID_AA64PFR0_EL0_64BIT_ONLY;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index faad6df..99ec9ac 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -21,6 +21,7 @@
 /*
  * ELF register definitions..
  */
+#include 
 #include 
 #include 
 
@@ -173,8 +174,9 @@ typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];
 
 /* AArch32 EABI. */
 #define EF_ARM_EABI_MASK   0xff00
-#define compat_elf_check_arch(x)   (((x)->e_machine == EM_ARM) && \
-((x)->e_flags & EF_ARM_EABI_MASK))
+#define compat_elf_check_arch(x)   (system_supports_aarch32_el0()  \
+&& ((x)->e_machine == EM_ARM)  \
+&& ((x)->e_flags & EF_ARM_EABI_MASK))
 
 #define compat_start_thread        compat_start_thread
 #define COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);
-- 
2.5.0

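For readers who want to see the check in isolation, here is a minimal
user-space sketch of the same idea. It is not the kernel implementation:
extract_field() is a simplified stand-in for cpuid_feature_extract_field(),
and the field position (bits 3:0) and the value 0x1 for "AArch64 only" are
assumptions standing in for ID_AA64PFR0_EL0_SHIFT and
ID_AA64PFR0_EL0_64BIT_ONLY.

```c
#include <stdbool.h>
#include <stdint.h>

#define EL0_SHIFT	0	/* assumed position of the EL0 field */
#define EL0_64BIT_ONLY	0x1	/* assumed encoding: AArch64 only at EL0 */

/* Extract one 4-bit ID field from a feature register value. */
static unsigned int extract_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* True when the EL0 field reports more than "AArch64 only". */
static bool supports_aarch32_el0(uint64_t pfr0)
{
	return extract_field(pfr0, EL0_SHIFT) != EL0_64BIT_ONLY;
}
```

With this in place, compat_elf_check_arch() can reject EM_ARM binaries
on CPUs where the check returns false, even though COMPAT is built in.
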


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 09:32, Linus Torvalds wrote:
> On Sun, Dec 20, 2015 at 5:50 PM, Artem S. Tashkinov wrote:
>>
>> P.S. I know Linus doesn't condone PAE but I still find it more preferable
>> than running a mixed environment with almost zero benefit in regard to
>> performance and quite obvious performance regressions related to an
>> increased number of libraries being loaded (i686 + x86_64) and slightly
>> bloated code which sometimes cannot fit in the CPU cache. Call me old
>> fashioned but I won't upgrade to x86_64 until most of the things that I run
>> locally are available for x86_64 and that won't happen any time soon.
>
> Don't upgrade *user* land. User land doesn't use the braindamage that is PAE.
>
> Just run a 64-bit kernel. Keep all your 32-bit userland apps and libraries.
>
> Trust me, that *will* be faster. PAE works really horribly badly,
> because all your really important data structures like your inodes and
> directory cache will all be in the low 1GB even if you have 16GB of
> RAM.
>
> Of course, I'd also like more people to run things that way just to
> get more coverage of the whole "yes, we do all the compat stuff
> correctly". So I have some other reasons to prefer people running
> 64-bit kernels with 32-bit user land. But PAE really is a disaster.

In the past I happily ran an x86_64 bit kernel together with 32bit
userland for quite some time but then I hit a wall: VirtualBox expects
its kernel modules to have the same bitness as the application itself so
I had to revert back to an i686 PAE setup. It's probably high time to
try qemu; however, last time I looked at it a few years ago it lacked
several crucial features I need from a VM.



linux-next: manual merge of the cgroup tree with the tip tree

2015-12-20 Thread Stephen Rothwell
Hi Tejun,

Today's linux-next merge of the cgroup tree got a conflict in:

  include/linux/cgroup_subsys.h

between commit:

  5ad9144cdb9a ("x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation")

from the tip tree and commit:

  b53202e63089 ("cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends")

from the cgroup tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc include/linux/cgroup_subsys.h
index c559ef5cae23,0df0336acee9..
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@@ -58,15 -52,6 +52,10 @@@ SUBSYS(net_prio
  SUBSYS(hugetlb)
  #endif
  
 +#if IS_ENABLED(CONFIG_INTEL_RDT)
 +SUBSYS(intel_rdt)
 +#endif
 +
- /*
-  * Subsystems that implement the can_fork() family of callbacks.
-  */
- SUBSYS_TAG(CANFORK_START)
- 
  #if IS_ENABLED(CONFIG_CGROUP_PIDS)
  SUBSYS(pids)
  #endif


linux-next: manual merge of the cgroup tree with the tip tree

2015-12-20 Thread Stephen Rothwell
Hi Tejun,

Today's linux-next merge of the cgroup tree got a conflict in:

  init/Kconfig

between commits:

  257372262056 ("x86/intel_rdt: Add support for Cache Allocation detection")
  5ad9144cdb9a ("x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation")

from the tip tree and commit:

  6bf024e69333 ("cgroup: put controller Kconfig options in meaningful order")

from the cgroup tree.

I fixed it up (see below - I wasn't sure where to put the new INTEL_RDT
config option) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc init/Kconfig
index 4139128e3cab,f8754f502c36..
--- a/init/Kconfig
+++ b/init/Kconfig
@@@ -935,77 -940,6 +935,18 @@@ menuconfig CGROUP
  
  if CGROUPS
  
 +config INTEL_RDT
 +  bool "Intel Resource Director Technology support"
 +  depends on X86_64 && CPU_SUP_INTEL
 +  help
 +This option provides support for Cache allocation which is a
 +sub-feature of Intel Resource Director  Technology(RDT).
 +Current implementation supports L3 cache allocation.
 +Using this feature a user can specify the amount of L3 cache space
 +into which an application can fill.
 +
 +Say N if unsure.
 +
- config CGROUP_DEBUG
-   bool "Example debug cgroup subsystem"
-   default n
-   help
- This option enables a simple cgroup subsystem that
- exports useful debugging information about the cgroups
- framework.
- 
- Say N if unsure.
- 
- config CGROUP_FREEZER
-   bool "Freezer cgroup subsystem"
-   help
- Provides a way to freeze and unfreeze all tasks in a
- cgroup.
- 
- config CGROUP_PIDS
-   bool "PIDs cgroup subsystem"
-   help
- Provides enforcement of process number limits in the scope of a
- cgroup. Any attempt to fork more processes than is allowed in the
- cgroup will fail. PIDs are fundamentally a global resource because it
- is fairly trivial to reach PID exhaustion before you reach even a
- conservative kmemcg limit. As a result, it is possible to grind a
- system to halt without being limited by other cgroup policies. The
- PIDs cgroup subsystem is designed to stop this from happening.
- 
- It should be noted that organisational operations (such as attaching
- to a cgroup hierarchy will *not* be blocked by the PIDs subsystem),
- since the PIDs limit only affects a process's ability to fork, not to
- attach to a cgroup.
- 
- config CGROUP_DEVICE
-   bool "Device controller for cgroups"
-   help
- Provides a cgroup implementing whitelists for devices which
- a process in the cgroup can mknod or open.
- 
- config CPUSETS
-   bool "Cpuset support"
-   help
- This option will let you create and manage CPUSETs which
- allow dynamically partitioning a system into sets of CPUs and
- Memory Nodes and assigning tasks to run only within those sets.
- This is primarily useful on large SMP or NUMA systems.
- 
- Say N if unsure.
- 
- config PROC_PID_CPUSET
-   bool "Include legacy /proc/<pid>/cpuset file"
-   depends on CPUSETS
-   default y
- 
- config CGROUP_CPUACCT
-   bool "Simple CPU accounting cgroup subsystem"
-   help
- Provides a simple Resource Controller for monitoring the
- total CPU consumed by the tasks in a cgroup.
- 
  config PAGE_COUNTER
 bool
  


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Linus Torvalds
On Sun, Dec 20, 2015 at 5:50 PM, Artem S. Tashkinov  wrote:
>
> P.S. I know Linus doesn't condone PAE but I still find it more preferable
> than running a mixed environment with almost zero benefit in regard to
> performance and quite obvious performance regressions related to an
> increased number of libraries being loaded (i686 + x86_64) and slightly
> bloated code which sometimes cannot fit in the CPU cache. Call me old
> fashioned but I won't upgrade to x86_64 until most of the things that I run
> locally are available for x86_64 and that won't happen any time soon.

Don't upgrade *user* land. User land doesn't use the braindamage that is PAE.

Just run a 64-bit kernel. Keep all your 32-bit userland apps and libraries.

Trust me, that *will* be faster. PAE works really horribly badly,
because all your really important data structures like your inodes and
directory cache will all be in the low 1GB even if you have 16GB of
RAM.

Of course, I'd also like more people to run things that way just to
get more coverage of the whole "yes, we do all the compat stuff
correctly". So I have some other reasons to prefer people running
64-bit kernels with 32-bit user land. But PAE really is a disaster.

 Linus


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Tejun Heo
Hello, Linus.

On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
...
> (Also Tejun - maybe you can see what's up - maybe that error message
> tells you something)

Hmmm... all it says is that something went wrong on the PCI side.

> I'm not sure what's up with his machine, the disk doesn't seem to be
> anything particularly unusual, it looks like a 1TB Seagate Barracuda:
> 
>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
> 
> which doesn't strike me as odd.
> 
> Looking at the dmesg, it also looks like it's a pretty normal
> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> ID for the AHCI chip seems to be (INTEL, 0x1c02).
> 
> Any ideas? Anybody?

I wonder whether ahci is screwing up command / sg table setup in a way
that e.g. if there are too many segments the sg table overflows into
the neighboring one which is now being exposed by upper layer being
fixed to send down larger commands.  Looking into it.

Thanks.

-- 
tejun


Re: [PATCH] ceph: Avoid to propagate the invalid page point

2015-12-20 Thread Yan, Zheng

> On Dec 19, 2015, at 10:54, Minfei Huang  wrote:
> 
> The variable pagep will still get the invalid page pointer, even though
> ceph fails in function ceph_update_writeable_page.
> 
> To fix this issue, assign the page to pagep only when there is no failure
> in function ceph_update_writeable_page.
> 
> Signed-off-by: Minfei Huang 
> ---
> fs/ceph/addr.c | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index b7d218a..6491079 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1149,7 +1149,6 @@ static int ceph_write_begin(struct file *file, struct 
> address_space *mapping,
>   page = grab_cache_page_write_begin(mapping, index, 0);
>   if (!page)
>   return -ENOMEM;
> - *pagep = page;
> 
>   dout("write_begin file %p inode %p page %p %d~%d\n", file,
>inode, page, (int)pos, (int)len);

applied, thanks

Yan, Zheng

> -- 
> 2.6.3
> 



Re: [PATCH 2/3] thermal: Add Mediatek thermal controller support

2015-12-20 Thread Daniel Kurtz
Hi Sascha,

One nit below that can be fixed up later, or now if you don't plan to spin
this driver to address Eduardo's feedback...

On Mon, Nov 30, 2015 at 7:42 PM, Sascha Hauer  wrote:
> This adds support for the Mediatek thermal controller found on MT8173
> and likely other SoCs.
> The controller is a bit special. It does not have its own ADC, instead
> it controls the on-SoC AUXADC via AHB bus accesses. For this reason
> we need the physical address of the AUXADC. Also it controls a mux
> using AHB bus accesses, so we need the APMIXEDSYS physical address as well.
>
> Signed-off-by: Sascha Hauer 

[snip]

> +static int mtk_thermal_get_calibration_data(struct device *dev, struct mtk_thermal *mt)
> +{
> +   struct nvmem_cell *cell;
> +   u32 *buf;
> +   size_t len;
> +   int i, ret = 0;
> +
> +   /* Start with default values */
> +   mt->adc_ge = 512;
> +   for (i = 0; i < MT8173_NUM_SENSORS; i++)
> +   mt->vts[i] = 260;
> +   mt->degc_cali = 40;
> +   mt->o_slope = 0;
> +
> +   cell = nvmem_cell_get(dev, "calibration-data");
> +   if (IS_ERR(cell)) {
> +   if (PTR_ERR(cell) == -EPROBE_DEFER)

It is useful to know why the thermal driver's probe is being deferred, so
I suggest adding here:
dev_warn(dev, "Waiting for calibration data.\n");

> +   return PTR_ERR(cell);
> +   return 0;
> +   }

Thanks,
-Dan


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 08:21, Ming Lei wrote:
> On Mon, Dec 21, 2015 at 10:25 AM, Artem S. Tashkinov wrote:
>> # cat
>> /sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
>> 32767
>> 32767
>> 168
>> 65536
>
> It looks fine, so maybe it is related to BIOVEC_PHYS_MERGEABLE(),
> BIOVEC_SEG_BOUNDARY() or that sort of thing, because dma_addr_t and
> phys_addr_t become 64-bit with PAE, but 'unsigned long' and 'void *'
> are still 32-bit.
>
> It was confirmed that the issue does not occur if PAE is disabled.
>
> Dumping both the sata/ahci hw sg table and the bio's bvec might be helpful.

Um, sorry, what exact variables/files do you want to see? I'm not an
expert in /sys.

> On Mon, Dec 21, 2015 at 10:32 AM, Kent Overstreet wrote:
>>
>> oy vey. WTF's been happening in blk-merge.c?
>>
>> They're not the same bug. The bug in your thread was introduced by Jens in
>> 5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where he
>> screwed up the bvprv handling - but that patch comes after the patch Artem
>> bisected to.
>>
>> blk_bio_segment_split() looks correct in b54ffb73ca.
>
> Yes, that is why reverting 578270bfb ("block: fix segment split") can make
> the issue disappear, because 5014c311ba "block: fix bogus compiler
> warnings in blk-merge.c" basically disables sg-merge and prevents the
> issue from being triggered.



Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Ming Lei
On Mon, Dec 21, 2015 at 10:25 AM, Artem S. Tashkinov  wrote:
> # cat
> /sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
> 32767
> 32767
> 168
> 65536

It looks fine, so maybe it is related to BIOVEC_PHYS_MERGEABLE(),
BIOVEC_SEG_BOUNDARY() or that sort of thing, because dma_addr_t and
phys_addr_t become 64-bit with PAE, but 'unsigned long' and 'void *'
are still 32-bit.

It was confirmed that the issue does not occur if PAE is disabled.

Dumping both the sata/ahci hw sg table and the bio's bvec might be helpful.

On Mon, Dec 21, 2015 at 10:32 AM, Kent Overstreet wrote:
>
> oy vey. WTF's been happening in blk-merge.c?
>
> They're not the same bug. The bug in your thread was introduced by Jens in
> 5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where he
> screwed up the bvprv handling - but that patch comes after the patch Artem
> bisected to.
>
> blk_bio_segment_split() looks correct in b54ffb73ca.

Yes, that is why reverting 578270bfb ("block: fix segment split") can make
the issue disappear, because 5014c311ba "block: fix bogus compiler
warnings in blk-merge.c" basically disables sg-merge and prevents the
issue from being triggered.



Thanks,
Ming Lei


Re: [PATCH] thermal: cpu_cooling: fix out of bounds access in time_in_idle

2015-12-20 Thread Viresh Kumar
On 19-12-15, 12:54, Javi Merino wrote:
> In __cpufreq_cooling_register() we allocate the arrays for time_in_idle
> and time_in_idle_timestamp to be as big as the number of cpus in this
> cpufreq device.  However, in get_load() we access this array using the
> cpu number as index, which can result in an out of bound access.
> 
> Index time_in_idle{,_timestamp} using the index in the cpufreq_device's
> allowed_cpus mask, as we do for the load_cpu array in
> cpufreq_get_requested_power()
> 
> Reported-by: Nicolas Boichat 
> Cc: Amit Daniel Kachhap 
> Cc: Viresh Kumar 
> Cc: Zhang Rui 
> Cc: Eduardo Valentin 
> Signed-off-by: Javi Merino 
> ---
>  drivers/thermal/cpu_cooling.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)

Acked-by: Viresh Kumar 
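
For reference, the indexing change described in the quoted commit
message can be modelled in plain C roughly as follows. This is only a
sketch: index_in_mask() is a made-up helper, and a single unsigned long
stands in for the allowed_cpus cpumask.

```c
#include <stddef.h>

/*
 * Return the position of `cpu` among the set bits of `mask`, or -1 if
 * the cpu is not part of the mask. An array sized to the number of set
 * bits can be indexed safely with this value, whereas indexing it with
 * the raw cpu number can run past the end.
 */
static int index_in_mask(unsigned long mask, unsigned int cpu)
{
	int idx = 0;
	unsigned int i;

	if (cpu >= 8 * sizeof(mask) || !(mask & (1UL << cpu)))
		return -1;
	for (i = 0; i < cpu; i++)
		if (mask & (1UL << i))
			idx++;
	return idx;
}
```

With a mask covering cpus 4-7, a four-entry array is enough: cpu 4 maps
to slot 0 and cpu 7 to slot 3, while using the cpu number directly would
access slots 4-7.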

-- 
viresh


Re: [PATCH] blackfin-cpufreq: Change return type of cpu_set_cclk() to that of clk_set_rate()

2015-12-20 Thread Viresh Kumar
On 19-12-15, 09:23, SF Markus Elfring wrote:
> >> From: Markus Elfring 
> >> Date: Fri, 18 Dec 2015 19:43:27 +0100
> >>
> >> The return type "unsigned long" was used by the cpu_set_cclk() function
> >> while the type "int" is provided by the clk_set_rate() function.
> >> Let us make this usage consistent.
> >>
> >> This issue was detected by using the Coccinelle software.
> >>
> >> Signed-off-by: Markus Elfring 
> >> ---
> >>  drivers/cpufreq/blackfin-cpufreq.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/cpufreq/blackfin-cpufreq.c 
> >> b/drivers/cpufreq/blackfin-cpufreq.c
> >> index a9f8e5b..2a6f3ac 100644
> >> --- a/drivers/cpufreq/blackfin-cpufreq.c
> >> +++ b/drivers/cpufreq/blackfin-cpufreq.c
> >> @@ -112,7 +112,7 @@ static unsigned int bfin_getfreq_khz(unsigned int cpu)
> >>  }
> >>  
> >>  #ifdef CONFIG_BF60x
> >> -unsigned long cpu_set_cclk(int cpu, unsigned long new)
> >> +int cpu_set_cclk(int cpu, unsigned long new)
> >>  {
> >>struct clk *clk;
> > 
> > Acked-by: Viresh Kumar 
> 
> Thanks for your acceptance.
> 
> I would appreciate it if another implementation detail could also be
> clarified there.
> http://lxr.free-electrons.com/ident?v=4.3;i=cpu_set_cclk
> 
> * Do you want to reuse such a function in other modules?
> * Should it eventually be marked as "static"?

This should be static, yeah.

-- 
viresh


[PATCH] IRQ/Platform-MSI:Increase the maximum MSIs the MSI framework can support.

2015-12-20 Thread MaJun
From: Ma Jun 

The current MSI framework can only support 256 platform MSIs.

But on the Hisilicon platform, some network-related devices have about 500
wired interrupts.

To support these devices, we need a new maximum value larger than 256.

Signed-off-by: Ma Jun 
---
 drivers/base/platform-msi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index a203896..9c00d3f 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-#define DEV_ID_SHIFT   24
+#define DEV_ID_SHIFT   22
 #define MAX_DEV_MSIS   (1 << (32 - DEV_ID_SHIFT))
 
 /*
-- 
1.7.1

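The effect of the new shift value can be checked with a few lines of
stand-alone C. This is only an illustration of the arithmetic behind
MAX_DEV_MSIS; max_dev_msis() is a made-up helper, not part of the
kernel.

```c
/* Same arithmetic as MAX_DEV_MSIS: 1 << (32 - DEV_ID_SHIFT). */
static unsigned int max_dev_msis(unsigned int dev_id_shift)
{
	return 1u << (32 - dev_id_shift);
}
```

max_dev_msis(24) gives 256 and max_dev_msis(22) gives 1024, so the new
value leaves room for a device with about 500 interrupts.
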



[RFC] theoretical race between memory hotplug and pfn iterator

2015-12-20 Thread Joonsoo Kim
Hello, memory-hotplug folks.

I found theoretical problems between memory hotplug and pfn iterators.
For example, a pfn iterator works something like below.

for (pfn = zone_start_pfn; pfn < zone_end_pfn; pfn++) {
        if (!pfn_valid(pfn))
                continue;

        page = pfn_to_page(pfn);
        /* Do whatever we want */
}

The hotplug sequence is something like below.

1) add memmap (after this, pfn_valid() will return true)
2) memmap_init_zone()

So, if a pfn iterator runs between 1) and 2), it could access
uninitialized page information.

This problem could be solved by re-ordering the initialization steps.

Hot-remove also has a problem. If memory is hot-removed after
pfn_valid() succeeds in a pfn iterator, the access to the page would
cause a NULL dereference because hot-remove frees the corresponding
memmap. There is no guard against that free in any pfn iterator.

This problem can be solved by inserting get_online_mems() in all pfn
iterators, but this looks error-prone for future usage. Another idea is
to delay freeing the corresponding memmap until a synchronization point
such as system suspend. That would guarantee that there is no running
pfn iterator. Does anyone have a better idea?
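
To make the first option concrete, here is a small user-space model of a
guarded pfn walk. It is only a sketch: the *_stub functions are
hypothetical stand-ins for get_online_mems(), put_online_mems() and
pfn_valid(), and the counter merely models the guard being held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel helpers; hypothetical, illustration only. */
static int online_mems_readers;
static void get_online_mems_stub(void) { online_mems_readers++; }
static void put_online_mems_stub(void) { online_mems_readers--; }
static bool pfn_valid_stub(unsigned long pfn) { return pfn % 2 == 0; }

/* Count valid pfns in [start, end) while holding the hotplug guard. */
static size_t count_valid_pfns(unsigned long start, unsigned long end)
{
	size_t n = 0;
	unsigned long pfn;

	get_online_mems_stub();	/* keep hot-remove out during the walk */
	for (pfn = start; pfn < end; pfn++) {
		if (!pfn_valid_stub(pfn))
			continue;
		n++;		/* "Do whatever we want" with the page */
	}
	put_online_mems_stub();
	return n;
}
```

Every iterator would need this get/put pair around its loop, which is
why it looks error-prone to me.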

Btw, I tried to memory-hotremove with QEMU 2.5.5 but it didn't work. I
followed sequences in doc/memory-hotplug. Do you have any comment on this?

Thanks.


linux-next: manual merge of the clockevents tree with the tip tree

2015-12-20 Thread Stephen Rothwell
Hi Daniel,

Today's linux-next merge of the clockevents tree got a conflict in:

  drivers/clocksource/h8300_timer16.c

between commit:

  d33f250af4e6 ("clocksource/drivers/h8300: Use ioread / iowrite")

from the tip tree and commit:

  1ddca16cc5b3 ("clocksource/drivers/h8300: Use ioread / iowrite")

from the clockevents tree.

The former is just a fixed version of the latter, so I used that.

It looks like the clockevents tree needs to be cleaned up.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


Re: [PATCH v3 5/5] hisilicon/dts: Add hi655x pmic dts node

2015-12-20 Thread chenfeng
Mark,

On 2015/12/19 1:58, Mark Brown wrote:
> On Thu, Dec 17, 2015 at 11:27:27AM +0800, chenfeng wrote:
> 
>>  +- regulator-vset-regs: Voltage set register offset.
>>  +- regulator-vset-mask: voltage set control mask.
>>  +- regulator-n-vol: The num of support voltages.
>>  +- regulator-vset-table: The table of support voltages.
> 
>>> Why is this in the binding?  This is a binding for a specific device,
>>> there is no point in putting all these data tables in the DT - it just
>>> bloats the DT and makes it harder for us to enhance our support for this
>>> device in the future.
> 
>> You mentioned this in the previous version, and I have some questions about it.
> 
>> This regulator-vset-regs etc are vendor specific describe. The hi655x PMIC
> 
> There's nothing vendor specific about the way this is written...
> 
>> is a series of chips. They all have this value, but the offset may be 
>> different.
>> And we can generate the dts file from excel which is defined by SOC.
> 
>> I think the dts is designed to distinguish different platform. If we hard 
>> code this
>> in files, it may be also different to use as common in next chip version.
> 
> If your tooling can generate DT files it can generate C code just as
> well and it seems unlikely you're going to be able to build new boards
> without being able to do firmware updates here.  Especially for the
> sorts of systems that use DT the set of scenarios where you're able to
> update the DT but not the kernel seems like it will be extremely
> limited.  I don't really buy the argument that there's any practical
> difference in the ability to update the kernel and DT and to the extent
> there is one it seems better to keep the ABI we have to support smaller
> by having the DT be minimal.
> 
> This also allows us to map things more efficiently than we can with just
> a table of voltages.  For example a good selection of the regulators in
> your example DT appear to be linear ranges and so should be mapped as
> such so we can do direct calculations rather than having to iterate
> through a table to map voltages into selectors.  That gets especially
> serious for higher resolution regulators like most DCDCs (and modern
> LDOs for that matter).
> 
Thanks,
I see. I will move the table of voltages into the driver,
like this:
static const unsigned int voltages[] = {
150, 180, 240, 250,
260, 270, 285, 300,
};

And there will be two open-coded functions for is_enabled and disable in the
regulator driver, since we need to use the status and disable registers on the
PM chip. Only the enable register will be in the regulator desc.

Do you agree with this?
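
For illustration, the difference between iterating a voltage table and mapping
a linear range directly can be sketched in plain C as below. This is a minimal
user-space sketch, not kernel code: struct demo_linear_range and the demo_*
helpers are hypothetical names, and the 250/10/3 range is only an assumption
read off the middle of the table above (250, 260, 270), in whatever units that
table uses.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the table in the message above; units are not stated there. */
#define DEMO_TABLE_LEN 8
static const unsigned int demo_table[DEMO_TABLE_LEN] = {
	150, 180, 240, 250,
	260, 270, 285, 300,
};

/* Hypothetical linear range: value = min + selector * step. */
struct demo_linear_range {
	unsigned int min;
	unsigned int step;
	unsigned int n_selectors;
};

/* Assumed range covering the 250, 260, 270 entries of the table. */
static const struct demo_linear_range demo_range = { 250, 10, 3 };

/* Table mapping: walk the (ascending) table until a value >= min_val is found. */
static int demo_table_map(const unsigned int *table, size_t n, unsigned int min_val)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] >= min_val)
			return (int)i;
	return -1;	/* nothing in the table is high enough */
}

/* Linear mapping: compute the selector directly, rounding up to the next step. */
static int demo_linear_map(const struct demo_linear_range *r, unsigned int min_val)
{
	unsigned int sel;

	assert(r != NULL && r->step > 0);
	if (min_val <= r->min)
		return 0;
	sel = (min_val - r->min + r->step - 1) / r->step;
	return sel < r->n_selectors ? (int)sel : -1;
}
```

With the table above, a request for 255 walks the table to index 4 (260),
while the assumed linear range yields selector 1 directly from
(255 - 250 + 9) / 10.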




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] extcon: max3355: kill unneeded #include's

2015-12-20 Thread Chanwoo Choi
On 2015-12-21 03:31, Sergei Shtylyov wrote:
> Some #include's weren't needed from the start, some are left overs from the
> earlier driver versions... Kill 'em all! :-)

I would rather not have the following expression in the description. Please
keep the description in a formal style next time.
- "... Kill 'em all! :-) "

> 
> Signed-off-by: Sergei Shtylyov 
> 
> ---
> The patch is against the 'extcon-next' branch of the 'extcon.git' repo.
> If possible, please fold this patch into the main MAX3355 driver patch.
> 
>  drivers/extcon/extcon-max3355.c |5 -
>  1 file changed, 5 deletions(-)
> 
> Index: extcon/drivers/extcon/extcon-max3355.c
> ===
> --- extcon.orig/drivers/extcon/extcon-max3355.c
> +++ extcon/drivers/extcon/extcon-max3355.c
> @@ -11,14 +11,9 @@
>  
>  #include 
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
> -#include 
> -#include 
>  
>  struct max3355_data {
>   struct extcon_dev *edev;
> 

I'll fold it into the original patch.

Thanks,
Chanwoo Choi





Re: [PATCH v5] extcon: add Maxim MAX3355 driver

2015-12-20 Thread Chanwoo Choi
Hi,

On 2015-12-21 02:15, Sergei Shtylyov wrote:
> Hello.
> 
> On 12/20/2015 05:31 PM, Chanwoo Choi wrote:
> 
>> This patch depend on GPIOLIB configuration as following:
>> I modified it with following diff and applied it.
>>
>> diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
>> index ba4db7d..3d89e60 100644
>> --- a/drivers/extcon/Kconfig
>> +++ b/drivers/extcon/Kconfig
>> @@ -54,6 +54,7 @@ config EXTCON_MAX14577
>>
>>   config EXTCON_MAX3355
>>  tristate "Maxim MAX3355 USB OTG EXTCON Support"
>> +   depends on GPIOLIB || COMPILE_TEST
> 
>If it won't compile w/o gpiolib, what's the use of COMPILE_TEST?
>And no, it shouldn't depend on gpiolib. It has empty stubs for the case of 
> CONFIG_GPIOLIB=n. Obviously something is wrong with the GPIO headers, I'll 
> look into it.

Yes. When GPIOLIB is disabled, the build issue doesn't happen,
because include/linux/gpio/consumer.h implements dummy functions
for all gpio functions if CONFIG_GPIOLIB is disabled.

For correct operation of max3355, you should add the dependency
to the extcon-max3355.c driver. This driver certainly uses the GPIO
library.

COMPILE_TEST is used only for build testing. You can see the details in [1].
[1] https://lkml.org/lkml/2013/5/22/155
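
To illustrate the point about dummy functions, here is a minimal user-space
sketch of the "stub when the option is off" header pattern. All names
(DEMO_HAVE_GPIO, demo_gpio_get, demo_probe, the DEMO_E* codes) are
hypothetical and only model the idea; they are not the real gpio consumer API
and say nothing about what the real stubs return.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CONFIG_GPIOLIB. Leave it undefined to model GPIOLIB=n. */
/* #define DEMO_HAVE_GPIO 1 */

#define DEMO_ENODEV (-19)	/* hypothetical error codes for this sketch */
#define DEMO_EIO    (-5)

struct demo_gpio;		/* opaque handle, never dereferenced here */

#ifdef DEMO_HAVE_GPIO
/* A real implementation would be linked in from elsewhere. */
struct demo_gpio *demo_gpio_get(const char *name);
int demo_gpio_get_value(const struct demo_gpio *gpio);
#else
/* Dummy versions: they keep callers compiling but give no hardware access. */
static inline struct demo_gpio *demo_gpio_get(const char *name)
{
	assert(name != NULL);
	return NULL;
}

static inline int demo_gpio_get_value(const struct demo_gpio *gpio)
{
	(void)gpio;
	return DEMO_EIO;
}
#endif

/* "Driver" probe: builds in both configurations, only works with real GPIO. */
static int demo_probe(void)
{
	struct demo_gpio *id = demo_gpio_get("id");

	if (id == NULL)
		return DEMO_ENODEV;
	return demo_gpio_get_value(id) < 0 ? DEMO_EIO : 0;
}
```

With the option left undefined the code compiles cleanly, yet demo_probe()
can only fail, which is the gap between "builds" and "operates correctly"
being discussed here.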

Thanks,
Chanwoo Choi


Re: [PATCH v2 4/5] pinctrl: mediatek: Add Pinctrl/GPIO/EINT driver for mt2701

2015-12-20 Thread Daniel Kurtz
Hi Biao and Joe,

On Fri, Dec 18, 2015 at 11:13 PM, Yingjoe Chen
 wrote:
> On Fri, 2015-12-11 at 17:07 +0800, Biao Huang wrote:
>> Add mt2701 support using mediatek common pinctrl driver.
>> MT2701 has some special pins that need an extra setting register
>> compared to other ICs, so add this support to the common code.
>>
>> Signed-off-by: Biao Huang 
>> ---
>>  drivers/pinctrl/mediatek/Kconfig  |6 +
>>  drivers/pinctrl/mediatek/Makefile |1 +
>>  drivers/pinctrl/mediatek/pinctrl-mt2701.c |  590 +++
>>  drivers/pinctrl/mediatek/pinctrl-mtk-common.c |   14 +
>>  drivers/pinctrl/mediatek/pinctrl-mtk-common.h |   12 +-
>>  drivers/pinctrl/mediatek/pinctrl-mtk-mt2701.h | 2323 
>> +
>>  6 files changed, 2945 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt2701.c
>>  create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt2701.h
>
> This patch looks good to me.
> Thanks
>
> Acked-by: Yingjoe Chen 

>
> Joe.C
>
>>
>> diff --git a/drivers/pinctrl/mediatek/Kconfig 
>> b/drivers/pinctrl/mediatek/Kconfig
>> index 02f6f92..13e9939 100644
>> --- a/drivers/pinctrl/mediatek/Kconfig
>> +++ b/drivers/pinctrl/mediatek/Kconfig
>> @@ -9,6 +9,12 @@ config PINCTRL_MTK_COMMON
>>   select OF_GPIO
>>
>>  # For ARMv7 SoCs
>> +config PINCTRL_MT2701
>> + bool "Mediatek MT2701 pin control" if COMPILE_TEST && !MACH_MT2701

This is 'bool', so can never be built as a module...

[snip...]

>> +module_init(mtk_pinctrl_init);

Yingjoe - you just ack'ed this change for the other MTK pinctrl.
So, let's apply it here too:

-module_init(mtk_pinctrl_init);
+arch_initcall(mtk_pinctrl_init);

>> +
>> +MODULE_LICENSE("GPL v2");
>> +MODULE_DESCRIPTION("MediaTek MT2701 Pinctrl Driver");
>> +MODULE_AUTHOR("Biao Huang ");

And remove these lines since this isn't a module.

-Dan


Re: [PATCH v7 4/4] gicv2m: acpi: Introducing GICv2m ACPI support

2015-12-20 Thread Rafael J. Wysocki
On Thursday, December 10, 2015 08:55:30 AM Suravee Suthikulpanit wrote:
> This patch introduces gicv2m_acpi_init(), which uses information
> in MADT GIC MSI frames structure to initialize GICv2m driver.
> It also exposes gicv2m_init() function, which simplifies callers
> to a single GICv2m init function.
> 
> Reviewed-by: Marc Zyngier 
> Tested-by: Duc Dang 
> Signed-off-by: Suravee Suthikulpanit 
> Signed-off-by: Hanjun Guo 

I don't see anything objectionable here, so please feel free to add my ACK
for the ACPI part to this.

> ---
>  drivers/irqchip/irq-gic-v2m.c   | 110 
> +++-
>  drivers/irqchip/irq-gic.c   |   6 ++-
>  include/linux/irqchip/arm-gic.h |   3 +-
>  3 files changed, 116 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
> index 779c390..7e2975d 100644
> --- a/drivers/irqchip/irq-gic-v2m.c
> +++ b/drivers/irqchip/irq-gic-v2m.c
> @@ -15,9 +15,11 @@
>  
>  #define pr_fmt(fmt) "GICv2m: " fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -138,6 +140,11 @@ static int gicv2m_irq_gic_domain_alloc(struct irq_domain 
> *domain,
>   fwspec.param[0] = 0;
>   fwspec.param[1] = hwirq - 32;
>   fwspec.param[2] = IRQ_TYPE_EDGE_RISING;
> + } else if (is_fwnode_irqchip(domain->parent->fwnode)) {
> + fwspec.fwnode = domain->parent->fwnode;
> + fwspec.param_count = 2;
> + fwspec.param[0] = hwirq;
> + fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
>   } else {
>   return -EINVAL;
>   }
> @@ -255,6 +262,8 @@ static void gicv2m_teardown(void)
>   kfree(v2m->bm);
>   iounmap(v2m->base);
>   of_node_put(to_of_node(v2m->fwnode));
> + if (is_fwnode_irqchip(v2m->fwnode))
> + irq_domain_free_fwnode(v2m->fwnode);
>   kfree(v2m);
>   }
>  }
> @@ -373,9 +382,11 @@ static struct of_device_id gicv2m_device_id[] = {
>   {},
>  };
>  
> -int __init gicv2m_of_init(struct device_node *node, struct irq_domain 
> *parent)
> +static int __init gicv2m_of_init(struct fwnode_handle *parent_handle,
> +  struct irq_domain *parent)
>  {
>   int ret = 0;
> + struct device_node *node = to_of_node(parent_handle);
>   struct device_node *child;
>  
>   for (child = of_find_matching_node(node, gicv2m_device_id); child;
> @@ -411,3 +422,100 @@ int __init gicv2m_of_init(struct device_node *node, 
> struct irq_domain *parent)
>   gicv2m_teardown();
>   return ret;
>  }
> +
> +#ifdef CONFIG_ACPI
> +static int acpi_num_msi;
> +
> +static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev)
> +{
> + struct v2m_data *data;
> +
> + if (WARN_ON(acpi_num_msi <= 0))
> + return NULL;
> +
> + /* We only return the fwnode of the first MSI frame. */
> + data = list_first_entry_or_null(&v2m_nodes, struct v2m_data, entry);
> + if (!data)
> + return NULL;
> +
> + return data->fwnode;
> +}
> +
> +static int __init
> +acpi_parse_madt_msi(struct acpi_subtable_header *header,
> + const unsigned long end)
> +{
> + int ret;
> + struct resource res;
> + u32 spi_start = 0, nr_spis = 0;
> + struct acpi_madt_generic_msi_frame *m;
> + struct fwnode_handle *fwnode;
> +
> + m = (struct acpi_madt_generic_msi_frame *)header;
> + if (BAD_MADT_ENTRY(m, end))
> + return -EINVAL;
> +
> + res.start = m->base_address;
> + res.end = m->base_address + SZ_4K;
> +
> + if (m->flags & ACPI_MADT_OVERRIDE_SPI_VALUES) {
> + spi_start = m->spi_base;
> + nr_spis = m->spi_count;
> +
> + pr_info("ACPI overriding V2M MSI_TYPER (base:%u, num:%u)\n",
> + spi_start, nr_spis);
> + }
> +
> + fwnode = irq_domain_alloc_fwnode((void *)m->base_address);
> + if (!fwnode) {
> + pr_err("Unable to allocate GICv2m domain token\n");
> + return -EINVAL;
> + }
> +
> + ret = gicv2m_init_one(fwnode, spi_start, nr_spis, &res);
> + if (ret)
> + irq_domain_free_fwnode(fwnode);
> +
> + return ret;
> +}
> +
> +static int __init gicv2m_acpi_init(struct irq_domain *parent)
> +{
> + int ret;
> +
> + if (acpi_num_msi > 0)
> + return 0;
> +
> + acpi_num_msi = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_MSI_FRAME,
> +   acpi_parse_madt_msi, 0);
> +
> + if (acpi_num_msi <= 0)
> + goto err_out;
> +
> + ret = gicv2m_allocate_domains(parent);
> + if (ret)
> + goto err_out;
> +
> + pci_msi_register_fwnode_provider(&gicv2m_get_fwnode);
> +
> + return 0;
> +
> +err_out:
> + gicv2m_teardown();
> + return -EINVAL;
> +}
> +#else /* CONFIG_ACPI */
> +static int __init gicv2m_acpi_init(struct 

Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Kent Overstreet
On Mon, Dec 21, 2015 at 06:50:21AM +0500, Artem S. Tashkinov wrote:
> On 2015-12-21 06:38, Ming Lei wrote:
> >On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds wrote:
> >>Kent, Jens, Christoph et al,
> >> please see this bugzilla:
> >>
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
> >>
> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> >>signed off on.
> >>
> >>(Also Tejun - maybe you can see what's up - maybe that error message
> >>tells you something)
> >>
> >>I'm not sure what's up with his machine, the disk doesn't seem to be
> >>anything particularly unusual, it looks like a 1TB Seagate Barracuda:
> >>
> >>  ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
> >>
> >>which doesn't strike me as odd.
> >>
> >>Looking at the dmesg, it also looks like it's a pretty normal
> >>Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> >>ID for the AHCI chip seems to be (INTEL, 0x1c02).
> >>
> >>Any ideas? Anybody?
> >
> >BTW, I have posted very similar issue in the link:
> >
> >http://marc.info/?l=linux-ide=145066119623811=2
> >
> >Artem, I noticed from bugzillar that the hardware is i386, just
> >wondering if PAE is enabled?  If yes, I am more confident
> >that both the two kinds of report are similar or same.
> >
> 
> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically mentioned
> in the corresponding bug report.
> 
> P.S. I know Linus doesn't condone PAE but I still find it more preferable
> than running a mixed environment with almost zero benefit in regard to
> performance and quite obvious performance regressions related to an
> increased number of libraries being loaded (i686 + x86_64) and slightly
> bloated code which sometimes cannot fit in the CPU cache. Call me old
> fashioned but I won't upgrade to x86_64 until most of the things that I run
> locally are available for x86_64 and that won't happen any time soon.

oy vey. WTF's been happening in blk-merge.c?

They're not the same bug. The bug in your thread was introduced by Jens in
5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where he screwed
up the bvprv handling - but that patch comes after the patch Artem bisected to.

blk_bio_segment_split() looks correct in b54ffb73ca. 

What we need to do is:
 in the _driver_, immediately before handing the sglist off to the device, walk
 the sglist and verify it obeys all the restrictions for that particular device
 - and if it's not, print out exactly what we screwed up.

I don't know where that code lives in the ahci driver, and more importantly I
don't know where the dma restrictions come from, but if someone who knows the
driver code can walk me through it I'll write the patch.
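
To make the proposed check concrete, here is a minimal user-space sketch of
walking a scatter/gather list and reporting the first entry that violates a
device's limits. struct demo_sg, struct demo_limits and demo_sg_check() are
hypothetical stand-ins, not the kernel's scatterlist or queue limits, and the
three restrictions checked (segment count, segment size, boundary crossing)
are assumptions about what a driver might need to verify.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for illustration only. */
struct demo_sg {
	uint64_t addr;	/* DMA address of the segment */
	uint32_t len;	/* length in bytes */
};

struct demo_limits {
	size_t   max_segments;
	uint32_t max_segment_size;
	uint64_t boundary_mask;	/* a segment must stay inside one (mask + 1) window */
};

/*
 * Returns -1 if the list obeys the limits, otherwise the index of the first
 * offending entry (or max_segments when the list is simply too long), and
 * prints what was wrong.
 */
static long demo_sg_check(const struct demo_sg *sg, size_t n,
			  const struct demo_limits *lim)
{
	assert(sg != NULL && lim != NULL);

	if (n > lim->max_segments) {
		fprintf(stderr, "sglist: %zu segments, device max %zu\n",
			n, lim->max_segments);
		return (long)lim->max_segments;
	}

	for (size_t i = 0; i < n; i++) {
		uint64_t last;

		if (sg[i].len == 0 || sg[i].len > lim->max_segment_size) {
			fprintf(stderr, "sg[%zu]: bad length %" PRIu32 " (max %" PRIu32 ")\n",
				i, sg[i].len, lim->max_segment_size);
			return (long)i;
		}

		last = sg[i].addr + sg[i].len - 1;
		if ((sg[i].addr & ~lim->boundary_mask) != (last & ~lim->boundary_mask)) {
			fprintf(stderr, "sg[%zu]: 0x%" PRIx64 " + %" PRIu32 " crosses boundary\n",
				i, sg[i].addr, sg[i].len);
			return (long)i;
		}
	}
	return -1;
}
```

A driver-side version would take its limits from wherever the hardware
restrictions are actually defined, which is exactly the part that still needs
someone who knows the ahci code.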

--

Also - Ming, Christoph, anyone else who might be working on this stuff in the
future:

The way all the queue limits stuff works is still way too fragile; this has been
a recurring source of bugs. There's way too many different restrictions
different devices need, and it's easy for a driver to specify the restrictions
incorrectly in a way that just happens to work, but for the wrong reasons - e.g.
"I can't handle more than x segments, but saying I can't handle more than x
sectors happens to work for now because of some other bug in the upper layers" -
and then when we have to debug that later, we're screwed.

My intent when I was working on this was to eventually push the implementation
of the limits down as much as possible to the actual drivers - i.e. where the
limitations come from, so the driver can say, for example:

"ok, my device can only do scatter/gather dma to max 20 different addresses, so
I'll allocate sglists with 20 entries, and it doesn't matter if the bio or
request or whatever is bigger because when I call blk_rq_map_sg() it's just
going to map as much of the request as will fit in a given sglist and requests
will get processed incrementally until they're finished - and if a particular sg
entry can only be a particular size, or has alignment restrictions or whatever,
I'll just pass that directly to blk_rq_map_sg()"

so that the driver is ideally specifying _only_ its real restrictions, and
they're being specified in the code exactly where they're being used.
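
As a rough model of that "map as much as fits, then continue" flow, here is a
minimal user-space sketch. It is only an illustration under assumptions:
struct demo_dev, demo_map_chunk() and demo_process() are hypothetical, the
request is treated as one flat byte count, and nothing here reflects how
blk_rq_map_sg() behaves today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description: only its real hardware restrictions. */
struct demo_dev {
	size_t   sg_entries;	/* e.g. 20 scatter/gather slots */
	uint32_t max_seg;	/* largest single segment in bytes */
};

/*
 * Fill at most dev->sg_entries segments from a request with 'remaining'
 * bytes left. Returns the number of bytes covered and stores the number of
 * segments used in *nsegs.
 */
static uint64_t demo_map_chunk(const struct demo_dev *dev, uint64_t remaining,
			       size_t *nsegs)
{
	uint64_t mapped = 0;
	size_t n = 0;

	while (n < dev->sg_entries && mapped < remaining) {
		uint64_t left = remaining - mapped;

		mapped += left < dev->max_seg ? left : dev->max_seg;
		n++;
	}
	*nsegs = n;
	return mapped;
}

/* Process a whole request incrementally; returns how many passes it took. */
static unsigned int demo_process(const struct demo_dev *dev, uint64_t total)
{
	unsigned int passes = 0;

	assert(dev != NULL && dev->sg_entries > 0 && dev->max_seg > 0);
	while (total > 0) {
		size_t nsegs;

		total -= demo_map_chunk(dev, total, &nsegs);
		passes++;	/* one hardware submission per filled sglist */
	}
	return passes;
}
```

With 20 slots of 65536 bytes each, one pass covers at most 1310720 bytes, so a
request one byte larger simply takes a second pass instead of needing to be
split by the upper layers beforehand.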

---

Basically, blk_queue_split() was only meant to be an interim solution, so I'd
suggest that instead of doing performance optimizations on that codepath a
better use of time and effort would be to work towards ripping it out entirely.


Re: [PATCH v7 1/4] acpi: pci: Setup MSI domain for ACPI based pci devices

2015-12-20 Thread Rafael J. Wysocki
On Thursday, December 10, 2015 08:55:27 AM Suravee Suthikulpanit wrote:
> This patch introduces pci_msi_register_fwnode_provider() for irqchip
> to register a callback, to provide a way to determine appropriate MSI
> domain for a pci device.
> 
> It also introduces pci_host_bridge_acpi_msi_domain(), which returns
> the MSI domain of the specified PCI host bridge with DOMAIN_BUS_PCI_MSI
> bus token. Then, it is assigned to pci device.
> 
> Reviewed-by: Marc Zyngier 
> Cc: Bjorn Helgaas 
> Cc: Rafael J. Wysocki 
> Signed-off-by: Suravee Suthikulpanit 

Acked-by: Rafael J. Wysocki 

> ---
>  drivers/pci/pci-acpi.c| 42 ++
>  drivers/pci/probe.c   |  2 ++
>  include/linux/irqdomain.h |  5 +
>  include/linux/pci.h   | 10 ++
>  4 files changed, 59 insertions(+)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index a32ba75..d3f32d6 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -9,7 +9,9 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -689,6 +691,46 @@ static struct acpi_bus_type acpi_pci_bus = {
>   .cleanup = pci_acpi_cleanup,
>  };
>  
> +
> +static struct fwnode_handle *(*pci_msi_get_fwnode_cb)(struct device *dev);
> +
> +/**
> + * pci_msi_register_fwnode_provider - Register callback to retrieve fwnode
> + * @fn:   Callback matching a device to a fwnode that identifies a PCI
> + *MSI domain.
> + *
> + * This should be called by irqchip driver, which is the parent of
> + * the MSI domain to provide callback interface to query fwnode.
> + */
> +void
> +pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device 
> *))
> +{
> + pci_msi_get_fwnode_cb = fn;
> +}
> +
> +/**
> + * pci_host_bridge_acpi_msi_domain - Retrieve MSI domain of a PCI host bridge
> + * @bus:  The PCI host bridge bus.
> + *
> + * This function uses the callback function registered by
> + * pci_msi_register_fwnode_provider() to retrieve the irq_domain with
> + * type DOMAIN_BUS_PCI_MSI of the specified host bridge bus.
> + * This returns NULL on error or when the domain is not found.
> + */
> +struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus)
> +{
> + struct fwnode_handle *fwnode;
> +
> + if (!pci_msi_get_fwnode_cb)
> + return NULL;
> +
> + fwnode = pci_msi_get_fwnode_cb(&bus->dev);
> + if (!fwnode)
> + return NULL;
> +
> + return irq_find_matching_fwnode(fwnode, DOMAIN_BUS_PCI_MSI);
> +}
> +
>  static int __init acpi_pci_init(void)
>  {
>   int ret;
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index edb1984..553a029 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -672,6 +672,8 @@ static struct irq_domain 
> *pci_host_bridge_msi_domain(struct pci_bus *bus)
>* should be called from here.
>*/
>   d = pci_host_bridge_of_msi_domain(bus);
> + if (!d)
> + d = pci_host_bridge_acpi_msi_domain(bus);
>  
>   return d;
>  }
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index d5e5c5b..a06feda 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -410,6 +410,11 @@ static inline bool irq_domain_is_hierarchy(struct 
> irq_domain *domain)
>  static inline void irq_dispose_mapping(unsigned int virq) { }
>  static inline void irq_domain_activate_irq(struct irq_data *data) { }
>  static inline void irq_domain_deactivate_irq(struct irq_data *data) { }
> +static inline struct irq_domain *irq_find_matching_fwnode(
> + struct fwnode_handle *fwnode, enum irq_domain_bus_token bus_token)
> +{
> + return NULL;
> +}
>  #endif /* !CONFIG_IRQ_DOMAIN */
>  
>  #endif /* _LINUX_IRQDOMAIN_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 6ae25aa..d86378c 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1946,6 +1946,16 @@ static inline struct irq_domain *
>  pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
>  #endif  /* CONFIG_OF */
>  
> +#ifdef CONFIG_ACPI
> +struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus);
> +
> +void
> +pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device 
> *));
> +#else
> +static inline struct irq_domain *
> +pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; }
> +#endif
> +
>  #ifdef CONFIG_EEH
>  static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>  {
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 07:18, Ming Lei wrote:
> On Mon, Dec 21, 2015 at 9:50 AM, Artem S. Tashkinov wrote:
>>> BTW, I have posted very similar issue in the link:
>>>
>>> http://marc.info/?l=linux-ide=145066119623811=2
>>>
>>> Artem, I noticed from bugzillar that the hardware is i386, just
>>> wondering if PAE is enabled?  If yes, I am more confident
>>> that both the two kinds of report are similar or same.
>>
>> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically mentioned
>> in the corresponding bug report.
>
> OK, could you dump value of the following files under /sys/block/sdN/queue/ ?
>
> max_hw_sectors_kb
> max_sectors_kb
> max_segments
> max_segment_size
>
> 'sdN' is the faulted disk name.

# cat /sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
32767
32767
168
65536


Re: [BUG] perf test 21("Test object code reading") failure on ARM64

2015-12-20 Thread xiakaixu
On 2015/12/20 8:25, Jan Stancek wrote:
> On Sat, Dec 19, 2015 at 11:04:21AM +0800, xiakaixu wrote:
>>
>...
>>>
>>> Hi,
>>>
>>> What is your objdump version?
>>
>> Hi,
>>
>> Sorry for the late reply.
>>
>> # objdump --version
>> GNU objdump (GNU Binutils) 2.25.
>>
>> I am sure that the system is Little endian.
>>>
> 
> I have attached a patch if you care to try it with your setup.
> If it still fails, output from -v and last objdump command output
> would be helpful.

Hi,

After applying this patch, the perf test case passed.

# perf test 21
21: Test object code reading : (no vmlinux) Ok

Thanks!
> 
> Regards,
> Jan
> 


-- 
Regards
Kaixu Xia



Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Ming Lei
On Mon, Dec 21, 2015 at 9:50 AM, Artem S. Tashkinov  wrote:
>> BTW, I have posted very similar issue in the link:
>>
>> http://marc.info/?l=linux-ide=145066119623811=2
>>
>> Artem, I noticed from bugzillar that the hardware is i386, just
>> wondering if PAE is enabled?  If yes, I am more confident
>> that both the two kinds of report are similar or same.
>>
>
> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically mentioned
> in the corresponding bug report.

OK, could you dump value of the following files under /sys/block/sdN/queue/ ?

max_hw_sectors_kb
max_sectors_kb
max_segments
max_segment_size

'sdN' is the faulted disk name.

Thanks,
Ming Lei


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-20 Thread Yang Zhang

On 2015/12/21 9:50, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Monday, December 21, 2015 9:46 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
   arch/x86/kvm/irq_comm.c | 27 ++-
   arch/x86/kvm/lapic.c| 57

-

   arch/x86/kvm/lapic.h|  2 ++
   arch/x86/kvm/x86.c  |  9 
   arch/x86/kvm/x86.h  |  1 +
   5 files changed, 81 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 84b96d3..c8c5f61 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -32,6 +32,7 @@
   #include "ioapic.h"

   #include "lapic.h"
+#include "x86.h"

   static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
   struct kvm *kvm, int irq_source_id, int level,
@@ -53,8 +54,10 @@ static int kvm_set_ioapic_irq(struct

kvm_kernel_irq_routing_entry *e,

   int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map)
   {
-   int i, r = -1;
+   int i, r = -1, idx = 0;
struct kvm_vcpu *vcpu, *lowest = NULL;
+   unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+   unsigned int dest_vcpus = 0;

if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
kvm_lowest_prio_delivery(irq)) {
@@ -65,6 +68,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct

kvm_lapic *src,

if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
return r;

+   memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!kvm_apic_present(vcpu))
continue;
@@ -78,13 +83,25 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct

kvm_lapic *src,

r = 0;
r += kvm_apic_set_irq(vcpu, irq, dest_map);
} else if (kvm_lapic_enabled(vcpu)) {
-   if (!lowest)
-   lowest = vcpu;
-   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
-   lowest = vcpu;
+   if (!kvm_vector_hashing_enabled()) {
+   if (!lowest)
+   lowest = vcpu;
+   else if (kvm_apic_compare_prio(vcpu, lowest) <

0)

+   lowest = vcpu;
+   } else {
+   __set_bit(vcpu->vcpu_id, dest_vcpu_bitmap);
+   dest_vcpus++;
+   }
}
}

+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+dest_vcpu_bitmap, KVM_MAX_VCPUS);
+
+   lowest = kvm_get_vcpu(kvm, idx - 1);
+   }
+
if (lowest)
r = kvm_apic_set_irq(lowest, irq, dest_map);

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ecd4ea1..e29001f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -678,6 +678,22 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu,

struct kvm_lapic *source,

}
   }

+int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
+  const unsigned long *bitmap, u32 bitmap_size)
+{
+   u32 mod;
+   int i, idx = 0;
+
+   mod = vector % dest_vcpus;
+
+   for (i = 0; i <= mod; i++) {
+   idx = find_next_bit(bitmap, bitmap_size, idx) + 1;
+   BUG_ON(idx > bitmap_size);
+   }
+
+   return idx;
+}
+
   bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
   {
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm

*kvm, struct kvm_lapic *src,

dst = map->logical_map[cid];

if (kvm_lowest_prio_delivery(irq)) {
-   int l = -1;
-   for_each_set_bit(i, &bitmap, 16) {
-   if (!dst[i])
-   continue;
-   if (l < 0)
-   l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu,

dst[l]->vcpu) < 0)

-   l = i;
+   if (!kvm_vector_hashing_enabled()) {
+   

Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-20 Thread Yang Zhang

On 2015/12/21 9:55, Wu, Feng wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 9:50 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts for
VT-d posted-interrupts.

Signed-off-by: Feng Wu 
---
   arch/x86/kvm/lapic.c | 67



   arch/x86/kvm/lapic.h |  2 ++
   arch/x86/kvm/vmx.c   | 12 --
   3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29001f..d4f2c8f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -854,6 +854,73 @@ out:
   }

   /*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *vCPUs in an array.
+ * 2. Use "guest vector % max number of destination vCPUs" to find the right
+ *destination vCPU in the array for the lowest-priority interrupt.
+ */
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq)
+{
+   struct kvm_apic_map *map;
+   struct kvm_vcpu *vcpu = NULL;
+
+   if (irq->shorthand)
+   return NULL;
+
+   rcu_read_lock();
+   map = rcu_dereference(kvm->arch.apic_map);
+
+   if (!map)
+   goto out;
+
+   if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
+   kvm_lowest_prio_delivery(irq)) {
+   u16 cid;
+   int i, idx = 0;
+   unsigned long bitmap = 1;
+   unsigned int dest_vcpus = 0;
+   struct kvm_lapic **dst = NULL;
+
+
+   if (!kvm_apic_logical_map_valid(map))
+   goto out;
+
+   apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+   if (cid >= ARRAY_SIZE(map->logical_map))
+   goto out;
+
+   dst = map->logical_map[cid];
+
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
+   clear_bit(i, &bitmap);
+   continue;
+   }
+   }
+
+   dest_vcpus = hweight16(bitmap);
+
+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+&bitmap, 16);
+   vcpu = dst[idx-1]->vcpu;
+   }
+   }
+
+out:
+   rcu_read_unlock();
+   return vcpu;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
+
+/*
* Add a pending IRQ into lapic.
* Return 1 if successfully added and 0 if discarded.
*/
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6890ef0..52bffce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,

struct kvm_lapic_irq *irq,

struct kvm_vcpu **dest_vcpu);
   int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
   const unsigned long *bitmap, u32 bitmap_size);
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq);
   #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..3f89189 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm *kvm,

unsigned int host_irq,

 */

kvm_set_msi_irq(e, &irq);
-   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
-   continue;
+
+   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+   if (!kvm_vector_hashing_enabled() ||
+   irq.delivery_mode !=

APIC_DM_LOWEST)

+   continue;
+
+   vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
+   if (!vcpu)
+   continue;
+   }


I am a little confused with the 'continue'. If the destination is not
single vcpu, shouldn't we rollback to use non-PI mode?


Here is the logic:
- If it is single destination, we will use PI no matter it is fixed or 
lowest-priority.
- If it is not single destination:
a) It is fixed, we will use non-PI
b) It is lowest-priority and vector-hashing is enabled, we will use PI
c) otherwise, use 
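
For reference, the vector-hashing selection discussed in this thread (pick
the "vector % number of candidates"-th enabled destination from a bitmap) can
be sketched in user-space C as below. This is only an illustration under
assumptions: demo_weight16() and demo_vector_to_dest() are hypothetical
names, the bitmap is fixed at 16 bits, and unlike kvm_vector_2_index() in the
quoted patch, which returns the bit position plus one, this sketch returns
the bit position itself.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a 16-bit destination bitmap. */
static unsigned int demo_weight16(uint16_t bitmap)
{
	unsigned int n = 0;

	for (; bitmap; bitmap &= (uint16_t)(bitmap - 1))
		n++;	/* each iteration clears the lowest set bit */
	return n;
}

/*
 * Return the bit position of the (vector % count)-th set bit, counting set
 * bits from bit 0, or -1 if no destination is set.
 */
static int demo_vector_to_dest(uint32_t vector, uint16_t bitmap)
{
	unsigned int count = demo_weight16(bitmap);
	unsigned int target;

	if (count == 0)
		return -1;

	target = vector % count;
	assert(target < count);

	for (int bit = 0; bit < 16; bit++) {
		if (!(bitmap & (1u << bit)))
			continue;
		if (target == 0)
			return bit;
		target--;
	}
	return -1;	/* not reached when count > 0 */
}
```

For example, with destinations at bits 1, 2 and 4 (bitmap 0x0016), vector 65
gives 65 % 3 = 2 and therefore selects the third candidate, bit 4.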

RE: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-20 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Yang Zhang
> Sent: Monday, December 21, 2015 9:50 AM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
> posted-interrupts
> 
> On 2015/12/16 9:37, Feng Wu wrote:
> > Use vector-hashing to deliver lowest-priority interrupts for
> > VT-d posted-interrupts.
> >
> > Signed-off-by: Feng Wu 
> > ---
> >   arch/x86/kvm/lapic.c | 67
> 
> >   arch/x86/kvm/lapic.h |  2 ++
> >   arch/x86/kvm/vmx.c   | 12 --
> >   3 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index e29001f..d4f2c8f 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -854,6 +854,73 @@ out:
> >   }
> >
> >   /*
> > + * This routine handles lowest-priority interrupts using vector-hashing
> > + * mechanism. As an example, modern Intel CPUs use this method to handle
> > + * lowest-priority interrupts.
> > + *
> > + * Here is the details about the vector-hashing mechanism:
> > + * 1. For lowest-priority interrupts, store all the possible destination
> > + *vCPUs in an array.
> > + * 2. Use "guest vector % max number of destination vCPUs" to find the right
> > + *destination vCPU in the array for the lowest-priority interrupt.
> > + */
> > +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
> > + struct kvm_lapic_irq *irq)
> > +{
> > +   struct kvm_apic_map *map;
> > +   struct kvm_vcpu *vcpu = NULL;
> > +
> > +   if (irq->shorthand)
> > +   return NULL;
> > +
> > +   rcu_read_lock();
> > +   map = rcu_dereference(kvm->arch.apic_map);
> > +
> > +   if (!map)
> > +   goto out;
> > +
> > +   if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
> > +   kvm_lowest_prio_delivery(irq)) {
> > +   u16 cid;
> > +   int i, idx = 0;
> > +   unsigned long bitmap = 1;
> > +   unsigned int dest_vcpus = 0;
> > +   struct kvm_lapic **dst = NULL;
> > +
> > +
> > +   if (!kvm_apic_logical_map_valid(map))
> > +   goto out;
> > +
> > +   apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> > +
> > +   if (cid >= ARRAY_SIZE(map->logical_map))
> > +   goto out;
> > +
> > +   dst = map->logical_map[cid];
> > +
> > +   for_each_set_bit(i, &bitmap, 16) {
> > +   if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
> > +   clear_bit(i, &bitmap);
> > +   continue;
> > +   }
> > +   }
> > +
> > +   dest_vcpus = hweight16(bitmap);
> > +
> > +   if (dest_vcpus != 0) {
> > +   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
> > +&bitmap, 16);
> > +   vcpu = dst[idx-1]->vcpu;
> > +   }
> > +   }
> > +
> > +out:
> > +   rcu_read_unlock();
> > +   return vcpu;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
> > +
> > +/*
> >* Add a pending IRQ into lapic.
> >* Return 1 if successfully added and 0 if discarded.
> >*/
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index 6890ef0..52bffce 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
> struct kvm_lapic_irq *irq,
> > struct kvm_vcpu **dest_vcpu);
> >   int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
> >const unsigned long *bitmap, u32 bitmap_size);
> > +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
> > + struct kvm_lapic_irq *irq);
> >   #endif
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 5eb56ed..3f89189 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm *kvm,
> unsigned int host_irq,
> >  */
> >
> > kvm_set_msi_irq(e, &irq);
> > -   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > -   continue;
> > +
> > +   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> > +   if (!kvm_vector_hashing_enabled() ||
> > +   irq.delivery_mode != APIC_DM_LOWEST)
> > +   continue;
> > +
> > +   vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
> > +   if (!vcpu)
> > +   continue;
> > +   }
> 
> I am a little confused by the 'continue'. If the destination is not a
> single vcpu, shouldn't we roll back to non-PI mode?

Here is the logic:
- If it is single destination, 

[PATCH v1.1 4/6] drm/rockchip: vop: split scale registers

2015-12-20 Thread Mark Yao
There are two versions of the scale control registers on the VOP:
the full version, found on rk3288, which supports the extension registers,
and the little version, found on rk3036, which only supports common scaling.

Signed-off-by: Mark Yao 
---
Changes in v1.1
- fix scale calculation mistake.

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c |   46 ++-
 drivers/gpu/drm/rockchip/rockchip_drm_vop.h |   14 ++--
 drivers/gpu/drm/rockchip/rockchip_vop_reg.c |5 ++-
 3 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index bbb781c..d83bf87 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -50,6 +50,8 @@
REG_SET(x, win->base, win->phy->name, v, RELAXED)
 #define VOP_SCL_SET(x, win, name, v) \
REG_SET(x, win->base, win->phy->scl->name, v, RELAXED)
+#define VOP_SCL_SET_EXT(x, win, name, v) \
+   REG_SET(x, win->base, win->phy->scl->ext->name, v, RELAXED)
 #define VOP_CTRL_SET(x, name, v) \
REG_SET(x, 0, (x)->data->ctrl->name, v, NORMAL)
 
@@ -313,6 +315,20 @@ static void scl_vop_cal_scl_fac(struct vop *vop, const 
struct vop_win_data *win,
return;
}
 
+   if (!win->phy->scl->ext) {
+   VOP_SCL_SET(vop, win, scale_yrgb_x,
+   scl_cal_scale2(src_w, dst_w));
+   VOP_SCL_SET(vop, win, scale_yrgb_y,
+   scl_cal_scale2(src_h, dst_h));
+   if (is_yuv) {
+   VOP_SCL_SET(vop, win, scale_cbcr_x,
+   scl_cal_scale2(src_w, dst_w));
+   VOP_SCL_SET(vop, win, scale_cbcr_y,
+   scl_cal_scale2(src_h, dst_h));
+   }
+   return;
+   }
+
yrgb_hor_scl_mode = scl_get_scl_mode(src_w, dst_w);
yrgb_ver_scl_mode = scl_get_scl_mode(src_h, dst_h);
 
@@ -330,7 +346,7 @@ static void scl_vop_cal_scl_fac(struct vop *vop, const 
struct vop_win_data *win,
lb_mode = scl_vop_cal_lb_mode(src_w, false);
}
 
-   VOP_SCL_SET(vop, win, lb_mode, lb_mode);
+   VOP_SCL_SET_EXT(vop, win, lb_mode, lb_mode);
if (lb_mode == LB_RGB_3840X2) {
if (yrgb_ver_scl_mode != SCALE_NONE) {
DRM_ERROR("ERROR : not allow yrgb ver scale\n");
@@ -354,14 +370,14 @@ static void scl_vop_cal_scl_fac(struct vop *vop, const 
struct vop_win_data *win,
false, vsu_mode, &vskiplines);
VOP_SCL_SET(vop, win, scale_yrgb_y, val);
 
-   VOP_SCL_SET(vop, win, vsd_yrgb_gt4, vskiplines == 4);
-   VOP_SCL_SET(vop, win, vsd_yrgb_gt2, vskiplines == 2);
+   VOP_SCL_SET_EXT(vop, win, vsd_yrgb_gt4, vskiplines == 4);
+   VOP_SCL_SET_EXT(vop, win, vsd_yrgb_gt2, vskiplines == 2);
 
-   VOP_SCL_SET(vop, win, yrgb_hor_scl_mode, yrgb_hor_scl_mode);
-   VOP_SCL_SET(vop, win, yrgb_ver_scl_mode, yrgb_ver_scl_mode);
-   VOP_SCL_SET(vop, win, yrgb_hsd_mode, SCALE_DOWN_BIL);
-   VOP_SCL_SET(vop, win, yrgb_vsd_mode, SCALE_DOWN_BIL);
-   VOP_SCL_SET(vop, win, yrgb_vsu_mode, vsu_mode);
+   VOP_SCL_SET_EXT(vop, win, yrgb_hor_scl_mode, yrgb_hor_scl_mode);
+   VOP_SCL_SET_EXT(vop, win, yrgb_ver_scl_mode, yrgb_ver_scl_mode);
+   VOP_SCL_SET_EXT(vop, win, yrgb_hsd_mode, SCALE_DOWN_BIL);
+   VOP_SCL_SET_EXT(vop, win, yrgb_vsd_mode, SCALE_DOWN_BIL);
+   VOP_SCL_SET_EXT(vop, win, yrgb_vsu_mode, vsu_mode);
if (is_yuv) {
val = scl_vop_cal_scale(cbcr_hor_scl_mode, cbcr_src_w,
dst_w, true, 0, NULL);
@@ -370,13 +386,13 @@ static void scl_vop_cal_scl_fac(struct vop *vop, const 
struct vop_win_data *win,
dst_h, false, vsu_mode, &vskiplines);
VOP_SCL_SET(vop, win, scale_cbcr_y, val);
 
-   VOP_SCL_SET(vop, win, vsd_cbcr_gt4, vskiplines == 4);
-   VOP_SCL_SET(vop, win, vsd_cbcr_gt2, vskiplines == 2);
-   VOP_SCL_SET(vop, win, cbcr_hor_scl_mode, cbcr_hor_scl_mode);
-   VOP_SCL_SET(vop, win, cbcr_ver_scl_mode, cbcr_ver_scl_mode);
-   VOP_SCL_SET(vop, win, cbcr_hsd_mode, SCALE_DOWN_BIL);
-   VOP_SCL_SET(vop, win, cbcr_vsd_mode, SCALE_DOWN_BIL);
-   VOP_SCL_SET(vop, win, cbcr_vsu_mode, vsu_mode);
+   VOP_SCL_SET_EXT(vop, win, vsd_cbcr_gt4, vskiplines == 4);
+   VOP_SCL_SET_EXT(vop, win, vsd_cbcr_gt2, vskiplines == 2);
+   VOP_SCL_SET_EXT(vop, win, cbcr_hor_scl_mode, cbcr_hor_scl_mode);
+   VOP_SCL_SET_EXT(vop, win, cbcr_ver_scl_mode, cbcr_ver_scl_mode);
+   VOP_SCL_SET_EXT(vop, win, cbcr_hsd_mode, SCALE_DOWN_BIL);
+   VOP_SCL_SET_EXT(vop, win, cbcr_vsd_mode, SCALE_DOWN_BIL);
+   VOP_SCL_SET_EXT(vop, win, 

RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-20 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Monday, December 21, 2015 9:46 AM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> On 2015/12/16 9:37, Feng Wu wrote:
> > Use vector-hashing to deliver lowest-priority interrupts. As an
> > example, modern Intel CPUs in server platforms use this method to
> > handle lowest-priority interrupts.
> >
> > Signed-off-by: Feng Wu 
> > ---
> >   arch/x86/kvm/irq_comm.c | 27 ++-
> >   arch/x86/kvm/lapic.c| 57
> -
> >   arch/x86/kvm/lapic.h|  2 ++
> >   arch/x86/kvm/x86.c  |  9 
> >   arch/x86/kvm/x86.h  |  1 +
> >   5 files changed, 81 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> > index 84b96d3..c8c5f61 100644
> > --- a/arch/x86/kvm/irq_comm.c
> > +++ b/arch/x86/kvm/irq_comm.c
> > @@ -32,6 +32,7 @@
> >   #include "ioapic.h"
> >
> >   #include "lapic.h"
> > +#include "x86.h"
> >
> >   static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
> >struct kvm *kvm, int irq_source_id, int level,
> > @@ -53,8 +54,10 @@ static int kvm_set_ioapic_irq(struct
> kvm_kernel_irq_routing_entry *e,
> >   int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> > struct kvm_lapic_irq *irq, unsigned long *dest_map)
> >   {
> > -   int i, r = -1;
> > +   int i, r = -1, idx = 0;
> > struct kvm_vcpu *vcpu, *lowest = NULL;
> > +   unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
> > +   unsigned int dest_vcpus = 0;
> >
> > if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
> > kvm_lowest_prio_delivery(irq)) {
> > @@ -65,6 +68,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct
> kvm_lapic *src,
> > if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
> > return r;
> >
> > +   memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
> > +
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > if (!kvm_apic_present(vcpu))
> > continue;
> > @@ -78,13 +83,25 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct
> kvm_lapic *src,
> > r = 0;
> > r += kvm_apic_set_irq(vcpu, irq, dest_map);
> > } else if (kvm_lapic_enabled(vcpu)) {
> > -   if (!lowest)
> > -   lowest = vcpu;
> > -   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
> > -   lowest = vcpu;
> > +   if (!kvm_vector_hashing_enabled()) {
> > +   if (!lowest)
> > +   lowest = vcpu;
> > +   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
> > +   lowest = vcpu;
> > +   } else {
> > +   __set_bit(vcpu->vcpu_id, dest_vcpu_bitmap);
> > +   dest_vcpus++;
> > +   }
> > }
> > }
> >
> > +   if (dest_vcpus != 0) {
> > +   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
> > +dest_vcpu_bitmap, KVM_MAX_VCPUS);
> > +
> > +   lowest = kvm_get_vcpu(kvm, idx - 1);
> > +   }
> > +
> > if (lowest)
> > r = kvm_apic_set_irq(lowest, irq, dest_map);
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index ecd4ea1..e29001f 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -678,6 +678,22 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu,
> struct kvm_lapic *source,
> > }
> >   }
> >
> > +int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
> > +  const unsigned long *bitmap, u32 bitmap_size)
> > +{
> > +   u32 mod;
> > +   int i, idx = 0;
> > +
> > +   mod = vector % dest_vcpus;
> > +
> > +   for (i = 0; i <= mod; i++) {
> > +   idx = find_next_bit(bitmap, bitmap_size, idx) + 1;
> > +   BUG_ON(idx > bitmap_size);
> > +   }
> > +
> > +   return idx;
> > +}
> > +
> >   bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> > struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> >   {
> > @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm
> *kvm, struct kvm_lapic *src,
> > dst = map->logical_map[cid];
> >
> > if (kvm_lowest_prio_delivery(irq)) {
> > -   int l = -1;
> > -   for_each_set_bit(i, &bitmap, 16) {
> > -   if (!dst[i])
> > -   continue;
> > -   if (l < 0)
> > -   l = i;
> > -   else if 

Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-20 Thread Yang Zhang

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts for
VT-d posted-interrupts.

Signed-off-by: Feng Wu 
---
  arch/x86/kvm/lapic.c | 67 
  arch/x86/kvm/lapic.h |  2 ++
  arch/x86/kvm/vmx.c   | 12 --
  3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29001f..d4f2c8f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -854,6 +854,73 @@ out:
  }

  /*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *vCPUs in an array.
+ * 2. Use "guest vector % max number of destination vCPUs" to find the right
+ *destination vCPU in the array for the lowest-priority interrupt.
+ */
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq)
+{
+   struct kvm_apic_map *map;
+   struct kvm_vcpu *vcpu = NULL;
+
+   if (irq->shorthand)
+   return NULL;
+
+   rcu_read_lock();
+   map = rcu_dereference(kvm->arch.apic_map);
+
+   if (!map)
+   goto out;
+
+   if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
+   kvm_lowest_prio_delivery(irq)) {
+   u16 cid;
+   int i, idx = 0;
+   unsigned long bitmap = 1;
+   unsigned int dest_vcpus = 0;
+   struct kvm_lapic **dst = NULL;
+
+
+   if (!kvm_apic_logical_map_valid(map))
+   goto out;
+
+   apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+   if (cid >= ARRAY_SIZE(map->logical_map))
+   goto out;
+
+   dst = map->logical_map[cid];
+
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
+   clear_bit(i, &bitmap);
+   continue;
+   }
+   }
+
+   dest_vcpus = hweight16(bitmap);
+
+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+&bitmap, 16);
+   vcpu = dst[idx-1]->vcpu;
+   }
+   }
+
+out:
+   rcu_read_unlock();
+   return vcpu;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
+
+/*
   * Add a pending IRQ into lapic.
   * Return 1 if successfully added and 0 if discarded.
   */
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6890ef0..52bffce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct 
kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);
  int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
   const unsigned long *bitmap, u32 bitmap_size);
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq);
  #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..3f89189 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm *kvm, 
unsigned int host_irq,
 */

kvm_set_msi_irq(e, &irq);
-   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
-   continue;
+
+   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+   if (!kvm_vector_hashing_enabled() ||
+   irq.delivery_mode != APIC_DM_LOWEST)
+   continue;
+
+   vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
+   if (!vcpu)
+   continue;
+   }


I am a little confused by the 'continue'. If the destination is not a
single vcpu, shouldn't we roll back to non-PI mode?




vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
vcpu_info.vector = irq.vector;




--
best regards
yang


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 06:38, Ming Lei wrote:

On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds wrote:

Kent, Jens, Christoph et al,
 please see this bugzilla:

  https://bugzilla.kernel.org/show_bug.cgi?id=109661

where Artem Tashkinov bisected his problems with 4.3 down to commit
b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
signed off on.

(Also Tejun - maybe you can see what's up - maybe that error message
tells you something)

I'm not sure what's up with his machine, the disk doesn't seem to be
anything particularly unusual, it looks like a 1TB Seagate Barracuda:

  ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133

which doesn't strike me as odd.

Looking at the dmesg, it also looks like it's a pretty normal
Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
ID for the AHCI chip seems to be (INTEL, 0x1c02).

Any ideas? Anybody?


BTW, I have posted a very similar issue at this link:

http://marc.info/?l=linux-ide&m=145066119623811&w=2

Artem, I noticed from bugzilla that the hardware is i386; just
wondering if PAE is enabled?  If yes, I am more confident
that the two kinds of report are similar or the same.



Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically 
mentioned in the corresponding bug report.


P.S. I know Linus doesn't condone PAE but I still find it
preferable to running a mixed environment with almost zero benefit in
regard to performance and quite obvious performance regressions related 
to an increased number of libraries being loaded (i686 + x86_64) and 
slightly bloated code which sometimes cannot fit in the CPU cache. Call 
me old fashioned but I won't upgrade to x86_64 until most of the things 
that I run locally are available for x86_64 and that won't happen any 
time soon.



Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-20 Thread Yang Zhang

On 2015/12/16 9:37, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts. As an
example, modern Intel CPUs in server platforms use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
  arch/x86/kvm/irq_comm.c | 27 ++-
  arch/x86/kvm/lapic.c| 57 -
  arch/x86/kvm/lapic.h|  2 ++
  arch/x86/kvm/x86.c  |  9 
  arch/x86/kvm/x86.h  |  1 +
  5 files changed, 81 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 84b96d3..c8c5f61 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -32,6 +32,7 @@
  #include "ioapic.h"

  #include "lapic.h"
+#include "x86.h"

  static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
   struct kvm *kvm, int irq_source_id, int level,
@@ -53,8 +54,10 @@ static int kvm_set_ioapic_irq(struct 
kvm_kernel_irq_routing_entry *e,
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map)
  {
-   int i, r = -1;
+   int i, r = -1, idx = 0;
struct kvm_vcpu *vcpu, *lowest = NULL;
+   unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+   unsigned int dest_vcpus = 0;

if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
kvm_lowest_prio_delivery(irq)) {
@@ -65,6 +68,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
return r;

+   memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!kvm_apic_present(vcpu))
continue;
@@ -78,13 +83,25 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
r = 0;
r += kvm_apic_set_irq(vcpu, irq, dest_map);
} else if (kvm_lapic_enabled(vcpu)) {
-   if (!lowest)
-   lowest = vcpu;
-   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
-   lowest = vcpu;
+   if (!kvm_vector_hashing_enabled()) {
+   if (!lowest)
+   lowest = vcpu;
+   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
+   lowest = vcpu;
+   } else {
+   __set_bit(vcpu->vcpu_id, dest_vcpu_bitmap);
+   dest_vcpus++;
+   }
}
}

+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+dest_vcpu_bitmap, KVM_MAX_VCPUS);
+
+   lowest = kvm_get_vcpu(kvm, idx - 1);
+   }
+
if (lowest)
r = kvm_apic_set_irq(lowest, irq, dest_map);

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ecd4ea1..e29001f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -678,6 +678,22 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct 
kvm_lapic *source,
}
  }

+int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
+  const unsigned long *bitmap, u32 bitmap_size)
+{
+   u32 mod;
+   int i, idx = 0;
+
+   mod = vector % dest_vcpus;
+
+   for (i = 0; i <= mod; i++) {
+   idx = find_next_bit(bitmap, bitmap_size, idx) + 1;
+   BUG_ON(idx > bitmap_size);
+   }
+
+   return idx;
+}
+
  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
  {
@@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
struct kvm_lapic *src,
dst = map->logical_map[cid];

if (kvm_lowest_prio_delivery(irq)) {
-   int l = -1;
-   for_each_set_bit(i, &bitmap, 16) {
-   if (!dst[i])
-   continue;
-   if (l < 0)
-   l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
-   l = i;
+   if (!kvm_vector_hashing_enabled()) {
+   int l = -1;
+   for_each_set_bit(i, &bitmap, 16) {
+   if (!dst[i])
+   continue;
+   if (l < 0)
+   l = i;
+   else if (kvm_apic_compare_prio(dst[i]->vcpu, 

Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Ming Lei
On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds
 wrote:
> Kent, Jens, Christoph et al,
>  please see this bugzilla:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>
> where Artem Tashkinov bisected his problems with 4.3 down to commit
> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> signed off on.
>
> (Also Tejun - maybe you can see what's up - maybe that error message
> tells you something)
>
> I'm not sure what's up with his machine, the disk doesn't seem to be
> anything particularly unusual, it looks like a 1TB Seagate Barracuda:
>
>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
>
> which doesn't strike me as odd.
>
> Looking at the dmesg, it also looks like it's a pretty normal
> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> ID for the AHCI chip seems to be (INTEL, 0x1c02).
>
> Any ideas? Anybody?

BTW, I have posted a very similar issue at this link:

http://marc.info/?l=linux-ide&m=145066119623811&w=2

Artem, I noticed from bugzilla that the hardware is i386; just
wondering if PAE is enabled?  If yes, I am more confident
that the two kinds of report are similar or the same.

Thanks,

>
>Linus



-- 
Ming Lei


Re: [PATCH] ARM: dma-mapping: Just allocate one chunk at a time

2015-12-20 Thread Laurent Pinchart
Hi Robin,

On Friday 18 December 2015 20:20:56 Robin Murphy wrote:
> On 18/12/15 18:55, Doug Anderson wrote:
> > On Fri, Dec 18, 2015 at 4:41 AM, Robin Murphy wrote:
> >> On 17/12/15 22:31, Doug Anderson wrote:
> >>> On Thu, Dec 17, 2015 at 12:30 PM, Douglas Anderson wrote:
> >>>> The __iommu_alloc_buffer() is expected to be called to allocate pretty
> >>>> sizeable buffers.  Upon simple tests of video I saw it trying to
> >>>> allocate 4,194,304 bytes.  The function tries to be efficient about
> >>>> this by starting out allocating large chunks and then moving to smaller
> >>>> and smaller chunk sizes until it succeeds.
> >>>>
> >>>> The current function is very, very slow.
> >>>>
> >>>> One problem is the way it keeps trying and trying to allocate big
> >>>> chunks.  Imagine a very fragmented memory that has 4M free but no
> >>>> contiguous pages at all.  Further imagine allocating 4M (1024 pages).
> >>>> We'll do the following memory allocations:
> >>>>
> >>>> - For page 1:
> >>>>   - Try to allocate order 10 (no retry)
> >>>>   - Try to allocate order 9 (no retry)
> >>>>   - ...
> >>>>   - Try to allocate order 0 (with retry, but not needed)
> >>>>
> >>>> - For page 2:
> >>>>   - Try to allocate order 9 (no retry)
> >>>>   - Try to allocate order 8 (no retry)
> >>>>   - ...
> >>>>   - Try to allocate order 0 (with retry, but not needed)
> >>>>
> >>>> - ...
> >>>> - ...
> >>>>
> >>>> Total number of calls to alloc() for this case is:
> >>>>   sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
> >>>>   => 9228
> >>>>
> >>>> The above is obviously worst case, but given how slow alloc can be we
> >>>> really want to try to avoid even somewhat bad cases.  I timed the old
> >>>> code with a device under memory pressure and it wasn't hard to see it
> >>>> take more than 24 seconds to allocate 4 megs of memory (!!).
> >>>>
> >>>> A second problem (and maybe even more important) is that allocating big
> >>>> chunks when we don't need them is just not a good idea anyway.  The
> >>>> first thing we do with these big chunks is break them into smaller
> >>>> chunks!  If we allocate small chunks:
> >>>> - The memory manager doesn't need to work so hard to give us big
> >>>>   chunks.
> >>>> - We can save the big chunks for those that really need them and this
> >>>>   code can make great use of all the small chunks sitting around.
> >>>>
> >>>> Let's simplify by just allocating one page at a time.  We may make more
> >>>> total allocate calls but it works way better.  In real world tests that
> >>>> used to sometimes see a 24 second allocation call I can now see at most
> >>>> 250 ms.
> > 
> > One thing to note is that testing yesterday I actually managed to
> > reproduce an allocation taking 120 seconds (!) with the old code.
> 
> Yikes! That really is worth avoiding...
> 
> >>> Off-list I talked to Dmitry about this a little bit and he pointed out
> >>> that contiguous chunks actually give a benefit to the IOMMU.  I don't
> >>> think the benefit outweighs the cost in this case, but I'm happy to
> >>> hear what others have to say.  I did some quick printouts and it turns
> >>> out that even when requesting page at a time the memory manager
> >>> (unsurprisingly) can in many cases still give us pages that are
> >>> contiguous.
> >>> 
> >>> Also I'm happy to post up
> >>>  which sorts the
> >>> array and could possibly give us larger chunks of contiguous memory.
> >> 
> >> I think sorting individually-allocated pages really isn't worth the
> >> effort - I'm not aware of anything that's going to be capable of using
> >> larger page/section mappings without also having the necessary physical
> >> alignment, and if you _can_ cobble together, say, 2MB worth of
> >> contiguous pages *at 2MB alignment*, then you would have been far better
> >> off just asking the slab allocator for that in the first place.
> >> 
> >> That's the key point of the higher-order allocation - not that you get
> >> some contiguous pages, but that the region you get is also naturally
> >> aligned to its size physically. That we break up the CPU page tables for
> >> that region into individual pages is just an inconsequential
> >> implementation detail from the IOMMU side. When you _do_ have plenty of
> >> unfragmented free memory it can really be a big win - here's an
> >> instrumented example of what happens on my Juno with the ARM HDLCD/SMMU
> >> combo setting up a framebuffer at boot time:
> >>iommu_dma_alloc: alloc size 0x753000, 1875 pages
> >>__iommu_dma_alloc_pages: allocated at order 10
> >>__iommu_dma_alloc_pages: allocated at order 9
> >>__iommu_dma_alloc_pages: allocated at order 8
> >>__iommu_dma_alloc_pages: allocated at order 6
> >>__iommu_dma_alloc_pages: allocated at order 4
> >>__iommu_dma_alloc_pages: allocated at order 1
> >>__iommu_dma_alloc_pages: allocated at order 0
> >>iommu: map: iova 0xff80 

SATA IO errors after "block: fix segment split"

2015-12-20 Thread Ming Lei
Hi,

Both Andre and Diethard reported the following kind of SATA write
I/O errors on 32-bit (ARM/i386) with PAE after commit
578270bfb ("block: fix segment split"):

[  103.736982] ata1.00: exception Emask 0x0 SAct 0x30 SErr 0x0 action 0x6 frozen
[  103.744476] ata1.00: failed command: WRITE FPDMA QUEUED
[  103.749707] ata1.00: cmd 61/00:20:48:6b:41/08:00:0a:00:00/40 tag 4 ncq 1048576 out
[  103.749707]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  103.764659] ata1.00: status: { DRDY }

Unfortunately I can't reproduce the issue on QEMU with i686 plus
PAE after running various kinds of I/O loads on the AHCI SATA drive.

But commit 578270bfb ("block: fix segment split") is quite simple and
straightforward, and I can't see what is wrong with this change.

It looks like the size of the faulted I/O is always 1 MB in Diethard's
reports (they are in private email and I can show them if anyone needs the
logs), and it seems highly likely to be related to commit b54ffb73cadc
("block: remove bio_get_nr_vecs()"), for which Linus has relayed another
similar SATA report.


Thanks,
Ming Lei


Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-20 Thread Måns Rullgård
Andy Shevchenko  writes:

> On Sun, Dec 20, 2015 at 10:17 PM, Andy Shevchenko
>  wrote:
>> On Sun, Dec 20, 2015 at 8:49 PM, Måns Rullgård  wrote:
>>> Julian Margetson  writes:
>>>> On 12/20/2015 1:11 PM, Måns Rullgård wrote:
>>>>> Julian Margetson  writes:
>>
>>>> [   48.769671] ata3.00: failed command: READ FPDMA QUEUED
>>>
>>> Well, that didn't help.  I still think it's part of the problem, but
>>> something else must be wrong as well.  The various Master Select fields
>>> look like a good place to start.
>>
>> Master number (which here would be either 1 or 0) should not matter
>> as long as they are connected to the same AHB bus (I would be
>> surprised if they are not).
>>
>>>  Also, the manual says the LLP_SRC_EN
>>> and LLP_DST_EN flags should be cleared on the last in a chain of blocks.
>>> The old sata_dwc driver does this whereas dw_dma does not.
>>
>> Easy to fix, though I can't see how it would have an effect.
>>
>>> It might be worthwhile to try reverting drivers/ata/sata_dwc_460ex.c to
>>> v4.0 (leaving the rest at 4.4-rc5) just to make sure that's a good
>>> reference.  I've verified that this builds.
>>
>> It would be nice.
>>
>> I noticed thanks to DWC_PARAMS that burst size is hardcoded to 32
>> items on this board, however registers for SATA program it to 64. I
>> remember that I got no interrupt when I programmed transfer width
>> wrongly (64 bits against 32 bits) when I ported dw_dmac to be used on
>> Intel SoCs.
>
> One more thing: I have a patch to monitor DMA I/O, so we can check
> exactly what values are written / read in DMA. I can share it
> tomorrow.
>
> P.S. I also noticed that original driver enables interrupt per each
> block

And then ignores all but the transfer complete interrupt.

> and sets protection control bits.

With no indication what the value it sets is supposed to mean.

-- 
Måns Rullgård


Re: [PATCH] ARM: dma-mapping: Just allocate one chunk at a time

2015-12-20 Thread Laurent Pinchart
Hi Tomasz,

On Friday 18 December 2015 15:05:45 Tomasz Figa wrote:
> On Fri, Dec 18, 2015 at 7:31 AM, Doug Anderson wrote:
> > On Thu, Dec 17, 2015 at 12:30 PM, Douglas Anderson wrote:
> >> The __iommu_alloc_buffer() is expected to be called to allocate pretty
> >> sizeable buffers.  Upon simple tests of video I saw it trying to
> >> allocate 4,194,304 bytes.  The function tries to be efficient about this
> >> by starting out allocating large chunks and then moving to smaller and
> >> smaller chunk sizes until it succeeds.
> >> 
> >> The current function is very, very slow.
> >> 
> >> One problem is the way it keeps trying and trying to allocate big
> >> chunks.  Imagine a very fragmented memory that has 4M free but no
> >> contiguous pages at all.  Further imagine allocating 4M (1024 pages).
> >> We'll do the following memory allocations:
> >> 
> >> - For page 1:
> >>   - Try to allocate order 10 (no retry)
> >>   - Try to allocate order 9 (no retry)
> >>   - ...
> >>   - Try to allocate order 0 (with retry, but not needed)
> >> 
> >> - For page 2:
> >>   - Try to allocate order 9 (no retry)
> >>   - Try to allocate order 8 (no retry)
> >>   - ...
> >>   - Try to allocate order 0 (with retry, but not needed)
> >> 
> >> - ...
> >> - ...
> >> 
> >> Total number of calls to alloc() calls for this case is:
> >>   sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
> >>   => 9228
> >> 
> >> The above is obviously worse case, but given how slow alloc can be we
> >> really want to try to avoid even somewhat bad cases.  I timed the old
> >> code with a device under memory pressure and it wasn't hard to see it
> >> take more than 24 seconds to allocate 4 megs of memory (!!).
> >> 
> >> A second problem (and maybe even more important) is that allocating big
> >> chunks when we don't need them is just not a good idea anyway.  The
> >> first thing we do with these big chunks is break them into smaller
> >> chunks!  If we allocate small chunks:
> >> - The memory manager doesn't need to work so hard to give us big chunks.
> >> - We can save the big chunks for those that really need them and this
> >>   code can make great use of all the small chunks sitting around.
> >> 
> >> Let's simplify by just allocating one page at a time.  We may make more
> >> total allocate calls but it works way better.  In real world tests that
> >> used to sometimes see a 24 second allocation call I can now see at most
> >> 250 ms.
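
To make the worst-case figure quoted above easy to check, here is a minimal
sketch of the same sum in C.  It assumes the model from the description: each
page starts at the largest order that still fits the remaining page count and
every attempt fails until order 0.  The function names are made up for
illustration, and any cap on the maximum order is ignored.

```c
#include <stddef.h>

/* Largest order such that (1 << order) <= pages; pages must be > 0. */
static unsigned int max_order_for(size_t pages)
{
	unsigned int order = 0;

	while ((pages >> (order + 1)) != 0)
		order++;
	return order;
}

/*
 * Worst case from the description: every high-order attempt fails and only
 * the order-0 attempt succeeds, so each page costs (max_order + 1) calls.
 */
size_t worst_case_alloc_calls(size_t total_pages)
{
	size_t calls = 0;
	size_t remaining;

	for (remaining = total_pages; remaining > 0; remaining--)
		calls += max_order_for(remaining) + 1;
	return calls;
}
```

For 1024 pages this gives the same 9228 calls as the Python one-liner in the
quoted text.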
> > 
> > Off-list I talked to Dmitry about this a little bit and he pointed out
> > that contiguous chunks actually give a benefit to the IOMMU.  I don't
> > think the benefit outweighs the cost in this case, but I'm happy to
> > hear what others have to say.
> 
> Yeah, I'd like to see some discussion about the effect of allocating
> bigger chunks on IOMMU performance. Dmitry (on CC), could you
> elaborate a bit on what Doug mentioned?
> 
> As for my own understanding, some IOMMUs can map memory using big
> pages, which should improve TLB efficiency and so look-up speed.
> However AFAICT current implementation of allocating function doesn't
> allocate the chunks properly, because there is no guarantee that
> particular chunks are aligned on big page boundary. For example, it
> might happen that we allocate first chunk of order 0, then second
> chunk of order 4 (64KiB - typical big page), then we won't be able to
> map the second chunk using a big page, because the IOVA at that point
> will not be aligned properly.

That might be true of the current implementations, but there's nothing that 
would stop an IOMMU driver from mapping the start of the buffer at an IOVA 
aligned to 64kB minus 4kB in the example you mentioned. This would become a 
trade-off between allocation complexity and runtime performance.
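
As a rough sketch of that idea (not taken from any existing IOMMU driver), the
start offset could be computed as below.  The function name and parameters are
hypothetical; the only assumption is that the big page size is a power of two.

```c
#include <stdint.h>

/*
 * Offset to add to a big-page-aligned IOVA base so that a chunk which starts
 * `bytes_before` bytes into the buffer lands on a `big_page` boundary.
 * `big_page` must be a non-zero power of two.
 */
uint64_t iova_start_offset(uint64_t bytes_before, uint64_t big_page)
{
	return (big_page - (bytes_before & (big_page - 1))) & (big_page - 1);
}
```

With one 4kB chunk ahead of a 64kB chunk, the offset comes out as 60kB, which
is the "64kB minus 4kB" start address from the example.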

> Is there any other case when bigger physically contiguous chunks can
> help the IOMMU?

-- 
Regards,

Laurent Pinchart



Re: [PATCH 5/8 v6] thermal: rcar: enable to use thermal-zone on DT

2015-12-20 Thread Kuninori Morimoto

Hi

> > +thermal: thermal@e61f {
> > +   compatible ="renesas,thermal-r8a7790",
> > +   "renesas,rcar-gen2-thermal",
> > +   "renesas,rcar-thermal";
> 
> Isn't having both mutually exclusive?

"rcar-thermal" is a very basic version of "rcar-gen2",
and "rcar-gen2" is the common/basic version of "thermal-r8xxx".
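
A simplified sketch of why listing all three strings is useful: the list is
ordered from most specific to most generic, and a driver binds on the first
entry it knows about.  This is only an illustration; the function below is
hypothetical and the kernel's real OF matching code is more involved.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in the node's "compatible" list that
 * the driver claims to support, or -1 if none matches.  The list is ordered
 * from most specific to most generic, so a lower index is a better match.
 */
int first_compatible_match(const char *const *node_compat, size_t n_node,
			   const char *const *drv_compat, size_t n_drv)
{
	size_t i, j;

	for (i = 0; i < n_node; i++)
		for (j = 0; j < n_drv; j++)
			if (strcmp(node_compat[i], drv_compat[j]) == 0)
				return (int)i;
	return -1;
}
```

A driver that only knows "renesas,rcar-thermal" still binds through the last
entry, while a newer driver can pick up the gen2 entry instead.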


[PATCH] vfio: Enable VFIO device for powerpc

2015-12-20 Thread David Gibson
ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
used to handle any necessary interactions between KVM and VFIO.

Currently that device is built on x86 and ARM, but not powerpc, although
powerpc does support both KVM and VFIO.  This makes things awkward in
userspace.

Currently qemu prints an alarming error message if you attempt to use VFIO
and it can't initialize the KVM VFIO device.  We don't want to remove the
warning, because lack of the KVM VFIO device could mean coherency problems
on x86.  On powerpc, however, the error is harmless but looks disturbing,
and a test based on host architecture in qemu would be ugly, and break if
we do need the KVM VFIO device for something important in future.

There's nothing preventing the KVM VFIO device from being built for
powerpc, so this patch turns it on.  It won't actually do anything, since
we don't define any of the arch_*() hooks, but it will make qemu happy and
we can extend it in future if we need to.

Signed-off-by: David Gibson 
Reviewed-by: Eric Auger 
---
 arch/powerpc/kvm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[RESEND?] I thought I sent this out some time ago.  Not sure if I
forgot, or if it fell through the cracks somewhere else.  In any case,
please apply.

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 0570eef..7f7b6d8 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -8,7 +8,7 @@ ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 KVM := ../../../virt/kvm
 
 common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
-   $(KVM)/eventfd.o
+   $(KVM)/eventfd.o $(KVM)/vfio.o
 
 CFLAGS_e500_mmu.o := -I.
 CFLAGS_e500_mmu_host.o := -I.
-- 
2.5.0



Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-20 Thread Måns Rullgård
Måns Rullgård  writes:

> Andy Shevchenko  writes:
>
>> On Sun, Dec 20, 2015 at 8:49 PM, Måns Rullgård  wrote:
>>> Julian Margetson  writes:
>>>> On 12/20/2015 1:11 PM, Måns Rullgård wrote:
>>>>> Julian Margetson  writes:
>>>>>>
>>>> [   48.769671] ata3.00: failed command: READ FPDMA QUEUED
>>>
>>> Well, that didn't help.  I still think it's part of the problem, but
>>> something else must be wrong as well.  The various Master Select fields
>>> look like a good place to start.
>>
>> Master number (which is here would be either 1 or 0) should not affect
>> as long as they are connected to the same AHB bus (I would be
>> surprised if they are not).
>
> I think they are not.  The relevant part of the block diagram for the
> 460EX looks something like this:

Oops, hit send by accident.  More soon.

-- 
Måns Rullgård


Re: [PATCH kernel] vfio: Add explicit alignments in vfio_iommu_spapr_tce_create

2015-12-20 Thread David Gibson
On Fri, Dec 18, 2015 at 12:35:47PM +1100, Alexey Kardashevskiy wrote:
> The vfio_iommu_spapr_tce_create struct has 4x32bit and 2x64bit fields
> which should have resulted in sizeof(vfio_iommu_spapr_tce_create) equal
> to 32 bytes. However due to the gcc's default alignment, the actual
> size of this struct is 40 bytes.
> 
> This fills gaps with __resv1/2 fields.
> 
> This should not cause any change in behavior.
> 
> Signed-off-by: Alexey Kardashevskiy 

Oops, that was a bit sloppy.  Oh well.

Acked-by: David Gibson 

> ---
>  include/uapi/linux/vfio.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9fd7b5d..d117233 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -568,8 +568,10 @@ struct vfio_iommu_spapr_tce_create {
>   __u32 flags;
>   /* in */
>   __u32 page_shift;
> + __u32 __resv1;
>   __u64 window_size;
>   __u32 levels;
> + __u32 __resv2;
>   /* out */
>   __u64 start_addr;
>  };
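
For reference, the size arithmetic from the commit message can be checked with
a small userspace sketch.  The struct names are hypothetical, the leading
argsz field is an assumption (it is outside the hunk shown), and the result
only holds on an ABI where 64-bit integers are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: the compiler inserts padding on its own. */
struct tce_create_implicit {
	uint32_t argsz;
	uint32_t flags;
	uint32_t page_shift;
	uint64_t window_size;
	uint32_t levels;
	uint64_t start_addr;
};

/* Layout after the patch: the same holes, now spelled out as fields. */
struct tce_create_explicit {
	uint32_t argsz;
	uint32_t flags;
	uint32_t page_shift;
	uint32_t resv1;
	uint64_t window_size;
	uint32_t levels;
	uint32_t resv2;
	uint64_t start_addr;
};

/* Returns 1 when both layouts agree in size and in 64-bit field offsets. */
int layouts_match(void)
{
	return sizeof(struct tce_create_implicit) ==
		       sizeof(struct tce_create_explicit) &&
	       offsetof(struct tce_create_implicit, window_size) ==
		       offsetof(struct tce_create_explicit, window_size) &&
	       offsetof(struct tce_create_implicit, start_addr) ==
		       offsetof(struct tce_create_explicit, start_addr);
}
```

On x86_64 both structs come out at 40 bytes, which matches the claim that the
patch should not change behavior.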

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-20 Thread Måns Rullgård
Andy Shevchenko  writes:

> On Sun, Dec 20, 2015 at 8:49 PM, Måns Rullgård  wrote:
>> Julian Margetson  writes:
>>> On 12/20/2015 1:11 PM, Måns Rullgård wrote:
>>>> Julian Margetson  writes:
>>>>>
>>> [   48.769671] ata3.00: failed command: READ FPDMA QUEUED
>>
>> Well, that didn't help.  I still think it's part of the problem, but
>> something else must be wrong as well.  The various Master Select fields
>> look like a good place to start.
>
> Master number (which is here would be either 1 or 0) should not affect
> as long as they are connected to the same AHB bus (I would be
> surprised if they are not).

I think they are not.  The relevant part of the block diagram for the
460EX looks something like this:

        +-----+
        | CPU |
        +-----+
           |
       +-------+
       |  BUS  |
       +-------+
        |     |
   +-----+   +-----+
   | DMA |   | RAM |
   +-----+   +-----+
      |
   +------+
   | SATA |
   +------+

The DMA-SATA link is private and ignores the address, which is the only
reason the driver can possibly work (it's programming a CPU virtual
address there).

>> Also, the manual says the LLP_SRC_EN
>> and LLP_DST_EN flags should be cleared on the last in a chain of blocks.
>> The old sata_dwc driver does this whereas dw_dma does not.
>
> Easy to fix, however I can't get how it might affect.

From the Atmel doc:

  In Table 17-1 on page 185, all other combinations of LLPx.LOC = 0,
  CTLx.LLP_S_EN, CFGx.RELOAD_SR, CTLx.LLP_D_EN, and CFGx.RELOAD_DS are
  illegal, and causes indeterminate or erroneous behavior.

Most likely nothing happens, but I think it ought to be fixed.  In fact,
I have a patch already.
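
Roughly, the fix amounts to something like the sketch below.  This is not the
actual patch: the descriptor struct is hypothetical and the bit positions are
assumptions for illustration, to be checked against the databook.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions are assumptions for illustration; check the databook. */
#define CTL_LLP_DST_EN	(UINT32_C(1) << 27)
#define CTL_LLP_SRC_EN	(UINT32_C(1) << 28)

/* Hypothetical in-memory view of one linked-list item (block descriptor). */
struct lli {
	uint32_t ctl_lo;	/* low word of the control register image */
	struct lli *next;	/* NULL on the last block of the chain */
};

/* Enable block chaining on every descriptor except the last one. */
void fix_llp_flags(struct lli *first)
{
	struct lli *d;

	for (d = first; d != NULL; d = d->next) {
		if (d->next != NULL)
			d->ctl_lo |= CTL_LLP_DST_EN | CTL_LLP_SRC_EN;
		else
			d->ctl_lo &= ~(CTL_LLP_DST_EN | CTL_LLP_SRC_EN);
	}
}
```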

Come to think of it, I have an AVR32 dev somewhere.  Maybe I should dust
it off.

-- 
Måns Rullgård


Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-20 Thread Måns Rullgård
Andy Shevchenko  writes:

> On Sun, Dec 20, 2015 at 8:49 PM, Måns Rullgård  wrote:
>> Julian Margetson  writes:
>>> On 12/20/2015 1:11 PM, Måns Rullgård wrote:
>>>> Julian Margetson  writes:
>>>>>
>>> [   48.769671] ata3.00: failed command: READ FPDMA QUEUED
>>
>> Well, that didn't help.  I still think it's part of the problem, but
>> something else must be wrong as well.  The various Master Select fields
>> look like a good place to start.
>
> Master number (which is here would be either 1 or 0) should not affect
> as long as they are connected to the same AHB bus (I would be
> surprised if they are not).

I think they are not.  The relevant part of the block diagram for the
460EX looks something like this:

+-----+    +-----+    +-----+    +------+
| CPU |<==>| BUS |<==>| DMA |<==>| SATA |
+-----+    +-----+    +-----+    +------+

>>  Also, the manual says the LLP_SRC_EN
>> and LLP_DST_EN flags should be cleared on the last in a chain of blocks.
>> The old sata_dwc driver does this whereas dw_dma does not.
>
> Easy to fix, however I can't get how it might affect.
>
>> It might be worthwhile to try reverting drivers/ata/sata_dwc_460ex.c to
>> v4.0 (leaving the rest at 4.4-rc5) just to make sure that's a good
>> reference.  I've verified that this builds.
>
> It would be nice.
>
> I noticed thanks to DWC_PARAMS that burst size is hardcoded to 32
> items on this board, however registers for SATA program it to 64. I
> remember that I got no interrupt when I programmed transfer width
> wrongly (64 bits against 32 bits) when I ported dw_dmac to be used on
> Intel SoCs.
>
> -- 
> With Best Regards,
> Andy Shevchenko

-- 
Måns Rullgård


[PATCH v1 2/4] selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOK

2015-12-20 Thread Mickaël Salaün
Some architectures do not implement PTRACE_GETREGSET nor
PTRACE_SETREGSET (required by HAVE_ARCH_TRACEHOOK) but only implement
PTRACE_GETREGS and PTRACE_SETREGS (e.g. User-mode Linux).

This improves seccomp selftest portability for architectures without
HAVE_ARCH_TRACEHOOK support by defining a new trigger HAVE_GETREGS. For
now, this is only enabled for i386 and x86_64 architectures. This is
required to be able to run these tests on User-mode Linux.

Signed-off-by: Mickaël Salaün 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Shuah Khan 
Cc: linux-kernel@vger.kernel.org
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: user-mode-linux-u...@lists.sourceforge.net
Cc: linux-...@vger.kernel.org
Cc: Meredydd Luff 
Cc: David Drysdale 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 882fe83..b9453b8 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1246,11 +1246,24 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 # error "Do not know how to find your architecture's registers and syscalls"
 #endif
 
+/* Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for
+ * architectures without HAVE_ARCH_TRACEHOOK (e.g. User-mode Linux).
+ */
+#if defined(__x86_64__) || defined(__i386__)
+#define HAVE_GETREGS
+#endif
+
 /* Architecture-specific syscall fetching routine. */
 int get_syscall(struct __test_metadata *_metadata, pid_t tracee)
 {
-   struct iovec iov;
ARCH_REGS regs;
+#ifdef HAVE_GETREGS
+   EXPECT_EQ(0, ptrace(PTRACE_GETREGS, tracee, 0, &regs)) {
+   TH_LOG("PTRACE_GETREGS failed");
+   return -1;
+   }
+#else
+   struct iovec iov;
 
iov.iov_base = &regs;
iov.iov_len = sizeof(regs);
@@ -1258,6 +1271,7 @@ int get_syscall(struct __test_metadata *_metadata, pid_t 
tracee)
TH_LOG("PTRACE_GETREGSET failed");
return -1;
}
+#endif
 
return regs.SYSCALL_NUM;
 }
@@ -1266,13 +1280,16 @@ int get_syscall(struct __test_metadata *_metadata, 
pid_t tracee)
 void change_syscall(struct __test_metadata *_metadata,
pid_t tracee, int syscall)
 {
-   struct iovec iov;
int ret;
ARCH_REGS regs;
-
+#ifdef HAVE_GETREGS
+   ret = ptrace(PTRACE_GETREGS, tracee, 0, &regs);
+#else
+   struct iovec iov;
iov.iov_base = &regs;
iov.iov_len = sizeof(regs);
ret = ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, &iov);
+#endif
EXPECT_EQ(0, ret);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__powerpc__) || \
@@ -1312,9 +1329,13 @@ void change_syscall(struct __test_metadata *_metadata,
if (syscall == -1)
regs.SYSCALL_RET = 1;
 
+#ifdef HAVE_GETREGS
+   ret = ptrace(PTRACE_SETREGS, tracee, 0, &regs);
+#else
iov.iov_base = &regs;
iov.iov_len = sizeof(regs);
ret = ptrace(PTRACE_SETREGSET, tracee, NT_PRSTATUS, &iov);
+#endif
EXPECT_EQ(0, ret);
 }
 
-- 
2.6.4
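
Outside the selftest harness, the same fallback can be sketched as a small
standalone helper.  This is only an illustration, not part of the patch:
fetch_regs() and USE_GETREGS are made-up names, and it assumes a glibc system
where <sys/user.h> provides struct user_regs_struct.

```c
#include <elf.h>		/* NT_PRSTATUS */
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>		/* struct iovec */
#include <sys/user.h>		/* struct user_regs_struct */

/* Same condition the patch uses to define HAVE_GETREGS. */
#if defined(__x86_64__) || defined(__i386__)
#define USE_GETREGS 1
#endif

/* Read the tracee's general registers; returns 0 on success, -1 on error. */
int fetch_regs(pid_t tracee, struct user_regs_struct *regs)
{
#ifdef USE_GETREGS
	return ptrace(PTRACE_GETREGS, tracee, NULL, regs) == 0 ? 0 : -1;
#else
	struct iovec iov;

	iov.iov_base = regs;
	iov.iov_len = sizeof(*regs);
	return ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, &iov) == 0 ? 0 : -1;
#endif
}
```

The call only succeeds on a process that the caller is tracing and that is
currently stopped; on anything else it returns -1.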



[PATCH v1 3/4] um: Add full asm/syscall.h support

2015-12-20 Thread Mickaël Salaün
Add subarchitecture-independent implementation of asm-generic/syscall.h
allowing access to user system call parameters and results:
* syscall_get_nr()
* syscall_rollback()
* syscall_get_error()
* syscall_get_return_value()
* syscall_set_return_value()
* syscall_get_arguments()
* syscall_set_arguments()
* syscall_get_arch() provided by arch/x86/um/asm/syscall.h

This provides the necessary syscall helpers needed by
HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().

This is inspired by Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).

Signed-off-by: Mickaël Salaün 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: H. Peter Anvin 
Cc: x...@kernel.org
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: linux-kernel@vger.kernel.org
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: user-mode-linux-u...@lists.sourceforge.net
Cc: Meredydd Luff 
Cc: David Drysdale 
---
 arch/um/include/asm/syscall-generic.h | 138 ++
 arch/x86/um/asm/syscall.h |   1 +
 2 files changed, 139 insertions(+)
 create mode 100644 arch/um/include/asm/syscall-generic.h

diff --git a/arch/um/include/asm/syscall-generic.h 
b/arch/um/include/asm/syscall-generic.h
new file mode 100644
index 000..9fb9cf8
--- /dev/null
+++ b/arch/um/include/asm/syscall-generic.h
@@ -0,0 +1,138 @@
+/*
+ * Access to user system call parameters and results
+ *
+ * See asm-generic/syscall.h for function descriptions.
+ *
+ * Copyright (C) 2015 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __UM_SYSCALL_GENERIC_H
+#define __UM_SYSCALL_GENERIC_H
+
+#include 
+#include 
+#include 
+#include 
+
+static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
*regs)
+{
+
+   return PT_REGS_SYSCALL_NR(regs);
+}
+
+static inline void syscall_rollback(struct task_struct *task,
+   struct pt_regs *regs)
+{
+   /* do nothing */
+}
+
+static inline long syscall_get_error(struct task_struct *task,
+struct pt_regs *regs)
+{
+   const long error = regs_return_value(regs);
+
+   return IS_ERR_VALUE(error) ? error : 0;
+}
+
+static inline long syscall_get_return_value(struct task_struct *task,
+   struct pt_regs *regs)
+{
+   return regs_return_value(regs);
+}
+
+static inline void syscall_set_return_value(struct task_struct *task,
+   struct pt_regs *regs,
+   int error, long val)
+{
+   PT_REGS_SET_SYSCALL_RETURN(regs, (long) error ?: val);
+}
+
+static inline void syscall_get_arguments(struct task_struct *task,
+struct pt_regs *regs,
+unsigned int i, unsigned int n,
+unsigned long *args)
+{
+   const struct uml_pt_regs *r = &regs->regs;
+
+   switch (i) {
+   case 0:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG1(r);
+   case 1:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG2(r);
+   case 2:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG3(r);
+   case 3:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG4(r);
+   case 4:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG5(r);
+   case 5:
+   if (!n--)
+   break;
+   *args++ = UPT_SYSCALL_ARG6(r);
+   case 6:
+   if (!n--)
+   break;
+   default:
+   BUG();
+   break;
+   }
+}
+
+static inline void syscall_set_arguments(struct task_struct *task,
+struct pt_regs *regs,
+unsigned int i, unsigned int n,
+const unsigned long *args)
+{
+   struct uml_pt_regs *r = &regs->regs;
+
+   switch (i) {
+   case 0:
+   if (!n--)
+   break;
+   UPT_SYSCALL_ARG1(r) = *args++;
+   case 1:
+   if (!n--)
+   break;
+   UPT_SYSCALL_ARG2(r) = *args++;
+   case 2:
+   if (!n--)
+   break;
+   UPT_SYSCALL_ARG3(r) = *args++;
+   case 3:
+   if (!n--)
+   break;
+   UPT_SYSCALL_ARG4(r) = *args++;
+   case 4:
+   if (!n--)
+   break;
+   UPT_SYSCALL_ARG5(r) = *args++;
+   case 5:
+   if 

Re: linux-next: build failure after merge of the vfs tree

2015-12-20 Thread Stephen Rothwell
Hi Stephen,

On Thu, 10 Dec 2015 11:18:47 +1100 Stephen Rothwell  
wrote:
>
> Hi Al,
> 
> After merging the vfs tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> fs/orangefs/symlink.c:26:2: error: unknown field 'follow_link' specified in 
> initializer
>   .follow_link = pvfs2_follow_link,
>   ^
> fs/orangefs/symlink.c:26:17: warning: initialization from incompatible 
> pointer type [-Wincompatible-pointer-types]
>   .follow_link = pvfs2_follow_link, 
>  ^
> fs/orangefs/symlink.c:26:17: note: (near initialization for 
> 'pvfs2_symlink_inode_operations.put_link')
> 
> Caused by commit
> 
>   6b2553918d8b ("replace ->follow_link() with new method that could stay in 
> RCU mode")
> 
> [I wish there was some way to stage these API changes :-(]
> 
> I applied the following merge fix patch (which may need more work):
> 
> From: Stephen Rothwell 
> Date: Thu, 10 Dec 2015 11:12:36 +1100
> Subject: [PATCH] orangfs: update for follow_link to get_link change
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  fs/orangefs/symlink.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/orangefs/symlink.c b/fs/orangefs/symlink.c
> index 2adfceff7730..dbf24a98a3c9 100644
> --- a/fs/orangefs/symlink.c
> +++ b/fs/orangefs/symlink.c
> @@ -8,9 +8,15 @@
>  #include "pvfs2-kernel.h"
>  #include "pvfs2-bufmap.h"
>  
> -static const char *pvfs2_follow_link(struct dentry *dentry, void **cookie)
> +static const char *pvfs2_get_link(struct dentry *dentry, struct inode *inode,
> +   void **cookie)
>  {
> - char *target =  PVFS2_I(dentry->d_inode)->link_target;
> + char *target;
> +
> + if (!dentry)
> + return ERR_PTR(-ECHILD);
> +
> + target =  PVFS2_I(inode)->link_target;
>  
>   gossip_debug(GOSSIP_INODE_DEBUG,
>"%s: called on %s (target is %p)\n",
> @@ -23,7 +29,7 @@ static const char *pvfs2_follow_link(struct dentry *dentry, 
> void **cookie)
>  
>  struct inode_operations pvfs2_symlink_inode_operations = {
>   .readlink = generic_readlink,
> - .follow_link = pvfs2_follow_link,
> + .get_link = pvfs2_get_link,
>   .setattr = pvfs2_setattr,
>   .getattr = pvfs2_getattr,
>   .listxattr = pvfs2_listxattr,

This patch now looks like this (after changes to the orangefs tree):

diff --git a/fs/orangefs/symlink.c b/fs/orangefs/symlink.c
index 1b3ae63463dc..01977e88e95d 100644
--- a/fs/orangefs/symlink.c
+++ b/fs/orangefs/symlink.c
@@ -8,9 +8,15 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
-static const char *orangefs_follow_link(struct dentry *dentry, void **cookie)
+static const char *orangefs_get_link(struct dentry *dentry, struct inode 
*inode,
+void **cookie)
 {
-   char *target =  ORANGEFS_I(dentry->d_inode)->link_target;
+   char *target;
+
+   if (!dentry)
+   return ERR_PTR(-ECHILD);
+
+   target = ORANGEFS_I(inode)->link_target;
 
gossip_debug(GOSSIP_INODE_DEBUG,
 "%s: called on %s (target is %p)\n",
@@ -23,7 +29,7 @@ static const char *orangefs_follow_link(struct dentry 
*dentry, void **cookie)
 
 struct inode_operations orangefs_symlink_inode_operations = {
.readlink = generic_readlink,
-   .follow_link = orangefs_follow_link,
+   .get_link = orangefs_get_link,
.setattr = orangefs_setattr,
.getattr = orangefs_getattr,
.listxattr = orangefs_listxattr,
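
For anyone not familiar with the ERR_PTR(-ECHILD) return, here is a userspace
sketch of the pattern.  The helpers are stand-ins modelled on linux/err.h, the
demo_* types are hypothetical, and the comment about RCU mode reflects my
understanding of the new ->get_link() convention rather than anything in this
patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (linux/err.h). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced inode and dentry types. */
struct demo_inode {
	const char *link_target;
};

struct demo_dentry {
	struct demo_inode *inode;
};

/* A NULL dentry signals RCU mode; bail out so the caller can retry. */
const char *demo_get_link(struct demo_dentry *dentry, struct demo_inode *inode)
{
	if (!dentry)
		return ERR_PTR(-ECHILD);
	return inode->link_target;
}
```

The caller checks the result with IS_ERR() before using it as a string.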

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


Linux 4.4-rc6

2015-12-20 Thread Linus Torvalds
Things remain fairly normal. Last week rc5 was very small indeed, this
week we have a slightly bigger rc6. The main difference is that rc6
had a network pull in it.

But rc6 is still pretty small, and the patch looks pretty normal: just
over 60% drivers, 16% core networking, 13% architecture updates, and
10% "misc" (documentation, header files, some small filesystem updates
etc). Small stuff all around - you can see the appended shortlog for a
flavor of what is going on.

I'd expect (and hope) that with the holidays next week should continue
being quiet.

And maybe I can hope that people take the downtime to play with their
hardware and test out the most recent kernel version?

   Linus

---

Alan Cox (1):
  ser_gigaset: turn nonsense checks into WARN_ON

Alan Stern (1):
  USB: fix invalid memory access in hub_activate()

Alexander Duyck (1):
  ixgbe: Reset interface after enabling SR-IOV

Alexander Sverdlin (1):
  i2c: davinci: Increase module clock frequency

Alexey Brodkin (1):
  ARC: [axs10x] cap ethernet phy to 100 Mbit/sec

Alexey Khoroshilov (1):
  nfit: acpi_nfit_notify(): Do not leave device locked

Alistair Popple (1):
  powerpc/opal-irqchip: Fix deadlock introduced by "Fix double
endian conversion"

Andrew Lunn (1):
  phy: micrel: Fix finding PHY properties in MAC node.

Andrzej Hajda (1):
  net/mlx4_core: fix handling return value of mlx4_slave_convert_port

Andy Shevchenko (2):
  net:hns: annotate IO address space properly
  net:hns: print MAC with %pM

Anson Huang (1):
  ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted

Anssi Hannula (2):
  ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly
  ALSA: usb-audio: Add sample rate inquiry quirk for AudioQuest DragonFly

Antti Palosaari (3):
  [media] hackrf: fix possible null ptr on debug printing
  [media] hackrf: move RF gain ctrl enable behind module parameter
  [media] airspy: increase USB control message buffer size

Ariel Elior (1):
  qed: Fix BAR size split for some servers

Arnd Bergmann (6):
  netfilter: nfnetlink_queue: avoid harmless unnitialized variable warnings
  phy: sun9i-usb: add USB dependency
  net: fsl: avoid 64-bit warning on pq_mdio
  net: ezchip: fix address space confusion in nps_enet.c
  cpufreq: tegra: add regulator dependency for T124
  hwmon: (sht15) Select CONFIG_BITREVERSE

Artur Paszkiewicz (1):
  md/raid10: fix data corruption and crash during resync

Axel Lin (1):
  gpio: ath79: Fix the logic to clear offset bit of
AR71XX_GPIO_REG_OE register

Bert Kenward (1):
  sfc: only use RSS filters if we're using RSS

Bhuvanchandra DV (1):
  spi-fsl-dspi: Fix CTAR Register access

Bjørn Mork (2):
  ipv6: keep existing flags when setting IFA_F_OPTIMISTIC
  net: cdc_mbim: add "NDP to end" quirk for Huawei E3372

Boris Ostrovsky (2):
  xen: Resume PMU from non-atomic context
  xen/x86/pvh: Use HVM's flush_tlb_others op

Brian Norris (2):
  mtd: ofpart: don't complain about missing 'partitions' node too loudly
  doc: dt: mtd: partitions: add compatible property to "partitions" node

Charles Keepax (1):
  Input: arizona-haptic - fix disabling of haptics device

Charlie Mooney (1):
  Input: elan_i2c - set input device's vendor and product IDs

Chen-Yu Tsai (1):
  stmmac: dwmac-sunxi: Call exit cleanup function in probe error path

Chris Mason (2):
  Btrfs: check for empty bitmap list in setup_cluster_bitmaps
  Btrfs: check prepare_uptodate_page() error code earlier

Chunfeng Yun (1):
  phy: core: Get a refcount to phy in devm_of_phy_get_by_index()

Colin Ian King (1):
  proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter

Cyrille Pitchen (1):
  dmaengine: at_xdmac: fix at_xdmac_prep_dma_memcpy()

Dan Carpenter (6):
  mISDN: fix a loop count
  amd-xgbe: fix a couple timeout loops
  qlge: fix a timeout loop in ql_change_rx_buffers()
  sfc: fix a timeout loop
  qlcnic: fix a timeout loop
  USB: ipaq.c: fix a timeout loop

Dan Streetman (1):
  mm/zswap: change incorrect strncmp use to strcmp

Dan Williams (1):
  Revert "scatterlist: use sg_phys()"

Daniel Mentz (1):
  dma-debug: Fix dma_debug_entry offset calculation

David Ahern (1):
  net: Flush local routes when device changes vrf association

David Henningsson (1):
  ALSA: hda - Fix headphone mic input on a few Dell ALC293 machines

David S. Miller (2):
  Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation"
  bluetooth: Validate socket address length in sco_sock_bind().

David Vrabel (4):
  xen: Add RING_COPY_REQUEST()
  xen-netback: don't use last request to determine minimum Tx credit
  xen-netback: use RING_COPY_REQUEST() throughout
  xen-scsiback: safely copy requests

Dmitry Torokhov (1):
  Input: atmel_mxt_ts - add generic platform data for 

Re: [PATCH v1 1/4] um: Fix ptrace GETREGS/SETREGS bugs

2015-12-20 Thread Richard Weinberger
Am 21.12.2015 um 01:03 schrieb Mickaël Salaün:
> This fix two related bugs:
> * PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
> * PTRACE_SETREGS can't set the orig_ax value (erased by initial value)
> 
> Remove the now useless and error-prone get_syscall().
> 
> Signed-off-by: Mickaël Salaün 
> Cc: Jeff Dike 
> Cc: Richard Weinberger 
> Cc: Thomas Gleixner 
> Cc: Kees Cook 
> Cc: Andy Lutomirski 
> Cc: Will Drewry 
> Cc: Thomas Meyer 
> Cc: Nicolas Iooss 
> Cc: Anton Ivanov 
> Cc: linux-kernel@vger.kernel.org
> Cc: user-mode-linux-de...@lists.sourceforge.net
> Cc: user-mode-linux-u...@lists.sourceforge.net
> Cc: Meredydd Luff 
> Cc: David Drysdale 
> ---
>  arch/um/kernel/skas/syscall.c   | 7 ++-
>  arch/um/os-Linux/skas/process.c | 7 ---
>  2 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
> index 1683b8e..65f0d1a 100644
> --- a/arch/um/kernel/skas/syscall.c
> +++ b/arch/um/kernel/skas/syscall.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -16,12 +17,16 @@ void handle_syscall(struct uml_pt_regs *r)
>   long result;
>   int syscall;
>  
> + /* Save the syscall register. */
> + UPT_SYSCALL_NR(r) = PT_SYSCALL_NR(r->gp);
> +
>   if (syscall_trace_enter(regs)) {
>   result = -ENOSYS;
>   goto out;
>   }
>  
> - syscall = get_syscall(r);
> + /* Get the syscall after being potentially updated with ptrace. */
> + syscall = UPT_SYSCALL_NR(r);

Doesn't this break the support for changing syscall numbers using 
PTRACE_SETREGS?

Thanks,
//richard


[PATCH v1 4/4] um: Add seccomp support

2015-12-20 Thread Mickaël Salaün
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through
prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64
subarchitectures.

secure_computing() is called first in handle_syscall() so that the
syscall emulation will be aborted quickly if matching a seccomp rule.

This is inspired by Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).

Signed-off-by: Mickaël Salaün 
Cc: Jonathan Corbet 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Ingo Molnar 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Chris Metcalf 
Cc: Michael Ellerman 
Cc: James Hogan 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: user-mode-linux-u...@lists.sourceforge.net
Cc: Meredydd Luff 
Cc: David Drysdale 
---
 .../features/seccomp/seccomp-filter/arch-support.txt |  2 +-
 arch/um/Kconfig.common   |  1 +
 arch/um/Kconfig.um   | 16 
 arch/um/include/asm/thread_info.h|  2 ++
 arch/um/kernel/skas/syscall.c|  9 +
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/features/seccomp/seccomp-filter/arch-support.txt 
b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
index 76d39d6..4f66ec1 100644
--- a/Documentation/features/seccomp/seccomp-filter/arch-support.txt
+++ b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
@@ -33,7 +33,7 @@
 |  sh: | TODO |
 |   sparc: | TODO |
 |tile: |  ok  |
-|  um: | TODO |
+|  um: |  ok  |
 |   unicore32: | TODO |
 | x86: |  ok  |
 |  xtensa: | TODO |
diff --git a/arch/um/Kconfig.common b/arch/um/Kconfig.common
index d195a87..cc00134 100644
--- a/arch/um/Kconfig.common
+++ b/arch/um/Kconfig.common
@@ -2,6 +2,7 @@ config UML
bool
default y
select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_SECCOMP_FILTER
select HAVE_UID16
select HAVE_FUTEX_CMPXCHG if FUTEX
select GENERIC_IRQ_SHOW
diff --git a/arch/um/Kconfig.um b/arch/um/Kconfig.um
index 28a9885..4b2ed58 100644
--- a/arch/um/Kconfig.um
+++ b/arch/um/Kconfig.um
@@ -104,3 +104,19 @@ config PGTABLE_LEVELS
int
default 3 if 3_LEVEL_PGTABLES
default 2
+
+config SECCOMP
+   def_bool y
+   prompt "Enable seccomp to safely compute untrusted bytecode"
+   ---help---
+ This kernel feature is useful for number crunching applications
+ that may need to compute untrusted bytecode during their
+ execution. By using pipes or other transports made available to
+ the process as file descriptors supporting the read/write
+ syscalls, it's possible to isolate those applications in
+ their own address space using seccomp. Once seccomp is
+ enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+ and the task is only allowed to execute a few safe syscalls
+ defined by each seccomp mode.
+
+ If unsure, say Y.
diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 53968aa..053baff 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -62,11 +62,13 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_AUDIT      6
 #define TIF_RESTORE_SIGMASK    7
 #define TIF_NOTIFY_RESUME      8
+#define TIF_SECCOMP            9   /* secure computing */
 
 #define _TIF_SYSCALL_TRACE     (1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING        (1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED      (1 << TIF_NEED_RESCHED)
 #define _TIF_MEMDIE            (1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT     (1 << TIF_SYSCALL_AUDIT)
+#define _TIF_SECCOMP           (1 << TIF_SECCOMP)
 
 #endif
diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
index 65f0d1a..af567da 100644
--- a/arch/um/kernel/skas/syscall.c
+++ b/arch/um/kernel/skas/syscall.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -20,6 +21,14 @@ void handle_syscall(struct uml_pt_regs *r)
/* Save the syscall register. */
UPT_SYSCALL_NR(r) = PT_SYSCALL_NR(r->gp);
 
+   /* Do the secure computing check first; failures should be fast. */
+   if (secure_computing() == -1) {
+   /* Do not put secure_computing() into syscall_trace_enter() to
+* avoid forced syscall return value.
+*/
+   return;
+   }
+
if (syscall_trace_enter(regs)) {
result = -ENOSYS;
goto out;
-- 
2.6.4
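
For readers who want to see what the Kconfig help text above describes from user space, here is a minimal sketch of SECCOMP_MODE_STRICT. It is not part of the patch. It assumes a Linux host whose kernel has seccomp enabled; run_strict_child() is a hypothetical helper name, while prctl(2), PR_SET_SECCOMP and SECCOMP_MODE_STRICT are the real interfaces named in the patch description.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): fork a child, switch it to
 * SECCOMP_MODE_STRICT and optionally make it issue a forbidden syscall.
 * Returns 0 if the child exited cleanly, 1 if it was killed by SIGKILL,
 * and -1 if fork failed or seccomp could not be enabled.
 */
int run_strict_child(int do_forbidden)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
			_exit(2);	/* seccomp unavailable, still unrestricted */
		if (do_forbidden)
			syscall(SYS_getpid);	/* not allowed in strict mode */
		/*
		 * Use the raw exit syscall: glibc's _exit() calls
		 * exit_group(2), which strict mode does not allow.
		 */
		syscall(SYS_exit, 0);
		return -1;	/* not reached */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
		return 1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	return -1;
}
```

Calling run_strict_child(0) should report a clean exit and run_strict_child(1) a SIGKILL, provided seccomp is available; a result of -1 means the mode could not be entered at all.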


[PATCH v1 1/4] um: Fix ptrace GETREGS/SETREGS bugs

2015-12-20 Thread Mickaël Salaün
This fixes two related bugs:
* PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
* PTRACE_SETREGS can't set the orig_ax value (erased by initial value)

Remove the now useless and error-prone get_syscall().

Signed-off-by: Mickaël Salaün 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Thomas Gleixner 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Thomas Meyer 
Cc: Nicolas Iooss 
Cc: Anton Ivanov 
Cc: linux-kernel@vger.kernel.org
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: user-mode-linux-u...@lists.sourceforge.net
Cc: Meredydd Luff 
Cc: David Drysdale 
---
 arch/um/kernel/skas/syscall.c   | 7 ++-
 arch/um/os-Linux/skas/process.c | 7 ---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
index 1683b8e..65f0d1a 100644
--- a/arch/um/kernel/skas/syscall.c
+++ b/arch/um/kernel/skas/syscall.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -16,12 +17,16 @@ void handle_syscall(struct uml_pt_regs *r)
long result;
int syscall;
 
+   /* Save the syscall register. */
+   UPT_SYSCALL_NR(r) = PT_SYSCALL_NR(r->gp);
+
if (syscall_trace_enter(regs)) {
result = -ENOSYS;
goto out;
}
 
-   syscall = get_syscall(r);
+   /* Get the syscall after being potentially updated with ptrace. */
+   syscall = UPT_SYSCALL_NR(r);
 
if ((syscall > __NR_syscall_max) || syscall < 0)
result = -ENOSYS;
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index b856c66..23025d6 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -172,13 +172,6 @@ static void handle_trap(int pid, struct uml_pt_regs *regs,
handle_syscall(regs);
 }
 
-int get_syscall(struct uml_pt_regs *regs)
-{
-   UPT_SYSCALL_NR(regs) = PT_SYSCALL_NR(regs->gp);
-
-   return UPT_SYSCALL_NR(regs);
-}
-
 extern char __syscall_stub_start[];
 
 static int userspace_tramp(void *stack)
-- 
2.6.4
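
The ordering change in handle_syscall() can be illustrated with a small stand-alone model. This is only a sketch, not UML code: struct model_regs and the model_* functions are hypothetical, and it assumes that a tracer's PTRACE_SETREGS write lands in the saved syscall number (the value UPT_SYSCALL_NR() reads) rather than in the raw register copy.

```c
/*
 * Hypothetical stand-in for struct uml_pt_regs: the raw register value
 * (what PT_SYSCALL_NR(r->gp) reads) and the saved syscall number
 * (what UPT_SYSCALL_NR(r) reads).
 */
struct model_regs {
	long gp_syscall;	/* raw register captured from the process */
	long syscall;		/* saved copy seen by ptrace */
};

/* Assumed tracer behaviour: PTRACE_SETREGS rewrites the saved copy only. */
void model_tracer_set_syscall(struct model_regs *r, long nr)
{
	r->syscall = nr;
}

/*
 * Old order: the tracer runs first, then the saved copy is reloaded
 * from the raw register, which erases the tracer's value.
 */
long model_old_order(struct model_regs *r, long tracer_nr)
{
	model_tracer_set_syscall(r, tracer_nr);
	r->syscall = r->gp_syscall;	/* what get_syscall() did */
	return r->syscall;
}

/* New order: save first, let the tracer run, then read the saved copy. */
long model_new_order(struct model_regs *r, long tracer_nr)
{
	r->syscall = r->gp_syscall;
	model_tracer_set_syscall(r, tracer_nr);
	return r->syscall;
}
```

With a raw value of 39 and a tracer that asks for 20, the old order dispatches 39 and the new order dispatches 20, under the assumption stated above.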



[PATCH v1 0/4] um: Add seccomp support

2015-12-20 Thread Mickaël Salaün
This series adds seccomp support to User-mode Linux (i386 and x86_64
subarchitectures) and fixes ptrace issues. It applies on v4.4-rc4 and passes
all 48 tests from selftest/seccomp.

Regards,
 Mickaël

Mickaël Salaün (4):
  um: Fix ptrace GETREGS/SETREGS bugs
  selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOK
  um: Add full asm/syscall.h support
  um: Add seccomp support

 .../seccomp/seccomp-filter/arch-support.txt|   2 +-
 arch/um/Kconfig.common |   1 +
 arch/um/Kconfig.um |  16 +++
 arch/um/include/asm/syscall-generic.h  | 138 +
 arch/um/include/asm/thread_info.h  |   2 +
 arch/um/kernel/skas/syscall.c  |  16 ++-
 arch/um/os-Linux/skas/process.c|   7 --
 arch/x86/um/asm/syscall.h  |   1 +
 tools/testing/selftests/seccomp/seccomp_bpf.c  |  27 +++-
 9 files changed, 198 insertions(+), 12 deletions(-)
 create mode 100644 arch/um/include/asm/syscall-generic.h

-- 
2.6.4



Re: [Propose] Isolate core_pattern in mnt namespace.

2015-12-20 Thread Kamezawa Hiroyuki
On 2015/12/20 18:47, Eric W. Biederman wrote:
> Dongsheng Yang  writes:
> 
>> On 12/20/2015 10:37 AM, Al Viro wrote:
>>> On Sun, Dec 20, 2015 at 10:14:29AM +0800, Dongsheng Yang wrote:
>>>> On 12/17/2015 07:23 PM, Dongsheng Yang wrote:
>>>>> Hi guys,
>>>>>   We are working on making core dump behaviour isolated in
>>>>> container. But the problem is, the /proc/sys/kernel/core_pattern
>>>>> is a kernel wide setting, not belongs to a container.
>>>>>
>>>>>   So we want to add core_pattern into mnt namespace. What
>>>>> do you think about it?
>>>>
>>>> Hi Eric,
>>>> I found your patch about "net: Implement the per network namespace
>>>> sysctl infrastructure", I want to do the similar thing
>>>> in mnt namespace. Is that suggested way?
>>>
>>> Why mnt namespace and not something else?
>>
>> Hi Al,
>>
>> Well, because core_pattern indicates the path to store core file.
>> In different mnt namespace, we would like to change the path with
>> different value.
>>
>> In addition, let's consider the other namespaces:
>> UTS ns: contains information about the kernel and arch, not suitable for
>> core_pattern.
>> IPC ns: communication information, not suitable for core_pattern.
>> PID ns: core_pattern is not related to pids.
>> net ns: obviously no.
>> user ns: not suitable either.
>>
>> Then I believe it's better to do this in mnt namespace. of course,
>> core_pattern is just one example. After this infrastructure finished,
>> we can implement more sysctls as per-mnt if necessary, I think.
>>
>> Al, what do you think about this idea?
> 
> The hard part is not the sysctl.  The hard part is starting the usermode
> helper, in an environment that it can deal with.  The mount namespace
> really provides you with no help there.
> 
Let me ask. I think the user-mode helper should be found in the container's
namespace. Or do you mean running the user-mode helper outside of a container?

I think if a user needs to send cores outside of a container, a file descriptor
or a network endpoint should be able to be passed to the sysctl.

Thanks,
-Kame
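
As background for the helper question: a core_pattern value that starts with '|' makes the kernel pipe the core to a user-mode helper instead of writing a file. A minimal sketch of that distinction follows; classify_core_pattern() and read_core_pattern() are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

enum core_target { CORE_TO_FILE, CORE_TO_HELPER };

/* A leading '|' selects the user-mode helper; anything else is a file template. */
enum core_target classify_core_pattern(const char *pattern)
{
	return pattern[0] == '|' ? CORE_TO_HELPER : CORE_TO_FILE;
}

/*
 * Read the first line of `path` (normally /proc/sys/kernel/core_pattern)
 * into buf and strip the trailing newline. Returns 0 on success, -1 on error.
 */
int read_core_pattern(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

The sysctl itself is the easy half; the helper case is the one Eric points at, because the helper has to be started in some environment.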



> Eric


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-21 04:42, Kent Overstreet wrote:
> On Mon, Dec 21, 2015 at 04:25:12AM +0500, Artem S. Tashkinov wrote:
>> On 2015-12-20 23:18, Christoph Hellwig wrote:
>> >On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
>> >>Kent, Jens, Christoph et al,
>> >> please see this bugzilla:
>> >>
>> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
>> >>
>> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
>> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> >>signed off on.
>> >
>> >Artem,
>> >
>> >can you re-check the commits around this series again?  I would be
>> >extremely surprised if it's really this particular commit and not
>> >one just before it causing the problem - it just allocates bios
>> >to the biggest possible instead of only allocating up to what
>> >bio_add_page would accept.
>>
>> I'm positive about this particular commit. Of course, it might be another
>> GCC 4.7.4 miscompilation which causes the errors which shouldn't be there
>> but I'm not an expert, so.
>
> I believe you on the commit, and I doubt this has anything to do with gcc - the
> errors you're getting are exactly what you normally get when you send the device
> an sglist to dma to/from that it doesn't like.
>
> The queue limits stuff is annoyingly fragile, you'd think we'd be able to check
> directly in the driver that the stuff we're sending the device is sane but we
> don't.
>
> If I came up with a debug patch could you try it out? I don't have any ideas for
> one yet, but if someone who knows the ATA code doesn't jump in I'll call up
> Tejun and make him walk me through it.


No problem, I just hope that this particular access mode (and your debug
patch) won't decrease the lifespan of my HDD. Seagate HDDs have been
very fragile (read: atrociously unreliable) for the past five years.



Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Kent Overstreet
On Mon, Dec 21, 2015 at 04:25:12AM +0500, Artem S. Tashkinov wrote:
> On 2015-12-20 23:18, Christoph Hellwig wrote:
> >On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
> >>Kent, Jens, Christoph et al,
> >> please see this bugzilla:
> >>
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
> >>
> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> >>signed off on.
> >
> >Artem,
> >
> >can you re-check the commits around this series again?  I would be
> >extremely surprised if it's really this particular commit and not
> >one just before it causing the problem - it just allocates bios
> >to the biggest possible instead of only allocating up to what
> >bio_add_page would accept.
> 
> I'm positive about this particular commit. Of course, it might be another
> GCC 4.7.4 miscompilation which causes the errors which shouldn't be there
> but
> I'm not an expert, so.

I believe you on the commit, and I doubt this has anything to do with gcc - the
errors you're getting are exactly what you normally get when you send the device
an sglist to dma to/from that it doesn't like.

The queue limits stuff is annoyingly fragile, you'd think we'd be able to check
directly in the driver that the stuff we're sending the device is sane but we
don't.

If I came up with a debug patch could you try it out? I don't have any ideas for
one yet, but if someone who knows the ATA code doesn't jump in I'll call up
Tejun and make him walk me through it.


Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-20 23:44, Kent Overstreet wrote:
> On Sun, Dec 20, 2015 at 07:18:01PM +0100, Christoph Hellwig wrote:
>> On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
>> > Kent, Jens, Christoph et al,
>> >  please see this bugzilla:
>> >
>> >   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>> >
>> > where Artem Tashkinov bisected his problems with 4.3 down to commit
>> > b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> > signed off on.
>>
>> Artem,
>>
>> can you re-check the commits around this series again?  I would be
>> extremely surprised if it's really this particular commit and not
>> one just before it causing the problem - it just allocates bios
>> to the biggest possible instead of only allocating up to what
>> bio_add_page would accept.
>
> pretty sure it's something with how blk_bio_segment_split() decides what
> segments are mergeable and not. bio_get_nr_vecs() was just returning nr_pages ==
> queue_max_segments (ignoring sectors for the moment) - so wait, wtf? that's
> basically assuming no segment merging can ever happen, if it does then this was
> causing us to send smaller requests to the device than we could have been.
>
> so actually two possibilities I can see:
>  - in blk_bio_segment_split(), something's screwed up with how it decides what
>    segments are going to be mergeable or not. but I don't think that's likely
>    since it's doing the exact same thing the rest of the segment merging code
>    does.
>  - or, the driver was lying in its queue limits, using queue_max_segments for
>    "the maximum number of pages I can possibly take", and that bug lurked
>    undiscovered because of the screwed-upness in bio_get_nr_vecs().
>
> Offhand I don't know where to start digging in the driver code to look into the
> second theory though. Tejun, you got any ideas?
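
Kent's point about page counts versus merged segments can be made concrete with a toy model. This is a sketch under simplifying assumptions, not block-layer code: pages are represented by hypothetical page frame numbers, two pages merge only when their frame numbers are consecutive, and segment size and boundary limits are ignored. All function names are made up for illustration.

```c
#include <stddef.h>

/* Count segments when physically adjacent pages merge into one segment. */
size_t count_merged_segments(const unsigned long *pfn, size_t nr_pages)
{
	size_t segs = 0;

	for (size_t i = 0; i < nr_pages; i++)
		if (i == 0 || pfn[i] != pfn[i - 1] + 1)
			segs++;
	return segs;
}

/* Pages accepted when the limit is applied to the page count (no merging assumed). */
size_t pages_by_page_cap(size_t nr_pages, size_t max_segments)
{
	return nr_pages < max_segments ? nr_pages : max_segments;
}

/* Pages accepted when the limit is applied to merged segments instead. */
size_t pages_by_segment_cap(const unsigned long *pfn, size_t nr_pages,
			    size_t max_segments)
{
	size_t segs = 0, i;

	for (i = 0; i < nr_pages; i++) {
		if (i == 0 || pfn[i] != pfn[i - 1] + 1) {
			if (segs == max_segments)
				break;	/* a new segment would exceed the limit */
			segs++;
		}
	}
	return i;
}
```

For eight pages forming three contiguous runs and a limit of two segments, the page-count cap accepts 2 pages while the segment cap accepts 6. That gap is where a driver whose limit really meant "pages" could start seeing requests it does not like.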


Here's an actual bisect log which Linus was missing:

git bisect start
# bad: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
git bisect bad 6a13feb9c82803e2b815eca72fa7a9f5561d7861
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
git bisect good 64291f7db5bd8150a74ad2036f1037e6a0428df2
# bad: [807249d3ada1ff28a47c4054ca4edd479421b671] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect bad 807249d3ada1ff28a47c4054ca4edd479421b671
# good: [102178108e2246cb4b329d3fb7872cd3d7120205] Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 102178108e2246cb4b329d3fb7872cd3d7120205
# good: [62da98656b62a5ca57f22263705175af8ded5aa1] netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
git bisect good 62da98656b62a5ca57f22263705175af8ded5aa1
# good: [f1a3c0b933e7ff856223d6fcd7456d403e54e4e5] Merge tag 'devicetree-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good f1a3c0b933e7ff856223d6fcd7456d403e54e4e5
# bad: [9cbf22b37ae0592dea809cb8d424990774c21786] Merge tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 9cbf22b37ae0592dea809cb8d424990774c21786
# good: [8bdc69b764013a9b5ebeef7df8f314f1066c5d79] Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect good 8bdc69b764013a9b5ebeef7df8f314f1066c5d79
# good: [df910390e2db07a76c87f258475f6c96253cee6c] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good df910390e2db07a76c87f258475f6c96253cee6c
# bad: [d975f309a8b250e67b66eabeb56be6989c783629] Merge branch 'for-4.3/sg' of git://git.kernel.dk/linux-block
git bisect bad d975f309a8b250e67b66eabeb56be6989c783629
# bad: [89e2a8404e4415da1edbac6ca4f7332b4a74fae2] crypto/omap-sham: remove an open coded access to ->page_link
git bisect bad 89e2a8404e4415da1edbac6ca4f7332b4a74fae2
# good: [0e28997ec476bad4c7dbe0a08775290051325f53] btrfs: remove bio splitting and merge_bvec_fn() calls
git bisect good 0e28997ec476bad4c7dbe0a08775290051325f53
# bad: [2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad] Documentation: update notes in biovecs about arbitrarily sized bios
git bisect bad 2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad
# good: [7140aafce2fc14c5af02fdb7859b6bea0108be3d] md/raid5: get rid of bio_fits_rdev()
git bisect good 7140aafce2fc14c5af02fdb7859b6bea0108be3d
# good: [6cf66b4caf9c71f64a5486cadbd71ab58d0d4307] fs: use helper bio_add_page() instead of open coding on bi_io_vec
git bisect good 6cf66b4caf9c71f64a5486cadbd71ab58d0d4307
# bad: [b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c] block: remove bio_get_nr_vecs()
git bisect bad b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c

And like he said, since the step before the last one was good and the
very last one was bad, there was no way I could have made a mistake.
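
The reasoning about the last two steps can be sketched as a binary search over a linear history. This is a simplified model only: real git bisect walks a commit graph, and first_bad(), is_bad_fn and bad_from_13() are hypothetical names. It assumes every commit after the first bad one is also bad.

```c
#include <stddef.h>

/* Hypothetical predicate type: returns nonzero if commit index i is bad. */
typedef int (*is_bad_fn)(size_t i);

/*
 * Commits are indexed in history order. `good` is a known-good index and
 * `bad` a known-bad index, with good < bad. Returns the first bad index.
 */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* regression is at or before mid */
		else
			good = mid;	/* regression is after mid */
	}
	/* Here good == bad - 1: the culprit's parent was tested good. */
	return bad;
}

/* Example predicate for illustration: the regression arrives at commit 13. */
int bad_from_13(size_t i)
{
	return i >= 13;
}
```

When the loop ends, the last good and the first bad commit are adjacent, which is the situation described above. A wrong answer at an earlier step would instead steer the search into a region that is all good or all bad.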



Re: IO errors after "block: remove bio_get_nr_vecs()"

2015-12-20 Thread Artem S. Tashkinov

On 2015-12-20 23:41, Linus Torvalds wrote:
> On Sun, Dec 20, 2015 at 10:18 AM, Christoph Hellwig wrote:
>>
>> Artem,
>>
>> can you re-check the commits around this series again?  I would be
>> extremely surprised if it's really this particular commit and not
>> one just before it causing the problem - it just allocates bios
>> to the biggest possible instead of only allocating up to what
>> bio_add_page would accept.
>
> Judging by Artem's bisect log, the last commit he tested before the
> bad one was the commit before: commit 6cf66b4caf9c ("fs: use helper
> bio_add_page() instead of open coding on bi_io_vec") and he marked
> that one good.
>
> Sadly, without CONFIG_LOCALVERSION_AUTO, there's no way to match up
> the dmesg files (in the same bisection tar-file as the bisection log)
> with the actual versions. Also, Artem's bisect.log isn't actually the
> .git/BISECT_LOG file that contains the full information about what was
> marked good and bad, so it's a bit hard to read (ie I can tell that
> Artem had to mark commit 6cf66b4caf9c as "good" not because his log
> says so, but because that explains the next commit to be tested).
>
> Of course, it's fairly easy to make a mistake while bisecting (just
> doing a thinko), but usually bisection mistakes end up causing you to
> go into some "all good" or "all bad" region of commits, and the fact
> that Artem seems to have marked the previous commit good and the final
> commit bad does seem to imply the bisection was successful.
>
> But yes, it is always nice to double-check the bisection results. The
> best way to do it is generally to try to revert the bad commit and
> verify that things work after that, but that commit doesn't revert
> cleanly on top of 4.3 due to other changes.
>
> Attached is a *COMPLETELY*UNTESTED* revertish patch for 4.3. It's
> basically a revert of b54ffb73cadc, but with a few fixups to make the
> revert work on top of 4.3.
>
> So Artem, if you can test whether 4.3 works with that revert, and/or
> double-check booting that b54ffb73cadc again (to verify that it's
> really bad), and its parent (to double-check that it's really good),
> that would be a good way to verify that yes, it is really that *one*
> commit that breaks things for you.



After applying this revert patch on top of 4.3.3 everything is back
to normal. It is indeed the guilty commit.




linux-next: manual merge of the orangefs tree with Linus' tree

2015-12-20 Thread Stephen Rothwell
Hi Mike,

Today's linux-next merge of the orangefs tree got a conflict in:

  Makefile

between commit:

  1ec218373b8e ("Linux 4.4-rc2")

from Linus' tree and commit:

  575e946125f7 ("Orangefs: change pvfs2 filenames to orangefs")

from the orangefs tree.

You should really remove this EXTRAVERSION change from the patch.

I fixed it up (I used the version from Linus' tree) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

