Re: [PATCH 2/2 v8] printk: Add monotonic, boottime, and realtime timestamps

2017-08-24 Thread Sergey Senozhatsky
On (08/24/17 09:42), Prarit Bhargava wrote:
[..]
> +config PRINTK_TIME_DEBUG
> + bool "Allow runtime reselection of any timebase on printks"
> + depends on PRINTK
> + default N
> + help
> +   Selecting this option causes time stamps of the printk()
> +   messages to be changed freely at runtime on the output of
> +   the syslog() system call and at the console. Without this
> +   option, one can only enable or disable the configuration
> +   selected timebase.
> +
> +   Runtime adjustment can be set via
> +   /sys/module/printk/paramters/time as follows with a string:

s/paramters/parameters/

> +   0/N/n/disable, 1/Y/y/local, b/boot, m/monotonic, r/realtime.
> +   eg: echo local >/sys/module/printk/parameters/time
> +   echo realtime >/sys/module/printk/parameters/time

-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Fix the kerneldoc for scsi_initialize_rq()

2017-08-24 Thread Bart Van Assche
On Thu, 2017-08-24 at 16:11 -0600, Jonathan Corbet wrote:
>  /**
>   * scsi_initialize_rq - initialize struct scsi_cmnd.req
> + * @rq: Request used to locate the scsi_cmnd structure.
>   *
>   * Called from inside blk_get_request().
>   */

Since the request structure and the SCSI command structure are adjacent,
how about the following:

@rq: Request associated with the SCSI command to be initialized.

Thanks,

Bart.

Re: [PATCH] driver core: Document struct device:dma_ops

2017-08-24 Thread Bart Van Assche
On Thu, 2017-08-24 at 16:09 -0600, Jonathan Corbet wrote:
> Commit 5657933dbb6e (treewide: Move dma_ops from struct dev_archdata into
> struct device) added the dma_ops field to struct device, but did not
> update the kerneldoc comment, yielding this warning:
> 
>   ./include/linux/device.h:969: warning: No description found for parameter 'dma_ops'
> 
> Add a description and bring a little peace to the world.
> 
> Signed-off-by: Jonathan Corbet 
> ---
>  include/linux/device.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 723cd54b94da..ed9904728cda 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -847,6 +847,7 @@ struct dev_links_info {
>   * @msi_list:Hosts MSI descriptors
>   * @msi_domain: The generic MSI domain this device is using.
>   * @numa_node:   NUMA node this device is close to.
> + * @dma_ops:DMA mapping operations for this device.
>   * @dma_mask:Dma mask (if dma'ble device).
>   * @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not 
> all
>   *   hardware supports 64-bit addresses for consistent allocations

Thanks Jonathan for this patch.

Reviewed-by: Bart Van Assche 



[PATCH] usb: gadget: Add kerneldoc for some neglected structure fields

2017-08-24 Thread Jonathan Corbet
A couple of structures in <linux/usb/gadget.h> have incomplete kerneldoc
comments, leading to these warnings in the docs build:

./include/linux/usb/gadget.h:230: warning: No description found for parameter 'claimed'
./include/linux/usb/gadget.h:230: warning: No description found for parameter 'enabled'
./include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_altset_not_supp'
./include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_stall_not_supp'
./include/linux/usb/gadget.h:412: warning: No description found for parameter 'quirk_zlp_not_supp'

Document those fields to make the warnings go away.

Signed-off-by: Jonathan Corbet 
---
 include/linux/usb/gadget.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
index 1a4a4bacfae6..49d8e2b7ae4a 100644
--- a/include/linux/usb/gadget.h
+++ b/include/linux/usb/gadget.h
@@ -186,6 +186,8 @@ struct usb_ep_caps {
  * @ops: Function pointers used to access hardware-specific operations.
  * @ep_list:the gadget's ep_list holds all of its endpoints
  * @caps:The structure describing types and directions supported by endoint.
+ * @enabled: The current endpoint enabled/disabled state.
+ * @claimed: True if this endpoint is claimed by a function.
  * @maxpacket:The maximum packet size used on this endpoint.  The initial
  * value can sometimes be reduced (hardware allowing), according to
  * the endpoint descriptor used to configure the endpoint.
@@ -347,6 +349,9 @@ struct usb_gadget_ops {
  * or B-Peripheral wants to take host role.
  * @quirk_ep_out_aligned_size: epout requires buffer size to be aligned to
  * MaxPacketSize.
+ * @quirk_altset_not_supp: UDC controller doesn't support alt settings.
+ * @quirk_stall_not_supp: UDC controller doesn't support stalling.
+ * @quirk_zlp_not_supp: UDC controller doesn't support ZLP.
  * @quirk_avoids_skb_reserve: udc/platform wants to avoid skb_reserve() in
  * u_ether.c to improve performance.
  * @is_selfpowered: if the gadget is self-powered.
-- 
2.13.4



[PATCH] driver core: Document struct device:dma_ops

2017-08-24 Thread Jonathan Corbet
Commit 5657933dbb6e (treewide: Move dma_ops from struct dev_archdata into
struct device) added the dma_ops field to struct device, but did not
update the kerneldoc comment, yielding this warning:

  ./include/linux/device.h:969: warning: No description found for parameter 'dma_ops'

Add a description and bring a little peace to the world.

Signed-off-by: Jonathan Corbet 
---
 include/linux/device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 723cd54b94da..ed9904728cda 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -847,6 +847,7 @@ struct dev_links_info {
  * @msi_list:  Hosts MSI descriptors
  * @msi_domain: The generic MSI domain this device is using.
  * @numa_node: NUMA node this device is close to.
+ * @dma_ops:DMA mapping operations for this device.
  * @dma_mask:  Dma mask (if dma'ble device).
  * @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not all
  * hardware supports 64-bit addresses for consistent allocations
-- 
2.13.4



[PATCH] doc: Add documentation for the genalloc subsystem

2017-08-24 Thread Jonathan Corbet
Genalloc/genpool has kerneldoc comments, but nothing has ever been pulled
into the docs themselves.  Here's a first attempt, repurposed from an
article I wrote at https://lwn.net/Articles/729653/.

Signed-off-by: Jonathan Corbet 
---
 Documentation/core-api/genalloc.rst | 144 
 Documentation/core-api/index.rst|   1 +
 2 files changed, 145 insertions(+)
 create mode 100644 Documentation/core-api/genalloc.rst

diff --git a/Documentation/core-api/genalloc.rst 
b/Documentation/core-api/genalloc.rst
new file mode 100644
index ..6981fd627ccd
--- /dev/null
+++ b/Documentation/core-api/genalloc.rst
@@ -0,0 +1,144 @@
+The genalloc/genpool subsystem
+==============================
+
+There are a number of memory-allocation subsystems in the kernel, each
+aimed at a specific need.  Sometimes, however, a kernel developer needs to
+implement a new allocator for a specific range of special-purpose memory;
+often that memory is located on a device somewhere.  The author of the
+driver for that device can certainly write a little allocator to get the
+job done, but that is the way to fill the kernel with dozens of poorly
+tested allocators.  Back in 2005, Jes Sorensen lifted one of those
+allocators from the sym53c8xx_2 driver and posted_ it as a generic module
+for the creation of ad hoc memory allocators.  This code was merged
+for the 2.6.13 release; it has been modified considerably since then.
+
+.. _posted: https://lwn.net/Articles/125842/
+
+Code using this allocator should include <linux/genalloc.h>.  The action
+begins with the creation of a pool using one of:
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_create 
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: devm_gen_pool_create
+
+A call to :c:func:`gen_pool_create` will create a pool.  The granularity of
+allocations is set with min_alloc_order; it is a log-base-2 number like
+those used by the page allocator, but it refers to bytes rather than pages.
+So, if min_alloc_order is passed as 3, then all allocations will be a
+multiple of eight bytes.  Increasing min_alloc_order decreases the memory
+required to track the memory in the pool.  The nid parameter specifies
+which NUMA node should be used for the allocation of the housekeeping
+structures; it can be -1 if the caller doesn't care.
+
+The "managed" interface :c:func:`devm_gen_pool_create` ties the pool to a
+specific device.  Among other things, it will automatically clean up the
+pool when the given device is destroyed.
+
+A pool is shut down with:
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_destroy
+
+It's worth noting that, if there are still allocations outstanding from the
+given pool, this function will take the rather extreme step of invoking
+BUG(), crashing the entire system.  You have been warned.
+
+A freshly created pool has no memory to allocate.  It is fairly useless in
+that state, so one of the first orders of business is usually to add memory
+to the pool.  That can be done with one of:
+
+.. kernel-doc:: include/linux/genalloc.h
+   :functions: gen_pool_add
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_add_virt
+
+A call to :c:func:`gen_pool_add` will place the size bytes of memory
+starting at addr (in the kernel's virtual address space) into the given
+pool, once again using nid as the node ID for ancillary memory allocations.
+The :c:func:`gen_pool_add_virt` variant associates an explicit physical
+address with the memory; this is only necessary if the pool will be used
+for DMA allocations.
+
+The functions for allocating memory from the pool (and putting it back)
+are:
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_alloc
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_dma_alloc
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_free
+
+As one would expect, :c:func:`gen_pool_alloc` will allocate size bytes
+from the given pool.  The :c:func:`gen_pool_dma_alloc` variant allocates
+memory for use with DMA operations, returning the associated physical
+address in the space pointed to by dma.  This will only work if the memory
+was added with :c:func:`gen_pool_add_virt`.  Note that this function
+departs from the usual genpool pattern of using unsigned long values to
+represent kernel addresses; it returns a void * instead.
+
+That all seems relatively simple; indeed, some developers clearly found it
+to be too simple.  After all, the interface above provides no control over
+how the allocation functions choose which specific piece of memory to
+return.  If that sort of control is needed, the following functions will be
+of interest:
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_alloc_algo
+
+.. kernel-doc:: lib/genalloc.c
+   :functions: gen_pool_set_algo
+
+Allocations with :c:func:`gen_pool_alloc_algo` specify an algorithm to be
+used to choose the memory to be allocated; the default algorithm can be set
+with :c:func:`gen_pool_set_algo`.  

Re: [PATCH] Documentation: stable-kernel-rules: fix broken git urls

2017-08-24 Thread Jonathan Corbet
On Sun, 13 Aug 2017 23:43:43 +0300
Andrii Bordunov  wrote:

> git.kernel.org links don't work (fatal: repository ... not found).
> Update them with the current style from https://git.kernel.org
> 
> There is no HTTP option, so also switch HTTP -> HTTPS.

Applied (finally), thanks.

jon


Re: [PATCH] docs: fix nested numbering in the TOC

2017-08-24 Thread Jonathan Corbet
On Mon, 14 Aug 2017 10:15:16 +0200
Markus Heiser  wrote:

> TOC numbering is already set in::
> 
>   ./input/devices/index.rst:9:
>   ./media/uapi/cec/cec-api.rst:19:
> 
> I guess the nested numbering in:
> 
>   ./input/joydev/index.rst
>   ./media/uapi/cec/cec-funcs.rst
> 
> is just a C typo, so lets remove it.

Thanks, I've been meaning to get around to that for a bit; applied.

jon


Re: [PATCH v2 4/4] docs-rst: Allow Sphinx version 1.6

2017-08-24 Thread Jonathan Corbet
On Wed, 23 Aug 2017 05:56:57 -0300
Mauro Carvalho Chehab  wrote:

> Now that the PDF building issues with Sphinx 1.6 got fixed,
> update the documentation and scripts accordingly.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/conf.py  | 3 ---
>  Documentation/doc-guide/sphinx.rst | 4 +---
>  scripts/sphinx-pre-install | 1 -
>  3 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 8e74d68037a5..0834a9933d69 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -331,9 +331,6 @@ latex_elements = {
>  \\setromanfont{DejaVu Sans}
>  \\setmonofont{DejaVu Sans Mono}
>  
> - % To allow adjusting table sizes
> - \\usepackage{adjustbox}
> -
>   '''
>  }

So this change doesn't quite match the changelog...what's the story there?

Thanks,

jon


Re: [PATCH 2/2 v8] printk: Add monotonic, boottime, and realtime timestamps

2017-08-24 Thread John Stultz
On Thu, Aug 24, 2017 at 6:42 AM, Prarit Bhargava  wrote:
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -239,6 +239,7 @@ static inline u64 ktime_get_raw_ns(void)
>  extern u64 ktime_get_mono_fast_ns(void);
>  extern u64 ktime_get_raw_fast_ns(void);
>  extern u64 ktime_get_boot_fast_ns(void);
> +extern u64 ktime_get_real_offset(void);
>
>  /*
>   * Timespec interfaces utilizing the ktime based ones
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fc47863f629c..dd972bc5c88b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
...
> + * printk_get_real_ns: - Return a realtime timestamp for printk messages
> + * On 32-bit systems selecting the real clock printk timestamp may lead to
> + * unlikely situations where a timestamp is wrong because the real time 
> offset
> + * is read without the protection of a sequence lock.
> + */
> +static u64 printk_get_real_ns(void)
> +{
> +   return ktime_get_mono_fast_ns() + ktime_get_real_offset();
> +}
> +
...
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index d111039e0245..de07bb5ffef5 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -508,6 +508,11 @@ u64 notrace ktime_get_boot_fast_ns(void)
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
>
> +u64 ktime_get_real_offset(void)
> +{
> +   return ktime_to_ns(tk_core.timekeeper.offs_real);
> +}
> +

Apologies! One last nit here.  So if we're going to export
ktime_get_real_offset(), folks are going to use it, and there is no
documentation about its very sketchy behavioral limits as an
interface, except elsewhere in the printk code.

Instead of doing that, could you export a
__ktime_get_real_fast_ns_unsafe() function, which has its limits
(calculating the realtime w/o locks, thus may return completely bad
values occasionally) clearly documented in the timekeeping code?

You can then use that directly in your printk code, and others who
find it and think "Hey this would be great for my
life-safety-critical-system driver" will be clearly warned.
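[Editorial illustration: the helper suggested above might look like the following sketch. The name comes from John's suggestion, the body mirrors the patch's printk_get_real_ns(); this is not an actual exported kernel interface.]

```c
/**
 * __ktime_get_real_fast_ns_unsafe - lock-free realtime timestamp
 *
 * Computes CLOCK_REALTIME as the fast monotonic clock plus offs_real,
 * read WITHOUT seqlock protection.  If it races with a concurrent
 * timekeeping update (notably on 32-bit systems), it can occasionally
 * return a completely bogus value.  Suitable for best-effort consumers
 * such as printk timestamps only; never use where correctness matters.
 */
u64 notrace __ktime_get_real_fast_ns_unsafe(void)
{
	return ktime_get_mono_fast_ns() +
	       ktime_to_ns(tk_core.timekeeper.offs_real);
}
EXPORT_SYMBOL_GPL(__ktime_get_real_fast_ns_unsafe);
```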

thanks
-john


Re: [PATCH RFC v2] media: open.rst: document devnode-centric and mc-centric types

2017-08-24 Thread Sakari Ailus
Hi Mauro,

Thanks for the update! A few comments below.

(Cc Hans and Laurent.)

On Thu, Aug 24, 2017 at 09:07:35AM -0300, Mauro Carvalho Chehab wrote:
> From: "mche...@s-opensource.com" 
> 
> When we added support for omap3, back in 2010, we added a new
> type of V4L2 devices that aren't fully controlled via the V4L2
> device node. Yet, we never made it clear, at the V4L2 spec,
> about the differences between both types.
> 
> Let's document them with the current implementation.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> Signed-off-by: Mauro Carvalho Chehab 

Pick one. :-)

> ---
>  Documentation/media/uapi/v4l/open.rst | 47 
> +++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/Documentation/media/uapi/v4l/open.rst 
> b/Documentation/media/uapi/v4l/open.rst
> index afd116edb40d..cf522d9bb53c 100644
> --- a/Documentation/media/uapi/v4l/open.rst
> +++ b/Documentation/media/uapi/v4l/open.rst
> @@ -6,6 +6,53 @@
>  Opening and Closing Devices
>  ***
>  
> +Types of V4L2 device control
> +
> +
> +V4L2 devices are usually complex: they're implemented via a main driver and
> +often several additional drivers. The main driver always exposes one or
> +more **V4L2 device** devnodes (see :ref:`v4l2_device_naming`). The other
> +devices are called **V4L2 sub-devices**. They are usually controlled via a
> +serial bus (I2C or SMBus).

Reading this paragraph again, I think this is meant as an introduction to
the reader. In this level of documentation (user space), it's fair to
describe hardware. As the V4L2 sub-device concept in the kernel and the
V4L2 sub-device API which is visible to the user space are not exactly the
same things, I think I'd avoid using V4L2 sub-device concept when referring
to hardware and how the concept is used in the kernel.

How about, to replace the two last sentences:

The other devices are typically I²C or SPI devices. Depending on the main
driver, these devices are controlled either implicitly through the main
driver or explicitly through the **V4L2 sub-device** interface.

(I'm not sure the second sentence is even needed here.)

> +
> +When V4L2 started, there was only one type of device control. The entire
> +device was controlled via the **V4L2 device nodes**. We refer to this
> +kind of control as **V4L2 device-centric** (or, simply, **device-centric**).

I'm still not quite happy with the naming. "Device" is too generic. How
about "V4L2-centric"? SDR, radio and video nodes are part of V4L2, so I
think that should convey the message as well as is factually correct.

> +
> +Since the end of 2010, a new type of V4L2 device control was added in order
> +to support complex devices that are common on embedded systems. Those
> +devices are controlled mainly via the media controller and sub-devices.
> +So, they're called: **media controller centric** (or, simply,
> +"**mc-centric**").

Looks good. I'd use capital "M" in Media controller.

> +
> +On **device-centric** control, the device and their corresponding hardware
> +pipelines are controlled via the **V4L2 device** node. They may optionally
> +expose the hardware pipelines via the
> +:ref:`media controller API `.

s/the hardware pipelines//

It could be that there is a media device, but still no pipeline. Think of
lens and flash devices, for instance.

> +
> +On a **mc-centric**, before using the V4L2 device, it is required to

In English, abbreviations are capitalised, i.e. "MC-centric".

> +set the hardware pipelines via the
> +:ref:`media controller API `. On those devices, the
> +sub-devices' configuration can be controlled via the
> +:ref:`sub-device API `, which creates one device node per sub-device.
> +
> +In summary, on **mc-centric** devices:
> +
> +- The **V4L2 device** node is mainly responsible for controlling the
> +  streaming features;
> +- The **media controller device** is responsible to setup the pipelines
> +  and image settings (like size and format);

"image settings (like size and format)" go to the sub-device bullet below.

> +- The **V4L2 sub-devices** are responsible for sub-device
> +  specific settings.
> +
> +.. note::
> +
> +   It is forbidden for **device-centric** devices to expose V4L2
> +   sub-devices via :ref:`sub-device API `, although this
> +   might change in the future.

I believe we agree on the subject matter but we can still argue. :-D

Could you drop ", although this might change in the future" part? We've
established that there is no use case currently and we could well allow
things in the API that were not allowed before without a specific not in
the spec.

> +
> +
> +.. _v4l2_device_naming:
>  
>  Device Naming
>  =

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ai...@iki.fi

Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Roman Gushchin
On Thu, Aug 24, 2017 at 04:13:37PM +0200, Michal Hocko wrote:
> On Thu 24-08-17 14:58:42, Roman Gushchin wrote:
> > On Thu, Aug 24, 2017 at 02:58:11PM +0200, Michal Hocko wrote:
> > > On Thu 24-08-17 13:28:46, Roman Gushchin wrote:
> > > > Hi Michal!
> > > > 
> > > There is nothing like a "better victim". We are pretty much in a
> > > catastrophic situation when we try to survive by killing a userspace.
> > 
> > Not necessary, it can be a cgroup OOM.
> 
> memcg OOM is no different. The catastrophic is scoped to the specific
> hierarchy but tasks in that hierarchy still fail to make a further
> progress.
> 
> > > We try to kill the largest because that assumes that we return the
> > > most memory from it. Now I do understand that you want to treat the
> > > memcg as a single killable entity but I find it really questionable
> > > to do a per-memcg metric and then do not treat it like that and kill
> > > only a single task. Just imagine a single memcg with zillions of tasks
> > > each very small and you select it as the largest while a small task
> > > itself doesn't help to get us out of the OOM.
> > 
> > I don't think it's different from a non-containerized state: if you
> > have a zillion of small tasks in the system, you'll meet the same issues.
> 
> Yes this is possible but usually you are comparing apples to apples so
> you will kill the largest offender and then go on. To be honest I really
> do hate how we try to kill a children rather than the selected victim
> for the same reason.

I do hate it too.

> 
> > > > > I guess I have asked already and we haven't reached any consensus. I 
> > > > > do
> > > > > not like how you treat memcgs and tasks differently. Why cannot we 
> > > > > have
> > > > > a memcg score a sum of all its tasks?
> > > > 
> > > > It sounds like a more expensive way to get almost the same with less 
> > > > accuracy.
> > > > Why it's better?
> > > 
> > > because then you are comparing apples to apples?
> > 
> > Well, I can say that I compare some number of pages against some other 
> > number
> > of pages. And the relation between a page and memcg is more obvious, than a
> > relation between a page and a process.
> 
> But you are comparing different accounting systems.
>  
> > Both ways are not ideal, and sum of the processes is not ideal too.
> > Especially, if you take oom_score_adj into account. Will you respect it?
> 
> Yes, and I do not see any reason why we shouldn't.

It makes things even more complicated.
Right now a task's oom_score can be in the (~ -total_memory, ~ +2*total_memory) range,
and if you start summing it, it can be multiplied by the number of tasks...
Weird.
It also will be different in case of system and memcg-wide OOM.

> 
> > I've started actually with such approach, but then found it weird.
> > 
> > > Besides that you have
> > > to check each task for over-killing anyway. So I do not see any
> > > performance merits here.
> > 
> > It's an implementation detail, and we can hopefully get rid of it at some 
> > point.
> 
> Well, we might do some estimations and ignore oom scopes but I that
> sounds really complicated and error prone. Unless we have anything like
> that then I would start from tasks and build up the necessary to make a
> decision at the higher level.

Seriously speaking, do you have an example where summing per-process
oom_score will work better?

Especially, if we're talking about customizing oom_score calculation,
it makes no sense to me. How will you sum process timestamps?

>  
> > > > > How do you want to compare memcg score with tasks score?
> > > > 
> > > > I have to do it for tasks in root cgroups, but it shouldn't be a common 
> > > > case.
> > > 
> > > How come? I can easily imagine a setup where only some memcgs which
> > > really do need a kill-all semantic while all others can live with single
> > > task killed perfectly fine.
> > 
> > I mean taking a unified cgroup hierarchy into an account, there should not
> > be lot of tasks in the root cgroup, if any.
> 
> Is that really the case? I would assume that memory controller would be
> enabled only in those subtrees which really use the functionality and
> the rest will be sitting in the root memcg. It might be the case if you
> are running only containers but I am not really sure this is true in
> general.

Agree.


[PATCH v7 01/19] crypto: change transient busy return code to -EAGAIN

2017-08-24 Thread Gilad Ben-Yossef
The crypto API was using the -EBUSY return value to indicate
both a hard failure to submit a crypto operation into a
transformation provider when the latter was busy and the backlog
mechanism was not enabled as well as a notification that the
operation was queued into the backlog when the backlog mechanism
was enabled.

Having the same return code indicate two very different conditions
depending on a flag is both error prone and requires extra runtime
check like the following to discern between the cases:

if (err == -EINPROGRESS ||
(err == -EBUSY && (ahash_request_flags(req) &
   CRYPTO_TFM_REQ_MAY_BACKLOG)))

This patch changes the return code used to indicate a crypto op
failed due to the transformation provider being transiently busy
to -EAGAIN.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/algapi.c |  6 --
 crypto/algif_hash.c | 20 +---
 crypto/cryptd.c |  4 +---
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index aa699ff..916bee3 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -897,9 +897,11 @@ int crypto_enqueue_request(struct crypto_queue *queue,
int err = -EINPROGRESS;
 
if (unlikely(queue->qlen >= queue->max_qlen)) {
-   err = -EBUSY;
-   if (!(request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (!(request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) {
+   err = -EAGAIN;
goto out;
+   }
+   err = -EBUSY;
if (queue->backlog == >list)
queue->backlog = >list;
}
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 5e92bd2..3b3c154 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -39,6 +39,20 @@ struct algif_hash_tfm {
bool has_key;
 };
 
+/* Previous versions of crypto_* ops used to return -EBUSY
+ * rather than -EAGAIN to indicate being tied up. The in
+ * kernel API changed but we don't want to break the user
+ * space API. As only the hash user interface exposed this
+ * error ever to the user, do the translation here.
+ */
+static inline int crypto_user_err(int err)
+{
+   if (err == -EAGAIN)
+   return -EBUSY;
+
+   return err;
+}
+
 static int hash_alloc_result(struct sock *sk, struct hash_ctx *ctx)
 {
unsigned ds;
@@ -136,7 +150,7 @@ static int hash_sendmsg(struct socket *sock, struct msghdr 
*msg,
 unlock:
release_sock(sk);
 
-   return err ?: copied;
+   return err ? crypto_user_err(err) : copied;
 }
 
 static ssize_t hash_sendpage(struct socket *sock, struct page *page,
@@ -188,7 +202,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct 
page *page,
 unlock:
release_sock(sk);
 
-   return err ?: size;
+   return err ? crypto_user_err(err) : size;
 }
 
 static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
@@ -236,7 +250,7 @@ static int hash_recvmsg(struct socket *sock, struct msghdr 
*msg, size_t len,
hash_free_result(sk, ctx);
release_sock(sk);
 
-   return err ?: len;
+   return err ? crypto_user_err(err) : len;
 }
 
 static int hash_accept(struct socket *sock, struct socket *newsock, int flags,
diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 0508c48..d1dbdce 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -137,16 +137,14 @@ static int cryptd_enqueue_request(struct cryptd_queue 
*queue,
int cpu, err;
struct cryptd_cpu_queue *cpu_queue;
atomic_t *refcnt;
-   bool may_backlog;
 
cpu = get_cpu();
cpu_queue = this_cpu_ptr(queue->cpu_queue);
err = crypto_enqueue_request(_queue->queue, request);
 
refcnt = crypto_tfm_ctx(request->tfm);
-   may_backlog = request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG;
 
-   if (err == -EBUSY && !may_backlog)
+   if (err == -EAGAIN)
goto out_put_cpu;
 
queue_work_on(cpu, kcrypto_wq, _queue->work);
-- 
2.1.4



[PATCH v7 02/19] crypto: ccp: use -EAGAIN for transient busy indication

2017-08-24 Thread Gilad Ben-Yossef
Replace -EBUSY with -EAGAIN when reporting transient busy
indication in the absence of backlog.

Signed-off-by: Gilad Ben-Yossef 
Reviewed-by: Gary R Hook 
---
 drivers/crypto/ccp/ccp-crypto-main.c | 8 +++-
 drivers/crypto/ccp/ccp-dev.c | 7 +--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-crypto-main.c 
b/drivers/crypto/ccp/ccp-crypto-main.c
index 35a9de7..403ff0a 100644
--- a/drivers/crypto/ccp/ccp-crypto-main.c
+++ b/drivers/crypto/ccp/ccp-crypto-main.c
@@ -222,9 +222,10 @@ static int ccp_crypto_enqueue_cmd(struct ccp_crypto_cmd 
*crypto_cmd)
 
/* Check if the cmd can/should be queued */
if (req_queue.cmd_count >= CCP_CRYPTO_MAX_QLEN) {
-   ret = -EBUSY;
-   if (!(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG))
+   if (!(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG)) {
+   ret = -EAGAIN;
goto e_lock;
+   }
}
 
/* Look for an entry with the same tfm.  If there is a cmd
@@ -243,9 +244,6 @@ static int ccp_crypto_enqueue_cmd(struct ccp_crypto_cmd 
*crypto_cmd)
ret = ccp_enqueue_cmd(crypto_cmd->cmd);
if (!ccp_crypto_success(ret))
goto e_lock;/* Error, don't queue it */
-   if ((ret == -EBUSY) &&
-   !(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG))
-   goto e_lock;/* Not backlogging, don't queue it */
}
 
if (req_queue.cmd_count >= CCP_CRYPTO_MAX_QLEN) {
diff --git a/drivers/crypto/ccp/ccp-dev.c b/drivers/crypto/ccp/ccp-dev.c
index 4e029b1..3d637e3 100644
--- a/drivers/crypto/ccp/ccp-dev.c
+++ b/drivers/crypto/ccp/ccp-dev.c
@@ -292,9 +292,12 @@ int ccp_enqueue_cmd(struct ccp_cmd *cmd)
i = ccp->cmd_q_count;
 
if (ccp->cmd_count >= MAX_CMD_QLEN) {
-   ret = -EBUSY;
-   if (cmd->flags & CCP_CMD_MAY_BACKLOG)
+   if (cmd->flags & CCP_CMD_MAY_BACKLOG) {
+   ret = -EBUSY;
list_add_tail(>entry, >backlog);
+   } else {
+   ret = -EAGAIN;
+   }
} else {
ret = -EINPROGRESS;
ccp->cmd_count++;
-- 
2.1.4



[PATCH v7 03/19] crypto: remove redundant backlog checks on EBUSY

2017-08-24 Thread Gilad Ben-Yossef
Now that the -EBUSY return code only indicates backlog queueing,
we can safely remove the now-redundant check for the
CRYPTO_TFM_REQ_MAY_BACKLOG flag when -EBUSY is returned.
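
Under the new convention, the caller-side test collapses to a simple predicate. A hedged sketch (is_async_pending() is a hypothetical name chosen for illustration, not a kernel helper):

```c
#include <assert.h>
#include <errno.h>

/* Before this series, "operation still pending" had to be spelled
 *   err == -EINPROGRESS ||
 *   (err == -EBUSY && (flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
 * Now that -EBUSY by itself means "backlogged, the completion
 * callback will still fire", the flag test is redundant: */
static int is_async_pending(int err)
{
	return err == -EINPROGRESS || err == -EBUSY;
}
```

This is exactly the simplification each hunk below performs at its call site.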

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/ahash.c| 12 +++-
 crypto/cts.c  |  6 ++
 crypto/lrw.c  |  8 ++--
 crypto/rsa-pkcs1pad.c | 16 
 crypto/xts.c  |  8 ++--
 5 files changed, 13 insertions(+), 37 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 5e8666e..3a35d67 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -334,9 +334,7 @@ static int ahash_op_unaligned(struct ahash_request *req,
return err;
 
err = op(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
ahash_restore_req(req, err);
@@ -394,9 +392,7 @@ static int ahash_def_finup_finish1(struct ahash_request *req, int err)
req->base.complete = ahash_def_finup_done2;
 
err = crypto_ahash_reqtfm(req)->final(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
 out:
@@ -432,9 +428,7 @@ static int ahash_def_finup(struct ahash_request *req)
return err;
 
err = tfm->update(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
return ahash_def_finup_finish1(req, err);
diff --git a/crypto/cts.c b/crypto/cts.c
index 243f591..4773c18 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -136,8 +136,7 @@ static void crypto_cts_encrypt_done(struct crypto_async_request *areq, int err)
goto out;
 
err = cts_cbc_encrypt(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return;
 
 out:
@@ -229,8 +228,7 @@ static void crypto_cts_decrypt_done(struct crypto_async_request *areq, int err)
goto out;
 
err = cts_cbc_decrypt(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return;
 
 out:
diff --git a/crypto/lrw.c b/crypto/lrw.c
index a8bfae4..695cea9 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -328,9 +328,7 @@ static int do_encrypt(struct skcipher_request *req, int err)
  crypto_skcipher_encrypt(subreq) ?:
  post_crypt(req);
 
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY &&
-req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
}
 
@@ -380,9 +378,7 @@ static int do_decrypt(struct skcipher_request *req, int err)
  crypto_skcipher_decrypt(subreq) ?:
  post_crypt(req);
 
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY &&
-req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
}
 
diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
index 407c64b..2908f93 100644
--- a/crypto/rsa-pkcs1pad.c
+++ b/crypto/rsa-pkcs1pad.c
@@ -279,9 +279,7 @@ static int pkcs1pad_encrypt(struct akcipher_request *req)
   req->dst, ctx->key_size - 1, req->dst_len);
 
err = crypto_akcipher_encrypt(_ctx->child_req);
-   if (err != -EINPROGRESS &&
-   (err != -EBUSY ||
-!(req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err != -EINPROGRESS && err != -EBUSY)
return pkcs1pad_encrypt_sign_complete(req, err);
 
return err;
@@ -383,9 +381,7 @@ static int pkcs1pad_decrypt(struct akcipher_request *req)
   ctx->key_size);
 
err = crypto_akcipher_decrypt(_ctx->child_req);
-   if (err != -EINPROGRESS &&
-   (err != -EBUSY ||
-!(req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err != -EINPROGRESS && err != -EBUSY)
return pkcs1pad_decrypt_complete(req, err);
 
return err;
@@ -440,9 +436,7 @@ static int pkcs1pad_sign(struct akcipher_request *req)
   req->dst, ctx->key_size - 1, req->dst_len);
 
err = crypto_akcipher_sign(_ctx->child_req);
-   if 

[PATCH v7 04/19] crypto: marvell/cesa: remove redundant backlog checks on EBUSY

2017-08-24 Thread Gilad Ben-Yossef
Now that the -EBUSY return code only indicates backlog queueing,
we can safely remove the now-redundant check for the
CRYPTO_TFM_REQ_MAY_BACKLOG flag when -EBUSY is returned.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Boris Brezillon 
---
 drivers/crypto/marvell/cesa.c | 3 +--
 drivers/crypto/marvell/cesa.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
index 6e7a5c7..0c3909d 100644
--- a/drivers/crypto/marvell/cesa.c
+++ b/drivers/crypto/marvell/cesa.c
@@ -183,8 +183,7 @@ int mv_cesa_queue_req(struct crypto_async_request *req,
spin_lock_bh(>lock);
ret = crypto_enqueue_request(>queue, req);
if ((mv_cesa_req_get_type(creq) == CESA_DMA_REQ) &&
-   (ret == -EINPROGRESS ||
-   (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   (ret == -EINPROGRESS || ret == -EBUSY))
mv_cesa_tdma_chain(engine, creq);
spin_unlock_bh(>lock);
 
diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index b7872f6..63c8457 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -763,7 +763,7 @@ static inline int mv_cesa_req_needs_cleanup(struct crypto_async_request *req,
 * the backlog and will be processed later. There's no need to
 * clean it up.
 */
-   if (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)
+   if (ret == -EBUSY)
return false;
 
/* Request wasn't queued, we need to clean it up */
-- 
2.1.4



[PATCH v7 05/19] crypto: introduce crypto wait for async op

2017-08-24 Thread Gilad Ben-Yossef
Invoking a possibly asynchronous crypto op and waiting for its
completion while correctly handling backlog processing is a common
task both inside the crypto API implementation and for outside users of it.

This patch adds a generic implementation for doing so in
preparation for using it across the board instead of hand
rolled versions.
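
A userspace model of the pattern the new helpers capture is sketched below. struct completion and its operations are stubbed, and this crypto_req_done() takes the wait object directly rather than a crypto_async_request (the kernel version fetches it from req->data); it is an illustration of the control flow, not kernel source:

```c
#include <assert.h>
#include <errno.h>

/* Stubbed completion primitives; in the kernel these come from
 * <linux/completion.h> and the callback runs from interrupt context. */
struct completion { int done; };
static void complete(struct completion *c)            { c->done = 1; }
static void wait_for_completion(struct completion *c) { while (!c->done) ; }
static void reinit_completion(struct completion *c)   { c->done = 0; }

struct crypto_wait {
	struct completion completion;
	int err;
};

/* Completion callback: -EINPROGRESS is a non-final notification
 * (e.g. the request moved from backlog to queue) and must be ignored. */
static void crypto_req_done(struct crypto_wait *wait, int err)
{
	if (err == -EINPROGRESS)
		return;
	wait->err = err;
	complete(&wait->completion);
}

/* Fold the submit return code and the async result into one status:
 * -EINPROGRESS/-EBUSY mean "wait for the callback", anything else is
 * already the final status and is passed straight through. */
static int crypto_wait_req(int err, struct crypto_wait *wait)
{
	switch (err) {
	case -EINPROGRESS:
	case -EBUSY:
		wait_for_completion(&wait->completion);
		reinit_completion(&wait->completion);
		err = wait->err;
		break;
	}
	return err;
}
```

A caller then reduces to a single line such as err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); after installing crypto_req_done as the request callback with &wait as its data.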

Signed-off-by: Gilad Ben-Yossef 
CC: Eric Biggers 
CC: Jonathan Cameron 
---
 crypto/api.c   | 13 +
 include/linux/crypto.h | 40 
 2 files changed, 53 insertions(+)

diff --git a/crypto/api.c b/crypto/api.c
index 941cd4c..2a2479d 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 LIST_HEAD(crypto_alg_list);
@@ -595,5 +596,17 @@ int crypto_has_alg(const char *name, u32 type, u32 mask)
 }
 EXPORT_SYMBOL_GPL(crypto_has_alg);
 
+void crypto_req_done(struct crypto_async_request *req, int err)
+{
+   struct crypto_wait *wait = req->data;
+
+   if (err == -EINPROGRESS)
+   return;
+
+   wait->err = err;
+   complete(>completion);
+}
+EXPORT_SYMBOL_GPL(crypto_req_done);
+
 MODULE_DESCRIPTION("Cryptographic core API");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 84da997..78508ca 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Autoloaded crypto modules should only use a prefixed name to avoid allowing
@@ -468,6 +469,45 @@ struct crypto_alg {
 } CRYPTO_MINALIGN_ATTR;
 
 /*
+ * A helper struct for waiting for completion of async crypto ops
+ */
+struct crypto_wait {
+   struct completion completion;
+   int err;
+};
+
+/*
+ * Macro for declaring a crypto op async wait object on stack
+ */
+#define DECLARE_CRYPTO_WAIT(_wait) \
+   struct crypto_wait _wait = { \
+   COMPLETION_INITIALIZER_ONSTACK((_wait).completion), 0 }
+
+/*
+ * Async ops completion helper functions
+ */
+void crypto_req_done(struct crypto_async_request *req, int err);
+
+static inline int crypto_wait_req(int err, struct crypto_wait *wait)
+{
+   switch (err) {
+   case -EINPROGRESS:
+   case -EBUSY:
+   wait_for_completion(>completion);
+   reinit_completion(>completion);
+   err = wait->err;
+   break;
+   };
+
+   return err;
+}
+
+static inline void crypto_init_wait(struct crypto_wait *wait)
+{
+   init_completion(>completion);
+}
+
+/*
  * Algorithm registration interface.
  */
 int crypto_register_alg(struct crypto_alg *alg);
-- 
2.1.4



[PATCH v7 06/19] crypto: move algif to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
algif starts several async crypto ops and waits for their completion.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/af_alg.c | 27 ---
 crypto/algif_aead.c |  8 
 crypto/algif_hash.c | 30 ++
 crypto/algif_skcipher.c |  9 -
 include/crypto/if_alg.h | 15 +--
 5 files changed, 23 insertions(+), 66 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index ffa9f4c..cf312ed 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -481,33 +481,6 @@ int af_alg_cmsg_send(struct msghdr *msg, struct af_alg_control *con)
 }
 EXPORT_SYMBOL_GPL(af_alg_cmsg_send);
 
-int af_alg_wait_for_completion(int err, struct af_alg_completion *completion)
-{
-   switch (err) {
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   err = completion->err;
-   break;
-   };
-
-   return err;
-}
-EXPORT_SYMBOL_GPL(af_alg_wait_for_completion);
-
-void af_alg_complete(struct crypto_async_request *req, int err)
-{
-   struct af_alg_completion *completion = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   completion->err = err;
-   complete(>completion);
-}
-EXPORT_SYMBOL_GPL(af_alg_complete);
-
 /**
  * af_alg_alloc_tsgl - allocate the TX SGL
  *
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 516b38c..aacae08 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -278,11 +278,11 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
/* Synchronous operation */
aead_request_set_callback(>cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
- af_alg_complete, >completion);
-   err = af_alg_wait_for_completion(ctx->enc ?
+ crypto_req_done, >wait);
+   err = crypto_wait_req(ctx->enc ?
crypto_aead_encrypt(>cra_u.aead_req) :
crypto_aead_decrypt(>cra_u.aead_req),
->completion);
+   >wait);
}
 
/* AIO operation in progress */
@@ -554,7 +554,7 @@ static int aead_accept_parent_nokey(void *private, struct sock *sk)
ctx->merge = 0;
ctx->enc = 0;
ctx->aead_assoclen = 0;
-   af_alg_init_completion(>completion);
+   crypto_init_wait(>wait);
 
ask->private = ctx;
 
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 3b3c154..d2ab8de 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -26,7 +26,7 @@ struct hash_ctx {
 
u8 *result;
 
-   struct af_alg_completion completion;
+   struct crypto_wait wait;
 
unsigned int len;
bool more;
@@ -102,8 +102,7 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
if ((msg->msg_flags & MSG_MORE))
hash_free_result(sk, ctx);
 
-   err = af_alg_wait_for_completion(crypto_ahash_init(>req),
-   >completion);
+   err = crypto_wait_req(crypto_ahash_init(>req), >wait);
if (err)
goto unlock;
}
@@ -124,8 +123,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
 
ahash_request_set_crypt(>req, ctx->sgl.sg, NULL, len);
 
-   err = af_alg_wait_for_completion(crypto_ahash_update(>req),
->completion);
+   err = crypto_wait_req(crypto_ahash_update(>req),
+ >wait);
af_alg_free_sg(>sgl);
if (err)
goto unlock;
@@ -143,8 +142,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
goto unlock;
 
ahash_request_set_crypt(>req, NULL, ctx->result, 0);
-   err = af_alg_wait_for_completion(crypto_ahash_final(>req),
->completion);
+   err = crypto_wait_req(crypto_ahash_final(>req),
+ >wait);
}
 
 unlock:
@@ -185,7 +184,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page,
} else {
if (!ctx->more) {
err = crypto_ahash_init(>req);
-   err = af_alg_wait_for_completion(err, >completion);
+   err = crypto_wait_req(err, >wait);
if (err)
goto unlock;
}
@@ -193,7 +192,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page,
err = 

[PATCH v7 07/19] crypto: move pub key to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
public_key_verify_signature() is starting an async crypto op and
waiting for it to complete. Move it over to generic code doing
the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/asymmetric_keys/public_key.c | 28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/crypto/asymmetric_keys/public_key.c b/crypto/asymmetric_keys/public_key.c
index 3cd6e12..d916235 100644
--- a/crypto/asymmetric_keys/public_key.c
+++ b/crypto/asymmetric_keys/public_key.c
@@ -57,29 +57,13 @@ static void public_key_destroy(void *payload0, void *payload3)
public_key_signature_free(payload3);
 }
 
-struct public_key_completion {
-   struct completion completion;
-   int err;
-};
-
-static void public_key_verify_done(struct crypto_async_request *req, int err)
-{
-   struct public_key_completion *compl = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   compl->err = err;
-   complete(>completion);
-}
-
 /*
  * Verify a signature using a public key.
  */
 int public_key_verify_signature(const struct public_key *pkey,
const struct public_key_signature *sig)
 {
-   struct public_key_completion compl;
+   struct crypto_wait cwait;
struct crypto_akcipher *tfm;
struct akcipher_request *req;
struct scatterlist sig_sg, digest_sg;
@@ -131,20 +115,16 @@ int public_key_verify_signature(const struct public_key *pkey,
sg_init_one(_sg, output, outlen);
akcipher_request_set_crypt(req, _sg, _sg, sig->s_size,
   outlen);
-   init_completion();
+   crypto_init_wait();
akcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
  CRYPTO_TFM_REQ_MAY_SLEEP,
- public_key_verify_done, );
+ crypto_req_done, );
 
/* Perform the verification calculation.  This doesn't actually do the
 * verification, but rather calculates the hash expected by the
 * signature and returns that to us.
 */
-   ret = crypto_akcipher_verify(req);
-   if ((ret == -EINPROGRESS) || (ret == -EBUSY)) {
-   wait_for_completion();
-   ret = compl.err;
-   }
+   ret = crypto_wait_req(crypto_akcipher_verify(req), );
if (ret < 0)
goto out_free_output;
 
-- 
2.1.4



[PATCH v7 08/19] crypto: move drbg to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
DRBG starts an async crypto op and waits for it to complete.
Move it over to generic code doing the same.

The code now also passes the CRYPTO_TFM_REQ_MAY_SLEEP flag,
indicating that crypto request memory allocation may use GFP_KERNEL.
This should be perfectly fine, as the code is sleeping for the
completion of the request anyway.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/drbg.c | 36 +---
 include/crypto/drbg.h |  3 +--
 2 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/crypto/drbg.c b/crypto/drbg.c
index 633a88e..c522251 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -1651,16 +1651,6 @@ static int drbg_fini_sym_kernel(struct drbg_state *drbg)
return 0;
 }
 
-static void drbg_skcipher_cb(struct crypto_async_request *req, int error)
-{
-   struct drbg_state *drbg = req->data;
-
-   if (error == -EINPROGRESS)
-   return;
-   drbg->ctr_async_err = error;
-   complete(>ctr_completion);
-}
-
 static int drbg_init_sym_kernel(struct drbg_state *drbg)
 {
struct crypto_cipher *tfm;
@@ -1691,7 +1681,7 @@ static int drbg_init_sym_kernel(struct drbg_state *drbg)
return PTR_ERR(sk_tfm);
}
drbg->ctr_handle = sk_tfm;
-   init_completion(>ctr_completion);
+   crypto_init_wait(>ctr_wait);
 
req = skcipher_request_alloc(sk_tfm, GFP_KERNEL);
if (!req) {
@@ -1700,8 +1690,9 @@ static int drbg_init_sym_kernel(struct drbg_state *drbg)
return -ENOMEM;
}
drbg->ctr_req = req;
-   skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-   drbg_skcipher_cb, drbg);
+   skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+   CRYPTO_TFM_REQ_MAY_SLEEP,
+   crypto_req_done, >ctr_wait);
 
alignmask = crypto_skcipher_alignmask(sk_tfm);
drbg->ctr_null_value_buf = kzalloc(DRBG_CTR_NULL_LEN + alignmask,
@@ -1762,21 +1753,12 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
/* Output buffer may not be valid for SGL, use scratchpad */
skcipher_request_set_crypt(drbg->ctr_req, _in, _out,
   cryptlen, drbg->V);
-   ret = crypto_skcipher_encrypt(drbg->ctr_req);
-   switch (ret) {
-   case 0:
-   break;
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>ctr_completion);
-   if (!drbg->ctr_async_err) {
-   reinit_completion(>ctr_completion);
-   break;
-   }
-   default:
+   ret = crypto_wait_req(crypto_skcipher_encrypt(drbg->ctr_req),
+   >ctr_wait);
+   if (ret)
goto out;
-   }
-   init_completion(>ctr_completion);
+
+   crypto_init_wait(>ctr_wait);
 
memcpy(outbuf, drbg->outscratchpad, cryptlen);
 
diff --git a/include/crypto/drbg.h b/include/crypto/drbg.h
index 22f884c..8f94110 100644
--- a/include/crypto/drbg.h
+++ b/include/crypto/drbg.h
@@ -126,8 +126,7 @@ struct drbg_state {
__u8 *ctr_null_value;   /* CTR mode aligned zero buf */
__u8 *outscratchpadbuf; /* CTR mode output scratchpad */
 __u8 *outscratchpad;   /* CTR mode aligned outbuf */
-   struct completion ctr_completion;   /* CTR mode async handler */
-   int ctr_async_err;  /* CTR mode async error */
+   struct crypto_wait ctr_wait;/* CTR mode async wait obj */
 
bool seeded;/* DRBG fully seeded? */
bool pr;/* Prediction resistance enabled? */
-- 
2.1.4



[PATCH v7 10/19] crypto: move testmgr to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
testmgr starts async crypto ops and waits for them to complete.
Move it over to generic code doing the same.

This also provides a test of the generic crypto async. wait code.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/testmgr.c | 204 ++-
 1 file changed, 66 insertions(+), 138 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 8a124d3..af968f4 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -76,11 +76,6 @@ int alg_test(const char *driver, const char *alg, u32 type, u32 mask)
 #define ENCRYPT 1
 #define DECRYPT 0
 
-struct tcrypt_result {
-   struct completion completion;
-   int err;
-};
-
 struct aead_test_suite {
struct {
const struct aead_testvec *vecs;
@@ -155,17 +150,6 @@ static void hexdump(unsigned char *buf, unsigned int len)
buf, len, false);
 }
 
-static void tcrypt_complete(struct crypto_async_request *req, int err)
-{
-   struct tcrypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static int testmgr_alloc_buf(char *buf[XBUFSIZE])
 {
int i;
@@ -193,20 +177,10 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
free_page((unsigned long)buf[i]);
 }
 
-static int wait_async_op(struct tcrypt_result *tr, int ret)
-{
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   ret = tr->err;
-   }
-   return ret;
-}
-
 static int ahash_partial_update(struct ahash_request **preq,
struct crypto_ahash *tfm, const struct hash_testvec *template,
void *hash_buff, int k, int temp, struct scatterlist *sg,
-   const char *algo, char *result, struct tcrypt_result *tresult)
+   const char *algo, char *result, struct crypto_wait *wait)
 {
char *state;
struct ahash_request *req;
@@ -236,7 +210,7 @@ static int ahash_partial_update(struct ahash_request **preq,
}
ahash_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
-   tcrypt_complete, tresult);
+   crypto_req_done, wait);
 
memcpy(hash_buff, template->plaintext + temp,
template->tap[k]);
@@ -247,7 +221,7 @@ static int ahash_partial_update(struct ahash_request **preq,
pr_err("alg: hash: Failed to import() for %s\n", algo);
goto out;
}
-   ret = wait_async_op(tresult, crypto_ahash_update(req));
+   ret = crypto_wait_req(crypto_ahash_update(req), wait);
if (ret)
goto out;
*preq = req;
@@ -272,7 +246,7 @@ static int __test_hash(struct crypto_ahash *tfm,
char *result;
char *key;
struct ahash_request *req;
-   struct tcrypt_result tresult;
+   struct crypto_wait wait;
void *hash_buff;
char *xbuf[XBUFSIZE];
int ret = -ENOMEM;
@@ -286,7 +260,7 @@ static int __test_hash(struct crypto_ahash *tfm,
if (testmgr_alloc_buf(xbuf))
goto out_nobuf;
 
-   init_completion();
+   crypto_init_wait();
 
req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!req) {
@@ -295,7 +269,7 @@ static int __test_hash(struct crypto_ahash *tfm,
goto out_noreq;
}
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  tcrypt_complete, );
+  crypto_req_done, );
 
j = 0;
for (i = 0; i < tcount; i++) {
@@ -335,26 +309,26 @@ static int __test_hash(struct crypto_ahash *tfm,
 
ahash_request_set_crypt(req, sg, result, template[i].psize);
if (use_digest) {
-   ret = wait_async_op(, crypto_ahash_digest(req));
+   ret = crypto_wait_req(crypto_ahash_digest(req), );
if (ret) {
pr_err("alg: hash: digest failed on test %d "
   "for %s: ret=%d\n", j, algo, -ret);
goto out;
}
} else {
-   ret = wait_async_op(, crypto_ahash_init(req));
+   ret = crypto_wait_req(crypto_ahash_init(req), );
if (ret) {
pr_err("alg: hash: init failed on test %d "
   "for %s: ret=%d\n", j, algo, -ret);
goto out;
}
-   ret = wait_async_op(, crypto_ahash_update(req));
+   ret = crypto_wait_req(crypto_ahash_update(req), );
if (ret) {
pr_err("alg: hash: update failed on test %d "
  

[PATCH v7 11/19] fscrypt: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
fscrypt starts several async crypto ops and waits for them to
complete. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 fs/crypto/crypto.c  | 28 
 fs/crypto/fname.c   | 36 ++--
 fs/crypto/fscrypt_private.h | 10 --
 fs/crypto/keyinfo.c | 21 +++--
 4 files changed, 13 insertions(+), 82 deletions(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index c7835df..80a3cad 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -126,21 +126,6 @@ struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode, gfp_t gfp_flags)
 }
 EXPORT_SYMBOL(fscrypt_get_ctx);
 
-/**
- * page_crypt_complete() - completion callback for page crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void page_crypt_complete(struct crypto_async_request *req, int res)
-{
-   struct fscrypt_completion_result *ecr = req->data;
-
-   if (res == -EINPROGRESS)
-   return;
-   ecr->res = res;
-   complete(>completion);
-}
-
 int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
   u64 lblk_num, struct page *src_page,
   struct page *dest_page, unsigned int len,
@@ -151,7 +136,7 @@ int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
u8 padding[FS_IV_SIZE - sizeof(__le64)];
} iv;
struct skcipher_request *req = NULL;
-   DECLARE_FS_COMPLETION_RESULT(ecr);
+   DECLARE_CRYPTO_WAIT(wait);
struct scatterlist dst, src;
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
@@ -179,7 +164,7 @@ int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
 
skcipher_request_set_callback(
req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-   page_crypt_complete, );
+   crypto_req_done, );
 
sg_init_table(, 1);
sg_set_page(, dest_page, len, offs);
@@ -187,14 +172,9 @@ int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
sg_set_page(, src_page, len, offs);
skcipher_request_set_crypt(req, , , len, );
if (rw == FS_DECRYPT)
-   res = crypto_skcipher_decrypt(req);
+   res = crypto_wait_req(crypto_skcipher_decrypt(req), );
else
-   res = crypto_skcipher_encrypt(req);
-   if (res == -EINPROGRESS || res == -EBUSY) {
-   BUG_ON(req->base.data != );
-   wait_for_completion();
-   res = ecr.res;
-   }
+   res = crypto_wait_req(crypto_skcipher_encrypt(req), );
skcipher_request_free(req);
if (res) {
printk_ratelimited(KERN_ERR
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index ad9f814..a80a0d3 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -15,21 +15,6 @@
 #include "fscrypt_private.h"
 
 /**
- * fname_crypt_complete() - completion callback for filename crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void fname_crypt_complete(struct crypto_async_request *req, int res)
-{
-   struct fscrypt_completion_result *ecr = req->data;
-
-   if (res == -EINPROGRESS)
-   return;
-   ecr->res = res;
-   complete(>completion);
-}
-
-/**
  * fname_encrypt() - encrypt a filename
  *
  * The caller must have allocated sufficient memory for the @oname string.
@@ -40,7 +25,7 @@ static int fname_encrypt(struct inode *inode,
const struct qstr *iname, struct fscrypt_str *oname)
 {
struct skcipher_request *req = NULL;
-   DECLARE_FS_COMPLETION_RESULT(ecr);
+   DECLARE_CRYPTO_WAIT(wait);
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
int res = 0;
@@ -76,17 +61,12 @@ static int fname_encrypt(struct inode *inode,
}
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-   fname_crypt_complete, );
+   crypto_req_done, );
sg_init_one(, oname->name, cryptlen);
skcipher_request_set_crypt(req, , , cryptlen, iv);
 
/* Do the encryption */
-   res = crypto_skcipher_encrypt(req);
-   if (res == -EINPROGRESS || res == -EBUSY) {
-   /* Request is being completed asynchronously; wait for it */
-   wait_for_completion();
-   res = ecr.res;
-   }
+   res = crypto_wait_req(crypto_skcipher_encrypt(req), );
skcipher_request_free(req);
if (res < 0) {
printk_ratelimited(KERN_ERR
@@ -110,7 +90,7 @@ static int fname_decrypt(struct inode *inode,

[PATCH v7 13/19] cifs: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
cifs starts an async crypto op and waits for its completion.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Pavel Shilovsky 
---
 fs/cifs/smb2ops.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index cfacf2c..16fb041 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -1878,22 +1878,6 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
return sg;
 }
 
-struct cifs_crypt_result {
-   int err;
-   struct completion completion;
-};
-
-static void cifs_crypt_complete(struct crypto_async_request *req, int err)
-{
-   struct cifs_crypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static int
 smb2_get_enc_key(struct TCP_Server_Info *server, __u64 ses_id, int enc, u8 *key)
 {
@@ -1934,12 +1918,10 @@ crypt_message(struct TCP_Server_Info *server, struct smb_rqst *rqst, int enc)
struct aead_request *req;
char *iv;
unsigned int iv_len;
-   struct cifs_crypt_result result = {0, };
+   DECLARE_CRYPTO_WAIT(wait);
struct crypto_aead *tfm;
unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
 
-   init_completion();
-
rc = smb2_get_enc_key(server, tr_hdr->SessionId, enc, key);
if (rc) {
cifs_dbg(VFS, "%s: Could not get %scryption key\n", __func__,
@@ -1999,14 +1981,10 @@ crypt_message(struct TCP_Server_Info *server, struct smb_rqst *rqst, int enc)
aead_request_set_ad(req, assoc_data_len);
 
aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- cifs_crypt_complete, );
+ crypto_req_done, );
 
-   rc = enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req);
-
-   if (rc == -EINPROGRESS || rc == -EBUSY) {
-   wait_for_completion();
-   rc = result.err;
-   }
+   rc = crypto_wait_req(enc ? crypto_aead_encrypt(req)
+   : crypto_aead_decrypt(req), );
 
if (!rc && enc)
memcpy(_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);
-- 
2.1.4



[PATCH v7 14/19] ima: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
ima starts several async crypto ops and waits for their completion.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Mimi Zohar 
---
 security/integrity/ima/ima_crypto.c | 56 +++--
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index a856d8c..9057b16 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -27,11 +27,6 @@
 
 #include "ima.h"
 
-struct ahash_completion {
-   struct completion completion;
-   int err;
-};
-
 /* minimum file size for ahash use */
 static unsigned long ima_ahash_minsize;
 module_param_named(ahash_minsize, ima_ahash_minsize, ulong, 0644);
@@ -196,30 +191,13 @@ static void ima_free_atfm(struct crypto_ahash *tfm)
crypto_free_ahash(tfm);
 }
 
-static void ahash_complete(struct crypto_async_request *req, int err)
+static inline int ahash_wait(int err, struct crypto_wait *wait)
 {
-   struct ahash_completion *res = req->data;
 
-   if (err == -EINPROGRESS)
-   return;
-   res->err = err;
-   complete(>completion);
-}
+   err = crypto_wait_req(err, wait);
 
-static int ahash_wait(int err, struct ahash_completion *res)
-{
-   switch (err) {
-   case 0:
-   break;
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   err = res->err;
-   /* fall through */
-   default:
+   if (err)
pr_crit_ratelimited("ahash calculation failed: err: %d\n", err);
-   }
 
return err;
 }
@@ -233,7 +211,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
int rc, read = 0, rbuf_len, active = 0, ahash_rc = 0;
struct ahash_request *req;
struct scatterlist sg[1];
-   struct ahash_completion res;
+   struct crypto_wait wait;
size_t rbuf_size[2];
 
hash->length = crypto_ahash_digestsize(tfm);
@@ -242,12 +220,12 @@ static int ima_calc_file_hash_atfm(struct file *file,
if (!req)
return -ENOMEM;
 
-   init_completion();
+   crypto_init_wait();
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP,
-  ahash_complete, );
+  crypto_req_done, );
 
-   rc = ahash_wait(crypto_ahash_init(req), );
+   rc = ahash_wait(crypto_ahash_init(req), );
if (rc)
goto out1;
 
@@ -288,7 +266,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 * read/request, wait for the completion of the
 * previous ahash_update() request.
 */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
if (rc)
goto out3;
}
@@ -304,7 +282,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 * read/request, wait for the completion of the
 * previous ahash_update() request.
 */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
if (rc)
goto out3;
}
@@ -318,7 +296,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
active = !active; /* swap buffers, if we use two */
}
/* wait for the last update request to complete */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
 out3:
if (read)
file->f_mode &= ~FMODE_READ;
@@ -327,7 +305,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 out2:
if (!rc) {
ahash_request_set_crypt(req, NULL, hash->digest, 0);
-   rc = ahash_wait(crypto_ahash_final(req), &res);
+   rc = ahash_wait(crypto_ahash_final(req), &wait);
}
 out1:
ahash_request_free(req);
@@ -537,7 +515,7 @@ static int calc_buffer_ahash_atfm(const void *buf, loff_t 
len,
 {
struct ahash_request *req;
struct scatterlist sg;
-   struct ahash_completion res;
+   struct crypto_wait wait;
int rc, ahash_rc = 0;
 
hash->length = crypto_ahash_digestsize(tfm);
@@ -546,12 +524,12 @@ static int calc_buffer_ahash_atfm(const void *buf, loff_t 
len,
if (!req)
return -ENOMEM;
 
-   init_completion(&res.completion);
+   crypto_init_wait(&wait);
	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
	   CRYPTO_TFM_REQ_MAY_SLEEP,
-  ahash_complete, &res);
+  crypto_req_done, &wait);

[PATCH v7 15/19] crypto: tcrypt: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
tcrypt starts several async crypto ops and waits for their completions.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/tcrypt.c | 84 +
 1 file changed, 25 insertions(+), 59 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 0022a18..802aa81 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -79,34 +79,11 @@ static char *check[] = {
NULL
 };
 
-struct tcrypt_result {
-   struct completion completion;
-   int err;
-};
-
-static void tcrypt_complete(struct crypto_async_request *req, int err)
-{
-   struct tcrypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(&res->completion);
-}
-
 static inline int do_one_aead_op(struct aead_request *req, int ret)
 {
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   struct tcrypt_result *tr = req->base.data;
+   struct crypto_wait *wait = req->base.data;
 
-   ret = wait_for_completion_interruptible(&tr->completion);
-   if (!ret)
-   ret = tr->err;
-   reinit_completion(&tr->completion);
-   }
-
-   return ret;
+   return crypto_wait_req(ret, wait);
 }
 
 static int test_aead_jiffies(struct aead_request *req, int enc,
@@ -248,7 +225,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
char *axbuf[XBUFSIZE];
unsigned int *b_size;
unsigned int iv_len;
-   struct tcrypt_result result;
+   struct crypto_wait wait;
 
iv = kzalloc(MAX_IVLEN, GFP_KERNEL);
if (!iv)
@@ -284,7 +261,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
goto out_notfm;
}
 
-   init_completion(&result.completion);
+   crypto_init_wait(&wait);
printk(KERN_INFO "\ntesting speed of %s (%s) %s\n", algo,
get_driver_name(crypto_aead, tfm), e);
 
@@ -296,7 +273,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
}
 
aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- tcrypt_complete, &result);
+ crypto_req_done, &wait);
 
i = 0;
do {
@@ -397,21 +374,16 @@ static void test_hash_sg_init(struct scatterlist *sg)
 
 static inline int do_one_ahash_op(struct ahash_request *req, int ret)
 {
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   struct tcrypt_result *tr = req->base.data;
+   struct crypto_wait *wait = req->base.data;
 
-   wait_for_completion(&tr->completion);
-   reinit_completion(&tr->completion);
-   ret = tr->err;
-   }
-   return ret;
+   return crypto_wait_req(ret, wait);
 }
 
 struct test_mb_ahash_data {
struct scatterlist sg[TVMEMSIZE];
char result[64];
struct ahash_request *req;
-   struct tcrypt_result tresult;
+   struct crypto_wait wait;
char *xbuf[XBUFSIZE];
 };
 
@@ -440,7 +412,7 @@ static void test_mb_ahash_speed(const char *algo, unsigned 
int sec,
if (testmgr_alloc_buf(data[i].xbuf))
goto out;
 
-   init_completion(&data[i].tresult.completion);
+   crypto_init_wait(&data[i].wait);
 
data[i].req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!data[i].req) {
@@ -449,8 +421,8 @@ static void test_mb_ahash_speed(const char *algo, unsigned 
int sec,
goto out;
}
 
-   ahash_request_set_callback(data[i].req, 0,
-  tcrypt_complete, &data[i].tresult);
+   ahash_request_set_callback(data[i].req, 0, crypto_req_done,
+  &data[i].wait);
test_hash_sg_init(data[i].sg);
}
 
@@ -492,16 +464,16 @@ static void test_mb_ahash_speed(const char *algo, 
unsigned int sec,
if (ret)
break;
 
-   complete(&data[k].tresult.completion);
-   data[k].tresult.err = 0;
+   crypto_req_done(&data[k].req->base, 0);
}
 
for (j = 0; j < k; j++) {
-   struct tcrypt_result *tr = &data[j].tresult;
+   struct crypto_wait *wait = &data[j].wait;
+   int wait_ret;
 
-   wait_for_completion(&tr->completion);
-   if (tr->err)
-   ret = tr->err;
+   wait_ret = crypto_wait_req(-EINPROGRESS, wait);
+   if (wait_ret)
+   ret = wait_ret;
}
 
end = get_cycles();
@@ -679,7 +651,7 @@ static void test_ahash_speed_common(const char *algo, 
unsigned int secs,

[PATCH v7 17/19] crypto: qce: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
The qce driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 drivers/crypto/qce/sha.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/qce/sha.c b/drivers/crypto/qce/sha.c
index 47e114a..53227d7 100644
--- a/drivers/crypto/qce/sha.c
+++ b/drivers/crypto/qce/sha.c
@@ -349,28 +349,12 @@ static int qce_ahash_digest(struct ahash_request *req)
return qce->async_req_enqueue(tmpl->qce, &req->base);
 }
 
-struct qce_ahash_result {
-   struct completion completion;
-   int error;
-};
-
-static void qce_digest_complete(struct crypto_async_request *req, int error)
-{
-   struct qce_ahash_result *result = req->data;
-
-   if (error == -EINPROGRESS)
-   return;
-
-   result->error = error;
-   complete(&result->completion);
-}
-
 static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, const u8 *key,
 unsigned int keylen)
 {
unsigned int digestsize = crypto_ahash_digestsize(tfm);
struct qce_sha_ctx *ctx = crypto_tfm_ctx(&tfm->base);
-   struct qce_ahash_result result;
+   struct crypto_wait wait;
struct ahash_request *req;
struct scatterlist sg;
unsigned int blocksize;
@@ -405,9 +389,9 @@ static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, 
const u8 *key,
goto err_free_ahash;
}
 
-   init_completion(&result.completion);
+   crypto_init_wait(&wait);
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  qce_digest_complete, &result);
+  crypto_req_done, &wait);
crypto_ahash_clear_flags(ahash_tfm, ~0);
 
buf = kzalloc(keylen + QCE_MAX_ALIGN_SIZE, GFP_KERNEL);
@@ -420,13 +404,7 @@ static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, 
const u8 *key,
sg_init_one(&sg, buf, keylen);
ahash_request_set_crypt(req, &sg, ctx->authkey, keylen);
 
-   ret = crypto_ahash_digest(req);
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   ret = wait_for_completion_interruptible(&result.completion);
-   if (!ret)
-   ret = result.error;
-   }
-
+   ret = crypto_wait_req(crypto_ahash_digest(req), &wait);
if (ret)
crypto_ahash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 18/19] crypto: mediatek: move to generic async completion

2017-08-24 Thread Gilad Ben-Yossef
The mediatek driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Ryder Lee 
---
 drivers/crypto/mediatek/mtk-aes.c | 31 +--
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/mediatek/mtk-aes.c 
b/drivers/crypto/mediatek/mtk-aes.c
index 9e845e8..e2c7c95 100644
--- a/drivers/crypto/mediatek/mtk-aes.c
+++ b/drivers/crypto/mediatek/mtk-aes.c
@@ -137,11 +137,6 @@ struct mtk_aes_gcm_ctx {
struct crypto_skcipher *ctr;
 };
 
-struct mtk_aes_gcm_setkey_result {
-   int err;
-   struct completion completion;
-};
-
 struct mtk_aes_drv {
struct list_head dev_list;
/* Device list lock */
@@ -936,17 +931,6 @@ static int mtk_aes_gcm_crypt(struct aead_request *req, u64 
mode)
&req->base);
 }
 
-static void mtk_gcm_setkey_done(struct crypto_async_request *req, int err)
-{
-   struct mtk_aes_gcm_setkey_result *result = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   result->err = err;
-   complete(&result->completion);
-}
-
 /*
  * Because of the hardware limitation, we need to pre-calculate key(H)
  * for the GHASH operation. The result of the encryption operation
@@ -962,7 +946,7 @@ static int mtk_aes_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
u32 hash[4];
u8 iv[8];
 
-   struct mtk_aes_gcm_setkey_result result;
+   struct crypto_wait wait;
 
struct scatterlist sg[1];
struct skcipher_request req;
@@ -1002,22 +986,17 @@ static int mtk_aes_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
if (!data)
return -ENOMEM;
 
-   init_completion(&data->result.completion);
+   crypto_init_wait(&data->wait);
sg_init_one(data->sg, &data->hash, AES_BLOCK_SIZE);
skcipher_request_set_tfm(&data->req, ctr);
skcipher_request_set_callback(&data->req, CRYPTO_TFM_REQ_MAY_SLEEP |
  CRYPTO_TFM_REQ_MAY_BACKLOG,
- mtk_gcm_setkey_done, &data->result);
+ crypto_req_done, &data->wait);
skcipher_request_set_crypt(&data->req, data->sg, data->sg,
   AES_BLOCK_SIZE, data->iv);
 
-   err = crypto_skcipher_encrypt(&data->req);
-   if (err == -EINPROGRESS || err == -EBUSY) {
-   err = wait_for_completion_interruptible(
-   &data->result.completion);
-   if (!err)
-   err = data->result.err;
-   }
+   err = crypto_wait_req(crypto_skcipher_encrypt(&data->req),
+ &data->wait);
if (err)
goto out;
 
-- 
2.1.4



[PATCH v7 19/19] crypto: adapt api sample to use async. op wait

2017-08-24 Thread Gilad Ben-Yossef
The code sample is waiting for an async. crypto op completion.
Adapt sample to use the new generic infrastructure to do the same.

This also fixes a possible data corruption bug created by the
use of wait_for_completion_interruptible() without dealing
correctly with an interrupt aborting the wait prior to the
async op finishing.

Signed-off-by: Gilad Ben-Yossef 
---
 Documentation/crypto/api-samples.rst | 52 +++-
 1 file changed, 10 insertions(+), 42 deletions(-)

diff --git a/Documentation/crypto/api-samples.rst 
b/Documentation/crypto/api-samples.rst
index 2531948..006827e 100644
--- a/Documentation/crypto/api-samples.rst
+++ b/Documentation/crypto/api-samples.rst
@@ -7,59 +7,27 @@ Code Example For Symmetric Key Cipher Operation
 ::
 
 
-struct tcrypt_result {
-struct completion completion;
-int err;
-};
-
 /* tie all data structures together */
 struct skcipher_def {
 struct scatterlist sg;
 struct crypto_skcipher *tfm;
 struct skcipher_request *req;
-struct tcrypt_result result;
+struct crypto_wait wait;
 };
 
-/* Callback function */
-static void test_skcipher_cb(struct crypto_async_request *req, int error)
-{
-struct tcrypt_result *result = req->data;
-
-if (error == -EINPROGRESS)
-return;
-result->err = error;
-complete(&result->completion);
-pr_info("Encryption finished successfully\n");
-}
-
 /* Perform cipher operation */
 static unsigned int test_skcipher_encdec(struct skcipher_def *sk,
  int enc)
 {
-int rc = 0;
+int rc;
 
 if (enc)
-rc = crypto_skcipher_encrypt(sk->req);
+rc = crypto_wait_req(crypto_skcipher_encrypt(sk->req), &sk->wait);
 else
-rc = crypto_skcipher_decrypt(sk->req);
-
-switch (rc) {
-case 0:
-break;
-case -EINPROGRESS:
-case -EBUSY:
-rc = wait_for_completion_interruptible(
-&sk->result.completion);
-if (!rc && !sk->result.err) {
-reinit_completion(&sk->result.completion);
-break;
-}
-default:
-pr_info("skcipher encrypt returned with %d result %d\n",
-rc, sk->result.err);
-break;
-}
-init_completion(&sk->result.completion);
+rc = crypto_wait_req(crypto_skcipher_decrypt(sk->req), &sk->wait);
+
+   if (rc)
+   pr_info("skcipher encrypt returned with result %d\n", rc);
 
 return rc;
 }
@@ -89,8 +57,8 @@ Code Example For Symmetric Key Cipher Operation
 }
 
 skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  test_skcipher_cb,
-  &sk.result);
+  crypto_req_done,
+  &sk.wait);
 
 /* AES 256 with random key */
get_random_bytes(&key, 32);
@@ -122,7 +90,7 @@ Code Example For Symmetric Key Cipher Operation
 /* We encrypt one block */
sg_init_one(&sk.sg, scratchpad, 16);
skcipher_request_set_crypt(req, &sk.sg, &sk.sg, 16, ivdata);
-init_completion(&sk.result.completion);
+crypto_init_wait(&sk.wait);
 
 /* encrypt data */
 ret = test_skcipher_encdec(, 1);
-- 
2.1.4



[PATCH v7 00/19] simplify crypto wait for async op

2017-08-24 Thread Gilad Ben-Yossef
Many users of kernel async. crypto services have a pattern of
starting an async. crypto op and then using a completion
to wait for it to end.

This patch set simplifies this common use case in two ways:

First, by separating the return codes of the case where a
request is queued to a backlog due to the provider being
busy (-EBUSY) from the case the request has failed due
to the provider being busy and backlogging is not enabled
(-EAGAIN).

Next, this change is then built upon to create a generic API
to wait for an async. crypto operation to complete.

The end result is a smaller code base and an API that is
easier to use and more difficult to get wrong.

The patch set was boot tested on x86_64 and arm64 which
at the very least tests the crypto users via testmgr and
tcrypt but I do note that I do not have access to some
of the HW whose drivers are modified nor do I claim I was
able to test all of the corner cases.

The patch set is based upon linux-next release tagged
next-20170824.

Changes from v6:
- Fix brown paper bag compile error on marvell/cesa
  code.

Changes from v5:
- Remove redundant new line as spotted by Jonathan
  Cameron.
- Reworded dm-verity change commit message to better
  clarify potential issue averted by change as
  pointed out by Mikulas Patocka.

Changes from v4:
- Rebase on top of latest algif changes from Stephan
  Mueller.
- Fix typo in ccp patch title.

Changes from v3:
- Instead of changing the return code to indicate
  backlog queueing, change the return code to indicate
  transient busy state, as suggested by Herbert Xu.

Changes from v2:
- Patch title changed from "introduce crypto wait for
  async op" to better reflect the current state.
- Rebase on top of latest linux-next.
- Add a new return code of -EIOCBQUEUED for backlog
  queueing, as suggested by Herbert Xu.
- Transform more users to the new API.
- Update the drbg change to account for new init as
  indicated by Stephan Muller.

Changes from v1:
- Address review comments from Eric Biggers.
- Separated out bug fixes of existing code and rebase
  on top of that patch set.
- Rename 'ecr' to 'wait' in fscrypto code.
- Split patch introducing the new API from the change
  moving over the algif code which it originated from
  to the new API.
- Inline crypto_wait_req().
- Some code indentation fixes.

Gilad Ben-Yossef (19):
  crypto: change transient busy return code to -EAGAIN
  crypto: ccp: use -EAGAIN for transient busy indication
  crypto: remove redundant backlog checks on EBUSY
  crypto: marvell/cesa: remove redundant backlog checks on EBUSY
  crypto: introduce crypto wait for async op
  crypto: move algif to generic async completion
  crypto: move pub key to generic async completion
  crypto: move drbg to generic async completion
  crypto: move gcm to generic async completion
  crypto: move testmgr to generic async completion
  fscrypt: move to generic async completion
  dm: move dm-verity to generic async completion
  cifs: move to generic async completion
  ima: move to generic async completion
  crypto: tcrypt: move to generic async completion
  crypto: talitos: move to generic async completion
  crypto: qce: move to generic async completion
  crypto: mediatek: move to generic async completion
  crypto: adapt api sample to use async. op wait

 Documentation/crypto/api-samples.rst |  52 ++---
 crypto/af_alg.c  |  27 -
 crypto/ahash.c   |  12 +--
 crypto/algapi.c  |   6 +-
 crypto/algif_aead.c  |   8 +-
 crypto/algif_hash.c  |  50 +
 crypto/algif_skcipher.c  |   9 +-
 crypto/api.c |  13 +++
 crypto/asymmetric_keys/public_key.c  |  28 +
 crypto/cryptd.c  |   4 +-
 crypto/cts.c |   6 +-
 crypto/drbg.c|  36 ++-
 crypto/gcm.c |  32 ++
 crypto/lrw.c |   8 +-
 crypto/rsa-pkcs1pad.c|  16 +--
 crypto/tcrypt.c  |  84 +--
 crypto/testmgr.c | 204 ---
 crypto/xts.c |   8 +-
 drivers/crypto/ccp/ccp-crypto-main.c |   8 +-
 drivers/crypto/ccp/ccp-dev.c |   7 +-
 drivers/crypto/marvell/cesa.c|   3 +-
 drivers/crypto/marvell/cesa.h|   2 +-
 drivers/crypto/mediatek/mtk-aes.c|  31 +-
 drivers/crypto/qce/sha.c |  30 +-
 drivers/crypto/talitos.c |  38 +--
 drivers/md/dm-verity-target.c|  81 --
 drivers/md/dm-verity.h   |   5 -
 fs/cifs/smb2ops.c|  30 +-
 fs/crypto/crypto.c   |  28 +
 fs/crypto/fname.c|  36 ++-
 fs/crypto/fscrypt_private.h  |  10 --
 fs/crypto/keyinfo.c  |  21 +---
 include/crypto/drbg.h|   3 +-
 include/crypt

Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Michal Hocko
On Thu 24-08-17 14:58:42, Roman Gushchin wrote:
> On Thu, Aug 24, 2017 at 02:58:11PM +0200, Michal Hocko wrote:
> > On Thu 24-08-17 13:28:46, Roman Gushchin wrote:
> > > Hi Michal!
> > > 
> > There is nothing like a "better victim". We are pretty much in a
> > catastrophic situation when we try to survive by killing a userspace.
> 
> Not necessarily, it can be a cgroup OOM.

memcg OOM is no different. The catastrophe is scoped to the specific
hierarchy but tasks in that hierarchy still fail to make further
progress.

> > We try to kill the largest because that assumes that we return the
> > most memory from it. Now I do understand that you want to treat the
> > memcg as a single killable entity but I find it really questionable
> > to do a per-memcg metric and then do not treat it like that and kill
> > only a single task. Just imagine a single memcg with zillions of taks
> > each very small and you select it as the largest while a small taks
> > itself doesn't help to help to get us out of the OOM.
> 
> I don't think it's different from a non-containerized state: if you
> have a zillion of small tasks in the system, you'll meet the same issues.

Yes this is possible but usually you are comparing apples to apples so
you will kill the largest offender and then go on. To be honest I really
do hate how we try to kill a child rather than the selected victim
for the same reason.

> > > > I guess I have asked already and we haven't reached any consensus. I do
> > > > not like how you treat memcgs and tasks differently. Why cannot we have
> > > > a memcg score a sum of all its tasks?
> > > 
> > > It sounds like a more expensive way to get almost the same with less 
> > > accuracy.
> > > Why it's better?
> > 
> > because then you are comparing apples to apples?
> 
> Well, I can say that I compare some number of pages against some other number
> of pages. And the relation between a page and memcg is more obvious, than a
> relation between a page and a process.

But you are comparing different accounting systems.
 
> Both ways are not ideal, and sum of the processes is not ideal too.
> Especially, if you take oom_score_adj into account. Will you respect it?

Yes, and I do not see any reason why we shouldn't.

> I've started actually with such approach, but then found it weird.
> 
> > Besides that you have
> > to check each task for over-killing anyway. So I do not see any
> > performance merits here.
> 
> It's an implementation detail, and we can hopefully get rid of it at some 
> point.

Well, we might do some estimations and ignore oom scopes but I that
sounds really complicated and error prone. Unless we have anything like
that then I would start from tasks and build up the necessary to make a
decision at the higher level.
 
> > > > How do you want to compare memcg score with tasks score?
> > > 
> > > I have to do it for tasks in root cgroups, but it shouldn't be a common 
> > > case.
> > 
> > How come? I can easily imagine a setup where only some memcgs which
> > really do need a kill-all semantic while all others can live with single
> > task killed perfectly fine.
> 
> I mean taking a unified cgroup hierarchy into account, there should not
> be a lot of tasks in the root cgroup, if any.

Is that really the case? I would assume that memory controller would be
enabled only in those subtrees which really use the functionality and
the rest will be sitting in the root memcg. It might be the case if you
are running only containers but I am not really sure this is true in
general.
-- 
Michal Hocko
SUSE Labs


Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Roman Gushchin
On Thu, Aug 24, 2017 at 02:58:11PM +0200, Michal Hocko wrote:
> On Thu 24-08-17 13:28:46, Roman Gushchin wrote:
> > Hi Michal!
> > 
> There is nothing like a "better victim". We are pretty much in a
> catastrophic situation when we try to survive by killing a userspace.

Not necessarily, it can be a cgroup OOM.

> We try to kill the largest because that assumes that we return the
> most memory from it. Now I do understand that you want to treat the
> memcg as a single killable entity but I find it really questionable
> to do a per-memcg metric and then do not treat it like that and kill
> only a single task. Just imagine a single memcg with zillions of tasks
> each very small and you select it as the largest while a small task
> itself doesn't help to get us out of the OOM.

I don't think it's different from a non-containerized state: if you
have a zillion of small tasks in the system, you'll meet the same issues.

> > > I guess I have asked already and we haven't reached any consensus. I do
> > > not like how you treat memcgs and tasks differently. Why cannot we have
> > > a memcg score a sum of all its tasks?
> > 
> > It sounds like a more expensive way to get almost the same with less 
> > accuracy.
> > Why it's better?
> 
> because then you are comparing apples to apples?

Well, I can say that I compare some number of pages against some other number
of pages. And the relation between a page and memcg is more obvious, than a
relation between a page and a process.

Both ways are not ideal, and sum of the processes is not ideal too.
Especially, if you take oom_score_adj into account. Will you respect it?

I've started actually with such approach, but then found it weird.

> Besides that you have
> to check each task for over-killing anyway. So I do not see any
> performance merits here.

It's an implementation detail, and we can hopefully get rid of it at some point.

> 
> > > How do you want to compare memcg score with tasks score?
> > 
> > I have to do it for tasks in root cgroups, but it shouldn't be a common 
> > case.
> 
> How come? I can easily imagine a setup where only some memcgs which
> really do need a kill-all semantic while all others can live with single
> task killed perfectly fine.

I mean taking a unified cgroup hierarchy into account, there should not
be a lot of tasks in the root cgroup, if any.


Re: [v6 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-24 Thread Michal Hocko
On Thu 24-08-17 13:51:13, Roman Gushchin wrote:
> On Thu, Aug 24, 2017 at 02:10:54PM +0200, Michal Hocko wrote:
> > On Wed 23-08-17 17:52:00, Roman Gushchin wrote:
> > > Introduce a per-memory-cgroup oom_priority setting: an integer number
> > > within the [-1, 1] range, which defines the order in which
> > > the OOM killer selects victim memory cgroups.
> > 
> > Why do we need a range here?
> 
> No specific reason, both [INT_MIN, INT_MAX] and [-1, 1] will
> work equally.

Then do not enforce a range because this just reduces possible usecases
(e.g. timestamp one...).

> We should be able to predefine an OOM killing order for
> any reasonable amount of cgroups.
> 
> > 
> > > OOM killer prefers memory cgroups with larger priority if they are
> > > populated with eligible tasks.
> > 
> > So this is basically orthogonal to the score based selection and the
> > real size is only the tiebreaker for same priorities? Could you describe
> > the usecase? Becasuse to me this sounds like a separate oom killer
> > strategy. I can imagine somebody might be interested (e.g. always kill
> > the oldest memcgs...) but an explicit range wouldn't fly with such a
> > usecase very well.
> 
> The usecase: you have a machine with several containerized workloads
> of different importance, and some system-level stuff, also in (memory)
> cgroups.
> In case of global memory shortage, some workloads should be killed in
> a first order, others should be killed only if there is no other option.
> Several workloads can have equal importance. Size-based tiebreaking
> is very useful to catch memory leakers amongst them.

OK, please document that in the changelog.

> > That brings me back to my original suggestion. Wouldn't a "register an
> > oom strategy" approach much better than blending things together and
> > then have to wrap heads around different combinations of tunables?
> 
> Well, I believe that 90% of this patchset is still relevant;

agreed and didn't say otherwise.

> the only
> thing you might want to customize/replace size-based tiebreaking with
> something else (like timestamp-based tiebreaking, mentioned by David earlier).

> What about tunables, there are two, and they are completely orthogonal:
> 1) oom_priority allows to define an order, in which cgroups will be OOMed
> 2) oom_kill_all defines if all or just one task should be killed
> 
> So, I don't think it's a too complex interface.
> 
> Again, I'm not against oom strategy approach, it just looks as a much bigger
> project, and I do not see a big need.

Well, I was thinking that our current oom victim selection code is
quite extensible already. Your patches will teach it the kill-the-whole-group
semantic which is already very useful. Now we can talk about the
selection criteria and this is something to be replaceable. Because even
the current discussion has shown that different people might and will
have different requirements. Can we structure the code in such a way
that new comparison algorithm would be simple to add without reworking
the whole selection logic?

> Do you have an example, which can't be effectively handled by an approach
> I'm suggesting?

No, I do not have any which would be _explicitly_ requested but I do
envision new requirements will emerge. The most probable one would be
kill the youngest container because that would imply the least amount of
work wasted.
-- 
Michal Hocko
SUSE Labs


[PATCH 0/2 v8] printk: Add new timestamps

2017-08-24 Thread Prarit Bhargava
printk.time=1/CONFIG_PRINTK_TIME=1 adds an unmodified local hardware clock
timestamp to printk messages.  The local hardware clock loses time each
day making it difficult to determine exactly when an issue has occurred in
the kernel log, and making it difficult to determine how kernel and
hardware issues relate to each other in real time.

Add monotonic, boottime, and real clock timestamps in addition to the existing
local hardware clock timestamp.

Signed-off-by: Prarit Bhargava 
Cc: Mark Salyzyn 
Cc: Jonathan Corbet 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: Steven Rostedt 
Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Andrew Morton 
Cc: Greg Kroah-Hartman 
Cc: "Paul E. McKenney" 
Cc: Christoffer Dall 
Cc: Deepa Dinamani 
Cc: Ingo Molnar 
Cc: Joel Fernandes 
Cc: Prarit Bhargava 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Geert Uytterhoeven 
Cc: "Luis R. Rodriguez" 
Cc: Nicholas Piggin 
Cc: "Jason A. Donenfeld" 
Cc: Olof Johansson 
Cc: Josh Poimboeuf 
Cc: linux-doc@vger.kernel.org
[jstultz: reworked Kconfig settings to avoid defconfig noise]
Signed-off-by: John Stultz 

Prarit Bhargava (2):
  time: Make fast functions return 0 before timekeeping is initialized
  printk: Add monotonic, boottime, and realtime timestamps

 Documentation/admin-guide/kernel-parameters.txt |   6 +-
 include/linux/timekeeping.h |   1 +
 kernel/printk/printk.c  | 154 +++-
 kernel/time/timekeeping.c   |  46 +--
 lib/Kconfig.debug   |  65 +-
 5 files changed, 253 insertions(+), 19 deletions(-)

-- 
1.8.5.5



Re: [PATCH v6 04/19] crypto: marvell/cesa: remove redundant backlog checks on EBUSY

2017-08-24 Thread Gilad Ben-Yossef
On Thu, Aug 24, 2017 at 4:01 PM, kbuild test robot <l...@intel.com> wrote:
> Hi Gilad,
>
> [auto build test ERROR on cryptodev/master]
> [also build test ERROR on v4.13-rc6 next-20170823]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Gilad-Ben-Yossef/crypto-change-transient-busy-return-code-to-EAGAIN/20170824-180606
> base:   
> https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git 
> master
> config: arm-allmodconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm
>
> All error/warnings (new ones prefixed by >>):
>
>drivers/crypto/marvell/cesa.c: In function 'mv_cesa_queue_req':
>>> drivers/crypto/marvell/cesa.c:187:3: error: expected ')' before 
>>> 'mv_cesa_tdma_chain'
>   mv_cesa_tdma_chain(engine, creq);
>   ^~
>>> drivers/crypto/marvell/cesa.c:196:1: error: expected expression before '}' 
>>> token
> }
> ^
>>> drivers/crypto/marvell/cesa.c:196:1: warning: control reaches end of 
>>> non-void function [-Wreturn-type]
> }
> ^

Oy! I thought I'd added COMPILE_TEST to my local tree to build it. I didn't.

Fix underway.

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCH v6 04/19] crypto: marvell/cesa: remove redundant backlog checks on EBUSY

2017-08-24 Thread kbuild test robot
Hi Gilad,

[auto build test ERROR on cryptodev/master]
[also build test ERROR on v4.13-rc6 next-20170823]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Gilad-Ben-Yossef/crypto-change-transient-busy-return-code-to-EAGAIN/20170824-180606
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

   drivers/crypto/marvell/cesa.c: In function 'mv_cesa_queue_req':
>> drivers/crypto/marvell/cesa.c:187:3: error: expected ')' before 'mv_cesa_tdma_chain'
  mv_cesa_tdma_chain(engine, creq);
  ^~
>> drivers/crypto/marvell/cesa.c:196:1: error: expected expression before '}' token
}
^
>> drivers/crypto/marvell/cesa.c:196:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

vim +187 drivers/crypto/marvell/cesa.c

f63601fd6 Boris Brezillon  2015-06-18  176  
53da740fe Romain Perier    2016-06-21  177  int mv_cesa_queue_req(struct crypto_async_request *req,
53da740fe Romain Perier    2016-06-21  178  			      struct mv_cesa_req *creq)
f63601fd6 Boris Brezillon  2015-06-18  179  {
f63601fd6 Boris Brezillon  2015-06-18  180  	int ret;
bf8f91e71 Romain Perier    2016-06-21  181  	struct mv_cesa_engine *engine = creq->engine;
f63601fd6 Boris Brezillon  2015-06-18  182  
bf8f91e71 Romain Perier    2016-06-21  183  	spin_lock_bh(&engine->lock);
bf8f91e71 Romain Perier    2016-06-21  184  	ret = crypto_enqueue_request(&engine->queue, req);
603e989eb Romain Perier    2016-07-22  185  	if ((mv_cesa_req_get_type(creq) == CESA_DMA_REQ) &&
3a1c7473a Gilad Ben-Yossef 2017-08-21  186  	    (ret == -EINPROGRESS || ret == -EBUSY)
603e989eb Romain Perier    2016-07-22 @187  		mv_cesa_tdma_chain(engine, creq);
bf8f91e71 Romain Perier    2016-06-21  188  	spin_unlock_bh(&engine->lock);
f63601fd6 Boris Brezillon  2015-06-18  189  
f63601fd6 Boris Brezillon  2015-06-18  190  	if (ret != -EINPROGRESS)
f63601fd6 Boris Brezillon  2015-06-18  191  		return ret;
f63601fd6 Boris Brezillon  2015-06-18  192  
85030c516 Romain Perier    2016-06-21  193  	mv_cesa_rearm_engine(engine);
f63601fd6 Boris Brezillon  2015-06-18  194  
f63601fd6 Boris Brezillon  2015-06-18  195  	return -EINPROGRESS;
f63601fd6 Boris Brezillon  2015-06-18 @196  }
f63601fd6 Boris Brezillon  2015-06-18  197  

:: The code at line 187 was first introduced by commit
:: 603e989ebee21bfa9cddf8390d0515a07324edc8 crypto: marvell - Don't chain at DMA level when backlog is disabled

:: TO: Romain Perier <romain.per...@free-electrons.com>
:: CC: Herbert Xu <herb...@gondor.apana.org.au>

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
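For what it's worth, the parse error above stems from the unbalanced condition at line 186, which is missing a closing ')'. A minimal stand-alone sketch of that condition with the parenthesis restored is below; all names are stand-ins for illustration, not the actual follow-up patch:

```c
#include <stddef.h>

/* Toy request types mirroring the driver's CESA_DMA_REQ check. */
enum toy_req_type { CESA_STD_REQ, CESA_DMA_REQ };

/* Stand-in errno values (avoid depending on <errno.h> constants). */
#define TOY_EINPROGRESS 115
#define TOY_EBUSY       16

/*
 * Returns 1 when the request would be chained at the DMA level,
 * mirroring the cesa.c condition with the missing ')' restored:
 * DMA request AND (enqueue returned -EINPROGRESS or -EBUSY).
 */
static int toy_should_chain(enum toy_req_type type, int ret)
{
	return (type == CESA_DMA_REQ) &&
	       (ret == -TOY_EINPROGRESS || ret == -TOY_EBUSY);
}
```

The only change relative to the reported source is closing the second parenthesis group before the statement that follows.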




Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Michal Hocko
On Thu 24-08-17 13:28:46, Roman Gushchin wrote:
> Hi Michal!
> 
> On Thu, Aug 24, 2017 at 01:47:06PM +0200, Michal Hocko wrote:
> > This doesn't apply on top of mmotm cleanly. You are missing
> > http://lkml.kernel.org/r/20170807113839.16695-3-mho...@kernel.org
> 
> I'll rebase. Thanks!
> 
> > 
> > On Wed 23-08-17 17:51:59, Roman Gushchin wrote:
> > > Traditionally, the OOM killer is operating on a process level.
> > > Under oom conditions, it finds a process with the highest oom score
> > > and kills it.
> > > 
> > > This behavior doesn't suit well the system with many running
> > > containers:
> > > 
> > > 1) There is no fairness between containers. A small container with
> > > few large processes will be chosen over a large one with huge
> > > number of small processes.
> > > 
> > > 2) Containers often do not expect that some random process inside
> > > will be killed. In many cases much safer behavior is to kill
> > > all tasks in the container. Traditionally, this was implemented
> > > in userspace, but doing it in the kernel has some advantages,
> > > especially in a case of a system-wide OOM.
> > > 
> > > 3) Per-process oom_score_adj affects global OOM, so it's a breach
> > > in the isolation.
> > 
> > Please explain more. I guess you mean that an untrusted memcg could hide
> > itself from the global OOM killer by reducing the oom scores? Well you
> > need CAP_SYS_RESOURCE to reduce the current oom_score{_adj} as David has
> > already pointed out. I also agree that we absolutely must not kill an
> > oom disabled task. I am pretty sure somebody is using OOM_SCORE_ADJ_MIN
> > as a protection from an untrusted SIGKILL and inconsistent state as a
> > result. Those applications simply shouldn't behave differently in the
> > global and container contexts.
> 
> The main point of the kill_all option is to clean up the victim cgroup
> _completely_. If some tasks can survive, that means userspace should
> take care of them, look at the cgroup after oom, and kill the survivors
> manually.
> 
> If you want to rely on OOM_SCORE_ADJ_MIN, don't set kill_all.
> I really don't get the usecase for this "kill all, except this and that".

OOM_SCORE_ADJ_MIN has become a de-facto contract. You cannot simply
expect that somebody would alter a specific workload for a container
just to be safe against unexpected SIGKILL. kill-all might be set up in
the memcg hierarchy, which is out of your control.

> Also, it's really confusing to respect the -1000 value, and completely
> ignore -999.
> 
> I believe that any complex userspace OOM handling should use memory.high
> and handle memory shortage before an actual OOM.
> 
> > 
> > If nothing else we have to skip OOM_SCORE_ADJ_MIN tasks during the kill.
> > 
> > > To address these issues, cgroup-aware OOM killer is introduced.
> > > 
> > > Under OOM conditions, it tries to find the biggest memory consumer,
> > > and free memory by killing corresponding task(s). The difference
> > > from the "traditional" OOM killer is that it can treat memory cgroups
> > > as memory consumers as well as single processes.
> > > 
> > > By default, it will look for the biggest leaf cgroup, and kill
> > > the largest task inside.
> > 
> > Why? I believe that the semantic should be as simple as kill the largest
> > oom killable entity. And the entity is either a process or a memcg which
> > is marked that way.
> 
> So, you still need to compare memcgroups and processes.
> 
> In my case, it's more like an exception (only processes from root memcg,
> and only if there are no eligible cgroups with lower oom_priority).
> You suggest to rely on this comparison.
> 
> > Why should we mix things and select a memcg to kill
> > a process inside it? More on that below.
> 
> To have some sort of "fairness" in a containerized environment.
> Say, 1 cgroup with 1 big task, another cgroup with many smaller tasks.
> It's not necessarily true that the first one is a better victim.

There is nothing like a "better victim". We are pretty much in a
catastrophic situation when we try to survive by killing a userspace.
We try to kill the largest because that assumes that we return the
most memory from it. Now I do understand that you want to treat the
memcg as a single killable entity but I find it really questionable
to do a per-memcg metric and then do not treat it like that and kill
only a single task. Just imagine a single memcg with zillions of tasks,
each very small, and you select it as the largest while killing a single
small task doesn't get us out of the OOM.
 
> > > But a user can change this behavior by enabling the per-cgroup
> > > oom_kill_all_tasks option. If set, it causes the OOM killer to treat
> > > the whole cgroup as an indivisible memory consumer. In case it's
> > > selected as an OOM victim, all belonging tasks will be killed.
> > > 
> > > Tasks in the root cgroup are treated as independent memory consumers,
> > > and are compared with other memory consumers (e.g. leaf cgroups).
> > > The root cgroup doesn't support the oom_kill_all_tasks feature.

Re: [v6 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-24 Thread Roman Gushchin
On Thu, Aug 24, 2017 at 02:10:54PM +0200, Michal Hocko wrote:
> On Wed 23-08-17 17:52:00, Roman Gushchin wrote:
> > Introduce a per-memory-cgroup oom_priority setting: an integer number
> > within the [-1, 1] range, which defines the order in which
> > the OOM killer selects victim memory cgroups.
> 
> Why do we need a range here?

No specific reason, both [INT_MIN, INT_MAX] and [-1, 1] will
work equally. We should be able to predefine an OOM killing order for
any reasonable amount of cgroups.

> 
> > OOM killer prefers memory cgroups with larger priority if they are
> > populated with eligible tasks.
> 
> So this is basically orthogonal to the score based selection and the
> real size is only the tiebreaker for same priorities? Could you describe
> the usecase? Because to me this sounds like a separate oom killer
> strategy. I can imagine somebody might be interested (e.g. always kill
> the oldest memcgs...) but an explicit range wouldn't fly with such a
> usecase very well.

The usecase: you have a machine with several containerized workloads
of different importance, and some system-level stuff, also in (memory)
cgroups.
In case of global memory shortage, some workloads should be killed in
a first order, others should be killed only if there is no other option.
Several workloads can have equal importance. Size-based tiebreaking
is very useful to catch memory leakers amongst them.

> 
> That brings me back to my original suggestion. Wouldn't a "register an
> oom strategy" approach be much better than blending things together and
> then have to wrap heads around different combinations of tunables?

Well, I believe that 90% of this patchset is still relevant; the only
thing you might want to customize is replacing size-based tiebreaking with
something else (like timestamp-based tiebreaking, mentioned by David earlier).

What about tunables, there are two, and they are completely orthogonal:
1) oom_priority allows to define an order, in which cgroups will be OOMed
2) oom_kill_all defines if all or just one task should be killed

So, I don't think it's a too complex interface.

Again, I'm not against the oom strategy approach, it just looks like a much
bigger project, and I do not see a big need.

Do you have an example, which can't be effectively handled by an approach
I'm suggesting?

> 
> [...]
> > @@ -2760,7 +2761,12 @@ static void select_victim_memcg(struct mem_cgroup 
> > *root, struct oom_control *oc)
> > if (iter->oom_score == 0)
> > continue;
> >  
> > -   if (iter->oom_score > score) {
> > +   if (iter->oom_priority > prio) {
> > +   memcg = iter;
> > +   prio = iter->oom_priority;
> > +   score = iter->oom_score;
> > +   } else if (iter->oom_priority == prio &&
> > +  iter->oom_score > score) {
> > memcg = iter;
> > score = iter->oom_score;
> > }
> 
> Just a minor thing. Why do we even have to calculate oom_score when we
> use it only as a tiebreaker?

Right now it's necessary, because at the same time we do look for
per-existing OOM victims. But if we can have a memcg-level counter for it,
this can be optimized.

Thanks!
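The two-level comparison under discussion (highest oom_priority wins; size-based oom_score is only the tiebreaker among equal priorities) can be sketched as a small userspace function. The structure and values below are invented for illustration and are not kernel code:

```c
#include <stddef.h>

/* Toy stand-in for a memory cgroup, for illustration only. */
struct toy_memcg {
	const char *name;
	int oom_priority;	/* higher priority is killed first */
	long oom_score;		/* size-based tiebreaker */
};

/*
 * Mirrors the quoted select_victim_memcg() loop: skip empty groups,
 * prefer higher oom_priority, and fall back to the larger oom_score
 * when priorities are equal.
 */
static const struct toy_memcg *
toy_select_victim(const struct toy_memcg *groups, size_t n)
{
	const struct toy_memcg *victim = NULL;
	int prio = 0;
	long score = 0;

	for (size_t i = 0; i < n; i++) {
		const struct toy_memcg *iter = &groups[i];

		if (iter->oom_score == 0)
			continue;
		if (!victim || iter->oom_priority > prio) {
			victim = iter;
			prio = iter->oom_priority;
			score = iter->oom_score;
		} else if (iter->oom_priority == prio &&
			   iter->oom_score > score) {
			victim = iter;
			score = iter->oom_score;
		}
	}
	return victim;
}
```

With, say, two equal-priority batch groups and one higher-priority best-effort group, the best-effort group is selected regardless of size, and among the batch groups the larger one would win the tiebreak.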


Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Roman Gushchin
Hi Michal!

On Thu, Aug 24, 2017 at 01:47:06PM +0200, Michal Hocko wrote:
> This doesn't apply on top of mmotm cleanly. You are missing
> http://lkml.kernel.org/r/20170807113839.16695-3-mho...@kernel.org

I'll rebase. Thanks!

> 
> On Wed 23-08-17 17:51:59, Roman Gushchin wrote:
> > Traditionally, the OOM killer is operating on a process level.
> > Under oom conditions, it finds a process with the highest oom score
> > and kills it.
> > 
> > This behavior doesn't suit well the system with many running
> > containers:
> > 
> > 1) There is no fairness between containers. A small container with
> > few large processes will be chosen over a large one with huge
> > number of small processes.
> > 
> > 2) Containers often do not expect that some random process inside
> > will be killed. In many cases much safer behavior is to kill
> > all tasks in the container. Traditionally, this was implemented
> > in userspace, but doing it in the kernel has some advantages,
> > especially in a case of a system-wide OOM.
> > 
> > 3) Per-process oom_score_adj affects global OOM, so it's a breach
> > in the isolation.
> 
> Please explain more. I guess you mean that an untrusted memcg could hide
> itself from the global OOM killer by reducing the oom scores? Well you
> need CAP_SYS_RESOURCE to reduce the current oom_score{_adj} as David has
> already pointed out. I also agree that we absolutely must not kill an
> oom disabled task. I am pretty sure somebody is using OOM_SCORE_ADJ_MIN
> as a protection from an untrusted SIGKILL and inconsistent state as a
> result. Those applications simply shouldn't behave differently in the
> global and container contexts.

The main point of the kill_all option is to clean up the victim cgroup
_completely_. If some tasks can survive, that means userspace should
take care of them, look at the cgroup after oom, and kill the survivors
manually.

If you want to rely on OOM_SCORE_ADJ_MIN, don't set kill_all.
I really don't get the usecase for this "kill all, except this and that".

Also, it's really confusing to respect the -1000 value, and completely ignore -999.

I believe that any complex userspace OOM handling should use memory.high
and handle memory shortage before an actual OOM.

> 
> If nothing else we have to skip OOM_SCORE_ADJ_MIN tasks during the kill.
> 
> > To address these issues, cgroup-aware OOM killer is introduced.
> > 
> > Under OOM conditions, it tries to find the biggest memory consumer,
> > and free memory by killing corresponding task(s). The difference
> > from the "traditional" OOM killer is that it can treat memory cgroups
> > as memory consumers as well as single processes.
> > 
> > By default, it will look for the biggest leaf cgroup, and kill
> > the largest task inside.
> 
> Why? I believe that the semantic should be as simple as kill the largest
> oom killable entity. And the entity is either a process or a memcg which
> is marked that way.

So, you still need to compare memcgroups and processes.

In my case, it's more like an exception (only processes from root memcg,
and only if there are no eligible cgroups with lower oom_priority).
You suggest to rely on this comparison.

> Why should we mix things and select a memcg to kill
> a process inside it? More on that below.

To have some sort of "fairness" in a containerized environment.
Say, 1 cgroup with 1 big task, another cgroup with many smaller tasks.
It's not necessarily true that the first one is a better victim.

> 
> > But a user can change this behavior by enabling the per-cgroup
> > oom_kill_all_tasks option. If set, it causes the OOM killer to treat
> > the whole cgroup as an indivisible memory consumer. In case it's
> > selected as an OOM victim, all belonging tasks will be killed.
> > 
> > Tasks in the root cgroup are treated as independent memory consumers,
> > and are compared with other memory consumers (e.g. leaf cgroups).
> > The root cgroup doesn't support the oom_kill_all_tasks feature.
> 
> If anything you wouldn't have to treat the root memcg as special. It
> will be like any other memcg which doesn't have oom_kill_all_tasks...
>  
> [...]
> 
> > +static long memcg_oom_badness(struct mem_cgroup *memcg,
> > + const nodemask_t *nodemask)
> > +{
> > +   long points = 0;
> > +   int nid;
> > +   pg_data_t *pgdat;
> > +
> > +   for_each_node_state(nid, N_MEMORY) {
> > +   if (nodemask && !node_isset(nid, *nodemask))
> > +   continue;
> > +
> > +   points += mem_cgroup_node_nr_lru_pages(memcg, nid,
> > +   LRU_ALL_ANON | BIT(LRU_UNEVICTABLE));
> > +
> > +   pgdat = NODE_DATA(nid);
> > +   points += lruvec_page_state(mem_cgroup_lruvec(pgdat, memcg),
> > +   NR_SLAB_UNRECLAIMABLE);
> > +   }
> > +
> > +   points += memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) /
> > +   (PAGE_SIZE / 1024);
> > +   points += memcg_page_state(memcg, MEMCG_SOCK);
> > > + points += memcg_page_state(memcg, MEMCG_SWAP);

Re: [PATCH v7 4/4] boot/param: add pointer to next argument to unknown parameter callback

2017-08-24 Thread msuchanek
On Thu, 24 Aug 2017 21:04:51 +1000
Michael Ellerman  wrote:

> Michal Suchanek  writes:
> 
> > The fadump parameter processing re-does the logic of next_arg quote
> > stripping to determine where the argument ends. Pass pointer to the
> > next argument instead to make this more robust.
> >
> > Signed-off-by: Michal Suchanek 
> > ---
> >  arch/powerpc/kernel/fadump.c  | 13 +
> >  arch/powerpc/mm/hugetlbpage.c |  4 ++--
> >  include/linux/moduleparam.h   |  2 +-
> >  init/main.c   | 12 ++--
> >  kernel/module.c   |  4 ++--
> >  kernel/params.c   | 19 +++
> >  lib/dynamic_debug.c   |  2 +-
> >  7 files changed, 28 insertions(+), 28 deletions(-)  
> 
> Can you split out a patch that adds the next argument and updates the
> callers. And then a patch for the fadump to use the new arg.
> 
> cheers

Yes, that makes sense.

Thanks

Michal
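As a rough illustration of what next_arg-style parsing does (split a "param=value" list on unquoted spaces, strip surrounding quotes from the value, and return a pointer to the next argument — the pointer the patch proposes to hand to the unknown-parameter callback instead of re-deriving it), here is a simplified userspace sketch. It is not the kernel implementation:

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified, in-place split of one "param[=value]" token.
 * Double quotes protect embedded spaces in the value and are
 * stripped from the result. Returns a pointer to the start of
 * the next argument (or to the terminating NUL).
 */
static char *toy_next_arg(char *args, char **param, char **val)
{
	int in_quote = 0;
	char *p = args;

	*param = args;
	*val = NULL;

	for (; *p; p++) {
		if (*p == '"') {
			in_quote = !in_quote;
		} else if (*p == '=' && !*val) {
			*p = '\0';		/* terminate the param name */
			*val = p + 1;
		} else if (*p == ' ' && !in_quote) {
			*p = '\0';		/* terminate this argument */
			p++;
			break;
		}
	}

	/* strip a matched pair of double quotes around the value */
	if (*val && **val == '"') {
		size_t len = strlen(*val);

		if (len >= 2 && (*val)[len - 1] == '"') {
			(*val)[len - 1] = '\0';
			(*val)++;
		}
	}

	while (*p == ' ')
		p++;
	return p;	/* start of the next argument */
}
```

The point of returning `p` is exactly the robustness argument above: the caller no longer has to redo the quote-stripping logic to find where the current argument ended.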


Re: [v6 3/4] mm, oom: introduce oom_priority for memory cgroups

2017-08-24 Thread Michal Hocko
On Wed 23-08-17 17:52:00, Roman Gushchin wrote:
> Introduce a per-memory-cgroup oom_priority setting: an integer number
> within the [-1, 1] range, which defines the order in which
> the OOM killer selects victim memory cgroups.

Why do we need a range here?

> OOM killer prefers memory cgroups with larger priority if they are
> populated with eligible tasks.

So this is basically orthogonal to the score based selection and the
real size is only the tiebreaker for same priorities? Could you describe
the usecase? Because to me this sounds like a separate oom killer
strategy. I can imagine somebody might be interested (e.g. always kill
the oldest memcgs...) but an explicit range wouldn't fly with such a
usecase very well.

That brings me back to my original suggestion. Wouldn't a "register an
oom strategy" approach be much better than blending things together and
then have to wrap heads around different combinations of tunables?

[...]
> @@ -2760,7 +2761,12 @@ static void select_victim_memcg(struct mem_cgroup 
> *root, struct oom_control *oc)
>   if (iter->oom_score == 0)
>   continue;
>  
> - if (iter->oom_score > score) {
> + if (iter->oom_priority > prio) {
> + memcg = iter;
> + prio = iter->oom_priority;
> + score = iter->oom_score;
> + } else if (iter->oom_priority == prio &&
> +iter->oom_score > score) {
>   memcg = iter;
>   score = iter->oom_score;
>   }

Just a minor thing. Why do we even have to calculate oom_score when we
use it only as a tiebreaker?
-- 
Michal Hocko
SUSE Labs


[PATCH RFC v2] media: open.rst: document devnode-centric and mc-centric types

2017-08-24 Thread Mauro Carvalho Chehab
From: "mche...@s-opensource.com" 

When we added support for omap3, back in 2010, we added a new
type of V4L2 devices that aren't fully controlled via the V4L2
device node. Yet, we never made it clear, at the V4L2 spec,
about the differences between both types.

Let's document them with the current implementation.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/media/uapi/v4l/open.rst | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/media/uapi/v4l/open.rst 
b/Documentation/media/uapi/v4l/open.rst
index afd116edb40d..cf522d9bb53c 100644
--- a/Documentation/media/uapi/v4l/open.rst
+++ b/Documentation/media/uapi/v4l/open.rst
@@ -6,6 +6,53 @@
 Opening and Closing Devices
 ***
 
+Types of V4L2 device control
+
+
+V4L2 devices are usually complex: they're implemented via a main driver and
+often several additional drivers. The main driver always exposes one or
+more **V4L2 device** devnodes (see :ref:`v4l2_device_naming`). The other
+devices are called **V4L2 sub-devices**. They are usually controlled via a
+serial bus (I2C or SMBus).
+
+When V4L2 started, there was only one type of device control. The entire
+device was controlled via the **V4L2 device nodes**. We refer to this
+kind of control as **V4L2 device-centric** (or, simply, **device-centric**).
+
+Since the end of 2010, a new type of V4L2 device control was added in order
+to support complex devices that are common on embedded systems. Those
+devices are controlled mainly via the media controller and sub-devices.
+So, they're called: **media controller centric** (or, simply,
+"**mc-centric**").
+
+On **device-centric** control, the device and its corresponding hardware
+pipelines are controlled via the **V4L2 device** node. They may optionally
+expose the hardware pipelines via the
+:ref:`media controller API `.
+
+On a **mc-centric** device, before using the V4L2 device, it is required to
+set the hardware pipelines via the
+:ref:`media controller API `. On those devices, the
+sub-devices' configuration can be controlled via the
+:ref:`sub-device API `, which creates one device node per sub-device.
+
+In summary, on **mc-centric** devices:
+
+- The **V4L2 device** node is mainly responsible for controlling the
+  streaming features;
+- The **media controller device** is responsible for setting up the pipelines
+  and image settings (like size and format);
+- The **V4L2 sub-devices** are responsible for sub-device
+  specific settings.
+
+.. note::
+
+   It is forbidden for **device-centric** devices to expose V4L2
+   sub-devices via :ref:`sub-device API `, although this
+   might change in the future.
+
+
+.. _v4l2_device_naming:
 
 Device Naming
 =
-- 
2.13.5





Re: [v6 2/4] mm, oom: cgroup-aware OOM killer

2017-08-24 Thread Michal Hocko
This doesn't apply on top of mmotm cleanly. You are missing
http://lkml.kernel.org/r/20170807113839.16695-3-mho...@kernel.org

On Wed 23-08-17 17:51:59, Roman Gushchin wrote:
> Traditionally, the OOM killer is operating on a process level.
> Under oom conditions, it finds a process with the highest oom score
> and kills it.
> 
> This behavior doesn't suit well the system with many running
> containers:
> 
> 1) There is no fairness between containers. A small container with
> few large processes will be chosen over a large one with huge
> number of small processes.
> 
> 2) Containers often do not expect that some random process inside
> will be killed. In many cases much safer behavior is to kill
> all tasks in the container. Traditionally, this was implemented
> in userspace, but doing it in the kernel has some advantages,
> especially in a case of a system-wide OOM.
> 
> 3) Per-process oom_score_adj affects global OOM, so it's a breach
> in the isolation.

Please explain more. I guess you mean that an untrusted memcg could hide
itself from the global OOM killer by reducing the oom scores? Well you
need CAP_SYS_RESOURCE to reduce the current oom_score{_adj} as David has
already pointed out. I also agree that we absolutely must not kill an
oom disabled task. I am pretty sure somebody is using OOM_SCORE_ADJ_MIN
as a protection from an untrusted SIGKILL and inconsistent state as a
result. Those applications simply shouldn't behave differently in the
global and container contexts.

If nothing else we have to skip OOM_SCORE_ADJ_MIN tasks during the kill.

> To address these issues, cgroup-aware OOM killer is introduced.
> 
> Under OOM conditions, it tries to find the biggest memory consumer,
> and free memory by killing corresponding task(s). The difference
> from the "traditional" OOM killer is that it can treat memory cgroups
> as memory consumers as well as single processes.
> 
> By default, it will look for the biggest leaf cgroup, and kill
> the largest task inside.

Why? I believe that the semantic should be as simple as kill the largest
oom killable entity. And the entity is either a process or a memcg which
is marked that way. Why should we mix things and select a memcg to kill
a process inside it? More on that below.

> But a user can change this behavior by enabling the per-cgroup
> oom_kill_all_tasks option. If set, it causes the OOM killer to treat
> the whole cgroup as an indivisible memory consumer. In case it's
> selected as an OOM victim, all belonging tasks will be killed.
> 
> Tasks in the root cgroup are treated as independent memory consumers,
> and are compared with other memory consumers (e.g. leaf cgroups).
> The root cgroup doesn't support the oom_kill_all_tasks feature.

If anything you wouldn't have to treat the root memcg as special. It
will be like any other memcg which doesn't have oom_kill_all_tasks...
 
[...]

> +static long memcg_oom_badness(struct mem_cgroup *memcg,
> +   const nodemask_t *nodemask)
> +{
> + long points = 0;
> + int nid;
> + pg_data_t *pgdat;
> +
> + for_each_node_state(nid, N_MEMORY) {
> + if (nodemask && !node_isset(nid, *nodemask))
> + continue;
> +
> + points += mem_cgroup_node_nr_lru_pages(memcg, nid,
> + LRU_ALL_ANON | BIT(LRU_UNEVICTABLE));
> +
> + pgdat = NODE_DATA(nid);
> + points += lruvec_page_state(mem_cgroup_lruvec(pgdat, memcg),
> + NR_SLAB_UNRECLAIMABLE);
> + }
> +
> + points += memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) /
> + (PAGE_SIZE / 1024);
> + points += memcg_page_state(memcg, MEMCG_SOCK);
> + points += memcg_page_state(memcg, MEMCG_SWAP);
> +
> + return points;

I guess I have asked already and we haven't reached any consensus. I do
not like how you treat memcgs and tasks differently. Why cannot we have
a memcg score be a sum of all its tasks? How do you want to compare a memcg
score with a task score? This just smells like the outcome of the weird
semantic, mentioned above, of trying to select the largest group.

This is a rather fundamental concern and I believe we should find a
consensus on it before going any further. I believe that users shouldn't
see any difference in the OOM behavior when memcg v2 is used and there
is no kill-all memcg. If there is such a memcg then we should treat only
those specially. But you might have really strong usecases which haven't
been presented or I've missed their importance.
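To restate the accounting in the quoted memcg_oom_badness() outside the kernel context: the score is the sum, over memory nodes, of anonymous plus unevictable LRU pages and unreclaimable slab, plus kernel stacks (converted from KB to pages), socket memory, and swap. A toy userspace version follows; the structures and parameters are invented for illustration and are not the kernel code:

```c
/* Invented per-node statistics for one toy memcg. */
struct toy_node_stat {
	long anon_lru;		/* LRU_ALL_ANON + LRU_UNEVICTABLE pages */
	long slab_unreclaimable;	/* NR_SLAB_UNRECLAIMABLE pages */
};

/*
 * Sum the same components the quoted function sums:
 * per-node anon/unevictable LRU and unreclaimable slab, then
 * kernel stacks (KB -> pages), socket pages, and swap pages.
 */
static long toy_memcg_badness(const struct toy_node_stat *nodes, int nr_nodes,
			      long kernel_stack_kb, long sock_pages,
			      long swap_pages, long page_size)
{
	long points = 0;
	int nid;

	for (nid = 0; nid < nr_nodes; nid++) {
		points += nodes[nid].anon_lru;
		points += nodes[nid].slab_unreclaimable;
	}
	points += kernel_stack_kb / (page_size / 1024);
	points += sock_pages;
	points += swap_pages;
	return points;
}
```

This makes the concern above concrete: the score is a property of the whole group's footprint, not of any single task in it.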

-- 
Michal Hocko
SUSE Labs


Re: [PATCH RFC] media: open.rst: document devnode-centric and mc-centric types

2017-08-24 Thread Mauro Carvalho Chehab
On Thu, 24 Aug 2017 12:18:56 +0300
Sakari Ailus  wrote:

> Hi Mauro,
> 
> On Wed, Aug 23, 2017 at 10:42:28AM -0300, Mauro Carvalho Chehab wrote:
> > On Wed, 23 Aug 2017 12:37:30 +0300
> > Sakari Ailus  wrote:
> >   
> > > Hi Mauro,
> > > 
> > > Thanks for the patch! Something like this was long due.
> > > 
> > > I cc'd Hans and Laurent to get their attention, too.
> > > 
> > > On Sat, Aug 19, 2017 at 07:04:40AM -0300, Mauro Carvalho Chehab wrote:  
> > > > When we added support for omap3, back in 2010, we added a new
> > > > type of V4L2 devices that aren't fully controlled via the V4L2
> > > > device node. Yet, we never made it clear, at the V4L2 spec,
> > > > about the differences between both types.
> > > > 
> > > > Let's document them with the current implementation.
> > > > 
> > > > Signed-off-by: Mauro Carvalho Chehab 
> > > > ---
> > > > 
> > > > This patch is a result of this week's discussion we had with regards to 
> > > > merging
> > > > a patch series from Sakari documenting the need of propagating streaming
> > > > control between sub-devices on some complex mc-centric devices.
> > > > 
> > > > One big gap on our documentation popped up: while we have distinct 
> > > > behavior
> > > > for mc-centric and devnode-centric types of V4L2 devices, we never 
> > > > clearly
> > > > documented about that.
> > > > 
> > > > This RFC patch is a first attempt to have a definition of those types, 
> > > > and to
> > > > standardize the names we use to distinguish between those types.
> > > > 
> > > > Comments are welcome.
> > > > 
> > > > 
> > > >  Documentation/media/uapi/v4l/open.rst | 37 
> > > > +++
> > > >  1 file changed, 37 insertions(+)
> > > > 
> > > > diff --git a/Documentation/media/uapi/v4l/open.rst 
> > > > b/Documentation/media/uapi/v4l/open.rst
> > > > index afd116edb40d..9cf4f74c466a 100644
> > > > --- a/Documentation/media/uapi/v4l/open.rst
> > > > +++ b/Documentation/media/uapi/v4l/open.rst
> > > > @@ -6,6 +6,43 @@
> > > >  Opening and Closing Devices
> > > >  ***
> > > >  
> > > > +Types of V4L2 devices
> > > > +=
> > > > +
> > > > +V4L2 devices are usually complex: they're implemented via a main 
> > > > driver and
> > > 
> > > Not all are; UVC webcams, for instance. How about:
> > > 
> > > s/implemented/often implemented/
> > >   
> > > > +several other drivers.  
> > 
> > True. uvcvideo and gspca drivers has only a main driver.
> > 
> > What about, instead:
> > 
> > V4L2 devices are usually complex: they're implemented via a main driver
> > and often by several other drivers.  
> 
> How about s/other/additional/? That I think would better convey the
> suggestion that the main driver's role stays even if there are other drivers.

OK.

> 
> >   
> > > The main driver always exposes a V4L2 device node  
> 
> This should actually say that there's "one or more V4L2 device nodes".
> 
> > > > +(see :ref:`v4l2_device_naming`). The other devices are called 
> > > > **sub-devices**.
> > > > +They are usually controlled via a serial bus (I2C or SMBus).
> > > 
> > > The main driver also often exposes sub-devices.  
> > 
> > What about:
> > 
> > (see :ref:`v4l2_device_naming`). The other devices, when present,
> > are called **sub-devices**.  
> 
> I might use "V4L2 sub-devices" instead of plain "sub-devices". I don't have
> strong opinion either way.

OK.

> >   
> > > The practice has been that the video nodes are registered by the driver
> > > that generally orchestrates things for the complete device but I don't
> > > think there's anything that would exclude doing this otherwise if there 
> > > are
> > > two struct devices involved that do DMA.  
> > 
> > Yeah. Well, I avoided the discussion if a mc-centric device with just
> > enable a single DMA engine for camera output should orchestrate things
> > for the complete device, as this is something that the current drivers
> > don't usually do (yet, IMHO, they should, at least for those cheap SoC
> > devices with just one camera connector).
> > 
> > Anyway, this is a separate issue. For now, let's focus on documenting
> > the current situation.
> >   
> > >   
> > > > +
> > > > +When V4L2 started, there was only one type of device, fully controlled 
> > > > via
> > > > +V4L2 device nodes. We call those devices **devnode-centric 
> > > > devices**.
> > > 
> > > As device nodes can refer to any types of device nodes (such as sub-device
> > > nodes), how about calling these "video node centric"? To make it more
> > > explicit, I'd use "V4L2 video node centric" here. Perhaps "video node
> > > centric" later to make it shorter.  
> > 
> > I'm open to change its name, but "video node" is also wrong. There are
> > some devices that only expose /dev/radio.  
> 
> Good point.
> 
> > 
> > main-driver-centric?  
> 
> I think most of the user space developers probably see this from the
> interface point of view, they might not know which driver exposes
> particular device nodes, for instance.

Re: [v6 1/4] mm, oom: refactor the oom_kill_process() function

2017-08-24 Thread Michal Hocko
This patch fails to apply on top of the mmotm tree. It seems the only
reason is the missing
http://lkml.kernel.org/r/20170810075019.28998-2-mho...@kernel.org

On Wed 23-08-17 17:51:57, Roman Gushchin wrote:
> The oom_kill_process() function consists of two logical parts:
> the first one is responsible for considering task's children as
> a potential victim and printing the debug information.
> The second half is responsible for sending SIGKILL to all
> tasks sharing the mm struct with the given victim.
> 
> This commit splits the oom_kill_process() function with
> an intention to re-use the second half: __oom_kill_process().

Yes this makes some sense even without further changes.

> The cgroup-aware OOM killer will kill multiple tasks
> belonging to the victim cgroup. We don't need to print
> the debug information for each task, as well as play
> with task selection (considering task's children),
> so we can't use the existing oom_kill_process().
> 
> Signed-off-by: Roman Gushchin 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Cc: Johannes Weiner 
> Cc: Tetsuo Handa 
> Cc: David Rientjes 
> Cc: Tejun Heo 
> Cc: kernel-t...@fb.com
> Cc: cgro...@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@kvack.org

I do agree with the patch there is just one thing to fix up.

> ---
>  mm/oom_kill.c | 123 
> +++---
>  1 file changed, 65 insertions(+), 58 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 53b44425ef35..5c29a3dd591b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -817,67 +817,12 @@ static bool task_will_free_mem(struct task_struct *task)
>   return ret;
>  }
>  
> -static void oom_kill_process(struct oom_control *oc, const char *message)
> +static void __oom_kill_process(struct task_struct *victim)
>  {
[...]
>   p = find_lock_task_mm(victim);
>   if (!p) {
>   put_task_struct(victim);

The context doesn't tell us but there is a return right after this.

p = find_lock_task_mm(victim);
if (!p) {
put_task_struct(victim);
return;
} else if (victim != p) {
get_task_struct(p);
put_task_struct(victim);
victim = p;
}

So we return with the reference dropped. Moreover we can change
the victim, drop the reference on the old one...

> +static void oom_kill_process(struct oom_control *oc, const char *message)
> +{
[...]
> + __oom_kill_process(victim);
> + put_task_struct(victim);

while we drop it here again and won't drop the changed one. If we race
with the exiting task and there is no mm then we double drop as well.
So I think that __oom_kill_process should really drop the reference for
all cases and oom_kill_process shouldn't care. Or if you absolutely
need a guarantee that the victim won't go away after __oom_kill_process
then you need to return the real victim and let the caller to deal with
put_task_struct.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/4] boot/param: add pointer to next argument to unknown parameter callback

2017-08-24 Thread Michael Ellerman
Michal Suchanek  writes:

> The fadump parameter processing re-does the logic of next_arg quote
> stripping to determine where the argument ends. Pass a pointer to the
> next argument instead to make this more robust.
>
> Signed-off-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/fadump.c  | 13 +
>  arch/powerpc/mm/hugetlbpage.c |  4 ++--
>  include/linux/moduleparam.h   |  2 +-
>  init/main.c   | 12 ++--
>  kernel/module.c   |  4 ++--
>  kernel/params.c   | 19 +++
>  lib/dynamic_debug.c   |  2 +-
>  7 files changed, 28 insertions(+), 28 deletions(-)

Can you split out a patch that adds the next argument and updates the
callers. And then a patch for the fadump to use the new arg.

cheers

> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index d7da4ce9f7ae..6ef96711ee9a 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -474,13 +474,14 @@ struct param_info {
>  };
>  
>  static void __init fadump_update_params(struct param_info *param_info,
> - char *param, char *val)
> + char *param, char *val, char *next)
>  {
>   ptrdiff_t param_offset = param - param_info->tmp_cmdline;
>   size_t vallen = val ? strlen(val) : 0;
>   char *tgt = param_info->cmdline + param_offset +
>   FADUMP_EXTRA_ARGS_LEN - param_info->shortening;
> - int shortening = 0;
> + int shortening = ((next - 1) - (param))
> + - (FADUMP_EXTRA_ARGS_LEN + 1 + vallen);
>  
>   if (!val)
>   return;
> @@ -488,10 +489,6 @@ static void __init fadump_update_params(struct 
> param_info *param_info,
>   /* remove '=' */
>   *tgt++ = ' ';
>  
> - /* next_arg removes one leading and one trailing '"' */
> - if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"'))
> - shortening += 2;
> -
>   /* remove one leading and one trailing quote if both are present */
>   if ((val[0] == '"') && (val[vallen - 1] == '"')) {
>   shortening += 2;
> @@ -517,7 +514,7 @@ static void __init fadump_update_params(struct param_info 
> *param_info,
>   * to enforce the parameters passed through it
>   */
>  static int __init fadump_rework_cmdline_params(char *param, char *val,
> -const char *unused, void *arg)
> + char *next, const char *unused, void *arg)
>  {
>   struct param_info *param_info = (struct param_info *)arg;
>  
> @@ -525,7 +522,7 @@ static int __init fadump_rework_cmdline_params(char 
> *param, char *val,
>strlen(FADUMP_EXTRA_ARGS_PARAM) - 1))
>   return 0;
>  
> - fadump_update_params(param_info, param, val);
> + fadump_update_params(param_info, param, val, next);
>  
>   return 0;
>  }
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index e1bf5ca397fe..3a4cce552906 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -268,8 +268,8 @@ int alloc_bootmem_huge_page(struct hstate *hstate)
>  
>  unsigned long gpage_npages[MMU_PAGE_COUNT];
>  
> -static int __init do_gpage_early_setup(char *param, char *val,
> -const char *unused, void *arg)
> +static int __init do_gpage_early_setup(char *param, char *val, char *unused1,
> +const char *unused2, void *arg)
>  {
>   static phys_addr_t size;
>   unsigned long npages;
> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
> index 1ee7b30dafec..fec05a186c08 100644
> --- a/include/linux/moduleparam.h
> +++ b/include/linux/moduleparam.h
> @@ -326,7 +326,7 @@ extern char *parse_args(const char *name,
> s16 level_min,
> s16 level_max,
> void *arg,
> -   int (*unknown)(char *param, char *val,
> +   int (*unknown)(char *param, char *val, char *next,
>const char *doing, void *arg));
>  
>  /* Called by module remove. */
> diff --git a/init/main.c b/init/main.c
> index 052481fbe363..920c3564b2f0 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -239,7 +239,7 @@ static int __init loglevel(char *str)
>  early_param("loglevel", loglevel);
>  
>  /* Change NUL term back to "=", to make "param" the whole string. */
> -static int __init repair_env_string(char *param, char *val,
> +static int __init repair_env_string(char *param, char *val, char *unused2,
>   const char *unused, void *arg)
>  {
>   if (val) {
> @@ -257,7 +257,7 @@ static int __init repair_env_string(char *param, char 
> *val,
>  }
>  
>  /* Anything after -- gets handed straight to init. */
> -static int __init set_init_arg(char *param, char *val,
> +static int __init 

Re: [PATCH v7 3/4] lib/cmdline.c Remove quotes symmetrically.

2017-08-24 Thread Michael Ellerman
Michal Suchanek  writes:

> Remove quotes from the argument value only if there is a quote on both sides.
>
> Signed-off-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/fadump.c | 6 ++
>  lib/cmdline.c| 7 ++-

Can you split that into two patches?

cheers

>  2 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index a1614d9b8a21..d7da4ce9f7ae 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -489,10 +489,8 @@ static void __init fadump_update_params(struct 
> param_info *param_info,
>   *tgt++ = ' ';
>  
>   /* next_arg removes one leading and one trailing '"' */
> - if (*tgt == '"')
> - shortening += 1;
> - if (*(tgt + vallen + shortening) == '"')
> - shortening += 1;
> + if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"'))
> + shortening += 2;
>  
>   /* remove one leading and one trailing quote if both are present */
>   if ((val[0] == '"') && (val[vallen - 1] == '"')) {
> diff --git a/lib/cmdline.c b/lib/cmdline.c
> index 4c0888c4a68d..01e701b2afe8 100644
> --- a/lib/cmdline.c
> +++ b/lib/cmdline.c
> @@ -227,14 +227,11 @@ char *next_arg(char *args, char **param, char **val)
>   *val = args + equals + 1;
>  
>   /* Don't include quotes in value. */
> - if (**val == '"') {
> + if ((**val == '"') && (args[i-1] == '"')) {
>   (*val)++;
> - if (args[i-1] == '"')
> - args[i-1] = '\0';
> + args[i-1] = '\0';
>   }
>   }
> - if (quoted && args[i-1] == '"')
> - args[i-1] = '\0';
>  
>   if (args[i]) {
>   args[i] = '\0';
> -- 
> 2.10.2


Re: [PATCH RFC] media: open.rst: document devnode-centric and mc-centric types

2017-08-24 Thread Sakari Ailus
Hi Mauro,

On Wed, Aug 23, 2017 at 10:42:28AM -0300, Mauro Carvalho Chehab wrote:
> Em Wed, 23 Aug 2017 12:37:30 +0300
> Sakari Ailus  escreveu:
> 
> > Hi Mauro,
> > 
> > Thanks for the patch! Something like this was long due.
> > 
> > I cc'd Hans and Laurent to get their attention, too.
> > 
> > On Sat, Aug 19, 2017 at 07:04:40AM -0300, Mauro Carvalho Chehab wrote:
> > > When we added support for omap3, back in 2010, we added a new
> > > type of V4L2 devices that aren't fully controlled via the V4L2
> > > device node. Yet, we never made it clear, at the V4L2 spec,
> > > about the differences between both types.
> > > 
> > > Let's document them with the current implementation.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab 
> > > ---
> > > 
> > > This patch is a result of this week's discussion we had with regards to 
> > > merging
> > > a patch series from Sakari documenting the need of propagating streaming
> > > control between sub-devices on some complex mc-centric devices.
> > > 
> > > One big gap on our documentation popped up: while we have distinct 
> > > behavior
> > > for mc-centric and devnode-centric types of V4L2 devices, we never clearly
> > > documented about that.
> > > 
> > > This RFC patch is a first attempt to have a definition of those types, 
> > > and to
> > > standardize the names we use to distinguish between those types.
> > > 
> > > Comments are welcome.
> > > 
> > > 
> > >  Documentation/media/uapi/v4l/open.rst | 37 
> > > +++
> > >  1 file changed, 37 insertions(+)
> > > 
> > > diff --git a/Documentation/media/uapi/v4l/open.rst 
> > > b/Documentation/media/uapi/v4l/open.rst
> > > index afd116edb40d..9cf4f74c466a 100644
> > > --- a/Documentation/media/uapi/v4l/open.rst
> > > +++ b/Documentation/media/uapi/v4l/open.rst
> > > @@ -6,6 +6,43 @@
> > >  Opening and Closing Devices
> > >  ***************************
> > >  
> > > +Types of V4L2 devices
> > > +=====================
> > > +
> > > +V4L2 devices are usually complex: they're implemented via a main driver 
> > > and  
> > 
> > Not all, such as UVC webcams. How about:
> > 
> > s/implemented/often implemented/
> > 
> > > +several other drivers.
> 
> True. uvcvideo and gspca drivers have only a main driver.
> 
> What about, instead:
> 
> V4L2 devices are usually complex: they're implemented via a main driver
> and often by several other drivers.

How about s/other/additional/? That I think would better convey the
suggestion the main driver's role stays even if there are other drivers.

> 
> > The main driver always exposes a V4L2 device node

This should actually say that there's "one or more V4L2 device nodes".

> > > +(see :ref:`v4l2_device_naming`). The other devices are called 
> > > **sub-devices**.
> > > +They are usually controlled via a serial bus (I2C or SMBus).  
> > 
> > The main driver also often exposes sub-devices.
> 
> What about:
> 
>   (see :ref:`v4l2_device_naming`). The other devices, when present,
>   are called **sub-devices**.

I might use "V4L2 sub-devices" instead of plain "sub-devices". I don't have
strong opinion either way.

> 
> > The practice has been that the video nodes are registered by the driver
> > that generally orchestrates things for the complete device but I don't
> > think there's anything that would exclude doing this otherwise if there are
> > two struct devices involved that do DMA.
> 
> Yeah. Well, I avoided the discussion of whether a mc-centric device that
> just enables a single DMA engine for camera output should orchestrate things
> for the complete device, as this is something that the current drivers
> don't usually do (yet, IMHO, they should, at least for those cheap SoC
> devices with just one camera connector).
> 
> Anyway, this is a separate issue. For now, let's focus on documenting
> the current situation.
> 
> > 
> > > +
> > > +When V4L2 started, there was only one type of device, fully controlled 
> > > via
> > > +V4L2 device nodes. We call those devices **devnode-centric devices**. 
> > >  
> > 
> > As device nodes can refer to any types of device nodes (such as sub-device
> > nodes), how about calling these "video node centric"? To make it more
> > explicit, I'd use "V4L2 video node centric" here. Perhaps "video node
> > centric" later to make it shorter.
> 
> I'm open to change its name, but "video node" is also wrong. There are
> some devices that only expose /dev/radio.

Good point.

> 
> main-driver-centric?

I think most of the user space developers probably see this from the
interface point of view, they might not know which driver exposes
particular device nodes, for instance.

Would it be too clunky to call them non-MC-centric? That would be
technically correct.

Or simply "plain". Combined with the explanation above, this could also be
workable.

> 
> > > +Since the end of 2010, a new type of V4L2 device was added, in order to
> > > +support complex devices that are common on 

Re: [PATCH v2] docs: ReSTify table of contents in core.rst

2017-08-24 Thread Jani Nikula
On Wed, 23 Aug 2017, Josh Holland  wrote:
> Sphinx will now generate the table of contents automatically, which
> avoids the ToC getting out of sync with the rest of the document.
>
> Signed-off-by: Josh Holland 
> ---
>  Documentation/security/keys/core.rst | 12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/Documentation/security/keys/core.rst 
> b/Documentation/security/keys/core.rst
> index 312f981fa..1266eeae4 100644
> --- a/Documentation/security/keys/core.rst
> +++ b/Documentation/security/keys/core.rst
> @@ -16,17 +16,7 @@ The key service can be configured on by enabling:
>  
>  This document has the following sections:
>  
> - - Key overview
> - - Key service overview
> - - Key access permissions
> - - SELinux support
> - - New procfs files
> - - Userspace system call interface
> - - Kernel services
> - - Notes on accessing payload contents
> - - Defining a key type
> - - Request-key callback service
> - - Garbage collection
> +.. contents:: :local:

Did you actually try this and look at the 'make htmldocs' results?

I know I tried what I suggested:

.. contents::
   :local:

http://docutils.sourceforge.net/docs/ref/rst/directives.html#table-of-contents

BR,
Jani.

>  
>  
>  Key Overview

-- 
Jani Nikula, Intel Open Source Technology Center