Re: [RESEND PATCH V5 33/33] block: document usage of bio iterator helpers

2018-05-25 Thread Randy Dunlap
On 05/24/2018 08:46 PM, Ming Lei wrote:
> Now multipage bvec is supported, and some helpers may return page by
> page, and some may return segment by segment, this patch documents the
> usage for helping us use them correctly.
> 
> Signed-off-by: Ming Lei 
> ---
>  Documentation/block/biovecs.txt | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
> index b4d238b8d9fc..32a6643caeca 100644
> --- a/Documentation/block/biovecs.txt
> +++ b/Documentation/block/biovecs.txt
> @@ -117,3 +117,35 @@ Other implications:
> size limitations and the limitations of the underlying devices. Thus
> there's no need to define ->merge_bvec_fn() callbacks for individual block
> drivers.
> +
> +Usage of helpers:
> +=
> +
> +* The following helpers which name has suffix of "_all" can only be used on

   * The following helpers, whose names have the suffix "_all", can only be 
used on

> +non-BIO_CLONED bio, and ususally they are used by filesystem code, and driver

   usually

> +shouldn't use them becasue bio may have been splitted before they got to the

  because   split

> +driver:
> +
> + bio_for_each_segment_all()
> + bio_for_each_page_all()
> + bio_pages_all()
> + bio_first_bvec_all()
> + bio_first_page_all()
> + bio_last_bvec_all()
> + segment_for_each_page_all()
> +
> +* The following helpers iterate bio page by page, and the local variable of
> +'struct bio_vec' or the reference records single page io vector during the
> +itearation:

   iteration:

> +
> + bio_for_each_page()
> + bio_for_each_page_all()
> + segment_for_each_page_all()
> +
> +* The following helpers iterate bio segment by segment, and each segment may
> +include multiple physically contiguous pages, and the local variable of
> +'struct bio_vec' or the reference records multi page io vector during the
> +itearation:

   iteration:

> +
> + bio_for_each_segment()
> + bio_for_each_segment_all()
> 


-- 
~Randy


Re: [PATCH 3/6] block: Create scsi_sense.h for SCSI and ATAPI

2018-05-23 Thread Randy Dunlap
On 05/23/2018 02:22 PM, Jens Axboe wrote:
> On 5/23/18 3:20 PM, Randy Dunlap wrote:
>> On 05/23/2018 02:14 PM, Jens Axboe wrote:
>>> On 5/23/18 2:52 PM, Kees Cook wrote:
>>>> On Wed, May 23, 2018 at 7:31 AM, Jens Axboe <ax...@kernel.dk> wrote:
>>>>> On 5/23/18 8:25 AM, Christoph Hellwig wrote:
>>>>>> On Wed, May 23, 2018 at 08:13:56AM -0600, Jens Axboe wrote:
>>>>>>>> Should I move to code to a new drivers/scsi/scsi_sense.c and add it to
>>>>>>>> drivers/scsi/Makefile as:
>>>>>>>>
>>>>>>>> obj-$(CONFIG_BLK_SCSI_REQUEST)+= scsi_sense.o
>>>>>>>>
>>>>>>>> Every place I want to use the code is already covered by
>>>>>>>> CONFIG_BLK_SCSI_REQUEST, so it seems like I just need to know where to
>>>>>>>> put the .c file. :P
>>>>>>>
>>>>>>> I think this is so much saner than a SCSI select or dependency, so I'll
>>>>>>> have to disagree with Martin and Christoph. Just put it in drivers/scsi,
>>>>>>> if it's the location they care about.
>>>>>>
>>>>>> I actually plan to remove CONFIG_BLK_SCSI_REQUEST in a merge window
>>>>>> or two.  The only users are scsi and the ide layer, (virtio_blk
>>>>>> support has already been accidentally disabled for a while), and getting
>>>>>> rid of it allows to to shrink and simply the scsi data structures.
>>>>>>
>>>>>> But if you want this for now lets keep scsi_sense.c in drivers/scsi
>>>>>> but depend on CONFIG_BLK_SCSI_REQUEST, that is easy enough to fix up.
>>>>>
>>>>> It could be a stand-alone dependency, doesn't have to be BLK_SCSI_REQUEST.
>>>>> BLA_SCSI_SENSE or something would do. I don't care too much about that,
>>>>> mostly getting rid of the entire stack dependency.
>>>>
>>>> Aaand, I can't do this and leave it in drivers/scsi because of 
>>>> drivers/Makefile:
>>>>
>>>> obj-$(CONFIG_SCSI)  += scsi/
>>>>
>>>> So: this needs to live in block/ just like CONFIG_BLK_SCSI_REQUEST's
>>>> scsi_ioctl.c. I will split it into CONFIG_BLK_SCSI_SENSE, but I'll
>>>> still need to move the code from drivers/scsi/ to block/. Is this
>>>> okay?
>>>
>>> Ugh, so that would necessitate a change there too. As I said before,
>>> I don't really care where it lives. I know the SCSI folks seem bothered
>>> by moving it, but in reality, it's not like this stuff will likely ever
>>> really change. Of the two choices (select entire SCSI stack, or just move
>>> this little bit), I know what I would consider the saner option...
>>>
>>
>> or option 3:
>>
>> obj-y   += scsi/
>>
>> so that make descends into drivers/scsi/ and then builds whatever is needed,
>> depending on individual kconfig options.
> 
> Right, that was the initial option, the later two are the other options.
> 

Sorry, I'm late to the party.

-- 
~Randy


Re: [PATCH 3/6] block: Create scsi_sense.h for SCSI and ATAPI

2018-05-23 Thread Randy Dunlap
On 05/23/2018 02:14 PM, Jens Axboe wrote:
> On 5/23/18 2:52 PM, Kees Cook wrote:
>> On Wed, May 23, 2018 at 7:31 AM, Jens Axboe  wrote:
>>> On 5/23/18 8:25 AM, Christoph Hellwig wrote:
 On Wed, May 23, 2018 at 08:13:56AM -0600, Jens Axboe wrote:
>> Should I move to code to a new drivers/scsi/scsi_sense.c and add it to
>> drivers/scsi/Makefile as:
>>
>> obj-$(CONFIG_BLK_SCSI_REQUEST)+= scsi_sense.o
>>
>> Every place I want to use the code is already covered by
>> CONFIG_BLK_SCSI_REQUEST, so it seems like I just need to know where to
>> put the .c file. :P
>
> I think this is so much saner than a SCSI select or dependency, so I'll
> have to disagree with Martin and Christoph. Just put it in drivers/scsi,
> if it's the location they care about.

 I actually plan to remove CONFIG_BLK_SCSI_REQUEST in a merge window
 or two.  The only users are scsi and the ide layer, (virtio_blk
 support has already been accidentally disabled for a while), and getting
 rid of it allows to to shrink and simply the scsi data structures.

 But if you want this for now lets keep scsi_sense.c in drivers/scsi
 but depend on CONFIG_BLK_SCSI_REQUEST, that is easy enough to fix up.
>>>
>>> It could be a stand-alone dependency, doesn't have to be BLK_SCSI_REQUEST.
>>> BLA_SCSI_SENSE or something would do. I don't care too much about that,
>>> mostly getting rid of the entire stack dependency.
>>
>> Aaand, I can't do this and leave it in drivers/scsi because of 
>> drivers/Makefile:
>>
>> obj-$(CONFIG_SCSI)  += scsi/
>>
>> So: this needs to live in block/ just like CONFIG_BLK_SCSI_REQUEST's
>> scsi_ioctl.c. I will split it into CONFIG_BLK_SCSI_SENSE, but I'll
>> still need to move the code from drivers/scsi/ to block/. Is this
>> okay?
> 
> Ugh, so that would necessitate a change there too. As I said before,
> I don't really care where it lives. I know the SCSI folks seem bothered
> by moving it, but in reality, it's not like this stuff will likely ever
> really change. Of the two choices (select entire SCSI stack, or just move
> this little bit), I know what I would consider the saner option...
> 

or option 3:

obj-y   += scsi/

so that make descends into drivers/scsi/ and then builds whatever is needed,
depending on individual kconfig options.

-- 
~Randy


Re: [PATCH 3/6] block: Create scsi_sense.h for SCSI and ATAPI

2018-05-22 Thread Randy Dunlap
On 05/22/2018 04:39 PM, Kees Cook wrote:
> On Tue, May 22, 2018 at 4:34 PM, Randy Dunlap <rdun...@infradead.org> wrote:
>> On 05/22/2018 04:31 PM, Kees Cook wrote:
>>> On Tue, May 22, 2018 at 12:16 PM, Jens Axboe <ax...@kernel.dk> wrote:
>>>> On 5/22/18 1:13 PM, Christoph Hellwig wrote:
>>>>> On Tue, May 22, 2018 at 01:09:41PM -0600, Jens Axboe wrote:
>>>>>> I think Martin and Christoph are objecting to moving the code to
>>>>>> block/scsi_ioctl.h. I don't care too much about where the code is, but
>>>>>> think it would be nice to have the definitions in a separate header. But
>>>>>> if they prefer just pulling in all of SCSI for it, well then I guess
>>>>>> it's pointless to move the header bits. Seems very heavy handed to me,
>>>>>> though.
>>>>>
>>>>> It might be heavy handed for the 3 remaining users of drivers/ide,
>>>>
>>>> Brutal :-)
>>>
>>> Heh. I noticed a similar sense buffer use in drivers/cdrom/cdrom.c
>>> too. Is this okay under the same considerations?
>>
>> No.  Do not select an entire subsystem.  Use depends on it instead.
> 
> I looked at that first, but it seems it's designed for that. For
> example, "config ATA" already has a "select SCSI".
> 
> It does look fishy, though, since "config SCSI" has a "depends" which
> would be ignored by "select". Luckily, all these uses already do a
> "depends on BLOCK" (directly or indirectly).

Linus has railed against selecting subsystems.  We shouldn't be adding
more IMHO, although it is difficult to get rid of ones that we already have.


-- 
~Randy


Re: [PATCH 3/6] block: Create scsi_sense.h for SCSI and ATAPI

2018-05-22 Thread Randy Dunlap
On 05/22/2018 04:31 PM, Kees Cook wrote:
> On Tue, May 22, 2018 at 12:16 PM, Jens Axboe  wrote:
>> On 5/22/18 1:13 PM, Christoph Hellwig wrote:
>>> On Tue, May 22, 2018 at 01:09:41PM -0600, Jens Axboe wrote:
 I think Martin and Christoph are objecting to moving the code to
 block/scsi_ioctl.h. I don't care too much about where the code is, but
 think it would be nice to have the definitions in a separate header. But
 if they prefer just pulling in all of SCSI for it, well then I guess
 it's pointless to move the header bits. Seems very heavy handed to me,
 though.
>>>
>>> It might be heavy handed for the 3 remaining users of drivers/ide,
>>
>> Brutal :-)
> 
> Heh. I noticed a similar sense buffer use in drivers/cdrom/cdrom.c
> too. Is this okay under the same considerations?

No.  Do not select an entire subsystem.  Use depends on it instead.


> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index ad9b687a236a..220ff321c102 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -79,7 +79,7 @@ config GDROM
> tristate "SEGA Dreamcast GD-ROM drive"
> depends on SH_DREAMCAST
> select CDROM
> -   select BLK_SCSI_REQUEST # only for the generic cdrom code
> +   select SCSI
> help
>   A standard SEGA Dreamcast comes with a modified CD ROM drive called 
> a
>   "GD-ROM" by SEGA to signify it is capable of reading special disks
> @@ -345,7 +345,7 @@ config CDROM_PKTCDVD
> tristate "Packet writing on CD/DVD media (DEPRECATED)"
> depends on !UML
> select CDROM
> -   select BLK_SCSI_REQUEST
> +   select SCSI
> help
>   Note: This driver is deprecated and will be removed from the
>   kernel in the near future!
> diff --git a/drivers/block/paride/Kconfig b/drivers/block/paride/Kconfig
> index f8bd6ef3605a..7fdfcc5eaca5 100644
> --- a/drivers/block/paride/Kconfig
> +++ b/drivers/block/paride/Kconfig
> @@ -27,7 +27,7 @@ config PARIDE_PCD
> tristate "Parallel port ATAPI CD-ROMs"
> depends on PARIDE
> select CDROM
> -   select BLK_SCSI_REQUEST # only for the generic cdrom code
> +   select SCSI
> ---help---
>   This option enables the high-level driver for ATAPI CD-ROM devices
>   connected through a parallel port. If you chose to build PARIDE
> 
>>> but as long as that stuff just keeps working I'd rather worry about
>>> everyone else, and keep the scsi code where it belongs.
>>
>> Fine with me then, hopefully we can some day kill it off.
> 
> I'll send a v2. I found a few other things to fix up (including the
> cdrom.c one).


-- 
~Randy


Re: [PATCH v4 06/14] PCI/P2PDMA: Add P2P DMA driver writer's documentation

2018-05-22 Thread Randy Dunlap
On 04/23/2018 04:30 PM, Logan Gunthorpe wrote:
> Add a restructured text file describing how to write drivers
> with support for P2P DMA transactions. The document describes
> how to use the APIs that were added in the previous few
> commits.
> 
> Also adds an index for the PCI documentation tree even though this
> is the only PCI document that has been converted to restructured text
> at this time.
> 
> Signed-off-by: Logan Gunthorpe 
> Cc: Jonathan Corbet 
> ---
>  Documentation/PCI/index.rst |  14 +++
>  Documentation/driver-api/pci/index.rst  |   1 +
>  Documentation/driver-api/pci/p2pdma.rst | 166 
> 
>  Documentation/index.rst |   3 +-
>  4 files changed, 183 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/PCI/index.rst
>  create mode 100644 Documentation/driver-api/pci/p2pdma.rst


> diff --git a/Documentation/driver-api/pci/p2pdma.rst 
> b/Documentation/driver-api/pci/p2pdma.rst
> new file mode 100644
> index ..49a512c405b2
> --- /dev/null
> +++ b/Documentation/driver-api/pci/p2pdma.rst
> @@ -0,0 +1,166 @@
> +
> +PCI Peer-to-Peer DMA Support
> +
> +
> +The PCI bus has pretty decent support for performing DMA transfers
> +between two endpoints on the bus. This type of transaction is
> +henceforth called Peer-to-Peer (or P2P). However, there are a number of
> +issues that make P2P transactions tricky to do in a perfectly safe way.
> +
> +One of the biggest issues is that PCI Root Complexes are not required
> +to support forwarding packets between Root Ports. To make things worse,
> +there is no simple way to determine if a given Root Complex supports
> +this or not. (See PCIe r4.0, sec 1.3.1). Therefore, as of this writing,
> +the kernel only supports doing P2P when the endpoints involved are all
> +behind the same PCIe root port as the spec guarantees that all
> +packets will always be routable but does not require routing between
> +root ports.
> +
> +The second issue is that to make use of existing interfaces in Linux,
> +memory that is used for P2P transactions needs to be backed by struct
> +pages. However, PCI BARs are not typically cache coherent so there are
> +a few corner case gotchas with these pages so developers need to
> +be careful about what they do with them.
> +
> +
> +Driver Writer's Guide
> +=
> +
> +In a given P2P implementation there may be three or more different
> +types of kernel drivers in play:
> +
> +* Providers - A driver which provides or publishes P2P resources like

   * Provider -

> +  memory or doorbell registers to other drivers.
> +* Clients - A driver which makes use of a resource by setting up a

   * Client -

> +  DMA transaction to or from it.
> +* Orchestrators - A driver which orchestrates the flow of data between

   * Orchestrator -

> +  clients and providers
> +
> +In many cases there could be overlap between these three types (ie.

  (i.e.,

> +it may be typical for a driver to be both a provider and a client).
> +
> +For example, in the NVMe Target Copy Offload implementation:
> +
> +* The NVMe PCI driver is both a client, provider and orchestrator
> +  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
> +  resource (provider), it accepts P2P memory pages as buffers in requests
> +  to be used directly (client) and it can also make use the CMB as
> +  submission queue entries.
> +* The RDMA driver is a client in this arrangement so that an RNIC
> +  can DMA directly to the memory exposed by the NVMe device.
> +* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
> +  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
> +
> +This is currently the only arrangement supported by the kernel but
> +one could imagine slight tweaks to this that would allow for the same
> +functionality. For example, if a specific RNIC added a BAR with some
> +memory behind it, its driver could add support as a P2P provider and
> +then the NVMe Target could use the RNIC's memory instead of the CMB
> +in cases where the NVMe cards in use do not have CMB support.
> +
> +
> +Provider Drivers
> +
> +
> +A provider simply needs to register a BAR (or a portion of a BAR)
> +as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
> +This will register struct pages for all the specified memory.
> +
> +After that it may optionally publish all of its resources as
> +P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
> +any orchestrator drivers to find and use the memory. When marked in
> +this way, the resource must be regular memory with no side effects.
> +
> +For the time being this is fairly rudimentary in that all resources
> +are typically going to be P2P memory. Future work will likely expand
> +this to include other types 

Re: [PATCH 6/7] psi: pressure stall information for CPU, memory, and IO

2018-05-07 Thread Randy Dunlap
On 05/07/2018 02:01 PM, Johannes Weiner wrote:
> 
> Signed-off-by: Johannes Weiner 
> ---
>  Documentation/accounting/psi.txt |  73 ++
>  include/linux/psi.h  |  27 ++
>  include/linux/psi_types.h|  84 ++
>  include/linux/sched.h|  10 +
>  include/linux/sched/stat.h   |  10 +-
>  init/Kconfig |  16 ++
>  kernel/fork.c|   4 +
>  kernel/sched/Makefile|   1 +
>  kernel/sched/core.c  |   3 +
>  kernel/sched/psi.c   | 424 +++
>  kernel/sched/sched.h | 166 ++--
>  kernel/sched/stats.h |  91 ++-
>  mm/compaction.c  |   5 +
>  mm/filemap.c |  15 +-
>  mm/page_alloc.c  |  10 +
>  mm/vmscan.c  |  13 +
>  16 files changed, 859 insertions(+), 93 deletions(-)
>  create mode 100644 Documentation/accounting/psi.txt
>  create mode 100644 include/linux/psi.h
>  create mode 100644 include/linux/psi_types.h
>  create mode 100644 kernel/sched/psi.c
> 
> diff --git a/Documentation/accounting/psi.txt 
> b/Documentation/accounting/psi.txt
> new file mode 100644
> index ..e051810d5127
> --- /dev/null
> +++ b/Documentation/accounting/psi.txt
> @@ -0,0 +1,73 @@

Looks good to me.


> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> new file mode 100644
> index ..052c529a053b
> --- /dev/null
> +++ b/kernel/sched/psi.c
> @@ -0,0 +1,424 @@
> +/*
> + * Measure workload productivity impact from overcommitting CPU, memory, IO
> + *
> + * Copyright (c) 2017 Facebook, Inc.
> + * Author: Johannes Weiner 
> + *
> + * Implementation
> + *
> + * Task states -- running, iowait, memstall -- are tracked through the
> + * scheduler and aggregated into a system-wide productivity state. The
> + * ratio between the times spent in productive states and delays tells
> + * us the overall productivity of the workload.
> + *
> + * The ratio is tracked in decaying time averages over 10s, 1m, 5m
> + * windows. Cumluative stall times are tracked and exported as well to

   Cumulative

> + * allow detection of latency spikes and custom time averaging.
> + *
> + * Multiple CPUs
> + *
> + * To avoid cache contention, times are tracked local to the CPUs. To
> + * get a comprehensive view of a system or cgroup, we have to consider
> + * the fact that CPUs could be unevenly loaded or even entirely idle
> + * if the workload doesn't have enough threads. To avoid artifacts
> + * caused by that, when adding up the global pressure ratio, the
> + * CPU-local ratios are weighed according to their non-idle time:
> + *
> + *   Time the CPU had stalled tasksTime the CPU was non-idle
> + *   -- * ---
> + *WalltimeTime all CPUs were non-idle
> + */


> +
> +/**
> + * psi_memstall_leave - mark the end of an memory stall section

end of a memory

> + * @flags: flags to handle nested memdelay sections
> + *
> + * Marks the calling task as no longer stalled due to lack of memory.
> + */
> +void psi_memstall_leave(unsigned long *flags)
> +{



-- 
~Randy


[PATCH] Documentation: block: cmdline-partition.txt fixes and additions

2018-05-06 Thread Randy Dunlap
From: Randy Dunlap <rdun...@infradead.org>

Make the description of the kernel command line option "blkdevparts"
a bit more flowing and readable.

Fix a few typos.
Add the optional  and  suffixes.
Note that size can be "-" to indicate all of the remaining space.

Signed-off-by: Randy Dunlap <rdun...@infradead.org>
Cc: Cai Zhiyong <caizhiy...@huawei.com>
---
Should the "ro" and "lk" flags options be described?

 Documentation/block/cmdline-partition.txt |   21 +---
 1 file changed, 14 insertions(+), 7 deletions(-)

--- linux-next-20180504.orig/Documentation/block/cmdline-partition.txt
+++ linux-next-20180504/Documentation/block/cmdline-partition.txt
@@ -1,7 +1,9 @@
 Embedded device command line partition parsing
 =
 
-Support for reading the block device partition table from the command line.
+The "blkdevparts" command line option adds support for reading the
+block device partition table from the kernel command line.
+
 It is typically used for fixed block (eMMC) embedded devices.
 It has no MBR, so saves storage space. Bootloader can be easily accessed
 by absolute address of data on the block device.
@@ -14,22 +16,27 @@ blkdevparts=[;]
  := [@](part-name)
 
 
-block device disk name, embedded device used fixed block device,
-it's disk name also fixed. such as: mmcblk0, mmcblk1, mmcblk0boot0.
+block device disk name. Embedded device uses fixed block device.
+Its disk name is also fixed, such as: mmcblk0, mmcblk1, mmcblk0boot0.
 
 
 partition size, in bytes, such as: 512, 1m, 1G.
+size may contain an optional suffix of (upper or lower case):
+  K, M, G, T, P, E.
+"-" is used to denote all remaining space.
 
 
 partition start address, in bytes.
+offset may contain an optional suffix of (upper or lower case):
+  K, M, G, T, P, E.
 
 (part-name)
-partition name, kernel send uevent with "PARTNAME". application can create
-a link to block device partition with the name "PARTNAME".
-user space application can access partition by partition name.
+partition name. Kernel sends uevent with "PARTNAME". Application can
+create a link to block device partition with the name "PARTNAME".
+User space application can access partition by partition name.
 
 Example:
-eMMC disk name is "mmcblk0" and "mmcblk0boot0"
+eMMC disk names are "mmcblk0" and "mmcblk0boot0".
 
   bootargs:
 'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'



Re: [PATCH v3] block: add verifier for cmdline partition

2018-05-06 Thread Randy Dunlap
Hi,

On 05/05/2018 10:23 PM, Wang YanQing wrote:
> I meet strange filesystem corruption issue recently, the reason
> is there are overlaps partitions in cmdline partition argument.
> 
> This patch add verifier for cmdline partition, then if there are
> overlaps partitions, cmdline_partition will log a warning.
> 
> Signed-off-by: Wang YanQing 
> ---
>  Changes
>  v2-v3:
>  1:Fix log one pair of overlaps partitions twice in cmdline_parts_verifier.
>  2:Fix out of bound access in cmdline_parts_verifier.
> 
>  v1-v2:
>  1:Don't treat overlaps partition as a error, but log a warning.
> 
>  block/partitions/cmdline.c | 66 
> ++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/block/partitions/cmdline.c b/block/partitions/cmdline.c
> index e333583..fefcc72 100644
> --- a/block/partitions/cmdline.c
> +++ b/block/partitions/cmdline.c
> @@ -58,6 +58,71 @@ static int __init cmdline_parts_setup(char *s)
>  }
>  __setup("blkdevparts=", cmdline_parts_setup);
>  
> +static bool has_overlaps(sector_t from, sector_t size,
> +  sector_t from2, sector_t size2)
> +{
> + sector_t end = from + size;
> + sector_t end2 = from2 + size2;
> +
> + if (from >= from2 && from < end2)
> + return true;
> +
> + if (end > from2 && end <= end2)
> + return true;
> +
> + if (from2 >= from && from2 < end)
> + return true;
> +
> + if (end2 > from && end2 <= end)
> + return true;
> +
> + return false;
> +}
> +
> +static inline void overlaps_warns_header(void)
> +{
> + pr_warn("\n");
> + pr_warn("Overlaps partitions being used in command line partition.\n");

"Overlapping partitions are used in the command line 
partitions.\n"

> + pr_warn("Don't use filesystems on overlaps partitions:\n");

"Don't use filesystems on overlapping partitions:\n"

> +}
> +
> +static inline void overlaps_warns_tailer(void)
> +{
> + pr_warn("\n");
> +}
> +
> +static void cmdline_parts_verifier(int slot, struct parsed_partitions *state)
> +{
> + int i;
> + bool header = true;
> +
> + for (; slot < state->limit && state->parts[slot].has_info; slot++) {
> + for (i = slot+1; i < state->limit && state->parts[i].has_info;
> +  i++) {
> + if (has_overlaps(state->parts[slot].from,
> +  state->parts[slot].size,
> +  state->parts[i].from,
> +  state->parts[i].size)) {
> + if (header) {
> + header = false;
> + overlaps_warns_header();
> + }
> + pr_warn("%s[%llu,%llu] overlaps with "
> + "%s[%llu,%llu].\n",
> + state->parts[slot].info.volname,
> + (u64)state->parts[slot].from << 9,
> + (u64)state->parts[slot].size << 9,
> + state->parts[i].info.volname,
> + (u64)state->parts[i].from << 9,
> + (u64)state->parts[i].size << 9);
> + }
> + }
> + }
> +
> + if (!header)
> + overlaps_warns_tailer();
> +}
> +
>  /*
>   * Purpose: allocate cmdline partitions.
>   * Returns:
> @@ -93,6 +158,7 @@ int cmdline_partition(struct parsed_partitions *state)
>   disk_size = get_capacity(state->bdev->bd_disk) << 9;
>  
>   cmdline_parts_set(parts, disk_size, 1, add_part, (void *)state);
> + cmdline_parts_verifier(1, (void *)state);
>  
>   strlcat(state->pp_buf, "\n", PAGE_SIZE);
>  
> 


-- 
~Randy


[PATCH -next] zram: fix printk formats in zram_drv.c

2018-05-03 Thread Randy Dunlap
From: Randy Dunlap <rdun...@infradead.org>

Fix printk format warnings (seen on i386 build):

../drivers/block/zram/zram_drv.c:678:4: warning: format ‘%lu’ expects argument 
of type ‘long unsigned int’, but argument 4 has type ‘ssize_t’ [-Wformat=]
../drivers/block/zram/zram_drv.c:678:4: warning: format ‘%lu’ expects argument 
of type ‘long unsigned int’, but argument 5 has type ‘time64_t’ [-Wformat=]

time64_t is 64 bits (and signed), so printing it should use %lld,
not %ld.  %ld (long) is only 32 bits on several architectures.

Signed-off-by: Randy Dunlap <rdun...@infradead.org>
Cc: Nitin Gupta <ngu...@vflare.org>
Cc: Minchan Kim <minc...@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.w...@gmail.com>
---
 drivers/block/zram/zram_drv.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20180503.orig/drivers/block/zram/zram_drv.c
+++ linux-next-20180503/drivers/block/zram/zram_drv.c
@@ -671,7 +671,7 @@ static ssize_t read_block_state(struct f
 
ts = ktime_to_timespec64(zram->table[index].ac_time);
copied = snprintf(kbuf + written, count,
-   "%12lu %12lu.%06lu %c%c%c\n",
+   "%12zd %12lld.%06lu %c%c%c\n",
index, ts.tv_sec, ts.tv_nsec / NSEC_PER_USEC,
zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',




Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

2018-04-23 Thread Randy Dunlap
On 04/23/2018 04:30 PM, Logan Gunthorpe wrote:> > Signed-off-by: Logan 
Gunthorpe > ---
>  drivers/pci/Kconfig|  9 +>  drivers/pci/p2pdma.c   | 45 
> ++--->  drivers/pci/pci.c  |  
> 6 ++>  include/linux/pci-p2pdma.h |  5 +>  4 files changed, 50 
> insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index b2396c22b53e..b6db41d4b708 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -139,6 +139,15 @@ config PCI_P2PDMA
> transations must be between devices behind the same root port.
> (Typically behind a network of PCIe switches).
>  
> +   Enabling this option will also disable ACS on all ports behind
> +   any PCIe switch. This effectively puts all devices behind any
> +   switch heirarchy into the same IOMMU group. Which implies that

 hierarchy group, which

and sames fixes in the commit description...

> +   individual devices behind any switch will not be able to be
> +   assigned to separate VMs because there is no isolation between
> +   them. Additionally, any malicious PCIe devices will be able to
> +   DMA to memory exposed by other EPs in the same domain as TLPs
> +   will not be checked by the IOMMU.
> +
> If unsure, say N.
>  
>  config PCI_LABEL


-- 
~Randy


Re: [PATCH 1/3] fs: move documentation for thaw_super() where appropriate

2018-04-20 Thread Randy Dunlap
On 04/20/18 17:07, Luis R. Rodriguez wrote:
> On Sat, Apr 21, 2018 at 12:01:31AM +, Bart Van Assche wrote:
>> On Fri, 2018-04-20 at 16:59 -0700, Luis R. Rodriguez wrote:
>>> +/**
>>> + * thaw_super -- unlock filesystem
>>> + * @sb: the super to thaw
>>> + *
>>> + * Unlocks the filesystem and marks it writeable again after 
>>> freeze_super().
>>> + */
>>
>> Have you verified the output generated by scripts/kernel-doc? Last
>> time I checked the output for headers containing a double dash looked
>> weird.
> 
> No, I just moved the block of kdoc crap. Randy would have this memorized I'm
> sure though so I should just fix the kdoc if its bad while at it.
> 
>   Luis
> 

"--" does sound odd.  I've never seen it used on purpose.
FWIW, I just go by what is in Documentation/doc-guide/kernel-doc.rst:

Function documentation
--

The general format of a function and function-like macro kernel-doc comment is::

  /**
   * function_name() - Brief description of function.
   * @arg1: Describe the first argument.
   * @arg2: Describe the second argument.
   *One can provide multiple line descriptions
   *for arguments.



-- 
~Randy


Re: [PATCH 1/2] blkcg: small fix on comment in blkcg_init_queue

2018-04-19 Thread Randy Dunlap
On 04/19/18 07:52, Jens Axboe wrote:
> On 4/18/18 10:04 PM, Jiang Biao wrote:
>> The comment before blkg_create() in blkcg_init_queue() was moved
>> from blkcg_activate_policy() by commit ec13b1d6f0a0457312e615, but
>> it does not suit for the new context.
> 
> Applied - btw, in the future, if you send more than one patch in
> a series, please include a cover letter.

Hi Jens,

Where does that cover letter requirement come from?  It's not documented
anywhere AFAICT and shouldn't be needed for such small patch series.

> I've applied 2/2 as well.

thanks,
-- 
~Randy


Re: [PATCH 3/4] lightnvm: add 2.0 geometry identification

2018-02-05 Thread Randy Dunlap
On 02/05/2018 04:15 AM, Matias Bjørling wrote:
> Implement the geometry data structures for 2.0 and enable a drive
> to be identified as one, including exposing the appropriate 2.0
> sysfs entries.
> 
> Signed-off-by: Matias Bjørling 
> ---
>  drivers/lightnvm/core.c  |   2 +-
>  drivers/nvme/host/lightnvm.c | 334 
> +--
>  include/linux/lightnvm.h |  11 +-
>  3 files changed, 295 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> index c72863b36439..250e74dfa120 100644
> --- a/drivers/lightnvm/core.c
> +++ b/drivers/lightnvm/core.c
> @@ -934,7 +934,7 @@ static int nvm_init(struct nvm_dev *dev)
>   pr_debug("nvm: ver:%x nvm_vendor:%x\n",
>   dev->identity.ver_id, dev->identity.vmnt);
>  
> - if (dev->identity.ver_id != 1) {
> + if (dev->identity.ver_id != 1 && dev->identity.ver_id != 2) {
>   pr_err("nvm: device not supported by kernel.");
>   goto err;
>   }

Hi,
The pr_err() above could be a bit more informative to the user. E.g.,
pr_err("nvm: device ver_id %d not supported by kernel.",
dev->identity.ver_id);

BTW, isn't that line missing a '\n'?

-- 
~Randy


Re: [PATCH v6 1/2] Return bytes transferred for partial direct I/O

2018-01-29 Thread Randy Dunlap
On 01/29/2018 06:57 AM, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 

> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 6c00c1e2743f..72e213d62511 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -76,6 +77,19 @@ dcache isn't pruned yet.
>  
>  ==
>  
> +dio_short_writes:
> +
> +In case Direct I/O encounters an transient error, it returns

 a transient

> +the errorcode, even if it has performed part of the write.

   error code,

> +This flag, if on (default), will return the number of bytes written
> +so far, as the write(2) symantics are. However, some older applications

   semantics

> +still consider a direct write as an error if all of the I/O
> +submitted is not complete. ie write(file, count, buf) != count.

  I.e.

> +This option can be disabled on systems in order to support
> +existing applications which do not expect short writes.

and if my system has a mix of older applications and new ones,
will they all work just fine?


thanks,
-- 
~Randy


Re: [GIT PULL 18/25] lightnvm: set target over-provision on create ioctl

2018-01-05 Thread Randy Dunlap
On 01/05/2018 11:56 AM, Javier Gonzalez wrote:
>> On 5 Jan 2018, at 20.53, Matias Bjørling <m...@bjorling.me> wrote:
>>
>> On 01/05/2018 08:52 PM, Javier Gonzalez wrote:
>>>> On 5 Jan 2018, at 20.33, Randy Dunlap <rdun...@infradead.org> wrote:
>>>>
>>>> On 01/05/2018 05:16 AM, Matias Bjørling wrote:
>>>>> From: Javier González <jav...@cnexlabs.com>
>>>>>
>>>>> Allow to set the over-provision percentage on target creation. In case
>>>>> that the value is not provided, fall back to the default value set by
>>>>> the target.
>>>>>
>>>>> In pblk, set the default OP to 11% of the total size of the device
>>>>
>>>>> +#define PBLK_DEFAULT_OP (11)
>>>>
>>>> Hi,
>>>> Just curious -- where does 11 come from?  Is it a spec value?
>>> 11 stands for 11% over-provisioning for the media to allow garbage
>>> collection. Different SSDs have different values based on the targeted
>>> workload - 11% is a common default value for standard drives.
>>> The spec. does not specify any default values of over-provisioning.
>>> Javier.
>>
>> I think what Randy means is, why is the value not 20% or 7%, which is the 
>> traditional over-provisioning on SSDs.
> 
> We decided 11% based on customer input, but 7% and 20% are also good
> default values. I'm ok with having any of those as default - anyone
> caring about OP will define the value on target creation, which is the
> primary objective of this patch.

Got it.  Thanks.

-- 
~Randy


Re: [GIT PULL 18/25] lightnvm: set target over-provision on create ioctl

2018-01-05 Thread Randy Dunlap
On 01/05/2018 05:16 AM, Matias Bjørling wrote:
> From: Javier González 
> 
> Allow to set the over-provision percentage on target creation. In case
> that the value is not provided, fall back to the default value set by
> the target.
> 
> In pblk, set the default OP to 11% of the total size of the device

> +#define PBLK_DEFAULT_OP (11)

Hi,
Just curious -- where does 11 come from?  Is it a spec value?

thanks,
-- 
~Randy


[PATCH] block: genhd.c: fix message typo

2017-11-18 Thread Randy Dunlap
From: Randy Dunlap <rdun...@infradead.org>

Fix typo in error message.

Signed-off-by: Randy Dunlap <rdun...@infradead.org>
---
 block/genhd.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-414.orig/block/genhd.c
+++ lnx-414/block/genhd.c
@@ -1367,7 +1367,7 @@ struct gendisk *alloc_disk_node(int mino
 
if (minors > DISK_MAX_PARTS) {
printk(KERN_ERR
-   "block: can't allocated more than %d partitions\n",
+   "block: can't allocate more than %d partitions\n",
DISK_MAX_PARTS);
minors = DISK_MAX_PARTS;
}




[PATCH] block: fix Sphinx kernel-doc warning

2017-10-16 Thread Randy Dunlap
From: Randy Dunlap <rdun...@infradead.org>

Sphinx treats symbols that end with '_' as a kind of special
documentation indicator, so fix that by adding an ending '*'
to it.

../block/bio.c:404: ERROR: Unknown target name: "gfp".

Signed-off-by: Randy Dunlap <rdun...@infradead.org>
---
 block/bio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-414-rc5.orig/block/bio.c
+++ lnx-414-rc5/block/bio.c
@@ -400,7 +400,7 @@ static void punt_bios_to_rescuer(struct
 
 /**
  * bio_alloc_bioset - allocate a bio for I/O
- * @gfp_mask:   the GFP_ mask given to the slab allocator
+ * @gfp_mask:   the GFP_* mask given to the slab allocator
  * @nr_iovecs: number of iovecs to pre-allocate
  * @bs:the bio_set to allocate from.
  *



Re: [PATCH 1/8] lib: Introduce sgl_alloc() and sgl_free()

2017-10-13 Thread Randy Dunlap
On 10/12/17 15:45, Bart Van Assche wrote:

> diff --git a/lib/sgl_alloc.c b/lib/sgl_alloc.c
> new file mode 100644
> index ..d96b395dd5c8
> --- /dev/null
> +++ b/lib/sgl_alloc.c
> @@ -0,0 +1,102 @@

> +/**
> + * sgl_free_order - free a scatterlist and its pages
> + * @sg: Scatterlist with one or more elements

  @sgl:

> + * @order: Second argument for __free_pages()
> + */
> +void sgl_free_order(struct scatterlist *sgl, int order)
> +{
> + struct scatterlist *sg;
> + struct page *page;
> +
> + for (sg = sgl; sg; sg = sg_next(sg)) {
> + page = sg_page(sg);
> + if (page)
> + __free_pages(page, order);
> + }
> + kfree(sgl);
> +}
> +EXPORT_SYMBOL(sgl_free_order);
> +
> +/**
> + * sgl_free - free a scatterlist and its pages
> + * @sg: Scatterlist with one or more elements

  @sgl:

> + */
> +void sgl_free(struct scatterlist *sgl)
> +{
> + sgl_free_order(sgl, 0);
> +}
> +EXPORT_SYMBOL(sgl_free);
> 

ta.
-- 
~Randy


[PATCH] block: fix bio.c kernel-doc notation warning

2017-09-04 Thread Randy Dunlap
From: Randy Dunlap <rdun...@infradead.org>

Sphinx treats symbols that end with '_' as some kind of special
documentation indicator, so fix that by adding an ending '*'
to it.

../block/bio.c:404: ERROR: Unknown target name: "gfp".

Signed-off-by: Randy Dunlap <rdun...@infradead.org>
---
 block/bio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-413.orig/block/bio.c
+++ lnx-413/block/bio.c
@@ -400,7 +400,7 @@ static void punt_bios_to_rescuer(struct
 
 /**
  * bio_alloc_bioset - allocate a bio for I/O
- * @gfp_mask:   the GFP_ mask given to the slab allocator
+ * @gfp_mask:   the GFP_* mask given to the slab allocator
  * @nr_iovecs: number of iovecs to pre-allocate
  * @bs:the bio_set to allocate from.
  *