Re: [dm-devel] [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

2016-07-18 Thread David Rientjes
On Mon, 18 Jul 2016, Michal Hocko wrote:

> David Rientjes was objecting that such an approach wouldn't help if the
> oom victim was blocked on a lock held by a process doing mempool_alloc. This
> is very similar to other oom deadlock situations and we have the oom_reaper
> to deal with them, so it is reasonable to rely on the same mechanism
> rather than inventing a different one which has negative side effects.
> 

Right, this causes oom livelock as described in the aforementioned thread: 
the oom victim is waiting on a mutex that is held by a thread doing 
mempool_alloc().  The oom reaper is not guaranteed to free any memory, so 
nothing on the system can allocate memory from the page allocator.

I think the better solution here is to allow mempool_alloc() users to set 
__GFP_NOMEMALLOC if they are in a context which allows them to deplete 
memory reserves.
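
Mechanically, that would mean the caller decides about the flag itself, e.g.
(a sketch only, not code from any posted patch; the wrapper name is made up):

#include <linux/mempool.h>
#include <linux/gfp.h>

/* Sketch: the caller passes __GFP_NOMEMALLOC itself instead of relying on
 * mempool_alloc() to add it internally; passing the flag forbids dipping
 * into the emergency reserves even when the pool is empty. */
static void *alloc_element_no_reserves(mempool_t *pool)
{
	return mempool_alloc(pool, GFP_NOIO | __GFP_NOMEMALLOC);
}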

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [4.7-rc6 snapshot] xfstests::generic/081 unable to tear down snapshot VG

2016-07-18 Thread Dave Chinner
Hi folks,

I'm currently running the latest set of XFS patches through QA, and
I'm getting generic/081 failing and leaving a block device in an
unrecoverable EBUSY state. I'm running xfstests on a pair of 8GB
fake pmem devices:

$ sudo ./run_check.sh " -i sparse=1" "" " -s xfs generic/081"
umount: /mnt/test: not mounted
umount: /mnt/scratch: not mounted
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Clearing log and setting UUID
writing all SBs
new UUID = 300b4aff-a65e-4de1-ac0e-5a0058e93ef0
Building include
Building lib
Building tests
Building ltp
Building src
Building m4
Building common
Building aio-dio-regress
SECTION   -- xfs
FSTYP -- xfs (debug)
PLATFORM  -- Linux/x86_64 test4 4.7.0-rc6-dgc+
MKFS_OPTIONS  -- -f -i sparse=1 /dev/pmem1
MOUNT_OPTIONS -- /dev/pmem1 /mnt/scratch

generic/081 1s ... 1s
Ran: generic/081
Passed all 1 tests

SECTION   -- xfs
=
Ran: generic/081
Passed all 1 tests

$

Looking at the console output from the test:

[   28.227059] run fstests generic/081 at 2016-07-19 09:03:19
[   28.465398] XFS (pmem1): Unmounting Filesystem
[   28.684889] XFS (dm-3): EXPERIMENTAL sparse inode feature enabled. Use at your own risk!
[   28.686940] XFS (dm-3): Mounting V5 Filesystem
[   28.692561] XFS (dm-3): Ending clean mount
[   28.703692] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception.
[   28.707574] Buffer I/O error on dev dm-3, logical block 24, lost async page write
[   28.708653] Buffer I/O error on dev dm-3, logical block 25, lost async page write
[   28.709798] Buffer I/O error on dev dm-3, logical block 26, lost async page write
[   28.710899] Buffer I/O error on dev dm-3, logical block 27, lost async page write
[   28.711973] Buffer I/O error on dev dm-3, logical block 28, lost async page write
[   28.713062] Buffer I/O error on dev dm-3, logical block 29, lost async page write
[   28.714191] Buffer I/O error on dev dm-3, logical block 30, lost async page write
[   28.715247] Buffer I/O error on dev dm-3, logical block 31, lost async page write
[   28.716407] Buffer I/O error on dev dm-3, logical block 32, lost async page write
[   28.717490] Buffer I/O error on dev dm-3, logical block 33, lost async page write
[   28.725428] XFS (dm-3): metadata I/O error: block 0x40048 ("xlog_iodone") error 5 numblks 64
[   28.726555] XFS (dm-3): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0x81520ef2
[   28.728136] XFS (dm-3): Log I/O Error Detected.  Shutting down filesystem
[   28.729019] XFS (dm-3): Please umount the filesystem and rectify the problem(s)
[   28.730025] XFS (dm-3): xfs_log_force: error -5 returned.
[   28.731613] XFS (dm-3): Unmounting Filesystem
[   28.732197] XFS (dm-3): xfs_log_force: error -5 returned.
[   28.732905] XFS (dm-3): xfs_log_force: error -5 returned.
[   28.777469] XFS (pmem0): Unmounting Filesystem

And so, apparently the test passed. Except, the scratch device is now
busy:

$ sudo mkfs.xfs -f /dev/pmem1
mkfs.xfs: cannot open /dev/pmem1: Device or resource busy
$

And the device mapper volumes created have not been torn down.
The test attempts to tear down the dm devices via "vgremove -f "
and "pvremove -f ". These fail, and when I ran them manually:

$ ls /dev/mapper
control  vg_081-base_081  vg_081-base_081-real  vg_081-snap_081  
vg_081-snap_081-cow
$ sudo vgs
  VG #PV #LV #SN Attr   VSize VFree
  vg_081   1   2   1 wz--n- 8.00g 7.74g
$ sudo vgremove vg_081
Do you really want to remove volume group "vg_081" containing 2 logical 
volumes? [y/n]: y
Do you really want to remove active logical volume snap_081? [y/n]: y
  device-mapper: resume ioctl on (249:2) failed: Invalid argument
  Unable to resume vg_081-snap_081-cow (249:2)
  Attempted to decrement suspended device counter below zero.
  Failed to activate snap_081.
$

I couldn't remove the VGs, with or without the "-f" option.

However, I could remove the base and snapshot LVs with lvremove, and
then I could remove the VGs and PVs (roughly the sequence shown below).
Even so, this still left /dev/pmem1 in an EBUSY state. lsof and friends
showed no visible users of the block device, and so a reboot followed.
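
For reference, the manual teardown sequence was roughly this (LV/VG names
as shown by vgs/ls above):

$ sudo lvremove -f vg_081/snap_081 vg_081/base_081
$ sudo vgremove -f vg_081
$ sudo pvremove -f /dev/pmem1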

In reproducing it, I've found that re-running the test immediately
fails 9/10 times. If it does fail, then I have to manually run:

$ sudo vgremove -f vg_081; sudo pvremove -f /dev/pmem1

before I can use the scratch device again.

Re: [dm-devel] [PATCH next] Btrfs: fix comparison in __btrfs_map_block()

2016-07-18 Thread Jens Axboe

On 07/15/2016 09:03 AM, Vincent Stehlé wrote:

Add missing comparison to op in expression, which was forgotten when doing
the REQ_OP transition.


Thanks, added to the 4.8 branch.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: replace -shared with $(SHARED_FLAGS) in Makefile

2016-07-18 Thread Xose Vazquez Perez
On 07/18/2016 05:12 PM, Xose Vazquez Perez wrote:
> Cc: Christophe Varoqui 
> Cc: device-mapper development 
> Signed-off-by: Xose Vazquez Perez 
> ---
>  libmpathpersist/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libmpathpersist/Makefile b/libmpathpersist/Makefile
> index a2dd7ed..93e106d 100644
> --- a/libmpathpersist/Makefile
> +++ b/libmpathpersist/Makefile
> @@ -19,7 +19,7 @@ all: $(LIBS)
>  
>  $(LIBS):
>   $(CC) -Wall -fPIC -c $(CFLAGS) *.c
> - $(CC)  -shared $(LIBDEPS) -Wl,-soname=$@ $(CFLAGS) -o $@ $(OBJS)
> + $(CC) $(SHARED_FLAGS) $(LIBDEPS) -Wl,-soname=$@ $(CFLAGS) -o $@ $(OBJS)
>   $(LN) $(LIBS) $(DEVLIB)
>   $(GZIP) mpath_persistent_reserve_in.3 > mpath_persistent_reserve_in.3.gz
>   $(GZIP) mpath_persistent_reserve_out.3 > 
> mpath_persistent_reserve_out.3.gz
> 

DROP this one, DUPLICATE !!!

Thank you.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: replace <> with "" for local headers

2016-07-18 Thread Xose Vazquez Perez
On 07/18/2016 05:07 PM, Bart Van Assche wrote:
> On 07/18/2016 07:42 AM, Xose Vazquez Perez wrote:
>> [ ... ]
> 
> Was this perhaps reported by clang and has the resulting
> patch been verified with clang? If so, please mention this
> in the patch description. In that case:

I did not use clang, and splint was (only) the trigger.
Mainly it was done with scripts.

All my patches are, at least, compile-tested. Usually with
gcc, but sometimes also with clang.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH] multipath-tools: replace -shared with $(SHARED_FLAGS) in Makefile

2016-07-18 Thread Xose Vazquez Perez
Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 libmpathpersist/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmpathpersist/Makefile b/libmpathpersist/Makefile
index a2dd7ed..93e106d 100644
--- a/libmpathpersist/Makefile
+++ b/libmpathpersist/Makefile
@@ -19,7 +19,7 @@ all: $(LIBS)
 
 $(LIBS):
$(CC) -Wall -fPIC -c $(CFLAGS) *.c
-   $(CC)  -shared $(LIBDEPS) -Wl,-soname=$@ $(CFLAGS) -o $@ $(OBJS)
+   $(CC) $(SHARED_FLAGS) $(LIBDEPS) -Wl,-soname=$@ $(CFLAGS) -o $@ $(OBJS)
$(LN) $(LIBS) $(DEVLIB)
$(GZIP) mpath_persistent_reserve_in.3 > mpath_persistent_reserve_in.3.gz
$(GZIP) mpath_persistent_reserve_out.3 > 
mpath_persistent_reserve_out.3.gz
-- 
2.7.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: replace <> with "" for local headers

2016-07-18 Thread Bart Van Assche

On 07/18/2016 07:42 AM, Xose Vazquez Perez wrote:
> [ ... ]

Was this perhaps reported by clang and has the resulting patch been 
verified with clang? If so, please mention this in the patch 
description. In that case:


Reviewed-by: Bart Van Assche 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH] multipath-tools: put a space between #include and the header file

2016-07-18 Thread Bart Van Assche

On 07/18/2016 07:58 AM, Xose Vazquez Perez wrote:

[ ... ]


Reviewed-by: Bart Van Assche 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH] multipath-tools: put a space between #include and the header file

2016-07-18 Thread Xose Vazquez Perez
Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 libmpathpersist/mpath_updatepr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libmpathpersist/mpath_updatepr.c b/libmpathpersist/mpath_updatepr.c
index faafe51..be73de4 100644
--- a/libmpathpersist/mpath_updatepr.c
+++ b/libmpathpersist/mpath_updatepr.c
@@ -1,5 +1,5 @@
-#include
-#include
+#include 
+#include 
 #include 
 
 #include 
-- 
2.7.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [PATCH] multipath-tools: replace <> with "" for local headers

2016-07-18 Thread Xose Vazquez Perez
Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 kpartx/lopart.c  |  2 +-
 libmpathpersist/mpath_persist.c  | 28 +++
 libmpathpersist/mpath_pr_ioctl.c |  4 +--
 libmpathpersist/mpath_updatepr.c |  6 ++--
 libmultipath/configure.c |  2 +-
 libmultipath/dict.c  |  2 +-
 libmultipath/prioritizers/alua.c |  6 ++--
 libmultipath/prioritizers/const.c|  2 +-
 libmultipath/prioritizers/datacore.c |  8 ++---
 libmultipath/prioritizers/emc.c  |  8 ++---
 libmultipath/prioritizers/hds.c  |  8 ++---
 libmultipath/prioritizers/hp_sw.c|  8 ++---
 libmultipath/prioritizers/iet.c  |  6 ++--
 libmultipath/prioritizers/ontap.c|  8 ++---
 libmultipath/prioritizers/random.c   |  2 +-
 libmultipath/prioritizers/rdac.c |  8 ++---
 libmultipath/prioritizers/weightedpath.c | 12 +++
 libmultipath/uxsock.c|  2 +-
 mpathpersist/main.c  | 10 +++---
 multipath/main.c | 46 -
 multipathd/cli.c | 12 +++
 multipathd/cli_handlers.c| 26 +++---
 multipathd/main.c| 58 
 multipathd/pidfile.c |  2 +-
 multipathd/uxclnt.c  |  8 ++---
 multipathd/uxlsnr.c  | 18 +-
 26 files changed, 151 insertions(+), 151 deletions(-)

diff --git a/kpartx/lopart.c b/kpartx/lopart.c
index 5d4967b..8eb328f 100644
--- a/kpartx/lopart.c
+++ b/kpartx/lopart.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include 
+#include "sysmacros.h"
 #include 
 
 #include "lopart.h"
diff --git a/libmpathpersist/mpath_persist.c b/libmpathpersist/mpath_persist.c
index b037822..f2a0905 100644
--- a/libmpathpersist/mpath_persist.c
+++ b/libmpathpersist/mpath_persist.c
@@ -1,25 +1,25 @@
 #include 
-#include 
+#include "defaults.h"
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
+#include "vector.h"
+#include "checkers.h"
+#include "structs.h"
+#include "structs_vec.h"
 #include 
 
-#include 
+#include "prio.h"
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include "devmapper.h"
+#include "debug.h"
+#include "config.h"
+#include "switchgroup.h"
+#include "discovery.h"
+#include "dmparser.h"
 #include 
-#include 
-#include 
+#include "propsel.h"
+#include "util.h"
 
 #include "mpath_persist.h"
 #include "mpathpr.h"
diff --git a/libmpathpersist/mpath_pr_ioctl.c b/libmpathpersist/mpath_pr_ioctl.c
index e54c804..cc781f6 100644
--- a/libmpathpersist/mpath_pr_ioctl.c
+++ b/libmpathpersist/mpath_pr_ioctl.c
@@ -12,9 +12,9 @@
 #include 
 #include 
 #include "mpath_pr_ioctl.h"
-#include 
+#include "mpath_persist.h"
 
-#include 
+#include "debug.h"
 
 #define FILE_NAME_SIZE  256
 
diff --git a/libmpathpersist/mpath_updatepr.c b/libmpathpersist/mpath_updatepr.c
index 0529d13..faafe51 100644
--- a/libmpathpersist/mpath_updatepr.c
+++ b/libmpathpersist/mpath_updatepr.c
@@ -11,9 +11,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
+#include "debug.h"
+#include "mpath_cmd.h"
+#include "uxsock.h"
 #include "memory.h"
 
 unsigned long mem_allocated;/* Total memory used in Bytes */
diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index 4b9d0c6..1e0037f 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include 
+#include "mpath_cmd.h"
 
 #include "checkers.h"
 #include "vector.h"
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 7b92a91..030ad98 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -21,7 +21,7 @@
 #include "prio.h"
 #include "errno.h"
 #include 
-#include 
+#include "mpath_cmd.h"
 
 static int
 set_int(vector strvec, void *ptr)
diff --git a/libmultipath/prioritizers/alua.c b/libmultipath/prioritizers/alua.c
index b6c5176..9f0851d 100644
--- a/libmultipath/prioritizers/alua.c
+++ b/libmultipath/prioritizers/alua.c
@@ -14,9 +14,9 @@
  */
 #include 
 
-#include 
-#include 
-#include 
+#include "debug.h"
+#include "prio.h"
+#include "structs.h"
 
 #include "alua.h"
 
diff --git a/libmultipath/prioritizers/const.c 
b/libmultipath/prioritizers/const.c
index bf689cd..9d9d003 100644
--- a/libmultipath/prioritizers/const.c
+++ b/libmultipath/prioritizers/const.c
@@ -1,6 +1,6 @@
 #include 
 
-#include 
+#include "prio.h"
 
 int getprio (struct path * pp, char * args)
 {
diff --git a/libmultipath/prioritizers/datacore.c 
b/libmultipath/prioritizers/datacore.c
index 658a598..9afbbfe 100644
--- a/libmultipath/prioritizers/datacore.c
+++ b/libmultipath/prioritizers/datacore.c
@@ -21,10 +21,10 @@
 #include 
 
 #include 
-#include 
-#include 
-#include 
-#include 
+#include "sg_include.h"
+#include "debug.h"
+#include "prio.h"

[dm-devel] [PATCH] multipath-tools: remove final \ in fprintf

2016-07-18 Thread Xose Vazquez Perez
The trailing backslashes are useless: the strings are adjacent literals
inside a single fprintf() argument, so no line continuation is needed.

Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 multipath/main.c | 64 
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/multipath/main.c b/multipath/main.c
index 907a96c..0bec0d2 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -110,38 +110,38 @@ usage (char * progname)
fprintf (stderr,
"\n"
"Where:\n"
-   "  -h  print this usage text\n" \
-   "  -l  show multipath topology (sysfs and DM info)\n" \
-   "  -ll show multipath topology (maximum info)\n" \
-   "  -f  flush a multipath device map\n" \
-   "  -F  flush all multipath device maps\n" \
-   "  -a  add a device wwid to the wwids file\n" \
-   "  -c  check if a device should be a path in a multipath 
device\n" \
-   "  -q  allow queue_if_no_path when multipathd is not 
running\n"\
-   "  -d  dry run, do not create or update devmaps\n" \
-   "  -t  dump internal hardware table\n" \
-   "  -r  force devmap reload\n" \
-   "  -i  ignore wwids file\n" \
-   "  -B  treat the bindings file as read only\n" \
-   "  -b fil  bindings file location\n" \
-   "  -w  remove a device from the wwids file\n" \
-   "  -W  reset the wwids file include only the current 
devices\n" \
-   "  -p pol  force all maps to specified path grouping policy 
:\n" \
-   "  . failoverone path per priority group\n" 
\
-   "  . multibusall paths in one priority 
group\n" \
-   "  . group_by_serial one priority group per 
serial\n" \
-   "  . group_by_prio   one priority group per 
priority lvl\n" \
-   "  . group_by_node_name  one priority group per target 
node\n" \
-   "  -v lvl  verbosity level\n" \
-   "  . 0 no output\n" \
-   "  . 1 print created devmap names only\n" \
-   "  . 2 default verbosity\n" \
-   "  . 3 print debug information\n" \
-   "  dev action limited to:\n" \
-   "  . multipath named 'dev' (ex: mpath0) or\n" \
-   "  . multipath whose wwid is 'dev' (ex: 60051..)\n" \
-   "  . multipath including the path named 'dev' (ex: 
/dev/sda)\n" \
-   "  . multipath including the path with maj:min 'dev' 
(ex: 8:0)\n" \
+   "  -h  print this usage text\n"
+   "  -l  show multipath topology (sysfs and DM info)\n"
+   "  -ll show multipath topology (maximum info)\n"
+   "  -f  flush a multipath device map\n"
+   "  -F  flush all multipath device maps\n"
+   "  -a  add a device wwid to the wwids file\n"
+   "  -c  check if a device should be a path in a multipath 
device\n"
+   "  -q  allow queue_if_no_path when multipathd is not 
running\n"
+   "  -d  dry run, do not create or update devmaps\n"
+   "  -t  dump internal hardware table\n"
+   "  -r  force devmap reload\n"
+   "  -i  ignore wwids file\n"
+   "  -B  treat the bindings file as read only\n"
+   "  -b fil  bindings file location\n"
+   "  -w  remove a device from the wwids file\n"
+   "  -W  reset the wwids file include only the current 
devices\n"
+   "  -p pol  force all maps to specified path grouping policy :\n"
+   "  . failoverone path per priority group\n"
+   "  . multibusall paths in one priority 
group\n"
+   "  . group_by_serial one priority group per 
serial\n"
+   "  . group_by_prio   one priority group per 
priority lvl\n"
+   "  . group_by_node_name  one priority group per target 
node\n"
+   "  -v lvl  verbosity level\n"
+   "  . 0 no output\n"
+   "  . 1 print created devmap names only\n"
+   "  . 2 default verbosity\n"
+   "  . 3 print debug information\n"
+   "  dev action limited to:\n"
+   "  . multipath named 'dev' (ex: mpath0) or\n"
+   "  . multipath whose wwid is 'dev' (ex: 60051..)\n"
+   "  . multipath including the path named 'dev' (ex: 
/dev/sda)\n"
+   "  . multipath including the path with maj:min 'dev' 
(ex: 8:0)\n"
);
 
 }
-- 
2.7.4

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH V6 1/3] multipath-tools: New way to limit the IPC command length.

2016-07-18 Thread Gris Ge
On Fri, Jul 15, 2016 at 04:35:45PM -0500, Benjamin Marzinski wrote:
> On Tue, Jul 12, 2016 at 02:50:36PM +0800, Gris Ge wrote:
> 
> The only thing that I wonder about with this patch is, when previously
> the multipath client code would have failed with EPIPE, and (at least in
> some cases) spit out a semi-useful message, the program will now
> terminate because of the SIGPIPE signal.  I'm not sure it makes any real
> difference, since we weren't very diligent with returning useful error
> messages in this case, and the client code isn't very likely to get
> SIGPIPE.
> 
> I'm not very concerned if nobody else thinks this is important, I just
> thought I should bring it up.
> 
> -Ben
> 
Thanks Ben, I will make the next version of the patch handle/ignore
SIGPIPE.
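
For example, something along these lines (a sketch only, not the actual patch):

#include <signal.h>

/* Ignore SIGPIPE in the client so that writing to a closed IPC socket
 * fails with EPIPE instead of terminating the process. */
static void ignore_sigpipe(void)
{
	signal(SIGPIPE, SIG_IGN);
}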

-- 
Gris Ge


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

[dm-devel] [PATCH] multipath-tools: replace leading spaces with tabs

2016-07-18 Thread Xose Vazquez Perez
Replace leading runs of 8x(1,2,3,4,5,6 and 7) spaces (i.e. 8, 16, 24, 32,
40, 48 and 56 spaces) with tabs.

Cc: Christophe Varoqui 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 kpartx/dasd.h|  76 +--
 kpartx/gpt.c | 238 +--
 kpartx/gpt.h |   6 +-
 kpartx/lopart.c  |   2 +-
 kpartx/mac.c |   4 +-
 kpartx/mac.h |  18 +--
 kpartx/ps3.c |   2 +-
 libmpathpersist/mpath_pr_ioctl.c |  62 -
 libmpathpersist/mpathpr.h|  36 +++---
 libmultipath/configure.c |   2 +-
 libmultipath/devmapper.c |   8 +-
 libmultipath/memory.h|   8 +-
 libmultipath/print.c |  10 +-
 libmultipath/print.h |  70 +--
 libmultipath/prioritizers/datacore.c |   6 +-
 libmultipath/prioritizers/hds.c  |   2 +-
 libmultipath/prioritizers/ontap.c|   2 +-
 mpathpersist/main.c  |  24 ++--
 18 files changed, 288 insertions(+), 288 deletions(-)

diff --git a/kpartx/dasd.h b/kpartx/dasd.h
index 2779366..0356066 100644
--- a/kpartx/dasd.h
+++ b/kpartx/dasd.h
@@ -27,40 +27,40 @@
 
 typedef struct ttr
 {
-uint16_t tt;
-uint8_t  r;
+   uint16_t tt;
+   uint8_t  r;
 } __attribute__ ((packed)) ttr_t;
 
 typedef struct cchhb
 {
-uint16_t cc;
-uint16_t hh;
-uint8_t b;
+   uint16_t cc;
+   uint16_t hh;
+   uint8_t b;
 } __attribute__ ((packed)) cchhb_t;
 
 typedef struct cchh
 {
-uint16_t cc;
-uint16_t hh;
+   uint16_t cc;
+   uint16_t hh;
 } __attribute__ ((packed)) cchh_t;
 
 typedef struct labeldate
 {
-uint8_t  year;
-uint16_t day;
+   uint8_t  year;
+   uint16_t day;
 } __attribute__ ((packed)) labeldate_t;
 
 
 typedef struct volume_label
 {
-char volkey[4]; /* volume key = volume label */
+   char volkey[4]; /* volume key = volume label */
char vollbl[4]; /* volume label  */
char volid[6];  /* volume identifier */
uint8_t security;   /* security byte
 */
cchhb_t vtoc;   /* VTOC address  */
char res1[5];   /* reserved  */
-char cisize[4];/* CI-size for FBA,...  
 */
-/* ...blanks for CKD */
+   char cisize[4]; /* CI-size for FBA,...   */
+   /* ...blanks for CKD */
char blkperci[4];   /* no of blocks per CI (FBA), blanks for CKD */
char labperci[4];   /* no of labels per CI (FBA), blanks for CKD */
char res2[4];   /* reserved  */
@@ -73,25 +73,25 @@ typedef struct volume_label
 
 typedef struct extent
 {
-uint8_t  typeind;  /* extent type indicator
 */
-uint8_t  seqno;/* extent sequence number   
 */
-cchh_t llimit;  /* starting point of this extent */
-cchh_t ulimit;  /* ending point of this extent   */
+   uint8_t  typeind;  /* extent type indicator 
*/
+   uint8_t  seqno;/* extent sequence number
*/
+   cchh_t llimit;  /* starting point of this extent */
+   cchh_t ulimit;  /* ending point of this extent   */
 } __attribute__ ((packed)) extent_t;
 
 
 typedef struct dev_const
 {
-uint16_t DS4DSCYL;   /* number of logical cyls 
 */
-uint16_t DS4DSTRK;   /* number of tracks in a logical cylinder 
 */
-uint16_t DS4DEVTK;   /* device track length
 */
-uint8_t  DS4DEVI;/* non-last keyed record overhead 
 */
-uint8_t  DS4DEVL;/* last keyed record overhead 
 */
-uint8_t  DS4DEVK;/* non-keyed record overhead differential 
 */
-uint8_t  DS4DEVFG;   /* flag byte  
 */
-uint16_t DS4DEVTL;   /* device tolerance   
 */
-uint8_t  DS4DEVDT;   /* number of DSCB's per track 
 */
-uint8_t  DS4DEVDB;   /* number of directory blocks per track   
 */
+   uint16_t DS4DSCYL;   /* number of logical cyls  
*/
+   uint16_t DS4DSTRK;   /* number of tracks in a logical cylinder  
*/
+   uint16_t DS4DEVTK;   /* device track length 
*/
+   uint8_t  DS4DEVI;

[dm-devel] [PATCH] multipathd: fix memory leak in reconfigure()

2016-07-18 Thread tang . junhui
From: "tang.junhui" 

Problem:
A memory leak occurs in reconfigure() when the multipathd "reconfigure"
command is executed.

Reasons:
 * When there are no paths, the following condition is not satisfied, so
   free_pathvec() is never called to free the vecs->pathvec vector:
   if (VECTOR_SIZE(vecs->pathvec))
   free_pathvec(vecs->pathvec, FREE_PATHS);
 * vecs->pathvec is then set to NULL, so the vector memory it pointed to
   is leaked:
   vecs->pathvec = NULL;

Signed-off-by: tang.junhui 
---
 multipathd/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index c129298..d9731cd 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1997,9 +1997,7 @@ reconfigure (struct vectors * vecs)
if (VECTOR_SIZE(vecs->mpvec))
remove_maps_and_stop_waiters(vecs);
 
-   if (VECTOR_SIZE(vecs->pathvec))
-   free_pathvec(vecs->pathvec, FREE_PATHS);
-
+   free_pathvec(vecs->pathvec, FREE_PATHS);
vecs->pathvec = NULL;
 
/* Re-read any timezone changes */
-- 
2.8.1.windows.1



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] How to filter local disks that should not be managed by multipathd?

2016-07-18 Thread Xieyingtai

Hi,
   Recently I encountered a problem: in certain scenarios where a new local disk
is hot-plugged, I have no way to obtain the WWN of the disk in advance, so I
cannot filter that disk by adding it to the blacklist. Actually there is no
need for multipathd to take over local disks in the hot-plug scenario, and the
find_multipaths param does not seem to be a perfect way to solve this problem.
Is there any way to provide a param in /etc/multipath.conf for multipathd to
manage only iSCSI or FC bus disks? With that configuration, path_discovery()
would only scan devices that sit on an iSCSI or FC bus in sysfs.
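
(For context, the blacklist mentioned above is the usual /etc/multipath.conf
section; the entries below are only illustrative, and the wwid line is exactly
what cannot be filled in ahead of a hot plug:)

blacklist {
	wwid "<wwn-of-the-local-disk>"
	devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}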

Thanks.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] Recover filenames from failed RAID0

2016-07-18 Thread keld
Hi

Which file system did you use?
I once wrote some code to get files out of an ext3 filesystem:
http://www.open-std.org/keld/readme-salvage.html

Maybe you can make some corrections to get it working for your case.
As I remember the code, when it hits a directory it will salvage the file
names and the files of that directory.

Best regards
keld

On Sun, Jul 17, 2016 at 03:10:03PM -0400, Michel Dubois wrote:
> Dear linux-raid mailing list,
> 
> I have a RAID0 array of four 3TB disks that failed on the "third" disk.
> 
> I am aware of the non-redundancy of RAID0 but I would like to recover
> the filenames from that RAID0. If I could recover some data it would
> be a bonus.
> 
> Below you'll find the outputs of the following commands
>  mdadm --examine /dev/sd[abcd]1
>  fdisk -l
> 
> where sda1, sdb1, sdc1 and sdd1 should be the 4 RAID devices.
> 
> What could be my next step?
> 
> I thank you for your time
> 
> Michel Dubois
> 
> ==
> mdadm --examine /dev/sd[abcd]1
> /dev/sda1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db60 - correct
>  Events : 164275
> 
> 
>   Number   Major   Minor   RaidDevice State
> this 0   810  active sync   /dev/sda1
> 
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
> /dev/sdb1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db72 - correct
>  Events : 164275
> 
> 
>   Number   Major   Minor   RaidDevice State
> this 1   8   171  active sync   /dev/sdb1
> 
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
> /dev/sdc1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db86 - correct
>  Events : 164275
> 
> 
>   Number   Major   Minor   RaidDevice State
> this 3   8   333  active sync   /dev/sdc1
> 
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
> 
> ==
> fdisk -l
> 
> WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util
> fdisk doesn't support GPT. Use GNU Parted.
> 
> 
> Disk /dev/sda: 3000.5 GB, 3000592982016 bytes
> 255 heads, 63 sectors/track, 364801 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x03afffbe
> 
>Device Boot  Start End  Blocks   Id  System
> /dev/sda1   1  267350  2147483647+  ee  EFI GPT
> 
> WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util
> fdisk doesn't support GPT. Use GNU Parted.
> 
> 
> Disk /dev/sdb: 3000.5 GB, 3000592982016 bytes
> 255 heads, 63 sectors/track, 364801 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x142a889c
> 
>Device Boot  Start End  Blocks   Id  System
> /dev/sdb1   1  267350  2147483647+  ee  EFI GPT
> 
> WARNING: GPT (GUID Partition Table) detected on '/dev/sdc'! The util
> fdisk doesn't support GPT. Use GNU Parted.
> 
> 
> Disk /dev/sdc: 3000.5 GB, 3

[dm-devel] [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path

2016-07-18 Thread Michal Hocko
From: Michal Hocko 

There has been a report about the OOM killer being invoked when swapping out
to a dm-crypt device. The primary reason seems to be that the swapout
IO managed to completely deplete memory reserves. Mikulas was
able to bisect and explained the issue by pointing to f9054c70d28b
("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").

The reason is that the swapout path is not throttled properly because
the md-raid layer needs to allocate from the generic_make_request path
which means it allocates from the PF_MEMALLOC context. dm layer uses
mempool_alloc in order to guarantee a forward progress which used to
inhibit access to memory reserves when using page allocator. This has
changed by f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if
there are free elements") which has dropped the __GFP_NOMEMALLOC
protection when the memory pool is depleted.

If we are running out of memory and the only way to free memory is to
perform swapout, we just keep consuming memory reserves rather than
throttling the mempool allocations and allowing the pending IO to
complete, up to the moment when memory is depleted completely and there
is no way forward but to invoke the OOM killer. This is less than
optimal.

The original intention of f9054c70d28b was to help with the OOM
situations where the oom victim depends on mempool allocation to make a
forward progress. We can handle that case in a different way, though. We
can check whether the current task has access to memory reserves as an
OOM victim (TIF_MEMDIE) and drop __GFP_NOMEMALLOC protection if the pool
is empty.

David Rientjes was objecting that such an approach wouldn't help if the
oom victim was blocked on a lock held by a process doing mempool_alloc. This
is very similar to other oom deadlock situations and we have the oom_reaper
to deal with them, so it is reasonable to rely on the same mechanism
rather than inventing a different one which has negative side effects.

Fixes: f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free 
elements")
Bisected-by: Mikulas Patocka 
Signed-off-by: Michal Hocko 
---
 mm/mempool.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 8f65464da5de..ea26d75c8adf 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -322,20 +322,20 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
+   gfp_mask |= __GFP_NOMEMALLOC;   /* don't allocate emergency reserves */
gfp_mask |= __GFP_NORETRY;  /* don't loop in __alloc_pages */
gfp_mask |= __GFP_NOWARN;   /* failures are OK */
 
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
 repeat_alloc:
-   if (likely(pool->curr_nr)) {
-   /*
-* Don't allocate from emergency reserves if there are
-* elements available.  This check is racy, but it will
-* be rechecked each loop.
-*/
-   gfp_temp |= __GFP_NOMEMALLOC;
-   }
+   /*
+* Make sure that the OOM victim will get access to memory reserves
+* properly if there are no objects in the pool to prevent from
+* livelocks.
+*/
+   if (!likely(pool->curr_nr) && test_thread_flag(TIF_MEMDIE))
+   gfp_temp &= ~__GFP_NOMEMALLOC;
 
element = pool->alloc(gfp_temp, pool->pool_data);
if (likely(element != NULL))
@@ -359,7 +359,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 * We use gfp mask w/o direct reclaim or IO for the first round.  If
 * alloc failed with that and @pool was empty, retry immediately.
 */
-   if ((gfp_temp & ~__GFP_NOMEMALLOC) != gfp_mask) {
+   if ((gfp_temp & __GFP_DIRECT_RECLAIM) != (gfp_mask & 
__GFP_DIRECT_RECLAIM)) {
spin_unlock_irqrestore(&pool->lock, flags);
gfp_temp = gfp_mask;
goto repeat_alloc;
-- 
2.8.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

2016-07-18 Thread Michal Hocko
From: Michal Hocko 

Mikulas has reported that swap backed by dm-crypt doesn't work
properly because the swapout cannot make sufficient forward progress:
the writeout path depends on the dm_crypt worker, which has to allocate
memory to perform the encryption. In order to guarantee forward
progress it relies on the mempool allocator. mempool_alloc(), however,
prefers to use the underlying (usually page) allocator before it grabs
objects from the pool. Such an allocation can dive into memory reclaim
and consequently into throttle_vm_writeout. If there are too many
dirty pages or pages under writeback it will get throttled even though it
is in fact a flusher trying to clear pending pages.

[  345.352536] kworker/u4:0D 88003df7f438 10488 6  2 0x
[  345.352536] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  345.352536]  88003df7f438 88003e5d0380 88003e5d0380 
88003e5d8e80
[  345.352536]  88003dfb3240 88003df73240 88003df8 
88003df7f470
[  345.352536]  88003e5d0380 88003e5d0380 88003df7f828 
88003df7f450
[  345.352536] Call Trace:
[  345.352536]  [] schedule+0x3c/0x90
[  345.352536]  [] schedule_timeout+0x1d8/0x360
[  345.352536]  [] ? detach_if_pending+0x1c0/0x1c0
[  345.352536]  [] ? ktime_get+0xb3/0x150
[  345.352536]  [] ? __delayacct_blkio_start+0x1f/0x30
[  345.352536]  [] io_schedule_timeout+0xa4/0x110
[  345.352536]  [] congestion_wait+0x86/0x1f0
[  345.352536]  [] ? prepare_to_wait_event+0xf0/0xf0
[  345.352536]  [] throttle_vm_writeout+0x44/0xd0
[  345.352536]  [] shrink_zone_memcg+0x613/0x720
[  345.352536]  [] shrink_zone+0xe0/0x300
[  345.352536]  [] do_try_to_free_pages+0x1ad/0x450
[  345.352536]  [] try_to_free_pages+0xef/0x300
[  345.352536]  [] __alloc_pages_nodemask+0x879/0x1210
[  345.352536]  [] ? sched_clock_cpu+0x90/0xc0
[  345.352536]  [] alloc_pages_current+0xa1/0x1f0
[  345.352536]  [] ? new_slab+0x3f5/0x6a0
[  345.352536]  [] new_slab+0x2d7/0x6a0
[  345.352536]  [] ? sched_clock_local+0x17/0x80
[  345.352536]  [] ___slab_alloc+0x3fb/0x5c0
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] ? sched_clock_local+0x17/0x80
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] __slab_alloc+0x51/0x90
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] kmem_cache_alloc+0x27b/0x310
[  345.352536]  [] mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] mempool_alloc+0x91/0x230
[  345.352536]  [] bio_alloc_bioset+0xbd/0x260
[  345.352536]  [] kcryptd_crypt+0x114/0x3b0 [dm_crypt]

Memory pools are usually used for the writeback paths and it doesn't
really make much sense to throttle them just because there are too many
dirty/writeback pages. The main purpose of throttle_vm_writeout is to
make sure that the pageout path doesn't generate too much dirty data.
Considering that we are in the mempool path, which performs __GFP_NORETRY
requests, the risk shouldn't really be high.

Fix this by ensuring that mempool users will get PF_LESS_THROTTLE and
that such processes are not throttled in throttle_vm_writeout. They can
still get throttled due to current_may_throttle() sleeps but that should
happen when the backing device itself is congested which sounds like a
proper reaction.

Please note that the bonus given by domain_dirty_limits() alone is not
enough, because at least dm-crypt has to double-buffer each page under
writeback, so that alone won't prevent the allocation from being
throttled.

There are other users of the flag but they are in the writeout path so
this looks like a proper thing for them as well.

Reported-by: Mikulas Patocka 
Signed-off-by: Michal Hocko 
---
 mm/mempool.c| 19 +++
 mm/page-writeback.c |  3 +++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index ea26d75c8adf..916e95c4192c 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -310,7 +310,8 @@ EXPORT_SYMBOL(mempool_resize);
  */
 void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 {
-   void *element;
+   unsigned int pflags = current->flags;
+   void *element = NULL;
unsigned long flags;
wait_queue_t wait;
gfp_t gfp_temp;
@@ -328,6 +329,12 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
+   /*
+* Make sure that the allocation doesn't get throttled during the
+* reclaim
+*/
+   if (gfpflags_allow_blocking(gfp_mask))
+   current->flags |= PF_LESS_THROTTLE;
 repeat_alloc:
/*
 * Make sure that the OOM victim will get access to memory reserves
@@ -339,7 +346,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
element = pool->alloc(gfp_temp, pool->pool_data);
if (likely(element != NULL))
-   return element;
+   goto out;
 
spin_lock_irqsave(&pool->lock, flags);
if (likely(pool->curr_nr)) {
@@ -352,7 +359,7 @@ void *mempool_alloc(m

[dm-devel] [RFC PATCH 0/2] mempool vs. page allocator interaction

2016-07-18 Thread Michal Hocko
Hi,
there have been two issues identified when investigating dm-crypt
backed swap recently [1]. The first one looks like a regression from
f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free
elements") because the swapout path can now deplete all the available memory
reserves. The first patch tries to address that issue by dropping
__GFP_NOMEMALLOC only for TIF_MEMDIE tasks.

The second issue is that the dm writeout path, which relies on the mempool
allocator, gets throttled by direct reclaim in throttle_vm_writeout,
which just makes the whole memory pressure problem even worse. Patch 2
makes sure that we annotate mempool users to be throttled less via the
PF_LESS_THROTTLE flag and prevents throttle_vm_writeout for that path.
mempool users are usually on the IO path and throttling them less sounds
like a reasonable way to go.

I do not have any more complicated dm setup available so I would
appreciate if dm people (CCed) could give these two a try.

Also it would be great to iron out concerns from David. He has posted a
deadlock stack trace [2] which led to f9054c70d28b: a bio allocation
lockup because the TIF_MEMDIE process cannot make forward progress
without access to memory reserves. This case should be fixed by
patch 1 AFAICS. There are other potential cases where the stuck mempool
allocation is called from PF_MEMALLOC context and blocks the oom victim
indirectly (over a lock), but I believe those are much less likely and we
have the oom reaper to make forward progress.

Sorry for pulling the discussion outside of the original email thread,
but there were several lines of discussion there and I felt discussing a
particular solution with its justification has a greater chance of
moving towards a resolution. I am sending this as an RFC because this
needs a deep review, as there might be other side effects I do not see
(especially with patch 2).

Any comments, suggestions are welcome.

---
[1] 
http://lkml.kernel.org/r/alpine.lrh.2.02.1607111027080.14...@file01.intranet.prod.int.rdu2.redhat.com
[2] 
http://lkml.kernel.org/r/alpine.deb.2.10.1607131644590.92...@chino.kir.corp.google.com


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync

2016-07-18 Thread Matthias Dahl

Hello again...

So I spent all weekend doing further tests, since this issue is
really bugging me for obvious reasons.

I thought it would be beneficial if I created a bug report that
summarized and centralized everything in one place rather than
having everything spread across several lists and posts.

Here the bug report I created:
https://bugzilla.kernel.org/show_bug.cgi?id=135481

If anyone has any suggestions, ideas or wants me to do further tests,
please just let me know. There is not much more I can do at this point
without further help/guidance.

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] System freezes after OOM

2016-07-18 Thread Michal Hocko
On Fri 15-07-16 13:02:17, Mikulas Patocka wrote:
> 
> 
> On Fri, 15 Jul 2016, Michal Hocko wrote:
> 
> > On Fri 15-07-16 08:11:22, Mikulas Patocka wrote:
> > > 
> > > The stacktraces showed that the kcryptd process was throttled when it 
> > > tried to do mempool allocation. Mempool adds the __GFP_NORETRY flag to 
> > > the 
> > > allocation, but unfortunatelly, this flag doesn't prevent the allocator 
> > > from throttling.
> > 
> > Yes and in fact it shouldn't prevent any throttling. The flag merely
> > says that the allocation should give up rather than retry
> > reclaim/compaction again and again.
> > 
> > > I say that the process doing mempool allocation shouldn't ever be 
> > > throttled. Maybe add __GFP_NOTHROTTLE?
> > 
> > A specific gfp flag would be an option but we are slowly running out of
> > bit space there and I am not yet convinced PF_LESS_THROTTLE is
> > unsuitable.
> 
> PF_LESS_THROTTLE will make it throttle less, but it doesn't eliminate 
> throttling entirely. So, maybe add PF_NO_THROTTLE? But PF_* flags are also 
> almost exhausted.

I am not really sure we can make anybody so special as to not be throttled
at all. Throttling on a congested backing device sounds like a reasonable
compromise. Besides that, it seems that we do not really need to eliminate
wait_iff_congested for dm to work properly again, AFAIU. I plan to repost
both patches today after some more internal review. If we need to do more
changes I would suggest making them in separate patches.
-- 
Michal Hocko
SUSE Labs

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] System freezes after OOM

2016-07-18 Thread Tetsuo Handa
On 2016/07/15 2:07, Ondrej Kozina wrote:
> On 07/14/2016 05:31 PM, Michal Hocko wrote:
>> On Thu 14-07-16 16:08:28, Ondrej Kozina wrote:
>> [...]
>>> As Mikulas pointed out, this doesn't work. The system froze as well with the
>>> patch above. Will try to tweak the patch with Mikulas's suggestion...
>>
>> Thank you for testing! Do you happen to have traces of the frozen
>> processes? Does the flusher still gets throttled because the bias it
>> gets is not sufficient. Or does it get throttled at a different place?
>>
> 
> Sure. Here it is (including sysrq+t and sysrq+w output): 
> https://okozina.fedorapeople.org/bugs/swap_on_dmcrypt/4.7.0-rc7+/1/4.7.0-rc7+.log
> 

Oh, this resembles another dm-crypt lockup problem reported last month.
( http://lkml.kernel.org/r/20160616212641.ga3...@sig21.net )

In Johannes's case, there are so many pending kcryptd_crypt work requests and
mempool_alloc() is waiting at throttle_vm_writeout() or shrink_inactive_list().

[ 2378.279029] kswapd0 D 88003744f538 0   766  2 0x
[ 2378.286167]  88003744f538 00ff88011b5ccd80 88011b5d62d8 
88011ae58000
[ 2378.293628]  88003745 88003745 0001000984f2 
88003744f570
[ 2378.301168]  88011b5ccd80 88003745 88003744f550 
81845cec
[ 2378.308674] Call Trace:
[ 2378.311154]  [] schedule+0x8b/0xa3
[ 2378.316153]  [] schedule_timeout+0x20b/0x285
[ 2378.322028]  [] ? init_timer_key+0x112/0x112
[ 2378.327931]  [] io_schedule_timeout+0xa0/0x102
[ 2378.333960]  [] ? io_schedule_timeout+0xa0/0x102
[ 2378.340166]  [] mempool_alloc+0x123/0x154
[ 2378.345781]  [] ? wait_woken+0x72/0x72
[ 2378.351148]  [] bio_alloc_bioset+0xe8/0x1d7
[ 2378.356910]  [] alloc_tio+0x2d/0x47
[ 2378.361996]  [] __split_and_process_bio+0x310/0x3a3
[ 2378.368470]  [] dm_make_request+0xb5/0xe2
[ 2378.374078]  [] generic_make_request+0xcc/0x180
[ 2378.380206]  [] submit_bio+0xfd/0x145
[ 2378.385482]  [] __swap_writepage+0x202/0x225
[ 2378.391349]  [] ? preempt_count_sub+0xf0/0x100
[ 2378.397398]  [] ? _raw_spin_unlock+0x31/0x44
[ 2378.403273]  [] ? page_swapcount+0x45/0x4c
[ 2378.408984]  [] swap_writepage+0x3a/0x3e
[ 2378.414530]  [] pageout.isra.16+0x160/0x2a7
[ 2378.420320]  [] shrink_page_list+0x5a0/0x8c4
[ 2378.426197]  [] shrink_inactive_list+0x29e/0x4a1
[ 2378.432434]  [] shrink_zone_memcg+0x4c1/0x661
[ 2378.438406]  [] shrink_zone+0xdc/0x1e5
[ 2378.443742]  [] ? shrink_zone+0xdc/0x1e5
[ 2378.449238]  [] kswapd+0x6df/0x814
[ 2378.454222]  [] ? mem_cgroup_shrink_node_zone+0x209/0x209
[ 2378.461196]  [] kthread+0xff/0x107
[ 2378.466182]  [] ret_from_fork+0x22/0x50
[ 2378.471631]  [] ? kthread_create_on_node+0x1ea/0x1ea

[ 2378.769494] kworker/u8:4D 8800c5dc3508 0  1592  2 0x
[ 2378.776582] Workqueue: kcryptd kcryptd_crypt
[ 2378.780887]  8800c5dc3508 00ff88011b7ccd80 88011b7d62d8 
88011ae5a900
[ 2378.788399]  88011a605200 8800c5dc4000 0001000983f7 
8800c5dc3540
[ 2378.795930]  88011b7ccd80  8800c5dc3520 
81845cec
[ 2378.803408] Call Trace:
[ 2378.805879]  [] schedule+0x8b/0xa3
[ 2378.810908]  [] schedule_timeout+0x20b/0x285
[ 2378.816783]  [] ? init_timer_key+0x112/0x112
[ 2378.822677]  [] io_schedule_timeout+0xa0/0x102
[ 2378.828716]  [] ? io_schedule_timeout+0xa0/0x102
[ 2378.834956]  [] congestion_wait+0x84/0x160
[ 2378.840658]  [] ? wait_woken+0x72/0x72
[ 2378.845997]  [] throttle_vm_writeout+0x88/0xab
[ 2378.852036]  [] shrink_zone_memcg+0x635/0x661
[ 2378.857982]  [] shrink_zone+0xdc/0x1e5
[ 2378.863309]  [] ? shrink_zone+0xdc/0x1e5
[ 2378.868832]  [] do_try_to_free_pages+0x1a5/0x2c3
[ 2378.875028]  [] try_to_free_pages+0x123/0x21f
[ 2378.880972]  [] __alloc_pages_nodemask+0x4c9/0x978
[ 2378.887385]  [] ? debug_smp_processor_id+0x17/0x19
[ 2378.893782]  [] new_slab+0xbc/0x3bb
[ 2378.898868]  [] ___slab_alloc.constprop.22+0x2fb/0x37b
[ 2378.905634]  [] ? mempool_alloc_slab+0x15/0x17
[ 2378.911659]  [] ? sched_clock+0x9/0xd
[ 2378.916909]  [] ? local_clock+0x20/0x22
[ 2378.922325]  [] ? __lock_acquire.isra.16+0x55e/0xb4c
[ 2378.928877]  [] ? sched_clock+0x9/0xd
[ 2378.934138]  [] ? local_clock+0x20/0x22
[ 2378.939555]  [] ? __lock_acquire.isra.16+0x55e/0xb4c
[ 2378.946125]  [] __slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2378.953289]  [] ? 
__slab_alloc.isra.17.constprop.21+0x57/0x8b
[ 2378.960630]  [] ? mempool_alloc_slab+0x15/0x17
[ 2378.966706]  [] kmem_cache_alloc+0xa0/0x1d6
[ 2378.972503]  [] ? mempool_alloc_slab+0x15/0x17
[ 2378.978567]  [] mempool_alloc_slab+0x15/0x17
[ 2378.984426]  [] mempool_alloc+0x72/0x154
[ 2378.989930]  [] ? lockdep_init_map+0xc9/0x5a3
[ 2378.995866]  [] ? local_clock+0x20/0x22
[ 2379.001300]  [] bio_alloc_bioset+0xe8/0x1d7
[ 2379.007107]  [] kcryptd_crypt+0x1ab/0x325
[ 2379.012704]  [] ? process_one_work+0x1ad/0x4e2
[ 2379.018753]  [] process_one_work+0x283/0x4e2
[ 2379.024629]  [] ? put_lock_stats.isra.9+0xe/0x20
[ 2379.030851]  [] worker_thread+0x285/0x370
[ 2379.036423

[dm-devel] [PATCH] dm: fix parameter to blk_delay_queue()

2016-07-18 Thread Tahsin Erdogan
Second parameter to blk_delay_queue() must be in msec units, not jiffies.
HZ / 100 only equals the intended 10 msec when HZ is 1000; with the other
common HZ values (100, 250, 300) the resulting delay is shorter than intended.

Signed-off-by: Tahsin Erdogan 
---
 drivers/md/dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b2f96205361..17c63265a205 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2175,7 +2175,7 @@ static void dm_request_fn(struct request_queue *q)
 md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
 md->last_rq_pos == pos && md->last_rq_rw == 
rq_data_dir(rq)) ||
(ti->type->busy && ti->type->busy(ti))) {
-   blk_delay_queue(q, HZ / 100);
+   blk_delay_queue(q, 10);
return;
}
 
-- 
2.8.0.rc3.226.g39d4020

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] dm: fix parameter to blk_delay_queue()

2016-07-18 Thread Tahsin Erdogan
> This needs to be rebased against linux-next (or linux-dm.git's
> 'for-next') because the code in question has been moved out to dm-rq.c
>
> But I'll gladly take care of it.

Thanks Mike!

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] Recover filenames from failed RAID0

2016-07-18 Thread Michel Dubois
Dear linux-raid mailing list,

I have a RAID0 array of four 3TB disks that failed on the "third" disk.

I am aware of the non-redundancy of RAID0 but I would like to recover
the filenames from that RAID0. If I could recover some data it would
be a bonus.

Below you'll find the outputs of the following commands
 mdadm --examine /dev/sd[abcd]1
 fdisk -l

where sda1, sdb1, sdc1 and sdd1 should be the 4 RAID devices.

What could be my next step?

I thank you for your time

Michel Dubois

==
mdadm --examine /dev/sd[abcd]1
/dev/sda1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
  Creation Time : Mon Apr 23 19:55:36 2012
 Raid Level : raid1
  Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
 Array Size : 20980800 (20.01 GiB 21.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

Update Time : Mon Jun 27 21:12:23 2016
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
   Checksum : 1a57db60 - correct
 Events : 164275


  Number   Major   Minor   RaidDevice State
this 0   810  active sync   /dev/sda1

   0 0   810  active sync   /dev/sda1
   1 1   8   171  active sync   /dev/sdb1
   2 2   002  faulty removed
   3 3   8   333  active sync   /dev/sdc1
/dev/sdb1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
  Creation Time : Mon Apr 23 19:55:36 2012
 Raid Level : raid1
  Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
 Array Size : 20980800 (20.01 GiB 21.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

Update Time : Mon Jun 27 21:12:23 2016
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
   Checksum : 1a57db72 - correct
 Events : 164275


  Number   Major   Minor   RaidDevice State
this 1   8   171  active sync   /dev/sdb1

   0 0   810  active sync   /dev/sda1
   1 1   8   171  active sync   /dev/sdb1
   2 2   002  faulty removed
   3 3   8   333  active sync   /dev/sdc1
/dev/sdc1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
  Creation Time : Mon Apr 23 19:55:36 2012
 Raid Level : raid1
  Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
 Array Size : 20980800 (20.01 GiB 21.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

Update Time : Mon Jun 27 21:12:23 2016
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
   Checksum : 1a57db86 - correct
 Events : 164275


  Number   Major   Minor   RaidDevice State
this 3   8   333  active sync   /dev/sdc1

   0 0   810  active sync   /dev/sda1
   1 1   8   171  active sync   /dev/sdb1
   2 2   002  faulty removed
   3 3   8   333  active sync   /dev/sdc1

==
fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util
fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x03afffbe

   Device Boot  Start End  Blocks   Id  System
/dev/sda1   1  267350  2147483647+  ee  EFI GPT

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util
fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x142a889c

   Device Boot  Start End  Blocks   Id  System
/dev/sdb1   1  267350  2147483647+  ee  EFI GPT

WARNING: GPT (GUID Partition Table) detected on '/dev/sdc'! The util
fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdc: 3000.5 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3daebd50

   Device Boot  Start End  Blocks   Id  System
/dev/sdc1   1  267350  2147483647+  ee  EFI GPT

Disk /dev/md0: 21.4 GB, 21484339200 bytes
2 heads, 4 sectors/track, 5245200 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x

Disk /dev/md0 doesn't contain a valid partition table

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] Recover filenames from failed RAID0

2016-07-18 Thread Stewart Ives
Michael,

I'll preface my reply with the statement that I am far from an expert at
this but I can read and understand the descriptions of the different RAID
levels and it seems to me with a RAID0 you are SOL if you lose a device in
the array. Just by the very nature of the RAID0 configuration there is
absolutely NO redundancy. The only reason anyone would configure such a
system is for SPEED and the only data that should be permitted on a RAID0
array is temp or working data that is recoverable by other means in the
event of a failure. I know of many Videographers that use a SSD RAID0 array
for working on their current project but they also copy that array out
about every hour for backup.

I pose only one question to you. Did you have a backup?

Good luck.

-stew


>>
>> Stewart M. Ives
>> SofTEC USA
>> 1717 Bridge St
>> New Cumberland, PA 17070 USA
>>
>> Tel: 717-910-4600
>> Fax: 888-371-6022
>> Skype: softecusa-ivessm
>> EMail: ive...@softecusa.com
>> WebSite: www.softecusa.com
>>

On Sun, Jul 17, 2016 at 3:10 PM, Michel Dubois 
wrote:

> Dear linux-raid mailing list,
>
> I have a RAID0 array of four 3TB disks that failed on the "third" disk.
>
> I am aware of the non-redundancy of RAID0 but I would like to recover
> the filenames from that RAID0. If I could recover some data it would
> be a bonus.
>
> Below you'll find the outputs of the following commands
>  mdadm --examine /dev/sd[abcd]1
>  fdisk -l
>
> where sda1, sdb1, sdc1 and sdd1 should be the 4 RAID devices.
>
> What could be my next step?
>
> I thank you for your time
>
> Michel Dubois
>
> ==
> mdadm --examine /dev/sd[abcd]1
> /dev/sda1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db60 - correct
>  Events : 164275
>
>
>   Number   Major   Minor   RaidDevice State
> this 0   810  active sync   /dev/sda1
>
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
> /dev/sdb1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db72 - correct
>  Events : 164275
>
>
>   Number   Major   Minor   RaidDevice State
> this 1   8   171  active sync   /dev/sdb1
>
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
> /dev/sdc1:
>   Magic : a92b4efc
> Version : 00.90.00
>UUID : 7d247a6e:7b5d46c8:f52d9c89:db304b21
>   Creation Time : Mon Apr 23 19:55:36 2012
>  Raid Level : raid1
>   Used Dev Size : 20980800 (20.01 GiB 21.48 GB)
>  Array Size : 20980800 (20.01 GiB 21.48 GB)
>Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>
> Update Time : Mon Jun 27 21:12:23 2016
>   State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 0
>Checksum : 1a57db86 - correct
>  Events : 164275
>
>
>   Number   Major   Minor   RaidDevice State
> this 3   8   333  active sync   /dev/sdc1
>
>0 0   810  active sync   /dev/sda1
>1 1   8   171  active sync   /dev/sdb1
>2 2   002  faulty removed
>3 3   8   333  active sync   /dev/sdc1
>
> ==
> fdisk -l
>
> WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util
> fdisk doesn't support GPT. Use GNU Parted.
>
>
> Disk /dev/sda: 3000.5 GB, 3000592982016 bytes
> 255 heads, 63 sectors/track, 364801 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x03afffbe
>
>Device Boot  Start End  Blocks   Id  System
> /dev/sda1   1 

[dm-devel] [PATCH next] Btrfs: fix comparison in __btrfs_map_block()

2016-07-18 Thread Vincent Stehlé
Add missing comparison to op in expression, which was forgotten when doing
the REQ_OP transition.

Fixes: b3d3fa519905 ("btrfs: update __btrfs_map_block for REQ_OP transition")
Signed-off-by: Vincent Stehlé 
Cc: Mike Christie 
Cc: Jens Axboe 
---


Hi,

I saw that issue in linux next.

Not sure if it is too late to squash the fix with commit b3d3fa519905 or
not...

Best regards,

Vincent.


 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a69203a..6ee1e36 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int op,
}
 
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
-   if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
+   if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
op == REQ_GET_READ_MIRRORS) {
num_stripes = map->num_stripes;
} else if (mirror_num) {
-- 
2.8.1

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel