Bug#556610: Please do incremental checks every night instead of a full monthly one
On Mon, Jan 06, 2014 at 03:14:14PM +1100, NeilBrown wrote:
> It is very unlikely to have a positive effect.

Well, at least one: we can simplify the incremental check script drastically.

> If it has any effect, it will significantly slow down any check/repair etc.
> that is happening.
>
>> I think it would be nice to end (not pause) the check once it has reached
>> sync_max. Perhaps there are deep reasons why md's interface doesn't work
>> this way. Neil, could you explain this a bit?
>
> There might be a reason to continue the resync.

Could you explain the reasons behind this interface?

> If you want to end the resync, then have some program wait for
> sync_completed to reach sync_max, then write 'idle' to 'sync_action'.

Yes, I know. But this solution looks too ugly to be a good interface for shell scripting. That's why I asked the question above.

> If you (or someone here) want to write a general incremental check script
> then I think that is a great idea, but rather than treating it as a Debian
> thing, post the proposal to linux-r...@vger.kernel.org and get feedback and
> suggestions there and when it is ready we can include it in the upstream
> mdadm package.

Ok.

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#556610: Please do incremental checks every night instead of a full monthly one
On Wed, 25 Dec 2013 19:13:27 +0400 Sergey B Kirpichev skirpic...@gmail.com wrote:
>> The main issue which all proposed solutions share is this: when a large
>> array, say md0, and a small array, say md1, both share the same set of
>> underlying disks, the md subsystem will not check/repair them in parallel.
>> In this situation, we will never check md1 if checking md0 takes more
>> time than we allow in a month (28 days).
>
> What do you think about the solution suggested above (set
> sync_force_parallel to 1 during cronjobs)? This workaround is implemented
> in the updated (attached) patch. See also:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556610#74
>
> BTW, how bad is it in general to set sync_force_parallel to 1 by default?
> (Cc'd to Neil Brown.)

It is very unlikely to have a positive effect. If it has any effect, it will significantly slow down any check/repair etc. that is happening.

> I think it would be nice to end (not pause) the check once it has reached
> sync_max. Perhaps there are deep reasons why md's interface doesn't work
> this way. Neil, could you explain this a bit?

There might be a reason to continue the resync. If you want to end the resync, then have some program wait for sync_completed to reach sync_max, then write 'idle' to 'sync_action'.

If you (or someone here) want to write a general incremental check script then I think that is a great idea, but rather than treating it as a Debian thing, post the proposal to linux-r...@vger.kernel.org and get feedback and suggestions there and when it is ready we can include it in the upstream mdadm package.

NeilBrown

>> I'll think about it all more.
>
> Any news?
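Neil's suggestion (wait for sync_completed to reach sync_max, then write 'idle' to 'sync_action') could be sketched roughly as below. This is only an illustration: the function name stop_at_sync_max, its arguments, and the polling interval are hypothetical and come from neither mdadm nor checkarray. It assumes sync_completed reads "done / total" in sectors (or "none") and that sync_max holds a sector count or the word "max".

```shell
# Hypothetical helper (not part of mdadm or checkarray).
# $1: the array's md directory, e.g. /sys/block/md0/md
# $2: polling interval in seconds (assumed default: 10)
stop_at_sync_max() {
    md_dir=$1
    interval=${2:-10}
    while :; do
        limit=$(cat "$md_dir/sync_max")
        # "max" means no upper bound was set, so there is nothing to wait for.
        if [ "$limit" = max ]; then
            return 0
        fi
        # First field of "done / total"; "none" means no check is running.
        read -r completed rest < "$md_dir/sync_completed"
        if [ "$completed" = none ]; then
            return 0
        fi
        if [ "$completed" -ge "$limit" ]; then
            # The check has reached sync_max: end it instead of leaving it paused.
            echo idle > "$md_dir/sync_action"
            return 0
        fi
        sleep "$interval"
    done
}
```

A cron job would start the check first and then call something like `stop_at_sync_max /sys/block/md0/md 30` in the background. Taking the directory as an argument keeps the function easy to try against a scratch directory before pointing it at sysfs.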
Bug#556610: Please do incremental checks every night instead of a full monthly one
> The main issue which all proposed solutions share is this: when a large
> array, say md0, and a small array, say md1, both share the same set of
> underlying disks, the md subsystem will not check/repair them in parallel.
> In this situation, we will never check md1 if checking md0 takes more
> time than we allow in a month (28 days).

What do you think about the solution suggested above (set sync_force_parallel to 1 during cronjobs)? This workaround is implemented in the updated (attached) patch. See also:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556610#74

BTW, how bad is it in general to set sync_force_parallel to 1 by default? (Cc'd to Neil Brown.)

I think it would be nice to end (not pause) the check once it has reached sync_max. Perhaps there are deep reasons why md's interface doesn't work this way. Neil, could you explain this a bit?

> I'll think about it all more.

Any news?

--- /etc/cron.d/mdadm.orig 2013-12-25 19:00:14.0 +0400
+++ /etc/cron.d/mdadm 2013-12-25 19:01:50.0 +0400
@@ -5,8 +5,7 @@
 # distributed under the terms of the Artistic Licence 2.0
 #
 
-# By default, run at 00:57 on every Sunday, but do nothing unless the day of
-# the month is less than or equal to 7. Thus, only run on the first Sunday of
-# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
-# hack (see #380425).
-57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
+# By default, start (or continue unfinished checks) at 00:57
+# and stop (interrupt) checks at 01:57.
+57 0 * * * root [ -x /usr/share/mdadm/checkarray ] && /usr/share/mdadm/checkarray --cron --all --idle --quiet
+57 1 * * * root [ -x /usr/share/mdadm/checkarray ] && /usr/share/mdadm/checkarray --cron --all --idle --quiet --interrupt
--- /usr/share/mdadm/checkarray.orig 2013-01-24 17:26:51.0 +0400
+++ /usr/share/mdadm/checkarray 2013-12-25 18:58:56.0 +0400
@@ -27,10 +27,12 @@
     -a|--all & check all assembled arrays (ignores arrays in command line).
     -s|--status & print redundancy check status of devices.
     -x|--cancel & queue a request to cancel a running redundancy check.
+    --interrupt & queue a request to interrupt a running redundancy check.
     -i|--idle & perform check in a lowest scheduling class (idle)
     -l|--slow & perform check in a lower-than-standard scheduling class
     -f|--fast & perform check in higher-than-standard scheduling class
     --realtime & perform check in real-time scheduling class (DANGEROUS!)
+    --split n & check next 1/n'th part (n <= 28) of every specified device (override CHECK_SPLIT)
     -c|--cron & honour AUTOCHECK setting in /etc/default/mdadm.
     -q|--quiet & suppress informational messages (use twice to suppress error messages too).
@@ -50,7 +52,7 @@
 }
 
 SHORTOPTS=achVqQsxilf
-LONGOPTS=all,cron,help,version,quiet,real-quiet,status,cancel,idle,slow,fast,realtime
+LONGOPTS=all,cron,help,version,quiet,real-quiet,status,cancel,interrupt,idle,slow,fast,realtime,split:
 
 eval set -- $(getopt -o $SHORTOPTS -l $LONGOPTS -n $PROGNAME -- "$@")
 
@@ -62,20 +64,31 @@
 action=check
 ionice=
 
-for opt in $@; do
-  case $opt in
-    -a|--all) all=1;;
-    -s|--status) action=status;;
-    -x|--cancel) action=idle;;
-    -i|--idle) ionice=idle;;
-    -l|--slow) ionice=low;;
-    -f|--fast) ionice=high;;
-    --realtime) ionice=realtime;;
-    -c|--cron) cron=1;;
-    -q|--quiet) quiet=$(($quiet+1));;
-    -Q|--real-quiet) quiet=$(($quiet+2));; # for compatibility
+while true
+do
+  case $1 in
+    -a|--all) all=1; shift;;
+    -s|--status) action=status; shift;;
+    -x|--cancel) action=cancel; shift;;
+    --interrupt) action=interrupt; shift;;
+    -i|--idle) ionice=idle; shift;;
+    -l|--slow) ionice=low; shift;;
+    -f|--fast) ionice=high; shift;;
+    --realtime) ionice=realtime; shift;;
+    --split) CHECK_SPLIT=$2; shift 2;;
+    -c|--cron) cron=1; shift;;
+    -q|--quiet) quiet=$(($quiet+1)); shift;;
+    -Q|--real-quiet) quiet=$(($quiet+2)); shift;; # for compatibility
     -h|--help) usage; exit 0;;
     -V|--version) about; exit 0;;
+    --) shift; break;;
+    *) echo "$PROGNAME: E: invalid option: $1. Try --help." >&2; exit 1;;
+  esac
+done
+
+for opt in $@
+do
+  case $opt in
     /dev/md/*|md/*) arrays="${arrays:+$arrays }md${opt#*md/}";;
     /dev/md*|md*) arrays="${arrays:+$arrays }${opt#/dev/}";;
     /sys/block/md*) arrays="${arrays:+$arrays }${opt#/sys/block/}";;
@@ -99,6 +112,20 @@
   exit 0
 fi
 
+CHECK_SPLIT=${CHECK_SPLIT:-28}
+
+if [ $CHECK_SPLIT -gt 28 ]
+then
+  CHECK_SPLIT=28
+  echo "$PROGNAME: W: CHECK_SPLIT > 28, reset to 28." >&2
+fi
+
+if [ $CHECK_SPLIT -lt 1 ]
+then
+  CHECK_SPLIT=1
+  echo "$PROGNAME: W: CHECK_SPLIT < 1, reset to 1." >&2
+fi
+
 if [ ! -f /proc/mdstat ]; then
   [ $quiet -lt 2 ] && echo "$PROGNAME: E: MD subsystem not loaded, or /proc unavailable." >&2
   exit 2
@@ -159,10 +186,34 @@
     continue
   fi
+  chunk_size=$(cat $MDBASE/chunk_size)
+  # set one to safe value if raid level
Bug#556610: Please do incremental checks every night instead of a full monthly one
Ok. I reviewed the patches and proposed solutions, but I can't commit or implement any of them so far.

The main issue which all proposed solutions share is this: when a large array, say md0, and a small array, say md1, both share the same set of underlying disks, the md subsystem will not check/repair them in parallel. In this situation, we will never check md1 if checking md0 takes more time than we allow in a month (28 days).

I'll think about it all more.

Thanks,

/mjt
Bug#556610: Please do incremental checks every night instead of a full monthly one
On Fri, Jun 22, 2012 at 07:51:27PM +0300, Michael Tokarev wrote:
> The main issue which all proposed solutions share is this: when a large
> array, say md0, and a small array, say md1, both share the same set of
> underlying disks, the md subsystem will not check/repair them in parallel.
> In this situation, we will never check md1 if checking md0 takes more
> time than we allow in a month (28 days).

Yep. See my last post.

> I'll think about it all more.

What do you think about the solution suggested above (set sync_force_parallel to 1 during cronjobs)?

Another solution (besides the polling etc. mentioned above): don't delay the check of the current array after it reaches the sync_max threshold. Just *stop* it.
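For illustration only, toggling sync_force_parallel around a cron-driven check might look like the sketch below. The helper name set_force_parallel and its calling convention are hypothetical. It assumes each array exposes a writable sync_force_parallel attribute in its md sysfs directory and quietly skips arrays that do not. Note that Neil Brown's reply elsewhere in this thread expects the setting to bring little benefit and possibly to slow down running checks.

```shell
# Hypothetical helper (not part of checkarray).
# Usage: set_force_parallel VALUE MD_DIR...
# Writes VALUE (0 or 1) to sync_force_parallel in every given md directory
# that has a writable attribute of that name; other directories are skipped.
set_force_parallel() {
    value=$1
    shift
    case $value in
        0|1) ;;
        *) echo "set_force_parallel: value must be 0 or 1" >&2; return 1;;
    esac
    for md_dir in "$@"; do
        if [ -w "$md_dir/sync_force_parallel" ]; then
            echo "$value" > "$md_dir/sync_force_parallel"
        fi
    done
}
```

The start job would call `set_force_parallel 1 /sys/block/md*/md` before queueing the checks, and the job that interrupts them would call it again with 0.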
Bug#556610: Please do incremental checks every night instead of a full monthly one
Just to note, the above patch won't work properly on the squeeze kernel. (That is why you may need the black magic of watching the sync_completed file here, as Alice suggests.) This has been fixed in the kernel since this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c07b70ad32ed0a5ec9735cafb1aa10b3a2298b7d

The fix seems to be simple, but there is no chance of it entering squeeze, right?

Attached are the checkarray (typo fixed) and cron.d/mdadm patches.

Attachments: checkarray.patch, mdadm-cron.patch
Bug#556610: Please do incremental checks every night instead of a full monthly one
Attached is a slightly fixed version of the above patch: sync_min must be a multiple of chunk_size.

Attachment: checkarray.patch
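The attached patch is not reproduced in the archive, so the following is only a sketch of what such rounding could look like, not the patch itself. The function name align_to_chunk is hypothetical. It assumes sync_min is expressed in 512-byte sectors and chunk_size in bytes, and that a chunk_size of 0 (as reported by arrays without striping) means no rounding is needed.

```shell
# Hypothetical helper: round a sector offset down to a chunk boundary.
# $1: offset in 512-byte sectors
# $2: chunk size in bytes (assumed to be 0 for arrays without chunks)
align_to_chunk() {
    sectors=$1
    chunk_bytes=$2
    chunk_sectors=$((chunk_bytes / 512))
    if [ "$chunk_sectors" -le 0 ]; then
        # No chunking: any offset is acceptable as it stands.
        echo "$sectors"
    else
        # Integer division drops the remainder, which rounds down.
        echo $((sectors / chunk_sectors * chunk_sectors))
    fi
}
```

For example, with a 512 KiB chunk (1024 sectors), `align_to_chunk 1000000 524288` prints 999424.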
Bug#556610: Please do incremental checks every night instead of a full monthly one
tag 556610 +patch
thanks

Just a simpler version of
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=32;filename=checkarray.diff;att=2;bug=556610

The rough idea is to:

1) Set up crontab entries on a regular basis, e.g. weekly:

--->8---
57 0 * * 0 root [ -x /usr/share/mdadm/checkarray ] && /usr/share/mdadm/checkarray --cron --all --quiet
57 6 * * 0 root [ -x /usr/share/mdadm/checkarray ] && /usr/share/mdadm/checkarray --cron --all --quiet --cancel
---8<---

2) Save the sync_completed info to sync_min on --cancel:

--->8---
--- /usr/share/mdadm/checkarray 2011-12-06 04:41:09.0 +0400
+++ ./checkarray 2011-12-06 18:45:41.0 +0400
@@ -165,8 +165,10 @@
 
     case $action in
       idle)
+        completed=$(awk -F/ '{ if ($1 == "none") {print 0} else {print $1}}' /sys/block/$array/md/sync_completed)
         echo $action > $SYNC_ACTION_CTL
         [ $quiet -lt 1 ] && echo "$PROGNAME: I: cancel request queued for array $array." >&2
+        echo $completed > /sys/block/$array/md/sync_min
         ;;
 
       check)
---8<---

Of course, it would be easy to dump the sync_completed state into temporary files somewhere in /var/lib/mdadm/ so that it survives a reboot. I'm not sure that is a good idea at all...

#!/bin/sh
#
# checkarray -- initiates a check run of an MD array's redundancy information.
#
# Copyright © martin f. krafft madd...@debian.org
# distributed under the terms of the Artistic Licence 2.0
#
set -eu

PROGNAME=${0##*/}

about()
{
  echo "$PROGNAME -- MD array (RAID) redundancy checker tool"
  echo "Copyright © martin f. krafft madd...@debian.org"
  echo "Released under the terms of the Artistic Licence 2.0"
}

usage()
{
  about
  echo
  echo "Usage: $PROGNAME [options] [arrays]"
  echo
  echo "Valid options are:"
  cat <<-_eof | column -s\& -t
-a|--all & check all assembled arrays (check /proc/mdstat).
-s|--status & print redundancy check status of devices.
-x|--cancel & queue a request to cancel a running redundancy check.
-i|--idle & perform check in a lowest I/O scheduling class (idle).
-l|--slow & perform check in a lower-than-standard I/O scheduling class.
-f|--fast & perform check in higher-than-standard I/O scheduling class.
--realtime & perform check in real-time I/O scheduling class (DANGEROUS!).
-c|--cron & honour AUTOCHECK setting in /etc/default/mdadm.
-q|--quiet & suppress informational messages.
-Q|--real-quiet & suppress all output messages, including warnings and errors.
-h|--help & show this output.
-V|--version & show version information.
_eof
  echo
  echo "Examples:"
  echo "  $PROGNAME --all --idle"
  echo "  $PROGNAME --quiet /dev/md[123]"
  echo "  $PROGNAME -sa"
  echo "  $PROGNAME -x --all"
  echo
  echo "Devices can be specified in almost any format. The following are"
  echo "all equivalent:"
  echo "  /dev/md0, md0, /dev/md/0, /sys/block/md0"
  echo
  echo "The --all option overrides all arrays passed to the script."
  echo
  echo "You can also control the status of a check with /proc/mdstat ."
}

SHORTOPTS=achVqQsxilf
LONGOPTS=all,cron,help,version,quiet,real-quiet,status,cancel,idle,slow,fast,realtime

eval set -- $(getopt -o $SHORTOPTS -l $LONGOPTS -n $PROGNAME -- "$@")

arrays=''
cron=0
all=0
quiet=0
status=0
action=check
ionice=

for opt in $@; do
  case $opt in
    -a|--all) all=1;;
    -s|--status) action=status;;
    -x|--cancel) action=idle;;
    -i|--idle) ionice=idle;;
    -l|--slow) ionice=low;;
    -f|--fast) ionice=high;;
    --realtime) ionice=realtime;;
    -c|--cron) cron=1;;
    -q|--quiet) quiet=1;;
    -Q|--real-quiet) quiet=2;;
    -h|--help) usage; exit 0;;
    -V|--version) about; exit 0;;
    /dev/md/*|md/*) arrays="${arrays:+$arrays }md${opt#*md/}";;
    /dev/md*|md*) arrays="${arrays:+$arrays }${opt#/dev/}";;
    /sys/block/md*) arrays="${arrays:+$arrays }${opt#/sys/block/}";;
    --) :;;
    *) echo "$PROGNAME: E: invalid option: $opt" >&2; usage >&2; exit 0;;
  esac
done

is_true()
{
  case ${1:-} in
    [Yy]es|[Yy]|1|[Tt]rue|[Tt]) return 0;;
    *) return 1;;
  esac
}

DEBIANCONFIG=/etc/default/mdadm
[ -r $DEBIANCONFIG ] && . $DEBIANCONFIG
if [ $cron = 1 ] && ! is_true ${AUTOCHECK:-false}; then
  [ $quiet -lt 1 ] && echo "$PROGNAME: I: disabled in $DEBIANCONFIG ." >&2
  exit 0
fi

if [ ! -f /proc/mdstat ]; then
  [ $quiet -lt 2 ] && echo "$PROGNAME: E: MD subsystem not loaded, or /proc unavailable." >&2
  exit 2
fi

if [ ! -d /sys/block ]; then
  [ $quiet -lt 2 ] && echo "$PROGNAME: E:
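The cancel-and-remember step from the patch in this message can also be written without awk. The sketch below is an illustration under stated assumptions, not the patch itself: the function name interrupt_and_remember is hypothetical, and it assumes sync_completed reads "done / total" in sectors, or "none" when no check is running.

```shell
# Hypothetical helper illustrating "save sync_completed to sync_min on cancel".
# $1: the array's md directory, e.g. /sys/block/md0/md
interrupt_and_remember() {
    md_dir=$1
    # Read the progress before cancelling; once the check is idle the file
    # is assumed to no longer report a sector count.
    read -r completed rest < "$md_dir/sync_completed"
    if [ "$completed" = none ]; then
        # No check was running: the next one starts from the beginning.
        completed=0
    fi
    echo idle > "$md_dir/sync_action"
    # The next check will resume from this offset.
    echo "$completed" > "$md_dir/sync_min"
}
```

As a later message in this thread points out, sync_min must be a multiple of chunk_size, so a real script would round the saved value down before writing it.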
Bug#556610: Please do incremental checks every night instead of a full monthly one
also sprach Goswin von Brederlow goswin-...@web.de [2009.11.17.0558 +0100]:
> Neil Brown recently explained on the linux-raid ML that one can do
> partial checks on a raid array:
>
> | If you first read from 'sync_completed' and store that value,
> | then before starting a new 'check', write the value to
> | sync_max, then you get exactly what you are asking for, all
>
> I assume he meant sync_min here.
>
> | easily done in a shell script.
> | You can also set 'sync_max' if you like, thus you could e.g.
> | quite easily have a cron job that scrubs 1/28th of the array each
> | night based on the day of the month.
>
> I think it would be a good idea to change the default check to run
> like this, a little every day or week, with /etc/default/mdadm saying
> which of the two.

I like the idea but won't have the time to implement this anytime soon. Patches welcome.

--
 .''`.   martin f. krafft madd...@d.o         Related projects:
: :'  :  proud Debian developer               http://debiansystem.info
`. `'`   http://people.debian.org/~madduck    http://vcs-pkg.org
  `-  Debian - when you have better things to do than fixing systems
Bug#556610: Please do incremental checks every night instead of a full monthly one
Package: mdadm
Version: 3.0-2
Severity: wishlist

Hi,

Neil Brown recently explained on the linux-raid ML that one can do partial checks on a raid array:

| If you first read from 'sync_completed' and store that value,
| then before starting a new 'check', write the value to
| sync_max, then you get exactly what you are asking for, all

I assume he meant sync_min here.

| easily done in a shell script.
| You can also set 'sync_max' if you like, thus you could e.g.
| quite easily have a cron job that scrubs 1/28th of the array each
| night based on the day of the month.

I think it would be a good idea to change the default check to run like this, a little every day or week, with /etc/default/mdadm saying which of the two.

Regards,
Goswin

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable-i386
  APT policy: (1001, 'unstable-i386'), (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.29.4-frosties-2 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=de_DE (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages mdadm depends on:
ii  debconf   1.5.27    Debian configuration management sy
ii  libc6     2.10.1-2  GNU C Library: Shared libraries
ii  lsb-base  3.2-23    Linux Standard Base 3.2 init scrip
ii  makedev   2.3.1-89  creates device files in /dev
ii  udev      0.141-1   /dev/ and hotplug management daemo

Versions of packages mdadm recommends:
ii  exim4-daemon-heavy [mail-tran  4.69-11  Exim MTA (v4) daemon with extended
ii  module-init-tools              3.9-2    tools for managing Linux kernel mo

mdadm suggests no packages.

-- debconf information excluded
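The "1/28th of the array each night based on the day of the month" idea quoted in this report comes down to a little integer arithmetic. The sketch below is only an illustration: the function name check_window and its arguments are hypothetical, and how the total number of sectors is obtained is deliberately left to the caller.

```shell
# Hypothetical helper: print "START END" (in sectors) for tonight's slice.
# $1: total number of sectors to be checked
# $2: day of the month, e.g. from `date +%d` (a leading zero is accepted)
# $3: number of slices (assumed default: 28)
# Returns 1 and prints nothing when the day has no slice (e.g. days 29 to 31).
check_window() {
    total=$1
    day=${2#0}          # strip a leading zero so "08" is not read as octal
    parts=${3:-28}
    if [ "$day" -lt 1 ] || [ "$day" -gt "$parts" ]; then
        return 1
    fi
    start=$((total * (day - 1) / parts))
    end=$((total * day / parts))
    echo "$start $end"
}

# Possible use in a nightly job (left as comments because it needs a real array):
#   if window=$(check_window "$total" "$(date +%d)"); then
#       echo "${window% *}" > /sys/block/md0/md/sync_min
#       echo "${window#* }" > /sys/block/md0/md/sync_max
#       echo check > /sys/block/md0/md/sync_action
#   fi
```

Computing both ends from the total, rather than adding a fixed slice size each night, means that consecutive slices meet exactly and the last one ends at the total even when it is not divisible by 28. Rounding to the chunk size is not handled here.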