Re: mksnap_ffs, snapshot issues, again
Julian Elischer said:
> Would it not be possible to make the snapshot file not appear in a directory until it is finished? (I know that would be 'weird' but it would give a guaranteed solution.)

That sounds kinda neat as a compile-time option or a non-default (or perhaps even default) tunable, if it were possible and not too difficult. The problem should probably be addressed, given that people might want to create snapshots around the same time they run find via the maintenance scripts.

--
Adam - Migus Dot Org (http://www.migus.org)
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: mksnap_ffs, snapshot issues, again
Would it not be possible to make the snapshot file not appear in a directory until it is finished? (I know that would be 'weird' but it would give a guaranteed solution.)

On Sat, 23 Aug 2003, Kirk McKusick wrote:
> The problem is that although the filesystem is only locked briefly, the snapshot file is locked for the entire 48 minutes. Thus, if you touch the snapshot file (by for example doing a stat on it), then the process doing the stat will hang for 48 minutes. The next process to try and touch the snapshot will lock /export while it waits for the lock on the snapshot to clear. And at that point you are hosed for 48 minutes on all access to /export :-(
mksnap_ffs, snapshot issues, again
Robert Watson forwarded your posting to me as I am not as current on current as I should be.

-- Forwarded message --
Date: Mon, 18 Aug 2003 22:38:47 +0200
From: Branko F. Gračnar [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: mksnap_ffs, snapshot issues, again

I have a 900G array on a Promise SX6000 controller.

This is a freshly formatted filesystem (newfs -L export -O 2 -U -g 48000 -i 2048 -m 0 -o space /dev/pst0s2d):

# df -i /export
/dev/pst0s2d 778742004 216194 778525810 0% 2 445159292 0% /export
# mount | grep export
/dev/pst0s2d on /export (ufs, local, soft-updates)

Let's try to create a snapshot of the empty filesystem:

# cd /export
# mksnap_ffs /export aaa.snap

... after 30 minutes ... the snapshot was not created (!!! on an empty filesystem !!!) ...

OK, a long snapshot creation would be fine if it did not hang every process that tries to do something on /export (ls /export, for example). The filesystem cannot be unmounted. The mksnap_ffs process cannot be killed. A reboot and a foreground fsck help.

This is 5.1-RELEASE (without patches, with a custom kernel: I just took the generic kernel and removed unneeded stuff).

Any ideas why this is happening? As I mentioned before, this prevents background fsck from doing its job (the machine hangs). I would really like to solve this issue.

Brane

Discussion
----------

Paul Saab kindly arranged a machine (tank.freebsd.org) with a 2TB disk array on it for me to test. I enclose a copy of the `sysctl kern' output at the end of this message.

I first ran my own test, which involved creating a default-configuration filesystem, taking a snapshot, and removing the snapshot. The scripted result is below. It shows that it takes 48 minutes to create the snapshot and 15 minutes to remove it. But importantly, it shows that the filesystem is only locked down and inaccessible for 0.042 seconds of those 48 minutes. The problem is that the 77,000 indirect blocks needed by the snapshot do not fit in the 300 kernel buffers allotted to it. So, every indirect block needs to be read and written approximately three times.

Just to be sure that there was not something weird about your configuration, I also ran the same set of tests using your newfs parameters. Other than creating more cylinder groups, the result (e.g., running time) was about the same.

But, to get to the problem that you are having with accessing your filesystem. The problem is that although the filesystem is only locked briefly, the snapshot file is locked for the entire 48 minutes. Thus, if you touch the snapshot file (by for example doing a stat on it), then the process doing the stat will hang for 48 minutes. The next process to try and touch the snapshot will lock /export while it waits for the lock on the snapshot to clear. And at that point you are hosed for 48 minutes on all access to /export :-(

So, I think that the best solution for you would be to try creating a hidden directory for the snapshot file, e.g., create a /export/.snap directory mode 700 owned by root, then create the snapshot as say /export/.snap/snap1. This way, it will be out of the way of all snoopy programs except those walking the filetree as root.

Kirk McKusick

Results of my test
------------------

Script started on Fri Aug 22 17:18:34 2003
tank# newfs /dev/twed0
/dev/twed0: 2097152.0MB (4294967292 sectors) block size 16384, fragment size 2048
        using 11413 cylinder groups of 183.77MB, 11761 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376512, 752864, 1129216, 1505568, 1881920, 2258272, 2634624,
 3010976, 3387328, 3763680, 4140032, 4516384, 4892736, 5269088, 5645440,
 6021792, 6398144, 6774496, 7150848, 7527200, 7903552, 8279904, 8656256,
 9032608, 9408960, 9785312, 10161664, 10538016, 10914368, 11290720,
 11667072, 12043424, 12419776, 12796128, 13172480, 13548832, 13925184,
 14301536, 14677888, 15054240, 15430592, 15806944, 16183296, 16559648,
 16936000, 17312352, 17688704, 18065056, 18441408, 18817760, 19194112,
 19570464, 19946816, 20323168, 20699520, 21075872, 21452224, 21828576,
 22204928, 22581280,
 etc, etc, etc
 4283638624, 4284014976, 4284391328, 4284767680, 4285144032, 4285520384,
 4285896736, 4286273088, 4286649440, 4287025792, 4287402144, 4287778496,
 4288154848, 4288531200, 4288907552, 4289283904, 4289660256, 4290036608,
 4290412960, 4290789312, 4291165664, 4291542016, 4291918368, 4292294720,
 4292671072, 4293047424, 4293423776, 4293800128, 4294176480, 4294552832,
 4294929184
tank# dumpfs /dev/twed0 | head -22
magic   19540119 (UFS2)   time    Sat Aug 23 01:18:55 2003
superblock location 65536   id    [ 3f47236f d612c37d ]
ncg     11413   size    1073741823   blocks  1039959213
bsize   16384   shift   14   mask    0xc000
fsize   2048    shift   11   mask    0xf800
frag    8       shift   3    fsbtodb 2
minfree 8%      optim   time   symlinklen 120
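The workaround suggested in the message above (a root-only /export/.snap directory that holds the snapshot file) could be scripted roughly as follows. This is only a sketch: the helper names `prepare_snap_dir` and `snapshot_command` are made up for illustration, the two-argument `mksnap_ffs` form is copied from the original report, and passing it an absolute snapshot path is an assumption. The demo runs against a scratch directory and does not invoke `mksnap_ffs`.

```python
import os
import stat
import tempfile


def prepare_snap_dir(mountpoint, dirname=".snap"):
    """Create a mode-700 directory for snapshot files and return its path.

    Hypothetical helper; the directory name follows the suggestion in the thread.
    """
    snap_dir = os.path.join(mountpoint, dirname)
    os.makedirs(snap_dir, exist_ok=True)
    # chmod explicitly, because the mode given to makedirs is filtered by the umask.
    os.chmod(snap_dir, 0o700)
    if os.geteuid() == 0:
        # Only root can hand ownership to root; skipped when run unprivileged.
        os.chown(snap_dir, 0, 0)
    return snap_dir


def snapshot_command(mountpoint, name, dirname=".snap"):
    """Build the argv for mksnap_ffs in the two-argument form used in the thread.

    Assumption: the snapshot name may be given as an absolute path.
    """
    return ["mksnap_ffs", mountpoint, os.path.join(mountpoint, dirname, name)]


if __name__ == "__main__":
    # A scratch directory stands in for /export so the sketch is safe to run.
    with tempfile.TemporaryDirectory() as fake_export:
        snap_dir = prepare_snap_dir(fake_export)
        print(snap_dir, oct(stat.S_IMODE(os.stat(snap_dir).st_mode)))
    print(" ".join(snapshot_command("/export", "snap1")))
```

Because the directory is mode 700, an unprivileged `ls -l` or `find` cannot descend into it and so never touches the locked snapshot file. Programs that walk the tree as root are still exposed, as the message notes.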
Re: mksnap_ffs, snapshot issues, again
In message [EMAIL PROTECTED], Kirk McKusick writes:
> But, to get to the problem that you are having with accessing your filesystem. The problem is that although the filesystem is only locked briefly, the snapshot file is locked for the entire 48 minutes. Thus, if you touch the snapshot file (by for example doing a stat on it), then the process doing the stat will hang for 48 minutes.

Isn't there some way we can loosen this aspect up? Either by having stat know about it and return approximate info, or simply by failing?

(I presume that making the sleep interruptible would break all sorts of standards.)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
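The two alternatives raised above (failing instead of sleeping, or returning approximate information) can be illustrated with a toy user-space model. This is not kernel code and not how FreeBSD implements stat. The `Node` class, `BusyError`, and the method names are hypothetical, and a Python `threading.Lock` merely stands in for the lock on the snapshot file.

```python
import threading


class BusyError(Exception):
    """Raised when the node is locked and the caller chose not to wait."""


class Node:
    """Toy stand-in for a file whose lock is held while a snapshot is built."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.lock = threading.Lock()

    def stat_blocking(self):
        # Behaviour described in the thread: sleep until the lock is free.
        with self.lock:
            return {"name": self.name, "size": self.size}

    def stat_or_fail(self):
        # Alternative 1: give up at once if the lock is held.
        if not self.lock.acquire(blocking=False):
            raise BusyError(self.name)
        try:
            return {"name": self.name, "size": self.size}
        finally:
            self.lock.release()

    def stat_approximate(self):
        # Alternative 2: skip the lock and flag the values as possibly stale.
        return {"name": self.name, "size": self.size, "approximate": True}


if __name__ == "__main__":
    snap = Node("aaa.snap", 0)
    snap.lock.acquire()  # pretend snapshot creation holds the lock
    try:
        snap.stat_or_fail()
    except BusyError as exc:
        print("busy:", exc)
    print(snap.stat_approximate())
    snap.lock.release()
```

In this model neither alternative leaves the caller asleep while holding other locks, which is the property the question is after. Whether either is acceptable for a real stat is exactly the standards concern raised in the message.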
Re: mksnap_ffs, snapshot issues, again
> To: Kirk McKusick [EMAIL PROTECTED]
> cc: Branko F. Gračnar [EMAIL PROTECTED], Paul Saab [EMAIL PROTECTED], Robert Watson [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: mksnap_ffs, snapshot issues, again
> From: Poul-Henning Kamp [EMAIL PROTECTED]
> In-Reply-To: Your message of Sat, 23 Aug 2003 01:32:38 PDT.
> Date: Sat, 23 Aug 2003 11:01:28 +0200
>
> In message [EMAIL PROTECTED], Kirk McKusick writes:
> > But, to get to the problem that you are having with accessing your filesystem. The problem is that although the filesystem is only locked briefly, the snapshot file is locked for the entire 48 minutes. Thus, if you touch the snapshot file (by for example doing a stat on it), then the process doing the stat will hang for 48 minutes.
>
> Isn't there some way we can loosen this aspect up? Either by having stat know about it and return approximate info, or simply by failing?
>
> (I presume that making the sleep interruptible would break all sorts of standards.)
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> [EMAIL PROTECTED] | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe

The race-to-the-root problem in general could be largely solved by changing lookup (VOP_LOOKUP really) to release the lock that it holds on the directory before blocking on the next component, in the case where it is doing a lookup without intent to create. If we did this, then a single locked node would have lookups pile up on itself, but they could not cascade to the root.

A related change would be to do an interruptible locking request on the node, so that if one did an `ls -l foo' where foo was, say, a locked snapshot, it would be possible to interrupt it.

~Kirk
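The cascade described above can be made concrete with a small, deterministic model. It is a sketch under simplifying assumptions, not the VOP_LOOKUP code: lookups run one after another, each lock has no owner tracking beyond set membership, and a blocked lookup never wakes up. The function name `simulate` and the path lists are invented for illustration.

```python
def simulate(lookups, locked, release_before_blocking):
    """Walk each path from the root and report which nodes end up locked.

    lookups: list of paths, each a list of component names from the root down.
    locked:  nodes already locked by someone else (here, the snapshot file).
    Returns (held, blocked): the set of locked nodes and the stuck paths.
    """
    held = set(locked)
    blocked = []
    for path in lookups:
        parent = None
        for node in path:
            if node in held:
                # The lookup sleeps here. In the behaviour described in the
                # thread it keeps its parent directory locked; under the
                # proposed change it lets the parent go first.
                if parent is not None and release_before_blocking:
                    held.discard(parent)
                blocked.append(path)
                break
            held.add(node)  # lock the next component...
            if parent is not None:
                held.discard(parent)  # ...then release the one above it
            parent = node
        else:
            held.discard(parent)  # lookup finished, drop the last lock
    return held, blocked


# stat of the snapshot, then ls /export, then any access to the root.
LOOKUPS = [["/", "export", "aaa.snap"], ["/", "export"], ["/"]]

if __name__ == "__main__":
    for release in (False, True):
        held, blocked = simulate(LOOKUPS, {"aaa.snap"}, release)
        print(release, sorted(held), len(blocked))
```

With the parent kept locked, the first lookup pins `export`, the second then pins `/`, and all three lookups are stuck. With the parent released before blocking, only the lookup that actually targets the snapshot is stuck, which matches the "pile up on itself, but not cascade to the root" description.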
Re: mksnap_ffs, snapshot issues, again
> The behaviour of filesystem activity stalling during snapshot creation is intentional, but 30 minutes to snapshot an empty FS is not. Is there disk activity during this time?
>
> It's not clear from your mail whether bg fsck is in operation during this time. If so, that's probably the cause, since bg fsck itself uses a snapshot to check the FS consistency.

Background fsck was NOT running. I formatted fs and then tried to make snapshot. Machine just hangs.

Brane
Re: mksnap_ffs, snapshot issues, again
On Tue, 19 Aug 2003, Branko F. Gracnar wrote:
> > The behaviour of filesystem activity stalling during snapshot creation is intentional, but 30 minutes to snapshot an empty FS is not. Is there disk activity during this time?
> >
> > It's not clear from your mail whether bg fsck is in operation during this time. If so, that's probably the cause, since bg fsck itself uses a snapshot to check the FS consistency.
>
> Background fsck was NOT running. I formatted fs and then tried to make snapshot.

When reporting bgfsck/snapshot/... problems, you may want to CC Kirk McKusick [EMAIL PROTECTED]. I don't believe he closely tracks current@, and he's the best person to track down and fix problems in this area. I forwarded your earlier message to him, but haven't heard back as yet. Just FYI.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]             Network Associates Laboratories
mksnap_ffs, snapshot issues, again
I have a 900G array on a Promise SX6000 controller.

This is a freshly formatted filesystem (newfs -L export -O 2 -U -g 48000 -i 2048 -m 0 -o space /dev/pst0s2d):

# df -i /export
/dev/pst0s2d 778742004 216194 778525810 0% 2 445159292 0% /export
# mount | grep export
/dev/pst0s2d on /export (ufs, local, soft-updates)

Let's try to create a snapshot of the empty filesystem:

# cd /export
# mksnap_ffs /export aaa.snap

... after 30 minutes ... the snapshot was not created (!!! on an empty filesystem !!!) ...

OK, a long snapshot creation would be fine if it did not hang every process that tries to do something on /export (ls /export, for example). The filesystem cannot be unmounted. The mksnap_ffs process cannot be killed. A reboot and a foreground fsck help.

This is 5.1-RELEASE (without patches, with a custom kernel: I just took the generic kernel and removed unneeded stuff).

Any ideas why this is happening? As I mentioned before, this prevents background fsck from doing its job (the machine hangs). I would really like to solve this issue.

Brane
Re: mksnap_ffs, snapshot issues, again
On Mon, Aug 18, 2003 at 10:38:47PM +0200, Branko F. Gračnar wrote:
> # mksnap_ffs /export aaa.snap
>
> ... after 30 minutes ... snapshot was not created (!!! On a empty filesystem !!!)...
>
> Ok, long snapshot creation would be fine if it would not hang all processes, which would like to do something on /export (ls /export for example.). Filesystem cannot be unmounted. mksnap_ffs process cannot be killed. Reboot and foreground fsck helps.

Please wrap your lines at 70 characters so your emails can be easily read.

The behaviour of filesystem activity stalling during snapshot creation is intentional, but 30 minutes to snapshot an empty FS is not. Is there disk activity during this time?

It's not clear from your mail whether bg fsck is in operation during this time. If so, that's probably the cause, since bg fsck itself uses a snapshot to check the FS consistency.

Kris