Re: mksnap_ffs, snapshot issues, again

2003-08-25 Thread Adam Migus

Julian Elischer said:
 Would it not be possible to make the snapshot file not appear in a
 directory until it is finished? (I know that would be 'weird',
 but it would give a guaranteed solution.)


That sounds kinda neat as a compile time option or non-default (or
perhaps even default) tunable if it were possible and not too
difficult.  The problem should likely be addressed given people
might want to create snapshots around the same time they run find
via the maintenance scripts.

-- 
Adam - Migus Dot Org (http://www.migus.org)
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: mksnap_ffs, snapshot issues, again

2003-08-24 Thread Julian Elischer
Would it not be possible to make the snapshot file not appear in a
directory until it is finished? (I know that would be 'weird',
but it would give a guaranteed solution.)



mksnap_ffs, snapshot issues, again

2003-08-23 Thread Kirk McKusick
Robert Watson forwarded your posting to me as I am not as current
on current as I should be.

-- Forwarded message --
 Date: Mon, 18 Aug 2003 22:38:47 +0200
 From: Branko F. Gračnar [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: mksnap_ffs, snapshot issues, again
 
 I have a 900G array on a Promise SX6000 controller.
 
 This is a freshly formatted filesystem (newfs -L export -O 2 -U -g 48000 -i 2048 -m 0
 -o space /dev/pst0s2d)
 
 # df -i /export
 /dev/pst0s2d 778742004 216194 778525810 0% 2 445159292 0% /export
 
 # mount | grep export
 /dev/pst0s2d on /export (ufs, local, soft-updates)
 
 let's try to create a snapshot of empty filesystem
 
 # cd /export
 # mksnap_ffs /export aaa.snap
 
 ... after 30 minutes ... snapshot was not created (!!! On an empty
 filesystem !!!)... Ok, long snapshot creation would be fine if it
 did not hang all processes that want to do something on
 /export (ls /export, for example). Filesystem cannot be unmounted.
 mksnap_ffs process cannot be killed. Reboot and foreground fsck
 helps.

 This is 5.1-RELEASE (without patches, with a custom kernel - I just took the generic
 kernel and removed unneeded stuff.)
 
 Any ideas why this is happening? As I mentioned before, this prevents background
 fsck from doing its job (machine hangs.)
 
 
 I would really like to solve this issue
 
 Brane

 Discussion -

Paul Saab kindly arranged a machine (tank.freebsd.org) with a 2Tb
disk array on it for me to test. I enclose a copy of the `sysctl kern'
output at the end of this message. I first ran my own test which
involved creating a default configuration filesystem, taking a
snapshot, and removing the snapshot. The scripted result is below.
It shows that it takes 48 minutes to create the snapshot and 15
minutes to remove it. But importantly, it shows that the filesystem
is only locked down and inaccessible for 0.042 seconds of that 48
minutes. The problem is that the 77,000 indirect blocks needed by
the snapshot do not fit in the 300 kernel buffers allotted to it.
So, every indirect block needs to be read and written approximately
three times. Just to be sure that there was not something weird about
your configuration, I also ran the same set of tests using your
newfs parameters. Other than creating more cylinder groups the
result (e.g., running time) was about the same.
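
As a rough sanity check on the figures above, the arithmetic can be
sketched in Python. The numbers are the ones quoted in this message;
modelling "read and written approximately three times" as exactly three
reads plus three writes per indirect block is an assumption, and the
function name is made up for illustration.

```python
# Back-of-the-envelope I/O estimate from the figures quoted above.
# Assumption: "read and written approximately three times" is modelled as
# exactly 3 reads + 3 writes per indirect block.

def snapshot_io_estimate(indirect_blocks=77_000, passes=3,
                         block_size=16_384, create_minutes=48,
                         locked_seconds=0.042):
    create_seconds = create_minutes * 60
    transfers = indirect_blocks * passes * 2      # reads plus writes
    return {
        "transfers": transfers,
        "bytes_moved": transfers * block_size,
        "transfers_per_second": transfers / create_seconds,
        "locked_fraction": locked_seconds / create_seconds,
    }

est = snapshot_io_estimate()
print(est["transfers"])                       # 462000
print(round(est["transfers_per_second"], 1))  # 160.4
print(f"{est['locked_fraction']:.4%}")        # 0.0015%
```

Under these assumptions the filesystem is suspended for roughly 0.0015%
of the creation time, which suggests the long hang comes from the lock
on the snapshot file rather than from the suspension itself.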

But, to get to the problem that you are having with accessing your
filesystem. The problem is that although the filesystem is only
locked briefly, the snapshot file is locked for the entire 48 minutes.
Thus, if you touch the snapshot file (by for example doing a stat
on it), then the process doing the stat will hang for 48 minutes.
The next process to try and touch the snapshot will lock /export
while it waits for the lock on the snapshot to clear. And at that
point you are hosed for 48 minutes on all access to /export :-(
So, I think that the best solution for you would be to try creating
a hidden directory for the snapshot file, e.g., create a /export/.snap
directory mode 700 owned by root, then create the snapshot as say
/export/.snap/snap1. This way, it will be out of the way of all
snoopy programs except those walking the filetree as root.
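
The hidden-directory workaround above could be scripted roughly as
follows. This is a minimal sketch, not a tested tool: the helper name
and its parameters are hypothetical, and the mksnap_ffs argument order
(mountpoint, then snapshot file) simply copies the invocation shown
earlier in this thread, so check mksnap_ffs(8) on your release.

```python
import os

def prepare_snapshot_dir(mountpoint, dirname=".snap", snapname="snap1",
                         make_root_owned=False):
    """Create a mode-700 directory for snapshots and return the command to run.

    The returned argument list follows the `mksnap_ffs /export aaa.snap`
    form used in this thread; other releases may expect different arguments.
    """
    snapdir = os.path.join(mountpoint, dirname)
    os.makedirs(snapdir, exist_ok=True)
    os.chmod(snapdir, 0o700)      # explicit chmod: the mkdir mode is subject to umask
    if make_root_owned:
        os.chown(snapdir, 0, 0)   # requires root privileges
    return ["mksnap_ffs", mountpoint, os.path.join(snapdir, snapname)]
```

Calling prepare_snapshot_dir("/export", make_root_owned=True) as root
would return a command list that can be handed to subprocess.run().
Programs that walk the tree as an unprivileged user then cannot reach
the snapshot file and so cannot block on its lock.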

Kirk McKusick

 Results of my test -

Script started on Fri Aug 22 17:18:34 2003

tank# newfs /dev/twed0
/dev/twed0: 2097152.0MB (4294967292 sectors) block size 16384, fragment size 2048
using 11413 cylinder groups of 183.77MB, 11761 blks, 23552 inodes.
super-block backups (for fsck -b #) at:
 160, 376512, 752864, 1129216, 1505568, 1881920, 2258272, 2634624, 3010976,
 3387328, 3763680, 4140032, 4516384, 4892736, 5269088, 5645440, 6021792,
 6398144, 6774496, 7150848, 7527200, 7903552, 8279904, 8656256, 9032608,
 9408960, 9785312, 10161664, 10538016, 10914368, 11290720, 11667072, 12043424,
 12419776, 12796128, 13172480, 13548832, 13925184, 14301536, 14677888,
 15054240, 15430592, 15806944, 16183296, 16559648, 16936000, 17312352,
 17688704, 18065056, 18441408, 18817760, 19194112, 19570464, 19946816,
 20323168, 20699520, 21075872, 21452224, 21828576, 22204928, 22581280,

  etc, etc, etc 

 4283638624, 4284014976, 4284391328, 4284767680, 4285144032, 4285520384,
 4285896736, 4286273088, 4286649440, 4287025792, 4287402144, 4287778496,
 4288154848, 4288531200, 4288907552, 4289283904, 4289660256, 4290036608,
 4290412960, 4290789312, 4291165664, 4291542016, 4291918368, 4292294720,
 4292671072, 4293047424, 4293423776, 4293800128, 4294176480, 4294552832,
 4294929184

tank# dumpfs /dev/twed0 | head -22
magic   19540119 (UFS2)  time    Sat Aug 23 01:18:55 2003
superblock location 65536   id  [ 3f47236f d612c37d ]
ncg     11413   size    1073741823  blocks  1039959213
bsize   16384   shift   14  mask    0xc000
fsize   2048    shift   11  mask    0xf800
frag    8       shift   3   fsbtodb 2
minfree 8%      optim   time    symlinklen 120

Re: mksnap_ffs, snapshot issues, again

2003-08-23 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Kirk McKusick writes:

But, to get to the problem that you are having with accessing your
filesystem. The problem is that although the filesystem is only
locked briefly, the snapshot file is locked for the entire 48 minutes.
Thus, if you touch the snapshot file (by for example doing a stat
on it), then the process doing the stat will hang for 48 minutes.

Isn't there some way we can loosen this aspect up?

Either by having stat know about it and return approximate info, or
simply by failing? (I presume that making the sleep interruptible
would break all sorts of standards.)
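
The "simply by failing" option can be illustrated with a user-space
analogy in Python. This is only a sketch of the idea, not kernel code:
threading.Lock stands in for the vnode lock, stat_or_fail is a
hypothetical helper, and the choice of EBUSY as the error is an
assumption that nothing in this thread specifies.

```python
import errno
import threading

def stat_or_fail(node_lock, do_stat):
    """Try the lock once; fail immediately instead of sleeping if it is held."""
    if not node_lock.acquire(blocking=False):
        # EBUSY is an illustrative choice, not something the thread specifies.
        raise OSError(errno.EBUSY, "node is locked (snapshot in progress?)")
    try:
        return do_stat()
    finally:
        node_lock.release()

snapshot_lock = threading.Lock()
snapshot_lock.acquire()               # stands in for mksnap_ffs holding the lock
try:
    stat_or_fail(snapshot_lock, lambda: "stat result")
    outcome = "ok"
except OSError as e:
    outcome = errno.errorcode[e.errno]
print(outcome)                        # EBUSY
```

The caller gets an error at once and never sleeps, so it never holds
any other lock while waiting.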


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: mksnap_ffs, snapshot issues, again

2003-08-23 Thread Kirk McKusick
On Sat, 23 Aug 2003 11:01:28 +0200, Poul-Henning Kamp wrote:

In message [EMAIL PROTECTED],
Kirk McKusick writes:

But, to get to the problem that you are having with accessing your
filesystem. The problem is that although the filesystem is only
locked briefly, the snapshot file is locked for the entire 48 minutes.
Thus, if you touch the snapshot file (by for example doing a stat
on it), then the process doing the stat will hang for 48 minutes.

Isn't there some way we can loosen this aspect up?

Either by having stat know about it and return approximate info, or
simply by failing? (I presume that making the sleep interruptible
would break all sorts of standards.)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe

The race to the root problem in general could be largely solved
by changing lookup (VOP_LOOKUP really) to release the lock that
it holds on the directory before blocking on the next component
in the case where it is doing a lookup without intent to create.
If we did this, then a single locked node would have lookups
pile up on itself, but could not cascade to the root. A related
change would be to do an interruptible locking request on the
node so that if one did an `ls -l foo' where foo was say a
locked snapshot, it would be possible to interrupt it.
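
The cascade described above, and the effect of releasing the directory
lock before blocking, can be shown with a toy model. This is a
deliberately simplified sketch and not VFS code: lookups run one after
another, a sleeping lookup never wakes up, and the function name and
paths are made up for illustration.

```python
def blocked_lookups(paths, locked_nodes, release_parent_before_blocking):
    """Toy model of component-by-component lookup; not real VFS code.

    Returns the list of lookups that end up sleeping. A lookup that sleeps
    while still holding its parent directory makes that directory look
    locked to every later lookup.
    """
    held = set(locked_nodes)
    blocked = []
    for path in paths:
        parent = "/"
        if parent in held:
            blocked.append(path)
            continue
        node = ""
        for component in path.strip("/").split("/"):
            node = node + "/" + component
            if node in held:
                if not release_parent_before_blocking:
                    held.add(parent)      # parent stays locked while we sleep
                blocked.append(path)
                break
            parent = node
    return blocked

paths = ["/export/aaa.snap", "/export/other", "/tmp/x"]
locked = {"/export/aaa.snap"}
print(blocked_lookups(paths, locked, release_parent_before_blocking=False))
# ['/export/aaa.snap', '/export/other', '/tmp/x']
print(blocked_lookups(paths, locked, release_parent_before_blocking=True))
# ['/export/aaa.snap']
```

In the first run the block spreads from the snapshot to /export and
then to the root. In the second run only the lookup that touches the
locked snapshot sleeps.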

~Kirk


Re: mksnap_ffs, snapshot issues, again

2003-08-19 Thread Branko F. Gracnar

The behaviour of filesystem activity stalling during snapshot creation
is intentional, but 30 minutes to snapshot an empty FS is not.  Is
there disk activity during this time?  It's not clear from your mail
whether bg fsck is in operation during this time.  If so, that's
probably the cause, since bg fsck itself uses a snapshot to check the
FS consistency.

Background fsck was NOT running. I formatted the fs and then tried to make a snapshot.

Machine just hangs.

Brane


Re: mksnap_ffs, snapshot issues, again

2003-08-19 Thread Robert Watson

On Tue, 19 Aug 2003, Branko F. Gracnar wrote:

 The behaviour of filesystem activity stalling during snapshot creation
 is intentional, but 30 minutes to snapshot an empty FS is not.  Is
 there disk activity during this time?  It's not clear from your mail
 whether bg fsck is in operation during this time.  If so, that's
 probably the cause, since bg fsck itself uses a snapshot to check the
 FS consistency.
 
 Background fsck was NOT running. I formatted the fs and then tried to make
 a snapshot.

When reporting bgfsck/snapshot/... problems, you may want to CC Kirk
McKusick [EMAIL PROTECTED] -- I don't believe he closely tracks
current@, and he's the best person to track down and fix problems in this
area.  I forwarded your earlier message to him, but haven't heard back as
yet.  Just FYI.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




mksnap_ffs, snapshot issues, again

2003-08-18 Thread Branko F. Gračnar
I have a 900G array on a Promise SX6000 controller.

This is a freshly formatted filesystem (newfs -L export -O 2 -U -g 48000 -i 2048 -m 0
-o space /dev/pst0s2d)

# df -i /export
/dev/pst0s2d 778742004 216194 778525810 0% 2 445159292 0% /export

# mount | grep export
/dev/pst0s2d on /export (ufs, local, soft-updates)

let's try to create a snapshot of empty filesystem

# cd /export
# mksnap_ffs /export aaa.snap

... after 30 minutes ... snapshot was not created (!!! On an empty filesystem !!!)...
Ok, long snapshot creation would be fine if it did not hang all processes that
want to do something on /export (ls /export, for example). Filesystem cannot be
unmounted. mksnap_ffs process cannot be killed. Reboot and foreground fsck helps.

This is 5.1-RELEASE (without patches, with a custom kernel - I just took the generic
kernel and removed unneeded stuff.)

Any ideas why this is happening? As I mentioned before, this prevents background fsck
from doing its job (machine hangs.)


I would really like to solve this issue

Brane


Re: mksnap_ffs, snapshot issues, again

2003-08-18 Thread Kris Kennaway
On Mon, Aug 18, 2003 at 10:38:47PM +0200, Branko F. Gračnar wrote:

 # mksnap_ffs /export aaa.snap
 
 ... after 30 minutes ... snapshot was not created (!!! On an empty
 filesystem !!!)... Ok, long snapshot creation would be fine if it
 did not hang all processes that want to do something on
 /export (ls /export, for example). Filesystem cannot be
 unmounted. mksnap_ffs process cannot be killed. Reboot and
 foreground fsck helps.

Please wrap your lines at 70 characters so your emails can be easily read.

The behaviour of filesystem activity stalling during snapshot creation
is intentional, but 30 minutes to snapshot an empty FS is not.  Is
there disk activity during this time?  It's not clear from your mail
whether bg fsck is in operation during this time.  If so, that's
probably the cause, since bg fsck itself uses a snapshot to check the
FS consistency.

Kris

