background fsck high load on 8.1

2011-04-12 Thread Sergi Seira

Hello,

we've found that background fsck on 8.1 degrades server performance to a 
higher degree than in previous FreeBSD versions (6.3, 7.3; amd64).

We noticed it after upgrading, on the same hardware, to 8.1-RELEASE.
Now the performance of other services (e.g. apache, mysql) drops badly during a 
background fsck.

Is there any way to calm fsck down? nice(1)? Some sysctl?

We also use gmirror, but we prevent it from rebuilding while an fsck is running 
in the background.

Thanks for your help,
regards,
Sergi
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Deadlock between background fsck and cvsup, both stuck in snaplk

2003-07-01 Thread Jesper Skriver
Hi,

I recently reinstalled my scratch box. The last time I turned it off I
didn't shut it down cleanly, so when I booted it today the file systems
needed an fsck.

While bgfsck was still running I started a cvsup to update /usr/ports/.
It ran fine for a while and then stopped, and now both cvsup and
fsck are waiting for snaplk.

The box is running 5.1-CURRENT as of June 8th.

When it booted it logged this:

...
da0 at ahc0 bus 0 target 0 lun 0
da0: IBM DCAS-34330 S65A Fixed Direct Access SCSI-2 device 
da0: 20.000MB/s transfers (20.000MHz, offset 15)
da0: 4134MB (8467200 512 byte sectors: 255H 63S/T 527C)
da1 at ahc1 bus 0 target 0 lun 0
da1: IBM DORS-32160W WA6A Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 2063MB (4226725 512 byte sectors: 255H 63S/T 263C)
Mounting root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
/: mount pending error: blocks 4 files 0
WARNING: /usr was not properly dismounted
/usr: superblock summary recomputed
WARNING: /usr/obj was not properly dismounted
WARNING: /var was not properly dismounted
acquiring duplicate lock of same type: vnode interlock
 1st vnode interlock @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:477
 2nd vnode interlock @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:480
Stack backtrace:

The backtrace itself, however, appears neither in the output of dmesg nor in
/var/log/messages.

The processes currently waiting for snaplk are:

olive# ps auxl | grep snaplk
root  41  0.0  0.0     0   12  ??  DL   8:47PM   0:01.44  (syncer)            0   0   0  -4  0 snaplk
root 496  0.0  1.5  2280 1848  ??  DN   8:49PM   0:08.78 fsck_ufs -p -B /     0 481   0  -4  4 snaplk
root 537  0.0  4.2  6192 5280  p0  D    8:53PM   0:34.17 /usr/local/bin/c     0   1   0  -4  0 snaplk
root 590  0.0  0.1   328  180  p0  R+   9:08PM   0:00.01 grep snaplk          0 520   0 108  0 -

Any ideas?

/Jesper

-- 
Jesper Skriver, jesper(at)skriver(dot)dk  -  CCIE #5456

One Unix to rule them all, One Resolver to find them,
One IP to bring them all and in the zone to bind them.


Re: Why doesn't background fsck work ?

2003-06-22 Thread Alexander Leidinger
On Thu, 19 Jun 2003 00:25:22 -0700
Terry Lambert [EMAIL PROTECTED] wrote:

 Then you reboot.
 
 Then you start BG fsck again.
 
 Then you panic again.
 
 Repeat this until a human intervenes and manually runs a full FG
 fsck on the disk before letting it be used, and/or someone adds
 a count-down counter to the superblock,

I have a déjà vu... I think there was already a discussion on this
topic (and you participated in it)... and I think someone pointed out
that we already have a mechanism which does a fg-fsck instead of a
bg-fsck in such a situation... but maybe someone just changed the Matrix
around me...

Bye,
Alexander.

-- 
   Reboot America.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7


Re: Why doesn't background fsck work ?

2003-06-19 Thread Terry Lambert
[ ... BG fsck ... ]

  I haven't got softupdates enabled, but I didn't want to enable it,
  because I've heard that it isn't 100% reliable and I didn't want to lose
  data
 
 There have been no problems with softupdates in regard to data
 integrity in either 5.0 or 5.1 release. I do recall a couple of
 glitches at various times in current, probably prior to 5.0-Release.

If you end up with real disk corruption as a result of power
failure, a BG fsck will potentially not correct it.  The CG
bitmaps that are set that shouldn't be are effectively read-only,
and the snapshot permits you to fsck in the background.

However, this assumes that the only thing that's blown is the CG
bitmap.

If anything else is blown, then it's likely your system will panic
when it tries to use corrupt data in some pointer math or some
other use in the implementation of the FS in the kernel.  At this
point you panic.

Then you reboot.

Then you start BG fsck again.

Then you panic again.

Repeat this until a human intervenes and manually runs a full FG
fsck on the disk before letting it be used, and/or someone adds
a count-down counter to the superblock, and the on disk FS layout
changes yet again and becomes incompatible with older versions of
the kernel and fsck, both, and requires a fsck with a special flag
to upgrade the FS.

-- Terry


Why doesn't background fsck work ?

2003-06-18 Thread Juan Rodriguez Hervella
Hello!:

I tried to make my Nvidia video card work yesterday, and every time
I launched X my computer hung, so I had to 
do a hard reboot.
(I think I will fix the X problem tonight at home.)

But I've got another weird problem. :)

The fsck of my partitions always runs in the foreground,
although I've heard about something like a delayed/background
file system checker.

Why does it always run in the foreground?

-- 
JFRH



Re: Why doesn't background fsck work ?

2003-06-18 Thread Bernd Walter
On Wed, Jun 18, 2003 at 04:54:44PM +0200, Juan Rodriguez Hervella wrote:
 Hello!:
 
 I tried to make my Nvidia video card work yesterday, and everytime
 I launched the X system my computer hang up. So I had to 
 make a hard reboot.
 (I think I will fix the Xs problem this night at home)
 
 But I've got another weird problem. :)
 
 The fsck of my partitions is always made on the foregroud,
 although I've heard about something like a delayed/background
 file system checker.
 
 Why is it always made on the foreground ?

No softupdates enabled?

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: Why doesn't background fsck work ?

2003-06-18 Thread Juan Rodriguez Hervella
On Wednesday 18 June 2003 17:39, Bernd Walter wrote:
 On Wed, Jun 18, 2003 at 04:54:44PM +0200, Juan Rodriguez Hervella wrote:
  Hello!:
 
  I tried to make my Nvidia video card work yesterday, and everytime
  I launched the X system my computer hang up. So I had to
  make a hard reboot.
  (I think I will fix the Xs problem this night at home)
 
  But I've got another weird problem. :)
 
  The fsck of my partitions is always made on the foregroud,
  although I've heard about something like a delayed/background
  file system checker.
 
  Why is it always made on the foreground ?

 No softupdates enabled?

Yes, you've got it!

I haven't got softupdates enabled, and I didn't want to enable it
because I've heard that it isn't 100% reliable and I didn't want to lose
data.

Do you recommend that I switch it on?

Thanks!

-- 
JFRH



Re: Why doesn't background fsck work ?

2003-06-18 Thread Bernd Walter
On Wed, Jun 18, 2003 at 05:51:00PM +0200, Juan Rodriguez Hervella wrote:
 On Wednesday 18 June 2003 17:39, Bernd Walter wrote:
  On Wed, Jun 18, 2003 at 04:54:44PM +0200, Juan Rodriguez Hervella wrote:
   Hello!:
  
   I tried to make my Nvidia video card work yesterday, and everytime
   I launched the X system my computer hang up. So I had to
   make a hard reboot.
   (I think I will fix the Xs problem this night at home)
  
   But I've got another weird problem. :)
  
   The fsck of my partitions is always made on the foregroud,
   although I've heard about something like a delayed/background
   file system checker.
  
   Why is it always made on the foreground ?
 
  No softupdates enabled?
 
 yes, you've got it !
 
 I haven't got softupdates enabled, but I didn't want to enable it,
 because I've heard that it isn't 100% reliable and I didn't want to lose
 data

Softupdates works reliably from what I can tell.
Snapshots might have some issues and background fsck uses snapshots.

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: Why doesn't background fsck work ?

2003-06-18 Thread Kevin Oberman
 From: Juan Rodriguez Hervella [EMAIL PROTECTED]
 Date: Wed, 18 Jun 2003 17:51:00 +0200
 Sender: [EMAIL PROTECTED]
 
 On Wednesday 18 June 2003 17:39, Bernd Walter wrote:
  On Wed, Jun 18, 2003 at 04:54:44PM +0200, Juan Rodriguez Hervella wrote:
   Hello!:
  
   I tried to make my Nvidia video card work yesterday, and everytime
   I launched the X system my computer hang up. So I had to
   make a hard reboot.
   (I think I will fix the Xs problem this night at home)
  
   But I've got another weird problem. :)
  
   The fsck of my partitions is always made on the foregroud,
   although I've heard about something like a delayed/background
   file system checker.
  
   Why is it always made on the foreground ?
 
  No softupdates enabled?
 
 yes, you've got it !
 
 I haven't got softupdates enabled, but I didn't want to enable it,
 because I've heard that it isn't 100% reliable and I didn't want to lose
 data

There have been no problems with softupdates in regard to data
integrity in either the 5.0 or 5.1 release. I do recall a couple of
glitches at various times in current, probably prior to 5.0-Release.

There is an issue with combining softupdates and write caching. In the
event of a power failure, this could lead to a loss of data integrity.
But write caching is dangerous even without softupdates.

In addition, because of the nature of softupdate, it is possible that
a file is created/updated shortly before a system failure. While data
corruption should not be an issue, the file may simply not be there
after the system re-boots or may be in an earlier (but still
consistent) state.

You should probably turn off write cache on any system that you think
might lose power. (And just how reliable is that UPS? Batteries (DC)
are probably much safer.) But I don't think soft updates are a
significant issue. (Others can feel free to tell me I'm an idiot.)

You should probably read Kirk's USENIX paper on soft updates. It's very
good.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634


Re: several background fsck panics

2003-03-30 Thread David Schultz
Thus spake Alexander Langer [EMAIL PROTECTED]:
 I had several panics related to background fsck now.  Once I disabled
 background fsck, all went ok.
 
 It began when I pressed the reset buttons on several boots while the
 system was still doing fscks.
[...]
 Mar 24 21:48:59 fump kernel: panic: ufs_dirbad: bad dir

You would have gotten this one without bgfsck as well the next
time you tried to look up the offending directory.  Background fsck
only expedited the panic by reading all the directories on the
system in order to perform its checks.  Basically, the panic is
the kernel's way of telling you that something is unexpectedly
wrong with the filesystem (due in this case to ATA write caching),
and that it is going to give up rather than risk causing further
damage.  UFS, like most other filesystems, is not designed
to tolerate failures on the part of the hardware to honor its
guarantees, so it's hard to do better without inventing a new
filesystem.


Re: [Re: several background fsck panics

2003-03-30 Thread David Schultz
Thus spake Terry Lambert [EMAIL PROTECTED]:
 o Put a counter in the first superblock; it would be
   incremented when the BG fsck is started, and reset
   to zero when it completes.  If the counter reaches
   3 (or some command line specified number), then the
   BG flagging is ignored, and a full FG fsck is then
   performed instead.  I like this idea because it will
   always work, and it's not actually a hack, it's a
   correct solution.

I'm glad you like it because AFAIK, it is already implemented.  ;-)

 o Implement soft read-only.  The place that most of
   the complaints are coming from is desktop users, with
   relatively quiescent machines.  Though swap is used,
   it does not occur in an FS partition.  As a result,
   the FS could be marked read-only for long period of
   time.  This marking would be in memory.  The clean bit
   would be set on the superblock.  When a write occurs,
   the clean bit would be reset to dirty, and committed
   to disk prior to the write operation being permitted
   to proceed (a stall barrier).  I like this idea because,
   for the most part, it eliminates fsck, both BG and FG,
   on systems that crash while it's in effect.  The net
   result is a system that is statistically much more
   tolerant of failures, but which still requires another
   safety net, such as the previous solution.

I was thinking of doing something like this myself as part of an
``idle timeout'' for disks.  (Marking the filesystem clean after a
period of quiescence would actually interfere with ATA disks'
built-in mechanism for spinning down after a timeout, which is
important for laptops, so the OS would have to track the true
amount of idle time.)  Annoyingly, I can never get the disk
containing /var to remain quiescent for long while cron is running
(even without any crontabs), and I hope this can be solved without
disabling cron or adding a nontrivial hack to bio.
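
A rough sketch of the soft read-only idea quoted above, written in Python only for readability. The class, its fields and the 30-second idle timeout are all hypothetical; a real version would live in the kernel's FFS code and would have to flush every pending write before setting the clean bit again.

```python
class SoftReadOnlyFS:
    """Toy model: the on-disk clean bit follows write activity."""

    def __init__(self, idle_timeout=30.0):
        self.idle_timeout = idle_timeout  # hypothetical quiescence period, seconds
        self.on_disk_clean = True         # clean bit as stored in the superblock
        self.last_write = None            # time of the most recent write
        self.log = []                     # order in which things reach the disk

    def write(self, now, data):
        if self.on_disk_clean:
            # Stall barrier: the dirty marker must be on disk before the
            # write itself is allowed to proceed.
            self.on_disk_clean = False
            self.log.append("superblock: dirty")
        self.log.append(f"data: {data}")
        self.last_write = now

    def tick(self, now):
        """Periodic check: after enough idle time, mark the FS clean again.

        Assumes all earlier writes are already stable on disk."""
        if (not self.on_disk_clean and self.last_write is not None
                and now - self.last_write >= self.idle_timeout):
            self.on_disk_clean = True
            self.log.append("superblock: clean")

    def needs_fsck_after_crash(self):
        # A crash while the clean bit is set needs no fsck, BG or FG.
        return not self.on_disk_clean
```

A crash at any moment where `needs_fsck_after_crash()` is false would need no check at all, which is the attraction for mostly idle desktop machines.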


Re: [Re: several background fsck panics

2003-03-28 Thread Terry Lambert
David Schultz wrote:
 Thus spake Terry Lambert [EMAIL PROTECTED]:
  o Put a counter in the first superblock; it would be
    incremented when the BG fsck is started, and reset
    to zero when it completes.  If the counter reaches
    3 (or some command line specified number), then the
    BG flagging is ignored, and a full FG fsck is then
    performed instead.  I like this idea because it will
    always work, and it's not actually a hack, it's a
    correct solution.
 
 I'm glad you like it because AFAIK, it is already implemented.  ;-)

Nope.  What's implemented is the FS_NEEDSFSCK flag.  But that
flag is not set in the superblock flags field as *the very first
thing done*.

Thus a failure that results in a panic will not set the flag in
pfatal(), since it never gets there.

Probably the correct thing to do is to set the flag as the very
first operation, and then it will work as expected.

FWIW, it looks like the code in pfatal() wanted to be in main(),
since it complains about not being able to run in the background,
the same way main() does.

However, this still leaves a race window.

The reason the panic happens is that FreeBSD is running processes
on a corrupt FS.

Even in the best case, this panic may occur when anything is
loaded off the FS, so it could happen on init, or on fsck
itself, etc.

So really, the only solution is a counter that the FS kernel
code counts up, which is reset to zero when a BG fsck completes
successfully; say, by grabbing the first byte of fs_sparecon32[]
for it.

BTW: This still leaves a failure case: the BG fsck has to be
able to complete successfully... but that's not enough to stave
off a future panic from an undetected error that the fsck didn't
see, because it was only pruning CG bitmaps.

So the correct place to zero the counter is, once again, in the
kernel.  As a result of a successful unmount, from a non-panic
shutdown.

This does mean that three (or count) consecutive power failures
gets you a FG fsck, but that's probably livable (if you were that
certain there was no corruption, you could boot to a shell and
override the count parameter to the FG fsck trigger threshold).
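
To make the kernel-side variant concrete, here is a minimal sketch in Python. It assumes a single hypothetical counter field (`unclean_mounts`) that the kernel bumps on every mount and zeroes only on a clean unmount; the function names and the threshold of 3 are illustrative and are not taken from the FreeBSD sources.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    # Hypothetical counter, e.g. stored in one spare byte of the superblock.
    unclean_mounts: int = 0


def kernel_mount(sb):
    """The kernel counts the mount before any process can touch the FS."""
    sb.unclean_mounts += 1


def kernel_clean_unmount(sb):
    """Only a successful unmount from a non-panic shutdown resets the count."""
    sb.unclean_mounts = 0


def fsck_mode_at_next_boot(sb, threshold=3):
    """Decide how the next boot checks the FS after an unclean shutdown."""
    if sb.unclean_mounts >= threshold:
        return "foreground"
    return "background"
```

Because the increment happens at mount time, the count still advances when the panic hits before fsck ever gets to run, which is the race described above.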


  o Implement soft read-only.  The place that most of
    the complaints are coming from is desktop users, with
    relatively quiescent machines.  Though swap is used,
    it does not occur in an FS partition.  As a result,
    the FS could be marked read-only for long period of
    time.  This marking would be in memory.  The clean bit
    would be set on the superblock.  When a write occurs,
    the clean bit would be reset to dirty, and committed
    to disk prior to the write operation being permitted
    to proceed (a stall barrier).  I like this idea because,
    for the most part, it eliminates fsck, both BG and FG,
    on systems that crash while it's in effect.  The net
    result is a system that is statistically much more
    tolerant of failures, but which still requires another
    safety net, such as the previous solution.
 
 I was thinking of doing something like this myself as part of an
 ``idle timeout'' for disks.  (Marking the filesystem clean after a
 period of quiescence would actually interfere with ATA disks'
 built-in mechanism for spinning down after a timeout, which is
 important for laptops, so the OS would have to track the true
 amount of idle time.)  Annoyingly, I can never get the disk
 containing /var to remain quiescent for long while cron is running
 (even without any crontabs), and I hope this can be solved without
 disabling cron or adding a nontrivial hack to bio.

We implemented this when we implemented soft updates in FFS under
Windows at Artisoft.  That was back before ATX power supplies were
widespread, and we needed to be tolerant of users who simply
turned off the power switch, without running the Windows95
shutdown sequence.

I dunno about cron.  I think noticing crontab changes
automatically has maybe made it too smart for its own good.

Cron updates the access time on the crontab file every time it
runs, which is once a second.  If you disabled this for fstat,
the problem would go away.  I'm not sure the semantics are OK,
though.

The old pre-smarter cron would not have this problem, as it
would run on intervals, and sleep for long periods (until the
next job was scheduled to run), and you had to hit it over the
head with kill -HUP to tell it the file changed.

Probably the correct thing to do is to use old-style long delta
intervals, and register a kevent interest in file modifications.
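
A small sketch of that combination, in Python. It is not how cron is implemented; the function names are made up for illustration, and the kqueue half only works on systems where Python's `select` module provides `kqueue` (the BSDs and macOS).

```python
import os
import select


def seconds_until_next_job(now, job_times):
    """Long delta interval: how long to sleep until the next scheduled job.

    Returns None when nothing is scheduled, 0.0 when a job is already due."""
    if not job_times:
        return None
    return max(0.0, min(job_times) - now)


def wait_for_change(path, timeout):
    """Sleep up to `timeout` seconds, waking early if `path` is modified.

    Returns True if a modification event arrived, False on timeout."""
    fd = os.open(path, os.O_RDONLY)
    try:
        kq = select.kqueue()
        try:
            # Register interest in writes, deletes and renames of the file.
            change = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=(select.KQ_NOTE_WRITE | select.KQ_NOTE_DELETE
                        | select.KQ_NOTE_RENAME),
            )
            return len(kq.control([change], 1, timeout)) > 0
        finally:
            kq.close()
    finally:
        os.close(fd)
```

A daemon built this way would call `wait_for_change(crontab_path, seconds_until_next_job(...))` in a loop, so it touches the disk only when a job is due or the file really changed.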

The cruddy thing is, if it were really read-only, then the access
time update wouldn't happen.  Catch-22.

I think maybe it's useful to distinguish the POSIX semantics here:
"shall be scheduled for update" is not the same thing, really, as
"shall be updated".  So, in practice, you could cache the access
time update for long periods, as long as the correct time was
marked in 

Re: several background fsck panics

2003-03-26 Thread Matthias Schuendehuette
Terry Lambert wrote:

 The issue with the repeating background fsck's is important.
 I suggest a counter that gets reset to zero each time the
 FS is marked clean by fsck, and incremented each time the
 background fsck process is started.

 When this counter reaches a predefined value (I suggest a
 command line option to background fsck, which defaults to
 3, if left unspecified), then the fsck is automatically
 converted to a foreground fsck.

 This counter would be recorded in the superblock.

This sounds like a good idea! I vote for a counter of 2... :-)

I also suggest stating as clearly as possible that operating Soft 
Updates with Write Cache enabled is kind of 'out of spec'. This cannot 
work when crashing! (As you stated clearly!) So I'm also voting for 'WC 
disabled' for any kind of disk. SCSI disks don't need it because of 
Tagged Queuing, and only those ATA disks that *have* TQ can/should be 
operated 'the fast way' - hoping that Soeren gets it working again... 
:-/
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette msch [at] snafu.de, Berlin (Germany)
Powered by FreeBSD 5.0-CURRENT



Re: several background fsck panics

2003-03-25 Thread Alexander Langer
Thus spake Terry Lambert ([EMAIL PROTECTED]):

 Disable write caching on your ATA drive.  You should be able to
 safely reset after that.

Good idea, thanks.  Nevertheless:  I don't think the system should
panic on background fsck's, while a manual fsck works.

Alex

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message


Re: several background fsck panics

2003-03-25 Thread Terry Lambert
Alexander Langer wrote:
 Thus spake Terry Lambert ([EMAIL PROTECTED]):
  Disable write caching on your ATA drive.  You should be able to
  safely reset after that.
 
 Good idea, thanks.  Nevertheless:  I don't think the system should
 panic on background fsck's, while a manual fsck works.

A manual fsck can deal with corrupt data.

A background fsck can only deal with invalid cylinder group
bitmaps, and operates on a snapshot.

For a background fsck to be feasible, the FS has to be in a
self-consistent state already, which it wasn't.

When you killed the power on your system and reset it, you
lost the cached data sitting in the ATA disk.  This is due
to the fact that the ATA disk lied, and claimed that it had
committed some writes to stable storage, when in fact it had
only copied them to the disk cache.  As a result, when the
device reset happened, you lost some writes which were in
progress.  Therefore your disk image was corrupt, and so your
FS was *not* in a self-consistent state.
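
A toy model of that failure, in Python. Nothing here is real ATA behaviour or a real driver API: the class, the block names and the assumption that the drive may destage cached blocks in any order are all illustrative.

```python
class CachingDisk:
    """Toy disk with an optional volatile write cache."""

    def __init__(self, write_cache_enabled):
        self.write_cache_enabled = write_cache_enabled
        self.platter = {}  # blocks that survive a reset
        self.cache = {}    # blocks held only in volatile cache RAM

    def write(self, block, data):
        """Store data and return True, i.e. report the write as committed."""
        if self.write_cache_enabled:
            self.cache[block] = data    # acknowledged, but not yet stable
        else:
            self.platter[block] = data  # acknowledged only once stable
        return True

    def destage(self, block):
        """The drive moves one cached block to the platter when it chooses."""
        self.platter[block] = self.cache.pop(block)

    def reset(self):
        """Reset button or power loss: the volatile cache is gone."""
        self.cache.clear()


def crash_after_ordered_writes(write_cache_enabled):
    disk = CachingDisk(write_cache_enabled)
    # Soft updates ordering: the inode goes out before the directory
    # entry that points at it.
    disk.write("inode 7", "initialised")
    disk.write("dirent foo", "-> inode 7")
    if write_cache_enabled:
        disk.destage("dirent foo")  # assumed: the drive reorders freely
    disk.reset()
    return disk.platter


def is_self_consistent(platter):
    # A directory entry must never reference an inode that is not on disk.
    return "dirent foo" not in platter or "inode 7" in platter
```

With the cache off, both writes are on the platter in order and the image is consistent; with it on, the reset leaves a directory entry pointing at an inode that never reached the disk, which is the kind of damage a background fsck is not designed to repair.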

This type of error happens on ATA disks because they do not
permit disconnects during writes, only during reads.  If you
want to be able to reset your machine out from under your
disk, with caching turned on, buy SCSI hardware, instead of
ATA hardware: it does not lie to the host system, and claim
tagged writes have been committed to stable storage when they
have not, and are only in (volatile) cache RAM.

The panic was not, in fact, a result of the background fsck
itself: it was a result of an attempt to access FS structures
by the kernel through the FS, assuming -- incorrectly -- that
the FS structures were in a self-consistent state.

This assumption was bogus, but there was no way for the kernel
to know this because the failure state was not recovered, and
that happened because PC hardware is bogus.

This happened because you had background fsck enabled, and it
was unable to tell the difference between a power failure vs.
a panic vs. some other cause for a system crash (hardware or
other failure).  This is because the PC hardware itself doesn't
record these types of events in NVRAM (e.g. CMOS), nor does it
have sufficient DC holdup time that it could write a failure
code to NVRAM, before losing its marbles.

Hope this explains why you had the problem, and why real servers
tend to specify SCSI hardware, and tend not to be PC-class hardware
(i.e. an RS/6000 would have known the failure cause when it came
back up from reading its NVRAM, and performed a full recovery
appropriate to the failure).


PS: Unfortunately, this will not change on PC's any time soon,
because people have been trained by computer vendors, disk
vendors, and OS vendors that it's OK for PC's to need
rebooting, and/or to crash unexpectedly in catastrophic
ways that require reinstalling the OS.  So people tolerate
hardware that has ambiguous failure modes, as long as it
costs less.


-- Terry



Re: several background fsck panics

2003-03-25 Thread Alexander Langer
Thus spake Terry Lambert ([EMAIL PROTECTED]):

 A manual fsck can deal with corrupt data.

[...]

Yes, I recall the discussion about WC on ata vs. softupdates a few
months back.  I even have it disabled on more important machines than
this one :-)

 The panic was not, in fact, a result of the background fsck
 itself: it was a result of an attempt to access FS structures
 by the kernel through the FS, assuming -- incorrectly -- that
 the FS structures were in a self-consistent state.

Actually I don't care _where_ the panic happened.  If I hadn't manually
interrupted the boot process, this kernel would have booted and panicked
on that error for the next three years.  I could fix that by simply
doing a manual fsck (background_fsck=NO), so something is broken, for some
definition of broken:  If my system panics, I call that broken.

We claim background fsck as a cool new feature in the release notes,
which is even the DEFAULT, including WC on ATA disks, which is ALSO the
default.  So if this is broken, there is a serious design flaw,
which must be fixed.  It doesn't help to explain why the error is there;
the next user running an unmodified system will have the same error.

Ciao

Alex



Re: several background fsck panics

2003-03-25 Thread Alexander Leidinger
On Tue, 25 Mar 2003 16:04:07 +0100
Alexander Langer [EMAIL PROTECTED] wrote:

 We claim background fsck as a cool new feature in the release notes,
 which is even the DEFAULT, including WC on ATA disks, which is ALSO the
 default.  So , and if this is broken, there is a serious design flaw,
 which must be fixed.  It doesn't help to explain why the error is there,
 the next user will have the same error, running a verbatim system.

AFAIK: Søren had the WC off for a while on -current, but a lot of people
complained, so he switched it back on (I'm sure he regrets it every time
he is reminded about it). So -- including you and me -- there are at
least 3 committers who would like to see the WC turned off by default.

There are a lot of other people without special @FreeBSD.org privileges
who also don't like the current default (if you can get a look at
iX-10/2002, read the BSD-Softupdates vs. Linux-Journaling article - the
tests for this article were done on SCSI hardware, but this doesn't
matter in this case - it explains the interactions of WC, TQ and SO and
how they affect the speed of some FS-operations).

Maybe we can gain some momentum and restore POLA (in this case the
default of going the safe way instead of the fast (but sometimes
dangerous) way).

Bye,
Alexander.

-- 
Yes, I've heard of decaf. What's your point?

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7



Re: several background fsck panics

2003-03-25 Thread The Anarcat
On Tue Mar 25, 2003 at 03:54:58AM -0800, Terry Lambert wrote:
 Alexander Langer wrote:
  Thus spake Terry Lambert ([EMAIL PROTECTED]):
   Disable write caching on your ATA drive.  You should be able to
   safely reset after that.
  
  Good idea, thanks.  Nevertheless:  I don't think the system should
  panic on background fsck's, while a manual fsck works.
 
 A manual fsck can deal with corrupt data.
 
 A background fsck can only deal with invalid cylinder group
 bitmaps, and operates on a snapshot.
 
 For a background fsck to be feasible, the FS has to be in a
 self-consistent state already, which it wasn't.
 
 When you killed the power on your system and reset it, you
 lost the cached data sitting in the ATA disk.  This is due
 to the fact that the ATA disk lied, and claimed that it had
 committed some writes to stable storage, when in fact it had
 only copied them to the disk cache.  As a result, when the
 device reset happened, you lost some writes which were in
 progress.  Therefore your disk image was corrupt, and so your
 FS was *not* in a self-consistent state.

Shouldn't fsck run in the foreground for disks set up with WC? That
would be a quick hack solving this issue altogether.

A.

-- 
Conformity-the natural instinct to passively yield to that vague something
recognized as authority.
- Mark Twain




Re: several background fsck panics

2003-03-25 Thread Terry Lambert
Alexander Langer wrote:
 Actually I don't care _where_ the panic happened.  If I hadn't manually
 interrupted the boot process, this kernel would have booted and panicked
 on that error for the next three years.  I could fix that by simply
 doing a manual (background_fsck=NO), so something is broken, for some
 definition of broken:  If my system panics, I call that broken.

Actually, you *do* care where the panic occurred.  8-).

The issue with the repeating background fsck's is important.
I suggest a counter that gets reset to zero each time the
FS is marked clean by fsck, and incremented each time the
background fsck process is started.

When this counter reaches a predefined value (I suggest a
command line option to background fsck, which defaults to
3, if left unspecified), then the fsck is automatically
converted to a foreground fsck.

This counter would be recorded in the superblock.
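
The proposal can be sketched in a few lines of Python. The field name `bg_fsck_starts`, the function names and the reading that the third start is the one that turns into a foreground check are my assumptions; the text above only fixes the default of 3.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    # Hypothetical counter recorded in the superblock.
    bg_fsck_starts: int = 0


def fsck_mode_at_boot(sb, max_attempts=3):
    """Count this attempt, then decide how to check the file system.

    The counter is bumped before any checking begins, so an attempt that
    ends in a panic is still counted."""
    sb.bg_fsck_starts += 1
    if sb.bg_fsck_starts >= max_attempts:
        return "foreground"
    return "background"


def mark_clean(sb):
    """Called when fsck completes and marks the file system clean."""
    sb.bg_fsck_starts = 0
```

With the default threshold, two interrupted background attempts are tolerated and the third boot falls back to a full foreground check.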


 We claim background fsck as a cool new feature in the release notes,

I don't.  I'm convinced it's technically infeasible, and Kirk
has validated my reasoning on this, previously.  It is about
as safe or unsafe as running with async mounts.  Maybe worse,
depending on the MTBF for your disk drives (i.e. ATA drives
fail fairly often, if not catastrophically, in the presence
of power failures; this can be mitigated by dual power supplies
and UPS equipment).


 which is even the DEFAULT, including WC on ATA disks, which is ALSO the
 default.  So , and if this is broken, there is a serious design flaw,
 which must be fixed.  It doesn't help to explain why the error is there,
 the next user will have the same error, running a verbatim system.

The explanation is that the very idea of a background fsck,
without additional hardware support, is flawed.  Rather than
occurring in the snapshot code, the problem could just as
easily have occurred as a result of some process running before
fsck had the opportunity to run at all.

-- Terry



[Re: several background fsck panics

2003-03-25 Thread Terry Lambert
The Anarcat wrote:
  When you killed the power on your system and reset it, you
  lost the cached data sitting in the ATA disk.  This is due
  to the fact that the ATA disk lied, and claimed that it had
  committed some writes to stable storage, when in fact it had
  only copied them to the disk cache.  As a result, when the
  device reset happened, you lost some writes which were in
  progress.  Therefore your disk image was corrupt, and so your
  FS was *not* in a self-consistent state.
 
 Shouldn't fsck run in the foreground for disks setup with WC? That
 would be a quick hack solving this issue altogether.

There are a lot of quick hacks that can be done to solve the
issue.  There are also real fixes:

o   Disable BG fsck if WC is on; I dislike this hack,
mostly because of postings by drive engineers to
FreeBSD lists, indicating a willingness to address
ATA issues like this, and the fact that most SCSI
drives don't actually have this issue.

o   Put a counter in the first superblock; it would be
incremented when the BG fsck is started, and reset
to zero when it completes.  If the counter reaches
3 (or some command line specified number), then the
BG flagging is ignored, and a full FG fsck is then
performed instead.  I like this idea because it will
always work, and it's not actually a hack, it's a
correct solution.

o   Implement soft read-only.  The place that most of
the complaints are coming from is desktop users, with
relatively quiescent machines.  Though swap is used,
it does not occur in an FS partition.  As a result,
the FS could be marked read-only for long periods of
time.  This marking would be in memory.  The clean bit
would be set on the superblock.  When a write occurs,
the clean bit would be reset to dirty, and committed
to disk prior to the write operation being permitted
to proceed (a stall barrier).  I like this idea because,
for the most part, it eliminates fsck, both BG and FG,
on systems that crash while it's in effect.  The net
result is a system that is statistically much more
tolerant of failures, but which still requires another
safety net, such as the previous solution.

o   Disk manufacturers could fix the ATA write caching
problem.  I think this will happen eventually, so the
first solution is out.

o   PC manufacturers could provide OS-usable NVRAM scratch
areas, which would permit an OS to allocate a section,
and use it.  The OS would then write the "FreeBSD" marker
into an area to allocate it, and then write "power fail"
as the failure code into the allocated area.  When a
panic or hardware failure occurred, it could write "panic"
or "hardware fail" as the failure code.  When the system
came back up, it would be able to distinguish the type
of failure by reading the NVRAM area.  If it was something
like "panic with sync", it could run the BG fsck, otherwise
it would run the FG fsck.  I really like this idea, too.  I
believe that more modern systems have this capability, but
it has not yet been standardized.  Therefore we should take
a wait and see attitude towards it.

o   Disk manufacturers could provide a Lithium battery on board
disks.  This would not only bound their planned obsolescence
curve to 5 years or so (lifetime of the battery), it would
give them an aftermarket.  The battery would trickle-charge
from the disk drive power, and would be used to commit the
write cache in event of power failure.  I like this too; it
makes disk drives obsolete at about 2X the distance that they
become obsolete, it gives the drive manufacturers a bone for
playing along, and it actually solves the problem at its
source.  People might not like "your disk lasts 5 years" vs.
"your warranty is one year", but smoothing the market demand
function is probably worth more, in terms of lower cost to
consumers and assured profit to disk manufacturers, and it
can be billed as a marketing checkbox item, to force all the
other disk manufacturers into implementing it, too, so there
should be no downside.

o   We can change our file system structure to journalled; I like
this as well, but there are some issues with manufacturers who
do not provide track boundary information, so you can assure
yourselves that a track boundary doesn't span a corruption
boundary, in the event of a power failure.  If you can do this,
journalling actually becomes incredibly fast, since you know
the disk writes backwards on a given track, so you can just
implement the completed write datestamp, and perform a single
 
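
The "soft read-only" item in the list above can be sketched as a small
state machine.  This is only an illustration under stated assumptions:
the struct, the function names, and the commit counter are hypothetical,
and commit_superblock() stands in for a synchronous superblock write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory state for one mounted file system. */
struct softro_fs {
    bool soft_readonly;   /* in-memory marking only, never stored on disk */
    bool disk_clean;      /* clean bit as last committed to the superblock */
    unsigned sb_commits;  /* superblock writes issued, for illustration */
};

/* Stand-in for a synchronous superblock write. */
static void commit_superblock(struct softro_fs *fs, bool clean)
{
    fs->disk_clean = clean;
    fs->sb_commits++;
}

/*
 * Called once the file system has been quiescent long enough.  Assumes
 * the caller has already flushed every dirty buffer to disk.
 */
void softro_enter(struct softro_fs *fs)
{
    assert(fs != NULL);
    if (!fs->soft_readonly) {
        commit_superblock(fs, true);
        fs->soft_readonly = true;
    }
}

/*
 * Stall barrier: called before any write is permitted to proceed.  The
 * dirty marking reaches the disk before the write itself does.
 */
void softro_before_write(struct softro_fs *fs)
{
    assert(fs != NULL);
    if (fs->soft_readonly) {
        commit_superblock(fs, false);
        fs->soft_readonly = false;
    }
}
```

A crash while soft_readonly is set finds the clean bit on disk, so no
fsck is needed; only the first write after a quiet period pays for the
extra superblock commit.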

several background fsck panics

2003-03-24 Thread Alexander Langer
Hi!

I had several panics related to background fsck now.  Once I disabled
background fsck, all went ok.

It began when I pressed the reset buttons on several boots while the
system was still doing fscks.

Then sometime this happened:

Mar 24 21:31:12 fump root: /dev/ad0s2g: 701589 files, 12766670 used,
32836022 free (76598 frags, 4094928 blocks, 0.2% fragmentation) 
Mar 24 21:32:27 fump kernel: handle_workitem_freeblocks: block count
Mar 24 21:37:36 fump root: fsck_ufs: cannot find inode 1443360 

and a bit later:

Mar 24 21:48:59 fump syslogd: kernel boot file is /boot/kernel/kernel
Mar 24 21:48:59 fump kernel: /usr: bad dir ino 500641 at offset 0:
mangled entry
Mar 24 21:48:59 fump kernel: panic: ufs_dirbad: bad dir
Mar 24 21:48:59 fump kernel: 
Mar 24 21:48:59 fump kernel: syncing disks, buffers remaining... 3810
3810 3810 3809 3809 3809 3806 3807 3807 3807 3807 3807 3803 3803 3803
3780 3782 3782 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780
3780 3780
3780 3780 3780 3780 3780 3780 3780 
Mar 24 21:48:59 fump kernel: giving up on 2299 buffers
Mar 24 21:48:59 fump kernel: Uptime: 36m18s
Mar 24 21:48:59 fump kernel: Dumping 511 MB
Mar 24 21:48:59 fump kernel: ata0: resetting devices ..
Mar 24 21:48:59 fump kernel: done
Mar 24 21:48:59 fump kernel: 16 32 48 64 80[CTRL-C to abort]  96 112 128
144 160 176 192 208 224 240 256 272 288 304 320 336 352[CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]

As I was in X, I thought my system just hung w/o dumping the core (my
hdd led is broken obviously), I then pressed reset.

Next turn:

Mar 24 21:48:59 fump kernel: WARNING: /data was not properly dismounted
Mar 24 21:48:59 fump kernel: /data: mount pending error: blocks 32 files 0
Mar 24 21:48:59 fump kernel: /data: superblock summary recomputed

system entered multi-user, I logged in, but did not start anything but
my login shell.

Mar 24 21:53:37 fump syslogd: kernel boot file is /boot/kernel/kernel
Mar 24 21:53:37 fump kernel: dev = ad0s2g, block = 1, fs = /data
Mar 24 21:53:37 fump kernel: panic: ffs_blkfree: freeing free block
Mar 24 21:53:37 fump kernel: 

(entered debugger, I did a c)

Mar 24 21:53:37 fump kernel: syncing disks, buffers remaining... panic:
bremfree: removing a buffer not on a queue
Mar 24 21:53:37 fump kernel: Uptime: 2m10s
Mar 24 21:53:37 fump kernel: Dumping 511 MB
Mar 24 21:53:37 fump kernel: ata0: resetting devices ..
Mar 24 21:53:37 fump kernel: done
Mar 24 21:53:37 fump kernel: 16 32 48 64 80 96 112 128 144 160 176 192 

I pressed reset again as it would have dumped the wrong panic
and I wanted to dump the panic that caused the initial panic.
Next boot:

Mar 24 22:22:58 fump kernel: RENT WAS I=2404025
Mar 24 22:22:58 fump savecore: reboot after panic: bremfree: removing a
buffer not on a queue
Mar 24 22:22:58 fump savecore: writing core to vmcore.1

ok, here's the stuff:


Script started on Mon Mar 24 22:50:14 2003
[EMAIL PROTECTED] /data/crash # gdb -k
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-undermydesk-freebsd.
(kgdb) exec-file /boot/kernel/kernel
(kgdb) symbol-file /usr/obj/usr/src/sys/ZEROGRAVITY/kernel.debug 
Reading symbols from /usr/obj/usr/src/sys/ZEROGRAVITY/kernel.debug...done.
(kgdb) core-file vmcore.1
panic: bremfree: removing a buffer not on a queue
panic messages:
---
panic: ufs_dirbad: bad dir

syncing disks, buffers remaining... 3810 3810 3810 3809 3809 3809 3806 3807 3807 3807 
3807 3807 3803 3803 3803 3780 3782 3782 3780 3780 3780 3780 3780 3780 3780 3780 3780 
3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 3780 
giving up on 2299 buffers
Uptime: 36m18s
Dumping 511 MB
ata0: resetting devices ..
done
 16 32 48 64 80[CTRL-C to abort]  96 112 128 144 160 176 192 208 224 240 256 272 288 
304 320 336 352[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to 
abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C 
to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] Copyright (c) 1992-2003 The 
FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #5: Tue Mar 18 16:04:55

Re: several background fsck panics

2003-03-24 Thread Terry Lambert
Alexander Langer wrote:
 I had several panics related to background fsck now.  Once I disabled
 background fsck, all went ok.
 
 It began when I pressed the reset buttons on several boots while the
 system was still doing fscks.

Disable write caching on your ATA drive.  You should be able to
safely reset after that.

-- Terry



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-14 Thread Vallo Kallaste
On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey
[EMAIL PROTECTED] wrote:

  So I did. Loaned two SCSI disks and 50-pin cable. Things haven't
  improved a bit, I'm very sorry to say it.
 
 Sorry for the slow reply to this.  I thought it would make sense to
 try things out here, and so I kept trying to find time, but I have to
 admit I just don't have it yet for a while.  I haven't forgotten, and
 I hope that in a few weeks time I can spend some time chasing down a
 whole lot of Vinum issues.  This is definitely the worst I have seen,
 and I'm really puzzled why it always happens to you.
 
  # simulate disk crash by forcing one arbitrary subdisk down
  # seems that vinum doesn't return values for command completion status
  # checking?
  echo Stopping subdisk.. degraded mode
  vinum stop -f r5.p0.s3  # assume it was successful
 
 I wonder if there's something relating to stop -f that doesn't happen
 during a normal failure.  But this was exactly the way I tested it in
 the first place.

Thank you Greg, I really appreciate your ongoing effort to make
vinum a stable, trusted volume manager.
I have to add some facts to the mix. Raidframe on the same hardware
does not have any problems. The later tests I conducted were done
under -stable, because I couldn't get raidframe to work under
-current; the system panicked every time at the end of initialisation of
parity (raidctl -iv raid?). So I used the raidframe patch for
-stable at
http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
Had to do some patching by hand, but otherwise works well.
Will it suffice to switch off power for one disk to simulate more
real-world disk failure? Are there any hidden pitfalls for failing
and restoring operation of non-hotswap disks?
-- 

Vallo Kallaste



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-14 Thread Greg 'groggy' Lehey
On Friday, 14 March 2003 at 10:05:28 +0200, Vallo Kallaste wrote:
 On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey
 [EMAIL PROTECTED] wrote:

 So I did. Loaned two SCSI disks and 50-pin cable. Things haven't
 improved a bit, I'm very sorry to say it.

 Sorry for the slow reply to this.  I thought it would make sense to
 try things out here, and so I kept trying to find time, but I have to
 admit I just don't have it yet for a while.  I haven't forgotten, and
 I hope that in a few weeks time I can spend some time chasing down a
 whole lot of Vinum issues.  This is definitely the worst I have seen,
 and I'm really puzzled why it always happens to you.

 # simulate disk crash by forcing one arbitrary subdisk down
 # seems that vinum doesn't return values for command completion status
 # checking?
 echo Stopping subdisk.. degraded mode
 vinum stop -f r5.p0.s3  # assume it was successful

 I wonder if there's something relating to stop -f that doesn't happen
 during a normal failure.  But this was exactly the way I tested it in
 the first place.

 Thank you Greg, I really appreciate your ongoing effort for making
 vinum stable, trusted volume manager.
 I have to add some facts to the mix. Raidframe on the same hardware
 does not have any problems. The later tests I conducted was done
 under -stable, because I couldn't get raidframe to work under
 -current, system did panic every time at the end of initialisation of
 parity (raidctl -iv raid?). So I used the raidframe patch for
 -stable at
 http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
 Had to do some patching by hand, but otherwise works well.

I don't think that problems with RAIDFrame are related to these
problems with Vinum.  I seem to remember a commit to the head branch
recently (in the last 12 months) relating to the problem you've seen.
I forget exactly where it went (it wasn't from me), and in cursory
searching I couldn't find it.  It's possible that it hasn't been
MFC'd, which would explain your problem.  If you have a 5.0 machine,
it would be interesting to see if you can reproduce it there.

 Will it suffice to switch off power for one disk to simulate more
 real-world disk failure? Are there any hidden pitfalls for failing
 and restoring operation of non-hotswap disks?

I don't think so.  It was more thinking aloud than anything else.  As
I said above, this is the way I tested things in the first place.

Greg
--
See complete headers for address and phone numbers




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-13 Thread Greg 'groggy' Lehey
On Saturday,  1 March 2003 at 20:43:10 +0200, Vallo Kallaste wrote:
 On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste vallo wrote:

 The vinum R5 and system as a whole were stable without
 softupdates. Only one problem remained after disabling softupdates,
 while being online and user I/O going on, rebuilding of failed disk
 corrupt the R5 volume completely.

 Yes, we've fixed a bug in that area.  It had nothing to do with soft
 updates, though.

 Oh, that's very good news, thank you! Yes, it had nothing to do with
 soft updates at all and that's why I had the "remained after" in the
 sentence.

 Don't know is it fixed or not as I don't have necessary hardware at
 the moment. The only way around was to quiesce the volume before
 rebuilding, umount it, and wait until rebuild finished. I'll suggest
 extensive testing cycle for everyone who's going to work with vinum
 R5. Concat, striping and mirroring has been a breeze but not so with
 R5.

 IIRC the rebuild bug bit any striped configuration.

 Ok, I definitely had problems only with R5, but you certainly know
 much better what it was exactly. I'll need to lend 50-pin SCSI cable
 and test vinum again. Will it matter on what version of FreeBSD I'll
 try on? My home system runs -current of Feb 5, but if you suggest
 -stable for consistent results, I'll do it.

 So I did. Loaned two SCSI disks and 50-pin cable. Things haven't
 improved a bit, I'm very sorry to say it.

Sorry for the slow reply to this.  I thought it would make sense to
try things out here, and so I kept trying to find time, but I have to
admit I just don't have it yet for a while.  I haven't forgotten, and
I hope that in a few weeks time I can spend some time chasing down a
whole lot of Vinum issues.  This is definitely the worst I have seen,
and I'm really puzzled why it always happens to you.

 # simulate disk crash by forcing one arbitrary subdisk down
 # seems that vinum doesn't return values for command completion status
 # checking?
 echo Stopping subdisk.. degraded mode
 vinum stop -f r5.p0.s3  # assume it was successful

I wonder if there's something relating to stop -f that doesn't happen
during a normal failure.  But this was exactly the way I tested it in
the first place.

Greg
--
See complete headers for address and phone numbers




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-03-01 Thread Vallo Kallaste
On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste vallo wrote:

   The vinum R5 and system as a whole were stable without
   softupdates. Only one problem remained after disabling softupdates,
   while being online and user I/O going on, rebuilding of failed disk
   corrupt the R5 volume completely.
  
  Yes, we've fixed a bug in that area.  It had nothing to do with soft
  updates, though.
 
 Oh, that's very good news, thank you! Yes, it had nothing to do with
 soft updates at all and that's why I had the "remained after" in the
 sentence.
 
   Don't know is it fixed or not as I don't have necessary hardware at
   the moment. The only way around was to quiesce the volume before
   rebuilding, umount it, and wait until rebuild finished. I'll suggest
   extensive testing cycle for everyone who's going to work with vinum
   R5. Concat, striping and mirroring has been a breeze but not so with
   R5.
  
  IIRC the rebuild bug bit any striped configuration.
 
 Ok, I definitely had problems only with R5, but you certainly know
 much better what it was exactly. I'll need to lend 50-pin SCSI cable
 and test vinum again. Will it matter on what version of FreeBSD I'll
 try on? My home system runs -current of Feb 5, but if you suggest
 -stable for consistent results, I'll do it.

So I did. Loaned two SCSI disks and 50-pin cable. Things haven't
improved a bit, I'm very sorry to say it.
The entire test session (script below) was done in single user. To
be fair, I did tens of them, and the mode doesn't matter.
Complete script:

Script started on Sat Mar  1 19:54:45 2003
# pwd
/root
# dmesg
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sun Feb  2 16:16:49 EET 2003
[EMAIL PROTECTED]:/usr/home/vallo/Kevad-5.0
Preloaded elf kernel /boot/kernel/kernel at 0xc0516000.
Preloaded elf module /boot/kernel/vinum.ko at 0xc05160b4.
Preloaded elf module /boot/kernel/ahc_pci.ko at 0xc0516160.
Preloaded elf module /boot/kernel/ahc.ko at 0xc051620c.
Preloaded elf module /boot/kernel/cam.ko at 0xc05162b4.
Timecounter i8254  frequency 1193182 Hz
Timecounter TSC  frequency 132955356 Hz
CPU: Pentium/P54C (132.96-MHz 586-class CPU)
  Origin = GenuineIntel  Id = 0x526  Stepping = 6
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory  = 67108864 (64 MB)
avail memory = 59682816 (56 MB)
Intel Pentium detected, installing workaround for F00F bug
Initializing GEOMetry subsystem
VESA: v2.0, 4096k memory, flags:0x0, mode table:0xc037dec2 (122)
VESA: ATI MACH64
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Host to PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX ATA controller port 0xff90-0xff9f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ahc0: Adaptec 2940 Ultra SCSI adapter port 0xf800-0xf8ff mem 0xffbee000-0xffbeefff 
irq 10 at device 13.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci0: display, VGA at device 14.0 (no driver attached)
atapci1: Promise ATA66 controller port 
0xff00-0xff3f,0xffe0-0xffe3,0xffa8-0xffaf,0xffe4-0xffe7,0xfff0-0xfff7 mem 
0xffbc-0xffbd irq 11 at device 15.0 on pci0
ata2: at 0xfff0 on atapci1
ata3: at 0xffa8 on atapci1
orm0: Option ROMs at iomem 
0xed000-0xedfff,0xca000-0xca7ff,0xc8000-0xc9fff,0xc-0xc7fff on isa0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ed0 at port 0x300-0x31f iomem 0xd8000 irq 5 on isa0
ed0: address 00:80:c8:37:e2:a6, type NE2000 (16 bit) 
fdc0: Enhanced floppy controller (i82077, NE72065 or clone) at port 
0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
sc0: System console at flags 0x100 on isa0
sc0: VGA 5 virtual consoles, flags=0x300
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown: PNP0303 can't assign resources (port)
unknown: PNP0700 can't assign resources (port)
unknown: PNP0401 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
Timecounters tick every 1.000 msec
ata0-slave: ATAPI identify retries exceeded
ad4: 2445MB QUANTUM FIREBALL EL2.5A [5300/15/63] at ata2-master UDMA33
ad6: 2423MB SAMSUNG WU32553A (2.54GB) [4924/16/63] at ata3-master UDMA33
acd0: CDROM WPI CDD-820 at ata0-master PIO3
Waiting 15 seconds for SCSI devices to settle
da0 at ahc0 bus 0 target 0 

Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-27 Thread Vallo Kallaste
On Thu, Feb 27, 2003 at 11:59:59AM +1030, Greg 'groggy' Lehey
[EMAIL PROTECTED] wrote:

  The crashes and anomalies with filesystem residing on R5 volume were
  related to vinum(R5)/softupdates combo.
 
 Well, at one point we suspected that.  But the cases I have seen were
 based on a misassumption.  Do you have any concrete evidence that
 points to that particular combination?

Don't have any other evidence than the case I was describing. After
changing my employer I hadn't had much time or motivation to try
again.

  The vinum R5 and system as a whole were stable without
  softupdates. Only one problem remained after disabling softupdates,
  while being online and user I/O going on, rebuilding of failed disk
  corrupt the R5 volume completely.
 
 Yes, we've fixed a bug in that area.  It had nothing to do with soft
 updates, though.

Oh, that's very good news, thank you! Yes, it had nothing to do with
soft updates at all and that's why I had the "remained after" in the
sentence.

  Don't know is it fixed or not as I don't have necessary hardware at
  the moment. The only way around was to quiesce the volume before
  rebuilding, umount it, and wait until rebuild finished. I'll suggest
  extensive testing cycle for everyone who's going to work with vinum
  R5. Concat, striping and mirroring has been a breeze but not so with
  R5.
 
 IIRC the rebuild bug bit any striped configuration.

Ok, I definitely had problems only with R5, but you certainly know
much better what it was exactly. I'll need to lend 50-pin SCSI cable
and test vinum again. Will it matter on what version of FreeBSD I'll
try on? My home system runs -current of Feb 5, but if you suggest
-stable for consistent results, I'll do it.

Thanks
-- 
Vallo Kallaste



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-26 Thread Greg 'groggy' Lehey
On Friday, 21 February 2003 at 10:00:46 +0200, Vallo Kallaste wrote:
 On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata
 [EMAIL PROTECTED] wrote:

 Vallo Kallaste [EMAIL PROTECTED] wrote:

 I'll second Brad's statement about vinum and softupdates
 interactions. My last experiments with vinum were more than half a
 year ago, but I guess it still holds. BTW, the interactions showed
 up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq
 Proliant 3000 and the system was very stable before I enabled
 softupdates.. and of course after I disabled softupdates. In between
 there were crashes and nasty problems with filesystem. Unfortunately
 it was a production system and I hadn't a chance to play.

  Did you believe that the crashes were caused by enabling softupdates on
 an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
 I can see how crashes unrelated to vinum/softupdates might trash vinum
 filesystems.

 The crashes and anomalies with filesystem residing on R5 volume were
 related to vinum(R5)/softupdates combo.

Well, at one point we suspected that.  But the cases I have seen were
based on a misassumption.  Do you have any concrete evidence that
points to that particular combination?

 The vinum R5 and system as a whole were stable without
 softupdates. Only one problem remained after disabling softupdates,
 while being online and user I/O going on, rebuilding of failed disk
 corrupt the R5 volume completely.

Yes, we've fixed a bug in that area.  It had nothing to do with soft
updates, though.

 Don't know is it fixed or not as I don't have necessary hardware at
 the moment. The only way around was to quiesce the volume before
 rebuilding, umount it, and wait until rebuild finished. I'll suggest
 extensive testing cycle for everyone who's going to work with vinum
 R5. Concat, striping and mirroring has been a breeze but not so with
 R5.

IIRC the rebuild bug bit any striped configuration.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com.
See http://www.lemis.com/yahoospam.html for further details.




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-26 Thread Greg 'groggy' Lehey
On Friday, 21 February 2003 at  1:56:56 -0800, Terry Lambert wrote:
 Vallo Kallaste wrote:
 The crashes and anomalies with filesystem residing on R5 volume were
 related to vinum(R5)/softupdates combo. The vinum R5 and system as
 a whole were stable without softupdates. Only one problem remained
 after disabling softupdates, while being online and user I/O going
 on, rebuilding of failed disk corrupt the R5 volume completely.
 Don't know is it fixed or not as I don't have necessary hardware at
 the moment. The only way around was to quiesce the volume before
 rebuilding, umount it, and wait until rebuild finished. I'll suggest
 extensive testing cycle for everyone who's going to work with
 vinum R5. Concat, striping and mirroring has been a breeze but not
 so with R5.

 I think this is an expected problem with a lot of concatenation,
 whether through Vinum, GEOM, RAIDFrame, or whatever.

Can you be more specific?  What you say below doesn't address any
basic difference between virtual and real disks.

 This comes about for the same reason that you can't mount -u
 to turn Soft Updates from off to on: Soft Updates does not
 tolerate dirty buffers for which a dependency does not exist, and
 will crap out when a pending dirty buffer causes a write.

I don't understand what this has to do with virtual disks.

 This could be fixed in the mount -u case for Soft Updates, and it
 can also be fixed for Vinum (et al.).

 The key is the difference between a mount -u vs. a umount ; mount,
 which comes down to flushing and invalidating all buffers on the
 underlying device, e.g.:

   vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
   vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);
   error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
   error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
   VOP_UNLOCK(devvp, 0, p);

 ... Basically, after rebuilding, before allowing the mount to proceed,
 the Vinum (and GEOM and RAIDFrame, etc.) code needs to cause all the
 pending dirty buffers to be written.  This will guarantee that there
 are no outstanding dirty buffers at mount time, which in turn guarantees
 that there will be no dirty buffers that the dependency tracking in
 Soft Updates does not know about.

I don't understand what you're assuming here.  Certainly I can't see
any relevance to Vinum, RAIDframe or any other virtual disk system.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com.
See http://www.lemis.com/yahoospam.html for further details.




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-24 Thread Darryl Okahata
Terry Lambert [EMAIL PROTECTED] wrote:

 I think this is an expected problem with a lot of concatenation,
 whether through Vinum, GEOM, RAIDFrame, or whatever.
 
 This comes about for the same reason that you can't mount -u
 to turn Soft Updates from off to on: Soft Updates does not
 tolerate dirty buffers for which a dependency does not exist, and
 will crap out when a pending dirty buffer causes a write.

 Does this affect background fsck, too (on regular, non-vinum
filesystems)?  From what little I know of bg fsck, I'm guessing not, but
I'd like to be sure.  Thanks.

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.



Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-24 Thread Terry Lambert
Darryl Okahata wrote:
 Terry Lambert [EMAIL PROTECTED] wrote:
  I think this is an expected problem with a lot of concatenation,
  whether through Vinum, GEOM, RAIDFrame, or whatever.
 
  This comes about for the same reason that you can't mount -u
  to turn Soft Updates from off to on: Soft Updates does not
  tolerate dirty buffers for which a dependency does not exist, and
  will crap out when a pending dirty buffer causes a write.
 
  Does this affect background fsck, too (on regular, non-vinum
 filesystems)?  From what little I know of bg fsck, I'm guessing not, but
 I'd like to be sure.  Thanks.

No, it doesn't.  Background fsck works by assuming that the only
thing that could contain bad data is the cylinder group bitmaps,
which means the worst case failure is some blocks are not available
for reallocation.  It works by taking a snapshot, which is a feature
that allows modification of the FS while the bgfsck's idea of the FS
remains unchanged.  Then it goes through the bitmaps, verifying that
the blocks it thinks are allocated are in fact allocated by files
within the snapshot.  Basically, its only job is really to clear
bits in the bitmap that represent blocks for which there are no files
referencing them.
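
The bitmap pass described above can be sketched as follows.  This is a
simplified illustration, not the fsck_ffs code: it assumes a flat bitmap
in which a set bit means "allocated", and a second, hypothetical bitmap
of blocks actually referenced by files in the snapshot.  The real
cylinder group layout differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clear every bit in `bitmap` (blocks marked allocated) that is not
 * also set in `referenced` (blocks reachable from files in the
 * snapshot).  Returns the number of blocks reclaimed.  Bits that are
 * referenced but not marked allocated are left untouched; that case
 * is outside what this pass is meant to repair.
 */
size_t bg_reclaim(uint8_t *bitmap, const uint8_t *referenced, size_t nbytes)
{
    size_t freed = 0;

    assert(bitmap != NULL && referenced != NULL);
    for (size_t i = 0; i < nbytes; i++) {
        /* Bits marked allocated that no file references. */
        uint8_t stale = (uint8_t)(bitmap[i] & ~referenced[i]);

        for (unsigned b = 0; b < 8; b++)
            if (stale & (1u << b))
                freed++;
        bitmap[i] = (uint8_t)(bitmap[i] & ~stale);
    }
    return freed;
}
```

The pass only ever clears bits, which matches the stated worst case:
a missed block stays unavailable for reallocation until a later check.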

There are situations where bgfsck can fail, sometimes catastrophically,
but they are unrelated to having dirty blocks in memory for which no
updates have been created.

-- Terry



Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-21 Thread Vallo Kallaste
On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata
[EMAIL PROTECTED] wrote:

 Vallo Kallaste [EMAIL PROTECTED] wrote:
 
  I'll second Brad's statement about vinum and softupdates
  interactions. My last experiments with vinum were more than half a
  year ago, but I guess it still holds. BTW, the interactions showed
  up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq
  Proliant 3000 and the system was very stable before I enabled
  softupdates.. and of course after I disabled softupdates. In between
  there were crashes and nasty problems with filesystem. Unfortunately
  it was a production system and I had no chance to experiment.
 
  Did you believe that the crashes were caused by enabling softupdates on
 an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
 I can see how crashes unrelated to vinum/softupdates might trash vinum
 filesystems.

The crashes and anomalies with the filesystem residing on the R5 volume
were related to the vinum (R5)/softupdates combination. The vinum R5
volume and the system as a whole were stable without softupdates. Only
one problem remained after disabling softupdates: rebuilding a failed
disk while the volume was online with user I/O going on corrupted the
R5 volume completely. I don't know whether this has been fixed, as I
don't have the necessary hardware at the moment. The only workaround
was to quiesce the volume before rebuilding, umount it, and wait until
the rebuild finished. I suggest an extensive testing cycle for everyone
who is going to work with vinum R5. Concat, striping and mirroring have
been a breeze, but not so R5.
-- 

Vallo Kallaste
[EMAIL PROTECTED]




Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]

2003-02-21 Thread Terry Lambert
Vallo Kallaste wrote:
 The crashes and anomalies with the filesystem residing on the R5 volume
 were related to the vinum (R5)/softupdates combination. The vinum R5
 volume and the system as a whole were stable without softupdates. Only
 one problem remained after disabling softupdates: rebuilding a failed
 disk while the volume was online with user I/O going on corrupted the
 R5 volume completely. I don't know whether this has been fixed, as I
 don't have the necessary hardware at the moment. The only workaround
 was to quiesce the volume before rebuilding, umount it, and wait until
 the rebuild finished. I suggest an extensive testing cycle for everyone
 who is going to work with vinum R5. Concat, striping and mirroring have
 been a breeze, but not so R5.

I think this is an expected problem with a lot of concatenation,
whether through Vinum, GEOM, RAIDFrame, or whatever.

This comes about for the same reason that you can't mount -u
to turn Soft Updates from off to on: Soft Updates does not
tolerate dirty buffers for which a dependency does not exist, and
will crap out when a pending dirty buffer causes a write.

This could be fixed in the mount -u case for Soft Updates, and it
can also be fixed for Vinum (et al.).

The key is the difference between a mount -u vs. a umount ; mount,
which comes down to flushing and invalidating all buffers on the
underlying device, e.g.:

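/* Flush and invalidate every buffer on the underlying device, then
 * close and reopen it, which is what umount ; mount amounts to. */
vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);	/* V_SAVE: write dirty buffers out first */
error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
VOP_UNLOCK(devvp, 0, p);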

... Basically, after rebuilding, before allowing the mount to proceed,
the Vinum (and GEOM and RAIDFrame, etc.) code needs to cause all the
pending dirty buffers to be written.  This will guarantee that there
are no outstanding dirty buffers at mount time, which in turn guarantees
that there will be no dirty buffers that the dependency tracking in
Soft Updates does not know about.

FWIW: I've maintained for over 6 years now that the mount update
code should be modified to do this automatically (and provided
patches; see early 1997 mailing list archives), essentially turning
a mount -u into a umount ; mount, without invalidating outstanding
vnodes and in-core inodes or their references (so that open files do
not break... they just get all their buffers taken away from them).

Of course, the only open files that matter for device layering are the
device exporting the layered block store, and the underlying component
block stores that make it up (i.e. no open files there).


-- Terry




Re: background fsck deadlocks with ufs2 and big disk

2003-02-20 Thread Darryl Okahata
Vallo Kallaste [EMAIL PROTECTED] wrote:

 I'll second Brad's statement about vinum and softupdates
 interactions. My last experiments with vinum were more than half a
 year ago, but I guess it still holds. BTW, the interactions showed
 up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq
 Proliant 3000 and the system was very stable before I enabled
 softupdates.. and of course after I disabled softupdates. In between
 there were crashes and nasty problems with filesystem. Unfortunately
 it was a production system and I had no chance to experiment.

 Did you believe that the crashes were caused by enabling softupdates on
an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
I can see how crashes unrelated to vinum/softupdates might trash vinum
filesystems.

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Re: background fsck deadlocks with ufs2 and big disk

2003-02-20 Thread Brad Knowles
At 2:28 PM -0800 2003/02/20, Darryl Okahata wrote:


  Did you believe that the crashes were caused by enabling softupdates on
 an R5 vinum volume, or were the crashes unrelated to vinum/softupdates?
 I can see how crashes unrelated to vinum/softupdates might trash vinum
 filesystems.


	Using RAID-5 under vinum was always a somewhat tricky business 
for me, but in many cases I could get it to work reasonably well most 
of the time.  But if I enabled softupdates on that filesystem, I was 
toast.  Softupdates enabled on filesystems that were not on top of 
vinum RAID-5 logical devices seemed to be fine.

	So, the interaction that I personally witnessed was specifically 
between vinum RAID-5 and softupdates.

--
Brad Knowles, [EMAIL PROTECTED]

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety.
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)



Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Darryl Okahata
David Schultz [EMAIL PROTECTED] wrote:

 IIRC, Kirk was trying to reproduce this a little while ago in
 response to similar reports.  He would probably be interested
 in any new information.

 I don't have any useful information, but I do have a data point:

My 5.0-RELEASE system recently mysteriously panic'd, which
resulted in a partially trashed UFS1 filesystem, which caused bg
fsck to hang.

Details:

* The panic was weird, in that only the first 4-6 characters of the
  first function (in the panic stacktrace) were displayed on the console
  (sorry, forgot what it was).  Nothing else past that point was shown,
  and the console was locked up.  Ddb was compiled into the kernel, but
  ctrl-esc did nothing.

* The UFS1 filesystem in question (and I assume that it was UFS1, as I
  did not specify a filesystem type to newfs) is located on a RAID5
  vinum volume, consisting of five 80GB disks.

* Softupdates is enabled.

* When bg fsck hung (w/no disk activity), I could break into the ddb.
  Unfortunately, I don't know how to use ddb, aside from ps.

* Disabling bg fsck allowed the system to boot.  However, fg fsck
  failed, and I had to do a manual fsck, which spewed lots of nasty
  SOFTUPDATE INCONSISTENCY errors.

* Disturbingly (but fortunately), I then unmounted the filesystem (in
  multi-user mode) and re-ran fsck, and fsck still found errors.  There
  should not have been any errors, as fg fsck just finished running.

  [ Unfortunately, I've forgotten what they were, and an umount/fsck
done right now shows no problems.  I think the errors were one of
the incorrect block count errors.  ]

* After the fsck, some files were partially truncated (corrupted?).
  After investigating, I believe these truncated files (which were NOT
  recently modified) were in a directory in which other files were being
  created/written at the time of the panic.

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Brad Knowles
At 9:15 AM -0800 2003/02/19, Darryl Okahata wrote:


 * The UFS1 filesystem in question (and I assume that it was UFS1, as I
   did not specify a filesystem type to newfs) is located on a RAID5
   vinum volume, consisting of five 80GB disks.

 * Softupdates is enabled.


	You know, vinum & softupdates have had bad interactions with each 
other for as long as I can remember.  Has this truly been a 
consistent thing (as I seem to recall), or has this been an 
on-again/off-again situation?

--
Brad Knowles, [EMAIL PROTECTED]

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety.
-Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)



Re: background fsck deadlocks with ufs2 and big disk

2003-02-19 Thread Darryl Okahata
Brad Knowles [EMAIL PROTECTED] wrote:

   You know, vinum & softupdates have had bad interactions with each 
 other for as long as I can remember.  Has this truly been a 
 consistent thing (as I seem to recall), or has this been an 
 on-again/off-again situation?

 Ah, yaaah.  Hmm 

 This is the first I've heard of that, but I can see how that could
be.  Could vinum be considered to be a form of (unintentional)
write-caching?  

 That might explain how the filesystem got terribly hosed, but it
doesn't help with the panic.  Foo.

[ This is on a system that's been running in the current state for
  around a month.  So far, it's panic'd once (a week or so ago), and so
  I don't have any feel for long-term stability.  We'll see how it
  goes.  ]

-- 
Darryl Okahata
[EMAIL PROTECTED]

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.




Panic induced by background fsck.

2003-02-19 Thread ianf
Hi

Several seconds into background fsck, this happens.  It's easily
repeatable by enabling background fsck and booting after an unclean
shutdown.

Kernel is 2003-02-19 from source checked out on that day (SMP).
The filesystems are UFS1 with softupdates.

panic: vm_fault: fault on nofault entry, addr: c7c65000
cpuid = 0; lapic.id = 
Stack backtrace:
backtrace(c03185a5,0,c0325e37,d280258c,1) at backtrace+0x17
panic(c0325e37,c7c65000,2,d2802638,d2802628) at panic+0x10a
vm_fault(c082f000,c7c65000,2,0,c3908690) at vm_fault+0x1073
trap_pfault(d2802724,0,c7c65000,1db,c7c65000) at trap_pfault+0x161
trap(d2800018,d2800010,c3920010,c7c65000,0) at trap+0x3cd
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc02da907, esp = 0xd2802764, ebp = 0xd2802a3c ---
generic_bzero(c38f7000,80be140,70,1000,3e5428db) at generic_bzero+0xf
ffs_mount(c38f7000,c3a4c800,bfbffcc0,d2802bec,c3908690) at ffs_mount+0x638
vfs_mount(c3908690,c385e230,c3a4c800,1211000,bfbffcc0) at vfs_mount+0x83a
mount(c3908690,d2802d10,c032ba37,407,4) at mount+0xb8
syscall(2f,2f,2f,0,bfbffdc0) at syscall+0x28e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (21), eip = 0x805636b, esp = 0xbfbffc0c, ebp = 0xbfbffd48 ---
Debugger(panic)
Stopped at  Debugger+0x55:  xchgl   %ebx,in_Debugger.0
db> call boot
panic: bremfree: bp 0xc75ab978 not locked
cpuid = 0; lapic.id = 
boot() called on cpu#0
Uptime: 2m43s
Dumping 191 MB


Ian




background fsck deadlocks with ufs2 and big disk

2003-02-18 Thread Martin Blapp

Hi all,

I just wanted to report that I can deadlock one of my current boxes
with a ufs2 filesystem on a 120GB ATA disk. I can reproduce
the problem. Sometimes the background fsck process hangs, always at
the same place; sometimes the box freezes after a while.

The same box works fine with ufs1.

Martin

Martin Blapp, [EMAIL PROTECTED] [EMAIL PROTECTED]
--
ImproWare AG, UNIXSP  ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: finger -l [EMAIL PROTECTED]
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
--




Re: background fsck deadlocks with ufs2 and big disk

2003-02-18 Thread David Schultz
Thus spake Martin Blapp [EMAIL PROTECTED]:
 I just wanted to report that I can deadlock one of my current boxes
 with a ufs2 filesystem on a 120GB ATA disk. I can reproduce
 the problem. Sometimes the background fsck process hangs, always at
 the same place; sometimes the box freezes after a while.
 
 The same box works fine with ufs1.

IIRC, Kirk was trying to reproduce this a little while ago in
response to similar reports.  He would probably be interested
in any new information.




Re: background fsck did not create lost+found

2003-01-22 Thread Jan Srzednicki
On Mon, 20 Jan 2003, David Schultz wrote:

  The first two entries clearly correspond to the missing file, which should
  have been put in /home/lost+found. But the problem is that no lost+found
  directory was created, although it should have been (as fsck_ffs(8) says).
  I guess it's a bug, probably in the background fsck code. Still, is there
  any way to reclaim the file now, besides running strings(1) on the whole
  partition?

 Consider what happens when you remove a large directory tree.
 Thousands of directory entries may be removed, but in the
 softupdates case, the inodes will stick around a bit longer.  The
 same also applies to files that have been intentionally unlinked
 but are still open.  To avoid a syndrome where all these thousands
 of files end up in lost+found after a crash or power failure, fsck
 just removes them on softupdates-enabled filesystems.

Would it be a big problem to add an fsck option that does not erase all
these softupdates-pending inodes, but puts them in lost+found as usual?
The default behaviour would stay unchanged, yet there would be a way to
reclaim lost files.

-- 
  -- wrzask --= v =-- Winfried --=-- GG# 3838383 --=-- JS500-RIPE --
-- [EMAIL PROTECTED] --- [EMAIL PROTECTED] --===-- http://violent.dream.vg/ ---
--= Ride the wild wind - push the envelope, don't sit on the fence, ---
  -- Ride the wild wind - live life on the razor's edge! =-- Queen --





Re: background fsck did not create lost+found

2003-01-22 Thread Garrett Wollman
On Wed, 22 Jan 2003 11:14:47 +0100 (CET), Jan Srzednicki 
[EMAIL PROTECTED] said:

 Would that be a big problem to allow some fsck option not to erase all
 these softupdates-pending inodes, but to put them in lost+found as usual?

It certainly couldn't be done with the background fsck, because
background fsck works on a snapshot and not the running filesystem;
thus, it cannot make any allocations -- it can only deallocate things.

-GAWollman





Re: background fsck did not create lost+found

2003-01-22 Thread Jan Srzednicki
On Wed, 22 Jan 2003, Garrett Wollman wrote:

  Would that be a big problem to allow some fsck option not to erase all
  these softupdates-pending inodes, but to put them in lost+found as usual?

 It certainly couldn't be done with the background fsck, because
 background fsck works on a snapshot and not the running filesystem;
 thus, it cannot make any allocations -- it can only deallocate things.

Still, in case you know some of your important files can be lost, you can
boot the system to single user and run foreground fsck.

-- 
  -- wrzask --= v =-- Winfried --=-- GG# 3838383 --=-- JS500-RIPE --
-- [EMAIL PROTECTED] --- [EMAIL PROTECTED] --===-- http://violent.dream.vg/ ---
--= Ride the wild wind - push the envelope, don't sit on the fence, ---
  -- Ride the wild wind - live life on the razor's edge! =-- Queen --





Re: background fsck did not create lost+found

2003-01-22 Thread Max Khon
hi, there!

On Wed, Jan 22, 2003 at 07:18:44PM +0100, Jan Srzednicki wrote:

   Would that be a big problem to allow some fsck option not to erase all
   these softupdates-pending inodes, but to put them in lost+found as usual?
 
  It certainly couldn't be done with the background fsck, because
  background fsck works on a snapshot and not the running filesystem;
  thus, it cannot make any allocations -- it can only deallocate things.
 
 Still, in case you know some of your important files can be lost, you can
 boot the system to single user and run foreground fsck.

this is not an option if the system was rebooted because of power loss
or kernel panic

/fjoe





Re: background fsck did not create lost+found

2003-01-22 Thread David Schultz
Thus spake Garrett Wollman [EMAIL PROTECTED]:
 On Wed, 22 Jan 2003 11:14:47 +0100 (CET), Jan Srzednicki 
[EMAIL PROTECTED] said:
 
  Would that be a big problem to allow some fsck option not to erase all
  these softupdates-pending inodes, but to put them in lost+found as usual?
 
 It certainly couldn't be done with the background fsck, because
 background fsck works on a snapshot and not the running filesystem;
 thus, it cannot make any allocations -- it can only deallocate things.

Actually, that should work just fine.  When background fsck
notices an unreferenced inode in the snapshot, it could create a
file in the underlying filesystem.  The easy way to do this is to
copy the data with the standard open(2)/write(2)/close(2)
interfaces.  After the copy, the original data blocks are
deallocated as usual.  A more efficient implementation might
require a special kernel interface that creates a directory entry,
given an inode number and path.  Unfortunately, I think it is
possible that the unreferenced inode has not been initialized,
even though it is allocated in the inode bitmap, so you could
potentially get random junk.

Such a feature sounds reasonable, although I'm not sure how useful
it would really be.  If you have software that introduces a race
window where you can lose data because it does updates
incorrectly, hacking the operating system to make the race window
slightly smaller is not the best solution.




Re: background fsck did not create lost+found

2003-01-22 Thread Garance A Drosihn
At 12:53 AM +0600 1/23/03, Max Khon wrote:


On Wed, Jan 22, 2003 at 07:18:44PM +0100, Jan Srzednicki wrote:

Would that be a big problem to allow some fsck option not
to erase all these softupdates-pending inodes, but to put
them in lost+found as usual?
  
   It certainly couldn't be done with the background fsck,
   because background fsck works on a snapshot and not the
   running filesystem; thus, it cannot make any allocations -- it
   can only deallocate things.
 
  Still, in case you know some of your important files can be lost,
  you can boot the system to single user and run foreground fsck.

this is not an option if the system was rebooted because of power
loss or kernel panic


Can't you just set the rc.conf option to not-do the background fsck?

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]




Re: background fsck did not create lost+found

2003-01-22 Thread Max Khon
hi, there!

On Wed, Jan 22, 2003 at 02:43:37PM -0500, Garance A Drosihn wrote:

  Would that be a big problem to allow some fsck option not
  to erase all these softupdates-pending inodes, but to put
  them in lost+found as usual?

 It certainly couldn't be done with the background fsck,
 because background fsck works on a snapshot and not the
 running filesystem; thus, it cannot make any allocations -- it
 can only deallocate things.
   
Still, in case you know some of your important files can be lost,
you can boot the system to single user and run foreground fsck.
 
 this is not an option if the system was rebooted because of power
 loss or kernel panic
 
 Can't you just set the rc.conf option to not-do the background fsck?

I can but the whole purpose of background fsck (faster startup times)
will be lost.

/fjoe





Re: background fsck did not create lost+found

2003-01-22 Thread Garrett Wollman
On Wed, 22 Jan 2003 11:32:12 -0800, David Schultz [EMAIL PROTECTED] 
said:

 Unfortunately, I think it is possible that the unreferenced inode
 has not been initialized, even though it is allocated in the inode
 bitmap, so you could potentially get random junk.

That is definitely true on UFS2, which I had forgotten.  UFS2 inodes
are only initialized when they are used.

-GAWollman





background fsck did not create lost+found

2003-01-20 Thread Jan Srzednicki

Hello,
After building new world and installing new kernel, I rebooted my machine
to launch a new kernel. The system mysteriously failed to flush 22 disk
buffers, and after reboot fsck was launched. I have the following
partitions:

/ - UFS1
/usr - UFS2
/home - UFS1

This massive disk mangling occurred on /usr, but still, one file in /home
got lost - which happened to be quite an important file. Background fsck
logged:

Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065
OWNER=winfried MODE=100644
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20
15:57 2003 (CLEARED)
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8
files, 16439 fragments
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: 33802 files, 13109700 used,
6310697 free (11577 frags, 787390 blocks, 0.1% fragmentation)

The first two entries clearly correspond to the missing file, which should
have been put in /home/lost+found. But the problem is that no lost+found
directory was created, although it should have been (as fsck_ffs(8) says).
I guess it's a bug, probably in the background fsck code. Still, is there
any way to reclaim the file now, besides running strings(1) on the whole
partition?

-- 
  -- wrzask --= v =-- Winfried --=-- GG# 3838383 --=-- JS500-RIPE --
-- [EMAIL PROTECTED] --- [EMAIL PROTECTED] --===-- http://violent.dream.vg/ ---
--= Ride the wild wind - push the envelope, don't sit on the fence, ---
  -- Ride the wild wind - live life on the razor's edge! =-- Queen --





Re: background fsck did not create lost+found

2003-01-20 Thread Dan Nelson
In the last episode (Jan 20), Jan Srzednicki said:
 After building new world and installing new kernel, I rebooted my
 machine to launch a new kernel. The system mysteriously failed to
 flush 22 disk buffers, and after reboot fsck was launched.
[...] 
 This massive disk mangling occurred on /usr, but still, one file in
 /home got lost - which happened to be quite an important file.
 Background fsck logged:
 
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065 OWNER=winfried 
MODE=100644
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20 15:57 2003 
(CLEARED)
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8 files, 
16439 fragments
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: 33802 files, 13109700 used, 6310697 
free (11577 frags, 787390 blocks, 0.1% fragmentation)
 
 The first two entries clearly correspond to the missing file, which
 should have been put in /home/lost+found. But the problem is that
 no lost+found directory was created, although it should have been (as
 fsck_ffs(8) says). I guess it's a bug, probably in the background fsck
 code. Still, is there any way to reclaim the file now, besides running
 strings(1) on the whole partition?

It's not a bug.  Softupdates works by guaranteeing that the only things
that a background fsck needs to do are reduce link counts, clear
inodes, and fix free-space bitmaps.  Bgfsck will clear a file's space
rather than put it in lost+found.  This means that if you delete a
file, immediately create a new one with the same name, and then reboot
within 30 seconds, both files will be gone.  You can minimize the risk
by lowering the kern.metadelay, kern.dirdelay, and kern.filedelay
sysctl values, but the lower you go, the less benefit you get.

String'ing the raw partition is probably your best bet for recovering
the data.

-- 
Dan Nelson
[EMAIL PROTECTED]




Re: background fsck did not create lost+found

2003-01-20 Thread David Schultz
Thus spake Jan Srzednicki [EMAIL PROTECTED]:
 This massive disk mangling occurred on /usr, but still, one file in /home
 got lost - which happened to be quite an important file. Background fsck
 logged:
 
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065
 OWNER=winfried MODE=100644
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20
 15:57 2003 (CLEARED)
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8
 files, 16439 fragments
 Jan 20 16:06:30 stronghold root: /dev/ad1s1d: 33802 files, 13109700 used,
 6310697 free (11577 frags, 787390 blocks, 0.1% fragmentation)
 
 The first two entries clearly correspond to the missing file, which should
 have been put in /home/lost+found. But the problem is that no lost+found
 directory was created, although it should have been (as fsck_ffs(8) says).
 I guess it's a bug, probably in the background fsck code. Still, is there
 any way to reclaim the file now, besides running strings(1) on the whole
 partition?

Consider what happens when you remove a large directory tree.
Thousands of directory entries may be removed, but in the
softupdates case, the inodes will stick around a bit longer.  The
same also applies to files that have been intentionally unlinked
but are still open.  To avoid a syndrome where all these thousands
of files end up in lost+found after a crash or power failure, fsck
just removes them on softupdates-enabled filesystems.

Unfortunately, this means that a newly-created file that has an
inode but no directory entry will also be cleared.  In some sense,
this race is equivalent to the situation where something went
wrong before the inode could be written.

However, when you are saving a new version of an important file,
you need to be careful that the new version (and its directory
entry) hits the disk before the old one goes away.  I know that vi
saves files in a safe way, whereas ee and emacs do not.  (Emacs
introduces only a small race, though.)  Also, mv will DTRT only if
the source and destination files live on the same filesystem.
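
One common way to make sure the new version and its directory entry
reach the disk before the old version goes away is to write a temporary
file in the same directory, fsync it, and rename it over the original.
The Python sketch below assumes a POSIX system; safe_save and the
".tmp-" prefix are made-up names, and this is not a claim about what
any particular editor does. Keeping the temporary file in the same
directory is what keeps the rename within one filesystem.

```python
import os
import tempfile


def safe_save(path, data):
    """Replace path with data so a crash leaves either the old or new file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename never crosses filesystems.
    # Note: mkstemp creates the file with mode 0600, not the original's mode.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # new contents are on disk first
        os.replace(tmp, path)         # atomic rename over the old version
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the directory so the new directory entry is durable too.
    dirfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo")
safe_save(target, b"old contents\n")
safe_save(target, b"new contents\n")
```

At no point is the target name missing or half-written: until the
rename it refers to the old inode, and afterwards to the fully synced
new one.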




Re: background fsck did not create lost+found

2003-01-20 Thread Matthew Dillon
:However, when you are saving a new version of an important file,
:you need to be careful that the new version (and its directory
:entry) hits the disk before the old one goes away.  I know that vi
:saves files in a safe way, whereas ee and emacs do not.  (Emacs
:introduces only a small race, though.)  Also, mv will DTRT only if
:the source and destination files live on the same filesystem.
:

I think you have that reversed.  vi just overwrites the destination
file (O_CREAT|O_TRUNC, try ktrace'ing a vi session and you will see). 
I believe emacs defaults to a mode where it creates a new file and 
renames it over the original. 

This means that there is a period of time where a crash may result in
the loss of the file if the vi session cannot be recovered (with vi -r)
after the fact.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: background fsck did not create lost+found

2003-01-20 Thread David Schultz
Thus spake Matthew Dillon [EMAIL PROTECTED]:
 :However, when you are saving a new version of an important file,
 :you need to be careful that the new version (and its directory
 :entry) hits the disk before the old one goes away.  I know that vi
 :saves files in a safe way, whereas ee and emacs do not.  (Emacs
 :introduces only a small race, though.)  Also, mv will DTRT only if
 :the source and destination files live on the same filesystem.
 :
 
 I think you have that reversed.  vi just overwrites the destination
 file (O_CREAT|O_TRUNC, try ktrace'ing a vi session and you will see). 
 I believe emacs defaults to a mode where it creates a new file and 
 renames it over the original. 
 
 This means that there is a period of time where a crash may result in
 the loss of the file if the vi session cannot be recovered (with vi -r)
 after the fact.

vi writes and fsyncs a recovery file when it opens a file for
editing, and it fsyncs the real file before removing the recovery
file.  (I don't know how reliable vi's recovery mechanism is
because I don't use vi, but at least it's ensuring that the
recovery file is written to disk when it should be.)

In Emacs, if 'make-backup-files' is non-nil (the default), the
original file ${FILE} is renamed to ${FILE}~.  Then it writes out
and fsyncs a new file, which is perfectly safe.  If
'make-backup-files' is nil, emacs simply omits the renaming part,
unsafely overwriting the original file.  The behavior in the
latter case appears to be a bug, or at least an undocumented
feature.  Emacs even causes data loss in this case when the disk
fills up!  It needs to either do an fsync/rename or write and
fsync a backup file for the duration of the save.
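
The second option, a backup file that exists for the duration of the
save, might look like the following Python sketch. write_synced and
save_with_backup are hypothetical helpers, not Emacs internals, and a
POSIX system is assumed. The sketch does not fsync the directory, so
the backup's directory entry is only as durable as the filesystem makes
it.

```python
import os
import tempfile


def write_synced(path, data):
    """Overwrite path with data and fsync before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write may write less than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)


def save_with_backup(path, data, keep_backup=False):
    """Overwrite path in place, protected by a synced copy named path~."""
    backup = path + "~"
    have_backup = os.path.exists(path)
    if have_backup:
        with open(path, "rb") as f:
            write_synced(backup, f.read())   # old contents safe on disk
    write_synced(path, data)                 # the risky in-place overwrite
    if have_backup and not keep_backup:
        os.unlink(backup)                    # save finished, backup not needed
    return backup if (have_backup and keep_backup) else None


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo")
save_with_backup(target, b"old contents\n")
kept = save_with_backup(target, b"new contents\n", keep_backup=True)
```

If the overwrite is interrupted or the disk fills up, the old contents
can still be recovered from the backup file.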

Lastly, with ee, there's no backup file and no fsync.

Some ktrace snippets are below.


  3662 vi   CALL  open(0x808e260,0x2,0x180)
  3662 vi   NAMI  /var/tmp/vi.recover/vi.HjDlgO
  3662 vi   RET   open 4
...
  3662 vi   CALL  write(0x4,0x809a01c,0x400)
  3662 vi   GIO   fd 4 wrote 1024 bytes
   [...]old contents[...]
...
  3662 vi   CALL  fsync(0x4)
  3662 vi   RET   fsync 0
...
[I edit the file from old contents to new contents]
...
  3662 vi   CALL  open(0x8095140,0x601,0x1b6)
  3662 vi   NAMI  foo
  3662 vi   RET   open 7
...
  3662 vi   CALL  write(0x7,0x80bb000,0xd)
  3662 vi   GIO   fd 7 wrote 13 bytes
   new contents
   
  3662 vi   RET   write 13/0xd
...
  3662 vi   CALL  fsync(0x7)
  3662 vi   RET   fsync 0
  3662 vi   CALL  close(0x7)
  3662 vi   RET   close 0
...
  3662 vi   CALL  lseek(0x4,0,0x400,0,0)
  3662 vi   RET   lseek 1024/0x400
  3662 vi   CALL  write(0x4,0x809a01c,0x400)
  3662 vi   GIO   fd 4 wrote 1024 bytes
   [...]new contents[...]
...
  3662 vi   CALL  fsync(0x4)
  3662 vi   RET   fsync 0

[The following bit only happens if make-backup-files is non-nil]
  3799 emacs CALL  rename(0x848c328,0x848fba8)
  3799 emacs NAMI  /home/test/foo
  3799 emacs NAMI  /home/test/foo~
  3799 emacs RET   rename 0
...
[This part happens unconditionally]
  3799 emacs CALL  open(0x848c328,0x601,0x1b6)
  3799 emacs NAMI  /home/test/foo
  3799 emacs RET   open 3
  3799 emacs CALL  write(0x3,0xbfbfae24,0x3)
  3799 emacs GIO   fd 3 wrote 3 bytes
   new
  3799 emacs RET   write 3
  3799 emacs CALL  write(0x3,0xbfbfae24,0x9)
  3799 emacs GIO   fd 3 wrote 9 bytes
   contents
  3799 emacs RET   write 9
  3799 emacs CALL  fsync(0x3)
  3799 emacs RET   fsync 0
  3799 emacs CALL  close(0x3)
  3799 emacs RET   close 0

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Crashes with 5.0-RC2 (background fsck?)

2003-01-02 Thread Kevin Oberman
Environment: IBM ThinkPad 600E with fresh RC2 installation

Failure:
mode = 041777, inum = 534, fs = /usr
panic: ffs_valloc: dup alloc

syncing disks, buffers remaining... panic: bwrite: buffer is not
busy???

I was unable to bring up the system long enough to build a new kernel
with debug (or even load the sources to build a kernel).

I booted stand-alone and manually ran 'fsck_ffs -y' on /usr. (It
failed 'fsck -p'.) It seems to be stable, and I have now built a new kernel
with DDB and symbols.

Would the dumps (without symbols) from GENERIC be of any use in
looking at this?

R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634




panic in background fsck

2002-12-14 Thread Andrew Gallatin

This was on an x86, 5.0-current as of roughly 8am EST today.

Machine panic'ed when it finished /usr and moved on to /var on
a manually invoked fsck -B (I'd hit ^C to abort multi-user startup
during fsck and was surprised to see the system continue on.. ;)

Drew


panic: ffs_blkfree: freeing free block

syncing disks, buffers remaining... panic: bremfree: bp 0xce3e35a8 not locked
Uptime: 6m38s
Dumping 511 MB
ata0: resetting devices ..
done
[CTRL-C to abort] [CTRL-C to abort]  16 32 48 64 80 96 112 128 144 160 176 192 208 224 
240 256 272 288 304[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]  320 336 352 
368 384 400 416 432 448 464 480 496
---
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:232
232 dumping++;
(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:232
#1  0xc024bc45 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:364
#2  0xc024be93 in panic () at /usr/src/sys/kern/kern_shutdown.c:517
#3  0xc028c2a7 in bremfree (bp=0xce3e35a8) at /usr/src/sys/kern/vfs_bio.c:632
#4  0xc0214f91 in spec_fsync (ap=0xdcb103c4)
at /usr/src/sys/fs/specfs/spec_vnops.c:465
#5  0xc0214428 in spec_vnoperate (ap=0x0)
at /usr/src/sys/fs/specfs/spec_vnops.c:126
#6  0xc0317eb7 in ffs_sync (mp=0xc3f2b000, waitfor=2, cred=0xc1263e80,
td=0xc0402c20) at vnode_if.h:612
#7  0xc02a018b in sync (td=0xc0402c20, uap=0x0)
at /usr/src/sys/kern/vfs_syscalls.c:138
#8  0xc024b88c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:273
#9  0xc024be93 in panic () at /usr/src/sys/kern/kern_shutdown.c:517
#10 0xc03007ca in ffs_blkfree (fs=0xc3f69000, devvp=0xc3f8b708, bno=7713464,
size=16384, inum=3) at /usr/src/sys/ufs/ffs/ffs_alloc.c:1771
#11 0xc030fef1 in indir_trunc (freeblks=0xc1b27a00, dbn=22201312, level=1,
lbn=4108, countp=0xdcb105f0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2603
#12 0xc030f99e in handle_workitem_freeblocks (freeblks=0xc1b27a00, flags=0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:2469
#13 0xc030cbbd in process_worklist_item (matchmnt=0xc3f2b200, flags=0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:745
#14 0xc030c900 in softdep_process_worklist (matchmnt=0xc3f2b200)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:624
#15 0xc030ce82 in softdep_flushworklist (oldmnt=0xc3f2b200, countp=0xdcb10704,
td=0xc4115540) at /usr/src/sys/ufs/ffs/ffs_softdep.c:838
#16 0xc0317e09 in ffs_sync (mp=0xc3f2b200, waitfor=1, cred=0xc4303800,
td=0xc4115540) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1168
#17 0xc02a8b90 in vfs_write_suspend (mp=0xc3f2b200)
at /usr/src/sys/kern/vfs_vnops.c:1020
#18 0xc0306e20 in ffs_snapshot (mp=0xc3f2b200, snapfile=---Can't read userspace from 
dump, or kernel process---

)
at /usr/src/sys/ufs/ffs/ffs_snapshot.c:312
#19 0xc0315f28 in ffs_mount (mp=0xc3f2b200, path=0xc1b2d080 /var, data=0x0,
ndp=0xdcb10bec, td=0xc4115540) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:297
#20 0xc029848a in vfs_mount (td=0xc4115540, fstype=0xc3f7f100 ,
fspath=0xc1b2d080 /var, fsflags=-592376852, fsdata=0x0)
at /usr/src/sys/kern/vfs_mount.c:1060
#21 0xc0297c18 in mount (td=0x0, uap=0xdcb10d10)
at /usr/src/sys/kern/vfs_mount.c:818
#22 0xc036f7ce in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = -1077938028, tf_ebp = 
-1077938152, tf_isp = -592376460, tf_ebx = 135001510, tf_edx = 19, tf_ecx = 135001344, 
tf_eax = 21, tf_trapno = 12, tf_err = 2, tf_eip = 134571915, tf_cs = 31, tf_eflags = 
514, tf_esp = -1077938468, tf_ss = 47})
at /usr/src/sys/i386/i386/trap.c:1033
#23 0xc035fa6d in Xint0x80_syscall () at {standard input}:140
---Can't read userspace from dump, or kernel process---






Data modified on freelist with background fsck

2002-01-12 Thread Alexander Leidinger

Hi,

this was with a Jan 8 kernel on -current:

WARNING: / was not properly dismounted
WARNING: /big was not properly dismounted
/big: lost blocks 8 files 2
/big: superblock summary recomputed
Data modified on freelist: word 0 of object 0xc3990620 size 16 previous type pcb
 (0xdeadc1de != 0xdeadc0de)

~a minute after this the system rebooted (I was using X at that time, so
no handwritten panic strings), no core dump.

I know background fsck isn't mature yet, this is just for the
bughunters.

Bye,
Alexander.

-- 
   Speak softly and carry a cellular phone.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7






background fsck panics?

2001-07-05 Thread Christian Carstensen


hi,

after some weeks without softupdates, i recently reenabled them on my
notebook. it seems, that my problems still exist. background checking
one of my partitions leads to a panic() complaining about  some block
too large error. sorry, i didn't capture the message, and for obvious
reasons i don't like reproducing that condition.
it might be useful to know, that fsck checks all other partitions without
problems. moreover, that one partition that causes the problem, contains
a .fsck_snapshot entry:

slot 5 ino 0 reclen 444: regular, `.fsck_snapshot'

any ideas?


best,
  christian

--
Sorry, no defects found. Please try a different search
  [http://www.cisco.com/support/bugtools/bugtool.shtml]





Re: background fsck

2001-05-21 Thread John Baldwin


On 19-May-01 Matthew Thyer wrote:
 Is it possible that background fsck is not the culprit here ?  I think
 this may be fallout from the dirpref changes as Chris Knight recently
 emailed in 018b01c0c496$07ed13d0$[EMAIL PROTECTED].  The solution
 is to unmount all your filesystems, fsck them and then use tunefs with
 -A to change something so that all superblock backups will be updated.
 
 Does this sound likely ?
 
 Jason Evans wrote:
 
 I had exactly the same thing happen to /var on an SMP test box using
 -current as of 16 May.  It happened once out of about a half dozen panics.
 
 Jason

My instances at least are not the result of dirpref, as my laptop tracks
-current very closely and I navigated the dirpref waters a while back.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: background fsck

2001-05-18 Thread Jason Evans

I had exactly the same thing happen to /var on an SMP test box using
-current as of 16 May.  It happened once out of about a half dozen panics.

Jason




Re: background fsck

2001-05-18 Thread Brian Somers

This happens to me ``almost all the time'' on my dev box:

Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a     254063    82600   151138      35%  /
devfs                1        1        0     100%  /dev
procfs               4        4        0     100%  /proc
/dev/ad0s1e     254063        7   233731       0%  /tmp
/dev/ad1s2a     496239    26424   430116       6%  /var
/dev/ad1s2e    4466254  1448160  2660794      35%  /usr
/dev/ad0s1f     775487   392540   320909      55%  /usr/obj
/dev/ad1s1a   10145116  5631076  3702432      60%  /usr/ports/distfiles
/dev/ad1s1e   10145116  4957632  4375876      53%  /usr/audio
/dev/ad1s1g    4963030  3621790   944198      79%  /usr/packages
/dev/ad1s1f   10145116  4790396  4543112      51%  /cvs
/dev/ad1s2f   33059676        1 30414901       0%  /spare1

The interesting thing is that it always happens on /usr and /cvs 
and no other partitions.  Both of these partitions have large 
directory hierarchies

Also, FWIW it now takes nearly 30 minutes to fsck my laptop's disk 
(20Gb 5400rpm).  That's not good

 Has anyone else been trying out the background fsck?   Last night I was working
 on the ithread code some and managed to panic my laptop while ejecting a pccard.
 Anyways, the kernel ate itself while trying to flush its buffers and I ended up
 with a dirty filesystem.  I rebooted and let fsck -p do its usual thing, except
 that it freaked out.  The actual fsck of / proceeded fine (actual fs activity
 when I panic'd my machine was very low, so the filesystems weren't corrupted,
 just marked dirty).  When it got to /usr and /var, however, fsck freaked out
 and claimed that the primary superblock didn't match the first alternate.   At
 this point I first had a heart attack.  Once I recovered from that, I attempted
 read-only mounts of /usr and /var which did succeed, except that each mount
 spewed out a message to the kernel console about losing x files and y blocks. 
 Confident that my fs wasn't totally hosed after doing some ls's, I unmounted
 /usr and /var and ran a non-preen fsck on them, which insisted on using an
 alternate superblock, but otherwise proceeded fine (except that it seemed to
 take longer than usual).  Once the fscks finished, it seemed to be all ok.
 Is anyone else seeing any weird stuff like this?  I've never had fsck complain
 about the superblocks after a crash before.
 
  df -t ufs
 Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
 /dev/ad0s2a     148823    84717    52201      62%  /
 /dev/ad0s2f   10191770  7052563  2323866      75%  /usr
 /dev/ad0s2e      99183    14254    76995      16%  /var
  mount -t ufs
 /dev/ad0s2a on / (ufs, local)
 /dev/ad0s2f on /usr (ufs, local)
 /dev/ad0s2e on /var (ufs, local)
  grep ufs /etc/fstab
 /dev/ad0s2a     /       ufs     rw      1       1
 /dev/ad0s2f     /usr    ufs     rw      2       2
 /dev/ad0s2e     /var    ufs     rw      2       2
 
 Hmm, that's odd, I did have soft updates on /usr and /var before the crash.
 It seems to be off now. :(
 
 -- 
 
 John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
 PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
 Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

-- 
Brian [EMAIL PROTECTED]brian@[uk.]FreeBSD.org
  http://www.Awfulhak.org   brian@[uk.]OpenBSD.org
Don't _EVER_ lose your sense of humour !






background fsck

2001-05-17 Thread John Baldwin

Has anyone else been trying out the background fsck?   Last night I was working
on the ithread code some and managed to panic my laptop while ejecting a pccard.
Anyways, the kernel ate itself while trying to flush its buffers and I ended up
with a dirty filesystem.  I rebooted and let fsck -p do its usual thing, except
that it freaked out.  The actual fsck of / proceeded fine (actual fs activity
when I panic'd my machine was very low, so the filesystems weren't corrupted,
just marked dirty).  When it got to /usr and /var, however, fsck freaked out
and claimed that the primary superblock didn't match the first alternate.   At
this point I first had a heart attack.  Once I recovered from that, I attempted
read-only mounts of /usr and /var which did succeed, except that each mount
spewed out a message to the kernel console about losing x files and y blocks. 
Confident that my fs wasn't totally hosed after doing some ls's, I unmounted
/usr and /var and ran a non-preen fsck on them, which insisted on using an
alternate superblock, but otherwise proceeded fine (except that it seemed to
take longer than usual).  Once the fscks finished, it seemed to be all ok.
Is anyone else seeing any weird stuff like this?  I've never had fsck complain
about the superblocks after a crash before.

 df -t ufs
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2a     148823    84717    52201      62%  /
/dev/ad0s2f   10191770  7052563  2323866      75%  /usr
/dev/ad0s2e      99183    14254    76995      16%  /var
 mount -t ufs
/dev/ad0s2a on / (ufs, local)
/dev/ad0s2f on /usr (ufs, local)
/dev/ad0s2e on /var (ufs, local)
 grep ufs /etc/fstab
/dev/ad0s2a     /       ufs     rw      1       1
/dev/ad0s2f     /usr    ufs     rw      2       2
/dev/ad0s2e     /var    ufs     rw      2       2

Hmm, that's odd, I did have soft updates on /usr and /var before the crash.
It seems to be off now. :(

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: background fsck

2001-05-17 Thread David Wolfskill

>Date: Thu, 17 May 2001 14:31:55 -0700 (PDT)
>From: John Baldwin [EMAIL PROTECTED]

>Has anyone else been trying out the background fsck?

A little; despite my desire to help debug things, getting to a point
where doing this is appropriate isn't something I am all too eager to do.
Thus, it wasn't exactly voluntary.  :-}

>Last night I was working on the ithread code some and managed to panic
>my laptop while ejecting a pccard.  Anyways, the kernel ate itself while
>trying to flush its buffers and I ended up with a dirty filesystem.

I've had a couple of occasions when I'd boot my laptop (which resembles
yours, as you may recall) from -STABLE into -CURRENT (or vice versa),
and xdm would fire up, but not present a login window.  Meanwhile, the
fan kicks into high gear (indicating that the machine is a tad busy,
thankyouverymuch), and I can't get its attention by any means I have
been able to discover short of a power-cycle.  (At least the button does
the job; I didn't need to yank the batteries out.)  But lid-closure just
shut off the display, no key chord I could find had a noticeable effect,
nor did removing & re-inserting a PCMCIA card.

>I rebooted and let fsck -p do its usual thing, except
>that it freaked out.  The actual fsck of / proceeded fine (actual fs activity
>when I panic'd my machine was very low, so the filesystems weren't corrupted,
>just marked dirty).  When it got to /usr and /var, however, fsck freaked out
>and claimed that the primary superblock didn't match the first alternate.

Well, I confess that the first couple of times I had been running
-CURRENT and the box wanted a fsck more elaborate than the -p variety, I
recalled that there had been recent activity, and I remembered one
person's rather unfortunate experience of finding everything sitting in
lost+found.  Since I had no desire for that to happen, I booted -STABLE
instead:  single-user mode, fsck -p.  Wasn't quite happy with a couple
of file systems, so I did manual fsck (still under -STABLE) on each of
those.  Finally, system said things were OK; I was able to do a
mount -a, so after that, I did a reboot into -CURRENT.

Much to my surprise (and some chagrin), those 2 file systems that needed
the extra attention (/var and -CURRENT's /usr, if I recall correctly)
didn't pass muster with -CURRENT's fsck; it wanted a manual fsck of
those, no question about it.  Since they passed -STABLE's fsck, I
figured they weren't likely in *too* terribly bad shape, so I went ahead
and did the manual fsck, per request.  And in each case, I had a similar
symptom (re: primary & first alternate superblock mismatch).

I did wonder about making a choice just between those two, without
checking for one of the other alternates (some sort of voting protocol
-- though I wouldn't be too terribly keen on making fsck unnecessarily
complicated, certainly).  But under the circumstances, I wanted to run
-CURRENT, so I didn't see that I had a great deal of choice in the
matter (regardless of what I was being asked), so I told it to go ahead.
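
To make the voting idea concrete, here is a purely illustrative sketch in
Python.  It is not how fsck works; the function name and the notion of
comparing whole copies as byte strings are assumptions made for the
example.  A real check would presumably have to compare only those
superblock fields that are not expected to change after the file system
is created.

```python
from collections import Counter


def vote_on_superblocks(copies):
    """Return (winner, votes) if one copy has a strict majority, else None.

    `copies` is a list of byte strings, one per superblock copy read from
    disk (hypothetical input; reading them is out of scope here).
    """
    if not copies:
        return None
    winner, votes = Counter(copies).most_common(1)[0]
    # Require more than half of the copies to agree before trusting any.
    if votes * 2 <= len(copies):
        return None
    return winner, votes
```

With only the primary and the first alternate there can be no majority
when they disagree, which is why at least one more alternate would have
to be consulted.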

Following those manual fscks, I re-booted into multi-user mode, and
things worked normally (as far as I can tell).

>At this point I first had a heart attack.

I believe that a technical term for that literary device is hyperbole.  :-)

>Once I recovered from that, I attempted
>read-only mounts of /usr and /var which did succeed, except that each mount
>spewed out a message to the kernel console about losing x files and y blocks.
>Confident that my fs wasn't totally hosed after doing some ls's, I unmounted
>/usr and /var and ran a non-preen fsck on them, which insisted on using an
>alternate superblock, but otherwise proceeded fine (except that it seemed to
>take longer than usual).  Once the fscks finished, it seemed to be all ok.
>Is anyone else seeing any weird stuff like this?  I've never had fsck complain
>about the superblocks after a crash before.

As noted, it's happened a couple of times for me.  Generally, somewhat
inopportune times (almost by definition), so I wasn't really able to
take the time to sit back, take notes, and report back much that was
coherent.  And I was under the impression that much of this was under
construction anyhow, so the value of any report I might make was
somewhat open to question (from my perspective, anyhow).

...

>Hmm, that's odd, I did have soft updates on /usr and /var before the crash.
>It seems to be off now. :(

That also happened to me.  I thought it odd at the time, but forgot to
mention it.  At least I have some reason to believe I was unlikely to
have been hallucinating about that.

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.




Re: background fsck

2001-05-17 Thread David Scheidt

On Thu, 17 May 2001, David Wolfskill wrote:

:From: John Baldwin [EMAIL PROTECTED]
:
:Hmm, that's odd, I did have soft updates on /usr and /var before the crash.
:It seems to be off now. :(
:
:That also happened to me.  I thought it odd at the time, but forgot to
:mention it.  At least I have some reason to believe I was unlikely to
:have been hallucinating about that.

Does tunefs update the alternate superblocks when it enables soft updates?
It doesn't look like it does, but I might be missing something.

-- 
[EMAIL PROTECTED]
Bipedalism is only a fad.





Re: background fsck

2001-05-17 Thread David Wolfskill

>Date: Thu, 17 May 2001 22:30:03 -0500 (CDT)
>From: David Scheidt [EMAIL PROTECTED]

>Does tunefs update the alternate superblocks when it enables soft updates?
>It doesn't look like it does, but I might be missing something.

I could easily have overlooked something myself, but it doesn't appear
to do so to me.

(I see it does want the file system clean when soft updates is enabled,
but doesn't check for that for a disable request.)

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.




Re: background fsck

2001-05-17 Thread David Scheidt

cc list trimmed.
On Thu, 17 May 2001, David Wolfskill wrote:

:(I see it does want the file system clean when soft updates is enabled,
:but doesn't check for that for a disable request.)
:

Right.  fsck(8) can make assumptions about the state of the filesystem if it
knows that softupdates were in use.  (There's a smaller set of possible
inconsistencies, but I don't remember what they are.)  It's safe for fsck to
assume that the filesystem could be in worse shape than it actually is, but
not safe to assume it's cleaner.

David
-- 
[EMAIL PROTECTED]
Bipedalism is only a fad.

