Re: swapon vs savecore dilemma

2003-09-03 Thread Terry Lambert
Dirk Meyer wrote:
   Wouldn't fsck - mount - savecore - swapon be a more appropriate order?
 
 Terry Lambert wrote:
  If you had small enough disks, large enough RAM, or could limit
  the number of CG bitmaps you had to simultaneously examine, then
  yes.  Otherwise, no.
 
 Can't we get a knob in /etc/rc.conf to choose that per system?
 
 kind regards Dirk

See Doug Barton's posting under the Subject: line

savecore check for a dump patch for review

which provides a better solution than a blind knob.

-- Terry
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: swapon vs savecore dilemma

2003-09-02 Thread Scott Long
Doug White wrote:
Hey folks,

It looks like we may need to rethink the way swap is mounted at boot time
if we want crashdumps to work.
Recently(?), a change was made so you can no longer open a swap partition
read/write after it is activated with swapon(8).  In the current boot
sequence, swap is mounted before the root fsck starts so additional space
is available if the root fsck needs it.  But at that point no partitions
are available for writing a core to, so we can't run savecore then.
Without crashdumps debugging gets kind of interesting.

Suggestions, other than have separate dump and swap partitions?



I question the wisdom of what you're describing.  If swap space needs to
be made available for fsck to run, then what happens to the crashdump
data that used to be on the swap partition?  Doing a swapon(8) means
that nothing in the swap partition is reliable or consistent anymore.
Scott



Re: swapon vs savecore dilemma

2003-09-02 Thread Pawel Worach
Scott Long wrote:

Doug White wrote:

Hey folks,

It looks like we may need to rethink the way swap is mounted at boot time
if we want crashdumps to work.

I question the wisdom of what you're describing.  If swap space needs to
be made available for fsck to run, then what happens to the crashdump
data that used to be on the swap partition?  Doing a swapon(8) means
that nothing in the swap partition is reliable or consistent anymore.
Scott

Yes, I have seen this too.
Sep  2 02:16:30 darkstar savecore: /dev/da0s1b: Operation not permitted
Sep  2 02:16:30 darkstar savecore: no dumps found
Is fsck really that memory heavy so that it needs swap?
Wouldn't fsck - mount - savecore - swapon be a more appropriate order?
   - Pawel



Re: swapon vs savecore dilemma

2003-09-02 Thread Terry Lambert
Doug White wrote:
 It looks like we may need to rethink the way swap is mounted at boot time
 if we want crashdumps to work.
 
 Recently(?), a change was made so you can no longer open a swap partition
 read/write after it is activated with swapon(8).  In the current boot
 sequence, swap is mounted before the root fsck starts so additional space
 is available if the root fsck needs it.  But at that point no partitions
 are available for writing a core to, so we can't run savecore then.
 
 Without crashdumps debugging gets kind of interesting.
 
 Suggestions, other than have separate dump and swap partitions?

There are a couple of stacked problems in this area:

1)  How do I do a savecore, if the FS to which I want to write
the crash dump image is dirty?

2)  How do I do an fsck on a dirty FS to enable me to write a
crash dump image, without overwriting the crash dump image
itself?

3)  How do I do a crash dump early on in the boot cycle, if
the partition to which I'm dumping is not open for write?

I think the main problem here is that we now swap on very early,
on the theory that we have a very large FS that we need to fsck.

Probably the correct thing to do is to fix fsck so that it
doesn't need swap to run, even for a very large FS.  This is
relatively trivial to do, if you are willing to either change
the on-disk layout, or do a little hocus-pocus with the contents
of inode 1 (the whiteout file) to keep a rotating cylinder
group bitmap log, and introduce stall barriers so that the number
of simultaneously dirty cylinder groups is small enough that you
could fit them in memory without swapping.

Meanwhile, a workaround that would handle almost all the cases
would be to *conditionally* do the swap-on, only if there is a
dirty FS, and the checking of the FS would require swap in order
to succeed.  In general, people with huge FS's, if they care at
all about system dumps, can set aside a separate space for them,
while the majority of the rest of the world can fsck without
needing swap for it to succeed.

That would at least restore the status-quo, pre the swapon change,
for everyone whose fsck would have been successful without major
surgery anyway.
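
The conditional swap-on above boils down to a two-part test.  A minimal
Python sketch, assuming the boot scripts could somehow obtain the three
inputs (all names here are hypothetical, nothing like this exists in rc):

```python
def need_early_swapon(dirty_fs_count: int,
                      fsck_need_bytes: int,
                      free_ram_bytes: int) -> bool:
    """Enable swap before fsck only if a dirty FS exists AND checking it
    is estimated to need more memory than is free.  All three inputs are
    assumed to be supplied by the caller; estimating them is out of scope."""
    return dirty_fs_count > 0 and fsck_need_bytes > free_ram_bytes


# Clean filesystems: never swap on early, so the dump area stays intact.
print(need_early_swapon(0, 2**30, 2**28))
# One dirty FS whose check fits in RAM: still no early swap-on.
print(need_early_swapon(1, 2**24, 2**28))
```

In every case where this returns False, savecore can run before swapon
and the old ordering is restored.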

-- Terry


Re: swapon vs savecore dilemma

2003-09-02 Thread Terry Lambert
Pawel Worach wrote:
 Is fsck really that memory heavy so that it needs swap?

Yes, if you have a huge FS.

The problem is that the checking of the CG bitmaps during an fsck
requires that you have all the bitmaps in core, and then linearly
traverse the entire directory structure to identify which bits
need to be cleared (to indicate that the block was deallocated
prior to the crash, without the bitmap being successfully written
out).

Because a block allocation for any file can be written (effectively)
anywhere on the disk, and there is no guarantee of cylinder group
locality, this basically means that you have to hold all the
bitmaps in memory simultaneously ...or you have to make multiple
passes over the directory structure, with the number of passes being
equal to the total number of CG's divided by the number of bitmaps
you can keep in core simultaneously.

This is hideously expensive, and it's never been implemented: you
are assumed to have enough memory (or memory and swap) available
to hold all the bitmaps in memory simultaneously.  If you have a
multiterabyte FS, passing over it 9 times instead of once with
swapping would be extremely dissatisfying: presumably you have
all that data for a reason, and need it back online as fast as
possible.
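
To put a number on the multi-pass cost, here is a small Python sketch.
The function name and the example figures are made up for illustration;
only the relation (passes = total CGs divided by the number of bitmaps
that fit in core, rounded up) comes from the text above.

```python
def fsck_passes(total_cgs: int, cgs_in_core: int) -> int:
    """Passes over the directory structure needed when only
    `cgs_in_core` cylinder-group bitmaps fit in memory at once."""
    if total_cgs <= 0 or cgs_in_core <= 0:
        raise ValueError("both counts must be positive")
    # Ceiling division: a partial final batch still costs a full pass.
    return -(-total_cgs // cgs_in_core)


# Hypothetical figures: 9000 cylinder groups, room for 1000 bitmaps.
print(fsck_passes(9000, 1000))   # 9
print(fsck_passes(9000, 9000))   # 1
```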

My suggestion (which has been my suggestion all along) is to add
two date stamped CG bitmap bitmaps somewhere (my favorite place
for this is to steal space at the front of inode 1, which is used
only rarely, since people don't use the whiteout feature, and
which can be made compatible with whiteouts, in any case).  Then
you don't let more than some small number be dirty simultaneously,
without flushing some of them out (you would need an additional
soft dependency to implement this).

If you did this, then you could guarantee a smaller set of data
to be simultaneously dirty, even for an arbitrarily large FS.  You
just load only those bitmaps that are marked dirty in the bitmap
logs, and do a single pass through the full directory structure.
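
A rough in-memory model of that idea, in Python.  This is only a sketch
of the bookkeeping, not UFS code: the class, its method names and the
oldest-first flush policy are all assumptions, and "flushing" is reduced
to appending to a list where a real implementation would write the
bitmap out behind a stall barrier.

```python
from collections import OrderedDict
from typing import List


class DirtyCgLog:
    """Bound the number of cylinder groups that may be dirty at once."""

    def __init__(self, max_dirty: int) -> None:
        if max_dirty <= 0:
            raise ValueError("max_dirty must be positive")
        self.max_dirty = max_dirty
        self._dirty: "OrderedDict[int, None]" = OrderedDict()  # oldest first
        self.flushed: List[int] = []   # stand-in for bitmaps written to disk

    def mark_dirty(self, cg: int) -> None:
        if cg in self._dirty:
            self._dirty.move_to_end(cg)        # already logged, refresh age
            return
        while len(self._dirty) >= self.max_dirty:
            oldest, _ = self._dirty.popitem(last=False)
            self.flushed.append(oldest)        # flush before admitting more
        self._dirty[cg] = None

    def cgs_to_check(self) -> List[int]:
        """After a crash, only these bitmaps would need to be loaded."""
        return sorted(self._dirty)


log = DirtyCgLog(max_dirty=2)
for cg in (1, 2, 3):
    log.mark_dirty(cg)
print(log.cgs_to_check(), log.flushed)   # [2, 3] [1]
```

The point of the bound is that the recovery working set is at most
max_dirty bitmaps, independent of how many CGs the FS has.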


 Wouldn't fsck - mount - savecore - swapon be a more appropriate order?

If you had small enough disks, large enough RAM, or could limit
the number of CG bitmaps you had to simultaneously examine, then
yes.  Otherwise, no.

-- Terry


Re: swapon vs savecore dilemma

2003-09-02 Thread Bakul Shah
  Is fsck really that memory heavy so that it needs swap?
 
 Yes, if you have a huge FS.
 
 The problem is that the checking of the CG bitmaps during an fsck
 requires that you have all the bitmaps in core

Hmm
For a one TB FS with 8KB block size you need 2^(40-13) bits
to keep track of blocks.  That is 2^24 bytes or 16Mbytes.
That doesn't seem so bad (considering that you really should
have a lot more RAM if you are playing terrybytes of data).
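
The same arithmetic as a few lines of Python, assuming exactly one bit
per 8KB block as above (the function name is made up; tracking a smaller
unit than a full block would scale the result up proportionally):

```python
def block_bitmap_bytes(fs_bytes: int, block_bytes: int) -> int:
    """Bytes needed for a bitmap holding one bit per block."""
    blocks = fs_bytes // block_bytes
    return (blocks + 7) // 8          # round up to whole bytes


size = block_bitmap_bytes(2**40, 8 * 1024)   # 1 TB FS, 8KB blocks
print(size, size // 2**20)                   # 16777216 16
```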

 My suggestion (which has been my suggestion all along) is to add
 two date stamped CG bitmap bitmaps somewhere (my favorite place
 for this is to steal space at the front of inode 1, which is used
 only rarely, since people don't use the whiteout feature, and
 which can be made compatible with whiteouts, in any case).

This is the old stable storage idea.  You need a generation
number rather than a date stamp but the idea is the same.
Something needs to be done so that time to fsck depends on
the outstanding FS traffic at the time of the crash rather
than the size of the FS (especially when you are dealing with
multi terabytes of data).


Re: swapon vs savecore dilemma

2003-09-02 Thread Aaron Wohl
I usually have a number of swap partitions since the max size of a swap
partition is kind of limited.  I was thinking of changing it to do swapon
twice.  The first time early in the boot would skip mounting any swap
areas that had kernel core dumps.  Then after the savecore it could do
swapon again to mount the rest of the swap areas.  Either that or have
swapping start to allocate space at the opposite end of the swap space
from the one savecore uses.
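
The two-pass swapon could be sketched like this in Python.  The helper
and the `has_dump` predicate are hypothetical (the thread has not yet
settled how to detect a dump), and the second device name is invented
for the example.

```python
from typing import Callable, Iterable, List, Tuple


def split_swap_devices(devices: Iterable[str],
                       has_dump: Callable[[str], bool]
                       ) -> Tuple[List[str], List[str]]:
    """Return (early, deferred): swap areas safe to enable before
    savecore, and those that must wait until the dump has been saved."""
    early: List[str] = []
    deferred: List[str] = []
    for dev in devices:
        (deferred if has_dump(dev) else early).append(dev)
    return early, deferred


# Pretend only /dev/da0s1b holds a dump.
early, deferred = split_swap_devices(
    ["/dev/da0s1b", "/dev/da1s1b"],
    has_dump=lambda dev: dev == "/dev/da0s1b",
)
print(early, deferred)   # ['/dev/da1s1b'] ['/dev/da0s1b']
```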


Re: swapon vs savecore dilemma

2003-09-02 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Aaron Wohl writes:
I usually have a number of swap partitions since the max size of a swap
partition is kind of limited.  I was thinking of changing it to do swapon
twice.  The first time early in the boot would skip mounting any swap
areas that had kernel core dumps.  Then after the savecore it could do
swapon again to mount the rest of the swap areas.  Either that or have
swapping start to allocate space at the opposite end of the swap space
from the one savecore uses.

Hmm, that was an unfortunate side effect.

The writing is only needed for marking the dump as read; the same
effect could be had much more cleanly by writing the signature of the
dump to a file in /var/ somewhere and not reading dumps already in that
file.
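
A minimal Python sketch of that bookkeeping.  Everything specific here
is an assumption made for illustration: the "signature" is taken to be
a SHA-256 of whatever header bytes identify the dump, and the record
file is a plain list of one hex digest per line at a path the caller
chooses.  It is not how savecore works today.

```python
import hashlib
from pathlib import Path


def dump_signature(header: bytes) -> str:
    """Stand-in signature: hash of the bytes that identify a dump."""
    return hashlib.sha256(header).hexdigest()


def already_saved(header: bytes, seen_file: Path) -> bool:
    """True if this dump's signature is already recorded."""
    if not seen_file.exists():
        return False
    return dump_signature(header) in seen_file.read_text().split()


def record_saved(header: bytes, seen_file: Path) -> None:
    """Append the signature; the swap device itself is never written."""
    with seen_file.open("a") as f:
        f.write(dump_signature(header) + "\n")
```

With this, the dump device only ever needs to be opened read-only, so
it would no longer matter whether swapon has claimed it for writing.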

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: swapon vs savecore dilemma

2003-09-02 Thread Doug Barton
On Tue, 2 Sep 2003, Poul-Henning Kamp wrote:

 Hmm, that was an unfortunate side effect.

Heh, well, stuff happens. I think your idea of opening swap exclusive is
probably a good one, but it will require some gymnastics to accommodate
it. One thing that'd really help is an option to savecore that tells us
if there is a dump to deal with or not. If I had that, we could do
something like this in /etc/rc.d/savecore

if there is no dump
    exit
else
    does fsck -p of the fs to write the dump to succeed?
        mount it rw
        write the dump
        clear the dump
        exit
    else
        does fsck -y of the fs without swap succeed?
            mount, write, clear, exit
        else
            ???

At the ??? point I'm not sure how best to proceed, since if we swapon to
the same partition with the dump, it's likely to corrupt the dump, yes?
On the other hand, we're doing swapon before savecore now, so I guess
I'm curious about how dangerous this really is.

Probably the right thing to do is to swapon, fsck -y, and if it succeeds
then swapoff, and try writing the dump anyway. I just want to be
sure before we start re-writing rc.d/savecore.

So, the first question is does the pseudocode above look reasonable, and
the second question is what's the likelihood of getting an option to
savecore to detect a dump to play with?
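
For anyone who wants to poke at the control flow, here is the same
logic as a runnable Python sketch, with the "???" branch filled in by
the swapon / fsck -y / swapoff fallback described above.  The function
and its callable parameters are hypothetical stand-ins; it only returns
the list of steps it would take and does not run fsck or savecore.

```python
from typing import Callable, List

SAVE_STEPS = ["mount rw", "write dump", "clear dump"]


def savecore_plan(has_dump: bool,
                  fsck_preen: Callable[[], bool],
                  fsck_yes: Callable[[], bool],
                  fsck_yes_with_swap: Callable[[], bool]) -> List[str]:
    """Return the ordered steps; each callable reports fsck success."""
    if not has_dump:
        return []                          # nothing to save, just exit
    steps = ["fsck -p"]
    if fsck_preen():
        return steps + SAVE_STEPS
    steps.append("fsck -y")
    if fsck_yes():
        return steps + SAVE_STEPS
    # The "???" branch: swap is needed, which may clobber the dump.
    steps += ["swapon", "fsck -y"]
    if fsck_yes_with_swap():
        return steps + ["swapoff"] + SAVE_STEPS
    return steps + ["give up"]


ok, fail = (lambda: True), (lambda: False)
print(savecore_plan(True, fail, ok, fail))
```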

Doug

-- 

This .signature sanitized for your protection



Re: swapon vs savecore dilemma

2003-09-02 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Doug Barton writes:
On Tue, 2 Sep 2003, Poul-Henning Kamp wrote:

 Hmm, that was an unfortunate side effect.

Heh, well, stuff happens. I think your idea of opening swap exclusive is
probably a good one, but it will require some gymnastics to accommodate
it.

Yeah, but I'm ENOTIME right now, so I've just dropped the exclusive
bit for now with an XXX comment.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: swapon vs savecore dilemma

2003-09-02 Thread Scott Long
Doug Barton wrote:
On Tue, 2 Sep 2003, Poul-Henning Kamp wrote:


Hmm, that was an unfortunate side effect.


Heh, well, stuff happens. I think your idea of opening swap exclusive is
probably a good one, but it will require some gymnastics to accommodate
it. One thing that'd really help is an option to savecore that tells us
if there is a dump to deal with or not. If I had that, we could do
something like this in /etc/rc.d/savecore
if there is no dump
    exit
else
    does fsck -p of the fs to write the dump to succeed?
        mount it rw
        write the dump
        clear the dump
        exit
    else
        does fsck -y of the fs without swap succeed?
            mount, write, clear, exit
        else
            ???
At the ??? point I'm not sure how best to proceed, since if we swapon to
the same partition with the dump, it's likely to corrupt the dump, yes?
On the other hand, we're doing swapon before savecore now, so I guess
I'm curious about how dangerous this really is.
Probably the right thing to do is to swapon, fsck -y, and if it succeeds
then swapoff, and try writing the dump anyway. I just want to be
sure before we start re-writing rc.d/savecore.
So, the first question is does the pseudocode above look reasonable, and
the second question is what's the likelihood of getting an option to
savecore to detect a dump to play with?
Doug

I still think that the real problem is in running swapon before
savecore.  In 99% of the cases out there, RAM scales with storage,
so I really can't imagine fsck needing to swap, and certainly not
in its 'preen-before-background' mode.
Scott



Re: swapon vs savecore dilemma

2003-09-02 Thread Matthew D. Fuller
On Tue, Sep 02, 2003 at 12:58:40AM -0600 I heard the voice of
Scott Long, and lo! it spake thus:
 
 I still think that the real problem is in running swapon before
 savecore.  In 99% of the cases out there, RAM scales with storage,
 so I really can't imagine fsck needing to swap, and certainly not
 in its 'preen-before-background' mode.

Note also that (last I heard, anyway) this is often worked around, or
non-issued, by us allocating swap from the bottom of the partition up,
and coredumps happening from the top down.  So, if you've got 512 megs
of swap, and 128 megs of ram, you'd need to use 384 megs of swap (+/-
housekeeping) before you corrupted your core.
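
In numbers, as a tiny Python sketch.  It assumes the dump is about the
size of RAM and that swap really is allocated from the opposite end of
the partition; the function name is made up.

```python
def swap_headroom_mb(swap_mb: int, dump_mb: int) -> int:
    """Swap that can be used before allocation reaches the dump,
    ignoring housekeeping overhead."""
    return max(swap_mb - dump_mb, 0)


print(swap_headroom_mb(512, 128))   # 384
print(swap_headroom_mb(128, 128))   # 0
```

The second call is the uncomfortable case: with swap sized equal to
RAM there is no headroom at all.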


-- 
Matthew Fuller (MF4839)   |  [EMAIL PROTECTED]
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

The only reason I'm burning my candle at both ends, is because I
  haven't figured out how to light the middle yet


Re: swapon vs savecore dilemma

2003-09-02 Thread Doug Barton
On Tue, 2 Sep 2003, Matthew D. Fuller wrote:

 On Tue, Sep 02, 2003 at 12:58:40AM -0600 I heard the voice of
 Scott Long, and lo! it spake thus:
 
  I still think that the real problem is in running swapon before
  savecore.  In 99% of the cases out there, RAM scales with storage,
  so I really can't imagine fsck needing to swap, and certainly not
  in its 'preen-before-background' mode.

 Note also that (last I heard, anyway) this is often worked around, or
 non-issued, by us allocating swap from the bottom of the partition up,
 and coredumps happening from the top down.  So, if you've got 512 megs
 of swap, and 128 megs of ram, you'd need to use 384 megs of swap (+/-
 housekeeping) before you corrupted your core.

I agree that this _should_ be the case, but I've seen the advice to
configure swap space equal to the amount of memory often enough that
I'm nervous about treating this as a safe assumption.

Doug

-- 

This .signature sanitized for your protection



Re: swapon vs savecore dilemma

2003-09-02 Thread Doug Barton
On Tue, 2 Sep 2003, Scott Long wrote:

 I still think that the real problem is in running swapon before
 savecore.  In 99% of the cases out there, RAM scales with storage,
 so I really can't imagine fsck needing to swap, and certainly not
 in its 'preen-before-background' mode.

I agree, but the problem is that to make this work in a scenario where
the system is well and truly fubar (which is where you're most likely
to want a good dump), just moving it earlier isn't enough.

Doug

-- 

This .signature sanitized for your protection



Re: swapon vs savecore dilemma

2003-09-02 Thread Doug Barton
On Tue, 2 Sep 2003, Poul-Henning Kamp wrote:

 In message [EMAIL PROTECTED], Doug Barton writes:
 On Tue, 2 Sep 2003, Poul-Henning Kamp wrote:
 
  Hmm, that was an unfortunate side effect.
 
 Heh, well, stuff happens. I think your idea of opening swap exclusive is
 probably a good one, but it will require some gymnastics to accommodate
 it.

 Yeah, but I'm ENOTIME right now, so I've just dropped the exclusive
 bit for now with an XXX comment.

I wasn't suggesting that you do the rc part, I was volunteering myself
(or the team) for that bit. However, the voices are whispering in my ear
that making savecore tell me if I have a dump is really easy to
implement, so I might be able to do this bit too.

Doug

-- 

This .signature sanitized for your protection
