Re: bad sector in gmirror HDD

2011-08-20 Thread Daniel Kalchev

On Aug 20, 2011, at 06:24 , Jeremy Chadwick wrote:

 You might also be wondering: that dd command writes 512 bytes of zeros to
 that LBA, but what about the old data that was there, in case the
 drive remaps the LBA?

If you write zeros at the OS level to an LBA, you will end up with zeros at that
LBA. What else did you expect?

Already remapped LBAs on ATA drives are no longer visible to the user/OS: you
get a perfectly readable sector. Of course it is not at the original physical
location, but as you confirmed, we are done with CHS addressing.

Pending bad sectors are almost always 'corrected', that is, remapped, when you
write to that LBA.

So your script will find only one unreadable sector, and that will be the sector
that is pending reallocation.

It may be that writing zeros to all free space, like

dd if=/dev/zero of=/filesystem/zero bs=1m; rm /filesystem/zero

is enough to remap the pending bad block and not have any unreadable sectors. 
But if the unreadable sector is in a file or directory -- bad luck -- these 
will need to be rewritten.
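
For what it's worth, a quick way to confirm that a rewrite actually triggered a
remap is to compare the relevant SMART attributes before and after. A minimal
sketch, assuming smartmontools is installed and the disk is /dev/ad2 (substitute
your own device):

# snapshot the reallocation-related attributes before rewriting
smartctl -A /dev/ad2 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

# ... rewrite the suspect area, e.g. with the free-space fill shown above ...

# afterwards Current_Pending_Sector should drop; if Reallocated_Sector_Ct went
# up, the drive substituted a spare sector for the bad one
smartctl -A /dev/ad2 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'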

Once upon a time, BSD/OS had a wonderful disk 'repair' utility. It could detect
failing disks by reading every sector (with a nice visual display), or it could
re-write the drive by reading and writing back every sector. On bad blocks it
would retry many times and eventually average what it managed to read (with
error). Having said that, I doubt modern ATA drives will let anything be read
from a pending bad block, but.. who knows.

Daniel



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 18/08/2011 02:15 Steven Hartland said the following:
 In a nutshell, the jail manager we're using will attempt to resurrect the jail
 from a dying state in a few specific scenarios.
 
 Here's an example:
 1. jail restart requested
 2. the jail is stopped, so the java process is killed off, but active tcp
 sessions may prevent the timely full shutdown of the jail.
 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
 starting a new jail we attach to the old one and exec the new java process.
 4. if an existing jail isn't detected, i.e. there were no hanging tcp
 sessions and #2 cleanly shut down the jail, a new jail is created, attached to,
 and the java exec'ed.
 
 The system uses static jail IDs, so it's possible to determine whether an
 existing jail for this service exists or not. This prevents duplicate services
 as well as making services easy to identify by their jail ID.
 
 So what we could be seeing is a race between the jail shutdown and the attach
 of the new process?

Not a jail expert at all, but a few suggestions...

First, wouldn't the 'persist' jail option simplify your life a little bit?

Second, you may want to try monitoring the value of the prison0.pr_uref variable
(e.g. via kgdb) while executing the various scenarios of what you do now.  If,
after finishing a certain scenario, you end up with a value lower than at the
start of the scenario, then that is the troublesome one.
Please note that prison0.pr_uref is composed of the number of non-jailed
processes plus the number of top-level jails.  So take this into account when
comparing prison0.pr_uref values - it's better to record the initial value when
no jails are started, and it's important to keep the number of non-jailed
processes the same (or to account for its changes).
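
For example, a crude way to watch that value on a live system while you run
through the scenarios is a sketch like the following, assuming kgdb is available
and the running kernel matches /boot/kernel/kernel (with its symbols installed):

# log prison0.pr_uref once a minute while exercising the jail scenarios;
# kgdb attaches read-only to the live kernel via /dev/mem
while true; do
    printf '%s ' "$(date '+%H:%M:%S')"
    echo 'p prison0.pr_uref' | kgdb -q /boot/kernel/kernel /dev/mem 2>/dev/null | grep '\$1'
    sleep 60
done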

-- 
Andriy Gapon


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 13:02 Andriy Gapon said the following:
 on 18/08/2011 02:15 Steven Hartland said the following:
 In a nutshell, the jail manager we're using will attempt to resurrect the jail
 from a dying state in a few specific scenarios.

 Here's an example:
 1. jail restart requested
 2. the jail is stopped, so the java process is killed off, but active tcp
 sessions may prevent the timely full shutdown of the jail.
 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
 starting a new jail we attach to the old one and exec the new java process.
 4. if an existing jail isn't detected, i.e. there were no hanging tcp
 sessions and #2 cleanly shut down the jail, a new jail is created, attached to,
 and the java exec'ed.

 The system uses static jail IDs, so it's possible to determine whether an
 existing jail for this service exists or not. This prevents duplicate services
 as well as making services easy to identify by their jail ID.

 So what we could be seeing is a race between the jail shutdown and the attach
 of the new process?
 
 Not a jail expert at all, but a few suggestions...
 
 First, wouldn't the 'persist' jail option simplify your life a little bit?
 
 Second, you may want to try monitoring the value of the prison0.pr_uref variable
 (e.g. via kgdb) while executing the various scenarios of what you do now.  If,
 after finishing a certain scenario, you end up with a value lower than at the
 start of the scenario, then that is the troublesome one.
 Please note that prison0.pr_uref is composed of the number of non-jailed
 processes plus the number of top-level jails.  So take this into account when
 comparing prison0.pr_uref values - it's better to record the initial value when
 no jails are started, and it's important to keep the number of non-jailed
 processes the same (or to account for its changes).

BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.

-- 
Andriy Gapon


Remote installing

2011-08-20 Thread Willem Jan Withagen

Hi,

Today I felt like living dangerously and wanted to upgrade a backup server
from i386 to amd64. Just to see if we could.

Otherwise I'd scrap it and install from a USB stick.

So I have my server running an amd64 GENERIC build, and I export /, /var and
/usr from the server that is to be upgraded.

But upgrading world hits a snag already early on:


empty changed
flags expected schg found none not modified: Operation not supported



This is probably where some program wants to set the immutable flag on
/var/tmp/empty...


But it looks like NFS does not grok that.

Now I've seen plenty of suggestions to do it this way, but I never saw anybody
come back with this complaint.


So I must be omitting something??

--WjW
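
A quick way to confirm that the immutable flag really is what NFS is choking on
(a sketch, assuming the target's filesystems are mounted under /mnt on the
build host):

# try to set and clear the schg flag on a scratch file over the NFS mount;
# "Operation not supported" here would confirm the diagnosis
touch /mnt/var/schg-test
chflags schg /mnt/var/schg-test
ls -lo /mnt/var/schg-test        # -o shows the file-flags column
chflags noschg /mnt/var/schg-test 2>/dev/null
rm -f /mnt/var/schg-test

If that is indeed the failure, one avenue worth checking is the build knob that
skips setting schg during installworld (NO_FSCHG, if I remember its name
correctly - verify it against the build Makefiles for your source tree before
relying on it).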



Re: Remote installing

2011-08-20 Thread Willem Jan Withagen

On 2011-08-20 13:15, Willem Jan Withagen wrote:

Hi,

Today I felt like living dangerously and wanted to upgrade a backup server
from i386 to amd64. Just to see if we could.
Otherwise I'd scrap it and install from a USB stick.

So I have my server running an amd64 GENERIC build, and I export /, /var and
/usr from the server that is to be upgraded.

But upgrading world hits a snag already early on:


empty changed
flags expected schg found none not modified: Operation not supported


This is probably where some program wants to set the immutable flag on
/var/tmp/empty...

But it looks like NFS does not grok that.

Now I've seen plenty of suggestions to do it this way, but I never saw anybody
come back with this complaint.

So I must be omitting something??


I took a closer look at the errors.
---
cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys
cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* .
ln: ./man1: Permission denied
ln: ./man1aout: Permission denied
ln: ./man2: Permission denied
ln: ./man3: Permission denied
ln: ./man4: Permission denied
ln: ./man5: Permission denied
ln: ./man6: Permission denied
ln: ./man7: Permission denied
ln: ./man8: Permission denied
ln: ./man9: Permission denied
-

Which comes from the distrib-dirs target in etc.

Why would an ln -sf like that fail?
The filesystems are exported with -maproot=0.

--WjW
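
A quick client-side sanity check for the -maproot=0 question, assuming the
export is mounted at /mnt: create a file as root and see who ends up owning it.

# if -maproot=0 is effective the file is owned by root; if root is being
# squashed it shows up as the anonymous user (or the operation fails)
touch /mnt/usr/share/man/en.ISO8859-1/.maproot-test
ls -l /mnt/usr/share/man/en.ISO8859-1/.maproot-test
rm -f /mnt/usr/share/man/en.ISO8859-1/.maproot-test

Also, if /etc/exports was edited recently, remember that mountd(8) only rereads
it on SIGHUP, and the client may need to re-mount before new options take
effect.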





Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



BTW, I suspect the following scenario, but I am not able to
verify it either via testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


Ahh, now that explains all of the panic scenarios we've experienced:
1. A jail stop / start causing the panic, but only after at least a
few days' worth of uptime.

What we're seeing here is enough leakage of pr_uref from the restarted
jails to decrement prison0.pr_uref to 0 even with all the standard
unjailed processes still running.

2. A machine reboot, after all jails have been stopped, but after
less uptime than in #1.

In this case we haven't seen enough leakage to decrement
prison0.pr_uref to 0 given the number of prison0 processes, but
it has been incorrectly decremented; so as soon as the reboot kicks
in and prison0 processes start exiting, prison0.pr_uref gets
decremented further and again hits 0 when it shouldn't.


Now if this is the case, we should be able to confirm it with a little
more info.

1. What exactly does pr_uref represent?
2. Can the value it should have be calculated by examining other
details of the system, i.e. the number of running processes and the
number of running jails?

If we can calculate the value that prison0.pr_uref should have, then
by examining the machines we have which have been up for a while,
we should be able to confirm whether an incorrect value is present on
them and hence prove this is the case.

Ideally a little script to run in kgdb to test this would be the
best way to go.
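
Not a definitive tool, but a rough sketch of that kind of check, under the
assumption that prison0.pr_uref should be approximately the number of
non-jailed processes plus the number of top-level jails (hierarchical or dying
jails would need extra handling):

#!/bin/sh
# compare prison0.pr_uref against a rough expected value (sketch only)
kernel=/boot/kernel/kernel

# value the kernel actually holds, read via kgdb from the live system
actual=$(echo 'p prison0.pr_uref' | kgdb -q $kernel /dev/mem 2>/dev/null |
         awk '/^\$1/ { print $NF }')

# processes running outside any jail (jail id 0)
unjailed=$(ps -axo jid= | awk '$1 == 0' | wc -l | tr -d ' ')

# running jails (counts every jail; subtract children if hierarchy is used)
jails=$(jls jid | wc -l | tr -d ' ')

echo "prison0.pr_uref = ${actual:-?}, expected around $((unjailed + jails))"

Any persistent mismatch after a round of jail restarts would point at the
suspected leak.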

   Regards
   Steve






Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Friday 19 August 2011 18:32:13 Andriy Gapon wrote:
 on 19/08/2011 00:24 Hans Petter Selasky said the following:
  On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
  If you can help Hans to figure out what is wrong with the USB subsystem
  in this respect, that would help us all.
  
  Hi,
  
  usb_busdma.c:   /* we use mtx_owned() instead of this function */
  usb_busdma.c:   owned = mtx_owned(uptag->mtx);
  usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
  usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
  usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
  usb_hub.c:  if (mtx_owned(&bus->bus_mtx)) {
  usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
  usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
  usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
  usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) {
 
  One fix you will need to make, if mtx_owned() is not giving the correct value, is:
 First, could you please clarify what is the correct, or rather - expected,
 value in this case.  It's not immediately clear to me if we should
 consider all locks as owned or un-owned in a situation where all locks are
 actually skipped behind the scenes.
 Maybe USB code should explicitly check for that condition as to not make
 any unsafe assumptions.
 
 Second, it's not clear to me what the above list actually represents in the
 context of this discussion.

Hi,

The mtx_owned() check is not only used to assert mutex ownership, but also to
figure out which context the function is being called from. If the correct mutex
is not already locked, we postpone the work until later. In the panic case there
is no way to postpone work, so this check should be skipped on panic, because
there is no other thread to hand the work to.

--HPS


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


I've just checked a number of the panic dumps from the
past few days and they all have prison0.pr_uref = 0, which confirms
the cause of the panic.

I've tried scripting continuous jail starts and stops, but even after thousands
of iterations I have been unable to trigger this on my test machine, so
I'm going to dig into the jail code by inspection to see if I can find out how
it's incorrectly decrementing prison0.

   Regards
   Steve





Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Andriy Gapon
on 20/08/2011 16:35 Hans Petter Selasky said the following:
 On Friday 19 August 2011 18:32:13 Andriy Gapon wrote:
 on 19/08/2011 00:24 Hans Petter Selasky said the following:
 On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
 If you can help Hans to figure out what is wrong with the USB subsystem
 in this respect, that would help us all.

 Hi,

 usb_busdma.c:   /* we use mtx_owned() instead of this function */
 usb_busdma.c:   owned = mtx_owned(uptag->mtx);
 usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
 usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
 usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
 usb_hub.c:  if (mtx_owned(&bus->bus_mtx)) {
 usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
 usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
 usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
 usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) {

 One fix you will need to make, if mtx_owned() is not giving the correct value, is:
 First, could you please clarify what is the correct, or rather - expected,
 value in this case.  It's not immediately clear to me if we should
 consider all locks as owned or un-owned in a situation where all locks are
 actually skipped behind the scenes.
 Maybe USB code should explicitly check for that condition as to not make
 any unsafe assumptions.

 Second, it's not clear to me what the above list actually represents in the
 context of this discussion.
 
 Hi,
 
 The mtx_owned() is not only used to assert mutex ownership, but also to 
 figure 
 out which context the function is being called from. If the correct mutex is 
 not locked already we postpone the work until later. In the panic case, there 
 is no way to postpone work, so this check should be skipped in case of panic, 
 because there is no other thread to put work to.

Now I see, but I still cannot draw a conclusion...
So what would you suggest - should the USB code explicitly check for panicstr (or
SCHEDULER_STOPPED in the future)?  Or what should mtx_owned() return - true or
false?

-- 
Andriy Gapon


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 18:51 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 BTW, I suspect the following scenario, but I am not able to verify it either 
 via
 testing or in the code:
 - last process in a dying jail exits
 - pr_uref of the jail reaches zero
 - pr_uref of prison0 gets decremented
 - you attach to the jail and resurrect it
 - but pr_uref of prison0 stays decremented

 Repeat this enough times and prison0.pr_uref reaches zero.
 To reach zero even sooner just kill enough of non-jailed processes.
 
 I've just checked a number of the panic dumps from the
 past few days and they all have prison0.pr_uref = 0, which confirms
 the cause of the panic.
 
 I've tried scripting continuous jail starts and stops, but even after thousands
 of iterations I have been unable to trigger this on my test machine, so
 I'm going to dig into the jail code by inspection to see if I can find out how
 it's incorrectly decrementing prison0.

Steve,

thanks for doing this!  I'll reiterate my suspicion just in case - I think that
you should look for the cases where you stop a jail, but then re-attach and
resurrect the jail before it's completely dead.

-- 
Andriy Gapon


Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
 SCHEDULER_STOPPED

At the present moment the USB code needs to check for both SCHEDULER_STOPPED and
cold. If this state could also be set during bootup and cleared at the same time
as cold, that would be very good.

--HPS


Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Andriy Gapon
on 20/08/2011 19:54 Hans Petter Selasky said the following:
 On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
 SCHEDULER_STOPPED
 
 The USB code needs to check for the SCHEDULER_STOPPED and cold at the present 
 moment. If this state can be set during bootup, and cleared at the same time 
 like cold, it would be very good.

Sorry again - not sure if I follow.
SCHEDULER_STOPPED is supposed to be set on panic and never be reset.  It's like
a mirror of 'cold' in a sense.

-- 
Andriy Gapon


Re: USB/coredump hangs in 8 and 9

2011-08-20 Thread Hans Petter Selasky
On Saturday 20 August 2011 19:09:02 Andriy Gapon wrote:
 on 20/08/2011 19:54 Hans Petter Selasky said the following:
  On Saturday 20 August 2011 18:45:57 Andriy Gapon wrote:
  SCHEDULER_STOPPED
  
  The USB code needs to check for the SCHEDULER_STOPPED and cold at the
  present moment. If this state can be set during bootup, and cleared at
  the same time like cold, it would be very good.
 
 Sorry again - not sure if I follow.
 SCHEDULER_STOPPED is supposed to be set on panic and never be reset.  It's
 like a mirror of 'cold' in a sense.

OK. Then you should add a test && !SCHEDULER_STOPPED where I pointed out:

static void
usbd_callback_wrapper(struct usb_xfer_queue *pq)
{
	struct usb_xfer *xfer = pq->curr;
	struct usb_xfer_root *info = xfer->xroot;

	USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
	if (!mtx_owned(info->xfer_mtx) && !SCHEDULER_STOPPED) {
		/*
		 * Cases that end up here:
		 *

And also ensure that no mutex asserts can trigger further panics.

--HPS


Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:

 On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
 
 On Aug 19, 2011, at 7:21 PM, Jeremy Chadwick wrote:
 
 On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
 System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
 
 After a recent power failure, I'm seeing this in my logs:
 
 Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently 
 unreadable (pending) sectors
 
 I doubt this is related to a power failure.
 
 Searching on that error message, I was led to believe that identifying the 
 bad sector and
 running dd to read it would cause the HDD to reallocate that bad block.
 
 http://smartmontools.sourceforge.net/badblockhowto.html
 
 This is incorrect (meaning you've misunderstood what's written there).
 
 Unreadable LBAs can be a result of the LBA being actually bad (as in
 uncorrectable), or the LBA being marked suspect.  In either case the
 LBA will return an I/O error when read.
 
 If the LBAs are marked suspect, the drive will perform re-analysis of
 the LBA (to determine if the LBA can be read and the data re-mapped, or
 if it cannot then the LBA is marked uncorrectable) when you **write** to
 the LBA.
 
 The above smartd output doesn't tell me much.  Providing actual SMART
 attribute data (smartctl -a) for the drive would help.  The brand of the
 drive, the firmware version, and the model all matter -- every drive
 behaves a little differently.
 
 Information such as this?  
 http://beta.freebsddiary.org/smart-fixing-bad-sector.php
 
 Yes, perfect.  Thank you.  First thing first: upgrade smartmontools to
 5.41.  Your attributes will be the same after you do this (the drive is
 already in smartmontools' internal drive DB), but I often have to remind
 people that they really need to keep smartmontools updated as often as
 possible.  The changes between versions are vast; this is especially
 important for people with SSDs (I'm responsible for submitting some
 recent improvements for Intel 320 and 510 SSDs).

Done.

 Anyway, the drive (albeit an old PATA Maxtor) appears to have three
 anomalies:
 
 1) One confirmed reallocated LBA (SMART attribute 5)
 
 2) One suspect LBA (SMART attribute 197)
 
 3) A very high temperature of 51C (SMART attribute 194).  If this drive
 is in an enclosure or in a system with no fans this would be
 understandable, otherwise this is a bit high.  My home workstation which
 has only one case fan has a drive with more platters than your Maxtor,
 and it idles at ~38C.  Possibly this drive has been undergoing constant
 I/O recently (which does greatly increase drive temperature)?  Not sure.
 I'm not going to focus too much on this one.

This is an older system.  I suspect insufficient ventilation.  I'll look at 
getting
a new case fan, if not some HDD fans.

 The SMART error log also indicates an LBA failure at the 26000 hour mark
 (which is 16 hours prior to when you did smartctl -a /dev/ad2).  Whether
 that LBA is the remapped one or the suspect one is unknown.  The LBA was
 5566440.
 
 The SMART tests you did didn't really amount to anything; no surprise.
 short and long tests usually do not test the surface of the disk.  There
 are some drives which do it on a long test, but as I said before,
 everything varies from drive to drive.
 
 Furthermore, on this model of drive, you cannot do a surface scan via
 SMART.  Bummer.  That's indicated in the Offline data collection
 capabilities section at the top, where it reads:
 
   No Selective Self-test supported.
 
 So you'll have to use the dd method.  This takes longer than if surface
 scanning was supported by the drive, but is acceptable.  I'll get to how
 to go about that in a moment.

FWIW, I've done a dd read of the entire suspect disk already.  Just two errors.
From the URL mentioned above:

[root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
dd: /dev/ad2: Input/output error
2717+0 records in
2717+0 records out
2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
dd: /dev/ad2: Input/output error
38170+1 records in
38170+1 records out
40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
[root@bast:~] # 

That seems to indicate two problems.  Are those the values I should be using 
with dd?

I did some more precise testing:

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
dd: /dev/ad2: Input/output error
9+0 records in
9+0 records out
4608 bytes transferred in 5.368668 secs (858 bytes/sec)

real0m5.429s
user0m0.000s
sys 0m0.010s

NOTE: that's 9 blocks later than mentioned in smartctl

The above generated this in /var/log/messages:

Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR>
error=40<UNCORRECTABLE> LBA=5566449
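
A brute-force way to pin down exactly which LBAs in that neighbourhood fail is
to read them one sector at a time. A sketch of the idea (device and starting
LBA are just the values from above; adjust to taste):

#!/bin/sh
# read 64 single sectors starting just below the reported LBA and list the
# ones that return an I/O error
disk=/dev/ad2
start=5566430
count=64

lba=$start
while [ $lba -lt $((start + count)) ]; do
    if ! dd if=$disk of=/dev/null bs=512 iseek=$lba count=1 2>/dev/null; then
        echo "unreadable LBA: $lba"
    fi
    lba=$((lba + 1))
done

This is essentially what the bad_block_scan script mentioned below does, just
in miniature.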


 [stuff snipped]


 That said:
 
 http://jdc.parodius.com/freebsd/bad_block_scan
 
 If you run this on your ad2 drive, I'm hoping what you'll find are two
 LBAs which can't be read -- one will be the remapped LBA 

Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov
On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
 On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov 
 melif...@ipfw.ruwrote:
 
  On 10.08.2011 19:16, per...@pluto.rain.com wrote:
 
  Chuck Swigercswi...@mac.com  wrote:
 
   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
 
  I am trying to set up 64GB partitions for swap for a system that
  has 64GB of RAM (with the idea to dump kernel core etc). But, on
  8-stable as of today I get:
 
  WARNING: reducing size to maximum of 67108864 blocks per swap unit
 
  Is there workaround for this limitation?
 
 
  Another interesting question:
 
  swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
 
  Block device size is passed to swaponsomething() in a number of _disk_ blocks
  (e.g. in DEV_BSIZE=512). After that, the kernel b-list's (on top of which the
  swap pager is built) maximum-objects check is enforced.
 
  The (possible) problem is that the real object count we will operate on is not
  the value passed to swaponsomething(), since it is calculated in the wrong units.
 
  We should check the b-list limit against the (X * DEV_BSIZE (512) / PAGE_SIZE)
  value, which is roughly (X / 8), so we should be able to address 32*8=256G.
 
  The code should look like this:
 
  Index: vm/swap_pager.c
  ===================================================================
  --- vm/swap_pager.c (revision 223877)
  +++ vm/swap_pager.c (working copy)
  @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
 u_long mblocks;
 
 /*
  +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
  +* First chop nblks off to page-align it, then convert.
  +*
  +* sw->sw_nblks is in page-sized chunks now too.
  +*/
  +   nblks &= ~(ctodb(1) - 1);
  +   nblks = dbtoc(nblks);
  +
  +   /*
 
  * If we go beyond this, we get overflows in the radix
  * tree bitmap code.
  */
  @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
 mblocks);
 nblks = mblocks;
 }
  -   /*
  -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
  -* First chop nblks off to page-align it, then convert.
  -*
  -* sw->sw_nblks is in page-sized chunks now too.
  -*/
  -   nblks &= ~(ctodb(1) - 1);
  -   nblks = dbtoc(nblks);
 
  sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
  sp->sw_vp = vp;
 
 
  (move pages recalculation before b-list check)
 
 
  Can someone comment on this?
 
 
 I believe that you are correct.  Have you tried testing this change on a
 large swap device?
I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.

When the initial code was committed, our daddr_t was 32 bit; I checked
the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32 bits of daddr_t.

Especially interesting is the following typedef:
typedef uint32_t	u_daddr_t;	/* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.

I wonder whether we could just use full 64bit and de-facto remove the
limitation on the swap partition size.




Re: bad sector in gmirror HDD

2011-08-20 Thread Alex Samorukov
You can run a long self-test with smartmontools (-t long). Then you can get
the failed sector number from smartmontools (-l selftest) and then you can
use dd to write zeros to that specific sector. I also highly recommend
setting up smartd as a daemon and monitoring the number of reallocated
sectors. If they grow again, then it is a good time to retire this disk.
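
For completeness, the actual write step for a single suspect sector might look
like this. It is destructive to that sector's contents, so treat it as a sketch
to adapt; LBA 5566449 is the one from the kernel log earlier in the thread, and
/dev/ad2 is assumed:

# overwrite one 512-byte sector so the drive can reallocate it; double-check
# the LBA (and which file, if any, lives there) before running this
dd if=/dev/zero of=/dev/ad2 bs=512 oseek=5566449 count=1

# then re-read it and re-check the pending-sector count
dd if=/dev/ad2 of=/dev/null bs=512 iseek=5566449 count=1
smartctl -A /dev/ad2 | grep -i pending

Since this disk is a gmirror component, it may be safer to remove it from the
mirror first and re-insert it afterwards so the rewritten area is rebuilt from
the good disk, rather than writing underneath an active mirror.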

[root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
dd: /dev/ad2: Input/output error
2717+0 records in
2717+0 records out
2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
dd: /dev/ad2: Input/output error
38170+1 records in
38170+1 records out
40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
[root@bast:~] #

That seems to indicate two problems.  Are those the values I should be using
with dd?





Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox
On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov melif...@ipfw.ruwrote:

 On 10.08.2011 19:16, per...@pluto.rain.com wrote:

 Chuck Swigercswi...@mac.com  wrote:

  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:

 I am trying to set up 64GB partitions for swap for a system that
 has 64GB of RAM (with the idea to dump kernel core etc). But, on
 8-stable as of today I get:

 WARNING: reducing size to maximum of 67108864 blocks per swap unit

 Is there workaround for this limitation?


 Another interesting question:

 swap pager operates in page blocks (PAGE_SIZE=4k on common arch).

 Block device size is passed to swaponsomething() in a number of _disk_ blocks
 (e.g. in DEV_BSIZE=512). After that, the kernel b-list's (on top of which the
 swap pager is built) maximum-objects check is enforced.

 The (possible) problem is that the real object count we will operate on is not
 the value passed to swaponsomething(), since it is calculated in the wrong units.

 We should check the b-list limit against the (X * DEV_BSIZE (512) / PAGE_SIZE)
 value, which is roughly (X / 8), so we should be able to address 32*8=256G.

 The code should look like this:

 Index: vm/swap_pager.c
 ===================================================================
 --- vm/swap_pager.c (revision 223877)
 +++ vm/swap_pager.c (working copy)
 @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
u_long mblocks;

/*
 +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
 +* First chop nblks off to page-align it, then convert.
 +*
 +* sw->sw_nblks is in page-sized chunks now too.
 +*/
 +   nblks &= ~(ctodb(1) - 1);
 +   nblks = dbtoc(nblks);
 +
 +   /*

 * If we go beyond this, we get overflows in the radix
 * tree bitmap code.
 */
 @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
mblocks);
nblks = mblocks;
}
 -   /*
 -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
 -* First chop nblks off to page-align it, then convert.
 -*
 -* sw->sw_nblks is in page-sized chunks now too.
 -*/
 -   nblks &= ~(ctodb(1) - 1);
 -   nblks = dbtoc(nblks);

sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
sp->sw_vp = vp;


 (move pages recalculation before b-list check)


 Can someone comment on this?


I believe that you are correct.  Have you tried testing this change on a
large swap device?

Alan
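
As an aside, the arithmetic behind the warning and the 256G figure quoted above
is easy to check (illustrative only):

# 67108864 blocks of DEV_BSIZE (512 bytes) is the current per-device cap:
echo $(( 67108864 * 512 / 1024 / 1024 / 1024 ))     # -> 32  (GB)
# counted in PAGE_SIZE (4096-byte) chunks, the same block limit would cover:
echo $(( 67108864 * 4096 / 1024 / 1024 / 1024 ))    # -> 256 (GB)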


Re: bad sector in gmirror HDD

2011-08-20 Thread Diane Bruce
On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
 On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
 
  On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
...
  Information such as this?  
  http://beta.freebsddiary.org/smart-fixing-bad-sector.php
...
  3) A very high temperature of 51C (SMART attribute 194).  If this drive
  is in an enclosure or in a system with no fans this would be

...

eh? What's the temperature of the second drive?

...

 This is an older system.  I suspect insufficient ventilation.  I'll look at 
 getting
 a new case fan, if not some HDD fans.

...

  I still suggest you replace the drive, although given its age I doubt

Older drive and errors starting to happen, replace ASAP.

  you'll be able to find a suitable replacement.  I tend to keep disks
  like this around for testing/experimental purposes and not for actual
  use.
 
 I have several unused 80GB HDD I can place into this system.  I think that's
 what I'll wind up doing.  But I'd like to follow this process through and get 
 it documented
 for future reference.

If the data is valuable, the sooner the better. 
It's actually somewhat saner if the two drives are not from the same lot.


 -- 
 Dan Langille - http://langille.org
 

- Diane
-- 
- d...@freebsd.org d...@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?


Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille

On Aug 20, 2011, at 1:54 PM, Alex Samorukov wrote:

 [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
 dd: /dev/ad2: Input/output error
 2717+0 records in
 2717+0 records out
 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
 dd: /dev/ad2: Input/output error
 38170+1 records in
 38170+1 records out
 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
 [root@bast:~] #
 
 That seems to indicate two problems.  Are those the values I should be using
 with dd?
 
 


 You can run long self-test in smartmontools (-t long). Then you can get 
 failed sector number from the smartmontools (-l selftest) and then you can 
 use DD to write zero to the specific sector.

Already done: http://beta.freebsddiary.org/smart-fixing-bad-sector.php

Search for 786767

Or did you mean something else?

That doesn't seem to map to a particular sector though... I ran it for a 
while...

# time dd of=/dev/null if=/dev/ad2 bs=512 iseek=786767 
^C4301949+0 records in
4301949+0 records out
2202597888 bytes transferred in 780.245828 secs (2822954 bytes/sec)

real13m0.256s
user0m22.087s
sys 3m24.215s



 Also i am highly recommending to setup smartd as daemon and to monitor number 
 of relocated sectors. If they will grow again - then it is a good time to 
 utilize this disk.

It is running, but with nothing custom in the .conf file.
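
In case it is useful, a minimal custom setup along the lines Alex suggests
might look like this (a sketch using the port's default paths; the thresholds
and test schedule are arbitrary examples):

# monitor all attributes, mail root on problems, warn from 50C upwards,
# and schedule a long self-test every Sunday at 03:00
cat >> /usr/local/etc/smartd.conf <<'EOF'
/dev/ad0 -a -m root -W 4,45,50 -s L/../../7/03
/dev/ad2 -a -m root -W 4,45,50 -s L/../../7/03
EOF

/usr/local/etc/rc.d/smartd restart

If the stock smartd.conf still has a DEVICESCAN line enabled, comment it out so
the explicit entries are actually used.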

-- 
Dan Langille - http://langille.org



Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille

On Aug 20, 2011, at 2:04 PM, Diane Bruce wrote:

 On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
 On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
 
 On Fri, Aug 19, 2011 at 09:39:17PM -0400, Dan Langille wrote:
 ...
 Information such as this?  
 http://beta.freebsddiary.org/smart-fixing-bad-sector.php
 ...
 3) A very high temperature of 51C (SMART attribute 194).  If this drive
 is in an enclosure or in a system with no fans this would be
 
 ...
 
 eh? What's the temperature of the second drive?

Roughly the same:


[root@bast:/home/dan/tmp] # smartctl -a /dev/ad2 | grep -i temp
194 Temperature_Celsius     0x0022   080   076   042    Old_age   Always       -       51

[root@bast:/home/dan/tmp] # smartctl -a /dev/ad0 | grep -i temp
194 Temperature_Celsius     0x0022   081   074   042    Old_age   Always       -       49
[root@bast:/home/dan/tmp] # 


FYI, when I first set up smartd, I questioned those values.  The HDD in 
question, at the time,
did not feel hot to the touch.

 
 ...
 
 This is an older system.  I suspect insufficient ventilation.  I'll look at 
 getting
 a new case fan, if not some HDD fans.
 
 ...
 
 I still suggest you replace the drive, although given its age I doubt
 
 Older drive and errors starting to happen, replace ASAP.
 
 you'll be able to find a suitable replacement.  I tend to keep disks
 like this around for testing/experimental purposes and not for actual
 use.
 
 I have several unused 80GB HDD I can place into this system.  I think that's
 what I'll wind up doing.  But I'd like to follow this process through and 
 get it documented
 for future reference.
 
 If the data is valuable, the sooner the better. 
 It's actually somewhat saner if the two drives are not from the same lot.

Noted.

-- 
Dan Langille - http://langille.org



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Roger Marquis

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


Interesting.  We've been getting kernel panics in -stable but with only
one jail started at boot without being restarted.

Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
How about softupdates, gmirror, and/or anything in sysctl.conf?

Roger Marquis


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Roger Marquis marq...@roble.com

To: freebsd-j...@freebsd.org; freebsd-stable@FreeBSD.org
Sent: Saturday, August 20, 2011 7:10 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


Interesting.  We've been getting kernel panics in -stable but with only
one jail started at boot without being restarted.

Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
How about softupdates, gmirror, and/or anything in sysctl.conf?


If you're not restarting things it may be unrelated. No SAS; polling is
compiled in but no devices have it active, and we're using ZFS only.

Are you seeing a double fault panic?

   Regards
   Steve





Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sat, Aug 20, 2011 at 07:54:30PM +0200, Alex Samorukov wrote:
 You can run long self-test in smartmontools (-t long). Then you can
 get failed sector number from the smartmontools (-l selftest) and
 then you can use DD to write zero to the specific sector.

This is inaccurate advice.  I covered this in my reply already as well:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063665.html

Quote:

The SMART tests you did didn't really amount to anything; no surprise.
short and long tests usually do not test the surface of the disk.  There
are some drives which do it on a long test, but as I said before,
everything varies from drive to drive.

TL;DR version: smartctl -t long  !=  smartctl -t select.

The OP's drive does not support selective scans (-t select), and long
turned up nothing (no surprise there either).  So, using dd to find the
bad LBAs is the only choice he has.

 Also i am highly recommending to setup smartd as daemon and to monitor
 number of relocated sectors. If they will grow again - then it is a
 good time to utilize this disk.

You have to know what you're looking at and how to interpret the data
smartd gives you for it to be useful.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
Dan, I will respond to your reply sometime tomorrow.  I do not have time
to review the Email today (~7.7KBytes), but will have time tomorrow.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: 32GB limit per swap device?

2011-08-20 Thread Alan Cox

On 08/20/2011 12:41, Kostik Belousov wrote:

On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:

On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikovmelif...@ipfw.ruwrote:


On 10.08.2011 19:16, per...@pluto.rain.com wrote:


Chuck Swigercswi...@mac.com   wrote:

  On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:

I am trying to set up 64GB partitions for swap for a system that
has 64GB of RAM (with the idea to dump kernel core etc). But, on
8-stable as of today I get:

WARNING: reducing size to maximum of 67108864 blocks per swap unit

Is there workaround for this limitation?


Another interesting question:

swap pager operates in page blocks (PAGE_SIZE=4k on common arch).

Block device size is passed to swaponsomething() in a number of _disk_ blocks
(e.g. in DEV_BSIZE=512). After that, the kernel b-list's (on top of which the
swap pager is built) maximum-objects check is enforced.

The (possible) problem is that the real object count we will operate on is not
the value passed to swaponsomething(), since it is calculated in the wrong units.

We should check the b-list limit against the (X * DEV_BSIZE (512) / PAGE_SIZE)
value, which is roughly (X / 8), so we should be able to address 32*8=256G.

The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c (revision 223877)
+++ vm/swap_pager.c (working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
u_long mblocks;

/*
+* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+* First chop nblks off to page-align it, then convert.
+*
+* sw->sw_nblks is in page-sized chunks now too.
+*/
+   nblks &= ~(ctodb(1) - 1);
+   nblks = dbtoc(nblks);
+
+   /*

 * If we go beyond this, we get overflows in the radix
 * tree bitmap code.
 */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
mblocks);
nblks = mblocks;
}
-   /*
-* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-* First chop nblks off to page-align it, then convert.
-*
-* sw->sw_nblks is in page-sized chunks now too.
-*/
-   nblks &= ~(ctodb(1) - 1);
-   nblks = dbtoc(nblks);

sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
sp->sw_vp = vp;


(move pages recalculation before b-list check)


Can someone comment on this?



I believe that you are correct.  Have you tried testing this change on a
large swap device?

I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.



I'm uncertain whether the current limit, 0x40000000 /
BLIST_META_RADIX, is exact or not, but I doubt that it is too large.



When the initial code was committed, our daddr_t was 32bit, I checked
the RELENG_4 sources. Current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32bits of daddr_t.

Esp. interesting looks the following typedef:
typedef uint32_t	u_daddr_t;	/* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.

I wonder whether we could just use full 64bit and de-facto remove the
limitation on the swap partition size.


I would rather argue first that the subr_blist code should not be using
daddr_t at all.  The code is abusing daddr_t and defining u_daddr_t to
represent things that are not disk addresses.  Instead, it should either 
define its own type or directly use (u)int*_t.  Then, as for choosing 
between 32 and 64 bits, I'm skeptical of using this structure for 
managing more than 32 bits worth of blocks, given the amount of RAM it 
will use.





Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote:

 Dan, I will respond to your reply sometime tomorrow.  I do not have time
 to review the Email today (~7.7KBytes), but will have time tomorrow.


No worries.  Thank you.

-- 
Dan Langille - http://langille.org



Re: bad sector in gmirror HDD

2011-08-20 Thread Alex Samorukov



The SMART tests you did didn't really amount to anything; no surprise.
short and long tests usually do not test the surface of the disk.  There
are some drives which do it on a long test, but as I said before,
everything varies from drive to drive.

That is not a correct statement, sorry. The long test tries to read all the
data from the surface (and does some other things).


// one of the smartmontools developers and sysutils/smartmontools 
maintainer.





Re: 32GB limit per swap device?

2011-08-20 Thread Alexander V. Chernikov

Alan Cox wrote:
 On 08/20/2011 12:41, Kostik Belousov wrote:
 On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
 On Thu, Aug 18, 2011 at 3:16 AM, Alexander V.
 Chernikovmelif...@ipfw.ruwrote:

 On 10.08.2011 19:16, per...@pluto.rain.com wrote:

 Chuck Swigercswi...@mac.com   wrote:

   On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
 I am trying to set up 64GB partitions for swap for a system that
 has 64GB of RAM (with the idea to dump kernel core etc). But, on
 8-stable as of today I get:

 WARNING: reducing size to maximum of 67108864 blocks per swap unit

 Is there workaround for this limitation?

 Another interesting question:

 swap pager operates in page blocks (PAGE_SIZE=4k on common arch).

 Block device size in passed to swaponsomething() in number of _disk_
 blocks
   (e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
 which swap
 pager is build) maximum objects check is enforced.

 The (possible) problem is that real object count we will operate on
 is not
 the value passed to swaponsomething() since it is calculated in
 wrong units.

 we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value
 which
 is rough (X / 8) so we should be able to address 32*8=256G.

 The code should look like this:

 Index: vm/swap_pager.c
  ===================================================================
 --- vm/swap_pager.c (revision 223877)
 +++ vm/swap_pager.c (working copy)
 @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
 u_long
 u_long mblocks;

 /*
 +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
 chunks.
 +* First chop nblks off to page-align it, then convert.
 +*
  +* sw->sw_nblks is in page-sized chunks now too.
 +*/
  +   nblks &= ~(ctodb(1) - 1);
 +   nblks = dbtoc(nblks);
 +
 +   /*

  * If we go beyond this, we get overflows in the radix
  * tree bitmap code.
  */
 @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
 u_long
 mblocks);
 nblks = mblocks;
 }
 -   /*
 -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
 chunks.
 -* First chop nblks off to page-align it, then convert.
 -*
  -* sw->sw_nblks is in page-sized chunks now too.
 -*/
  -   nblks &= ~(ctodb(1) - 1);
 -   nblks = dbtoc(nblks);

 sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
  sp->sw_vp = vp;


 (move pages recalculation before b-list check)


 Can someone comment on this?


 I believe that you are correct.  Have you tried testing this change on a
 large swap device?
I will try tomorrow.

 I probably agree too, but I am in the process of re-reading the swap
 code,
 and I do not quite believe in the limit.

 
  I'm uncertain whether the current limit, 0x40000000 /
  BLIST_META_RADIX, is exact or not, but I doubt that it is too large.

It is not exact.  It is a rough estimation of
sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() is not able
to allocate more than 4G; I'm not sure if that is still true these days).
X is the number of blocks we need to store. The actual number, however, is
X / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...), but it differs
from X not very much.

A blist can be seen as a tree of radix trees, with the metainformation for all
those radix trees allocated in a single allocation, which imposes this
limit. The metainformation is used to find free blocks more quickly.

A single linear allocation is required to advance to the next radix tree on
the same level very fast:


*   *   *   *   *
**  **  **  **  **

^^^
Some kind of schema with 3 levels in the tree and BLIST_META_RADIX=2 (instead
of 16).



 
 When the initial code was committed, our daddr_t was 32bit, I checked
 the RELENG_4 sources. Current code uses int64_t for daddr_t. My
 impression
 right now is that we only utilize the low 32bits of daddr_t.

 Esp. interesting looks the following typedef:
 typedef uint32_t	u_daddr_t;	/* unsigned disk address */
 which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.

 I wonder whether we could just use full 64bit and de-facto remove the
 limitation on the swap partition size.

This would double the size of struct blmeta_t and cause 2*X memory usage for
every swap configuration.

 
 I would rather argue first that the subr_blist code should not be using
 daddr_t at all.  The code is abusing daddr_t and defining u_daddr_t to
 represent things that are not disk addresses.  Instead, it should either
 define its own type or directly use (u)int*_t.  Then, as for choosing
 between 32 and 64 bits, I'm skeptical of using this structure for
 managing more than 32 bits worth of blocks, given the amount of RAM it
 will use.
 
 
 



Re: 32GB limit per swap device?

2011-08-20 Thread Kostik Belousov
On Sat, Aug 20, 2011 at 10:42:28PM +0400, Alexander V. Chernikov wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Alan Cox wrote:
  On 08/20/2011 12:41, Kostik Belousov wrote:
  On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
  On Thu, Aug 18, 2011 at 3:16 AM, Alexander V.
  Chernikovmelif...@ipfw.ruwrote:
 
  On 10.08.2011 19:16, per...@pluto.rain.com wrote:
 
  Chuck Swigercswi...@mac.com   wrote:
 
On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
  I am trying to set up 64GB partitions for swap for a system that
  has 64GB of RAM (with the idea to dump kernel core etc). But, on
  8-stable as of today I get:
 
  WARNING: reducing size to maximum of 67108864 blocks per swap unit
 
  Is there workaround for this limitation?
 
  Another interesting question:
 
  swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
 
  Block device size in passed to swaponsomething() in number of _disk_
  blocks
(e.g. in DEV_BSIZE=512). After that, kernel b-lists (on top of
  which swap
  pager is build) maximum objects check is enforced.
 
  The (possible) problem is that real object count we will operate on
  is not
  the value passed to swaponsomething() since it is calculated in
  wrong units.
 
  we should check b-list limit on (X * DEV_BSIZE512 / PAGE_SIZE) value
  which
  is rough (X / 8) so we should be able to address 32*8=256G.
 
  The code should look like this:
 
  Index: vm/swap_pager.c
   ===================================================================
  --- vm/swap_pager.c (revision 223877)
  +++ vm/swap_pager.c (working copy)
  @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id,
  u_long
  u_long mblocks;
 
  /*
  +* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
  chunks.
  +* First chop nblks off to page-align it, then convert.
  +*
   +* sw->sw_nblks is in page-sized chunks now too.
  +*/
   +   nblks &= ~(ctodb(1) - 1);
  +   nblks = dbtoc(nblks);
  +
  +   /*
 
   * If we go beyond this, we get overflows in the radix
   * tree bitmap code.
   */
  @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id,
  u_long
  mblocks);
  nblks = mblocks;
  }
  -   /*
  -* nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd
  chunks.
  -* First chop nblks off to page-align it, then convert.
  -*
   -* sw->sw_nblks is in page-sized chunks now too.
  -*/
   -   nblks &= ~(ctodb(1) - 1);
  -   nblks = dbtoc(nblks);
 
  sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
   sp->sw_vp = vp;
 
 
  (move pages recalculation before b-list check)
 
 
  Can someone comment on this?
 
 
  I believe that you are correct.  Have you tried testing this change on a
  large swap device?
 I will try tomorrow.
 
  I probably agree too, but I am in the process of re-reading the swap
  code,
  and I do not quite believe in the limit.
 
  
  I'm uncertain whether the current limit, 0x40000000 /
  BLIST_META_RADIX, is exact or not, but I doubt that it is too large.
 
 It is not exact.  It is rough estimation of
 sizeof(blmeta_t) * X < 4G (blist_create() assumes malloc() not being
 able to allocate more that 4G. I'm not sure if it is true this days)
 X is number of blocks we need to store. Actual number, however, it is X
 / (1 + 1/BLIST_META_RADIX + 1/BLIST_META_RADIX^2 + ...) but it dffers
 from X not very much.
 
 blist can be seen as tree of radix trees, with metainformation for all
 those radix trees allocated by single allocation which imposes this
 limit. Metatinformation is used to find free blocks more quickly
 
 Single linear allocation is required to advance to next radix tree on
 the same level very fast:
 
 
 *   *   *   *   *
 **  **  **  **  **
 
 ^^^
 Some kind of schema with 3 level in tree and BLIST_META_RADIX=2 (instead
 of 16).
 
 
 
  
  When the initial code was committed, our daddr_t was 32bit, I checked
  the RELENG_4 sources. Current code uses int64_t for daddr_t. My
  impression
  right now is that we only utilize the low 32bits of daddr_t.
 
  Esp. interesting looks the following typedef:
  typedef uint32_t	u_daddr_t;	/* unsigned disk address */
  which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.
 
  I wonder whether we could just use full 64bit and de-facto remove the
  limitation on the swap partition size.
 
 This will increase struct blmeta_t twice and cause 2*X memory usage for
 every swap configuration.
No, daddr_t is already 64 bit. Nothing will increase.
My point is that the current limitation is artificial.

I think Alan's note referred to the number of radix tree nodes
required to cover a large swap partition. But it could be a good
temporary measure.

I expect to be able to provide some numeric evidence later.
 
  
  I would rather argue first that the subr_blist code 

Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sat, Aug 20, 2011 at 08:43:09PM +0200, Alex Samorukov wrote:
 
 The SMART tests you did didn't really amount to anything; no surprise.
 short and long tests usually do not test the surface of the disk.  There
 are some drives which do it on a long test, but as I said before,
 everything varies from drive to drive.
 
 That is not a correct statement, sorry.  A long test tries to read all the
 data from the surface (and does some other things).

 // one of the smartmontools developers and sysutils/smartmontools
 maintainer.

That's great, but too bad it's generally not true in practice.  Dan's
long scan on his site proves it, and I've dealt with this situation
myself many times over.

SMART long tests *may* do a surface scan, but in most cases they just
seem to do something that's similar to short but over a longer period
of time.  Furthermore, some which *do* do a surface scan on a long
test don't always report LBA failures in the self-test log.  I've
personally seen this happen on Western Digital disks (model strings are
unknown, I'm certain I've rid myself of those drives).  Firmware
bug/quirk?  Possibly, but at the end of the day it doesn't matter -- it
means the end-user has wasted 2-3 hours for something that tests OK yet
we know for a fact isn't OK.

I *have* seen a drive do a surface scan on a long test and report LBAs
it couldn't read, but as I said, it's rare and varies from vendor to
vendor, drive to drive, and firmware to firmware.  When it happened I
was very, very surprised (and delighted).

The only thing I can trust 100% of the time when it comes to surface
scans is SMART selective scans (if available, which, again, the OP's drive
does not offer), or using dd or a read-per-LBA check at the OS level
(which works everywhere).
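
In the absence of a selective self-test, the same kind of per-LBA read check
can be scripted directly; a rough stand-in for such a script (512-byte sectors
assumed, device name and range are placeholders):

    disk=/dev/ad2; start=5566400; end=5566500
    lba=$start
    while [ "$lba" -le "$end" ]; do
        # dd exits non-zero on an unreadable sector, so report that LBA
        dd if=$disk of=/dev/null bs=512 count=1 iseek=$lba >/dev/null 2>&1 \
            || echo "read error at LBA $lba"
        lba=$((lba + 1))
    done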

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Remote installing

2011-08-20 Thread Willem Jan Withagen

On 20-8-2011 13:26, Willem Jan Withagen wrote:

On 2011-08-20 13:15, Willem Jan Withagen wrote:

Hi,

Today I felt like living dangerously and wanted to upgrade a backup server
from i386 to amd64, just to see if we could.
Otherwise I'd scrap it and install from a USB stick.

So I have my server running an amd64 GENERIC build, and I
export /, /var and /usr on the server to be upgraded.

But upgrading world does hit a snag early on:


empty changed
flags expected schg found none not modified: Operation not supported


This is probably where some program wants to set the immutable flag on
/var/tmp/empy...

But it looks like NFS does not grok that.

Now I have seen plenty of suggestions to do it this way, but never saw anybody
come back with this complaint.

So I must be omitting something??


I took a closer look at the errors:
---
cd /mnt/; rm -f /mnt/sys; ln -s usr/src/sys sys
cd /mnt/usr/share/man/en.ISO8859-1; ln -sf ../man* .
ln: ./man1: Permission denied
ln: ./man1aout: Permission denied
ln: ./man2: Permission denied
ln: ./man3: Permission denied
ln: ./man4: Permission denied
ln: ./man5: Permission denied
ln: ./man6: Permission denied
ln: ./man7: Permission denied
ln: ./man8: Permission denied
ln: ./man9: Permission denied
-

Which comes from the distrib-dirs target in etc.

Why would an ln -sf like that fail?
The filesystems are exported with -maproot=0.


Well, it turned out that the easiest fix was to run
chflags -R noschg /
at the client, because certain files are immutable, and once you run into
those it is hard to fix things after the fact.
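
If clearing the flag on the entire tree is too heavy-handed, a narrower variant
(a sketch only; it must also run on the machine that owns the filesystem, since
file flags cannot be changed over NFS) would be:

    # clear the system-immutable flag only where it is actually set
    find / -flags +schg -exec chflags noschg {} +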


Next would be to move /lib and /usr/lib out of the way, so they don't
cause conflicts in the near future.
That will cause new programs to start to fail, so better make sure
that everything is set before you start upgrading over NFS.
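
For the "move /lib and /usr/lib out of the way" step, one illustrative approach
(the destination paths are hypothetical and this is only a sketch of the idea,
not a tested recipe) is to park copies of the old i386 libraries first, so they
can still be restored if something breaks mid-upgrade:

    mkdir -p /mnt/lib.i386 /mnt/usr/lib.i386
    cp -Rp /mnt/lib/. /mnt/lib.i386/
    cp -Rp /mnt/usr/lib/. /mnt/usr/lib.i386/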


But I did manage to get it upgraded from i386 to amd64.

--WjW
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
Dan, sorry for the previous mail.  It seems my schedule today has just
unexpectedly changed; I had social events to deal with, but as I found out
a few minutes ago those events are cancelled, which means I have time
today to look at your mail.

On Sat, Aug 20, 2011 at 01:34:41PM -0400, Dan Langille wrote:
 On Aug 19, 2011, at 11:24 PM, Jeremy Chadwick wrote:
  The SMART error log also indicates an LBA failure at the 26000 hour mark
  (which is 16 hours prior to when you did smartctl -a /dev/ad2).  Whether
  that LBA is the remapped one or the suspect one is unknown.  The LBA was
  5566440.
  
  The SMART tests you did didn't really amount to anything; no surprise.
  short and long tests usually do not test the surface of the disk.  There
  are some drives which do it on a long test, but as I said before,
  everything varies from drive to drive.
  
  Furthermore, on this model of drive, you cannot do a surface scans via
  SMART.  Bummer.  That's indicated in the Offline data collection
  capabilities section at the top, where it reads:
  
  No Selective Self-test supported.
  
  So you'll have to use the dd method.  This takes longer than if surface
  scanning was supported by the drive, but is acceptable.  I'll get to how
  to go about that in a moment.
 
 FWIW, I've done a dd read of the entire suspect disk already.  Just two 
 errors.

Actually one error -- keep reading.

 From the URL mentioned above:
 
 [root@bast:~] # dd of=/dev/null if=/dev/ad2 bs=1m conv=noerror
 dd: /dev/ad2: Input/output error
 2717+0 records in
 2717+0 records out
 2848980992 bytes transferred in 127.128503 secs (22410246 bytes/sec)
 dd: /dev/ad2: Input/output error
 38170+1 records in
 38170+1 records out
 40025063424 bytes transferred in 1544.671423 secs (25911701 bytes/sec)
 [root@bast:~] # 
 
 That seems to indicate two problems.  Are those the values I should be using 
 with dd?

The values you refer to are byte offsets, not LBAs.  Furthermore, you
used a block size of 1 megabyte (not sure why people keep doing this).
LBA size on your drive is 512 bytes; asking for 1 megabyte in dd is
going to make the drive try to read() 1MByte, and an I/O error could
happen anywhere within that 1MByte range.  (1024*1024) / 512 == 2048
LBAs make up 1MByte.
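
As a concrete illustration of that conversion, using the numbers from the dd
output above (512-byte sectors assumed):

    # 2717 full 1 MiB records were read before the first error, so the
    # unreadable sector lies somewhere in the next 2048-sector window
    echo $((2717 * 1024 * 1024 / 512))          # first LBA of that window: 5564416
    echo $((2717 * 1024 * 1024 / 512 + 2047))   # last LBA of that window:  5566463

The LBA 5566440 reported in the SMART error log does fall inside that window.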

Next, remember that the noerror attribute has some quirks associated
with it that need to be kept in mind.  The man page discusses these.

Finally, I believe the last I/O error you see (at byte 40025063424) is
normal given what you told dd to do.  It was trying to use bs=1m, and
your drive has a capacity limit of 40027029504 bytes.  I'm left to
believe you had a short read (less than 1MByte), so this is normal.
40027029504 / (1024*1024) == 38172.75, which is not a round number,
hence the error.

 I did some more precise testing:
 
 # time dd of=/dev/null if=/dev/ad2 bs=512 iseek=5566440
 dd: /dev/ad2: Input/output error
 9+0 records in
 9+0 records out
 4608 bytes transferred in 5.368668 secs (858 bytes/sec)
 
 real  0m5.429s
 user  0m0.000s
 sys   0m0.010s
 
 NOTE: that's 9 blocks later than mentioned in smartctl
 
 The above generated this in /var/log/messages:
 
 Aug 20 17:29:25 bast kernel: ad2: FAILURE - READ_DMA 
 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=5566449

Your dd command above is saying use a block size of 512 bytes, and read
indefinitely from /dev/ad2, starting with an lseek() on /dev/ad2 of
5566440.  You then get an I/O error somewhere from where you start to
when the device ends.  You're assuming that the number of bytes
transferred indicates where the actual error happened, which in my
experience is not always true.

What really needs to happen here is use of count=1, and you adjusting
iseek manually per each LBA.  Or you could use the script I wrote and
let the computer do it for you.  :-)

I understand what you're getting at, re: that's 9 blocks later.  But
the OS does some caching of I/O and so on sometimes, or aggregates
block reads larger than physical LBA size, so that may be what's going
on here.  However, if you keep reading, you might find your answer is
that you may (still unsure) have other LBAs which are now marked suspect.

  That said:
  
  http://jdc.parodius.com/freebsd/bad_block_scan
  
  If you run this on your ad2 drive, I'm hoping what you'll find are two
  LBAs which can't be read -- one will be the remapped LBA and one will be
  the suspect LBA.  If you only get one LBA error then that's fine too,
  and will be the suspect LBA.
 
  Once you have the LBA(s), you can submit writes to them to get the drive
  to re-analyse them (assuming they're suspect):
  
  dd if=/dev/zero of=/dev/XXX bs=512 count=1 seek=N
  
  Where XXX is the device and N is the LBA number.
  
  If this works properly, the dd command should sit there for a little bit
  (as the drive does its re-analysis magic) and then should complete.
 
 ad2 is part of a gmirror with ad0.   Does this change things?
 
 I haven't tried the dd yet.

It does not change things, but I 

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



thanks for doing this!  I'll reiterate my suspicion just in case - I think that
you should look for the cases where you stop a jail, but then re-attach and
resurrect the jail before it's completely dead.


Yeah, that's where I think it's happening too, but I also suspect it's not just
a dying jail that's needed; I think it's a dying jail in the final stages of
cleanup.
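
For reference, jails lingering in the dying state can be watched from userland
while reproducing this; a small sketch using the base system jls(8), whose -d
flag includes dying jails in the listing:

    # watch whether a jail lingers in the dying state after the service stops
    while :; do date; jls -d; sleep 1; done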

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

Take the scenario of a simple one-level prison setup running a single process,
where the prison has just been stopped.

In the above code, pr_uref of the process's prison is decremented. As this is
the last process, pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked, so other processes
can manipulate the jail; this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code,
and so two processes for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#5. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by 
prison1)

It seems like the action on the parent prison to decrement its pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jail's mtx held, causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref down
so it only takes place if the jail is really being removed. Either that, or
change the locking semantics so that once the lock is acquired in
prison_deref it is not unlocked until the function completes.

What do people think?

   Regards
   Steve







This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: bad sector in gmirror HDD

2011-08-20 Thread Dan Langille
On Aug 20, 2011, at 3:57 PM, Jeremy Chadwick wrote:

 I still suggest you replace the drive, although given its age I doubt
 you'll be able to find a suitable replacement.  I tend to keep disks
 like this around for testing/experimental purposes and not for actual
 use.
 
 I have several unused 80GB HDD I can place into this system.  I think that's
 what I'll wind up doing.  But I'd like to follow this process through and 
 get it documented
 for future reference.
 
 Yes, given the behaviour of the drive I would recommend you simply
 replace it at this point in time.  What concerns me the most is
 Current_Pending_Sector incrementing, but it's impossible for me to
 determine if that incrementing means there are other LBAs which are bad,
 or if the drive is behaving how its firmware is designed.
 
 Keep the drive around for further experiments/tinkering if you're
 interested.  Stuff like this is always interesting/fun as long as your
 data isn't at risk, so doing the replacement first would be best
 (especially if both drives in your mirror were bought at the same time
 from the same place and have similar manufacturing plants/dates on
 them).


I'm happy to send you this drive for your experimentation pleasure.

If you're interested, please email me an address offline.  You don't have a
disk with errors, and it seems you should have one.

After I wipe it, that is.  I'm sure I have a destroyer CD here somewhere...

-- 
Dan Langille - http://langille.org

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
A follow-up given that I just viewed the SMART attribute data at the
very bottom of this page as of this writing (Sat Aug 20 13:00:09 PDT
2011):

http://beta.freebsddiary.org/smart-fixing-bad-sector.php

And I see this:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always       -       27440
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

These attributes USUALLY mean:

1) Reallocated_Sector_Ct   == There are 2 remapped LBAs.
2) Reallocated_Event_Count == There is 1 remapping event which has been
  noticed (either failure or success).
3) Current_Pending_Sector  == There are 2 LBAs which are suspect.

Now, given my previous statement about this particular model of drive,
Maxtor may have a firmware quirk or other oddities that don't cause
Current_Pending_Sector to drop to 0 or Reallocated_Event_Count to match
reality.  I simply don't know.  But keep reading.

And remember, this is what we started with:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0012   059   059   001    Old_age   Always       -       27416
196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

Anyway, in the SMART error log, I see 3 entries (2 new ones since the
last time I saw the web page):

* Error 3 occurred at disk power-on lifetime: 27422 hours (1142 days + 14 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440
* Error 2 occurred at disk power-on lifetime: 27421 hours (1142 days + 13 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440
* Error 1 occurred at disk power-on lifetime: 27400 hours (1141 days + 16 hours)
  40 59 18 e8 ef 54 e0  Error: UNC 24 sectors at LBA = 0x0054efe8 = 5566440

These are all for the same LBA -- 5566440.

Error 1 was something we already saw on the page the first time.  So
where did the other two come from?  Earlier on the web page I saw these
commands being executed:

sh ./bad_block_scan /dev/ad2 5566400 5566500   -- will hit bad LBA
sh ./bad_block_scan /dev/ad2 5566000 5566500   -- will hit bad LBA
sh ./bad_block_scan /dev/ad2 556 5566000   -- will not hit bad LBA
sh ./bad_block_scan /dev/ad2 556 5566000   -- will not hit bad LBA

So there's the explanation for the two newly-added entries in the SMART
error log.  I'm very surprised if bad_block_scan did not echo that it
had encountered read errors on LBA 5566440.  It should have, unless I
left the script in some weird state.  The commands to use to verify
would be:

dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566439
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566440
dd if=/dev/ad2 of=/dev/null bs=512 count=1 skip=5566441

(I tend to check around that LBA area as well, just to make sure,
that's why there's 3 commands with -1 and +1 LBAs).  One of these should
return an I/O error, unless the LBA has been remapped already, in which
case it shouldn't.

Finally, there's this very interesting piece of information in the SMART
self-test log (not selective scan log, but the self-test log; meaning
this was the result of smartctl -t long /dev/ad2 at some point):

Num  Test_Description     Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline     Completed: read failure        90%            27416  786767

So it seems this is one of those drives which does do a surface scan on
a long test.

But that's interesting -- LBA 786767.

If that's true, then issuing the same dd commands as above (but with
skip changed appropriately) should return an I/O error as well.
Naturally check the SMART error log for verification.
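
A minimal way to re-check afterwards (smartmontools assumed installed; these
are standard smartctl invocations):

    smartctl -l error /dev/ad2    # any new UNC entries, e.g. at LBA 786767?
    smartctl -A /dev/ad2 | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector'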

So, it's possible that there are actually two bad LBAs on this drive --
LBA 5566440 and LBA 786767.  I simply don't know about the latter, but
the former is confirmed in the SMART error log.

If either of these LBAs are the ones which Current_Pending_Sector is
referring to, then writes to them should be sufficient to induce
re-analysis.  E.g.:

dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=5566440
dd if=/dev/zero of=/dev/ad2 bs=512 count=1 seek=786767

The offsets for seek (not skip!!!) should probably be based on what the
dd reads done earlier would show.  Unless of course what we're seeing is

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Steven Hartland kill...@multiplay.co.uk

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

If you take a scenario of a simple one level prison setup running a single 
process
where a prison has just been stopped.

In the above code pr_uref of the processes prison is decremented. As this is the
last process then pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked so other process can
now manipulate the jail, this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code
and so two process for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by 
prison1)

It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jails mtx, so causing a race condition.

I think the fix is to the move the decrement of parent prison pr_uref's down
so it only takes place if the jail is really being removed. Either that or
to change the locking semantics so that once the lock is aquired in this
prison_deref its not unlocked until the function completes.

What do people think?


After reviewing the changes to prison_deref in the commit which added hierarchical
jails, the removal of the lock by the initial loop on the passed-in prison may
be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon
on 20/08/2011 23:24 Steven Hartland said the following:
 - Original Message - From: Steven Hartland kill...@multiplay.co.uk
 Looking through the code I believe I may have noticed a scenario which could
 trigger the problem.

 Given the following code:-

 static void
 prison_deref(struct prison *pr, int flags)
 {
     struct prison *ppr, *tpr;
     int vfslocked;

     if (!(flags & PD_LOCKED))
         mtx_lock(&pr->pr_mtx);
     /* Decrement the user references in a separate loop. */
     if (flags & PD_DEUREF) {
         for (tpr = pr;; tpr = tpr->pr_parent) {
             if (tpr != pr)
                 mtx_lock(&tpr->pr_mtx);
             if (--tpr->pr_uref > 0)
                 break;
             KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
             mtx_unlock(&tpr->pr_mtx);
         }
         /* Done if there were only user references to remove. */
         if (!(flags & PD_DEREF)) {
             mtx_unlock(&tpr->pr_mtx);
             if (flags & PD_LIST_SLOCKED)
                 sx_sunlock(&allprison_lock);
             else if (flags & PD_LIST_XLOCKED)
                 sx_xunlock(&allprison_lock);
             return;
         }
         if (tpr != pr) {
             mtx_unlock(&tpr->pr_mtx);
             mtx_lock(&pr->pr_mtx);
         }
     }

 If you take a scenario of a simple one level prison setup running a single
 process
 where a prison has just been stopped.

 In the above code pr_uref of the processes prison is decremented. As this is 
 the
 last process then pr_uref will hit 0 and the loop continues instead of 
 breaking
 early.

 Now at the end of the loop iteration the mtx is unlocked so other process can
 now manipulate the jail, this is where I think the problem may be.

 If we now have another process come in and attach to the jail but then 
 instantly
 exit, this process may allow another kernel thread to hit this same bit of 
 code
 and so two process for the same prison get into the section which decrements
 prison0's pr_uref, instead of only one.

 In essence I think we can get the following flow where 1# = process1
 and 2# = process2
 1#1. prison1.pr_uref = 1 (single process jail)
 1#2. prison_deref( prison1,...
 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
 1#3. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
 1#3. prison0.pr_uref--
 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
 2#2. process1.exit
 2#3. prison_deref( prison1,...
 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
 2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by 
 prison1)

 It seems like the action on the parent prison to decrement the pr_uref is
 happening too early, while the jail can still be used and without the lock on
 the child jails mtx, so causing a race condition.

 I think the fix is to the move the decrement of parent prison pr_uref's down
 so it only takes place if the jail is really being removed. Either that or
 to change the locking semantics so that once the lock is aquired in this
 prison_deref its not unlocked until the function completes.

 What do people think?
 
 After reviewing the changes to prison_deref in commit which added hierarchical
 jails, the removal of the lock by the inital loop on the passed in prison may
 be unintentional.
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
 
 
 If so the following may be all that's needed to fix this issue:-
 
 diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
 --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
 +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
 @@ -2455,7 +2455,8 @@
  			if (--tpr->pr_uref > 0)
  				break;
  			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
 -			mtx_unlock(&tpr->pr_mtx);
 +			if (tpr != pr)
 +				mtx_unlock(&tpr->pr_mtx);
  		}
  		/* Done if there were only user references to remove. */
  		if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.


Will do, I'm now 99.9% sure this is the problem and even better I now have a
reproducible scenario :)

Something else you may be more interested in, Andriy:-
I added the debugging options DDB && INVARIANTS to see if I can get more
useful info, and the panic results in a looping panic constantly scrolling up
the console.  Not sure if this is a side effect of the patches we've been
trying.

Going to see if I can confirm that, lmk if there's something you want me
to try?

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland


- Original Message - 
From: Steven Hartland kill...@multiplay.co.uk



Something else you may be more interested in, Andriy:-
I added the debugging options DDB && INVARIANTS to see if I can get more
useful info, and the panic results in a looping panic constantly scrolling up
the console.  Not sure if this is a side effect of the patches we've been
trying.

Going to see if I can confirm that, lmk if there's something you want me
to try?


Seems the stop_scheduler_on_panic.8.x.patch is the cause of this.

Removing it allows me to drop to ddb when the panic due to the KASSERT
happens.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



on 20/08/2011 23:24 Steven Hartland said the following:

- Original Message - From: Steven Hartland

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

If you take a scenario of a simple one level prison setup running a single
process
where a prison has just been stopped.

In the above code pr_uref of the processes prison is decremented. As this is the
last process then pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked so other process can
now manipulate the jail, this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code
and so two process for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by 
prison1)

It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jails mtx, so causing a race condition.

I think the fix is to the move the decrement of parent prison pr_uref's down
so it only takes place if the jail is really being removed. Either that or
to change the locking semantics so that once the lock is aquired in this
prison_deref its not unlocked until the function completes.

What do people think?


After reviewing the changes to prison_deref in commit which added hierarchical
jails, the removal of the lock by the inital loop on the passed in prison may
be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h


If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.


You're right, and it's actually more complex than that. Although changing it
not to unlock in the middle of prison_deref fixes that race condition, it doesn't
prevent pr_uref being incorrectly decremented each time the jail gets into
the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem
where the counters of all prisons in the hierarchy are decremented but, as
far as I can tell, only the immediate parent is ever incremented, so there is
another reference problem there as well, I think.

I believe the following patch fixes both of these issues.

I've tested with debugging added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits the dying state multiple times.

It essentially reverts the changes to the if (flags & PD_DEUREF) by 

Re: bad sector in gmirror HDD

2011-08-20 Thread perryh
Jeremy Chadwick free...@jdc.parodius.com wrote:

 ... using dd to find the bad LBAs is the only choice he has.

or sysutils/diskcheckd.  It uses a 64KB blocksize, falling back to
512 -- to identify the bad LBA(s) -- after getting a failure when
reading a large block, and IME it runs something like 10x faster
than dd with bs=64k.
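
That coarse-then-fine strategy is easy to approximate by hand as well; a sketch
(512-byte sectors and 64 KiB chunks, i.e. 128 sectors, are assumed):

    disk=/dev/ad2
    total=$(diskinfo $disk | awk '{print $4}')    # media size in sectors
    lba=0
    while [ "$lba" -lt "$total" ]; do
        if dd if=$disk of=/dev/null bs=64k count=1 iseek=$((lba / 128)) >/dev/null 2>&1; then
            lba=$((lba + 128))
        else
            # narrow the failure down to individual sectors within this chunk
            end=$((lba + 128))
            while [ "$lba" -lt "$end" ]; do
                dd if=$disk of=/dev/null bs=512 count=1 iseek=$lba >/dev/null 2>&1 \
                    || echo "unreadable LBA: $lba"
                lba=$((lba + 1))
            done
        fi
    done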

It would be advisable to check syslog configuration before using
diskcheckd, since that is how it reports and there is reason to
suspect that the as-shipped syslog.conf may discard at least some
of diskcheckd's messages.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: bad sector in gmirror HDD

2011-08-20 Thread Jeremy Chadwick
On Sun, Aug 21, 2011 at 02:00:33AM -0700, per...@pluto.rain.com wrote:
 Jeremy Chadwick free...@jdc.parodius.com wrote:
 
  ... using dd to find the bad LBAs is the only choice he has.
 
 or sysutils/diskcheckd.  It uses a 64KB blocksize, falling back to
 512 -- to identify the bad LBA(s) -- after getting a failure when
 reading a large block, and IME it runs something like 10x faster
 than dd with bs=64k.
 
 It would be advisable to check syslog configuration before using
 diskcheckd, since that is how it reports and there is reason to
 suspect that the as-shipped syslog.conf may discard at least some
 of diskcheckd's messages.

That software has a major problem where it runs constantly, rather than
periodically.  I know because I'm the one who opened the PR on it:

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853

There's a discussion about this port/issue from a few days ago (how
sweet!):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html

With comments from you stating that the software is behaving as designed
and that I misread the man page, but also stating point blank that
either way the software runs continuously (which is what the PR was
about in the first place):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html

I closed the PR because when I left as a committer I no longer wanted to
deal with the issue.  I probably should have marked the PR as suspended,
but either way it's an ordeal that needs to get dealt with; it
absolutely should be re-opened in some way.

Then there's this PR, which I fully agree should have *nothing* to do
with gmirror, so I'm not even sure how to interpret what's written.
Furthermore, the author of this PR commented in PR 115853 stating
something completely different (read the first few lines very
carefully/slowly -- it seems to indicate he agrees with my PR, but then
opened up a separate PR with different wording):

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/143566

Back to my PR.

I state that I set up diskcheckd.conf using the option you describe as
a length of time over which to spread each pass, yet what happened was
that it did as much I/O as it could (read the entire disk in 45 minutes)
then proceeded to do it again (no sleep()).  That is not the same thing
as do I/O over the course of 7 days.

Furthermore, the man page example gives this:

   EXAMPLES
   To check all of /dev/ad0 for errors once every two weeks, use
   this entry in diskcheckd.conf:

     /dev/ad0        *       14      *

Which is no different than what I specified in my PR other than that I
used a value of 7 and the example uses 14.  So what about the rest of
the man page?

   The second format consists of four white space separated fields,
   which are the full pathname of the disk device, the size of that disk,
   the frequency in days at which to check that disk, and the rate in kilo-
   bytes per second at which to check this disk.  Naturally, it would be
   contradictory to specify both the frequency and the rate, so only one of
   these should be specified.  Additionally, the size of the disk should not
   be specified if the rate is specified, as this information is unneces-
   sary.

I did not misread the man page, especially given what's in EXAMPLES.
It's a bug somewhere -- either in the man page or the software itself.
This software will burn through your drive constantly, unless you use
the rate-in-kilobytes-per-second field.  The frequency field doesn't
work as advertised.
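
For completeness, the rate-based workaround would be a diskcheckd.conf entry
along these lines, following the field order quoted from the man page above
(the rate value itself is purely illustrative):

    # device    size    days    rate (KB/sec)
    /dev/ad0    *       *       2048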

And besides, such a utility really shouldn't be a daemon anyway, but a
periodic(8)-called utility with appropriate locks put in place to ensure
that no more than one instance can run at once.
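
A periodic(8)-style alternative with a lock could be as small as the following
sketch (the lock path and the scan command are placeholders, not an actual
implementation):

    #!/bin/sh
    # refuse to start if another surface scan is already running
    lockf -t 0 /var/run/surface_scan.lock \
        dd if=/dev/ad0 of=/dev/null bs=64k conv=noerror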

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org