Re: 32GB limit per swap device?

2011-08-18 Thread Alexander V. Chernikov

On 10.08.2011 19:16, per...@pluto.rain.com wrote:

Chuck Swiger cswi...@mac.com wrote:


On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:

I am trying to set up 64GB partitions for swap for a system that
has 64GB of RAM (with the idea of dumping kernel cores, etc.). But on
8-stable as of today I get:

WARNING: reducing size to maximum of 67108864 blocks per swap unit

Is there a workaround for this limitation?


Another interesting question:

The swap pager operates in page-sized blocks (PAGE_SIZE=4k on common archs).

The block device size is passed to swaponsomething() as a number of _disk_
blocks (e.g. DEV_BSIZE=512). After that, the maximum-object check of the
kernel b-lists (on top of which the swap pager is built) is enforced.


The (possible) problem is that the real object count we will operate on is
not the value passed to swaponsomething(), since that value is in the
wrong units.


We should check the b-list limit against the (X * DEV_BSIZE / PAGE_SIZE)
value, which is roughly X / 8 (512 / 4096), so we should be able to
address 32 * 8 = 256G.
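A minimal sketch of that arithmetic, assuming DEV_BSIZE = 512 and a 4096-byte page, and taking the 67108864 figure from the warning above as the b-list limit per swap unit (the constant and function names are illustrative, not kernel identifiers):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes; these names are illustrative, not the kernel's macros. */
#define DISK_BLOCK_BYTES 512ULL       /* DEV_BSIZE */
#define PAGE_BYTES       4096ULL      /* PAGE_SIZE on common archs */
#define BLIST_LIMIT      67108864ULL  /* per-swap-unit limit from the warning */

/*
 * Largest swap device, in bytes, that passes the b-list check when the
 * check counts objects of unit_bytes each.
 */
static uint64_t max_swap_bytes(uint64_t unit_bytes)
{
    assert(unit_bytes > 0);
    return BLIST_LIMIT * unit_bytes;
}
```

Checking the limit against disk blocks gives max_swap_bytes(DISK_BLOCK_BYTES) = 32 GiB, the size in the warning; checking it against pages gives max_swap_bytes(PAGE_BYTES) = 256 GiB, eight times as much.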


The code should look like this:

Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c	(revision 223877)
+++ vm/swap_pager.c	(working copy)
@@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
 	u_long mblocks;
 
 	/*
+	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
+	 * First chop nblks off to page-align it, then convert.
+	 *
+	 * sp->sw_nblks is in page-sized chunks now too.
+	 */
+	nblks &= ~(ctodb(1) - 1);
+	nblks = dbtoc(nblks);
+
+	/*
 	 * If we go beyond this, we get overflows in the radix
 	 * tree bitmap code.
 	 */
@@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
 		    mblocks);
 		nblks = mblocks;
 	}
-	/*
-	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
-	 * First chop nblks off to page-align it, then convert.
-	 *
-	 * sp->sw_nblks is in page-sized chunks now too.
-	 */
-	nblks &= ~(ctodb(1) - 1);
-	nblks = dbtoc(nblks);
 
 	sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
 	sp->sw_vp = vp;


(i.e. move the page recalculation before the b-list check)
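To see why the order matters, here is a userspace sketch of both orderings, assuming 512-byte disk blocks, 4 KiB pages and the 67108864 limit from the warning. The masking mirrors nblks &= ~(ctodb(1) - 1) from the patch, but the helper names are hypothetical and the real swaponsomething() does more than this:

```c
#include <assert.h>
#include <stdint.h>

#define BLKS_PER_PAGE   8ULL          /* 4096 / 512, i.e. ctodb(1) here */
#define BLKS_PAGE_SHIFT 3             /* log2(BLKS_PER_PAGE) */
#define BLIST_LIMIT     67108864ULL   /* from the "reducing size" warning */

/* Chop to a whole number of pages, then convert disk blocks to pages. */
static uint64_t blocks_to_pages(uint64_t nblks)
{
    nblks &= ~(BLKS_PER_PAGE - 1);
    return nblks >> BLKS_PAGE_SHIFT;
}

static uint64_t clamp_to_limit(uint64_t n)
{
    return n > BLIST_LIMIT ? BLIST_LIMIT : n;
}

/* Unpatched order: the limit is applied to the disk-block count. */
static uint64_t pages_unpatched(uint64_t nblks)
{
    return blocks_to_pages(clamp_to_limit(nblks));
}

/* Patched order: the limit is applied to the page count. */
static uint64_t pages_patched(uint64_t nblks)
{
    return clamp_to_limit(blocks_to_pages(nblks));
}
```

For a 64 GiB partition (134217728 disk blocks), pages_unpatched() yields 8388608 pages (32 GiB), while pages_patched() keeps all 16777216 pages (64 GiB).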


Can someone comment on this?




Apparently, the 32GB swap space limit is per swap area; you can add
up to 4 swap areas, so create two or three 32GB swap partitions.


Will that enable a 64GB dump?  In 8.1, dumpon(8) says:

The kernel swap pager and the dump facility are completely unrelated to each other.
The only possible relation is that the dumpon rc script searches for the first swap
device in fstab to tell the kernel it should dump to this device.


  The dumpon utility is used to specify a device where the kernel
  can save a crash dump in the case of a panic.
  ...
  For most systems the size of the specified dump device must be
  at least the size of physical memory.
  ...
  The dumpon utility will refuse to enable a dump device which is
  smaller than the total amount of physical memory as reported by
  the hw.physmem sysctl(8) variable.

Note the use of the singular: "a device" and "the specified device".
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org





WD Advanced Format: do I need to do something special?

2011-08-18 Thread Yuri
WD's latest hard drives have 4kB sectors, which is
different from the traditional 512B.

http://www.wdc.com/advformat
http://wdc.custhelp.com/app/answers/detail/a_id/5655

These articles assert that something special should be done in the OS to
enable high performance on such drives. For example, WD recommends
installing particular recent driver versions.
But what about FreeBSD? Should it be configured in some special way too
for these drives to perform well?

Is it aware of the 4kB sector size?

Yuri


Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Jeremy Chadwick
On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
 WD's latest hard drives have 4kB sectors, which is
 different from the traditional 512B.
 http://www.wdc.com/advformat
 http://wdc.custhelp.com/app/answers/detail/a_id/5655
 
 These articles assert that something special should be done in the OS to
 enable high performance on such drives. For example, WD recommends
 installing particular recent driver versions.
 But what about FreeBSD? Should it be configured in some special way
 too for these drives to perform well?
 Is it aware of the 4kB sector size?

The below advice still applies.  Do not skim the page, read it.

http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html

You will therefore have to go through some manual rigmarole (preferably
with gpart(8)) to ensure performance.  If you plan on using the disks in
ZFS, you get to go through some extra rigmarole.

Also be aware that mixed LBA sizes on things like RAID (and possibly
ZFS?) may result in abysmal performance.  I just got done assisting a
user on a forum who had horrible performance on his 2-disk RAID-1 array
driven by an Intel ICH9R using Intel's native RST driver under 64-bit
Windows.  How/why?

He bought two drives, both WD10EADS (not a typo).  However, one drive
was WD10EADS-65M2BX (firmware 01.00A01, 512 byte physical, 512 byte
logical) while the other was WD10EADS-11M2B1 (firmware 80.00A80, 4096
byte physical, 512 byte logical).

He replaced the WD10EADS-65M2BX drive with another 4KB physical drive
and his performance problem disappeared.

I only point this out because this could happen to any user.  "Oh, I need
to get a replacement WD10EADS drive for my system... what the heck?!?"
This is going to confuse a lot of people, and it caught me by surprise when
I saw it.  Shame on Western Digital for not adjusting the model string!

By comparison, the WD EARS-model drives have always been 4KByte
physical / 512 byte logical.  The logical size is set to 512 to
ensure full compatibility with existing and legacy OSes.

I'm dreading the day the WD Caviar Black models succumb to all this
nonsense.

-- 
| Jeremy Chadwick   jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon
on 18/08/2011 02:15 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 Thanks to the debug that Steven provided and to the help that I received from
 Kostik, I think that now I understand the basic mechanics of this panic, but,
 unfortunately, not the details of its root cause.

 It seems like everything starts with some kind of a race between terminating
 processes in a jail and termination of the jail itself.  This is where the
 details are very thin so far.  What we see is that a process (http) is in
 exit(2) syscall, in exit1() function actually, and past the place where
 P_WEXIT flag is set and even past the place where p_limit is freed and reset
 to NULL.  At that place the thread calls prison_proc_free(), which calls
 prison_deref().  Then, we see that in prison_deref() the thread gets a page
 fault because of what seems like a NULL pointer dereference.  That's just
 the start of the problem and its root cause.
 
 That's interesting; are you using http as an example, or is that something
 that's been gleaned from the debugging of our output? I ask as there's only
 one process running in each of our jails and that's a single java process.


It's from the debug data: p_comm = httpd
I also would like to ask you to revert the last patch that I sent you (with
tf_rip comparisons) and try the patch from Kostik instead.
Given what we suspect about the problem, can you please also try to provoke
the problem, e.g. by doing frequent jail restarts or something else that
supposedly should hit the bug.

-- 
Andriy Gapon


Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Xin LI
Hi,

On Thu, Aug 18, 2011 at 1:47 AM, Yuri y...@rawbw.com wrote:
 WD's latest hard drives have 4kB sectors, which is
 different from the traditional 512B.
 http://www.wdc.com/advformat
 http://wdc.custhelp.com/app/answers/detail/a_id/5655

 These articles assert that something special should be done in the OS to enable
 high performance on such drives. For example, WD recommends installing
 particular recent driver versions.
 But what about FreeBSD? Should it be configured in some special way too for
 these drives to perform well?
 Is it aware of the 4kB sector size?

The FreeBSD driver detects 4k drives.

As far as I know, all AF drives on the market currently advertise
512-byte sectors rather than 4k (mostly for compatibility with BIOS,
etc).  If they advertised 4k sectors natively, you wouldn't have to do
anything special, but currently you need to make sure that:

 - FS partitions start at a 4k boundary;
 - the FS is aware of the 4k sector size, e.g. through gnop -S 4k for ZFS,
which will remember this so you don't have to do it again later.  For
UFS you may want to specify a larger fragment size and block size
(4k/32k, for example).
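The first requirement is simple arithmetic on the 512-byte LBAs the drive advertises: a partition is aligned when its starting LBA is a multiple of 8. A minimal sketch, assuming 512-byte logical and 4096-byte physical sectors (the helper names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOGICAL_SECTOR   512u   /* what the drive advertises */
#define PHYSICAL_SECTOR 4096u   /* actual Advanced Format sector size */

/* True if a partition starting at LBA `start` (512-byte units) begins on
 * a 4 KiB physical-sector boundary. */
static bool is_4k_aligned(uint64_t start)
{
    return (start * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0;
}

/* Smallest 4 KiB-aligned LBA at or after `start`. */
static uint64_t align_up_4k(uint64_t start)
{
    const uint64_t per = PHYSICAL_SECTOR / LOGICAL_SECTOR;  /* 8 */

    assert(per > 0);
    return (start + per - 1) / per * per;
}
```

For instance, the traditional 63-sector MBR offset is not aligned, and align_up_4k(63) returns 64.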

Some newer applications like FreeNAS already detect this and make the
adjustment for you by default.  We need to check and make sure that our
base system tools, especially the installer, do that too, though.

Cheers,
-- 
Xin LI delp...@delphij.net https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die


Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Yuri

On 08/18/2011 02:17, Jeremy Chadwick wrote:

The below advice still applies.  Do not skim the page, read it.

http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html

You will therefore have to go through some manual rigmarole (preferably
with gpart(8)) to ensure performance.  If you plan on using the disks in
ZFS, you get to go through some extra rigmarole.


I didn't know such extra steps were required and just created a ZFS pool.
zdb -C mypool shows ashift as 9. I read that as meaning the sector size
is 512 bytes (wrong!).
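For reference, ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, so 9 corresponds to 512 bytes and 12 to 4096 bytes. A small sketch of the conversion in both directions (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Sector size in bytes for a given ashift (2^ashift). */
static uint32_t ashift_to_bytes(unsigned ashift)
{
    assert(ashift < 32);
    return UINT32_C(1) << ashift;
}

/* Smallest ashift whose sector size is at least `bytes`. */
static unsigned bytes_to_ashift(uint32_t bytes)
{
    unsigned shift = 0;

    assert(bytes > 0 && bytes <= (UINT32_C(1) << 31));
    while ((UINT32_C(1) << shift) < bytes)
        shift++;
    return shift;
}
```

A pool that is aware of the 4k sectors would therefore be expected to show ashift 12 in zdb -C output.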


But I tested the 25GB file writing/reading speed on the middle tracks 
and it seems reasonable:

WR 55MB/s
RD 107MB/s

So can I get even better speeds if it were aware of the 4k sectors?

Yuri


Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Marc Fonvieille
On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
 WD's latest hard drives have 4kB sectors, which is
 different from the traditional 512B.
 http://www.wdc.com/advformat
 http://wdc.custhelp.com/app/answers/detail/a_id/5655
 
 These articles assert that something special should be done in the OS to
 enable high performance on such drives. For example, WD recommends
 installing particular recent driver versions.
 But what about FreeBSD? Should it be configured in some special way too
 for these drives to perform well?
 Is it aware of the 4kB sector size?


I own one of those (I'm running 8-STABLE):

ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
ada0: WDC WD10EARS-00Y5B1 80.00A80 ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

which has 4kB sectors but reports 512 byte sectors :)

I use the whole disk for the FreeBSD slice, I aligned all partitions on
a multiple of 8 sectors (512*8=4096).

By default, fdisk(8) uses a 63-sector offset:

*** Working on device /dev/ada0 ***
parameters extracted from in-core disklabel are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 1953525105 (953869 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 15/ sector 63
The data for partition 2 is:
UNUSED
The data for partition 3 is:
UNUSED
The data for partition 4 is:
UNUSED


Look at the "start 63" statement.  Instead of fixing fdisk(8) behavior, I just
edited my bsdlabel(8) table to compensate:

# /dev/ada0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304         17    4.2BSD        0     0     0
  b:    8388608    4194321      swap
  c: 1953525105          0    unused        0     0         # "raw" part, don't edit
  d:   16777216   12582929    4.2BSD        0     0     0
  e: 1924163584   29360145    4.2BSD        0     0     0


The important part is the offset 17, which corrects for the fdisk(8) offset
(16+1, so that 63 + 17 = 80 is a multiple of 8).  The remaining offsets are
calculated from the sizes I gave the partitions (in MB, which can be divided
by 8).  Then I used newfs(8) with the option -f 4096.
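These numbers can be checked mechanically: what matters is the absolute position on the disk, i.e. the slice start (63) plus the partition offset from the label, which must be a multiple of 8 sectors. A small sketch using the values from the bsdlabel table above (the struct and helper names are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_START    63u  /* fdisk(8) slice offset, in 512-byte sectors */
#define SECTORS_PER_4K  8u

/* One bsdlabel entry; offsets are relative to the slice start. */
struct part { char name; uint64_t size, offset; };

/* Partitions a, b, d and e from the label above ('c' is the raw slice). */
static const struct part parts[] = {
    { 'a',    4194304u,       17u },
    { 'b',    8388608u,  4194321u },
    { 'd',   16777216u, 12582929u },
    { 'e', 1924163584u, 29360145u },
};

/* True if the partition's absolute disk offset is 4 KiB aligned. */
static bool part_aligned(const struct part *p)
{
    return (SLICE_START + p->offset) % SECTORS_PER_4K == 0;
}

/* True if every partition starts right where the previous one ends. */
static bool parts_contiguous(void)
{
    for (size_t i = 1; i < sizeof parts / sizeof parts[0]; i++)
        if (parts[i - 1].offset + parts[i - 1].size != parts[i].offset)
            return false;
    return true;
}
```

With this label, the absolute offsets are 80, 4194384, 12582992 and 29360208, all divisible by 8, and each partition starts where the previous one ends.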


There's another painful issue with this disk: the automatic head-parking
after a few seconds.  I disabled it (with wdidle3) because after 2 months of
use, I was at more than 35000 head-parkings...

-- 
Marc


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org

That's interesting; are you using http as an example, or is that something that's
been gleaned from the debugging of our output? I ask as there's only one process
running in each of our jails and that's a single java process.



It's from the debug data: p_comm = httpd


Hmm, there's only one httpd that's ever run on the machine, and that's not in
a jail; it's on the raw machine.


I also would like to ask you to revert the last patch that I sent you (with
tf_rip comparisons) and try the patch from Kostik instead.


Sure.


Given what we suspect about the problem, can you please also try to provoke
the problem, e.g. by doing frequent jail restarts or something else that
supposedly should hit the bug.


I've tried doing this for several days on the test machine, but I've been
unable to provoke it; I will continue to try.

   Regards
   Steve




This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon
on 18/08/2011 13:35 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 That's interesting; are you using http as an example, or is that something
 that's been gleaned from the debugging of our output? I ask as there's only
 one process running in each of our jails and that's a single java process.


 It's from the debug data: p_comm = httpd
 
 Hmm, there's only one httpd that's ever run on the machine, and that's not
 in a jail; it's on the raw machine.

Probably I have mistakenly assumed that the 'prison' in prison_deref() has
something to do with an actual jail, while it could have been just prison0, where
all non-jailed processes belong.

-- 
Andriy Gapon


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



Probably I have mistakenly assumed that the 'prison' in prison_deref() has
something to do with an actual jail, while it could have been just prison0, where
all non-jailed processes belong.


That makes sense as this particular panic was caused by a machine reboot,
which is slightly different from the more common jail panic we're seeing.

Doesn't help with our reproduction scenario though, unfortunately. If we
don't have any joy reproducing on our single test machine, I'll have this
kernel rolled out across a portion of the farm, which should mean we
see the panic results in a few days' time.

I understand there's a risk involved in this, but it's important for us
to determine the cause and get a confirmed fix, as well as being able
to prove that the panic fix works, which will help everyone in the long
run.

   Regards
   Steve




Re: can not boot from RAIDZ with 8-STABLE

2011-08-18 Thread Miroslav Lachman

Artem Belevich wrote:

On Wed, Aug 17, 2011 at 12:40 PM, Miroslav Lachman 000.f...@quip.cz wrote:

Thank you guys, you are right. The BIOS provides only 1 disk to the loader!
I checked it from the loader prompt with lsdev (booted from an external USB HDD).

So I will try to make a small zpool mirror for root and boot (if a ZFS mirror
can be made of 4 providers instead of two) and the rest will be in RAIDZ.

If that fails, I will go my old way with an internal USB flash disk with UFS
for booting and RAIDZ of 4 disks for storage, as I did a few years ago with
7.0 or 7.1.


You seem to be booting from disks attached to some sort of add-on
card. Sometimes those have per-disk 'bootable' option in their own
extension ROM. You may investigate yours. Perhaps all you need to do
is just tweak controller settings.


Advanced controller settings allow me to choose which disk will be
bootable, but I can mark just one of them, not all.


So my working setup is made from 2 pools. The first is a 4-way ZFS mirror
for / (root), the second is RAIDZ for the rest.

(plus swap on top of gmirrored partitions)

Each disk has the following partitions:

# gpart show da0
=>        34  976773101  da0  GPT  (465G)
          34        128    1  freebsd-boot  (64k)
         162    8388608    2  freebsd-swap  (4.0G)
     8388770   20971520    3  freebsd-zfs  (10G)
    29360290  943718400    4  freebsd-zfs  (450G)
   973078690    3694445       - free -  (1.8G)


# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
sys   9.94G   781M  9.17G     7%  1.00x  ONLINE  -
tank  1.75T  4.77G  1.75T     0%  1.00x  ONLINE  -


Filesystem                  Size    Mounted on
sys/root                    9.8G    /
devfs                       1.0k    /dev
tank/tmp                    1.3T    /tmp
tank/usr/home               1.3T    /usr/home
tank/usr/home/quip          1.3T    /usr/home/quip
tank/usr/local              1.3T    /usr/local
tank/usr/obj                1.3T    /usr/obj
tank/usr/ports              1.3T    /usr/ports
tank/usr/ports/distfiles    1.3T    /usr/ports/distfiles
tank/usr/ports/packages     1.3T    /usr/ports/packages
tank/usr/src                1.3T    /usr/src
tank/var/amavis             1.3T    /var/amavis
tank/var/audit              1.3T    /var/audit
tank/var/crash              1.3T    /var/crash
tank/var/db                 1.3T    /var/db
tank/var/db/mysql           1.3T    /var/db/mysql
tank/var/log                1.3T    /var/log
tank/var/mail               1.3T    /var/mail
tank/var/tmp                1.3T    /var/tmp
tank/var/virusmails         1.3T    /var/virusmails
tank/vol0                   1.3T    /vol0


I hope this helps somebody with a similar problem.

Miroslav Lachman


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon
on 18/08/2011 14:11 Andriy Gapon said the following:
 Probably I have mistakenly assumed that the 'prison' in prison_deref() has
 something to do with an actual jail, while it could have been just prison0,
 where all non-jailed processes belong.

So, indeed:
(kgdb) p $2->p_ucred->cr_prison
$10 = (struct prison *) 0x807d5080
(kgdb) p &prison0
$11 = (struct prison *) 0x807d5080
(kgdb) p *$2->p_ucred->cr_prison
$12 = {pr_list = {tqe_next = 0x0, tqe_prev = 0x0}, pr_id = 0, pr_ref = 398,
  pr_uref = 0, pr_flags = 386, pr_children = {lh_first = 0x0},
  pr_sibling = {le_next = 0x0, le_prev = 0x0}, pr_parent = 0x0,
  pr_mtx = {lock_object = {lo_name = 0x8063007c "jail mutex",
      lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
  pr_task = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0,
    ta_func = 0, ta_context = 0x0},
  pr_osd = {osd_nslots = 0, osd_slots = 0x0,
    osd_next = {le_next = 0x0, le_prev = 0x0}},
  pr_cpuset = 0xff0012d65dc8, pr_vnet = 0x0,
  pr_root = 0xff00166ebce8, pr_ip4s = 0, pr_ip6s = 0, pr_ip4 = 0x0,
  pr_ip6 = 0x0, pr_sparep = {0x0, 0x0, 0x0, 0x0}, pr_childcount = 0,
  pr_childmax = 99, pr_allow = 127, pr_securelevel = -1,
  pr_enforce_statfs = 0, pr_spare = {0, 0, 0, 0, 0}, pr_hostid = 3251597242,
  pr_name = "0", '\0' <repeats 254 times>,
  pr_path = "/", '\0' <repeats 1022 times>,
  pr_hostname = censored, '\0' <repeats 231 times>,
  pr_domainname = '\0' <repeats 255 times>,
  pr_hostuuid = "54443842-0054-2500-902c-0025902c3cb0", '\0' <repeats 27 times>}

Also, let's consider this code:
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}

The most suspicious thing is that pr_uref is zero in the debug data.
With INVARIANTS we would hit the "prison0 pr_uref=0" KASSERT.

Then, because this is prison0 and because pr_uref reached zero, tpr gets
assigned prison0's pr_parent, which is NULL.  And then, because tpr != pr,
we try to execute mtx_unlock(&tpr->pr_mtx).
That's where the NULL pointer deref happens.
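To make the failure path concrete, here is a userspace model of just the reference-count walk, assuming a simplified two-field structure (toy_prison and toy_deuref_walk are hypothetical names, and no locking is modeled). Starting from a prison0 stand-in whose pr_uref drops from 1 to 0, matching the debug data, the walk moves on to pr_parent, which is NULL, and the model reports that instead of dereferencing it:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, stripped-down stand-in for struct prison: only the two
 * fields the PD_DEUREF loop uses. No locking is modeled.
 */
struct toy_prison {
    struct toy_prison *pr_parent;   /* NULL for the prison0 stand-in */
    int pr_uref;
};

/*
 * Same shape as the PD_DEUREF loop: drop one user reference at each
 * level and stop at the first prison that still has user references.
 * Returns that prison, or NULL if the walk ran past the top of the
 * hierarchy, which is where the kernel would go on to use tpr->pr_mtx
 * and tpr->pr_uref through a NULL tpr.
 */
static struct toy_prison *
toy_deuref_walk(struct toy_prison *pr)
{
    struct toy_prison *tpr;

    for (tpr = pr; tpr != NULL; tpr = tpr->pr_parent) {
        if (--tpr->pr_uref > 0)
            return tpr;
    }
    return NULL;
}
```

In the healthy case prison0 always holds other user references, so the walk stops there and never reaches its NULL parent.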

So, now the big question is how/why we reached pr_uref == 0.

-- 
Andriy Gapon


Re: USB/coredump hangs in 8 and 9

2011-08-18 Thread Andriy Gapon
on 12/08/2011 22:59 Andrew Boyer said the following:
 Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
 
 Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
 
 Re: System hang in USB umass module while processing panic (originally on freebsd-usb)
 
 Hello Andriy and Hans,
 
 Sorry for tying in so many discussions on this topic, but I think I have an
 explanation for the problems we have been reporting* with hanging coredumps
 on multicore systems on 8.2-RELEASE, and it has implications for Andriy's
 proposed scheduler patch** and for USB.
 
 In today's 8.X and 9.X branches, nothing that I can find stops the other
 CPUs when the kernel panics, but many parts of the locking code get
 disabled (grep on 'panicstr').  The 'bufwrite: buffer is not busy???' panic
 is caused by the syncer encountering an error.  If that happens when it's
 on the dumping CPU, everything hangs.  If it's running on a different CPU,
 it will be blocked and hidden by the panic_cpu spinlock in panic(), and the
 dump continues, polling every attached keyboard for a Ctrl-C.
 
 But the new 8.X USB stack relies on multithreading.  (The new stack is the
 variable that broke coredumps for us in the 7.1-8.2 transition, I think.)
 SVN 224223 fixes a hang that would happen when dumpsys() polls the USB
 keyboard (IPMI KVM, in our case).  That helps, but it only gets as far as
 usb_process(), where it hangs in a loop around a cv_wait() call.  This is
 easy to reproduce by adding code to the watchdog to break into the debugger
 if panicstr is set.
 
 I am experimenting with Andriy's patch** to stop the scheduler and it seems
 to be most of the way there, stopping the CPUs and disabling the rest of
 locking.  There are a few places that still reference panicstr, but that's
 minor.  These are the changes I made to the patch:
  * Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is
 true, so that we don't hang up in USB.  ukbd_yield() locks up in
 DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks up
 trying to drop mutexes.

Hmm, this is a little bit unexpected.  I thought that with the patch all the
mutex/lock operations would be skipped.
Can you please check which locks give you the trouble and why?
I would like to improve the patch, so that all lock operations are bypassed
(whether locking or unlocking).

  * Changed the call to spinlock_enter() back to critical_enter(), so that
 interrupts stay enabled and the hardclock still functions.

Not sure if I like this idea in general.

  * Added code in the beginning of panic() to switch to CPU 0, so that we're
 able to service the hardclock interrupts and so that watchdog panics get
 through.

Also, I wouldn't like switching a panic thread to a different CPU, as that
messes with a lot of state and is not safe in an arbitrary context.
Also, can you please clarify what you meant by "watchdog panics get through"?
Are you talking about SW_WATCHDOG specifically?

 This has worked 100% for me so far, although anyone using a USB keyboard or
 dump device would still be out of luck.
 
 Thoughts?  It seems like stopping all of the other CPUs is the right thing
 to do on a panic (what are they doing otherwise?).  Are the USB issues
 fixable?  If Andriy's patch gets committed it might just involve
 short-circuiting all of the locking in the polling path, but I haven't
 gotten that far yet.  I bet dumping to NFS will have the same problem.

I think that no subsystem should rely on working scheduling and interrupts in
post-panic world.  In fact, all the code for skipping locking is just a giant
hack/workaround in my opinion.  Ideally, all the subsystems that can be expected
to be called after panic should be aware of that and should check for that.  So
they should not attempt any locking or switching threads or rebinding CPUs or
expect interrupts, etc.  The environment should mirror early boot where we have
only one CPU, only one thread, no interrupts, only polling.

If you can help Hans figure out what is wrong with the USB subsystem in this
respect, that would help us all.

Thank you for your testing and feedback!
-- 
Andriy Gapon


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon

on 17/08/2011 23:21 Andriy Gapon said the following:

It seems like everything starts with some kind of a race between terminating
processes in a jail and termination of the jail itself.  This is where the
details are very thin so far.  What we see is that a process (http) is in
exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
flag is set and even past the place where p_limit is freed and reset to NULL.
At that place the thread calls prison_proc_free(), which calls prison_deref().
Then, we see that in prison_deref() the thread gets a page fault because of what
seems like a NULL pointer dereference.  That's just the start of the problem and
its root cause.

Then, trap_pfault() gets invoked and, because addresses close to NULL look like
userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
to lim_cur(), but because p_limit is already NULL, that call results in a NULL
pointer dereference and a page fault.  Goto the beginning of this paragraph.

So we get this recursion of sorts, which only ends when a stack is exhausted and
a CPU generates a double-fault.


BTW, does anyone have an idea why the thread in question would disappear from
kgdb's point of view?

(kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
$3 = 102057
(kgdb) tid 102057
invalid tid

"info threads" also doesn't list the thread.

Is it because the panic happened while the thread was somewhere in exit1()?
Is there an easy way to examine its stack in this case?

--
Andriy Gapon


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Attilio Rao
2011/8/18 Andriy Gapon a...@freebsd.org:
 on 17/08/2011 23:21 Andriy Gapon said the following:

 It seems like everything starts with some kind of a race between
 terminating processes in a jail and termination of the jail itself.  This
 is where the details are very thin so far.  What we see is that a process
 (http) is in exit(2) syscall, in exit1() function actually, and past the
 place where P_WEXIT flag is set and even past the place where p_limit is
 freed and reset to NULL.  At that place the thread calls
 prison_proc_free(), which calls prison_deref().  Then, we see that in
 prison_deref() the thread gets a page fault because of what seems like a
 NULL pointer dereference.  That's just the start of the problem and its
 root cause.

 Then, trap_pfault() gets invoked and, because addresses close to NULL look
 like userspace addresses, vm_fault/vm_fault_hold gets called, which in its
 turn goes on to call vm_map_growstack.  First thing that vm_map_growstack
 does is a call to lim_cur(), but because p_limit is already NULL, that call
 results in a NULL pointer dereference and a page fault.  Goto the beginning
 of this paragraph.

 So we get this recursion of sorts, which only ends when a stack is
 exhausted and a CPU generates a double-fault.

 BTW, does anyone have an idea why the thread in question would disappear
 from kgdb's point of view?

 (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
 $3 = 102057
 (kgdb) tid 102057
 invalid tid

 "info threads" also doesn't list the thread.

 Is it because the panic happened while the thread was somewhere in exit1()?
 Is there an easy way to examine its stack in this case?

Yes, that is likely it.

The 'tid' command should look up the tid_to_thread() table (or a similarly
named one), which returns NULL; that means the thread has already moved past
the point where it was still in the lookup table.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: USB/coredump hangs in 8 and 9

2011-08-18 Thread Hans Petter Selasky
On Thursday 18 August 2011 19:04:10 Andriy Gapon wrote:
 If you can help Hans to figure out what is wrong with the USB subsystem in
 this respect, that would help us all.

Hi,

usb_busdma.c:   /* we use mtx_owned() instead of this function */
usb_busdma.c:   owned = mtx_owned(uptag->mtx);
usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_compat_linux.c: do_unlock = mtx_owned(&Giant) ? 0 : 1;
usb_hub.c:  if (mtx_owned(&bus->bus_mtx)) {
usb_transfer.c: if (!mtx_owned(info->xfer_mtx)) {
usb_transfer.c: if (mtx_owned(xfer->xroot->xfer_mtx)) {
usb_transfer.c: while (mtx_owned(&xroot->udev->bus->bus_mtx)) {
usb_transfer.c: while (mtx_owned(xroot->xfer_mtx)) {

One fix you will need to do, if mtx_owned is not giving correct value is:

static void
usbd_callback_wrapper(struct usb_xfer_queue *pq)
{
struct usb_xfer *xfer = pq->curr;
struct usb_xfer_root *info = xfer->xroot;

USB_BUS_LOCK_ASSERT(info->bus, MA_OWNED);
if (!mtx_owned(info->xfer_mtx)) {

The above if should be ANDed with !paniced && !dumping ... or maybe the
new "not scheduling" variable is good for this purpose?
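The combined condition suggested above can be sketched in userland C. This is only an illustration: panicstr and dumping mirror the names of FreeBSD kernel globals but are local stand-ins here, the message's "paniced" is modeled as panicstr != NULL (an assumption), and take_not_owned_branch() is a hypothetical name, not a USB stack function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-ins for kernel state (assumption: they play the role of
 * the FreeBSD globals of the same names, set during a panic or dump). */
static const char *panicstr = NULL;
static int dumping = 0;

/* Hypothetical helper: should usbd_callback_wrapper() take the
 * "xfer_mtx not owned" branch?  Per the suggestion, that branch is
 * skipped while panicking or dumping, when mtx_owned() may not
 * report a meaningful value. */
static bool take_not_owned_branch(bool xfer_mtx_owned)
{
    return !xfer_mtx_owned && panicstr == NULL && !dumping;
}
```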

/*
 * Cases that end up here:
 *

#if USB_HAVE_BUSDMA
if (mtx_owned(xfer->xroot->xfer_mtx)) {
struct usb_xfer_queue *pq;


This case is more like a BUS-DMA error case, and is not so important to 
execute.

--HPS


Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Kevin Oberman
On Thu, Aug 18, 2011 at 3:10 AM, Marc Fonvieille black...@freebsd.org wrote:
 On Thu, Aug 18, 2011 at 01:47:26AM -0700, Yuri wrote:
 WD's latest hard drives have 4kB sectors, which is different from the
 traditional 512B.
 http://www.wdc.com/advformat
 http://wdc.custhelp.com/app/answers/detail/a_id/5655

 These articles assert that something special should be done in the OS to
 get good performance from such drives. For example, WD recommends
 installing particular versions of its latest drivers.
 But what about FreeBSD? Should it be configured in some special way too
 for these drives to perform well?
 Is it aware of the 4kB sector size?


 I own that (I'm running 8-STABLE):

 ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
 ada0: WDC WD10EARS-00Y5B1 80.00A80 ATA-8 SATA 2.x device
 ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
 ada0: Command Queueing enabled
 ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

 which has 4kB sectors but says 512 byte sectors :)

 I use the whole disk for the FreeBSD slice, I aligned all partitions on
 a multiple of 8 sectors (512*8=4096).

 By default fdisk(8) uses an offset of 63 sectors:

 *** Working on device /dev/ada0 ***
 parameters extracted from in-core disklabel are:
 cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

 Figures below won't work with BIOS for partitions not in cyl 1
 parameters to be used for BIOS calculations are:
 cylinders=1938021 heads=16 sectors/track=63 (1008 blks/cyl)

 Media sector size is 512
 Warning: BIOS sector numbering starts with sector 1
 Information from DOS bootblock is:
 The data for partition 1 is:
 sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 1953525105 (953869 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 15/ sector 63
 The data for partition 2 is:
 UNUSED
 The data for partition 3 is:
 UNUSED
 The data for partition 4 is:
 UNUSED


 Look at the "start 63" line.  Instead of changing fdisk(8)'s behavior, I
 just edited my bsdlabel(8) table accordingly:

 # /dev/ada0s1:
 8 partitions:
 #          size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304         17    4.2BSD        0     0     0
  b:    8388608    4194321      swap
  c: 1953525105          0    unused        0     0     # raw part, don't edit
  d:   16777216   12582929    4.2BSD        0     0     0
  e: 1924163584   29360145    4.2BSD        0     0     0


 The important part is the offset 17, which corrects for the fdisk(8) offset
 (16+1, so that with the preceding 63 sectors the partition starts at sector
 80, a multiple of 8).  The remaining offsets are calculated from the sizes
 I gave for the partitions (in MB, which are divisible by 8).
 Then I used newfs(8) with the option -f 4096.
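The alignment arithmetic behind the offset 17 can be checked mechanically. A minimal C sketch, assuming the slice starts at sector 63 as fdisk reported and that the drive's physical sectors are 4096 bytes; the macro and function names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions taken from the output above (all counts in 512-byte sectors). */
#define SLICE_START     63u    /* fdisk(8) default start of slice 1 */
#define LOGICAL_SECTOR  512u   /* sector size the drive reports */
#define PHYSICAL_SECTOR 4096u  /* real sector size of the WD10EARS */

/* Illustrative helper: does a bsdlabel partition, whose offset is
 * relative to the start of the slice, begin on a 4096-byte boundary
 * of the whole disk? */
static bool partition_is_4k_aligned(uint64_t offset_in_slice)
{
    assert(PHYSICAL_SECTOR % LOGICAL_SECTOR == 0);
    const uint64_t per_phys = PHYSICAL_SECTOR / LOGICAL_SECTOR; /* 8 */

    uint64_t absolute_lba = SLICE_START + offset_in_slice;
    return absolute_lba % per_phys == 0;
}
```

With the label above, offset 17 gives absolute sector 80 (80 / 8 = 10), and each later offset is 17 plus a sum of sizes that are multiples of 8, so every partition passes; an offset of 0 would leave the partition at sector 63, which fails.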


 There's another painful issue with this disk: the automatic head-parking
 after a few seconds.  I disabled it (with wdidle3) because after 2 months
 of use, I was at more than 35000 head-parkings...

I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and
above. It has an alignment option that makes this all just work and also
allows the use of GPT formatting. (Watch out for GPT on any system that
needs to run 32-bit Windows.)

gpart create -s gpt ada1
gpart bootcode -b /boot/pmbr ada1
gpart add -t freebsd-boot -a 4 -s 128 -b 40 ada1
gpart bootcode -p /boot/gptboot -i 1 ada1
gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
gpart add -t freebsd-swap -a 4 -s 8388608 ada1
gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
gpart add -t freebsd-ufs -a 4 ada1

This will give you a disk with a 1G root, 4G swap, 5G var, .5G tmp and
the remainder for usr. You can adjust these as you feel appropriate.
I would suggest a careful reading of the gpart(8) man page as well,
just so you understand what is going on. You might find the Wikipedia
entry for GUID Partition Table interesting if you want to go the
GPT route.
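The bare -s values in the commands above are sector counts: with 512-byte sectors, 1 GiB is 2097152 sectors, which matches the sizes listed. A small C sketch of that conversion, assuming bare -s values count 512-byte sectors (mib_to_sectors() is a hypothetical helper, not part of gpart):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* logical sector size assumed for bare -s values */

/* Convert a size in MiB to a count of 512-byte sectors. */
static uint64_t mib_to_sectors(uint64_t mib)
{
    assert(mib <= UINT64_MAX / (1024u * 1024u)); /* avoid overflow */
    return mib * 1024u * 1024u / SECTOR_SIZE;
}
```

For instance, 512 MiB for tmp becomes 1048576 sectors and 5 GiB (5120 MiB) for var becomes 10485760 sectors.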

You can also use gpart create -s mbr to create a traditional MBR
slice/partition setup. There are several on-line articles detailing
this operation.
-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6...@gmail.com


Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-18 Thread Hiroki Sato
Chip Camden sterl...@camdensoftware.com wrote
  in 20110818025550.ga1...@libertas.local.camdensoftware.com:

st Quoth Attilio Rao on Thursday, 18 August 2011:
st  In callout_cpu_switch() if a low priority thread is migrating the
st  callout and gets preempted after the outgoing cpu queue lock is released
st  (and scheduled much later) we get this problem.
st 
st  In order to fix this bug it could be enough to use a critical section,
st  but I think this should be really interrupt safe, thus I'd wrap them
st  up with spinlock_enter()/spinlock_exit(). Fortunately
st  callout_cpu_switch() should be called rarely and also we already do
st  expensive locking operations in callout, thus we should not have
st  problem performance-wise.
st 
st  Can the guys I also CC'ed here try the following patch, with all the
st  initial kernel options that were leading you to the deadlock? (thus
st  revert any debugging patch/option you added for the moment):
st  http://www.freebsd.org/~attilio/callout-fixup.diff
st 
st  Please note that this patch is for STABLE_8, if you can confirm the
st  good result I'll commit to -CURRENT and then backmerge as soon as
st  possible.
st 
st  Thanks,
st  Attilio
st 
st
st Thanks, Attilio.  I've applied the patch and removed the extra debug
st options I had added (though keeping debug symbols).  I'll let you know if
st I experience any more panics.

 No panic for 20 hours at this moment, FYI.  For my NFS server, I
 think another 24 hours would be sufficient to confirm the stability.
 I will see how it works...

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-18 Thread Chip Camden
Quoth Hiroki Sato on Friday, 19 August 2011:
 Chip Camden sterl...@camdensoftware.com wrote
   in 20110818025550.ga1...@libertas.local.camdensoftware.com:
 
 st Quoth Attilio Rao on Thursday, 18 August 2011:
 st  In callout_cpu_switch() if a low priority thread is migrating the
 st  callout and gets preempted after the outgoing cpu queue lock is released
 st  (and scheduled much later) we get this problem.
 st 
 st  In order to fix this bug it could be enough to use a critical section,
 st  but I think this should be really interrupt safe, thus I'd wrap them
 st  up with spinlock_enter()/spinlock_exit(). Fortunately
 st  callout_cpu_switch() should be called rarely and also we already do
 st  expensive locking operations in callout, thus we should not have
 st  problem performance-wise.
 st 
 st  Can the guys I also CC'ed here try the following patch, with all the
 st  initial kernel options that were leading you to the deadlock? (thus
 st  revert any debugging patch/option you added for the moment):
 st  http://www.freebsd.org/~attilio/callout-fixup.diff
 st 
 st  Please note that this patch is for STABLE_8, if you can confirm the
 st  good result I'll commit to -CURRENT and then backmerge as soon as
 st  possible.
 st 
 st  Thanks,
 st  Attilio
 st 
 st
 st Thanks, Attilio.  I've applied the patch and removed the extra debug
 st options I had added (though keeping debug symbols).  I'll let you know if
 st I experience any more panics.
 
  No panic for 20 hours at this moment, FYI.  For my NFS server, I
  think another 24 hours would be sufficient to confirm the stability.
  I will see how it works...
 
 -- Hiroki

Likewise:

$ uptime
 5:37PM  up 21:45, 5 users, load averages: 0.68, 0.45, 0.63

So far, so good (knocks on head).

-- 
.O. | Sterling (Chip) Camden  | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91  | http://chipstips.com




Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Yuri
Following instructions here 
(http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html) I 
destroyed my previous ZFS pool with 512 byte sectors and did this:

gnop create -S 4096 /dev/ad4
zpool create mypool /dev/ad4.nop
zfs create mypool/mydir
zpool export mypool
gnop destroy /dev/ad4.nop
zpool import mypool

Now this command 'zdb -C data | grep ashift' shows ashift=12 (4096 byte 
sectors).
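ashift is the base-2 logarithm of the sector size ZFS uses for the vdev, so ashift=12 means 2^12 = 4096 bytes, while a pool built on 512-byte sectors gets ashift=9. A minimal C illustration of that relationship (ashift_for_sector_size() is a hypothetical helper, not a ZFS function):

```c
#include <assert.h>
#include <stdint.h>

/* Return log2(sector_size) for a power-of-two sector size, which is
 * what ZFS stores as a vdev's ashift. */
static unsigned ashift_for_sector_size(uint32_t sector_size)
{
    /* Only powers of two are valid sector sizes here. */
    assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

    unsigned shift = 0;
    while ((1u << shift) < sector_size)
        shift++;
    return shift;
}
```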


However, when I begin to copy a lot of files into /mypool/mydir, my
online radio player gets severely affected. The sound gets interrupted all
the time. The interruptions stop 1-2 seconds after I stop copying.

This didn't happen with sector size 512 bytes.

What is wrong?

Yuri


crash on 8.2-RELEASE amd64, high-traffic squid server

2011-08-18 Thread Doug Barton

Howdy,

I have some high-traffic squid servers, most of which are running a 
flavor of RELENG_7 very successfully, but one that I've been evaluating 
8.x on has had a lot of problems. Most recently we had the crash below 
twice in the last 2 weeks. Same exact backtrace. Any suggestions on 
where to look would be appreciated.



Thanks,

Doug

#0  doadump () at pcpu.h:224
224 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0  doadump () at pcpu.h:224
#1  0x803ec4be in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:419
#2  0x803ec8f1 in panic (fmt=Variable fmt is not available.
)
at /usr/src/sys/kern/kern_shutdown.c:592
#3  0x8069a4d0 in trap_fatal (frame=0x1c, eva=Variable eva is not 
available.
)
at /usr/src/sys/amd64/amd64/trap.c:783
#4  0x8069aab9 in trap (frame=0xff800012f650)
at /usr/src/sys/amd64/amd64/trap.c:592
#5  0x80682e84 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:224
#6  0x80698896 in bcopy ()
at /usr/src/sys/amd64/amd64/support.S:124
#7  0x8044df61 in sbcompress (sb=0xff01d98945e0,
m=0xff010b815300, n=0xff006baa3700)
at /usr/src/sys/kern/uipc_sockbuf.c:779
#8  0x8044e1e6 in sbappendstream_locked (sb=0xff01d98945e0,
m=0xff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
#9  0x80527530 in tcp_do_segment (m=0xff010b815300, th=Variable 
th is not available.
)
at /usr/src/sys/netinet/tcp_input.c:2588
#10 0x80528b4b in tcp_input (m=0xff010b815300, off0=Variable off0 
is not available.
)
at /usr/src/sys/netinet/tcp_input.c:1029
#11 0x804c3b2c in ip_input (m=0xff010b815300)
at /usr/src/sys/netinet/ip_input.c:787
#12 0x804a631e in netisr_dispatch_src (proto=1, source=Variable 
source is not available.
)
at /usr/src/sys/net/netisr.c:917
#13 0x8049d73d in ether_demux (ifp=0xff0002d3,
m=0xff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
#14 0x8049db2d in ether_input (ifp=0xff0002d3,
m=0xff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
#15 0x8027c18a in em_rxeof (rxr=0xff0002d7c600, count=98,
done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
#16 0x8027c5a8 in em_handle_que (context=Variable context is not 
available.
)
at /usr/src/sys/dev/e1000/if_em.c:1482
#17 0x80429ab5 in taskqueue_run_locked (queue=0xff0002d8d800)
at /usr/src/sys/kern/subr_taskqueue.c:250
#18 0x80429c4e in taskqueue_thread_loop (arg=Variable arg is not 
available.
)
at /usr/src/sys/kern/subr_taskqueue.c:387
#19 0x803c30f8 in fork_exit (
callout=0x80429c00 taskqueue_thread_loop,
arg=0xff80005a8748, frame=0xff800012fc40)
at /usr/src/sys/kern/kern_fork.c:845
#20 0x8068334e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:565
#21 0x in ?? ()
#22 0x in ?? ()
#23 0x in ?? ()
#24 0x in ?? ()
#25 0x in ?? ()
#26 0x in ?? ()
#27 0x in ?? ()
#28 0x in ?? ()
#29 0x in ?? ()
#30 0x in ?? ()
#31 0x in ?? ()
#32 0x in ?? ()
#33 0x in ?? ()
#34 0x in ?? ()
#35 0x in ?? ()
#36 0x in ?? ()
#37 0x in ?? ()
#38 0x in ?? ()
#39 0x in ?? ()
#40 0x in ?? ()
#41 0x in ?? ()
#42 0x in ?? ()
#43 0x in ?? ()
#44 0x in ?? ()
#45 0x8095ac00 in affinity ()
#46 0x in ?? ()
#47 0x in ?? ()
#48 0xff0002d2d8c0 in ?? ()
#49 0xff800012f320 in ?? ()
#50 0xff800012f2c8 in ?? ()
#51 0xff0002c59000 in ?? ()
#52 0x80411db9 in sched_switch (td=0x80429c00,
newtd=0xff80005a8748, flags=Variable flags is not available.
)
at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
(kgdb)


--

Nothin' ever doesn't change, but nothin' changes much.
-- OK Go

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: crash on 8.2-RELEASE amd64, high-traffic squid server

2011-08-18 Thread Jeremy Chadwick
On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote:
 Howdy,
 
 I have some high-traffic squid servers, most of which are running a
 flavor of RELENG_7 very successfully, but one that I've been
 evaluating 8.x on has had a lot of problems. Most recently we had
 the crash below twice in the last 2 weeks. Same exact backtrace. Any
 suggestions on where to look would be appreciated.
 
 
 Thanks,
 
 Doug
 
 [backtrace snipped; identical to the one above]

CC'ing Jack Vogel here, since I see em(4) is involved.  Jack will
probably want this data from the system:

# uname -a       (hostname can be XXX'd out)
# dmesg          (particularly the emX entries and driver version)
# pciconf -lvbc  (specifically the emX entries and related data)
# ifconfig -a    (IPs and MACs can be X'd out; mainly interested in
                 options and other pieces)
# netstat -m     (if possible from a system which has been up a while
                 and is a likely crash candidate)
# vmstat -i      (same condition as netstat -m)

There isn't enough data above for me to determine what's going on, but
from the stack trace it looks 

Re: WD Advanced Format: do I need to do something special?

2011-08-18 Thread Andrey V. Elsukov
On 19.08.2011 3:11, Kevin Oberman wrote:
 I'd strongly suggest avoiding fdisk(8) and using gpart(8) on 8 and
 above. It has an alignment option that makes this all just work and also
 allows the use of GPT formatting. (Watch out for GPT on any system that
 needs to run 32-bit Windows.)
 
 gpart create -s gpt ada1
 gpart bootcode -b /boot/pmbr ada1
 gpart add -t freebsd-boot -a 4 -s 128 -b 40 ada1
 gpart bootcode -p /boot/gptboot -i 1 ada1
 gpart add -t freebsd-ufs -a 4 -s 2097152 ada1
 gpart add -t freebsd-swap -a 4 -s 8388608 ada1
 gpart add -t freebsd-ufs -a 4 -s 10485760 ada1
 gpart add -t freebsd-ufs -a 4 -s 1048576 ada1
 gpart add -t freebsd-ufs -a 4 ada1

If you are using gpart with the -a option, you don't need to specify exact
numbers. And if you want to align your partitions to 4096 bytes, you should
use -a 4k or -a 8.
E.g.

# gpart add -t freebsd-boot -a 4k -s 64k ad0
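Since a bare -a value counts 512-byte sectors, -a 4 only guarantees 2048-byte alignment. The sketch below models rounding a start sector up to the next multiple of the alignment; it illustrates the arithmetic under that assumption and is not gpart's actual code (align_up() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Round a starting LBA up to the next multiple of `align` sectors.
 * With 512-byte sectors, "-a 4" means align = 4 (2048 bytes), while
 * "-a 8" or "-a 4k" means align = 8 (4096 bytes). */
static uint64_t align_up(uint64_t lba, uint64_t align)
{
    assert(align > 0);
    return (lba + align - 1) / align * align;
}
```

For example, on a GPT disk with 512-byte sectors the first usable LBA is usually 34: rounding up to a multiple of 4 gives 36 (18432 bytes, not a multiple of 4096), while rounding to a multiple of 8 gives 40, a 4096-byte boundary.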

-- 
WBR, Andrey V. Elsukov


