panic in kevent

2008-11-11 Thread Johan Ström

Hi
One of my DL360G5 boxes running 7.0 had a panic this night:

jb-2 ~$ uname -rsv
FreeBSD 7.0-RELEASE-p4 FreeBSD 7.0-RELEASE-p4 #2: Thu Sep  4 10:49:27 CEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DL360G5


The config is a GENERIC with some pf, IPSEC and ALTQ stuff enabled.

jb-2 /usr/obj/usr/src/sys/DL360G5# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as amd64-marcel-freebsd.

Unread portion of the kernel message buffer:

panic: page fault
cpuid = 1
Uptime: 40d22h42m5s
Physical memory: 10225 MB
Dumping 867 MB: 852 836 820 804 788 772 756 740 724 708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356


#0  doadump () at pcpu.h:194
194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:194
#1  0x0004 in ?? ()
#2  0x804bb259 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x804bb65d in panic (fmt=0x104 Address 0x104 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:563
#4  0x8079ec84 in trap_fatal (frame=0xff01b33229f0, eva=18446742984664492240) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0x8079f055 in trap_pfault (frame=0xb6337780, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0x8079f998 in trap (frame=0xb6337780) at /usr/src/sys/amd64/amd64/trap.c:410
#7  0x8078560e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8  0x80494b0b in knlist_remove_kq (knl=0xff0114407748, kn=0xff0054f5fc30, knlislocked=0, kqislocked=0)
    at /usr/src/sys/kern/kern_event.c:1615
#9  0x80495f58 in kqueue_register (kq=Variable kq is not available.
) at /usr/src/sys/kern/kern_event.c:956
#10 0x804962f3 in kern_kevent (td=0xff01b33229f0, fd=Variable fd is not available.
) at /usr/src/sys/kern/kern_event.c:673
#11 0x80496ca5 in kevent (td=0xff01b33229f0, uap=0xb6337be0) at /usr/src/sys/kern/kern_event.c:594
#12 0x8079f2d7 in syscall (frame=0xb6337c70) at /usr/src/sys/amd64/amd64/trap.c:852
#13 0x8078581b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#14 0x10999ccc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)


Please let me know if I can help with anything else. Is there any way
to know which app caused this?
I did some googling but found only one or two similar crashes, and
those hits didn't tell me much.

I've never had this crash before.

Thanks

--
Johan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: connect(): Operation not permitted

2008-05-18 Thread Johan Ström

On May 18, 2008, at 9:19 AM, Matthew Seaman wrote:


Johan Ström wrote:

drop all traffic)? A check with pfctl -vsr reveals that the actual  
rule inserted is pass on lo0 inet from 123.123.123.123 to  
123.123.123.123 flags S/SA keep state. Where did that keep state  
come from?


'flags S/SA keep state' is the default now for tcp filter rules -- that
was new in 7.0 reflecting the upstream changes made between the 4.0 and 4.1
releases of OpenBSD.  If you want a stateless rule, append 'no state'.

http://www.openbsd.org/faq/pf/filter.html#state


Thanks! I was actually looking around in the pf.conf manpage yesterday but
failed to find it; looking closer today, I saw it.
I applied 'no state' (and 'quick') to the rule, and now no state is
created.
The problem I had in the first place seems to have been resolved
too, even though it didn't look like a state problem (it started to
deny new connections long before the state table was full, although
maybe I wasn't checking for updates fast enough or something).


Anyways, thanks to all helping me out, and of course thanks to
everybody involved in FreeBSD/pf and all for great products! Cannot be
said enough times ;)


connect(): Operation not permitted

2008-05-17 Thread Johan Ström

Hello

I have a FreeBSD 7 machine running mail services (among other things).
This machine recently replaced a FreeBSD 6.2 machine doing the same
tasks.
Now and then I need to send a lot of mail to customers (mailing list),
and one thing I've noticed since the change is that when I open many
connections in quick succession (a high connection rate, even if they
are very short-lived) inside a jail (I don't know if that has anything
to do with it, though), connect() starts to fail with "Operation not
permitted".
I've seen this in the PHP app that sends mail when it tried to
connect to localhost, as well as from postfix when it tried
to connect to amavisd on localhost, and also from postfix when it
tried to connect to remote SMTP servers.
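
A minimal sketch of how the failure could be reproduced outside PHP and
postfix: open many short-lived TCP connections back to back and tally how
each connect() attempt ends. The function name hammer, its parameters and
the 2 second timeout are illustrative assumptions, not part of the setup
described above.

```python
import errno
import socket
from collections import Counter


def hammer(host, port, count, connect=socket.create_connection):
    """Open `count` short-lived connections and tally the outcomes.

    Returns a Counter keyed by "ok" or by the symbolic errno name
    (for example "EPERM" for "Operation not permitted").
    """
    results = Counter()
    for _ in range(count):
        try:
            conn = connect((host, port), timeout=2)
        except OSError as exc:
            # errno can be None (e.g. on a timeout), hence the fallback.
            results[errno.errorcode.get(exc.errno, "UNKNOWN")] += 1
        else:
            conn.close()
            results["ok"] += 1
    return results
```

Calling something like hammer("127.0.0.1", 25, 5000) from inside the jail
would show at which point, if any, EPERM starts to appear; the host, port
and count here are placeholders.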


I do have PF for filtering, but there are no max-src-conn-rate limits
enabled for any rule that is used for this. However, for one of the
jails I do have an hfsc queue limiting the outgoing mail traffic from
one jailed IP. But I'm not sure that this would be the problem, since
I've also seen the problem when doing localhost connects in the jail,
and also in other jails on an entirely different IP that is not
affected by the queue.


Does anyone have any clues about what I can look at and tune to fix  
this?


Thanks!

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: connect(): Operation not permitted

2008-05-17 Thread Johan Ström


First of all, for freebsd-pf subscribers: I posted my original problem
(at the bottom) to freebsd-net earlier, but the replies seem to point to
PF, so I'll CC there too.


On May 17, 2008, at 5:19 PM, Alex Trull wrote:


Hi Johan and List,

In my case a few months ago it was pahu. Don't give that fine fellow an
account on your precious system !

But seriously, I had a pf-firewalled jail being used for DNS
testing, with large numbers of udp connections hanging around in pf
state. While the default udp timeout settings in PF are lower than those
of the tcp timeouts, they were still too high for PF to remove the
states in time before hitting the default 10k state limit!

If this is the case with you - run 'pfctl -s state | wc -l' - when there
is traffic load you may see that hitting 10k states if you've not tuned
that variable.

What to do next - up the state limit or lower the state timeouts. I did
both, to be safe.

in /etc/pf.conf these must be at the very top of the file:

# options
# 10k is insanely low, lets raise it..
set limit { frags 16384, states 32768 }
# timeouts - see 'pfctl -s timeouts' for options - you will want to
# change the tcp ones rather than the udp ones for your smtp setup.
# but these are mine, I set them for the dns traffic.
set timeout { udp.first 15, udp.single 5, udp.multiple 30 }


don't forget to:

$ /etc/rc.d/pf check && /etc/rc.d/pf reload
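
The 'pfctl -s state | wc -l' check above can be sketched as a small helper
that also reports how close the table is to its limit. This is only a
sketch: the function names are made up for illustration, the 10000 default
mirrors the 10k limit mentioned above, and read_states() assumes pfctl is
on the PATH and the caller is allowed to run it.

```python
import subprocess


def state_usage(state_dump, limit=10000):
    """Return (state_count, fraction_of_limit) for a `pfctl -s state` dump.

    Every non-empty line is counted as one state, which mirrors
    `pfctl -s state | wc -l`.
    """
    count = sum(1 for line in state_dump.splitlines() if line.strip())
    return count, count / limit


def read_states():
    """Run `pfctl -s state` and return its output (normally needs root)."""
    result = subprocess.run(
        ["pfctl", "-s", "state"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running state_usage(read_states()) in a loop while the load is applied
would show whether the state count really approaches the limit when the
errors begin.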


OK, I looked over the PF states now, but I'm not quite sure that's what
is causing it. I have the default limit of 10k states, and normally I seem
to have around 800 states. When I start my test script, which tries to
send as many mails as possible (using PHP's Pear::Mail: creating a
connection, sending, disconnecting, creating a new connection, and so
on), I can clearly see the PF state counter (pfctl -vsi) increase, but
the script aborts with "Operation not permitted" well before I hit 10k,
at around 3-4k.
If I then wait a few seconds and run the script again, I can see the
number of states increase even more, and if I do this enough times I
finally hit around 9700 states. But at this point (states exhausted),
I don't get "Operation not permitted"; instead the script just seems to
block for a few seconds while states clear up, then continues
running until it gets an "Operation not permitted".


So, from the above results, I can't say that it looks like it's the
states.


Just tried disabling the altq rule too; no change (not that I
expected one, since it's on bce0, not lo0).


Another thing, which might be more appropriate for freebsd-pf, though:
why would it create states at all for this traffic, when my pf.conf
rule is "pass on lo0 inet from $jail to $jail" (I have a "block drop in"
rule to drop all traffic)? A check with pfctl -vsr reveals that the
actual rule inserted is "pass on lo0 inet from 123.123.123.123 to
123.123.123.123 flags S/SA keep state". Where did that "keep state"
come from?


Thanks for ideas :)




HTH,

Alex

On Sat, 2008-05-17 at 16:33 +0200, Johan Ström wrote:

Hello

I got a FreeBSD 7 machine running mail services (among other things).
This machine recently replaced a FreeBSD 6.2 machine doing the same
tasks.
Now and then I need to send alot of mail to customers (mailing list),
and one thing i've noticed now after the change is that when I use a
lot of connections subsequently (high connection rate, even if they
are very shortlived) inside a jail (dunno if that has anything to do
with it though), I start to get Operation not permitted in return to
connect().
I've seen this in the PHP app that sends mail, when it tried to
connect to localhost, as well as from postfix when it have been trying
to connect to amavisd on localhost, but also from postfix when it has
tried to connect to remote SMTP servers.

I do have PF for filtering, but there are no max-src-conn-rate limits
enabled for any rules that is used for this. However, from one of the
jail I do have a hfsc queue limiting the outgoing mail traffic from
one jailed IP. But I'm not sure that this would be the problem, since
I've also seen the problem when doing localhost connects in the jail,
and also in other jails on an entierly different IP that is not
affected.

Does anyone have any clues about what I can look at and tune to fix
this?

Thanks!

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




ZFS deadlock

2008-04-08 Thread Johan Ström

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6  
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:


load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k


That worked for a while, then it stopped working too (this was over ssh).
When trying a local login I only got


load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to
have replied...
In my current conf, I think my kmem/kmem_max is at 512 MB (not sure
though, since I edited my file yesterday for the next reboot), with 2G
of system RAM. Normally I'd run kmem(max) at 1G (with an arcsize of 512M;
currently it is at the default), but since I only just got back to 2G total
mem after some hardware problems, I've been running at those low values (1G
total is kind of tight with zfs).


Well, just wanted to report... The box is not totally dead yet, i.e. I
can still do Ctrl-T on the console, but that's it. I don't really know
what more I can do, since I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it
unlocks or anyone has any suggestions.


Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:


On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while then that stopped working too (was over ssh). When
trying a local login i only got

load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have
replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though,
since I've edited my file yesterday for next reboot), with 2G of system
RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is
at default), but since I just got back to 2G total mem after some hardware
problems I've been running at those lows (1G total is kind of tight with
zfs..)

Well, just wanted to report... The box is not totally dead yet, ie I can
still do Ctrl-T on console, but thats it.. I don't really know what more I
can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it unlocks
or if anyone have any suggestions.


I don't think there are any suggestions left to give.  Many people,
including myself, have experienced this kind of problem.  It's well-
documented both on my Common Issues page, and the official FreeBSD ZFS
Wiki.


Ah.. I guess I was just too restrictive with the googling on
zfs:buf_hash_table.ht_locks[i].ht_lock.



ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.


That I am aware of.

Thanks.


Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:37 AM, LI Xin wrote:


Johan Ström wrote:

Hello
A box of mine running RELENG_7_0 and ZFS over a couple of disks (6  
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
Worked for a while then that stopped working too (was over ssh).  
When trying a local login i only got

load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
I found one post like this earlier (by Xin LI), but nobody seemed  
to have replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure  
though, since I've edited my file yesterday for next reboot), with  
2G of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of  
512M. currently it is at default), but since I just got back to 2G  
total mem after some hardware problems I've been runnig at those  
lows (1G total is kindof tight with zfs..)
Well, just wanted to report... The box is not totally dead yet, ie  
I can still do Ctrl-T on console, but thats it.. I don't really  
know what more I can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it  
unlocks or if anyone have any suggestions.


The key is to increase your kmem and prevent it from being  
exhausted.  I think more recent OpenSolaris's ZFS code has some  
improvements but I do not have spare devices at hand to test and  
debug :(


Yep, I never had the problem when I was running with 2G total mem, but
then one stick (damn consumer crap) failed and I was left with 1G, and
I started to have random problems. I'm going to tune kmem back up now that
I have more mem again, and I'm thinking about putting in 4G too.





Maybe pjd@ would get a new import at some point?  I have cc'ed him.

Cheers,
--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:40 AM, LI Xin wrote:

For your question: just reboot would be fine, you may want to tune  
your arc size (to be smaller) and kmem space (to be larger), which  
would reduce the chance that this would happen, or eliminate it,  
depending on your workload.


Back online now, with kmem/kmem_max at 1G and arcsize at 512M. Are
those reasonable on a 2G machine? I think I read that
somewhere, but I cannot find it (arc at least) in the TuningGuide now.
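
For reference, a small sketch that renders the values mentioned above as
/boot/loader.conf lines. The thread itself only says "kmem/kmem_max" and
"arcsize"; the tunable names vm.kmem_size, vm.kmem_size_max and
vfs.zfs.arc_max are my assumption of what those refer to, and the helper
name is made up for illustration.

```python
def loader_conf_lines(kmem="1024M", arc_max="512M"):
    """Build /boot/loader.conf lines for the kmem and ARC limits.

    The tunable names are assumptions (see the note above); the default
    values are the 1G kmem and 512M ARC settings from this message.
    """
    return [
        f'vm.kmem_size="{kmem}"',
        f'vm.kmem_size_max="{kmem}"',
        f'vfs.zfs.arc_max="{arc_max}"',
    ]
```

Printing "\n".join(loader_conf_lines()) gives three lines that could be
compared against the existing loader.conf before the next reboot.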




This situation is not recoverable and you can trust ZFS that you  
will not lose data if they are already sync'ed.




Actually, I've had a lot of hard crashes lately on this machine (bad
hw), but I have not lost data a single time (to my knowledge at
least...). In that regard, compared to UFS, ZFS is waaay better! :)



--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: FreeBSD 7 and multiple IP (mijail-patch in 6.x)

2008-04-03 Thread Johan Ström

On Apr 3, 2008, at 8:39 PM, Bjoern A. Zeeb wrote:


On Mon, 31 Mar 2008, Johan Ström wrote:

Hi,

I got a machine running 6.2 right now, which is being replaced. And  
since SMP performance is much better on 7.x I'd like to go with 7.0  
(and many ppl have indeed verified that it works good on this box,  
HP DL360 G5)...
But, now when I start to setup the machine, I recalled that i've
patched the 6.2 box with the freebsd mijail patch
(http://www.digitaldaemon.com/FreeBSD/FreeBSD/FreeBSD_6.2-STABLE-mijail.patch).
However, I cannot find anywhere about FreeBSD 7 and a similar  
patch. A quick look at the patch vs the 7.x source tells me it  
won't apply cleanly, but from what I've seen quickly, it could  
maybe be done. The differences I've seen doesn't look too advanced,  
but then again, I'm  not a kernel developer...


So, I'd like to know if anyone considered this on 7.x, or if anyone  
can tell me immediately that this wont work or will be LOTS of  
work, or just some patch line adjusting? Ie, how big are the  
changes from 6.x to 7.x in these sections?


I had planned to have a patch for multiv4/v6 jails last month but it's not
yet publicly available. I have sent it off to some people for review.

In case the above is a successor of pjd's multi-ip v4 jail patch I can
give you a plain forward port to a FreeBSD 7 system (which might have
possible locking issues I have never experienced).

All depends on how quickly you need it.



Hello, thanks for your answer.

Yep, the patch I've been using on 6 looks very much like pjd's
(http://people.freebsd.org/~pjd/patches/mijail5.patch).
Are you using this FreeBSD 7 port yourself, or do you have any idea whether
anyone else does, or how much it has been tested?


I have no need for IPv6 right now, so if nothing else, I'd be glad to
test the 7 port of pjd's patch to see if it works. That sounds like roughly
what I had planned to do anyway. :)


Thank you!

--
Johan



FreeBSD 7 and multiple IP (mijail-patch in 6.x)

2008-03-31 Thread Johan Ström

Hello

I have a machine running 6.2 right now, which is being replaced. And
since SMP performance is much better on 7.x, I'd like to go with 7.0
(and many people have indeed verified that it works well on this box, an HP
DL360 G5)...
But now that I've started to set up the machine, I recalled that I've
patched the 6.2 box with the FreeBSD mijail patch
(http://www.digitaldaemon.com/FreeBSD/FreeBSD/FreeBSD_6.2-STABLE-mijail.patch).
However, I cannot find anything about FreeBSD 7 and a similar patch. A
quick look at the patch vs the 7.x source tells me it won't apply
cleanly, but from what I've seen so far, it could maybe be done. The
differences I've seen don't look too advanced, but then again, I'm
not a kernel developer...


So, I'd like to know if anyone has considered this on 7.x, or if anyone
can tell me right away whether this won't work, will be LOTS of work,
or just needs some patch line adjusting? I.e., how big are the changes from
6.x to 7.x in these sections?


Thank you for any answers or pointers.

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: HP ProLiant DL360 G5 success stories?

2008-03-17 Thread Johan Ström

On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote:


On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote:

Johan Ström wrote:

But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS) using the
Remote Console feature, and that after POST I have to have the advanced
licensed option?



I don't do the purchasing and we get all Advanced iLO, so I will take
your word for it.  The older generations supported text console (i have
a 360G2 that does so).   We use the HP Management agents under Windows
for all SNMP reporting so I can't comment on the reporting method under
other OS's.


iLO2 ActiveX based remote console (Integrated KVM) can still do
text only console without license but it doesn't work too well IMHO.
The Java based console is the same, text will work without license but not
graphics mode, and that includes certain VESA text modes.

Standard iLO gives the graphical console and virtual media. On Blade servers
the graphical access and virtual media is included. And the Advanced license
gives extra stuff like integration into AD for authentication AFAIK.


How about SSH mode? SSH to view text mode at boot (serial redirection in the
BIOS too?) and a serial console in FreeBSD (boot loader and onward). Does that
work well, or not too well either?
Let's hope it works out well now at least. I ordered the box without the
full license, but I guess I can always get that later on if it
turns out to work like crap. But for once I'm purchasing quality
brand hardware, so it should work with me instead of against me... I
hope :)


Thank you all for all of your replies!


Re: HP ProLiant DL360 G5 success stories?

2008-03-17 Thread Johan Ström

On Mar 17, 2008, at 11:46 AM, Ulf Zimmermann wrote:


On Mon, Mar 17, 2008 at 08:33:20AM +0100, Johan Ström wrote:

On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote:


On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote:

Johan Ström wrote:

But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS) using the
Remote Console feature, and that after POST I have to have the advanced
licensed option?



I don't do the purchasing and we get all Advanced iLO, so I will  
take

your word for it.  The older generations supported text console (i
have
a 360G2 that does so).   We use the HP Management agents under
Windows
for all SNMP reporting so I can't comment on the reporting method
under
other OS's.


iLO2 ActiveX based remote console (Integrated KVM) can still do
text only console without license but it doesn't work too well IMHO.
The Java based console is the same, text will work out license but
graphics
mode and that includes certain VESA text modes.

Standard iLO gives the graphical console and virtual media. On Blade
servers
the graphical access and virtual media is included. And the Advanced
license
gives extra stuff like integration into AD for authentication afik.


How about SSH mode? SSH and view textmode at boot (serial rdr in bios
too?) and console @ serial in fbsd (bootloader and on). Does that  
work

good or not to well either?
Lets hope it works out good now at least, I ordered the box, without
full license though, but I guess I can always get that later on if it
turns out to work like crap.. But for once I'm purchasing quality
brand hardware.. So it should work with me instead of against me... I
hope :)

Thank you all for all of your replies!


iLO1 (used on DL360 g3, g4, g4p and DL380 g3, g4) had text console
via ssh and I have used it often because of cut+paste. Unfortunately
as far as I know iLO2 (used on g5) does not support ssh text console.


Hm.. No, not a native text console, but the virtual serial port should
work over SSH if I'm reading the manual correctly:



Although additional configuration steps are required to use Remote
Serial Console (as compared to using the remote console or IRC), the
Remote Serial Console allows telnet or SSH users to interact with the
server remotely and without requiring an iLO 2 Advanced license and is
the only way a true text-based remote console is presented by iLO 2.


If I've understood correctly, the text console mode from iLO 1 has been
removed altogether in favor of graphical mode (the internal
workings have been changed).




Re: HP ProLiant DL360 G5 success stories?

2008-03-17 Thread Johan Ström

On Mar 17, 2008, at 9:52 AM, Jeremy Chadwick wrote:


On Mon, Mar 17, 2008 at 08:33:20AM +0100, Johan Ström wrote:

On Mar 16, 2008, at 8:36 AM, Ulf Zimmermann wrote:


On Wed, Mar 12, 2008 at 06:40:49PM -0500, Joe Koberg wrote:

Johan Ström wrote:

But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS) using the
Remote Console feature, and that after POST I have to have the advanced
licensed option?



I don't do the purchasing and we get all Advanced iLO, so I will  
take
your word for it.  The older generations supported text console  
(i have
a 360G2 that does so).   We use the HP Management agents under  
Windows
for all SNMP reporting so I can't comment on the reporting method  
under

other OS's.


iLO2 ActiveX based remote console (Integrated KVM) can still do
text only console without license but it doesn't work too well IMHO.
The Java based console is the same, text will work out license but
graphics
mode and that includes certain VESA text modes.

Standard iLO gives the graphical console and virtual media. On Blade
servers
the graphical access and virtual media is included. And the Advanced
license
gives extra stuff like integration into AD for authentication afik.


How about SSH mode? SSH and view textmode at boot (serial rdr in  
bios too?)
and console @ serial in fbsd (bootloader and on). Does that work  
good or

not to well either?


I have to chime in here.

Who cares if it has SSH support?  iLO, LOM, and serial console should
all be done over a *private network*, and should NOT be hooked up to a
publicly-accessible network or given public IPs.  I cannot stress how
important this is.  DO NOT put stuff like this on the public Internet:
you will regret it.


The advantage to iLO is that it's the equivalent of KVM-over-IP,
supporting virtual media too (read: an ISO image on your laptop/local
client machine being used as a CD on the server itself, thus you can
install whatever OS you want, etc.).  You get NATIVE VGA CONSOLE
remotely on the machine -- there is no serial console, and that's
always best.  I've seen it in action, and it's *awesome*.


With the advanced license, yes. That's another $400 or so (which might not be
very much money for big corps, but for me and my one-server
installation it's more).





Said iLO capability usually works over a series of TCP or UDP ports,
sometimes even supporting HTTP (on the iLO module itself!) which means if
it's on a private network, you can tunnel to it using SSH or similar
utilities via another box in the co-lo.  Then simply access
127.0.0.1:whatever in the ActiveX, Java, or native Win32/Linux client
and voila -- you have the machine's native VGA console in front of you,
with no issues relating to serial console.  No more "ohhh, the bootup
configuration uses 9600bps, but our serial console servers are
configured to use 115200bps... but the disk isn't booting so it's still
using 9600bps at that stage, now I HAVE to go to the datacenter"
scenarios.
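
The SSH tunnel described above could be set up roughly as follows. This is
a sketch only: the helper name, the host names and the port numbers are
hypothetical, and which ports the iLO client actually needs is not stated
in this thread. It relies on the standard OpenSSH options -N (no remote
command) and -L (local port forwarding).

```python
def ilo_tunnel_command(jump_host, ilo_host, ilo_port, local_port):
    """Build an ssh command that forwards 127.0.0.1:local_port to the iLO.

    jump_host is the box in the co-lo that can reach the private
    management network; all values are supplied by the caller.
    """
    return [
        "ssh",
        "-N",  # no remote command, just keep the forwarding open
        "-L", f"127.0.0.1:{local_port}:{ilo_host}:{ilo_port}",
        jump_host,
    ]
```

The returned list can be handed to subprocess.run(); the client is then
pointed at 127.0.0.1 and the chosen local port instead of the iLO address.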


Yep, there are some downsides with serial console. But if it works,
I'd rather use a normal ssh client in my terminal together with the
virtual serial port than sit in a web browser. I guess I'm
going to evaluate the serial port option when I get the box, and if it
isn't working too well I'll just have to cough up the money and get the
advanced license (even if I'd rather spend that money on more fun
things).





I do not trust IPMI based on stories I have heard from Yahoo! SAs,
talking about how every implementation is different (so much for a
standard), and how the number of bugs in Supermicro's IPMI
implementation are absurd.  Supposedly Intel and others have done a
better job with it, but I lost all interest in it once I found that
there was no real standard.  Besides, anything that piggybacks on
top of an existing LAN port (even some iLO implementations do this!) is
worth avoiding.  I do not want to deal with a single NIC emitting two
separate MAC addresses -- and that's what happens.  It's sometimes
referred to as ASF as well.


I've got a Supermicro IPMI card now and I'm afraid I cannot describe
it with better words than "crappy toy": constant IPMI card restarts and
crashes, the serial console Java browser applet becoming unresponsive,
firmware upgrades that b0rk the card totally, etc...


--
Johan


Re: HP ProLiant DL360 G5 success stories?

2008-03-14 Thread Johan Ström

On Mar 13, 2008, at 4:57 PM, David Schutt wrote:


Johan Ström wrote:

On Mar 13, 2008, at 1:01 AM, Sean Winn wrote:
For using HP blades and standard iLO (no licensed advance  
features), it works perfectly well, installing both FreeBSD 5 and  
6 on the blades I've tried, using a remote install CD from the  
Java applet (there's one for remote devices like disks, and one  
for remote console); there's only text mode but that's plenty to  
install the OS and enable it to the point of using SSH to manage  
it from there on in. I'd hope the iLO hardware/software is  
relatively common to all the HP range :)


text mode access continues at all times - the iLO interface is  
just a remote screen/keyboard onto it, even POST BIOS boot. The  
external devices are USB mass storage ones, but I didn't have  
problems booting off the CD and installing it for 6.2.

Well.. The blades seem to be an exception:
• iLO 2 Standard Blade Edition (unlicensed blade server):
o Remote Console and IRC
This is not listed under iLO 2 Standard (unlicensed:)... I guess  
that means I'm out of luck unless I want to bang up another $400  
(listing price).. Which I'd rather not :)
Anyone running those 360G5's using serial console on a normal  
licensed iLO?


Yes.  We have one DL360 G5, and I was able to get serial console  
working using information I found in this thread --


http://lists.freebsd.org/pipermail/freebsd-proliant/2007-October/000303.html

Only downside is that the physical COM1 becomes unavailable, which  
caused some consternation when trying to monitor a UPS :-)


That is probably not a big deal for me. Then I guess this should work
fine.. I've just played around a bit on 7.0 with a serial console on a
regular port (not an HP box though), and it seems to work fine.


From what I could tell, I can just SSH to the iLO and enter 'vsp' to
get the serial port, and this works very well
(http://lists.freebsd.org/pipermail/freebsd-proliant/2007-August/000292.html).
If anyone thinks otherwise, I'd appreciate a line. :)
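
For reference, a serial console on FreeBSD 7.x is usually enabled with one loader setting and one /etc/ttys entry. The sketch below only prints the two snippets so they can be merged by hand. It assumes the iLO virtual serial port shows up as the first serial port (ttyd0 under sio(4), which would fit the report that physical COM1 becomes unavailable) and the default 9600 baud; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints the settings, does not modify any file.

# Line to add to /boot/loader.conf so the kernel console uses the serial port.
loader_conf_snippet() {
    cat <<'EOF'
console="comconsole"
EOF
}

# Replacement for the ttyd0 line in /etc/ttys: run a login getty on the
# first serial port (assumed to be where the iLO virtual serial port lands).
ttys_snippet() {
    cat <<'EOF'
ttyd0	"/usr/libexec/getty std.9600"	vt100	on secure
EOF
}

echo "# /boot/loader.conf"
loader_conf_snippet
echo "# /etc/ttys"
ttys_snippet
```

The loader setting takes effect at the next boot; after editing /etc/ttys, `kill -HUP 1` makes init re-read it.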


Thanks!


Re: HP ProLiant DL360 G5 success stories?

2008-03-13 Thread Johan Ström

On Mar 13, 2008, at 1:01 AM, Sean Winn wrote:

For using HP blades and standard iLO (no licensed advance features),  
it works perfectly well, installing both FreeBSD 5 and 6 on the  
blades I've tried, using a remote install CD from the Java applet  
(there's one for remote devices like disks, and one for remote  
console); there's only text mode but that's plenty to install the OS  
and enable it to the point of using SSH to manage it from there on  
in. I'd hope the iLO hardware/software is relatively common to all  
the HP range :)


text mode access continues at all times - the iLO interface is just  
a remote screen/keyboard onto it, even POST BIOS boot. The external  
devices are USB mass storage ones, but I didn't have problems  
booting off the CD and installing it for 6.2.



Well.. The blades seem to be an exception:

• iLO 2 Standard Blade Edition (unlicensed blade server):
o Remote Console and IRC

This is not listed under iLO 2 Standard (unlicensed:)... I guess  
that means I'm out of luck unless I want to bang up another $400  
(listing price).. Which I'd rather not :)


Anyone running those 360G5's using serial console on a normal licensed  
iLO?






The iLO web interface nominally requires/is tested only on Windows/ 
IE setup, but I do my work from a Mac running Safari, and have no  
problems to date.


On 13/03/2008, at 10:41 AM, Johan Ström wrote:


On Mar 13, 2008, at 12:40 AM, Joe Koberg wrote:


Johan Ström wrote:
But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS)
using the Remote Console feature, and that after POST I have
to have the advanced licensed option?




I don't do the purchasing and we get all Advanced iLO, so I will  
take your word for it.  The older generations supported text  
console (i have a 360G2 that does so).   We use the HP Management  
agents under Windows for all SNMP reporting so I can't comment on  
the reporting method under other OS's.




I see. Can anyone else maybe shed some light here?

Thanks









HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström

Hello!

I'm looking into getting a new server box to replace a Supermicro box,
which unfortunately has a bunch of problems with heat, random
hangups, crappy IPMI/remote admin capabilities etc..
What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0)
and 4G of RAM and 4x 146GB SAS disks on the Smart Array P400i card.
I've googled and looked through the list archives trying to find
success stories/problem reports using FreeBSD on this box, but haven't
found very much.. Only thing was
http://www.freebsd.org/platforms/amd64/motherboards.html
which says "Functional", which isn't very informative ;)


So.. Does anyone have any experience with this combo (DL360 G5 / P400i)?

Furthermore, anyone run 7.0 on this? Or should I still stick with
6.3... Load will be a couple of jails mainly running apache + php +
mysql (or at least that's where the load will be).


Thanks!

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström
First of all, nice with all these positive answers! Thank you all  
(without responding to each and every post:))!



On Mar 12, 2008, at 12:35 PM, Pete French wrote:


What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0)
and 4G of RAM and 4x 146Gb SAS disks on the Smart Array P400i card.

...
So.. Does anyone have any experience with this combo (DL360 G5 /  
P400i)?


We have around 20 machines like that and they work beautifully. We
run 7.0/amd64 on the machines now, but we have run 6.2/i386 in the past
and that worked fine - though you will only be able to use the first
3.5 gig of RAM.


I don't have any plans on running i386; I'm running amd64 on the
Supermicro box now without any problems (that I can relate to that, at
least).


How long have you run 7.0 (before release)?  From all the other  
responses it seems lots of ppl use 7.0 on these without any problems  
at all.






Furthermore, anyone run 7.0 on this? Or should I still stick with


We run 7.0 on these machines and it works fine - I always prefer 7.0
to 6.3 on SMP machines as it performs better. Also 7.0 works well with
the iLO on these machines - I seem to recall when I installed 6.X that
it didn't work too well and I had to use boot floppy images. I'd say
go for 7.0 and amd64 if you can.


This is where I'm a bit curious. What OS interaction does iLO do that
needs to be compatible, I mean?
On my current box I've got an IPMI card that gives me (when it's
working..) SOL capabilities.. To what degree can I remote control with
iLO? If I've understood correctly, I get the exact console as on screen,
with keyboard access, over web/ssh/telnet. Does this work well? This is
one of my important points for changing, since it's so crappy on my
current box, and when the box is a couple of miles away it's quite nice
to have it working flawlessly..
iLO over internet? Possible, impossible? Encryption? (yes I know, not
exactly FreeBSD related questions but.. )



Another thing, how is it with physical monitoring?
Temperatures/fan speeds/voltage?


Thank you (all)! :)

--
Johan


Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström

On Mar 12, 2008, at 12:26 PM, Krassimir Slavchev wrote:



Johan Ström wrote:

Hello!

I'm looking into getting a new server box to replace a Supermicro box,
which unfortunately has a bunch of problems with heat, random hangups,
crappy IPMI/remote admin capabilities etc..
What I'm looking at is a DL360 G5, probably with one E5335 (quad 2.0)
and 4G of RAM and 4x 146GB SAS disks on the Smart Array P400i card.


I have some DL360 G5 (3.0GHz E53xx) with 8G, 4x146G SAS and P400i and
they work perfect with  7.0 (amd64) and sched ULE!

I've googled and looked through the list archives trying to find success
stories/problem reports using FreeBSD on this box, but haven't found
very much.. Only thing was


Try to find the "Performance!" thread on this list


Found it, but the main point seemed to be "don't compile postgres stuff
with incompatible thread safety"? Or did I miss something?


http://www.mail-archive.com/freebsd-stable@freebsd.org/msg92787.html





http://www.freebsd.org/platforms/amd64/motherboards.html which says
"Functional", which isn't very informative ;)

So.. Does anyone have any experience with this combo (DL360 G5 /  
P400i)?




Yes, P400i works fine with ciss(4) driver


Can I offline/online drives, rebuild arrays etc. from the OS?

Thank you!



Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström

On Mar 12, 2008, at 11:27 PM, Pete French wrote:


How long have you run 7.0 (before release)?  From all the other
responses it seems lots of ppl use 7.0 on these without any problems
at all.


I've been running it since last september - never had any problem with
it, and am pretty convinced it is stable.


Sounds good :)




This is where I'm a bit curious. What OS interaction does iLO do that
needs to be compatible, I mean?


Booting from the CD - I had one FreeBSD/iLO combination which would not
boot from the emulated CD. I needed to use the floppies and do a network
install. That was painful - I can't remember the version though. Certainly
I have had no such problems with 7.0 and the latest iLO.


I see.. well, if I ever need to boot this box from CD (reinstall) I'll
probably do it with local access anyway..






SOL capabilities.. To what degree can I remote control with iLO? If


It acts as a complete console - just as if you were sitting in front
of the machine. You can see the screen, use keyboard and mouse, and
attach images as CDs or floppies.



iLO over internet? Possible, impossible? Encryption? (yes i know, not
exactly freebsd related questions but.. )


iLO runs over https so is encrypted. It does run better from a Windows
client than anything else sadly - but I keep a Windows box around
for this purpose. Have just installed a set of machines somewhere in
Louisiana remotely, whilst sitting in bed in London with a cup of tea,
using an OSX laptop :-) I love iLO...


Just what I thought then.

Thanks :)



Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström

On Mar 12, 2008, at 11:37 PM, Joe Koberg wrote:

The iLO is a completely separate management processor with its own  
network port. It runs its own OS and has its own IP address. It runs  
an SSL webserver for access.  The iLO is accessible over the network  
any time the machine is plugged into power.  I am not sure about  
IPMI access to it.


Okay, kind of what I expected (haven't read up very much on it yet).




The normal iLO option will give you exact textual console screen  
output and keyboard control from the moment of power-on.  It will  
also let you toggle power and hit the reset button. I believe it  
uses a java applet in the browser.


The advanced iLO option, which is license-key-unlocked, also  
provides graphical remote console, and virtual media. You can upload  
a CD or floppy image and then boot the server from it.  I suspect  
the compatibility issue appears here - the virtual media probably  
emulates USB mass storage, and the OS must be able to boot from it.


I see... So for a box that is going to run fbsd in console mode, and  
hopefully never need to boot from CD after install, it sounds like the  
normal mode will work splendid.


But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS)
using the Remote Console feature, and that after POST I have to have
the advanced licensed option?


iLO 2 displays this information through the remote console applet
while in the server pre-operating system state, enabling a
non-licensed iLO 2 to observe and interact with the server during POST
activities. A non-licensed iLO 2 cannot use remote console access after
the server completes POST and begins to load the operating system. The
iLO 2 Advanced License enables access to the remote console at all
times.


So.. Then what? I have to configure FreeBSD to use a serial console  
and continue with using serial console instead? Later in the same doc:


• iLO 2 Standard (unlicensed:)
NOTE:  The features annotated with an asterisk (*) are not supported  
on all systems.


o Virtual Power and Reset control
o Remote serial console through POST only
...
o Serial access*


Am I missing something here, or will I only be able to access the
console during POST unless I configure the box to use a serial
console? Hope you can shed some light here :)





It has full reporting of hardware state and management log details,  
and the home page is a big summary with any faults outlined in red.


Yes, that was what I expected. But can I retrieve the data some other
way? IPMI, SNMP or something? I would like to gather the stats at a
central management site. Further investigation in the manual seems to
indicate that no SNMP access is available, but there is some XML
RIBCL interface I can use (yes, this is in standard mode too :))


Thank you!

--
Johan


Re: HP ProLiant DL360 G5 success stories?

2008-03-12 Thread Johan Ström

On Mar 13, 2008, at 12:40 AM, Joe Koberg wrote:


Johan Ström wrote:
But..
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c00553302/c00553302.pdf
seems to tell me that in basic mode I can only access BIOS (pre-OS)
using the Remote Console feature, and that after POST I have to
have the advanced licensed option?




I don't do the purchasing and we get all Advanced iLO, so I will  
take your word for it.  The older generations supported text console  
(i have a 360G2 that does so).   We use the HP Management agents  
under Windows for all SNMP reporting so I can't comment on the  
reporting method under other OS's.




I see. Can anyone else maybe shed some light here?

Thanks


Re: Backup solution suggestions [ggated]

2008-01-18 Thread Johan Ström

On Jan 17, 2008, at 09:30 , Ulrich Spoerlein wrote:


On Jan 17, 2008 1:31 AM, Johan Ström [EMAIL PROTECTED] wrote:

Export the disk on the backup server with ggated. Bind it on the client
with ggatec. Slap a GELI or GBDE encryption on top of it and then put a
ZFS on top of it.

You can mount/import this remote ZFS at will and do your zfs
send/receive on your local box. Nothing ever leaves your box
unencrypted.


Now that is a cool solution! That actually sounds like something  
doable.

I tried it out a bit at home between a 6.2 box (client) and a 7.0 box
(server), hosting the system in a ZFS sparse volume with a
predefined size, exported that via ggated and connected ggatec on the
client box. I then did some experimentation with just newfs, and it
worked great!
The only downside with this would be that the size is fixed. So I
played around a bit with setting the volsize property in ZFS and it
seemed to work just fine. zfs list reported the new, bigger size.
Restarted ggatec and did a growfs, and then remounted.. Yay, bigger
disk :)
Then I went on to do some geli tests: geli'ed /dev/ggate0 and
newfs'ed, mounted and played around a bit. All fine.. Now came the
problem: I unmounted it, expanded the ZFS volume a bit more,
restarted ggatec and tried to attach it using geli again (note, I
have no idea if this is supposed to work at all, I'm just testing.
Haven't read about such things anywhere). Now I got "Invalid argument".
I'm not really sure about how GEOM works, but if I recall correctly it
uses the last sectors of the disk? If I moved X bytes of data from the
old end of the disk to the new end of the disk, would that make GELI
work? If I can get that to work, then this would be a kickass solution
(all encryption stuff works great, I don't have to allocate all space
immediately, I can expand it later without destroying data and
starting from scratch etc).


I'm pretty certain that GELI cannot handle variable sized disks. But
you could add GVIRSTOR into the mix. But I'd just allocate the
necessary space and be done with it. Adding yet another layer is
asking for trouble, imho.


Okay.




Some other questions, more related to ggated/c. Is this stable? Does it
work well? How does it handle failure situations? Anyone using it for
production systems?


From my personal experience (which is rather limited): No, barely,
bad, hell no.


There were/are some open PRs about ggate. I had troubles with
gmirror+ggate in that it would deadlock every other hour on SMP
systems (try removing option PREEMPTION if that bug hits you).


Your "no, barely, bad, hell no" seems to fit pretty well.. I did some
testing during the night with the above (non-production) setup.

What I did was doing some rsyncing over the night:

while true ; do
    # Append a timestamped marker for each phase to the log
    echo `date` Clearing vmail >> logfile
    rm -rf vmail
    echo `date` Starting rsync >> logfile
    rsync -vr /usr/var/vmail . | tee -a logfile
    echo `date` Rsync finished >> logfile
done

I started this at ~02.0. The results? A freshly rebooted 6.2
(6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #0: Fri Jul 27 15:47:50 UTC 2007)
box in the morning..

Looking at the messages:

Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844480512, length=4096)]
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844484608, length=4096)]
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844488704, length=4096)]
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844492800, length=4096)]
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844517376, length=4096)]

... more of the same...
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844480512, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844640256, length=32768)]

... more of the same...
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844988416, length=4096)]
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844484608, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844488704, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844492800, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844517376, length=4096)]error = 5
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=8844992512, length=4096)]

... more of the same...
Jan 18 05:33:25 phomca kernel: GEOM_ELI: Crypto WRITE request failed (error=5). ggate0.eli[WRITE(offset=65536, length=4096)]
Jan 18 05:33:25 phomca kernel: g_vfs_done():ggate0.eli[WRITE(offset=8844521472, length=4096

Re: Backup solution suggestions [ggated]

2008-01-16 Thread Johan Ström

On Jan 16, 2008, at 23:27 , Ulrich Spoerlein wrote:


On Wed, 16.01.2008 at 00:26:34 +0100, Johan Ström wrote:
I create a regular tarball (gzipped maybe) with some files I want to back up.
Then I encrypt this file with e.g. gpg. Then I send off this file using some
unspecified network protocol to the storage server.
Encrypted all the way, from my end to the remote disk..
The downside is that it is a static file.. not a dynamic filesystem,
nothing I can mount and have easy access to individual files from. *That's*
what I'm looking for.


Export the disk on the backup server with ggated. Bind it on the client
with ggatec. Slap a GELI or GBDE encryption on top of it and then put a
ZFS on top of it.

You can mount/import this remote ZFS at will and do your zfs
send/receive on your local box. Nothing ever leaves your box
unencrypted.
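
The layering suggested above could look roughly like the following on the client side. This is a dry-run sketch: by default it only prints the commands instead of running them. The host name, exported device path, unit number and pool name are placeholders, not values from the thread, and the server is assumed to already list the client in /etc/gg.exports and to be running ggated.

```shell
#!/bin/sh
# Placeholders (hypothetical; adjust to the actual setup).
BACKUP_HOST="${BACKUP_HOST:-backuphost}"
REMOTE_DEV="${REMOTE_DEV:-/dev/da1}"
POOL="${POOL:-backup}"

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Server side, for orientation only (not executed here):
#   /etc/gg.exports:   <client-ip> RW /dev/da1
#   then start:        ggated

client_setup() {
    # Attach the remote disk locally; with unit 0 it appears as /dev/ggate0.
    run ggatec create -o rw -u 0 "$BACKUP_HOST" "$REMOTE_DEV"
    # One-time step: write GELI metadata (prompts for a passphrase).
    run geli init /dev/ggate0
    # Attach the encrypted provider; this creates /dev/ggate0.eli.
    run geli attach /dev/ggate0
    # Put a ZFS pool on top of the encrypted device.
    run zpool create "$POOL" /dev/ggate0.eli
}

client_setup
```

With this ordering the encryption happens on the client, so only ciphertext crosses the network and reaches the backup server's disk.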


Now that is a cool solution! That actually sounds like something doable.
I tried it out a bit at home between a 6.2 box (client) and a 7.0 box
(server), hosting the system in a ZFS sparse volume with a
predefined size, exported that via ggated and connected ggatec on the
client box. I then did some experimentation with just newfs, and it
worked great!
The only downside with this would be that the size is fixed. So I
played around a bit with setting the volsize property in ZFS and it
seemed to work just fine. zfs list reported the new, bigger size.
Restarted ggatec and did a growfs, and then remounted.. Yay, bigger
disk :)
Then I went on to do some geli tests: geli'ed /dev/ggate0 and
newfs'ed, mounted and played around a bit. All fine.. Now came the
problem: I unmounted it, expanded the ZFS volume a bit more,
restarted ggatec and tried to attach it using geli again (note, I
have no idea if this is supposed to work at all, I'm just testing.
Haven't read about such things anywhere). Now I got "Invalid argument".
I'm not really sure about how GEOM works, but if I recall correctly it
uses the last sectors of the disk? If I moved X bytes of data from the
old end of the disk to the new end of the disk, would that make GELI
work? If I can get that to work, then this would be a kickass solution
(all encryption stuff works great, I don't have to allocate all space
immediately, I can expand it later without destroying data and
starting from scratch etc).
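
As background for the question above: GELI keeps its metadata in the last sector of the provider, so growing the volume leaves that sector stranded in the middle of the new device. The sketch below only does the arithmetic for where that sector was and where the new last sector is. The sizes and sector size are made-up examples, and nothing in this thread confirms that relocating the sector is sufficient (the metadata may also record the provider size), so treat it as an illustration rather than a fix.

```shell
#!/bin/sh
# Byte offset of the last sector of a provider: mediasize - sectorsize.
# Note: needs a shell with 64-bit arithmetic for sizes above 2 GiB.
last_sector_offset() {
    echo $(( $1 - $2 ))
}

# The same position as a sector index (the unit dd's skip=/seek= use
# when bs equals the sector size).
last_sector_index() {
    echo $(( $1 / $2 - 1 ))
}

SECTOR=512
OLD_SIZE=$(( 10 * 1024 * 1024 * 1024 ))   # hypothetical 10 GiB volume
NEW_SIZE=$(( 12 * 1024 * 1024 * 1024 ))   # hypothetical size after growing

echo "old metadata sector: index $(last_sector_index $OLD_SIZE $SECTOR), byte offset $(last_sector_offset $OLD_SIZE $SECTOR)"
echo "new last sector:     index $(last_sector_index $NEW_SIZE $SECTOR), byte offset $(last_sector_offset $NEW_SIZE $SECTOR)"
```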


Some other questions, more related to ggated/c. Is this stable? Does it
work well? How does it handle failure situations? Anyone using it for
production systems? Yes, this is for backup only so minor glitches
might be acceptable for me, but I'd rather know about those beforehand.
I did some dd from urandom to the disk, with and without GELI.. I did
notice slightly lower speeds: I was able to write around 11MB/s
without GELI, and with GELI it did around 9.5MB/s. The client machine is
no super box but it's not that bad (A64 3200, 1G mem with not much load).


Input and ideas?

Thank you very much :)

--
Johan




Re: Backup solution suggestions

2008-01-16 Thread Johan Ström

On Jan 16, 2008, at 19:02 , Toomas Aas wrote:


Johan Ström wrote:

My main problem with existing solutions is this gap of
encryption on the backup server side. I don't want it to be
readable outside of my box (without encryption keys of course), so
as soon as I send it off from my box I want it to be encrypted over
the link, and down on the disk. Not decrypted on the remote box,
to then be encrypted again (with keys available on that box) and
then stored to disk. That would allow any users of that box (yes,
sure, you can have file permissions but let's assume someone else
has root access there) to read my files.

Simple Example:
I create a regular tarball (gzipped maybe) with some files I want to
back up. Then I encrypt this file with e.g. gpg. Then I send off this
file using some unspecified network protocol to the storage server.

Encrypted all the way, from my end to the remote disk..
The downside is that it is a static file.. not a dynamic
filesystem, nothing I can mount and have easy access to
individual files from. *That's* what I'm looking for.


As a long-time user of Amanda and regular lurker on their mailing  
list, I've noticed that latest versions of Amanda have encryption  
capabilities. They seem to fit your needs in that encryption can be  
performed entirely on the backup client (your box) side if one  
opts to set things up that way.


I haven't used encryption with Amanda myself so this is just what  
I've heard on the list and read from the wiki just now:


http://wiki.zmanda.com/index.php/How_To:Set_up_data_encryption

As for the ease of restore, it's not quite *that* easy, i.e. you  
can't just transparently mount the backup as a filesystem and copy  
files from there. Amanda has a command-line-ftp-like recovery  
interface, where you can specify which files/subdirectories and  
from which date you want recovered. It's been easy enough for me.





Looked through that page, seems like pretty much work right now. And
I looked through the Amanda docs, and I've got to say, when calling
themselves "Amanda is the world's most popular Open Source Backup and
Archiving software." one would expect somewhat better docs.. hehe.
Anyway, I will look more into the ggated suggestion from another post
before digging deeper into Amanda :)


--
Johan


Backup solution suggestions

2008-01-15 Thread Johan Ström

Hello

I'm looking to invest in some new hardware for backup. probably some  
kind of NAS (a 4-disk 1U NAS or something in that size). The thing is  
that I won't be the only one with access to this box, thus I would  
like to secure my data.
What I would like is encryption both for the transfer to the box, and  
encrypted on disk. The data on disk should not be readable by anyone  
but me (ie the other user(s) of the box should not be able to read  
it, at least not without a big effort).


So, I'm wondering what the best solution might be.. Tar'balling all  
my stuff and encrypt it with GPG or something and just dump it there  
with NFS would be the easiest solution, but maybe not the best. I've  
been thinking about running a GELI image on my box, and store that on  
the NAS over NFS.. would that be doable/secure/stable?
Another idea would be to go with some regular 1U box running some  
FBSD, doing scp to the box and geli local on the box but that would  
require me to have the encryption keys on that box (which would be  
shared so thus no good idea).
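
The tarball-plus-GPG option mentioned above can be sketched as a small pipeline. Encrypting to a public key means the shared box never needs the secret key. The recipient key id, output path and backup host in the example are placeholders, not values from the thread.

```shell
#!/bin/sh
# Stream a gzip-compressed tarball of one directory to stdout.
tar_stream() {
    tar -czf - -C "$(dirname "$1")" "$(basename "$1")"
}

# Encrypt the stream to a public key; only the holder of the matching
# secret key (kept off the backup box) can read the result.
# $1 = directory, $2 = recipient key id (hypothetical), $3 = output file
encrypted_tarball() {
    tar_stream "$1" | gpg --encrypt --recipient "$2" --output "$3"
}

# Example (not run here):
#   encrypted_tarball /usr/var/vmail backup@example.org /tmp/vmail.tar.gz.gpg
#   scp /tmp/vmail.tar.gz.gpg backupbox:/backup/
```

The trade-off is the one raised in the thread: the result is one opaque file, so restoring a single file means fetching and decrypting the whole archive.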


Any other ideas? Being able to rsync to the backup storage instead of  
just sending big encrypted tarballs would be very nice (and I guess  
that would be possible with geli version)


Maybe not the perfect list for this, but it is somewhat FreeBSD
specific and I'm sure some other ppl on the list have had similar
situations :)


--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: Backup solution suggestions

2008-01-15 Thread Johan Ström

First of all, thanks for your extensive answer!

On Jan 15, 2008, at 13:34 , Jeremy Chadwick wrote:


On Tue, Jan 15, 2008 at 10:52:56AM +0100, Johan Ström wrote:
I'm looking to invest in some new hardware for backup, probably some kind
of NAS (a 4-disk 1U NAS or something in that size). The thing is that I
won't be the only one with access to this box, thus I would like to secure
my data.


In my experience, your best bet when it comes to backups like what you
want (1U box with 4 disks, or a 2U box with 8 or more) is to simply buy
a server with the specifications you want, and run FreeBSD on it.  I
cannot recommend commercial products for something of this scale (e.g.
small/medium).

I could list off all the reasons why [as a small hosting provider] I
avoid proprietary backup solutions, but the list is quite long.  The
two main reasons:

1) Proprietary solutions often use proprietary hardware.  How do you
know what's inside of that mystery box?  What if it uses a SATA
controller you know has h/w-level bugs in it?  What if something in the
device fails; are you going to be charged an arm and a leg for a
replacement part?  Does it even HAVE user-serviceable parts?  etc...

I feel much more confident relying on hardware that I'm familiar with,
e.g. I know what motherboard is in the server I buy or build, I know who
makes it, I know if it's compatible with FreeBSD or Linux, I know the
SATA controller works and isn't flaky, I know the SATA backplane
actually works properly and supports hot-swapping, and I know if I need
replacement parts I can get them promptly.  Also, if the h/w I buy turns
out to have compatibility problems or performance issues, I can always
return it, get my money back, and try other h/w; with a proprietary
solution you're stuck with it, and if something's broken about it
which the vendor can't/won't fix, you're screwed.

2) Proprietary solutions also mean proprietary software.  This is
pretty much guaranteed regardless of what h/w is used.  What if the
volume manager used for your array has a bug and your data is
corrupt?  You have no way of really knowing this until it's too late,
and you only have one person to turn to: the vendor.


All good points there, cannot argue against that. Certainly something
to think about before making any purchases. The only thing against
that right now is size (we've got cheap access to a rack with
limited depth); I haven't really found any good 1U chassis that aren't
too deep. Admittedly I haven't spent very much time looking yet but.. :)




I prefer to have freedom of choice when it comes to backup methods.
Hmm, dump/restore isn't working out very well, so maybe I'll try ZFS,
or bacula, or tar over NFS, or rsync, or


What I would like is encryption both for the transfer to the box, and
encrypted on disk. The data on disk should not be readable by anyone but me
(ie the other user(s) of the box should not be able to read it, at least
not without a big effort).


I'm curious what the reason is for on-disk encryption?  Is it necessary
for something *only you* will have access to?  What's the concern here?


I think I wrote that I *won't* be the only one with access to the box.
Sorry if that wasn't clear.


It will be shared with a friend of mine (or rather his company). I do
trust him, but to keep some level of security I don't want him (or
rather, someone with access to his box) to be able to read my files
(and the other way around for his files).




So, I'm wondering what the best solution might be.. Tar'balling all my
stuff and encrypt it with GPG or something and just dump it there with NFS
would be the easiest solution, but maybe not the best. I've been thinking
about running a GELI image on my box, and store that on the NAS over NFS..
would that be doable/secure/stable?


I would recommend avoiding NFS unless the machine you're running
nfsd/mountd/portmap on has no direct way to talk to the Internet.  It's
impossible to get NFS-related daemons to bind solely to one IP/interface
on FreeBSD, which imposes a security risk.  If the machine is behind
NAT, you're very likely safe (unless the public has some way of
accessing another machine on that NAT network).  Thus, if you choose to
go the NFS route, have it on a segregated network.


The box will be on a separate LAN only accessible by our two boxes.
No internet connectivity. But the client boxes of course have internet
connectivity (but those would only be NFS clients, not servers).




That said -- what we use in our production environment is dump/restore
over SSH over a dedicated LAN.  I wrote a series of scripts that do
this, using SSH keys for the SSH portion.  Incrementals are done 6 days
a week, with fulls done once a week.
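
A weekly rotation like the one described (one full, six incrementals) could be sketched as below. This is only an illustration, not the scripts mentioned above: the choice of Sunday for the full dump, the use of level 1 for every incremental, and the names `FULL_DAY`, `dump_level` and `week_plan` are all assumptions.

```python
import datetime

FULL_DAY = 6  # assumption: Sunday (date.weekday() numbers Monday as 0)


def dump_level(day: datetime.date) -> int:
    """Return a dump level for the date: 0 = full, 1 = incremental since the full."""
    return 0 if day.weekday() == FULL_DAY else 1


def week_plan(start: datetime.date) -> list:
    """Return (date, level) pairs for the seven days beginning at start."""
    days = [start + datetime.timedelta(days=i) for i in range(7)]
    return [(day, dump_level(day)) for day in days]


if __name__ == "__main__":
    for day, level in week_plan(datetime.date(2008, 1, 14)):
        print(day.isoformat(), "level", level)
```

With level 1 on every other day, each incremental holds everything changed since the last full, so a restore needs at most two dumps; increasing the level day by day would make each dump smaller at the cost of a longer restore chain.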


I use a similar scheme now, using BackupPC. However that is to my box  
at home which is not a very good solution due to bandwidth  
limitations (5MBit only).. The first copy takes ages

Re: Backup solution suggestions

2008-01-15 Thread Johan Ström


On Jan 15, 2008, at 15:03 , Ronald Klop wrote:

This sounds like a problem for 'tarsnap'. It's from the same author  
as portsnap.


http://www.daemonology.net/blog/2006-09-13-encrypted-backup.html
http://www.daemonology.net/blog/2007-08-29-tarsnap-update.html
http://www.tarsnap.com/

I never used it so I don't know more about it than you can find in  
these url's.


Indeed, sounds like that could be what I'm looking for. However,
nothing is public yet, only a private beta..


And from the sound of it, only a hosted service? Nothing that one
will be able to set up themselves.

http://news.ycombinator.com/item?id=81221
And a remotely hosted service I can have at home, but it's the
bandwidth of the internet link that limits me..


--
Johan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Backup solution suggestions

2008-01-15 Thread Johan Ström

On Jan 15, 2008, at 13:44 , Jeremy Chadwick wrote:


On Tue, Jan 15, 2008 at 12:40:02PM +0100, Vladimir Botka wrote:

Dne Tue, 15 Jan 2008 10:52:56 +0100
Johan Ström [EMAIL PROTECTED] napsal(a):


Hello

I'm looking to invest in some new hardware for backup. probably some
kind of NAS (a 4-disk 1U NAS or something in that size). The thing
is that I won't be the only one with access to this box, thus I
would like to secure my data.
What I would like is encryption both for the transfer to the box,
and encrypted on disk. The data on disk should not be readable by
anyone but me (ie the other user(s) of the box should not be able to
read it, at least not without a big effort).

So, I'm wondering what the best solution might be.. Tar'balling all
my stuff and encrypt it with GPG or something and just dump it there
with NFS would be the easiest solution, but maybe not the best. I've
been thinking about running a GELI image on my box, and store that
on the NAS over NFS.. would that be doable/secure/stable?
Another idea would be to go with some regular 1U box running some
FBSD, doing scp to the box and geli local on the box but that would
require me to have the encryption keys on that box (which would be
shared so thus no good idea).

Any other ideas? Being able to rsync to the backup storage instead
of just sending big encrypted tarballs would be very nice (and I
guess that would be possible with geli version)

Maybe not the perfect list for this, but it is somewhat freebsd
specific and I'm sure some other people on the list have had similar
situations :)

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/



Hello,

As for the encryption on the transfer, I use security/sfs to mount the
remote directory for backup and then rsync locally.


I thought SFS looked pretty neat until I saw this in the documentation:

  Finally, you must export all the local-directorys in your sfsrwsd_config
  to localhost via NFS version 3.

See my mail to Johan, as it documents a known issue with
nfsd/mountd/portmap on FreeBSD (re: binding to INADDR_ANY and using
dynamically-allocated port numbers).  This circles back to my "if you
HAVE to use NFS, do so on a dedicated network which has no public
access" statement.



SFS indeed looked very nice, but as I understand it, it doesn't provide
the encrypted-on-disk feature I need.
As mentioned earlier I don't want to store crypto keys on the backup
machine itself, otherwise I could have used geli or something.


Thanks

--
Johan



Re: Backup solution suggestions

2008-01-15 Thread Johan Ström

On Jan 15, 2008, at 22:09 , Aristedes Maniatis wrote:



On 15/01/2008, at 8:52 PM, Johan Ström wrote:

I'm looking to invest in some new hardware for backup. probably  
some kind of NAS (a 4-disk 1U NAS or something in that size). The  
thing is that I won't be the only one with access to this box,  
thus I would like to secure my data.
What I would like is encryption both for the transfer to the box,  
and encrypted on disk. The data on disk should not be readable by  
anyone but me (ie the other user(s) of the box should not be able  
to read it, at least not without a big effort).


Take a look at bacula. It is a proper backup system, meaning that  
it does incremental backups, etc. Storage pools can be encrypted.  
Not sure if the network stream can be, but that could be solved  
with an ssh tunnel. And it is open source, reliable and runs nicely  
on FreeBSD.




My main problem with existing solutions is this gap of encryption
on the backup server side. I don't want it to be readable outside of
my box (without encryption keys of course), so as soon as I send it off
from my box I want it to be encrypted over the link, and down on the
disk. Not decrypted on the remote box, to then be encrypted again
(with keys available on that box) and then stored to disk. That would
allow any users of that box (yes sure you can have file permissions
but let's assume someone else has root access there) to read my files.


Simple Example:

I create a regular tarball (gzipped maybe) with some files I want to
back up. Then I encrypt this file with e.g. gpg. Then I send off this
file using some unspecified network protocol to the storage server.

Encrypted all the way, from my end to the remote disk..
The downside is that it is a static file.. not a dynamic
filesystem, nothing I can mount and have easy access to individual
files from. *That's* what I'm looking for.
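
The static-tarball variant from the simple example could look roughly like the sketch below. It only builds the shell pipeline as a string and does not run anything. The host name, recipient key, paths and the helper name `encrypted_backup_pipeline` are made up for illustration, and ssh is just one possible choice for the "unspecified network protocol".

```python
import shlex


def encrypted_backup_pipeline(paths, recipient, host, remote_file):
    """Build a shell pipeline that tars, encrypts locally, then ships the result."""
    # gzipped tarball written to stdout
    tar_cmd = ["tar", "-czf", "-"] + list(paths)
    # encrypt on the sending box, so only ciphertext leaves it
    gpg_cmd = ["gpg", "--encrypt", "--recipient", recipient]
    # the storage side just writes the bytes it receives to a file
    remote_cmd = "cat > " + shlex.quote(remote_file)
    ssh_cmd = ["ssh", host, remote_cmd]
    stages = (tar_cmd, gpg_cmd, ssh_cmd)
    return " | ".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in stages)


if __name__ == "__main__":
    print(encrypted_backup_pipeline(
        ["/home/johan"],                  # hypothetical path
        "johan@example.org",              # hypothetical key id
        "backup.example.org",             # hypothetical storage host
        "/backup/home.tar.gz.gpg",        # hypothetical target file
    ))
```

The private key never has to exist on the storage box, which is the property asked for; the price is the one already noted, namely that the result is one opaque file rather than something that can be mounted or rsynced into.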


--
Johan


Re: I just broke out of a FreeBSD jail.. Known bug??

2007-12-28 Thread Johan Ström

On Dec 28, 2007, at 13:41 , Edwin Groothuis wrote:


On Fri, Dec 28, 2007 at 01:15:38PM +0100, Johan Ström wrote:

That's my home dir on core!.. That should very much not be visible
there! I have full access now (from the wrong jail!)

Known bug or did I just stumble upon something pretty bad??


You didn't really break out of it, the person who managed the machine
did something he shouldn't have done: Moving the directories while
the jail(s) were running. It should be mentioned in the BUGS section
of the jail(8) command.
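
The underlying behaviour can be seen without any jail at all: a process keeps its working directory when that directory is moved, and ".." then resolves to the new parent. The sketch below reproduces only that part with ordinary temporary directories; the directory names mirror the report, the function name `parent_after_move` is made up, and whether this is exactly what the jail code trips over is an assumption.

```python
import os
import tempfile


def parent_after_move():
    """Move the current directory elsewhere, then report where '..' leads."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as base:
        base = os.path.realpath(base)
        shell_home = os.path.join(base, "shell", "home", "johan")
        core_home = os.path.join(base, "core", "home", "johan")
        os.makedirs(os.path.join(shell_home, "test"))
        os.makedirs(core_home)
        try:
            # stand inside shell/home/johan/test ...
            os.chdir(os.path.join(shell_home, "test"))
            # ... while "someone else" moves that directory under core/
            os.rename(os.path.join(shell_home, "test"),
                      os.path.join(core_home, "test"))
            # the process is still in the moved directory; go up one level
            os.chdir("..")
            return core_home, os.getcwd()
        finally:
            # leave the temporary tree before it is deleted
            os.chdir(original_cwd)


if __name__ == "__main__":
    expected, actual = parent_after_move()
    print("expected:", expected)
    print("actual:  ", actual)
```

In the jailed case the walk upwards never passes through the jail's own root directory, so there is no point at which a root check would stop it.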



Yes, that's true.. Without super-root doing that the breakout
would never happen. But it's still a bug, so yes I guess it should be
mentioned in BUGS (and the handbook too? not sure where this kind of
special feature is noted) unless it's fixed.


--
Johan



I just broke out of a FreeBSD jail.. Known bug??

2007-12-28 Thread Johan Ström

Hello list!

I'm running a FreeBSD 6.2-p8 box with a few jails. The other day a
user of mine uploaded a number of files to one jail, then I (in the
actual system outside of all jails) moved that directory to another
jail.. When I later did some chdiring in the original jail, I found
myself standing in my other jail's directory and being able to
read/manipulate files!..


Example:

jb-1 (the base machine, jailbox-1)
shell (jail 1)
core (jail 2)

shell /home/johan# pwd
/home/johan
shell /home/johan# ls
.cshrc     .irssi  .login_conf    .mailrc   .profile  .shrc  .zcompdump  public_html
.histfile  .login  .mail_aliases  .noident  .rhosts   .ssh   .zshrc

shell /home/johan# mkdir test
shell /home/johan# cd test
shell /home/johan/test# touch asd
shell /home/johan/test# ls -al
total 4
drwxr-xr-x  2 root   root   512 Dec 28 13:09 .
drwxr-x--x  6 johan  johan  512 Dec 28 13:09 ..
-rw-r--r--  1 root   root     0 Dec 28 13:09 asd
shell /home/johan/test#

Then moving it on the root box

jb-1 /usr/jails# mv shell/home/johan/test core/home/johan/
jb-1 /usr/jails#

And back on shell jail:

shell /home/johan/test# ls
asd
shell /home/johan/test# pwd
pwd: .: No such file or directory
shell /home/johan/test# cd ..
shell /home/johan# ls
.cshrc     .lesshst       .mailrc         .shrc     .vimrc      file.big       roundcube.sql  www.tar.gz
.histfile  .login         .mysql_history  .ssh      .zcompdump  pics           stuff
.history   .login_conf    .profile        .vim      .zshrc      postfix-2.4.5  test
.irssi     .mail_aliases  .rhosts         .viminfo  cacert.pem  public_html    vmail.tar.gz

shell /home/johan#

That's my home dir on core!.. That should very much not be visible
there! I have full access now (from the wrong jail!)


Known bug or did I just stumble upon something pretty bad??

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: scrambled (gmirror) dmesg output

2007-12-02 Thread Johan Ström

On Dec 1, 2007, at 12:37 , Jeremy Chadwick wrote:


On Sat, Dec 01, 2007 at 12:16:45PM +0100, Johan Ström wrote:

Hello
I'm playing with a new box running RELENG_7.0 from yesterday. I got two
discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d. When I do
atacontrol detach ata7 (detach ad14), I get this in dmesg:

(first time)
subdisk14: detached
ad14: detached
GEOM_MIRROR: Device gm1b: provider ad14s1bG dEiOsMc_oMnInReRcOtRe:d .De
vice gm1: provider ad14s1a disconnected.

(second time, detaching again after reattach)
subdisk14: detached
ad14: detached
GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era
d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.

huh? :) Some print racing or something?


The problem isn't specific to GEOM or ZFS.  It's a known issue with two
kernel printf()s being called simultaneously.  There are older threads
discussing the issue.  I can dig up URLs if you want to read them, but I
don't have them available quickly...
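
One way to convince yourself that a scrambled line is two intact messages woven together, rather than corrupted data, is to check whether it is an interleaving of the two expected strings with the characters of each kept in order. A minimal sketch of such a check follows; the function name `is_interleaving` is made up, and the sample strings are synthetic rather than copied from the dmesg output.

```python
def is_interleaving(first: str, second: str, merged: str) -> bool:
    """Return True if merged is first and second woven together, each kept in order."""
    if len(first) + len(second) != len(merged):
        return False
    # reachable[j]: first[:i] and second[:j] can together produce merged[:i + j]
    reachable = [False] * (len(second) + 1)
    reachable[0] = True
    for j in range(1, len(second) + 1):
        reachable[j] = reachable[j - 1] and second[j - 1] == merged[j - 1]
    for i in range(1, len(first) + 1):
        reachable[0] = reachable[0] and first[i - 1] == merged[i - 1]
        for j in range(1, len(second) + 1):
            # last character taken from first (state of the previous row)
            from_first = reachable[j] and first[i - 1] == merged[i + j - 1]
            # last character taken from second (state of the current row)
            from_second = reachable[j - 1] and second[j - 1] == merged[i + j - 1]
            reachable[j] = from_first or from_second
    return reachable[len(second)]


if __name__ == "__main__":
    print(is_interleaving("GEOM", "GEOM", "GGEEOOMM"))  # woven, order kept
    print(is_interleaving("abc", "XYZ", "aXcYbZ"))      # order of "abc" broken
```

A line that passes this check for the two expected messages lost nothing; the output was simply written by two callers at the same time.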


Just what I thought then. I have just never seen it on 6.x (where I use
gmirror) so I was a bit curious.


Btw, zfs doesn't seem to be very chatty in dmesg? E.g. losing discs,
starting to rebuild discs etc... Isn't that something one would want
in logs?


Thanks!



--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |






scrambled (gmirror) dmesg output

2007-12-01 Thread Johan Ström

Hello
I'm playing with a new box running RELENG_7.0 from yesterday. I got
two discs with gmirror on ad[6|14]s1a and zfs-mirror on s1d. When I
do atacontrol detach ata7 (detach ad14), I get this in dmesg:


(first time)
subdisk14: detached
ad14: detached
GEOM_MIRROR: Device gm1b: provider ad14s1bG dEiOsMc_oMnInReRcOtRe:d .De
vice gm1: provider ad14s1a disconnected.

(second time, detaching again after reattach)
subdisk14: detached
ad14: detached
GEOMG_EMOIMR_RMOIRR:R ORD:e viDceev icgem 1bg:m 1p:r opvriodveird era  
d1a4ds114bs 1dai sdciosncnoencnteecdt.ed.


huh? :) Some print racing or something?

Btw, I'm doing ZFS'ed root as on the wiki, but I added gmirror to the root
partition too (and steps to install from one disc to the other, then
boot over and add the original disc to the mirrors)..
I've documented the steps (or at least the commands and some simple
comments), would anyone be interested in having it, on the wiki or
otherwise?


--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: Jails and PF states on locahost

2007-11-12 Thread Johan Ström

No-one with any clues or recommendations? :/ CCing to -stable too..

Thanks
--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/


On Oct 29, 2007, at 09:37 , Johan Ström wrote:


Hello

I got a FreeBSD 6.2 box running a few jails, with a pretty strict
PF ruleset. I got a problem with traffic between two of the jails.
Both have public IPs (one of them has two, using the
jail-multiple-ip-patch). The problem I have is when they are to talk
with each other. First let me describe the PF ruleset (somewhat stripped
down but this should be the relevant stuff)


jail1=xx.xx.xx.131
jail2a=xx.xx.xx.133
jail2b=xx.xx.xx.134
scrub in all
block drop in log
# base system talk to itself
pass in on lo0 inet from 127.0.0.1 to 127.0.0.1

# all can talk out
pass out on em0 proto tcp flags S/SA modulate state
pass out on em0 proto udp keep state

# jails talk to themselves
pass in on lo0 inet from $jail1 to $jail1
pass in on lo0 inet from {$jail2a $jail2b} to {$jail2a $jail2b}

# let smtp in on jail1
pass in on {lo0 em0} inet proto tcp from any to $jail1 port smtp flags S/SA modulate state


Okay, so the problem occurs when jail2 shall talk to jail1 on port  
25 (smtp). From the above rules, when the traffic leaves jail2  
(traffic comes from $jail2b it seems) it should match the last rule  
and create a state. And so it does!


self tcp xx.xx.xx:25 - xx.xx.xx.134:57557   SYN_SENT:ESTABLISHED
   [3014249759 + 65536](+2074393365) wscale 1  [4121000179 + 65536](+541973245) wscale 1
   age 00:01:03, expires in 00:00:01, 7:10 pkts, 384:640 bytes

So the SYN arrives at $jail1, but the SYNACK fails to go back to
$jail2b (where the state should let the packet back in?), which is
also seen in the following row from pflog0:


09:30:34.370402 rule 1/0(match): block in on lo0: (tos 0x0, ttl   
64, id 35618, offset 0, flags [DF], proto: TCP (6), length: 64)  
xx.xx.xx.131.25  xx.xx.xx.134.57557: S 793675827:793675827(0) ack  
4121000179 win 65535 mss 1460,nop,wscale 1,[|tcp]


So.. What have I missed? The state is created but it doesn't seem to
match enough bytes or something? 384:640 matched packets, so it
matches in both directions?
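
Going by the labels in the state entry itself, the two pairs are separate counters: "7:10" are packets and "384:640" are bytes, one number per side of the state. A small sketch that pulls them apart is shown below; the function name `parse_state_counters` and the dictionary layout are made up, and which side corresponds to which direction is not something this snippet decides.

```python
import re

# "<n>:<n> pkts, <n>:<n> bytes" as it appears on the last line of the state entry
COUNTERS = re.compile(r"(\d+):(\d+) pkts, (\d+):(\d+) bytes")


def parse_state_counters(line: str) -> dict:
    """Extract the packet and byte counter pairs from a verbose pf state line."""
    match = COUNTERS.search(line)
    if match is None:
        raise ValueError("no packet/byte counters found")
    pkts_a, pkts_b, bytes_a, bytes_b = map(int, match.groups())
    return {"packets": (pkts_a, pkts_b), "bytes": (bytes_a, bytes_b)}


if __name__ == "__main__":
    line = "   age 00:01:03, expires in 00:00:01, 7:10 pkts, 384:640 bytes"
    counters = parse_state_counters(line)
    print(counters)
    # average packet size per side: small packets only, no payload flowing
    for pkts, size in zip(counters["packets"], counters["bytes"]):
        print(pkts, "packets,", size / pkts, "bytes each on average")
```

Seventeen packets of roughly 55 to 64 bytes each look like repeated handshake packets rather than an established transfer; the 64-byte average on one side matches the "length: 64" of the blocked packet in the pflog line.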


Any clues are welcome! Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to [EMAIL PROTECTED]




Re: Crashed gmirror, single disk marked SYNC and wont boot...

2007-08-24 Thread Johan Ström

On Aug 21, 2007, at 17:53 , Johan Ström wrote:


On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote:


All in all, your partition table seems to be gone. If you created it on
gmirror before (gm0s1) you may still have the same partition table on
the other half of the mirror. You can try to move it to ad6 with
bsdlabel and verify if you can see file system inside partitions.


Okay, tried that now.. Saved ad0s1 label, reloaded it onto ad6s1..  
Now I got same partition table on ad6s1 as on ad0s1...
Trying to mount any though gives me incorrect super block... fsck  
cannot find any superblocks either..


So.. What to do now then? Just forget ad6 and start from scratch
from ad2? (as I said, the data isn't very old really)...


I'm thinking about doing a complete reinstall on ad4+ad6 then.. Can I
do that? fdisk both with a full partition on both, create a new
gmirror between ad6s1/ad4s1 (or should I go on ad4/ad6?), create
slices, use dump | restore (of course with apps shut down so no data
is changed.. or at least nothing that I care about) to copy all
files from ad2 to the new mirror.. What more do I need to do? bsdlabel -B
on both to write boot blocks? Is there anything else to think about?




Ok just for the record, I plugged both SATA disks in, cleared them,
created a new mirror on both of them, sliced up and did dump -0 -L -f - /
| restore -r -f - for all filesystems, also bsdlabel -B, and what I
missed in the above text, fdisk -B to write boot0 code.. Now it's
booted fine on the mirror!


Although, one thing that I got curious about. In the fdisk manpage it
says -b can be used to change the bootcode.. and that the default is
/boot/mbr.. What is this? I checked md5 against boot0 and it's not the
same (although I guess it might just be some boot0 with different
config..). I never found any references to this mbr file in either
man pages or handbook.


Again, thanks for the help :)


Re: Crashed gmirror, single disk marked SYNC and wont boot...

2007-08-24 Thread Johan Ström

On Aug 24, 2007, at 12:21 , CyberLeo Kitsana wrote:


Johan Ström wrote:

Although, one thing that I got curious about. In the fdisk manpage it
says -b can be used to change the bootcode.. and that the default is
/boot/mbr.. What is this? I checked md5 against boot0 and it's not the
same (although I guess it might just be some boot0 with different
config..). I never found any references to this mbr file in either man
pages or handbook.


boot0 is the pretty 'F1 FreeBSD' type boot menu. mbr is more like your
standard MS bootloader, that just boots the active slice of the current
disk.

The latter is my favorite, as I despise multi-booting.


I see. Shouldn't this info be in the manpages/handbook somewhere?
Like referenced from boot0cfg's manpage or something, and in the boot
section in the handbook.




Crashed gmirror, single disk marked SYNC and wont boot...

2007-08-21 Thread Johan Ström

Hi

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ROUTER.POLLING  i386


(ROUTER.POLLING is GENERIC  + options DEVICE_POLLING  and ALTQ,  
IPSEC, also pfsync and carp)


This weekend I had a disk failing on me in a machine running gmirror
gm0 with 2 providers (ad0 and ad6). The whole box froze with no
screen output, and on hard reboot I got some LBA errors etc from ad0,
after a few reboots it got up and running though (I wasn't at the
screen, had to do it by phone so couldn't really debug very well).
As soon as the box got up, I removed ad0 from the gmirror, so ad6 was
the only provider. Today I got a new disk that would replace ad0..
Now remember, ad6 was the only disk in the mirror. I took the box down
fine, replaced the disk. ad0 was now gone and instead I had ad4 (ad4+6
is SATA, ad0 was IDE). Changed so I booted off the old SATA..
Okay, there came the first problem; the boot loader gave me the usual
options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1
I got the same prompt again.. F5 nothing at all.. Funny!... The
system refused to load the loader (or whatever the 1-9 menu thingy is
called), kernel or anything..
So I finally plugged the old ad0 disk into the machine to at least
get it booted, thinking it would go up on the gmirror.. Nope..:


(got the new ad4 out here)
ad0: 38166MB WDC WD400BB-00CAA1 17.07W17 at ata0-master UDMA100
ad6: 152627MB SAMSUNG HD160JJ ZM100-41 at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now?? I just did a
clean shutdown without problems.. It didn't even recognize any slices
on ad6s1 (although ad6s1 itself was found)...
I entered ad0s1 as root and booted from there, of course I got to the
emergency shell since fstab looked for the gmirror devices, which
didn't exist..


Some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
     magic: GEOM::MIRROR
   version: 3
      name: gm0
       mid: 4029378995
       did: 449032193
       all: 3
     genid: 0
    syncid: 5
  priority: 0
     slice: 4096
   balance: round-robin
 mediasize: 20416757248
sectorsize: 512
syncoffset: 0
    mflags: NONE
    dflags: SYNCHRONIZING
hcprovider:
  provsize: 160041885696
  MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

Some googling indicated that SYNCHRONIZING means that it's not
complete and won't mount? Is that correct? Why would it be in that
state then, I just shut it down fine... And where the f*ck did my
slices go??..


Did a sysctl kern.geom.mirror.debug=2 and tried to gmirror activate  
the mirror:


GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.


Soo.. What is going on here? Anyone with some clues? Currently
running on the ad0 disk, no raid at all.. Let's hope it doesn't die on
me (haven't had any signs of that since Sunday when it froze and gave
boot errors, so I'm hoping..). The data loss from using ad0
instead of ad6 is probably minimal, it's a router so it's more or less
only logging that seems to have been lost... For now I just want to get
clear about wth happened here and how to prevent it, and how to get
back up on a gmirror with ad6 and ad4 (to be plugged in) so I can
throw ad0 out...



Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: Crashed gmirror, single disk marked SYNC and wont boot...

2007-08-21 Thread Johan Ström

On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote:


On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote:

Hi

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ROUTER.POLLING  i386

(ROUTER.POLLING is GENERIC  + options DEVICE_POLLING  and ALTQ,
IPSEC, also pfsync and carp)

This weekend I had a disk failing on me in a machine running gmirror
gm0 with 2 providers (ad0 and ad6). The whole box froze with no
screen output, and on hard reboot I got some LBA errors etc from ad0,
after a few reboots it got up and running though (I wasn't at the
screen, had to do it by phone so couldn't really debug very well).
As soon as the box got up, I removed ad0 from the gmirror, so ad6 was
the only provider. Today I got a new disk that would replace ad0..
Now remember, ad6 was the only disk in the mirror. I took the box down
fine, replaced the disk. ad0 was now gone and instead I had ad4 (ad4+6
is SATA, ad0 was IDE). Changed so I booted off the old SATA..
Okay, there came the first problem; the boot loader gave me the usual
options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1
I got the same prompt again.. F5 nothing at all.. Funny!... The
system refused to load the loader (or whatever the 1-9 menu thingy is
called), kernel or anything..
So I finally plugged the old ad0 disk into the machine to at least
get it booted, thinking it would go up on the gmirror.. Nope..:

(got the new ad4 out here)
ad0: 38166MB WDC WD400BB-00CAA1 17.07W17 at ata0-master UDMA100
ad6: 152627MB SAMSUNG HD160JJ ZM100-41 at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now?? I just did a
clean shutdown without problems.. It didn't even recognize any slices
on ad6s1 (although ad6s1 itself was found)...


It loaded your mirror just fine, you confuse things. Gmirror started in
degraded state, as one could expect, but it seems there is no 'a'
partition on your gm0s1 slice (or entire bsdlabel is gone).
You could try to recreate it based on bsdlabel from ad0 (if it should be
the same), but I've no idea how it disappeared. Anyway, gmirror seems to
work properly.


Okay.. So it tries to load, finds no partition table, and ignores and
unloads gm0?





Some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
 magic: GEOM::MIRROR
   version: 3
  name: gm0
   mid: 4029378995
   did: 449032193
   all: 3


You have 3-way mirror?


Uhm.. never had more than 2 disks in this machine..




 genid: 0
syncid: 5
  priority: 0
 slice: 4096
   balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
  provsize: 160041885696
  MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f


BTW. Your provider size is 149GB and mirror only use 19GB, which means
you mirrored 149GB disk with 19GB disk and you waste 130GB (it's
unused).
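
The figures in that remark follow directly from the two size fields in the dump, read as bytes and converted to binary gigabytes. A quick check (the variable and function names are made up, the two numbers are copied from the metadata above):

```python
GIB = 1024 ** 3  # bytes per binary gigabyte


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return size_bytes / GIB


mediasize = 20416757248    # "mediasize" field: size of the mirror
provsize = 160041885696    # "provsize" field: size of the provider ad6

mirror_gib = to_gib(mediasize)
provider_gib = to_gib(provsize)
unused_gib = provider_gib - mirror_gib

if __name__ == "__main__":
    print(f"mirror:   {mirror_gib:.2f} GiB")
    print(f"provider: {provider_gib:.2f} GiB")
    print(f"unused:   {unused_gib:.2f} GiB")
```

Rounded to whole gigabytes this gives the 19, 149 and 130 quoted above.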


Yes, the ad0 disk was (is) only 40GB so only the first 40GB of that disk
was in the mirror (the rest was in another slice with its own label..
although if I'm doing fdisk on the disk it seems to not be there at
all..)
But hum, 19??.. It should be 40 (or somewhere around there at
least).. From ad0 mount:

Filesystem   1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     507630   85142   381878    18%    /
/dev/ad0s1e     507630      20   467000     0%    /tmp
/dev/ad0s1f   10154158 1176410  8165416    13%    /usr
/dev/ad0s1d    1506190   80326  1305370     6%    /var
/dev/ad0s1g   24174212 6939804 15300472    31%    /var/squid
swapinfo:
/dev/ad0s1b    1022536       0  1022536     0%

~35GB...
Compared slice 1 on ad0 vs ad6, both have the exact same size.




Some googling indicated that SYNCHRONIZING means that it's not
complete and won't mount? Is that correct? Why would it be in that
state then, I just shut it down fine... And where the f*ck did my
slices go??..


SYNCHRONIZING means that this component was/is being synchronized. It
seems that you removed/lost the master disk, while it was synchronizing.
It should work anyway.


Okay, that's odd.. ad6 was the only disk in the mirror when I shut down
(shutdown -p now, and it powered off by itself..) so it should have
been good..




BTW. You confuse things again. Your slice is just fine (ad6s1), you
don't have

ATA driver/gmirror problems, multiple boxes...

2007-04-25 Thread Johan Ström
 produce them on demand.. The crashes happen now
and then, at no regular intervals though.. For elfi:
Apr 24 05:20:27 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6 disconnected.
(I actually can't find any other entry in the logs, but judging from
IRC logs: march 28, march 12, feb 13, jan 22, jan 18)


For crus:
Apr 23 13:46:14 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.
Apr 13 09:57:49 crus kernel: GEOM_MIRROR: Device gm1: provider ad8 disconnected.

I think it has happened once more, but that's it..

For gw-1 it's luckily only once so far.. At least with the current
install, it has had problems when the Maxtor disks were running in it
(and I think it was 6.0 back then)


So.. Three different boxes, with three different chipsets... With
three different crash scenarios.. But they all have problems.. So
where is the actual problem? The HW? The chipset drivers? Gmirror
code? I have run SMART tests on the crashing disks, no errors.. I
have run powermax (Maxtor's own test program) a while back on the
Maxtor disks, no problems.. I have tried changing SATA cables on some
of the disks, no difference..


Does anyone have any clue about what can be causing this? What is  
most likely? How do we hunt this down?


Thank you.

Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/




Re: OpenSSH 4.6 error: channel 0: chan_read_failed for istate 3

2007-03-24 Thread Johan Ström

On Mar 20, 2007, at 16:04 , Dominik Zalewski wrote:


Hi All,

After upgrading to openssh-portable-4.6.p1,1 I'm getting following messages in
logs:

error: channel 0: chan_read_failed for istate 3

Although ssh works fine.


Hi,
just want to report that I've started to see the same thing since I
upgraded to 4.6.p1,1.
Every night my backup server uses scp to transfer files from my box,
and I see this:


Mar 23 21:00:04 elfi sshd[76875]: Accepted publickey for root from 2001:xxx::xxx:xx  port 63449 ssh2
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:01:18 elfi sshd[76875]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:01:18 elfi sshd[77389]: Accepted publickey for root from 2001: xxx::xxx:xx  port port 63450 ssh2
Mar 23 21:53:31 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:32 elfi sshd[77389]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:34 elfi sshd[85742]: Accepted publickey for root from 2001: xxx::xxx:xx  port  port 49493 ssh2
Mar 23 21:53:34 elfi sshd[85742]: error: channel 0: chan_read_failed for istate 3
Mar 23 21:53:34 elfi sshd[85742]: error: channel 0: chan_read_failed for istate 3


The backup process works by first executing a pre-script, then
scp'ing, then executing a post-script.. so those errors look like
they appear directly when the ssh session is disconnected.


Anyone else with clues?

Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/







Network polling

2006-10-15 Thread Johan Ström

Hi

I just tried to enable network polling on my router box, a P2 400MHz  
with 3 different NICs (one internal, i think its the fxp one):


fxp0: Intel 82558 Pro/100 Ethernet port 0x7c60-0x7c7f mem 0xf3dff000-0xf3df,0xf3f0-0xf3ff irq 11 at device 3.0 on pci0
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: [GIANT-LOCKED]
rl0: RealTek 8139 10/100BaseTX port 0x7400-0x74ff mem 0xf3efef00-0xf3efefff irq 10 at device 16.0 on pci0
miibus1: MII bus on rl0
rlphy0: RealTek internal media interface on miibus1
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: [GIANT-LOCKED]
sis0: NatSemi DP8381[56] 10/100BaseTX port 0x7800-0x78ff mem 0xf3eff000-0xf3ef irq 9 at device 20.0 on pci0
sis0: Silicon Revision: DP83815C

gw-1 ~$ uname -a
FreeBSD gw-1.stromnet.org 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #1:  
Fri Oct 13 16:59:41 CEST 2006 [EMAIL PROTECTED]:/usr/obj/ 
usr/src/sys/ROUTER.POLLING  i386


Kernel is GENERIC + carp+pfsync+ipsec+polling..
OK, so when I transfer data from sis0 to rl0, for example, I get a
very high interrupt rate, ~40% or so.. I'm using OpenVPN on the box
(laptop on rl0), so the packets are maybe chopped up into smaller
fragments, I'm not sure.. But anyway, I got the idea that I should
try to enable polling on the interfaces instead. So I did: recompiled
with polling and enabled polling on all three ifs (man polling says
all three should be supported). Any difference? None! Still at 40%
interrupts when loading ~10 Mbit (can't seem to get much more since
ovpn floors the CPU at that speed).
So, shouldn't the interrupts go down somewhat now that I've enabled
polling? Or did I get this all wrong ;)


Thanks
Johan



Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?

2006-09-25 Thread Johan Ström
Fcking great.. Waking up and noting that the box has rebooted it self  
during the night... Yay!!... No kernel dumps, nothing in message  
log.. Nada... (this was on the first box, that is the one first in  
this thread)


What exactly does

kernel dumps on /dev/mirror/gm0s1b

mean? Not that it saves any kernel dumps, at least.. But OTOH I have
no clue why it crashed at all, or whether it even tried to dump the
kernel or just blacked out like when I tried to debug clamd...


--
Johan

On Sep 24, 2006, at 14:23 , Johan Ström wrote:

Okay, I've got some problems here now... I'm trying to get clamav's
clamd to work.. It fails with an abort in libc:


It coredumps directly on start, trace: http://sial.org/pbot/19922
Truss output: www.stromnet.org/~johan/clamd.log

Okay...something seems to be f*cked in the nss/ldap stuff..

Anyway, when running it with gdb --args /usr/local/sbin/clamd --debug
it works fine!... No coredump or anything, until I decide to kill
clamd..


kill pidofclamd breaks into gdb, and when I run continue to
process the signal and let it die, the whole fcking box dies!
The screen goes black and it reboots.. No panic messages or anything...
I have reproduced this twice now on the box from the dmesg in my
earlier mail... Then I moved the disk to another, pretty similar
box, same chipset I think but not exactly the same mobo.. tried the
above commands, and bam, exactly the same problem.. the screen just
goes black and the box reboots... Dmesg from that box:


If I can get the crap up and running, that is. Now the fs is broken or
some shit; I get this on boot, after it has started a few services:


Starting jails:/usr: bad dir ino 32125198 at offset 512: mangled entry
panic: ufs_dirbad: bad dir
Uptime: 54s
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed
GEOM_MIRROR: Device gm0 destroyed.
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds...

OK, now I rebooted to single-user mode and enabled the dumpdev in
rc.conf... then watched it continue booting and checked for the
line saying kernel dumps on /dev/mirror/gm0s1b... it was there...
and then it booted further and got past the place where it crashed
before, but a minute later when I try to log in it crashes on the same
inode... AND STILL!.. it says Cannot dump. No dump device
defined... WTF??... brokenness brokenness..


Okay, after some fscking it's back up. Dmesg from the second box, which
I can crash with clamd...:


Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,  
1994
The Regents of the University of California. All rights  
reserved.

FreeBSD 6.1-RELEASE-p7 #0: Wed Sep 20 09:21:41 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3200+ (2210.09-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0xfc0  Stepping = 0
   
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory  = 1073676288 (1023 MB)
avail memory = 1024299008 (976 MB)
ACPI APIC Table: Nvidia AWRDACPI
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: Nvidia AWRDACPI on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff,0xcf0-0xcf3 on acpi0
pci0: ACPI PCI bus on pcib0
agp0: NVIDIA nForce3-250 AGP Controller mem 0xf000-0xf7ff  
at device 0.0 on pci0

isab0: PCI-ISA bridge at device 1.0 on pci0
isa0: ISA bus on isab0
pci0: serial bus, SMBus at device 1.1 (no driver attached)
ohci0: OHCI (generic) USB controller mem 0xfe02f000-0xfe02  
irq 21 at device 2.0 on pci0

ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: OHCI (generic) USB controller on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: OHCI (generic) USB controller mem 0xfe02e000-0xfe02efff  
irq 22 at device 2.1 on pci0

ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: OHCI (generic) USB controller on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ehci0: NVIDIA nForce3 250 USB 2.0 controller mem  
0xfe02d000-0xfe02d0ff irq 23 at device 2.2 on pci0

ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: NVIDIA nForce3 250 USB 2.0 controller on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 8 ports with 8 removable, self powered
nve0: NVIDIA nForce MCP7 Networking Adapter port 0xf000-0xf007  
mem 0xfe02c000-0xfe02cfff irq 21 at device 5.0 on pci0

nve0: Ethernet address 00:11:09:c5:fc

Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?

2006-09-25 Thread Johan Ström


On Sep 25, 2006, at 08:45 , Jiawei Ye wrote:


On 9/25/06, Johan Ström [EMAIL PROTECTED] wrote:

Fcking great.. Waking up and noting that the box has rebooted it self
during the night... Yay!!... No kernel dumps, nothing in message
log.. Nada... (this was on the first box, that is the one first in
this thread)

What exactly does

kernel dumps on /dev/mirror/gm0s1b

mean? Not that it saves any kernel dumps at least.. But otoh I have
no clue why it crashed at all and if it even did try to dump kernel
or if it just blacked out as when i tried to debug clamd...

--
Johan

It means that the system died and released the sphincter when it did.
If you have dumpdev='AUTO'
dumpdir='/var/crash'
in your rc.conf, then you can find the crash dump in ${dumpdir}, then
you can use kgdb to retrieve the backtrace from the dump.



I've got dumpdev=/dev/mirror/gm0s1b, but savecore doesn't extract any dumps :/


Jiawei

--  
If it looks like a duck, walks like a duck, and quacks like a duck,

then to the end user it's a duck, and end users have made it pretty
clear they want a duck; whether the duck drinks hot chocolate or
coffee is irrelevant.


Re: Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?

2006-09-25 Thread Johan Ström

On Sep 25, 2006, at 09:55 , Alban Hertroys wrote:



On Sep 25, 2006, at 8:36, Johan Ström wrote:



What exactly does

kernel dumps on /dev/mirror/gm0s1b

mean? Not that it saves any kernel dumps at least.. But otoh I  
have no clue why it




It means exactly that. IIRC kernel dumps are created in swap space  
and on the next boot are moved to ${dumpdir}. I'm pretty certain  
this is explained nicely in the handbook[1].




Probably is, yes I know what it means, I was just pretty upset at the  
moment..;)




AFAIK kernels can only be dumped on real devices, not on virtual  
devices like /dev/mirror/*. In that case your setup is not going to  
get you any dumps.




In earlier FBSD (6.0, I think?) one got an ioctl error when trying to
dumpon to a gmirror device, but if I recall correctly this has been
changed since (I don't get an ioctl error anymore, at least...)





Besides that, it is probably not a very good idea to mirror your  
swap. I am certain it is bad for performance, if it'd gain you  
reliability is beyond my knowledge. This has been discussed before,  
you probably want to check the archives.




Performance, yes, but I think I've read that mirroring is best anyway:
if one of your disks dies, you don't want to lose half your swap, since
that would not be very good if anything is swapped out to that
disk..





[1] Which I didn't check as I'm about to be in a hurry...
--
Alban Hertroys

Priest to alien: We want to know, is there a higher 
being?.
Alien: Well, actually that's why we're here,
we're sheer out of virgins.











Total crash in gdb!.. something is broken!.. Was: Re: FreeBSD with a Gigabyte GA-K8NSC?

2006-09-24 Thread Johan Ström
-0xbe3,0x960-0x967,0xb60-0xb63,0xc800-0xc80f, 
0xc400-0xc47f irq 23 at device 9.0 on pci0

ata2: ATA channel 0 on atapci1
ata3: ATA channel 1 on atapci1
atapci2: nVidia nForce3 Pro SATA150 controller port  
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xb000-0xb00f, 
0xac00-0xac7f irq 20 at device 10.0 on pci0

ata4: ATA channel 0 on atapci2
ata5: ATA channel 1 on atapci2
pcib1: ACPI PCI-PCI bridge at device 11.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 14.0 on pci0
pci2: ACPI PCI bus on pcib2
pci2: multimedia, audio at device 8.0 (no driver attached)
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0x8800-0x887f mem  
0xfdfff000-0xfdfff07f irq 19 at device 9.0 on pci2

miibus1: MII bus on xl0
xlphy0: 3Com internal media interface on miibus1
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:10:5a:dc:5e:aa
fwohci0: VIA Fire II (VT6306) port 0x8400-0x847f mem  
0xfdffe000-0xfdffe7ff irq 19 at device 12.0 on pci2

fwohci0: OHCI version 1.0 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:10:dc:00:00:77:83:dc
fwohci0: Phy 1394a available S400, 3 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: IEEE1394(FireWire) bus on fwohci0
fwe0: Ethernet over FireWire on firewire0
if_fwe0: Fake Ethernet address: 02:10:dc:77:83:dc
fwe0: Ethernet address: 02:10:dc:77:83:dc
fwe0: if_start running deferred for Giant
sbp0: SBP-2/SCSI over FireWire on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop = 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
fdc0: floppy drive controller port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on  
acpi0

fdc0: [FAST]
fd0: 1440-KB 3.5 drive on fdc0 drive 0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10  
on acpi0

sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: Standard parallel printer port port 0x378-0x37f,0x778-0x77b  
irq 7 on acpi0

ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: Parallel port bus on ppc0
ppbus0: IEEE1284 device found /NIBBLE/ECP
Probing for PnP devices on ppbus0:
ppbus0: HEWLETT-PACKARD DESKJET 820C SCP,VLINK
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: ISA Option ROMs at iomem 0xc-0xce7ff,0xd-0xd17ff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on  
isa0
ums0: Microsoft Microsoft 5-Button Mouse with IntelliEye(TM), rev  
1.10/3.00, addr 2, iclass 3/1

ums0: 5 buttons and Z dir.
Timecounter TSC frequency 2210091714 Hz quality 800
Timecounters tick every 1.000 msec
module_register_init: MOD_LOAD (amr_linux, 0x80654f90, 0)  
error 6

ad0: 156334MB Maxtor 6Y160P4 YAR41BW0 at ata0-master UDMA133
acd0: DVDR NEC DVD RW ND-3500AG/2.16 at ata1-master UDMA33
ad4: 286187MB Maxtor 7L300S0 BANC1G10 at ata2-master SATA150
GEOM_MIRROR: Device gm0 created (id=316220990).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a


So.. Wtf is the problem here?...

Hope someone can help me..
Thanks

On Sep 23, 2006, at 18:41 , Johan Ström wrote:



On Sep 3, 2006, at 14:13 , Johan Ström wrote:


Hi

I'm about to get a new server... In this case what I'm looking  
at is a Gigabyte GA-K8NSC mobo with nForce3 250Gb chipset, and a  
AMD 64 3200+ Venice S939.


Does anyone have any experience with FreeBSD (6.1) and this mobo/ 
chipset? Does the network work? How good? SATA? Any stability/ 
performance issues?


I did notice it was mentioned on http://www.freebsd.org/platforms/ 
amd64/motherboards.html on 5.4 with the only comment Sound and  
USB untested... So.. anyone got more detailed experience than that?


Thanks :)
--
Johan Ström
[EMAIL PROTECTED]



Hi again,
I've got the mobo now and everything I've tested seems to work fine:
network (Marvell Gigabit Ethernet) works perfectly (although I'm just
using 100 Mbit, haven't tested gig), and SATA seems to work..
Somewhat... That's part of why I'm posting this..


I've got two disks plugged in currently, two of ad4: 286187MB
Maxtor 7L300S0 BANC1G10 at ata2-master SATA150 (ad4 and ad6), on
one SATA each... When I only access ad4 (the system disk) and don't
touch ad6 (the old system disk, moving some data from there now..
soon to be gmirrored with ad4) it works fine.
But as soon as I start to transfer data from ad6 to ad4 (or rather,
from ad4s1f to gm0s1f of which ad6 is provider), the system becomes
veeerrry slow... It's still usable but it takes several seconds

Re: FreeBSD with a Gigabyte GA-K8NSC?

2006-09-23 Thread Johan Ström


On Sep 3, 2006, at 14:13 , Johan Ström wrote:


Hi

I'm about to get a new server... In this case what I'm looking at  
is a Gigabyte GA-K8NSC mobo with nForce3 250Gb chipset, and a AMD  
64 3200+ Venice S939.


Does anyone have any experience with FreeBSD (6.1) and this mobo/ 
chipset? Does the network work? How good? SATA? Any stability/ 
performance issues?


I did notice it was mentioned on http://www.freebsd.org/platforms/ 
amd64/motherboards.html on 5.4 with the only comment Sound and USB  
untested... So.. anyone got more detailed experience than that?


Thanks :)
--
Johan Ström
[EMAIL PROTECTED]



Hi again,
I've got the mobo now and everything I've tested seems to work fine:
network (Marvell Gigabit Ethernet) works perfectly (although I'm just
using 100 Mbit, haven't tested gig), and SATA seems to work..
Somewhat... That's part of why I'm posting this..


I've got two disks plugged in currently, two of ad4: 286187MB
Maxtor 7L300S0 BANC1G10 at ata2-master SATA150 (ad4 and ad6), on one
SATA each... When I only access ad4 (the system disk) and don't touch
ad6 (the old system disk, moving some data from there now.. soon to
be gmirrored with ad4) it works fine.
But as soon as I start to transfer data from ad6 to ad4 (or rather,
from ad4s1f to gm0s1f of which ad6 is provider), the system becomes
veeerrry slow... It's still usable but it takes several seconds
(sometimes as much as 10-20) to e.g. execute a simple command like ls,
top, su...

gstat reports speeds of around 30MB/s:

dT: 0.501  flag_I 50us  sizeof 288  i -1
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   17    395     12    200  577.2    383  49059    8.5   99.2| ad4
   17    395     12    200  577.2    383  49059    8.5   99.2| mirror/gm0
    0      0      0      0    0.0      0      0    0.0    0.0| ad4s1
   17    395     12    200  577.2    383  49059    8.5   99.2| mirror/gm0s1
    0    387    387  49570    1.1      0      0    0.0   43.0| ad6
    3      2      2     32  583.1      0      0    0.0  116.5| mirror/gm0s1a
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gm0s1b
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gm0s1c
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gm0s1d
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gm0s1e
   13    393     10    168  576.0    383  49059    8.5   95.3| mirror/gm0s1f
    0    387    387  49570    1.1      0      0    0.0   43.2| ad6s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1a
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1b
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1c
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1d
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1e
    0    387    387  49570    1.1      0      0    0.0   44.0| ad6s1f

Those busy figures.. on the gmirror they fly up to  100% all the  
time and are red.. on the ad6 figures they are 40-50% all the time  
(during copy that is)..
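
As a rough cross-check of those numbers, the average request size is the throughput divided by the operation rate. A small Python sketch, assuming the ad4 and ad6 rows above read as 383 writes/s at 49059 kB/s and 387 reads/s at 49570 kB/s (an assumed reading of the gstat columns, not something stated outright in the output):

```python
def avg_request_kb(kbps, ops_per_s):
    """Average size of one request in kB, or 0.0 when the device is idle."""
    return kbps / ops_per_s if ops_per_s else 0.0

# Values as read from the ad4 (writes) and ad6 (reads) rows; see the assumption above.
ad4_write_kb = avg_request_kb(49059, 383)
ad6_read_kb = avg_request_kb(49570, 387)
print(f"ad4 writes: {ad4_write_kb:.1f} kB/op, ad6 reads: {ad6_read_kb:.1f} kB/op")
```

Both work out to roughly 128 kB per request, which would be consistent with one large sequential copy rather than many small scattered requests.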


Any ideas?

dmesg:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights  
reserved.

FreeBSD 6.1-RELEASE-p7 #0: Wed Sep 20 09:21:41 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3200+ (2009.79-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x20ff0  Stepping = 0
   
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow+,3DNow>
  AMD Features2=0x1<LAHF>
real memory  = 1073676288 (1023 MB)
avail memory = 1024299008 (976 MB)
ACPI APIC Table: Nvidia AWRDACPI
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: Nvidia AWRDACPI on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff,0xcf0-0xcf3 on acpi0
pci0: ACPI PCI bus on pcib0
agp0: NVIDIA nForce3-250 AGP Controller mem 0xf800-0xf9ff  
at device 0.0 on pci0

isab0: PCI-ISA bridge at device 1.0 on pci0
isa0: ISA bus on isab0
pci0: serial bus, SMBus at device 1.1 (no driver attached)
ohci0: OHCI (generic) USB controller mem 0xfd005000-0xfd005fff irq  
20 at device 2.0 on pci0

ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: OHCI (generic) USB controller on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: OHCI (generic) USB controller mem 0xfd00-0xfd000fff irq  
21

FreeBSD with a Gigabyte GA-K8NSC?

2006-09-03 Thread Johan Ström

Hi

I'm about to get a new server... In this case what I'm looking at  
is a Gigabyte GA-K8NSC mobo with nForce3 250Gb chipset, and a AMD 64  
3200+ Venice S939.


Does anyone have any experience with FreeBSD (6.1) and this mobo/ 
chipset? Does the network work? How good? SATA? Any stability/ 
performance issues?


I did notice it was mentioned on http://www.freebsd.org/platforms/ 
amd64/motherboards.html on 5.4 with the only comment Sound and USB  
untested... So.. anyone got more detailed experience than that?


Thanks :)
--
Johan Ström
[EMAIL PROTECTED]





Q about gmirror's metadata sector

2006-08-18 Thread Johan Ström

Hi

If I've understood correctly, gmirror uses the last sector on the
provider for its metadata.
If one uses
http://www.onlamp.com/pub/a/bsd/2005/11/10/FreeBSD_Basics.html
to set up a gmirrored system, that is, taking a fully used disk
where the last sector is in use (right?) and converting it to a
gmirror, this will overwrite whatever is on the last sector, right?
This sector will probably not be overwritten on a non-full fs, but if
the fs gets full later, is there any risk that it gets overwritten?
Does one have to shrink the fs/slice manually or something to make
sure this does not happen?
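
To make the question concrete: if the metadata lives in the last sector of the provider, its byte offset is simply the provider size minus one sector. A Python sketch of that arithmetic, assuming 512-byte sectors and a 300,090,728,448-byte provider (example values picked for illustration, not read from gmirror itself; the function name last_sector is hypothetical):

```python
SECTOR_SIZE = 512  # assumed sector size in bytes

def last_sector(provider_bytes, sector_size=SECTOR_SIZE):
    """Return (LBA, byte offset) of the final sector of a provider."""
    if provider_bytes % sector_size:
        raise ValueError("provider size is not a whole number of sectors")
    lba = provider_bytes // sector_size - 1
    return lba, lba * sector_size

# Example provider size; an assumption for illustration only.
lba, offset = last_sector(300_090_728_448)
print(f"last sector is LBA {lba}, starting at byte {offset}")
```

Any filesystem or slice whose extent reaches that offset would share the sector with the metadata, which is exactly the overlap being asked about.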


I haven't seen anyone mention this anywhere, so I'm just curious how
it works.


Thanks
Johan


Re: ATA problems again ... This time system froze!

2006-08-15 Thread Johan Ström


On Jul 28, 2006, at 13:15 , Johan Ström wrote:



On 17 jul 2006, at 17.40, Miroslav Lachman wrote:


Mike Tancsa wrote:
[..]

Install the smartmontools from
/usr/ports/sysutils/smartmontools/
and post the output of
smartctl -a /dev/ad8


smartmontools was previously installed and running as a daemon
without any bad reports.
I cannot run smartctl -a /dev/ad8 now, because my server
housing provider replaced the HDD with a new one, and after an hour
of synchronization: ad8: FAILURE - device detached. So the provider
replaced the whole server; only ad4 is an original piece of HW.
On the new server synchronization was much faster than on the previous
server (1:30 hours compared to 5 hours on the previous server) - so I
think it was a HW problem.
Now I am running a stress test, copying /usr/ports to another
partition in an infinite loop.
I will post results later. (On the bad server, the test failed after
about 30 minutes. On another server the test has been running fine for
a second day, so I think if the disk does not fail after 1 day, the
problem is solved.)


At last - now I think this was not GEOM/gmirror related. I tried
removing the ad8 provider from the gmirror (gm0), booting the system
from gm0 with one provider (ad4) and testing ad8 mounted separately -
ad8 failed again.

Just got another one..

Jul 25 13:30:47 elfi kernel: ad4: FAILURE - device detached
Jul 25 13:30:47 elfi kernel: subdisk4: detached
Jul 25 13:30:47 elfi kernel: ad4: detached
Jul 25 13:30:47 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=46318008320, length=2048)]error = 6
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=77269614592, length=16384)]error = 6
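
For reading these lines: error = 6 is errno 6, ENXIO, which fits a device that has just detached. A minimal Python sketch that pulls the fields out of such a g_vfs_done line and names the errno, assuming the wrapped line has been rejoined and that sectors are 512 bytes (the function name parse_g_vfs_done is made up for this example):

```python
import errno
import re

# Matches e.g. g_vfs_done():mirror/gm0s1f[READ(offset=..., length=...)]error = 6
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\S+?)\[(?P<op>READ|WRITE)\s*"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]\s*error = (?P<err>\d+)"
)

def parse_g_vfs_done(line, sector_size=512):
    """Return the fields of one g_vfs_done log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    offset = int(m["offset"])
    err = int(m["err"])
    return {
        "device": m["dev"],
        "op": m["op"],
        "offset": offset,
        "sector": offset // sector_size,  # assumes 512-byte sectors by default
        "length": int(m["length"]),
        "errno": errno.errorcode.get(err, "unknown"),
    }

line = ("Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ"
        "(offset=46318008320, length=2048)]error = 6")
result = parse_g_vfs_done(line)
print(result)
```

Run over a whole log, this makes it easy to see whether the failing offsets cluster in one region or are spread across the device.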


6 days of uptime when this occurred... Both disks have been tested with
PowerMax without a single problem (same with smartctl), and both SATA
cables are new. So the only hardware problem that I can't rule out
would be the mobo, but that is quite new too...


Solutions? Try RELENG_6 as recommended earlier?


Okay still on 6.1-RELEASE:

FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue  
May  9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/ 
src/sys/GENERIC  i386


Uptime approx. 12 days since the last reboot for the RAID fix... Just
got home to find a box which doesn't respond to SSH.. the monitor tells
me it has crashed totally. From /var/log/messages:


Aug 16 00:58:37 elfi kernel: ad4: FAILURE - device detached
Aug 16 00:58:37 elfi kernel: subdisk4: detached
Aug 16 00:58:37 elfi kernel: ad4: detached
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Cannot write metadata on  
ad4s1 (device=gm0s1, error=6).
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Cannot update metadata on  
disk ad4s1 (error=6).

Aug 16 00:58:37 elfi last message repeated 2 times
Aug 16 00:58:37 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Aug 16 00:58:37 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=112910630912, length=32768)]error = 6
Aug 16 00:58:37 labdator kernel: nfs: server 192.168.1.2 not  
responding, still trying

Aug 16 00:58:37 labdator kernel: nfs: server 192.168.1.2 OK
Aug 16 03:04:21 elfi syslogd: kernel boot file is /boot/kernel/kernel
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325168128, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325184512, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325200896, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325217280, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325233664, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2325250048, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2319169536, length=2048)]error = 6
Aug 16 03:04:21 elfi kernel: g_vfs_done():mirror/gm0s1d[WRITE 
(offset=2312404992, length=16384)]error = 6
Aug 16 03:04:21 elfi kernel: Copyright (c) 1992-2006 The FreeBSD  
Project.
Aug 16 03:04:21 elfi kernel: Copyright (c) 1979, 1980, 1983, 1986,  
1988, 1989, 1991, 1992, 1993, 1994
Aug 16 03:04:21 elfi kernel: The Regents of the University of  
California. All rights reserved.
Aug 16 03:04:21 elfi kernel: FreeBSD 6.1-RELEASE #3: Tue May  9  
20:40:23 CEST 2006

...(regular boot stuff)...

(labdator is a box with a elfi nfs export mounted)

dmesg shows me some other stuff not in messages:

ad4: FAILURE - device detached
subdisk4: detached
ad4: detached
GEOM_MIRROR: Cannot write metadata on ad4s1 (device=gm0s1, error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Cannot update metadata on disk ad4s1 (error=6).
GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
g_vfs_done():mirror/gm0s1f[READ(offset=112910630912, length=32768)] 
error = 6

ad6

Re: ATA problems again ...

2006-07-28 Thread Johan Ström


On 17 jul 2006, at 17.40, Miroslav Lachman wrote:


Mike Tancsa wrote:
[..]

Install the smartmontools from
/usr/ports/sysutils/smartmontools/
and post the output of
smartctl -a /dev/ad8


smartmontools was previously installed and running as a daemon
without any bad reports.
I cannot run smartctl -a /dev/ad8 now, because my server housing
provider replaced the HDD with a new one, and after an hour of
synchronization: ad8: FAILURE - device detached. So the provider
replaced the whole server; only ad4 is an original piece of HW.
On the new server synchronization was much faster than on the previous
server (1:30 hours compared to 5 hours on the previous server) - so I
think it was a HW problem.
Now I am running a stress test, copying /usr/ports to another
partition in an infinite loop.
I will post results later. (On the bad server, the test failed after
about 30 minutes. On another server the test has been running fine for
a second day, so I think if the disk does not fail after 1 day, the
problem is solved.)


At last - now I think this was not GEOM/gmirror related. I tried
removing the ad8 provider from the gmirror (gm0), booting the system
from gm0 with one provider (ad4) and testing ad8 mounted separately -
ad8 failed again.


Just got another one..

Jul 25 13:30:47 elfi kernel: ad4: FAILURE - device detached
Jul 25 13:30:47 elfi kernel: subdisk4: detached
Jul 25 13:30:47 elfi kernel: ad4: detached
Jul 25 13:30:47 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=46318008320, length=2048)]error = 6
Jul 25 13:30:47 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=77269614592, length=16384)]error = 6


6 days of uptime when this occurred... Both disks have been tested with
PowerMax without a single problem (same with smartctl), and both SATA
cables are new. So the only hardware problem that I can't rule out
would be the mobo, but that is quite new too...


Solutions? Try RELENG_6 as recommended earlier?

Thanks

Johan


Re: ATA problems again ...

2006-07-17 Thread Johan Ström


On 17 jul 2006, at 00.53, Mike Tancsa wrote:


At 03:02 PM 14/07/2006, Miroslav Lachman wrote:


After reboot (command reboot), system boot up with both disks  
attached and start autosynchronization. I do not know, if this is  
hw or sw error, I got


Install the smartmontools from

/usr/ports/sysutils/smartmontools/

and post the output of
smartctl -a /dev/ad8


I tried this on my SATA disk ad6:
=== START OF INFORMATION SECTION ===
Model Family: Maxtor MaXLine III family
Device Model: Maxtor 7L300S0
Serial Number:L60CJKPH
Firmware Version: BANC1G10
User Capacity:300,090,728,448 bytes
Device is:In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:Mon Jul 17 11:54:35 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Error Log Version: 1
No Errors Logged

Any other output from smartctl that can help? Both my ad4 and ad6 are
the same.


Now I've had yet another crash.. They have become much more frequent
these last few days... Two crashes ago I changed the SATA cable to ad4;
I wonder if that had anything to do with it... On the other hand, this
time it was ad6 that got lost, so why would ad4's cable make any
difference.. I'll change ad6's now too when I've taken the box down..


Jul 16 03:27:25 elfi kernel: ad6: FAILURE - device detached
Jul 16 03:27:25 elfi kernel: subdisk6: detached
Jul 16 03:27:25 elfi kernel: ad6: detached
Jul 16 03:27:25 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 disconnected.
Jul 16 03:27:25 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=27210082304, length=2048)]error = 6
Jul 16 03:27:25 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=35153985536, length=32768)]error = 6
Jul 16 03:27:25 elfi kernel: ufs_access(): Error retrieving ACL on  
object (6).
Jul 16 03:27:25 labdator kernel: nfs: server 192.168.1.2 not  
responding, still trying

Jul 16 03:27:25 labdator kernel: nfs: server 192.168.1.2 OK

Those last 3 messages seem to be closely related to the gmirror going
into degraded mode? Some ACL reading and a mounted NFS filesystem
(192.168.1.2 is the failing box).


Is there some way to enable more debug output or something??

Johan



Re: ATA problems again ...

2006-07-17 Thread Johan Ström


On 17 jul 2006, at 16.51, Mike Tancsa wrote:


At 05:59 AM 17/07/2006, Johan Ström wrote:


On 17 jul 2006, at 00.53, Mike Tancsa wrote:


At 03:02 PM 14/07/2006, Miroslav Lachman wrote:



After reboot (command reboot), system boot up with both disks
attached and start autosynchronization. I do not know, if this is
hw or sw error, I got


Install the smartmontools from

/usr/ports/sysutils/smartmontools/

and post the output of
smartctl -a /dev/ad8


I tried this on my SATA disk ad6:
=== START OF INFORMATION SECTION ===
Model Family: Maxtor MaXLine III family
Device Model: Maxtor 7L300S0
Serial Number:L60CJKPH
Firmware Version: BANC1G10
User Capacity:300,090,728,448 bytes
Device is:In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:Mon Jul 17 11:54:35 2006 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Error Log Version: 1
No Errors Logged

Any other output from smartctl that can help? Both my ad4 and ad6 are
the same.


This at least rules out the disks being bad for the most part.  It
still could be bad cables, but if you changed those out then it's
doubtful.  Perhaps try updating to RELENG_6? If it's a gmirror
issue, I think there have been a number of fixes.


Just ran PowerMax (Maxtor's own testing software) full length test on
ad6, not a single problem.. Same result as on ad4 a couple of days
ago.. So no, I doubt it's the disks. I've changed the other SATA
cable too now (the one on ad6); this was a fresh one never used
before. I'll change the one on ad4 too when I take the box down for a reboot.


I'm currently running RELENG_6_1, however from May 9th. Have there
been any ata/gmirror changes merged to 6_1 since then?
If I run RELENG_6 instead, how big is the chance that other problems
might arise? ;) I've never used anything other than stable..


Thanks
Johan


Re: GEOM problems again...

2006-07-13 Thread Johan Ström


On 10 jul 2006, at 13.59, Johan Ström wrote:



On 10 jul 2006, at 11.09, Johan Ström wrote:



On 21 maj 2006, at 11.16, Johan Ström wrote:


Hi

I've had problems before with GEOM mirror and my SATA drives, and
I've posted about it here before too. The solution seemed to be a
change of motherboard; this reduced the crashes a lot (and the
speeds achieved also improved greatly, from 10-15MB/s to
40-50MB/s).
However, after the change I had one or two crashes, but now it has
been running for well over 50-60 days without any problems.
Then, 11 days ago, I upgraded to 6.1... And now I get these
crashes again (the mirror crashes, that is; the system still
runs fine):


May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
May 21 02:04:58 elfi kernel: ad6: detached
May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 disconnected.
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=11006308352, length=2048)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=164847927296, length=131072)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=256680296448, length=32768)]error = 6



Some info about the controller and disks:

May  9 22:46:52 elfi kernel: ata1: ATA channel 1 on atapci0
May  9 22:46:52 elfi kernel: atapci1: nVidia nForce2 Pro SATA150 controller port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0

May  9 22:46:52 elfi kernel: ad4: 286188MB Maxtor 7L300S0  
BANC1G10 at ata2-master SATA150
May  9 22:46:52 elfi kernel: ad6: 286188MB Maxtor 7L300S0  
BANC1G10 at ata3-master SATA150
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created  
(id=4118114647).
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
mirror/gm0s1 launched.
May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/ 
mirror/gm0s1a


Anyone got any new clues? Afaik the disks should be working fine
(they are 6 months old and this same problem has occurred multiple
times...)


Hope to solve this ;)

Thanks
Johan



Here we go again

Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached
Jul  7 16:20:09 elfi kernel: subdisk4: detached
Jul  7 16:20:09 elfi kernel: ad4: detached
Jul  7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul  7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=88896847872, length=32768)]error = 6


However, no read timeouts etc. as before, just this. 18 days
uptime this time (I've rebooted for other reasons since the last
mail). It always seems to be ad4 that is disconnecting. I'm going
to do some disk tests on it, but I doubt they will show anything
since I've had similar problems from day one (did tests at that
time w/o problems) with this gmirror setup (new disks).


Johan


Followup: I ran Maxtor's own test program over the disk, full
length test. Not a single problem.

After reboot the raid is rebuilding fine:

GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.

As usual it seems i cannot get the controller/driver to redetect  
the disk using atacontrol etc..


Johan


And now again... raid gone degraded only 2 days after reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=120776474624, length=32768)]error = 6


$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue  
May  9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/ 
sys/GENERIC  i386


Still no luck with atacontrol...

Is there any way to debug this further? I've tested the disk, the
SATA cables are new... I've had similar problems with another
motherboard...
I don't think this is related to hw problems, but rather a
software problem that needs to be solved; this is not something one
can call stable ;)


So, any pointers how to enable more debugging or anything that could  
give some clues?


Johan




Re: ATA problems again ... (was: Re: GEOM problems again...)

2006-07-13 Thread Johan Ström


On 13 jul 2006, at 14.26, Robert Watson wrote:



On Thu, 13 Jul 2006, Johan Ström wrote:


And now again... raid gone degraded only 2 days after reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=120776474624, length=32768)]error = 6


$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue  
May  9 20:40:23 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/ 
sys/GENERIC  i386


Still no luck with atacontrol...

Is there any way to debug this further? I've tested the disk,
the SATA cables are new... I've had similar problems with another
motherboard... I don't think this is related to hw problems, but
rather a software problem that needs to be solved; this is not
something one can call stable ;)


So, any pointers how to enable more debugging or anything that  
could give some clues?


I don't have a whole lot to add to this thread, but have changed
the subject to make sure that the right people are reading this.
This is likely either a hardware problem (motherboard/cable/drive)
or a driver problem.  GEOM and the mirror driver seem to be behaving
as desired (detaching a drive that the driver reports as being
bad).  Could you post the dmesg -v output for the probing of the
ata controller and driver?


dmesg -v?

I got the full dmesg from dmesg.boot (this has been posted earlier in  
this thread too)


Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights  
reserved.

FreeBSD 6.1-RELEASE #3: Tue May  9 20:40:23 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: A M I  OEMAPIC 
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP  (1200.01-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x662  Stepping = 2
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory  = 536674304 (511 MB)
avail memory = 515805184 (491 MB)
ioapic0 Version 1.1 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: A M I OEMRSDT on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: NVIDIA nForce2 AGP Controller mem 0xf800-0xfbff at device 0.0 on pci0
pci0: memory, RAM at device 0.1 (no driver attached)
pci0: memory, RAM at device 0.2 (no driver attached)
pci0: memory, RAM at device 0.3 (no driver attached)
pci0: memory, RAM at device 0.4 (no driver attached)
pci0: memory, RAM at device 0.5 (no driver attached)
isab0: PCI-ISA bridge at device 1.0 on pci0
isa0: ISA bus on isab0
pci0: serial bus, SMBus at device 1.1 (no driver attached)
ohci0: OHCI (generic) USB controller mem 0xfebfb000-0xfebfbfff irq 20 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: OHCI (generic) USB controller on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: OHCI (generic) USB controller mem 0xfebfc000-0xfebfcfff irq 21 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: OHCI (generic) USB controller on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ehci0: NVIDIA nForce2 Ultra 400 USB 2.0 controller mem 0xfebfdc00-0xfebfdcff irq 22 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: NVIDIA nForce2 Ultra 400 USB 2.0 controller on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 8 ports with 8 removable, self powered
nve0: NVIDIA nForce MCP5 Networking Adapter port 0xdc00-0xdc07 mem 0xfebfe000-0xfebfefff irq 20 at device 4.0 on pci0
nve0: Ethernet address 00:13:d4:bf:5b:79
miibus0: MII bus on nve0
rlphy0: RTL8201L 10/100 media interface on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nve0: Ethernet address: 00:13:d4:bf:5b:79
pci0: multimedia, audio at device 6.0 (no driver attached)
pcib1: ACPI PCI-PCI bridge at device 8.0 on pci0
pci2: ACPI PCI bus on pcib1
pci2: display, VGA at device 6.0 (no driver attached)
xl0: 3Com 3c905C-TX Fast Etherlink XL port 0xcc00-0xcc7f mem 0xfeafec00-0xfeafec7f irq 17 at device 9.0 on pci2
miibus1: MII bus on xl0
xlphy0: 3c905C 10/100 internal PHY on miibus1

Re: GEOM problems again...

2006-07-10 Thread Johan Ström


On 21 maj 2006, at 11.16, Johan Ström wrote:


Hi

I've had problems before with GEOM mirror and my SATA drives, and
I've posted about it here before too. The solution seemed to be a
change of motherboard; this reduced the crashes a lot (and the
speeds achieved also improved greatly, from 10-15MB/s to
40-50MB/s).
However, after the change I had one or two crashes, but now it has
been running for well over 50-60 days without any problems.
Then, 11 days ago, I upgraded to 6.1... And now I get these
crashes again (the mirror crashes, that is; the system still
runs fine):


May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
May 21 02:04:58 elfi kernel: ad6: detached
May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 disconnected.
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=11006308352, length=2048)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=164847927296, length=131072)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=256680296448, length=32768)]error = 6



Some info about the controller and disks:

May  9 22:46:52 elfi kernel: ata1: ATA channel 1 on atapci0
May  9 22:46:52 elfi kernel: atapci1: nVidia nForce2 Pro SATA150 controller port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0

May  9 22:46:52 elfi kernel: ad4: 286188MB Maxtor 7L300S0  
BANC1G10 at ata2-master SATA150
May  9 22:46:52 elfi kernel: ad6: 286188MB Maxtor 7L300S0  
BANC1G10 at ata3-master SATA150
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created  
(id=4118114647).
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
mirror/gm0s1 launched.
May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/ 
mirror/gm0s1a


Anyone got any new clues? Afaik the disks should be working fine
(they are 6 months old and this same problem has occurred multiple
times...)


Hope to solve this ;)

Thanks
Johan



Here we go again

Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached
Jul  7 16:20:09 elfi kernel: subdisk4: detached
Jul  7 16:20:09 elfi kernel: ad4: detached
Jul  7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul  7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=88896847872, length=32768)]error = 6


However, no read timeouts etc. as before, just this. 18 days
uptime this time (I've rebooted for other reasons since the last mail).
It always seems to be ad4 that is disconnecting. I'm going to do
some disk tests on it, but I doubt they will show anything since I've
had similar problems from day one (did tests at that time w/o
problems) with this gmirror setup (new disks).
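
To check the impression about which disk keeps dropping out, the detach events can be tallied per device from the syslog. Below is a minimal Python sketch, assuming the kernel lines look exactly like the ones quoted in this thread; detach_counts is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches e.g. "May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached"
DETACH = re.compile(r"kernel: (ad\d+): FAILURE - device detached")

def detach_counts(log_lines):
    """Count 'FAILURE - device detached' events per ATA device."""
    counts = Counter()
    for line in log_lines:
        match = DETACH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For the two events quoted in this message it reports one detach for ad6 (May 21) and one for ad4 (Jul 7), so the logs shown here do not yet single out one disk.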


Johan



Re: GEOM problems again...

2006-07-10 Thread Johan Ström


On 10 jul 2006, at 11.09, Johan Ström wrote:



On 21 maj 2006, at 11.16, Johan Ström wrote:


Hi

I've had problems before with GEOM mirror and my SATA drives, and
I've posted about it here before too. The solution seemed to be a
change of motherboard; this reduced the crashes a lot (and the
speeds achieved also improved greatly, from 10-15MB/s to
40-50MB/s).
However, after the change I had one or two crashes, but now it has
been running for well over 50-60 days without any problems.
Then, 11 days ago, I upgraded to 6.1... And now I get these
crashes again (the mirror crashes, that is; the system still
runs fine):


May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
May 21 02:04:58 elfi kernel: ad6: detached
May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 disconnected.
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=11006308352, length=2048)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=164847927296, length=131072)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=256680296448, length=32768)]error = 6



Some info about the controller and disks:

May  9 22:46:52 elfi kernel: ata1: ATA channel 1 on atapci0
May  9 22:46:52 elfi kernel: atapci1: nVidia nForce2 Pro SATA150 controller port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0

May  9 22:46:52 elfi kernel: ad4: 286188MB Maxtor 7L300S0  
BANC1G10 at ata2-master SATA150
May  9 22:46:52 elfi kernel: ad6: 286188MB Maxtor 7L300S0  
BANC1G10 at ata3-master SATA150
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created  
(id=4118114647).
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
mirror/gm0s1 launched.
May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/ 
mirror/gm0s1a


Anyone got any new clues? Afaik the disks should be working fine
(they are 6 months old and this same problem has occurred multiple
times...)


Hope to solve this ;)

Thanks
Johan



Here we go again

Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached
Jul  7 16:20:09 elfi kernel: subdisk4: detached
Jul  7 16:20:09 elfi kernel: ad4: detached
Jul  7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Jul  7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=88896847872, length=32768)]error = 6


However, no read timeouts etc. as before, just this. 18 days
uptime this time (I've rebooted for other reasons since the last mail).
It always seems to be ad4 that is disconnecting. I'm going to do
some disk tests on it, but I doubt they will show anything since I've
had similar problems from day one (did tests at that time w/o
problems) with this gmirror setup (new disks).


Johan


Followup: I ran Maxtor's own test program over the disk, full
length test. Not a single problem.

After reboot the raid is rebuilding fine:

GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.

As usual it seems i cannot get the controller/driver to redetect the  
disk using atacontrol etc..


Johan




Re: SNMP access to pf ALTQ data?

2006-07-08 Thread Johan Ström

On 8 jul 2006, at 09.18, J. Buck Caldwell wrote:


Forgive the cross-posting, but I think I need a wider audience.

Is it possible to track pf ALTQ usage with MRTG? I notice that  
FreeBSD's built-in bsnmpd has a module and mibs to support pf, but  
I know too little about SNMP to figure out how to access the queue  
stats.


Specifically, I'm looking to make a series of MRTG graphs that show  
the total bytes that pass through each queue. I figure if worst  
comes to worst, I can work out a separate program that parses the  
output of 'pfctl -vsq' and returns that as MRTG-readable input, but  
it would be much smoother to get it via SNMP, if it can be done.


I got one of those, a small python script which feeds the data into a  
rrd file:


https://svn.stromnet.org/repos/misc/trunk/rrd/pfque-rrd.py

Works fine; the only problem I have is when I reload my rules (that
is, reset the counters).. The graph goes mad ;)
Although, if there is some way to do this via SNMP instead, I would
also like to know...
The above script uses tftp to move the rrd files to my graphing host.
I call it from crontab every minute.

For the graphing I use this:

https://svn.stromnet.org/repos/misc/trunk/rrd/pfque-graph.py

And the result looks like this:

http://stats.stromnet.org/router/details.php?file=pfqueue_out

If you look at the last month/year graphs, you see the problem with  
resetting the counters..
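
One way around the reset problem is to store per-interval deltas and treat any decrease in a counter as a reset. Below is a minimal Python sketch under that assumption; byte_deltas is a hypothetical helper and is not taken from pfque-rrd.py.

```python
def byte_deltas(samples):
    """Turn cumulative byte counters into per-interval deltas.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a ruleset reload); the new value is then taken
    as the traffic seen since the reset instead of a negative delta.
    """
    deltas = []
    prev = None
    for value in samples:
        if prev is not None:
            deltas.append(value - prev if value >= prev else value)
        prev = value
    return deltas
```

With the samples 100, 250, 400, 30, 90 this yields 150, 150, 30, 60, so the reload between 400 and 30 no longer shows up as a spike. As far as I know, RRDtool's DERIVE data source type with a minimum of 0 gives a similar effect inside the rrd file by recording the reset interval as unknown.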




Any help would be appreciated. I'm sure others would be interested  
in this as well.




maxproc limit exceeded by uid 0

2006-06-22 Thread Johan Ström

Hi

Today I woke up and was not able to log in to my system (ssh). Some
stuff worked (DNS for example, this box runs bind), although the IMAP
server didn't work too well...

Anyway, I checked out local console:

maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).

Repeated 23 times...

I was not able to do anything, either locally or remotely. ACPI didn't
work very well either, except for giving me

acpi: suspend request ignored (not ready yet)
the second time I pressed the power button... So a hard reboot it was.

Anyway.. I'm using the default login.conf, which has unlimited for all
resource limits.. So wtf is this?


As far as I know there shouldn't be any runaway processes, but you
never know... The only candidates would be a umount -f /some/nfs and a
df -h (the umount as root), but both hung since the NFS
volume was unreachable, and why would that fork like this?


I don't know what more info could be useful; there isn't much more in the logs...

Ström


Re: maxproc limit exceeded by uid 0

2006-06-22 Thread Johan Ström


On 22 jun 2006, at 09.57, [LoN]Kamikaze wrote:


Johan Ström wrote:

Anyway.. I'm using the default login.conf, which has unlimited for all
resource limits.. So wtf is this?


Look at

# sysctl kern.maxproc


Okay, 4096 procs... But what were those 4k procs? On my newly booted
box I have 127... Well, I guess there is no way to find out now.
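
One option for next time is to log a per-user, per-command process count from cron, so the culprit is on record before the limit is hit. Below is a minimal Python sketch; count_processes is a hypothetical helper, and it assumes input lines of the form "user command", for example as printed by ps -axo user,comm.

```python
from collections import Counter

def count_processes(ps_lines, limit=5):
    """Return the most common (user, command) pairs with their counts."""
    counts = Counter()
    for line in ps_lines:
        parts = line.split(None, 1)
        # Skip blank lines and the "USER COMMAND" header row.
        if len(parts) == 2 and parts[0] != "USER":
            counts[(parts[0], parts[1].strip())] += 1
    return counts.most_common(limit)
```

A count for one command that keeps growing between runs would point at whatever was forking, which the logs alone did not show here.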



Re: maxproc limit exceeded by uid 0

2006-06-22 Thread Johan Ström


On 22 jun 2006, at 17.42, Dan Nelson wrote:



If it ever happens again, you can drop to the debugger with
Ctrl-Alt-ESC and run ps to get a list of running processes.  You
might even be able to recover by killing some offending processes with
kill 9 pid, then continue with c.



Hm, I tried this on a 6.1 GENERIC box just now, ctrl-alt-esc doesn't
seem to give me any debugger... I suppose I have to recompile with
DDB for this? Is this recommended for servers where I normally don't
need DDB?


Johan



Re: maxproc limit exceeded by uid 0

2006-06-22 Thread Johan Ström


On 22 jun 2006, at 18.45, Dan Nelson wrote:


In the last episode (Jun 22), Johan Ström said:

On 22 jun 2006, at 17.42, Dan Nelson wrote:

If it ever happens again, you can drop to the debugger with
Ctrl-Alt-ESC and run ps to get a list of running processes.  You
might even be able to recover by killing some offending processes
with kill 9 pid, then continue with c.


Hm, I tried this on a 6.1 GENERIC box just now, ctrl-alt-esc doesn't
seem to give me any debugger... I suppose I have to recompile with
DDB for this? Is this recommended for servers where I normally don't
need DDB?


Right; DDB isn't in GENERIC.  The problem with not including DDB on
servers you don't think you'll need it on is:  the one time you need
it, it's not there :)


Very true.. ;) But are there any reasons NOT to have it on my servers?

--
Johan


GEOM problems again...

2006-05-21 Thread Johan Ström

Hi

I've had problems before with GEOM mirror and my SATA drives, and
I've posted about it here before too. The solution seemed to be a
change of motherboard; this reduced the crashes a lot (and the
speeds achieved also improved greatly, from 10-15MB/s to 40-50MB/s).
However, after the change I had one or two crashes, but now it has
been running for well over 50-60 days without any problems.
Then, 11 days ago, I upgraded to 6.1... And now I get these crashes
again (the mirror crashes, that is; the system still runs fine):


May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
May 21 02:04:58 elfi kernel: subdisk6: detached
May 21 02:04:58 elfi kernel: ad6: detached
May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 disconnected.
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=11006308352, length=2048)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=164847927296, length=131072)]error = 6
May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ 
(offset=256680296448, length=32768)]error = 6



Some info about the controller and disks:

May  9 22:46:52 elfi kernel: ata1: ATA channel 1 on atapci0
May  9 22:46:52 elfi kernel: atapci1: nVidia nForce2 Pro SATA150 controller port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0

May  9 22:46:52 elfi kernel: ad4: 286188MB Maxtor 7L300S0 BANC1G10  
at ata2-master SATA150
May  9 22:46:52 elfi kernel: ad6: 286188MB Maxtor 7L300S0 BANC1G10  
at ata3-master SATA150
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created  
(id=4118114647).
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 detected.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 activated.
May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
mirror/gm0s1 launched.
May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/ 
mirror/gm0s1a


Anyone got any new clues? Afaik the disks should be working fine
(they are 6 months old and this same problem has occurred multiple
times...)


Hope to solve this ;)

Thanks
Johan



Re: gmirror/disk problems!

2006-02-26 Thread Johan Ström


On 10 feb 2006, at 07.15, Johan Ström wrote:


Hi list!

I've been experiencing problems earlier with gmirror (thread "Page
fault, GEOM problem??"). My gmirror crashed, and the box completely
froze.
Now I've got a new mobo, and it has been working great since (no
crashes, and I get a decent 40-50MB/s read/write instead of ~10-20).

This morning i woke up to this:
...

I could try to move the disks to the Promise sata2 tx4 card I bought
for the old mobo (which didn't have SATA)... But I'd rather find the
problem ;)


Hope someone can help.
Thanks
Johan



And now it happened again..

Feb 26 00:13:27 elfi kernel: subdisk4: detached
Feb 26 00:13:27 elfi kernel: ad4: detached
Feb 26 00:13:27 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1  
retry left) LBA=11660623

Feb 26 00:13:27 elfi kernel: unknown: timeout waiting to issue command
Feb 26 00:13:27 elfi kernel: unknown: error issueing READ_DMA command
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 disconnected.
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=5).  
ad4s1[READ(offset=5970206720, length=16384)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5974401024, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5976973312, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5977153536, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5977333760, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5977530368, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5977710592, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5977907200, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5978087424, length=131072)]
Feb 26 00:13:27 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad4s1[WRITE(offset=5978939392, length=114688)]
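
The error numbers in these lines are errno values: 5 is EIO (input/output error) and 6 is ENXIO (device not configured), which fits one READ failing first and every later request hitting a provider that is already gone. Below is a minimal Python sketch for summarizing such lines; summarize_failures is a hypothetical helper, and the pattern only assumes the message format shown above, including the mail line wrap before the provider name.

```python
import errno
import re
from collections import Counter

# Matches e.g. "Request failed (error=6). ad4s1[WRITE(offset=5974401024, length=131072)]"
# and tolerates whitespace or a line break where the mail client wrapped the line.
FAILED = re.compile(
    r"Request failed \(error=(\d+)\)\.\s+(\w+)\[(READ|WRITE)\s*"
    r"\(offset=(\d+), length=(\d+)\)\]"
)

def summarize_failures(log_text):
    """Count failed GEOM_MIRROR requests per (provider, operation, errno name)."""
    counts = Counter()
    for err, provider, op, _offset, _length in FAILED.findall(log_text):
        name = errno.errorcode.get(int(err), "errno %s" % err)
        counts[(provider, op, name)] += 1
    return counts
```

Applied to the excerpt above it should report a single READ with EIO and a run of WRITEs with ENXIO, all on ad4s1.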


And then on reboot

Feb 26 20:17:53 elfi kernel: ad4: 286188MB Maxtor 7L300S0 BANC1G10  
at ata2-master SATA150
Feb 26 20:17:53 elfi kernel: ad6: 286188MB Maxtor 7L300S0 BANC1G10  
at ata3-master SATA150
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1 created  
(id=4118114647).
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad4s1 detected.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 detected.

Feb 26 20:17:53 elfi kernel: Root mount waiting for: GMIRROR
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad6s1 activated.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
mirror/gm0s1 launched.
Feb 26 20:17:53 elfi kernel: GEOM_MIRROR: Device gm0s1: rebuilding  
provider ad4s1.


Rebuilding currently

This problem has occurred many times now.. Does anyone know why this
happens? Is there some bug somewhere that needs to be hunted down?
In geom? In the ata driver? This needs to be solved..



Johan


Re: gmirror/disk problems!

2006-02-12 Thread Johan Ström

On 10 feb 2006, at 07.43, Johan Ström wrote:



On 10 feb 2006, at 07.15, Johan Ström wrote:


Hi list!

I've been experiencing problems earlier with gmirror (thread "Page
fault, GEOM problem??"). My gmirror crashed, and the box
completely froze.
Now I've got a new mobo, and it has been working great since (no
crashes, and I get a decent 40-50MB/s read/write instead of ~10-20).

This morning i woke up to this:


subdisk4: detached
ad4: detached
unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=187595536
unknown: timeout waiting to issue command
unknown: error issueing READ_DMA command
GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=134373376, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=134438912, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268591104, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268607488, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268656640, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=5966399488, length=2048)]
GEOM_MIRROR: Request failed (error=5). ad4s1[READ 
(offset=96048882176, length=32768)]


Just like old times... However, no page faults! Yay.. But.. what
is going on here? Why does the ATA controller (or whatever it is) think
it needs to detach my disk? And how do I reattach it? I have tried
some stuff with atacontrol:


$ atacontrol list
ATA channel 0:
Master: acd0 CD-ROM CDU701-F/1.0q ATA/ATAPI revision 0
Slave:   no device present
ATA channel 1:
Master:  no device present
Slave:   no device present
ATA channel 2:
Master:  no device present
Slave:   no device present
ATA channel 3:
Master:  ad6 Maxtor 7L300S0/BANC1G10 Serial ATA v1.0
Slave:   no device present
$ atacontrol attach ata2
atacontrol: ioctl(IOCATAATTACH): File exists
$ atacontrol reinit ata2
 here i get a long system wide block
Master:  no device present
Slave:   no device present
$

Okay, so no luck reinitializing it.. I don't really want to reboot the box
(each time this might happen).. But I'm happy that it doesn't crash
totally anymore heh...


dmesg of current system:


Feb  2 19:39:09 elfi syslogd: kernel boot file is /boot/kernel/kernel
Feb  2 19:39:09 elfi kernel: Copyright (c) 1992-2005 The FreeBSD  
Project.
Feb  2 19:39:09 elfi kernel: Copyright (c) 1979, 1980, 1983, 1986,  
1988, 1989, 1991, 1992, 1993, 1994
Feb  2 19:39:09 elfi kernel: The Regents of the University of  
California. All rights reserved.
Feb  2 19:39:09 elfi kernel: FreeBSD 6.0-RELEASE #2: Thu Dec  1  
20:18:30 CET 2005
Feb  2 19:39:09 elfi kernel: [EMAIL PROTECTED]:/usr/obj/usr/ 
src/sys/GENERIC

Feb  2 19:39:09 elfi kernel: ACPI APIC Table: A M I  OEMAPIC 
Feb  2 19:39:09 elfi kernel: Timecounter i8254 frequency 1193182  
Hz quality 0
Feb  2 19:39:09 elfi kernel: CPU: AMD Athlon(tm) XP  (1200.01-MHz  
686-class CPU)
Feb  2 19:39:09 elfi kernel: Origin = AuthenticAMD  Id = 0x662   
Stepping = 2
Feb  2 19:39:09 elfi kernel: Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
Feb  2 19:39:09 elfi kernel: AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>

Feb  2 19:39:09 elfi kernel: real memory  = 536674304 (511 MB)
Feb  2 19:39:09 elfi kernel: avail memory = 515833856 (491 MB)
Feb  2 19:39:09 elfi kernel: ioapic0 Version 1.1 irqs 0-23 on  
motherboard

Feb  2 19:39:09 elfi kernel: npx0: [FAST]
Feb  2 19:39:09 elfi kernel: npx0: math processor on motherboard
Feb  2 19:39:09 elfi kernel: npx0: INT 16 interface
Feb  2 19:39:09 elfi kernel: acpi0: A M I OEMRSDT on motherboard
Feb  2 19:39:09 elfi kernel: acpi0: Power Button (fixed)
Feb  2 19:39:09 elfi kernel: pci_link0: ACPI PCI Link LNKA irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link1: ACPI PCI Link LNKB irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link2: ACPI PCI Link LNKC irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link3: ACPI PCI Link LNKD irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link4: ACPI PCI Link LNKE irq 11  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link5: ACPI PCI Link LUS0 irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link6: ACPI PCI Link LUS1 irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link7: ACPI PCI Link LUS2 irq 3  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link8: ACPI PCI Link LKLN irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link9: ACPI PCI Link LAPU irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link10: ACPI PCI Link LAUI irq  
11 on acpi0
Feb  2 19:39:09 elfi kernel: pci_link11: ACPI PCI Link LKMO irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link12: ACPI PCI Link LKSM irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link13: ACPI PCI Link LFWR irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link14: ACPI PCI Link LETH irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link15: ACPI PCI Link

Re: gmirror/disk problems!

2006-02-09 Thread Johan Ström


On 10 feb 2006, at 07.15, Johan Ström wrote:


Hi list!

I've been experiencing problems earlier with gmirror (thread "Page
fault, GEOM problem??"). My gmirror crashed, and the box completely
froze.
Now I got a new mobo, and it has been working great since (no
crashes, and I get a decent 40-50MB/s read/write instead of ~10-20).

This morning i woke up to this:


subdisk4: detached
ad4: detached
unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=187595536
unknown: timeout waiting to issue command
unknown: error issueing READ_DMA command
GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=134373376, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=134438912, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268591104, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268607488, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=268656640, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE 
(offset=5966399488, length=2048)]
GEOM_MIRROR: Request failed (error=5). ad4s1[READ 
(offset=96048882176, length=32768)]
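
A quick sanity check on the log above, as a sketch under two assumptions that are not stated anywhere in the thread: 512-byte sectors, and an ad4s1 slice that starts at sector 63 (the conventional MBR layout). Under those assumptions, the READ that GEOM_MIRROR reports as failed (offset=96048882176 within ad4s1) lands on exactly the LBA the ATA driver timed out on (LBA=187595536), which would suggest both messages describe the same request. The constant and function names are made up for illustration.

```python
# Assumptions (not confirmed by the thread): 512-byte sectors, and the
# slice ad4s1 beginning at sector 63 as in a conventional MBR layout.
SECTOR_SIZE = 512
SLICE_START_SECTOR = 63


def slice_offset_to_lba(offset_bytes, slice_start=SLICE_START_SECTOR,
                        sector_size=SECTOR_SIZE):
    """Convert a byte offset inside a slice to an absolute disk LBA."""
    if offset_bytes % sector_size:
        raise ValueError("offset is not sector-aligned")
    return slice_start + offset_bytes // sector_size


# Values copied from the kernel messages quoted above.
timeout_lba = 187595536            # "TIMEOUT - READ_DMA retrying ... LBA=187595536"
failed_read_offset = 96048882176   # "ad4s1[READ(offset=96048882176, length=32768)]"

computed_lba = slice_offset_to_lba(failed_read_offset)
print(computed_lba, computed_lba == timeout_lba)
```

If the slice started somewhere else the two numbers would simply not line up, so a match is only weak evidence, not proof.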


Just like old times... However, no page faults! Yay.. But.. what
is going on here?? Why does the ATA controller (or whatever) think it
needs to detach my disk?? And how do I reattach it? I have tried
some stuff with atacontrol:


$ atacontrol list
ATA channel 0:
Master: acd0 CD-ROM CDU701-F/1.0q ATA/ATAPI revision 0
Slave:   no device present
ATA channel 1:
Master:  no device present
Slave:   no device present
ATA channel 2:
Master:  no device present
Slave:   no device present
ATA channel 3:
Master:  ad6 Maxtor 7L300S0/BANC1G10 Serial ATA v1.0
Slave:   no device present
$ atacontrol attach ata2
atacontrol: ioctl(IOCATAATTACH): File exists
$ atacontrol reinit ata2
(here I get a long system-wide block)
Master:  no device present
Slave:   no device present
$

Okay so no luck reinitializing it.. I don't really want to reboot the box
(each time this might happen).. But I'm happy that it doesn't crash
totally anymore heh...


dmesg of current system:


Feb  2 19:39:09 elfi syslogd: kernel boot file is /boot/kernel/kernel
Feb  2 19:39:09 elfi kernel: Copyright (c) 1992-2005 The FreeBSD  
Project.
Feb  2 19:39:09 elfi kernel: Copyright (c) 1979, 1980, 1983, 1986,  
1988, 1989, 1991, 1992, 1993, 1994
Feb  2 19:39:09 elfi kernel: The Regents of the University of  
California. All rights reserved.
Feb  2 19:39:09 elfi kernel: FreeBSD 6.0-RELEASE #2: Thu Dec  1  
20:18:30 CET 2005
Feb  2 19:39:09 elfi kernel: [EMAIL PROTECTED]:/usr/obj/usr/src/ 
sys/GENERIC

Feb  2 19:39:09 elfi kernel: ACPI APIC Table: A M I  OEMAPIC 
Feb  2 19:39:09 elfi kernel: Timecounter i8254 frequency 1193182 Hz  
quality 0
Feb  2 19:39:09 elfi kernel: CPU: AMD Athlon(tm) XP  (1200.01-MHz 686- 
class CPU)
Feb  2 19:39:09 elfi kernel: Origin = AuthenticAMD  Id = 0x662   
Stepping = 2
Feb  2 19:39:09 elfi kernel: Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
Feb  2 19:39:09 elfi kernel: AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>

Feb  2 19:39:09 elfi kernel: real memory  = 536674304 (511 MB)
Feb  2 19:39:09 elfi kernel: avail memory = 515833856 (491 MB)
Feb  2 19:39:09 elfi kernel: ioapic0 Version 1.1 irqs 0-23 on  
motherboard

Feb  2 19:39:09 elfi kernel: npx0: [FAST]
Feb  2 19:39:09 elfi kernel: npx0: math processor on motherboard
Feb  2 19:39:09 elfi kernel: npx0: INT 16 interface
Feb  2 19:39:09 elfi kernel: acpi0: A M I OEMRSDT on motherboard
Feb  2 19:39:09 elfi kernel: acpi0: Power Button (fixed)
Feb  2 19:39:09 elfi kernel: pci_link0: ACPI PCI Link LNKA irq 0 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link1: ACPI PCI Link LNKB irq 5 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link2: ACPI PCI Link LNKC irq 0 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link3: ACPI PCI Link LNKD irq 0 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link4: ACPI PCI Link LNKE irq 11  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link5: ACPI PCI Link LUS0 irq 5 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link6: ACPI PCI Link LUS1 irq 5 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link7: ACPI PCI Link LUS2 irq 3 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link8: ACPI PCI Link LKLN irq 5 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link9: ACPI PCI Link LAPU irq 0 on  
acpi0
Feb  2 19:39:09 elfi kernel: pci_link10: ACPI PCI Link LAUI irq 11  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link11: ACPI PCI Link LKMO irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link12: ACPI PCI Link LKSM irq 5  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link13: ACPI PCI Link LFWR irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link14: ACPI PCI Link LETH irq 0  
on acpi0
Feb  2 19:39:09 elfi kernel: pci_link15: ACPI PCI Link LATA irq 10  
on acpi0
Feb  2 19:39:09 elfi

Re: SCSI device timeout

2006-02-01 Thread Johan Ström


On 1 feb 2006, at 11.42, Holm Tiffe wrote:


Johan Ström wrote:


On 1 feb 2006, at 10.57, Holm Tiffe wrote:


Derkjan de Haan wrote:


All,

Today, after a cvsup (RELENG_6) and a rebuild of kernel and world, my
system no longer boots. It hangs on

Waiting 5 seconds for SCSI devices to settle

Booting from the previous kernel allows my system to boot again.
Please let me know if I can do anything to diagnose further.


regards,

Derkjan de Haan



I have exactly the same problem here on an ASUS A7V333 motherboard and
an Adaptec 3960D SCSI controller.

The problem seems to be in the ACPI interrupt routing. I've updated
the mainboard BIOS to the last available version in the meantime
(1018.004 Beta) with no luck. Disabling ACPI completely helps booting
the machine again..


Hi

I got one of those motherboards.. however no SCSI card but a Promise.
I've had huge problems with it (check out the "Page fault, GEOM
problem??" thread). The problems I had were random crashes and very
bad speed to the disks.
It was solved by replacing the mobo with a new one with an nForce2
chipset... Got great speeds now and haven't had a crash since I
installed it (roughly a week now).

Johan Ström
[EMAIL PROTECTED]
http://www.stromnet.org/



No Johan, my A7V333 has no problem, it has been running for around 2 years
now as my personal workstation here at work 24/7.

I cvsupped RELENG_6 again an hour or so ago and the newly built kernel
runs flawlessly. There are some new patches in the pci code.

Regards,

Holm
--
LP::Kommunikation GbR  Holm Tiffe  * Administration, Development
FreibergNet.de Internet Systems phone +49 3731 419010
Bereich Server  Technik fax +49 3731 4196026
D-09599 Freiberg * Am St. Niclas Schacht 13 http://www.freibergnet.de


Hi, yes I've been running it for around 2-3 years too, but with  
linux. A couple of months ago I switched to fbsd and problems began  
to occur.

Might not be the same problem however..

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Page fault, GEOM problem?? (also: using a ASUS A7N8X-XE/nForce2 ultra400?)

2006-01-28 Thread Johan Ström


On 23 jan 2006, at 20.01, Johan Ström wrote:



On 23 jan 2006, at 14.15, Michael S. Eubanks wrote:


On Mon, 2006-01-23 at 10:24 +0100, Johan Ström wrote:

On 23 jan 2006, at 09.53, Michael S. Eubanks wrote:


On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:

Wish I could be of more help. :)  Have you tried to toggle the sysctl
dma flags?  I've seen similar posts in the past with read timeouts
caused by dma being enabled.

# sysctl -a | grep dma
...
hw.ata.ata_dma: 1  <=== Try turning this one off (1 ==> 0).
hw.ata.atapi_dma: 1
...


Disabling DMA, wouldn't that give me pretty bad performance?


-Michael



If it was not the problem, you could always change it back.  It *should*
be possible to simply set the control mode on those two disks (``man
rc.early'', ``man atacontrol'').  Unfortunately, the problem is noted as
errata in several FreeBSD versions tending to appear on SATA disks.  I
believe this is also a problem with some linux setups.  If you google
``FreeBSD hw.ata.ata_dma RELEASE'' you will eventually find the
following page relating to Asus motherboards:

http://www.ryxi.com/freebsd/63-668-write-dma-other-similar-errors-read.shtml


I picked it out based on the following line in the dmesg output:


Nov 29 20:46:09 elfi kernel: ACPI APIC Table: ASUS   A7V333  


I'd say it's worth a shot.  You might even try turning both the flags
off temporarily to see what you get.  Your guess is as good as  
mine.  :)




Okay, tried turning it off.. The disk IO speeds went even lower...
a whopping 9-10MB/s and lots of load ;)
And since the crashes come randomly (haven't been able to
reproduce them on demand) I don't really want to run it like this.. ;)


I did another test. I moved the controller card and the disks to my
MSI K8N Neo motherboard (with AMD64 3200+ etc), and immediately I
got write speeds of ~47MB/s:


$ dd if=/dev/zero of=bigfile.zero bs=1024 count=1000024
1024024576 bytes transferred in 21.974227 secs (46601164 bytes/sec)

Compared to
$ dd if=/dev/zero of=bigfile.zero bs=1024 count=1000024
1024024576 bytes transferred in 78.897708 secs (12979142 bytes/sec)

All tests were done on
/dev/mirror/gm0s1f on /usr (ufs, NFS exported, local, soft-updates,
acls)
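
To put the two dd runs side by side, here is a small sketch that recomputes throughput from the byte counts and timings dd printed. It uses decimal megabytes (10^6 bytes), which is an assumption about what "MB/s" means in this thread; the function name is made up for illustration.

```python
def throughput_mb_per_s(bytes_transferred, seconds):
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return bytes_transferred / seconds / 1e6


# Figures copied from the two dd runs quoted above.
BYTES = 1024024576
fast = throughput_mb_per_s(BYTES, 21.974227)   # MSI K8N Neo board
slow = throughput_mb_per_s(BYTES, 78.897708)   # ASUS A7V333 board
ratio = fast / slow

# The byte count also tells us how many 1024-byte blocks dd wrote.
blocks = BYTES // 1024

print(f"{fast:.1f} MB/s vs {slow:.1f} MB/s ({ratio:.2f}x), {blocks} blocks")
```

Same card, same disks, same filesystem, so the roughly 3.6x gap points at the board rather than at the Promise controller or the drives.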


Soo.. I guess this mobo is just plain fucked and needs to be
replaced with something newer ;)
Bad thing is, this is Socket A.. so there aren't many choices left
in the mobo market..


However, I found an ASUS A7N8X-XE NF ULTRA 400 SOCKET A with nForce2
Ultra 400 chipset.. Does anyone have any knowledge about this chipset?
How well does it work with FreeBSD? I'll do some googling but if
someone is using this successfully or unsuccessfully, please let me
know :)


Got the board now, everything seems to work great, fine
transfer speeds, no crashes so far (1 day..). Let's hope this thread
ends here..:)



--
Johan




Re: Page fault, GEOM problem??

2006-01-23 Thread Johan Ström


On 23 jan 2006, at 09.53, Michael S. Eubanks wrote:


On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:

Wish I could be of more help. :)  Have you tried to toggle the sysctl
dma flags?  I've seen similar posts in the past with read timeouts
caused by dma being enabled.

# sysctl -a | grep dma
...
hw.ata.ata_dma: 1  <=== Try turning this one off (1 ==> 0).
hw.ata.atapi_dma: 1
...


Disabling DMA, wouldn't that give me pretty bad performance?


-Michael





Re: Page fault, GEOM problem?? (also: using a ASUS A7N8X-XE/nForce2 ultra400?)

2006-01-23 Thread Johan Ström


On 23 jan 2006, at 14.15, Michael S. Eubanks wrote:


On Mon, 2006-01-23 at 10:24 +0100, Johan Ström wrote:

On 23 jan 2006, at 09.53, Michael S. Eubanks wrote:


On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:

Wish I could be of more help. :)  Have you tried to toggle the sysctl
dma flags?  I've seen similar posts in the past with read timeouts
caused by dma being enabled.

# sysctl -a | grep dma
...
hw.ata.ata_dma: 1  <=== Try turning this one off (1 ==> 0).
hw.ata.atapi_dma: 1
...


Disabling DMA, wouldn't that give me pretty bad performance?


-Michael



If it was not the problem, you could always change it back.  It *should*
be possible to simply set the control mode on those two disks (``man
rc.early'', ``man atacontrol'').  Unfortunately, the problem is noted as
errata in several FreeBSD versions tending to appear on SATA disks.  I
believe this is also a problem with some linux setups.  If you google
``FreeBSD hw.ata.ata_dma RELEASE'' you will eventually find the
following page relating to Asus motherboards:

http://www.ryxi.com/freebsd/63-668-write-dma-other-similar-errors-read.shtml


I picked it out based on the following line in the dmesg output:


Nov 29 20:46:09 elfi kernel: ACPI APIC Table: ASUS   A7V333  


I'd say it's worth a shot.  You might even try turning both the flags
off temporarily to see what you get.  Your guess is as good as  
mine.  :)




Okay, tried turning it off.. The disk IO speeds went even lower...
a whopping 9-10MB/s and lots of load ;)
And since the crashes come randomly (haven't been able to reproduce
them on demand) I don't really want to run it like this.. ;)


I did another test. I moved the controller card and the disks to my
MSI K8N Neo motherboard (with AMD64 3200+ etc), and immediately I got
write speeds of ~47MB/s:


$ dd if=/dev/zero of=bigfile.zero bs=1024 count=1000024
1024024576 bytes transferred in 21.974227 secs (46601164 bytes/sec)

Compared to
$ dd if=/dev/zero of=bigfile.zero bs=1024 count=1000024
1024024576 bytes transferred in 78.897708 secs (12979142 bytes/sec)

All tests were done on
/dev/mirror/gm0s1f on /usr (ufs, NFS exported, local, soft-updates,
acls)


Soo.. I guess this mobo is just plain fucked and needs to be replaced
with something newer ;)
Bad thing is, this is Socket A.. so there aren't many choices left
in the mobo market..


However, I found an ASUS A7N8X-XE NF ULTRA 400 SOCKET A with nForce2
Ultra 400 chipset.. Does anyone have any knowledge about this chipset?
How well does it work with FreeBSD? I'll do some googling but if someone
is using this successfully or unsuccessfully, please let me know :)


--
Johan

Re: Page fault, GEOM problem??

2006-01-23 Thread Johan Ström

On 23 jan 2006, at 20.16, Paul T. Root wrote:

My friend's disks are SATA. The jumper was to force
the drives to use the SATA 1.x 1.5 gig standard instead
of the faster SATA 2.x standard. Older cards can have
trouble recognizing newer disks.

His were recognized, but very flaky. They've been solid
since.



These disks should be SATA150 afaik (Maxtor MaXLine III 300GB).
The Promise card is named SATAII 150..
So there shouldn't be any mismatch. Both card and disks support NCQ..
Dunno about FreeBSD on the other hand.. Haven't found a way to
enable/disable this




Johan Ström wrote:


On 23 jan 2006, at 15.29, Paul T. Root wrote:


I'm coming in very late here, and only have some
hearsay. But, a friend of mine has built a new hobby
machine, with twin 160G drives on a 3Ware 8006, working as
a stripe. He had a bunch of problems with stability of the drives
until I gave him a couple of tiny (half size) jumpers, that he
put on the drive. Smooth sailing since then. If needed, I can find
what the jumpers did. But looking through the controller's docs
should give you a clue.

As far as I know, SATA drives don't have jumpers.. Mine don't
seem to, at least.. There are two unused pins but I doubt they
are for jumpers..




--
Paul Root
Few people know what to do when hula girls attack. - Sam, age 8







Re: Page fault, GEOM problem??

2006-01-22 Thread Johan Ström

On 22 jan 2006, at 22.58, Michael S. Eubanks wrote:




...snip...



Can there be problems with the mobo/controller card? Or is it more
likely to be driver related? Promise lists my motherboard (ASUS
A7V333) in their manual for the controller card (Promise SATAII 150
TX4).




...snip...

After looking at the dmesg output, I am curious whether you are using
the promise sataII 150 TX4 controller for the raid disks?  I see you are
using 6.0-RELEASE whereas I'm using 5.4-STABLE with that particular
controller.  My dmesg output for the disk array looks like the
following:



Hi! Thanks for the response!
Yes, this is a Promise SATAII 150 TX4 controller.. But afaik it
doesn't do RAID??





ad4: 238475MB HDT722525DLA380/V44OA80A [484521/16/63] at ata2-master SATA150
ad6: 238475MB HDS722525VLSA80/V36OA60A [484521/16/63] at ata3-master SATA150
ad8: 238475MB HDT722525DLA380/V44OA80A [484521/16/63] at ata4-master SATA150
ad10: 238475MB HDT722525DLA380/V44OA80A [484521/16/63] at ata5-master SATA150
ar0: 953900MB ATA RAID0 array [65535/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad6 at ata3-master
 disk2 READY on ad8 at ata4-master
 disk3 READY on ad10 at ata5-master

The device I mount as my raid filesystem is ar0s1 and I believe it
corresponds to ``device ataraid'' in the kernel.  I read the raid
mirroring page in the handbook, although, I'm thinking your controller
should represent each disk as ``ar0'' and handle the mirroring itself
(possibly consisting of two sets of two disks).  I really don't know
though.



No /dev/ar*..




It looks like the RAID1 mirroring tutorial is for systems that don't
actually have a raid controller.  Hence, the RAID0 tutorial is the one
that I would be using if I did not use the promise controller.  Because
I _DO_ use the controller, I am simply able to manipulate the ar0 disk
array as a single disk.  I imagine your setup will differ, but I hope
this helps.



This card afaik doesn't have RAID functionality (I've never read
anything about it either on the web, the card's box or anywhere else..).

I'm running GENERIC, which does include ataraid..
What does your dmesg identify your card as?

atapci0: Promise PDC40518 SATA150 controller port 0xb800-0xb87f, 
0xb400-0xb4ff mem 0xfb80-0xfb800fff,0xfb00-0xfb01 irq 19  
at device 12.0 on pci0


Is it the same PDC chipset?

--
Johan



-Michael







Re: Page fault, GEOM problem??

2006-01-22 Thread Johan Ström

On 23 jan 2006, at 01.17, Michael S. Eubanks wrote:



On Sun, 2006-01-22 at 23:51 +0100, Johan Ström wrote:

...snip...



On 22 jan 2006, at 22.58, Michael S. Eubanks wrote:
This card afaik doesn't have RAID functionality (I've never read
anything about it either on the web, the card's box or anywhere
else..).

I'm running GENERIC, which does include ataraid..
What does your dmesg identify your card as?

atapci0: Promise PDC40518 SATA150 controller port 0xb800-0xb87f,
0xb400-0xb4ff mem 0xfb80-0xfb800fff,0xfb00-0xfb01 irq 19
at device 12.0 on pci0

Is it the same PDC chipset?

--
Johan




No, I have a different controller.  My mistake.  I think what is
happening is the DMA read command is failing, therefore causing the
device to be disconnected, and the kernel can't write to the disk from
that point on (this is somewhat obvious given the output below).



Nov 29 20:36:54 elfi kernel: subdisk10: detached
Nov 29 20:36:54 elfi kernel: ad10: detached
Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying
(1 retry left) LBA=426562704
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider
ad10s1 disconnected.



The message seen from the last line above is generated in any of the
following scenarios (from g_mirror.c):
  1. Device wasn't running yet, but disk disappear.
  2. Disk was active and disapppear.
  3. Disk disappear during synchronization process.



Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).
ad10s1[WRITE(offset=134356992, length=16384)]
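
For reading these "Request failed (error=N)" lines: errno 6 is ENXIO and errno 5 is EIO, which fits the picture of writes being refused once the provider is gone and a read failing with a plain I/O error. The sketch below tallies such lines by operation and errno name. The sample lines are copied from the two incidents quoted in this archive (ad10s1 in November, ad4s1 in February); the regular expression and function name are illustrative assumptions, not anything taken from GEOM itself.

```python
import errno
import re
from collections import Counter

# Sample lines copied from the logs quoted in this archive. Some were
# wrapped by the mail client, so the pattern tolerates whitespace
# (including a newline) between the operation and "(offset=".
LOG = """\
GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
GEOM_MIRROR: Request failed (error=6). ad4s1[WRITE
(offset=134373376, length=16384)]
GEOM_MIRROR: Request failed (error=5). ad4s1[READ
(offset=96048882176, length=32768)]
"""

PATTERN = re.compile(
    r"Request failed \(error=(\d+)\)\.\s*(\w+)\[(READ|WRITE)\s*"
    r"\(offset=(\d+), length=(\d+)\)\]"
)


def summarize(log):
    """Count failed requests per (operation, errno name)."""
    counts = Counter()
    for match in PATTERN.finditer(log):
        code = int(match.group(1))
        name = errno.errorcode.get(code, str(code))
        counts[(match.group(3), name)] += 1
    return counts


summary = summarize(LOG)
for (operation, name), count in sorted(summary.items()):
    print(operation, name, count)
```

Fed a full /var/log/messages, the same tally makes it easy to see whether a burst is all ENXIO after a detach or whether EIO reads were showing up before the disk dropped.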



As far as recovering the disk, I remember seeing something about booting
to single user mode and using fsck after a core dump in a previous post.
I'm assuming the disks worked initially and that you were able to label
them etc?  Is there any possibility that the disk state may be altered
by a power saving feature or setting in the BIOS and FreeBSD just
doesn't know when it happens until the next time it tries to access the
disk?



For recovering, I've always done a direct reboot; gmirror then
rebuilds the mirror and fsck is run.
No problems reading labels etc, and there never have been; the only
problem has been these sporadic crashes.. And the read/write
performance (see earlier in thread)...


This is a server, so all BIOS settings for power saving are (should be)
shut off. The BIOS should thus never make the disk go to sleep.






-Michael




Thanks for trying to help!
--
Johan

Re: Page fault, GEOM problem??

2006-01-19 Thread Johan Ström

On 29 nov 2005, at 21.10, Johan Ström wrote:

On 19 nov 2005, at 00.30, Michal Mertl wrote:

Parv wrote:

in message [EMAIL PROTECTED], wrote Michal
Mertl thusly...


Johan Ström wrote:


On 18 nov 2005, at 18.43, Xin LI wrote:

...

So, it seems it does run savecore after running dumpon and
mounting  disks etc... Is that wrong?


No, this is normal. When you run savecore you need to have mounted
filesystems. In order to mount the filesystems they may have to be
checked. The fsck program requires big amount of memory to check
larger filesystems so the swap has to be enabled. Core dumps are
written to the dump device (swap) from the end whereas the swap is
normally used from the beginning (or the other way around).
Therefore there's quite a big chance that, even when the swap has
to be used for fsck, the core dump is intact and usable.


Is there any formula to calculate the size of swap to account for
fsck  core dump while assigning swap size (short of having two swap
partitions)?


None that I know of. Someone posted to some FreeBSD mailing list some
figures about the fsck consumption of memory. I really don't remember,
but I think it was something like some MBs of memory per quite a lot of
GB of file system space. E.g. that the fsck on normally sized file
systems (e.g. at most a couple of hundred GB) doesn't normally consume
all of normally sized memory (>=256MB) and thus doesn't need to
swap.



If the usage of the swap file by fsck corrupts the core dump you
may start after next crash in single user mode and run the
commands manually (without enabling swap).


Is that after kernel (re)boots?  And would the commands to be
executed be savecore followed by swapon?


If the dump got corrupted by fsck, you would have to wait for another
crash and dump. Then you would reboot and start in single user mode,
repair the file systems without swap enabled (fsck would crash on the
large file system(s)) and then run savecore. Swapon is then irrelevant,
you probably don't need swap for savecore. After running savecore you
can start normally multi user (exit from the single user shell).

I didn't try all of that but I believe it should work.

Michal



I just got another coredump, hadn't had one since the first one.  
From messages:


Nov 29 20:36:54 elfi kernel: subdisk10: detached
Nov 29 20:36:54 elfi kernel: ad10: detached
Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying  
(1 retry left) LBA=426562704
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad10s1 disconnected.
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134373376, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134389760, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134438912, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268591104, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268607488, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268623872, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5966307328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5967650816, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968355328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968584704, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5969715200, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5971795968, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5972697088, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063848960, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063865344, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063881728, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063914496, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064324096, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064340480, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064373248

Re: Page fault, GEOM problem??

2005-12-01 Thread Johan Ström

On 29 nov 2005, at 21.10, Johan Ström wrote:


I just got another coredump, hadn't had one since the first one.  
From messages:


Nov 29 20:36:54 elfi kernel: subdisk10: detached
Nov 29 20:36:54 elfi kernel: ad10: detached
Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying  
(1 retry left) LBA=426562704
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad10s1 disconnected.
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134373376, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134389760, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134438912, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268591104, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268607488, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268623872, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5966307328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5967650816, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968355328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968584704, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5969715200, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5971795968, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5972697088, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063848960, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063865344, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063881728, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063914496, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064324096, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064340480, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064373248, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064471552, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18761523712, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762850816, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762867200, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762883584, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762899968, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762949120, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18762965504, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18846032384, length=131072)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18846228992, length=131072)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18846441984, length=131072)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=18846638592, length=131072)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=20110369280, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=2011168, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=20111696384, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=21073961472, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=21073977856, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=21844845056, length=16384

Re: Page fault, GEOM problem??

2005-11-29 Thread Johan Ström

On 19 nov 2005, at 00.30, Michal Mertl wrote:

Parv wrote:

in message [EMAIL PROTECTED], wrote Michal
Mertl thusly...


Johan Ström wrote:


On 18 nov 2005, at 18.43, Xin LI wrote:

...

So, it seems it does run savecore after running dumpon and
mounting  disks etc... Is that wrong?


No, this is normal. When you run savecore you need to have mounted
filesystems. In order to mount the filesystems they may have to be
checked. The fsck program requires big amount of memory to check
larger filesystems so the swap has to be enabled. Core dumps are
written to the dump device (swap) from the end whereas the swap is
normally used from the beginning (or the other way around).
Therefore there's quite a big chance that, even when the swap has
to be used for fsck, the core dump is intact and usable.
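
The layout argument above can be made concrete with a small toy model. This is only a sketch under stated assumptions: swap is assumed to fill from the start of the partition and the dump is assumed to sit at the very end (the thread itself notes it may be the other way around), and the function name and sizes are hypothetical.

```python
# Toy model of one partition shared by swap and a crash dump.
# Assumption (not confirmed in the thread): swap grows upward from offset 0
# and the dump occupies the last dump_mb megabytes of the partition.

def dump_survives(partition_mb: int, dump_mb: int, swap_used_mb: int) -> bool:
    """Return True if swap usage has not reached the region holding the dump."""
    if dump_mb > partition_mb:
        raise ValueError("dump does not fit on the partition")
    dump_start = partition_mb - dump_mb  # first megabyte occupied by the dump
    return swap_used_mb <= dump_start


if __name__ == "__main__":
    # Hypothetical sizes: 2048 MB swap partition, 512 MB dump.
    print(dump_survives(2048, 512, 300))   # fsck swapped only a little
    print(dump_survives(2048, 512, 1800))  # fsck swapped heavily
```

With these made-up numbers the dump only gets overwritten once fsck pushes swap usage past 1536 MB, which is why a usable dump is the likely outcome on a machine that rarely swaps.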


Is there any formula to calculate the size of swap to account for
fsck and core dump while assigning swap size (short of having two swap
partitions)?


None that I know of. Someone posted some figures about fsck's memory
consumption to one of the FreeBSD mailing lists. I really don't remember
the details, but I think it was something like a few MB of memory per
quite a lot of GB of file system space. In other words, fsck on normally
sized file systems (e.g. at most a couple of hundred GB) doesn't normally
consume all of a normally sized memory (=256MB) and thus doesn't need to
swap.



If the usage of the swap file by fsck corrupts the core dump you
may start after next crash in single user mode and run the
commands manually (without enabling swap).


Is that after kernel (re)boots?  And would the commands to be
executed be savecore followed by swapon?


If the dump got corrupted by fsck, you would have to wait for another
crash and dump. Then you would reboot and start in single user mode,
repair the file systems without swap enabled (fsck would crash on the
large file system(s)) and then run savecore. Swapon is then irrelevant,
you probably don't need swap for savecore. After running savecore you
can start normally multi user (exit from the single user shell).

I didn't try all of that but I believe it should work.

Michal



I just got another core dump; I hadn't had one since the first. From
messages:


Nov 29 20:36:54 elfi kernel: subdisk10: detached
Nov 29 20:36:54 elfi kernel: ad10: detached
Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying  
(1 retry left) LBA=426562704
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad10s1 disconnected.
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134373376, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134389760, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134438912, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268591104, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268607488, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268623872, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5966307328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5967650816, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968355328, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5968584704, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5969715200, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5971795968, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=5972697088, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063848960, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063865344, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063881728, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16063914496, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064324096, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064340480, length=16384)]
Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=16064373248, length=16384)]
Nov 29 20:36:54 elfi kernel

Re: Page fault, GEOM problem??

2005-11-19 Thread Johan Ström


On 19 nov 2005, at 02.35, Pawel Jakub Dawidek wrote:


On Sat, Nov 19, 2005 at 01:55:57AM +0100, Johan Ström wrote:

snip

+ I just noticed another thing... My disk performance... sucks! :P
+
+ Some examples (from an otherwise unloaded system):
+
+ [EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=1024 count=1000000
+ 1000000+0 records in
+ 1000000+0 records out
+ 1024000000 bytes transferred in 77.014797 secs (13296146 bytes/sec)

You won't get more with such small block size. Try bs=128k.


Hi
Can't say that a bigger block size did much better:

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 98.519181 secs (13304211 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=512k count=10000
^C3587+0 records in
3587+0 records out
1880621056 bytes transferred in 145.049578 secs (12965367 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=50k count=10000
10000+0 records in
10000+0 records out
512000000 bytes transferred in 38.536217 secs (13286203 bytes/sec)

All this time, iostat's MB/s column wouldn't go over 0.24MB/s...

Back on GENERIC:

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 99.497358 secs (13173415 bytes/sec)

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 39.019239 secs (13436654 bytes/sec)

Still slow.. However, iostat goes up as high as 5.64MB/s on each disk  
in the mirror.
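
For comparing these runs with the MB/s column of iostat, the dd summary line can be converted mechanically. The following is a minimal sketch; the function name is made up for illustration, and it assumes 1 MB means 2**20 bytes.

```python
import re

# Matches the summary line printed by dd, e.g.
# "524288000 bytes transferred in 39.019239 secs (13436654 bytes/sec)"
DD_SUMMARY = re.compile(
    r"(?P<bytes>\d+) bytes transferred in (?P<secs>[\d.]+) secs"
)


def dd_throughput_mb_s(summary: str) -> float:
    """Return throughput in MB/s (1 MB = 2**20 bytes) from a dd summary line."""
    m = DD_SUMMARY.search(summary)
    if m is None:
        raise ValueError("not a dd summary line")
    return int(m.group("bytes")) / float(m.group("secs")) / 2**20


if __name__ == "__main__":
    line = "524288000 bytes transferred in 39.019239 secs (13436654 bytes/sec)"
    print(f"{dd_throughput_mb_s(line):.1f} MB/s")
```

Run against the bs=512k result above, this comes out just under 13 MB/s, in line with the other block sizes.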






--
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Page fault, GEOM problem??

2005-11-18 Thread Johan Ström


On 18 nov 2005, at 10.17, Xin LI wrote:


On 11/18/05, Johan Ström [EMAIL PROTECTED] wrote:

Ok, just got this not so very nice error on a RELENG_6_0 box (built
from sources this morning, GENERIC kernel minus drivers I dont use):
The network card is the exact same model as the one I used in the
test machine, didn't have any problems there..

[...]

So, any ideas what this can be? If there was a disk crash, which I
have a hard time believing since I ran powermax (Maxtor test program)
on both of these disks 3 weeks ago and they have been running fine w/o
a single problem since I started using them, why didn't GEOM just
kick in and run on the other disk? Page faulting is no way to react
if a disk goes dead.

Hope someone can help me/this problem doesn't occur any more... but I
suppose that is too much to hope for...


Would you please consider trying to obtain a crashdump and send the
backtrace so we can investigate more?

(Hints can be found at
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html#KERNELDEBUG-OBTAIN)




Thanks for the answer.

It doesn't look like I have any usable dump devices.
When booting I get:

GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
GEOM_MIRROR: Device gm0s1: rebuilding provider ad10s1.
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
Loading configuration files.
No suitable dump device was found.
Entropy harvesting:
interrupts
ethernet
point_to_point
kickstart
.
swapon: adding /dev/mirror/gm0s1b as swap device

Then naturally:
/etc/rc: WARNING: Dump device does not exist.  Savecore not run.

I looked around in the rc scripts and tried to figure out what they do;
the dumpon script tries to automatically find a good dump device but
finds none.
According to the page you linked to, the dumpon command has to be
executed AFTER swapon. Why are the rc scripts trying to run it before
swapon then?

Anyway, tried to do dumpon manually on my swap drive:

$ dumpon -v /dev/mirror/gm0s1b
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

That didn't work too well.
I also tried savecore manually:

$ savecore /var/crash/ /dev/mirror/gm0s1b
savecore: no dumps found

That didn't work very well either (probably expected, since there were
no working dumps).
Google showed me another thread on this list about gmirror swap
dumps: just a question (whether it was supported) without any answers,
though. Same error as I got.


Hope this helps.
Thanks again

Johan


Thanks,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net




Re: Page fault, GEOM problem??

2005-11-18 Thread Johan Ström

Hi!

On 18 nov 2005, at 18.43, Xin LI wrote:


Hi, Johan,

On 11/18/05, Johan Ström [EMAIL PROTECTED] wrote:

On 18 nov 2005, at 10.17, Xin LI wrote:

[snip]

Doesnt look like I got any usable dump devices..
When booting i get

[...]

Loading configuration files.
No suitable dump device was found.
Entropy harvesting:
interrupts
ethernet
point_to_point
kickstart
.
swapon: adding /dev/mirror/gm0s1b as swap device


I see, so both of your SATA disks are in the same mirror group...


Then naturally:
/etc/rc: WARNING: Dump device does not exist.  Savecore not run.

Looked around in the rc-scripts and tried to figure out what it did,
the dumpon script
tries to autolookup a good dump device but finds none..


Unfortunately, kernel dumps currently do not support every device,
for some technical reasons (probably to simplify the crash code so
that it does not make more mistakes^Wdamage)


According to the page you linked to, the dumpon command has to be
executed AFTER swapon.. Why is the rc scripts trying to run it before
swapon then?


I guess this is because that dumpon now can detect dump device
automatically, but I'm not quite sure about this.  Will look for the
reason.  I think either Handbook should be updated, or the code should
be corrected.

What I am very curious about is why dumpon runs BEFORE savecore.  Maybe
I have some misunderstanding...


Sorry, partly my mistake. I think I misunderstood how savecore
works below (when I tried it manually in my last mail).
But the messages above are directly from boot; it seems it tries
dumpon before savecore? Relevant boot log from the last boot:



ad0: 2441MB WDC AC22500L 32.41N35 at ata0-master UDMA33
acd0: CDROM CD-ROM CDU701-F/1.0q at ata1-master PIO4
ad6: 286188MB Maxtor 7L300S0 BANC1G10 at ata3-master SATA150
ad10: 286188MB Maxtor 7L300S0 BANC1G10 at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 activated.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a
Loading configuration files.
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported
(this DIOCSKERNELDUMP message is probably because I specified dumpdev in
rc.conf, which forced usage of gm0s1b instead of letting the scripts
autodetect)
Entropy harvesting:
interrupts
ethernet
point_to_point
kickstart
.
swapon: adding /dev/mirror/gm0s1b as swap device
Starting file system checks:
/dev/mirror/gm0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1a: clean, 213811 free (771 frags, 26630 blocks, 0.3% fragmentation)
/dev/mirror/gm0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1e: clean, 1012917 free (85 frags, 126604 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1f: clean, 115955787 free (40747 frags, 14489380 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1d: clean, 1983354 free (4834 frags, 247315 blocks, 0.2% fragmentation)

ifconfig stuff
Starting devd.
Mounting NFS file systems:
.
Creating and/or trimming log files:
.
Starting syslogd.
Checking for core dump on /dev/mirror/gm0s1b...
savecore: no dumps found
Starting named.
rest of boot

So, it seems it does run savecore after running dumpon and mounting  
disks etc... Is that wrong?





Anyway, tried to do dumpon manually on my swap drive:

$ dumpon -v /dev/mirror/gm0s1b
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

Didn't work too good..
Also tried savecore manually:

$ savecore /var/crash/ /dev/mirror/gm0s1b
savecore: no dumps found


(This was my mistake; of course there are no dumps, since none was
written when it crashed.)




Didnt work very good either (but probably expected since there was no
working dumps..)
Google showed me some other thread in this list about gmirror swap
dump, just a question (if it was supported) w/o any answers tho. Same
error as I got.


It seems that this cannot be worked around easily.  If possible, my
suggestion is that you attach a third disk and create a swap partition
on it for the crash dump.  If this is not feasible, then adding DDB
and KDB may give us a chance to catch the panic, and you can use the
trace command at the ddb prompt to obtain a simplified backtrace;
there is a good chance that it would reveal what is happening.

I have cc'ed to Pawel who is very knowledgeable in this area, and
let's see whether he has some better suggestions :-)


Okay, I just added an old but working 2 GB disk to the system, made it
a swap device, ran swapon on it and:


[EMAIL PROTECTED]:~$ dumpon -v /dev/ad0s1b
kernel dumps on /dev/ad0s1b

Great! :) So, let's see when/if it dies next time... Before I took it
down to add the dump disk, it had been running fine
for 1d 1h (since the boot after the crash), however probably

Re: Page fault, GEOM problem??

2005-11-18 Thread Johan Ström


On 18 nov 2005, at 23.39, Michal Mertl wrote:


Johan Ström wrote:

Hi!

On 18 nov 2005, at 18.43, Xin LI wrote:


Hi, Johan,


 large snip


So, it seems it does run savecore after running dumpon and mounting
disks etc... Is that wrong?


No, this is normal. When you run savecore you need to have mounted
filesystems. In order to mount the filesystems they may have to be
checked. The fsck program requires big amount of memory to check larger
filesystems so the swap has to be enabled. Core dumps are written to the
dump device (swap) from the end whereas the swap is normally used from
the beginning (or the other way around). Therefore there's quite a big
chance that, even when the swap has to be used for fsck, the core dump
is intact and usable. If the usage of the swap file by fsck corrupts the
core dump you may start after next crash in single user mode and run the
commands manually (without enabling swap).

As to why you can write kernel core dumps only to certain devices, the
answer is that at the time when the kernel is dumping core, it is
usually in a pretty bad state: kernel internals may be corrupted and so
on. The dumping code is therefore written to be quite low level so that
even a wedged kernel can be dumped. The dumping code is part of the hard
disk controller drivers. The gmirror is a quite high-level device and geom
itself needs a working scheduler, so there will probably never be a way to
dump on gmirror-provided swap. When you issue the dumpon command, a
check is performed whether the driver for the disk you want to dump on
supports kernel core dumps.

Michal


Well that makes sense... Then that is right at least.. :)

I just noticed another thing... My disk performance... sucks! :P

Some examples (from an otherwise unloaded system):

[EMAIL PROTECTED]:/home/johan$ time dd if=/dev/zero of=bigfile.zero bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes transferred in 77.014797 secs (13296146 bytes/sec)

real    1m17.100s
user    0m0.244s
sys     0m10.140s

13MB/s from /dev/zero?? This was to my home dir (gm0s1f, the last label
on the slice/disk).
When I open a new window in screen (ctrl-a c) it takes forever (or
rather, bash takes forever) to init while the above dd is running...

Well, iostat during dd:

[EMAIL PROTECTED]:~$ iostat
      tty             ad0              ad6             ad10             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0  164  2.19   0  0.00  50.52   3  0.17  50.99   3  0.17   1  0  1  1 97



0.17MB/s?? Am I misreading these iostat numbers or something?
The load averages directly after the dd completes are 0.36, 0.15,
0.05, so the dd doesn't generate enough load to make bash that
slow... It's got to be something else...



Running diskinfo -t gives me good values (for /dev/ad6 and /dev/ad10)

Transfer rates:
        outside:       102400 kbytes in   1.846578 sec =    55454 kbytes/sec
        middle:        102400 kbytes in   1.879855 sec =    54472 kbytes/sec
        inside:        102400 kbytes in   3.147158 sec =    32537 kbytes/sec


So it shouldn't be the disk itself; those values are the same as when
I had the disks in the temp system. However, I never tried any dd
speed tests there.
Btw, I tried a regular cp on a directory tree of a few gigs: same slow
speed.


Maybe my custom kernel is fucked up or something? It's just a GENERIC
with some unused device drivers removed, so that would be strange...

I'll recompile during the night and test GENERIC tomorrow, then report back.

I did try moving the cards (network/VGA/SATA) around in the PCI
slots, in case there were any strange conflicts... No difference,
except that I have only gotten one tx error from xl since the last boot (wooh!)


No crash so far.

--
Johan



Page fault, GEOM problem??

2005-11-17 Thread Johan Ström
Ok, I just got this not-so-nice error on a RELENG_6_0 box (built
from sources this morning, GENERIC kernel minus drivers I don't use):


Nov 17 15:35:43 elfi kernel: subdisk10: detached
Nov 17 15:35:43 elfi kernel: ad10: detached
Nov 17 15:35:43 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1  
retry left) LBA=85720528
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad10s1 disconnected.
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134373376, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134438912, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268591104, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268607488, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268623872, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268640256, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=20151026176, length=2048)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=32299655680, length=8192)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=37363671552, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=38349087232, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=45453566464, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=54459458048, length=131072)]
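
To get an overview of a burst like this without reading every entry, the failures can be tallied per provider and operation. The sketch below is an illustration only: the regular expression is derived from the log lines quoted above, and the helper names are hypothetical. error=6 is ENXIO ("Device not configured"), which fits the detach messages that precede the failures.

```python
import errno
import re
from collections import Counter

# One failed request as logged by GEOM_MIRROR, once a wrapped entry is rejoined:
# "... GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]"
FAILED = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(?P<err>\d+)\)\.\s+"
    r"(?P<prov>\w+)\[(?P<op>READ|WRITE)\(offset=(?P<off>\d+), length=(?P<len>\d+)\)\]"
)

# Two entries copied from the log above, wrapped the same way.
SAMPLE = "\n".join([
    "Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).",
    "ad10s1[WRITE(offset=134356992, length=16384)]",
    "Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).",
    "ad10s1[READ(offset=37363671552, length=16384)]",
])


def summarize(log_text: str) -> Counter:
    """Count failed requests per (provider, operation, errno)."""
    # Collapse all whitespace first so entries wrapped over two lines still match.
    flat = " ".join(log_text.split())
    return Counter(
        (m.group("prov"), m.group("op"), int(m.group("err")))
        for m in FAILED.finditer(flat)
    )


if __name__ == "__main__":
    for (prov, op, err), count in sorted(summarize(SAMPLE).items()):
        print(prov, op, errno.errorcode.get(err, err), count)
```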

Nov 17 17:59:18 elfi syslogd: kernel boot file is /boot/kernel/kernel
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel: Fatal trap 12: page fault while in kernel mode
Nov 17 17:59:18 elfi kernel: fault virtual address   = 0x48
Nov 17 17:59:18 elfi kernel: fault code              = supervisor read, page not present
Nov 17 17:59:18 elfi kernel: instruction pointer     = 0x20:0xc0506b92
Nov 17 17:59:18 elfi kernel: stack pointer           = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: frame pointer           = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: code segment            = base 0x0, limit 0xfffff, type 0x1b
Nov 17 17:59:18 elfi kernel:                         = DPL 0, pres 1, def32 1, gran 1
Nov 17 17:59:18 elfi kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
Nov 17 17:59:18 elfi kernel: current process         = 36 (swi4: clock sio)
Nov 17 17:59:18 elfi kernel: trap number             = 12
Nov 17 17:59:18 elfi kernel: panic: page fault
Nov 17 17:59:18 elfi kernel: Uptime: 8h55m1s

ad10 and ad6, two brand-new Maxtor Maxline 300GB SATA disks attached to a
Promise PDC40518 SATA150 controller, make up the GEOM mirror gm0s1.
I've been running this setup in another test machine (MSI K8N neo
Platinum, KT333 chip I believe), and I haven't had a single problem. I
moved the disks and controller card to my real server 24 hours ago, and
the only apparent problem I seemed to have was this:


Nov 17 07:06:12 elfi kernel: xl0: transmission error: 90
Nov 17 07:06:12 elfi kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes

Nov 17 07:06:18 elfi kernel: xl0: watchdog timeout
Nov 17 07:06:18 elfi kernel: xl0: link state changed to DOWN
Nov 17 07:06:18 elfi kernel: vlan5: link state changed to DOWN
Nov 17 07:06:20 elfi kernel: xl0: link state changed to UP
Nov 17 07:06:20 elfi kernel: vlan5: link state changed to UP

Coming and going... these problems only appeared during the first 20-30
minutes after boot, then they disappeared totally (and yes, there was
plenty of network I/O going on both during and after these
messages). Sometimes I just got the first two messages and nothing
happened, but sometimes the watchdog message came and the network
died for a minute or so.


Here is dmesg from last boot (directly after crash):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov 17 00:49:29 CET 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ELFI
ACPI APIC Table: <ASUS   A7V333  >
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Athlon(TM) XP 1900+ (1599.56-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x662  Stepping = 2
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory  = 536854528 (511 MB)
avail memory = 516014080 (492 MB)
ioapic0: 

Re: Apache2, mod_python and nss_ldap: Coredump...

2005-11-10 Thread Johan Ström

On 10 nov 2005, at 00.25, Brian Fundakowski Feldman wrote:


On Wed, Nov 09, 2005 at 10:20:26AM +0100, Johan Ström wrote:

Hi

I got a new 6.0-STABLE box. Rebuilt kernel and world 2 hours ago
(against RELENG_6), so it should be pretty new.

I'm trying to run apache 2.0.55, mod_python 3.1.4 and nss_ldap 239,
all the latest from ports.
The problem I have is this: if I have LoadModule python_module
libexec/apache2/mod_python.so in my httpd.conf, and at the same time
have either
group: files ldap and/or passwd: files ldap in my nsswitch.conf,
I get segfaults. Example:

[EMAIL PROTECTED]:~$ apachectl configtest
Syntax OK
Segmentation fault (core dumped)
[EMAIL PROTECTED]:~$

However, apache itself is running fine, even using mod_python.
If I remove either the LoadModule line or both of the ldap entries in
nsswitch, the segfaults disappear. I've compiled httpd with debug
symbols, and this is what I found with gdb (httpd -t is the same as
apachectl configtest):
[...]
(gdb) where
#0  0x00000000 in ?? ()
#1  0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1
#2  0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1


Can you try making sure that nss_ldap gets built and linked with -g,
and is not stripped, so that all symbols and debug info are preserved
as well?  Looks to be atexit(3)-related, from here, but the symbols
should clear things up.


Hi, thanks for the answer!
I *think* I got nss_ldap.so to not be stripped; at least I can't
find any call in the port Makefile or the source's makefile/configure
stuff that would strip it. Same result as before, no new symbols.
Strange? I'm compiling with -g and -O0.


However, I've noticed one thing: if I run gdb httpd and then run -t,
I get this:


[EMAIL PROTECTED]:~$ gdb httpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd...
(gdb) run -t
Starting program: /usr/local/sbin/httpd -t
warning: Unable to get location for thread creation breakpoint: generic error

[New LWP 100128]
[New Thread 0x80fa000 (LWP 100128)]
Warning: DocumentRoot [/usr/local/nagios/share] does not exist
Syntax OK
[New LWP 100128]

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to LWP 100128]
0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2
(gdb) where
#0  0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2
Error accessing memory address 0x28bcd7a8: Bad address.
(gdb)


That's the pthread_cancel thing I was talking about before...
However, if I run httpd -t and then check the dump with gdb httpd -c
httpd.core, I get the same as first posted.


I did the test over and over again and got the same pthread_cancel error,
reading the same memory address. I re-ran httpd -t a couple of times
and it seems I only get these pthread_cancel calls...


Is there any way to check whether a lib is stripped/has debug symbols or not?

Thanks
Johan



Re: Apache2, mod_python and nss_ldap: Coredump

2005-11-10 Thread Johan Ström

On 10 nov 2005, at 13.55, Stephane Bortzmeyer wrote:


On Wed, Nov 09, 2005 at 01:46:37PM +0100,
 Johan Ström [EMAIL PROTECTED] wrote
 a message of 112 lines which said:


I'm trying to run apache 2.0.55, mod_python 3.1.4 and nss_ldap 239,
all the latest from ports.
The problem I have is this: if I have LoadModule python_module
libexec/apache2/mod_python.so in my httpd.conf, and at the same time
have either
group: files ldap and/or passwd: files ldap in my nsswitch.conf,
I get segfaults. Example:


The only thing I can say is that I have the same problem on
FreeBSD 5.4-RELEASE.




Interesting... So it seems I'm not the only one with this problem, then.

CC'ing this to the freebsd-stable list (and the correct mod_python
mail address; I had it wrong in the first mail to apache-users).




Re: Apache2, mod_python and nss_ldap: Coredump...

2005-11-10 Thread Johan Ström

On 10 nov 2005, at 12.54, Johan Ström wrote:


On 10 nov 2005, at 00.25, Brian Fundakowski Feldman wrote:


On Wed, Nov 09, 2005 at 10:20:26AM +0100, Johan Ström wrote:

Hi

I got a new 6.0-STABLE box. Rebuilt kernel and world 2 hours ago
(against RELENG_6), so it should be pretty new.

Im trying to have apache 2.0.55, mod_python 3.1.4 and nss_ldap 239,
all the latest from ports.
The problem I have is this: If i have LoadModule python_module
libexec/apache2/mod_python.so in my httpd.conf, and at the same time
have either
group: files ldap and/or passwd: files ldap in my nsswitch.conf,
i get Segfaults. Example:

[EMAIL PROTECTED]:~$ apachectl configtest
Syntax OK
Segmentation fault (core dumped)
[EMAIL PROTECTED]:~$

However, apache itself is running fine, even using mod_python.
If i remove either the LoadModule or both the ldap-entrys in
nsswitch, the segfaults dissappear. I've compiled httpd with debug
symbols, and this is what I found with gdb (httpd -t is same as
apachectl configtest):
[...]
(gdb) where
#0  0x in ?? ()
#1  0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1
#2  0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1


Can you try making sure that nss_ldap gets built and linked with -g,
and is not stripped, so that all symbols and debug info are preserved
as well?  Looks to be atexit(3)-related, from here, but the symbols
should clear things up.


Hi, thanks for the answer!
I *think* i got the nss_ldap.so to not be strip'd, at least I cant  
find any call in the port Makefile or the sources makefile/ 
configure stuff that would strip it. Same result as before, no new  
symbols.. Strange? I'm compiling with -g and -O0..


However, I've noticed one thing, if I run gdb httpd and then run - 
t, I get this:


[EMAIL PROTECTED]:~$ gdb httpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License,  
and you are
welcome to change it and/or distribute copies of it under certain  
conditions.

Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for  
details.

This GDB was configured as i386-marcel-freebsd...
(gdb) run -t
Starting program: /usr/local/sbin/httpd -t
warning: Unable to get location for thread creation breakpoint:  
generic error

[New LWP 100128]
[New Thread 0x80fa000 (LWP 100128)]
wWarning: DocumentRoot [/usr/local/nagios/share] does not exist
Syntax OK
[New LWP 100128]

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to LWP 100128]
0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2
(gdb) where
#0  0x28bce277 in pthread_testcancel () from /usr/lib/libpthread.so.2
Error accessing memory address 0x28bcd7a8: Bad address.
(gdb)


Thats the pthread_cancel thing I was talking about before...
However, if I do run httpd -t and then check the dump with gdb  
httpd -c httpd.core, I get the same as first posted.


I did the test over and over again and got the same pthread_testcancel
trap, failing on the same memory address. I re-ran httpd -t under gdb a
couple of times, and it seems I only ever get these pthread_testcancel
stops.


Is there any way to check whether a lib is stripped or has debug
symbols?


Thanks
Johan


Okay, some news here then. Thanks to David Adam I used file(1) to
determine whether it was stripped, and it seems it was. I've fixed
that now, so it is no longer stripped (the install command stripped
it; I missed that).
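
For anyone following along, here is a rough sketch of the kind of check
file(1) performs, as far as I understand it: a library that still has
its full symbol table carries a .symtab section, and one built with -g
also carries .debug_* sections. The Python below is my own illustration,
not taken from the port or from file's source; the function names
elf_section_names and symbol_status are made up for this example.

```python
import struct


def elf_section_names(data):
    """Return the list of section names found in an ELF image (bytes)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian

    # Locate the section header table from the ELF header.
    if is64:
        (shoff,) = struct.unpack_from(order + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(order + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(order + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(order + "HHH", data, 0x2E)
    if shoff == 0 or shnum == 0:
        return []                            # no section headers at all

    def header(i):
        """Return (sh_name, sh_offset, sh_size) of section header i."""
        base = shoff + i * shentsize
        (name_off,) = struct.unpack_from(order + "I", data, base)
        fmt, at = ("QQ", 24) if is64 else ("II", 16)
        offset, size = struct.unpack_from(order + fmt, data, base + at)
        return name_off, offset, size

    # Section names are NUL-terminated strings inside the .shstrtab section.
    _, str_off, str_size = header(shstrndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(shnum):
        name_off = header(i)[0]
        stop = strtab.index(b"\0", name_off)
        names.append(strtab[name_off:stop].decode("ascii", "replace"))
    return names


def symbol_status(data):
    """Return (has_symtab, has_debug_info) for an ELF image."""
    names = elf_section_names(data)
    return (".symtab" in names,
            any(n.startswith(".debug_") for n in names))
```

To try it on the library discussed here, something like
symbol_status(open("/usr/local/lib/nss_ldap.so.1", "rb").read()) should
come back as (True, True) once install no longer strips the file and -g
is in effect. A result of (False, False) would suggest gdb has only the
dynamic symbols to work with, which would fit the "?? ()" frames above.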

New debug output then:

(gdb) where
#0  0x in ?? ()
#1  0x28bd9730 in __do_global_dtors_aux () from /usr/local/lib/nss_ldap.so.1
#2  0x28be2984 in _fini () from /usr/local/lib/nss_ldap.so.1
#3  0x280b5018 in tls_dtv_generation () from /libexec/ld-elf.so.1
#4  0x280b63d8 in ?? () from /libexec/ld-elf.so.1
#5  0xbfbfe628 in ?? ()
#6  0x2809d076 in elf_hash () from /libexec/ld-elf.so.1
#7  0x2809f958 in dlclose () from /libexec/ld-elf.so.1
#8  0x284b064c in _nsdbtaddsrc () from /lib/libc.so.6
#9  0x284b020f in endhostent () from /lib/libc.so.6
#10 0x284b06cc in _nsdbtaddsrc () from /lib/libc.so.6
#11 0x284cf35f in __cxa_finalize () from /lib/libc.so.6
#12 0x284cef9a in exit () from /lib/libc.so.6
#13 0x0806b746 in destroy_and_exit_process (process=0x80a4090, process_exit_value=0) at main.c:216
#14 0x0806c0fe in main (argc=2, argv=0xbfbfe838) at main.c:565


(Also sent this to the other lists this thread is discussed in).

Johan



Apache2, mod_python and nss_ldap: Coredump...

2005-11-09 Thread Johan Ström

Hi

I got a new 6.0-STABLE box. I rebuilt kernel and world 2 hours ago
(against RELENG_6), so it should be pretty new.


I'm trying to run apache 2.0.55, mod_python 3.1.4 and nss_ldap 239,
all the latest from ports.
The problem I have is this: if I have "LoadModule python_module
libexec/apache2/mod_python.so" in my httpd.conf, and at the same time
have either "group: files ldap" and/or "passwd: files ldap" in my
nsswitch.conf, I get segfaults. Example:


[EMAIL PROTECTED]:~$ apachectl configtest
Syntax OK
Segmentation fault (core dumped)
[EMAIL PROTECTED]:~$

However, apache itself is running fine, even using mod_python.
If I remove either the LoadModule line or both ldap entries in
nsswitch.conf, the segfaults disappear. I've compiled httpd with debug
symbols, and this is what I found with gdb (httpd -t is the same as
apachectl configtest):


[EMAIL PROTECTED]:~$ gdb httpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
(gdb) run -t
Starting program: /usr/local/sbin/httpd -t
warning: Unable to get location for thread creation breakpoint: generic error

[New LWP 100104]
[New Thread 0x80ab000 (LWP 100104)]
Warning: DocumentRoot [/usr/local/nagios/share] does not exist
Syntax OK

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x80ab000 (LWP 100104)]
0x in ?? ()
(gdb) where
#0  0x in ?? ()
#1  0x28be6744 in ?? () from /usr/local/lib/nss_ldap.so.1
#2  0x28bf2200 in ?? () from /usr/local/lib/nss_ldap.so.1
#3  0x280ba3d8 in ?? () from /libexec/ld-elf.so.1
#4  0xbfbfe618 in ?? ()
#5  0x280a0b26 in _rtld_error () from /libexec/ld-elf.so.1
#6  0x28bef998 in _fini () from /usr/local/lib/nss_ldap.so.1
#7  0x280b9018 in tls_dtv_generation () from /libexec/ld-elf.so.1
#8  0x280ba3d8 in ?? () from /libexec/ld-elf.so.1
#9  0xbfbfe628 in ?? ()
#10 0x280a1076 in elf_hash () from /libexec/ld-elf.so.1
#11 0x280a3958 in dlclose () from /libexec/ld-elf.so.1
#12 0x284de64c in _nsdbtaddsrc () from /lib/libc.so.6
#13 0x284de20f in endhostent () from /lib/libc.so.6
#14 0x284de6cc in _nsdbtaddsrc () from /lib/libc.so.6
#15 0x284fd35f in __cxa_finalize () from /lib/libc.so.6
#16 0x284fcf9a in exit () from /lib/libc.so.6
#17 0x0806f0ee in destroy_and_exit_process (process=0x80b6098, process_exit_value=0) at main.c:216
#18 0x0806faa6 in main (argc=2, argv=0xbfbfe838) at main.c:565
(gdb)

So it seems the segfault happens when apache calls exit(), which
explains why it seems to work fine otherwise.
Googling turned up a similar problem (bug 65220); however, that bug
seemed to affect other programs, and so far I've only encountered this
problem with apache.

Currently I've compiled apache with the following:

portinstall apache-2.0.55 -M WITH_DBM=bdb WITH_BERKELEYDB=db4  
WITH_LDAP=1 WITH_MPM=prefork WITH_THREADS=yes  
WITH_THREADS_MODULES=yes WITH_DEBUG=1


The threads options were added after some suspicious gdb output around
a pthread function (can't remember the exact name now, something like
pthread_cancel; the symptoms were the same, a segfault just before
exit).
mod_python is installed without any special options; there aren't
really any (i.e. no option to turn off threads).


Does anyone have any clue about what's going on here?
Thanks!
Johan
