Re: LOR in mpr(4)

2016-10-23 Thread geoffroy desvernay
On 10/19/2016 06:39 PM, Pete Wright wrote:
> 
> the issue you are seeing is most likely not related to the LOR from the
> original email and PR I filed.  This looks like a media error with the
> disk device on your RAID controller.  A quick Google search turns up
> quite a few threads on this - ranging from bad RAID/JBOD controllers to
> out of date firmware.
> 
> Cheers,
> -pete
> 

Thank you for your response. I'll take more time to check the
controller. I just fear that this Dell-repackaged Avago controller may
have some Dell-mangled firmware… I'll have to look closer, then :)

Cheers,

dgeo.
-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille
Tel: (+33|0)4 91 05 45 24
Fax: (+33|0)4 91 05 45 98
d...@centrale-marseille.fr

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: LOR in mpr(4)

2016-10-19 Thread geoffroy desvernay
On 11/17/2015 21:43, Pete Wright wrote:
> 
> 
> On 11/12/15 09:44, Pete Wright wrote:
>> Hi All,
>> Just wanted a sanity check before filing a PR.  I am running r290688 and
>> am seeing a LOR being triggered in the mpr(4) device:
>>
>> $ uname -ar
>> FreeBSD srd0013 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r290688: Wed Nov 11
>> 21:28:26 PST 2015 root@srd0013:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> 
>> lock order reversal:
>>  1st 0xf8000d26bc60 CAM device lock (CAM device lock) @
>> /usr/src/sys/cam/cam_xpt.c:784
>>  2nd 0xfe00012811c0 MPR lock (MPR lock) @
>> /usr/src/sys/cam/cam_xpt.c:2620
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe04608ee890
>> witness_checkorder() at witness_checkorder+0xe79/frame 0xfe04608ee910
>> __mtx_lock_flags() at __mtx_lock_flags+0xa4/frame 0xfe04608ee960
>> xpt_action_default() at xpt_action_default+0xb6c/frame 0xfe04608ee9b0
>> scsi_scan_bus() at scsi_scan_bus+0x1d5/frame 0xfe04608eea20
>> xpt_scanner_thread() at xpt_scanner_thread+0x15c/frame 0xfe04608eea70
>> fork_exit() at fork_exit+0x84/frame 0xfe04608eeab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe04608eeab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> 
> 
> FWIW I filed the following PR as I can still reproduce this on boot:
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204614
> 
> cheers,
> -pete
> 
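For readers unfamiliar with the report quoted above: a lock order reversal is flagged when two locks are taken in the opposite order from one that was recorded earlier. Below is a minimal Python sketch of that idea only. The `OrderChecker` class is a hypothetical name for illustration, not the kernel's witness(4) code, and the "first code path" order is assumed for the demonstration rather than taken from the driver source.

```python
class OrderChecker:
    """Toy lock-order checker (hypothetical; illustrates the idea only)."""

    def __init__(self):
        self.seen = set()   # (earlier, later) pairs observed so far
        self.held = []      # locks currently held, in acquisition order

    def acquire(self, name):
        """Record the acquisition and return any reversals it reveals."""
        reversals = []
        for held_lock in self.held:
            # A reversal: this pair was previously seen in the other order.
            if (name, held_lock) in self.seen:
                reversals.append((held_lock, name))
            self.seen.add((held_lock, name))
        self.held.append(name)
        return reversals

    def release(self, name):
        self.held.remove(name)


checker = OrderChecker()
# Assumed first code path: MPR lock, then CAM device lock.
checker.acquire("MPR lock")
checker.acquire("CAM device lock")
checker.release("CAM device lock")
checker.release("MPR lock")
# Second code path takes them the other way round, as in the report.
checker.acquire("CAM device lock")
report = checker.acquire("MPR lock")
print(report)
```

The returned pair lists the lock already held first and the lock being acquired second, mirroring the "1st"/"2nd" lines of the report.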
Hi all,

Sorry for cross-posting; please let me know where this should go, as I
couldn't figure it out :(

I'm on 11-RELEASE-p1 here (but replying on current@, where I found this
thread about mpr(4)).

I'm not sure if it's related, but this is on a brand-new machine with an
Avago SAS3008 and a 24-disk enclosure (single-attached).

I see a bunch of:

mpr0: Found device <401,End Device> <12.0Gbps> handle<0x001b>
enclosureHandle<0x0002> slot 8
(da0:mpr0:0:8:0): UNMAPPED
(da0:mpr0:0:8:0): CAM status: SCSI Status Error
(da0:mpr0:0:8:0): SCSI status: Check Condition
(da0:mpr0:0:8:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command
operation code)
(da0:mpr0:0:8:0): Error 22, Unretryable error
10:0): UNMAPPED
(da0:mpr0:0:8:0): READ(10). CDB: 28 00 e8 e0 88 71 00 00 04 00
(da0:mpr0:0:8:0): CAM status: SCSI Status Error
(da0:mpr0:0:8:0): SCSI status: Check Condition
(da0:mpr0:0:8:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command
operation code)
(da0:mpr0:0:8:0): Error 22, Unretryable error
ses0: da0: Element descriptor: 'Drive Slot 0'
ses0: da0: SAS Device Slot Element: 2 Phys at Slot 0
ses0:  phy 0: SAS device type 1 id 0
ses0:  phy 0: protocols: Initiator( None ) Target( SSP )
ses0:  phy 0: parent 520474729974b57f addr 5000c50097ce8215
ses0:  phy 1: SAS device type 1 id 1
ses0:  phy 1: protocols: Initiator( None ) Target( SSP )
ses0:  phy 1: parent 520474729974b5ff addr 5000c50097ce8216

(more complete dmesg.boot here: http://dgeo.perso.ec-m.fr/dmesg.boot )
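To make the failing command in the log above easier to read, here is a small Python sketch that decodes the READ(10) CDB bytes printed by the kernel. The helper name `decode_read10` is hypothetical. The field layout used (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control) is the standard 10-byte READ(10) layout.

```python
import struct


def decode_read10(cdb_hex):
    """Decode a 10-byte READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Byte 0: opcode, byte 1: flags, bytes 2-5: LBA (big-endian),
    # byte 6: group number, bytes 7-8: transfer length, byte 9: control.
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": length}


# CDB copied from the log line "(da0:mpr0:0:8:0): READ(10). CDB: ..."
info = decode_read10("28 00 e8 e0 88 71 00 00 04 00")
print(info)
```

So the rejected request is a 4-block read starting at LBA 0xe8e08871.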

Later, there is no way to use these disks with ZFS:
# zpool create tank da0
cannot create 'tank': invalid argument for this pool operation

I can run dd if=/dev/zero of=/dev/da0, though I have not tested writing
until the disk is full…

Could this be related? Should I open a PR? How can I help debug this?

I'm not a kernel/driver hacker, but I'd like to help get this figured out :)

Yours,
-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille




signature.asc
Description: OpenPGP digital signature


Re: ahcich reset - cannot mount zfs root in 9.1-PRE

2012-10-03 Thread geoffroy desvernay
On 10/02/2012 17:40, Alexander Motin wrote:
 On 02.10.2012 16:51, Andriy Gapon wrote:
 on 02/10/2012 16:16 geoffroy desvernay said the following:
 Hi all,

 Trying to upgrade a system from 9.0-RELEASE to 9.1-PRE from yesterday on
 my machine (GEOM+ZFS mirror setup on ada[01]p3), the new kernel becomes
 unable to mount root... The only way to recover is to boot from 9.0
 kernel.
 The disks were already named ada[01] in 9.0, so I suspect nothing
 there...

 I tried
   - disabling AHCI in bios (no change seen)
   - change cables, check PSU, test disks with smartctl

 Here are some bits (via serial console):
 ahci0: ATI IXP600 AHCI SATA controller port
 0xc000-0xc007,0xb000-0xb003,0xa000-0xa007,0x9000-0x9003,0x8000-0x800f
 mem 0xfe9ff800-0xfe9ffbff irq 22 at device 18.0 on pci0
 ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier supported
 ahci0: Caps: 64bit NCQ SNTF MPS AL CLO 3Gbps PM PMD SSC PSC 32cmd CCC
 4ports
 ahcich0: AHCI channel at channel 0 on ahci0
 ahcich0: Caps: HPCP
 ahcich1: AHCI channel at channel 1 on ahci0
 ahcich1: Caps: HPCP
 ahcich2: AHCI channel at channel 2 on ahci0
 ahcich2: Caps: HPCP
 ahcich3: AHCI channel at channel 3 on ahci0
 ahcich3: Caps: HPCP
 ahcich0: AHCI reset...
 ahcich0: SATA connect time=100us status=0123
 ahcich0: AHCI reset: device found
 ahcich0: AHCI reset: device ready after 0ms

 The difference with 9.0 is after that: here is 9.0's next lines: (same
 for ahcich1)
 (aprobe0:ahcich0:0:15:0): Command timed out
 (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted
 (aprobe0:ahcich0:0:0:0): SIGNATURE: 

 And 9.1-PRE's:
 (aprobe0:ahcich0:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
 (aprobe0:ahcich0:0:15:0): CAM status: Command timeout
 (aprobe0:ahcich0:0:15:0): Error 5, Retries exhausted

 In both cases ada[01] are detected and available, but with 9.1-PRE I
 see:
 GEOM_RAID: Promise: Disk ada0 state changed from NONE to SPARE.
 GEOM_RAID: Promise: Disk ada1 state changed from NONE to SPARE.

 (I see the same when I run kldload geom_raid on the running 9.0 system;
 it doesn't break anything...)

 I attach the full boot log with 9.1-PRE (bios with NO-raid nor AHCI
 enabled, but this changes nothing in the output)

 I could test patches or try any command required to debug this… But for
 the moment I don't know where to search (and kernel code is far away
 from my current skills in debugging…)

 You probably need to clear RAID metadata on the disks as I think that
 disabling
 geom_raid is not possible in 9.1-PRE.
 I think that Alexander can help you more here.
 
 The right way is to clear RAID metadata on disks. If it is possible to
 boot from any other source, you can just do `graid delete Promise` and
 then reboot.
 
 Alternatively it is possible to disable the geom_raid module using the recently
 added loader tunable kern.geom.raid.enable=0. After that your system
 should boot and run fine. I would still recommend that you erase the metadata,
 but after setting that tunable it will be impossible to do it via the graid
 tool, only with manual dd surgery. In the Promise format, the metadata
 uses up to the last 63 sectors of the disk. You can identify the respective
 sectors to erase by the signature "Promise Technology, Inc." at the
 beginning of the sector.
 
I tried clearing the metadata, but it had no effect. It seems to work: the
first 'geom raid delete Promise' returns 0 and the second one complains with
something like 'Promise array doesn't exist', but it didn't solve the problem.

But adding kern.geom.raid.enable=0 did ;)

I still haven't tried to locate the last sectors manually...
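For anyone who wants to locate those sectors before doing any dd surgery, here is a read-only Python sketch based on the advice quoted above: look at the last 63 sectors and report those that begin with the Promise signature. The 512-byte sector size and the function name `find_promise_sectors` are assumptions made for illustration. The demonstration runs on an in-memory image, not on a real disk.

```python
import io

SECTOR = 512                                  # assumed sector size
SIGNATURE = b"Promise Technology, Inc."       # signature named in the advice above
TAIL_SECTORS = 63                             # "up to 63 last sectors"


def find_promise_sectors(dev, size):
    """Return the sector numbers, among the last TAIL_SECTORS sectors,
    whose first bytes match SIGNATURE. Nothing is written."""
    total = size // SECTOR
    first = max(0, total - TAIL_SECTORS)
    hits = []
    for sector in range(first, total):
        dev.seek(sector * SECTOR)
        if dev.read(len(SIGNATURE)) == SIGNATURE:
            hits.append(sector)
    return hits


# Demonstration on a fake 100-sector "disk" with one metadata-like sector.
image = bytearray(100 * SECTOR)
image[90 * SECTOR:90 * SECTOR + len(SIGNATURE)] = SIGNATURE
demo_hits = find_promise_sectors(io.BytesIO(bytes(image)), len(image))
print(demo_hits)
```

On a real system one would pass a device node opened read-only in binary mode together with its media size in bytes. How to obtain that size is left out here on purpose.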

Thanks a lot !
-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille
Tel: (+33|0)4 91 05 45 24
Fax: (+33|0)4 91 05 45 98
d...@centrale-marseille.fr



Re: problem with LSI MegaRAID on 8.2-RELEASE

2011-10-23 Thread geoffroy desvernay

On 13/09/2011 13:22, Johan Hendriks wrote:

Maciej Jan Broniarz wrote:

Message written by Jeremy Chadwick on 13 Sep 2011, at 12:33:


On Tue, Sep 13, 2011 at 11:43:29AM +0200, Maciej Jan Broniarz wrote:

I'm having some trouble with an LSI MegaRAID on FreeBSD 8.2-RELEASE-p2.
My storage starts to freeze and the following message appears in the
log:

mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3005 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3035 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3065 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3095 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3125 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3156 SECONDS
mfi0: COMMAND 0xff8000b6be58 TIMEOUT AFTER 3186 SECONDS

What might be the issue here?

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063808.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063809.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063810.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-August/063811.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063816.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063817.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html

http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063823.html


--

Thanks. But there is still no solution for the problem. I haven't
applied any patch
and yet the problem occurs.

All best,
mjb





Maybe I understand your comment the wrong way, but I think the patch is
there to prevent the problem.
So by applying the following patch
www.freebsd.org/~jhb/patches/mfi.patch
your issue should be solved.

regards
Johan Hendriks

Same issue here with Dell's PERC H700 (LSI repackaged by Dell).
The patch referenced here solves the problem for me (8.2-STABLE and
9.0-RC1 on amd64):

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416

Could someone commit this, or has this problem been solved by other means? (I
don't follow freebsd-fs@ or freebsd-scsi@ for now...)


--
*Geoffroy Desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille



Re: bin/136073: recent nscd(8) changes cause client processes to die with SIGPIPE

2011-04-29 Thread geoffroy desvernay
This change is not so recent now... But I'm still experiencing this bug
with 8.2p1 :(

This bug happens with nss_ldap-1.265_6 and nss-pam-ldapd, using one or
more ldap:// and|or ldaps:// servers, and cache enabled in nsswitch.conf.

Some symptoms:
# id dgeo; echo $?
141

Cron jobs are logged as executed, but they are not actually run!

I can test any patch, or try anything else that would help.
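As a side note on the 141 above: shells conventionally report "killed by signal N" as exit status 128+N, and 141 - 128 = 13, which is SIGPIPE, consistent with the PR title. A small Python sketch of that decoding follows. The helper name `describe_exit_status` is hypothetical.

```python
import signal


def describe_exit_status(status):
    """Return the signal name for a shell exit status that follows the
    128+N convention, or None if it does not look like one."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None
    return None


name = describe_exit_status(141)
print(name)
```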
-- 
*geoffroy desvernay*
C.R.I - Administration systèmes et réseaux
Ecole Centrale de Marseille





Re: RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-03 Thread geoffroy desvernay
Xin LI wrote:
 Hi guys,
 
 I think I got a real fix.
 
It seems to work for me® too

The server is under normal load (SMTP/IMAP/Maildir for ~1000 users, NFS
filer) and everything seems OK... (1h uptime for now)

Thank you !
-- 
geoffroy desvernay





RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-02 Thread Geoffroy Desvernay
Since the last upgrade, I see much more CPU time eaten by interrupts (at
least 10% CPU in top)
(see http://dgeo.perso.ec-marseille.fr/cpu-week.png)

The server behaves correctly (or seems to…), and the high interrupt count
seems to come from the bce cards (source: systat -vmstat)

I just upgraded from
RELENG_7 Mon Sep  8 12:33:06 CEST 2008
to
RELENG_7_1 Sat Nov 29 16:20:35 CET 2008

We have an identical machine (Dell PE 1950) which has not been upgraded
(production use - the two machines are carp(4)-redundant)

I don't know if it is related to SVN rev 184826 on 2008-11-10 22:40:16Z,
delphij's patch to sys/dev/bce/if_bce.c


If I can help debug something… These are production machines, but I
can test patches or anything else on the faulty system.



Some clues:

Under the very same load (carp interfaces down on the other machine), vmstat
shows, for the newer system:

 procs      memory      page                       disk faults            cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr mf0    in    sy    cs us sy id
 0 1 1   4806M   460M   649   0   0   0   582   2   0 21770  1270 13653  1 15 85

and for the older one:

 procs      memory      page                       disk faults            cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr mf0    in    sy    cs us sy id
 0 1 0   3694M   414M   236   0   0   0   199  17   0   286   317   386  1  1 97


bce-related part of dmesg for the newer system:

bce0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) mem
0xf400-0xf5ff irq 16 at device 0.0 on pci9
miibus0: MII bus on bce0
bce0: Ethernet address: 00:15:c5:f1:56:f4
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W
(0x02090105); Flags( SPLT MFW MSI )
bce1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) mem
0xf800-0xf9ff irq 16 at device 0.0 on pci5
miibus1: MII bus on bce1
bce1: Ethernet address: 00:15:c5:f1:56:f2
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W
(0x02090105); Flags( SPLT MFW MSI )

And on the older system:

bce0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) mem
0xf400-0xf5ff irq 16 at device 0.0 on pci9
miibus0: MII bus on bce0
bce0: Ethernet address: 00:15:c5:f1:6a:47
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W
(0x02090105); Flags( MFW MSI )
bce1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) mem
0xf800-0xf9ff irq 16 at device 0.0 on pci5
miibus1: MII bus on bce1
bce1: Ethernet address: 00:15:c5:f1:6a:45
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W
(0x02090105); Flags( MFW MSI )

-- 
Geoffroy Desvernay
Ecole Centrale de Marseille





Re: RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-02 Thread geoffroy desvernay
Xin LI wrote:
 Can anyone try reverting the changeset itself?  There are two recent
 changesets:
 
   http://www.delphij.net/bce-185161.diff.bz2
   http://www.delphij.net/bce-184826.diff.bz2
 
 You can revert the change by doing this:
 
 cd /usr/src
 fetch http://www.delphij.net/bce-185161.diff.bz2
 fetch http://www.delphij.net/bce-184826.diff.bz2
 bzcat bce-185161.diff.bz2 | patch -R
 bzcat bce-184826.diff.bz2 | patch -R
 
 I'll check what's happening ASAP.
 
Done:

I'd say it seems to be related...

Before applying your patches:
# vmstat -i
interrupt                      total    rate
irq1: atkbd0                      18       0
irq14: ata0                       58       0
irq20: uhci1                      96       0
irq21: uhci0 uhci+                 5       0
irq78: mfi0                   539747       3
cpu0: timer                350029937    1999
irq256: bce0              6757905080   38611
irq259: bce1              8296789513   47403
cpu1: timer                350029945    1999
cpu2: timer                350030010    1999
cpu3: timer                350030025    1999
Total                    16455354434   94018


After patching, make buildkernel && make reinstallkernel, and a reboot:
interrupt                      total    rate
irq1: atkbd0                      18       0
irq14: ata0                       58       0
irq20: uhci1                       2       0
irq21: uhci0 uhci+                 5       0
irq78: mfi0                     3947      24
cpu0: timer                   320361    1989
irq256: bce0                    6658      41
irq259: bce1                    1428       8
cpu1: timer                   320320    1989
cpu2: timer                   320380    1989
cpu3: timer                   320507    1990
Total                        1293684    8035
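If someone wants to compare such counters automatically, here is a rough Python sketch that parses `vmstat -i` style text and compares the bce rates before and after. The parser name `parse_vmstat_i` is hypothetical, and the embedded samples are a subset of the rows from the two listings above, with the columns separated by whitespace.

```python
def parse_vmstat_i(text):
    """Parse `vmstat -i` style output into {name: (total, rate)}."""
    rows = {}
    for line in text.strip().splitlines():
        # The last two fields are numeric; the name may contain spaces.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # header or malformed line
        rows[parts[0]] = (int(parts[1]), int(parts[2]))
    return rows


BEFORE = """
interrupt total rate
irq78: mfi0 539747 3
cpu0: timer 350029937 1999
irq256: bce0 6757905080 38611
irq259: bce1 8296789513 47403
Total 16455354434 94018
"""

AFTER = """
interrupt total rate
irq78: mfi0 3947 24
cpu0: timer 320361 1989
irq256: bce0 6658 41
irq259: bce1 1428 8
Total 1293684 8035
"""

before, after = parse_vmstat_i(BEFORE), parse_vmstat_i(AFTER)
for dev in ("irq256: bce0", "irq259: bce1"):
    print(dev, before[dev][1], "->", after[dev][1], "interrupts/s")

# Share of the overall interrupt rate that came from the two bce cards.
bce_share = (before["irq256: bce0"][1] + before["irq259: bce1"][1]) / before["Total"][1]
print(round(bce_share, 3))
```

With the figures above, the two bce interrupts account for a bit over 90% of the total rate before the revert.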

-- 
geoffroy desvernay





page fault on RELENG_6_1

2006-11-06 Thread Geoffroy DESVERNAY

I'm experiencing kernel panics and trying to understand what is going on
(I'm not a real kernel hacker... I'm closer to a 'Hello World' programmer :)

I think there is something like a null-pointer dereference each time, in
nd6_output (crashes 2 and 3)


I'm not sure crash 4 is the same (it looks like
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/96413 )


Here are my dmesg and some kgdb logs; I hope I didn't forget anything
important...


The machine (VIA C7) hosts some websites and some mail, is
IPv6-enabled via a gif tunnel, and uses two OpenVPN instances.


Please cc my mail address.
--
 ___
/ Geoffroy DESVERNAY   |\
   /\`Service info`| Tel: (+33|0)4 91 05 45 24  /\
   \/ Ecole Centrale de Marseille  | Fax: (+33|0)4 91 05 45 98  \/
\ (ex-EGIM)| Mail: [EMAIL PROTECTED] /
 ---


Dump header from device /dev/ad4s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 1056505856B (1007 MB)
  Blocksize: 512
  Dumptime: Sun Oct  8 01:03:32 2006
  Hostname: box.dgeos.net
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 6.1-RELEASE-p10 #0: Wed Oct  4 09:30:30 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/BOX
  Panic String: page fault
  Dump Parity: 103186717
  Bounds: 2
  Dump Status: good

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd.

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0515778
stack pointer           = 0x28:0xe338d828
frame pointer           = 0x28:0xe338d848
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (swi1: net)
trap number             = 12
panic: page fault
Uptime: 2d22h25m31s
Dumping 1007 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1007MB (257776 pages) 991 975 959 943 927 911 895 879 863 847 831 
815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 
495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 
175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) list *0xc0515778
0xc0515778 is in propagate_priority (/usr/src/sys/kern/subr_turnstile.c:241).
236 /*
237  * Pick up the lock that td is blocked on.
238  */
239 ts = td->td_blocked;
240 MPASS(ts != NULL);
241 tc = TC_LOOKUP(ts->ts_lockobj);
242 mtx_lock_spin(&tc->tc_lock);
243 
244 /* Resort td on the list if needed. */
245 if (!turnstile_adjust_thread(ts, td)) {
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc04edbb7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402
#2  0xc04edef9 in panic (fmt=0xc06c92d8 %s) at 
/usr/src/sys/kern/kern_shutdown.c:558
#3  0xc06ac32c in trap_fatal (frame=0xe338d7e8, eva=0) at 
/usr/src/sys/i386/i386/trap.c:836
#4  0xc06ab9c4 in trap (frame=
  {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -995882752, tf_esi = 
-995882368, tf_ebp = -482813880, tf_isp = -482813932, tf_ebx = -995882752, 
tf_edx = -995882368, tf_ecx = -992324084, tf_eax = 0, tf_trapno = 12, tf_err = 
0, tf_eip = -1068411016, tf_cs = 32, tf_eflags = 589954, tf_esp = -995882368, 
tf_ss = 40})
at /usr/src/sys/i386/i386/trap.c:269
#5  0xc0698a7a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc0515778 in propagate_priority (td=0xc4a40a80) at 
/usr/src/sys/kern/subr_turnstile.c:239
#7  0xc0515ff3 in turnstile_wait (lock=0xc4da560c, owner=0x0) at 
/usr/src/sys/kern/subr_turnstile.c:634
#8  0xc04e2ba4 in _mtx_lock_sleep (m=0xc4da560c, tid=3299084544, opts=0, 
file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:565
#9  0xc05cc183 in nd6_output (ifp=0xc4b1d000, origifp=0x0, m0=0xc4f1f700, 
dst=0xc5185e1c, rt0=0xc4da59cc) at /usr/src/sys/netinet6/nd6.c:2004
#10 0xc05c505b in ip6_output (m0=0xe338da44, opt=0x0, ro=0xe338da44, flags=0, 
im6o=0x0, ifpp=0x0, inp=0xc4f81870) at /usr/src/sys/netinet6/ip6_output.c:994
#11 0xc05a7c77 in syncache_respond (sc=0xc8f8baf0, m=0xc4f1f700) at 
/usr
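A note for reading the trap frame in frame #4 above: kgdb prints the saved registers as signed 32-bit integers, so they do not look like addresses. A short Python sketch of the conversion (the helper name `to_u32_hex` is hypothetical) shows that tf_eip and tf_ebp correspond to the instruction pointer and frame pointer printed in the fatal trap message.

```python
def to_u32_hex(value):
    """Render a signed 32-bit integer as the unsigned hex value it encodes."""
    return "0x%08x" % (value & 0xFFFFFFFF)


# Values copied from the trap frame printed in frame #4.
frame = {"tf_eip": -1068411016, "tf_ebp": -482813880}
decoded = {reg: to_u32_hex(val) for reg, val in frame.items()}
print(decoded)
```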

Re: i386/86880: [hang] 6.0 hangs or reboots whilst 5.4 is stable (ASUS-A7NX motherboard with nforce2 chipset)

2006-02-09 Thread Geoffroy Desvernay

Quoting Mars G. Miro [EMAIL PROTECTED]:


On 2/8/06, Geoffroy Desvernay [EMAIL PROTECTED] wrote:


I've got the same problem with an A7N8X-X (Athlon 2000+) motherboard and
6-STABLE (built Feb 2, 2006).
Booting with kernel.debug says nothing; it seems to be a hardware hang, but
it doesn't happen with Linux or OpenBSD. I haven't tried 5.3 yet.
It hangs after detection of the ATA devices (the floppy's light turns on, then it hangs)

 I've experienced this myself. Happens w/ nForce-based mobos and
 certain Shuttles. My fix has been to always set the BIOS setting of the HD
 to LBA instead of Auto or CHS.

 Try this and report back ;-)

I tried this without success, but setting the CPU frequency to 100 MHz
(instead of 133 MHz) seems to work...

I've read something about disabling FireWire in the BIOS, but I have it
on a separate card (not on the motherboard), and I can't remove it for the moment...



Also try disabling APIC (not ACPI) as I've encountered several mobos
that have this implemented poorly w/c results in weird behaviors of
the OS.


I'm not a kernel developer, but I can try patches or anything else.



I'm not aware of any patches but I think this is just a hardware
config problem tho YMMV.


It is now working at 133 MHz with hint.apic.0.disabled=1 in loader.conf.

Thanks for that :)

I saw at http://acpi.sf.net/dsdt/view.php?id=233 that a DSDT specific
to this board is available (it does not fix everything). Could this fix
anything in my case? I think I'll give it a try one of these days...


Geoffroy






Re: i386/86880: [hang] 6.0 hangs or reboots whilst 5.4 is stable (ASUS-A7NX motherboard with nforce2 chipset)

2006-02-07 Thread Geoffroy Desvernay



I've got the same problem with an A7N8X-X (Athlon 2000+) motherboard and
6-STABLE (built Feb 2, 2006).
Booting with kernel.debug says nothing; it seems to be a hardware hang, but
it doesn't happen with Linux or OpenBSD. I haven't tried 5.3 yet.
It hangs after detection of the ATA devices (the floppy's light turns on, then it hangs)


I've experienced this myself. Happens w/ nForce-based mobos and
certain Shuttles. My fix has been to always set the BIOS setting of the HD
to LBA instead of Auto or CHS.

Try this and report back ;-)


I tried this without success, but setting the CPU frequency to 100 MHz
(instead of 133 MHz) seems to work...

I've read something about disabling FireWire in the BIOS, but I have it
on a separate card (not on the motherboard), and I can't remove it for the moment...

I'm not a kernel developer, but I can try patches or anything else.

Geoffroy.




Re: i386/86880: [hang] 6.0 hangs or reboots whilst 5.4 is stable (ASUS-A7NX motherboard with nforce2 chipset) (regression)

2006-02-03 Thread Geoffroy Desvernay
I've got the same problem with an A7N8X-X (Athlon 2000+) motherboard and
6-STABLE (built Feb 2, 2006).

Booting with kernel.debug says nothing; it seems to be a hardware hang, but
it doesn't happen with Linux or OpenBSD. I haven't tried 5.3 yet.

It hangs after detection of the ATA devices (the floppy's light turns on, then it hangs)





Re: 5.4 Installer + Promise FT100TX2 = Loader crash

2005-08-05 Thread Geoffroy DESVERNAY
Daniel O'Connor wrote:
 Hi,
 I am updating an old 4.x system to 5.4 here and it has a Promise FT100TX2
 RAID controller (in mirror).
 
 The problem is that when I boot the CD the loader crashes (does a reg dump) 
 but only if the card is present and an array is defined. I can't record what 
 the dump is because it continually sprays the dump down the screen which 
 makes it unreadable :(
 
 I have seen this on an AMD64 system (I used the same RAID card in it) and got 
 the same problem. To work around it on that system I installed via the 
 motherboard IDE controller and then moved the disk over to the RAID 
 controller.
 
 It only seems to affect booting the installer - once the system is installed 
 it boots from the RAID card just fine (!)
 
 I just tried booting from floppy and that works (?!) although that method 
 doesn't probe my PS/2 keyboard for some reason :-/
 
 Does anyone have any suggestions for fixing the RAID + CD boot problem?
 
 Thanks.
 
I have exactly the same issue: on a PIII 866 with only one IDE CD drive, a
floppy, and a RAID1 array on the FT100TX2, there is no way to boot from the CD...

It works when there is no array on the card
(no array defined; Ctrl-F for the BIOS, or so I recall, then ESC (?) to continue).
- I just installed on the 'ad4' disk (primary master on the RAID card), edited
fstab to mount ar0 instead of ad4, then built (in the card's BIOS) a
RAID1 with ad6 (a long time copying...), and then it boots from the RAID. It
still does not boot from a FreeBSD 5.4-RELEASE CD, nor from a FreeSBIE 1.1 CD
(based on 5.3).

I can do some tests if needed; even if I don't really understand more
than 'hello world', I know how to apply patches ;)

-- 
 ---
/ Geoffroy DESVERNAY|   \
   /`Service info`  | Tel: (+33|0)4 91 05 45 24  \
   \  Ecole Généraliste d'Ingénieurs| Fax: (+33|0)4 91 05 45 98  /
\ ...de MARSEILLE   |  dgeo _AT_ egim-mrs.fr/
 ---





Re: kernel bug (ufs2?) on a dell 2600

2005-06-14 Thread Geoffroy Desvernay
It's not related to a more-than-full filesystem: it occurred one more time
without that :(

Does anyone have an idea?

Geoffroy Desvernay wrote:
 This server (FreeBSD 5.4 RELENG) is crashing once a week or more since
 5.4 (maybe before).
 
 It may be related to a full filesystem:
 I'm using snapshots on this server (using
 http://people.freebsd.org/~rse/snapshot/), and the crash occurred
 (~30 mins) after a snapshot that filled the filesystem up to 100%.
 
 Attached are the dmesg and kgdb logs.
 
 I'm not much of a hacker, but I hope this can help resolve this bug.
 
 
 
 
 
 

kernel bug (ufs2?) on a dell 2600

2005-06-13 Thread Geoffroy Desvernay
This server (FreeBSD 5.4 RELENG) has been crashing once a week or more since
5.4 (maybe before).

It may be related to a full filesystem:
I'm using snapshots on this server (using
http://people.freebsd.org/~rse/snapshot/), and the crash occurred
(~30 mins) after a snapshot that filled the filesystem up to 100%.

Attached are the dmesg and kgdb logs.

I'm not much of a hacker, but I hope this can help resolve this bug.


[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd.
#0  doadump () at pcpu.h:160
160 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt full
#0  doadump () at pcpu.h:160
No locals.
#1  0xc06878d6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
first_buf_printf = 1
#2  0xc0687cc4 in panic (fmt=0xc091a1ae "initiate_write_inodeblock_ufs2: already started")
at /usr/src/sys/kern/kern_shutdown.c:566
td = (struct thread *) 0xc3641c00
bootopt = 260
newpanic = 0
ap = 0xc3641c00 \\\214\220Ã -XÃ
buf = "initiate_write_inodeblock_ufs2: already started", '\0' <repeats 208 times>
#3  0xc080ef5f in initiate_write_inodeblock_ufs2 (inodedep=0xc5be0280, bp=0x0)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:3781
adp = (struct allocdirect *) 0xd75bd13c
lastadp = (struct allocdirect *) 0x1000
dp = (struct ufs2_dinode *) 0x0
fs = (struct fs *) 0xc21f6730
i = Unhandled dwarf expression opcode 0x93
(kgdb) quit
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-STABLE #0: Mon Jun  6 18:51:49 CEST 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ZLIP
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.29-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf29  Stepping = 9
  
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 2147287040 (2047 MB)
avail memory = 2095828992 (1998 MB)
ACPI APIC Table: DELL   PE2600  
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic2: WARNING: intbase 72 != expected base 48
ioapic3: Changing APIC ID to 11
ioapic3: WARNING: intbase 120 != expected base 96
ioapic4: Changing APIC ID to 12
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
ioapic2 Version 2.0 irqs 72-95 on motherboard
ioapic3 Version 2.0 irqs 120-143 on motherboard
ioapic4 Version 2.0 irqs 144-167 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: DELL PE2600 on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-safe frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: base peripheral, interrupt controller at device 28.0 (no driver 
attached)
pcib2: ACPI PCI-PCI bridge at device 29.0 on pci1
pci2: ACPI PCI bus on pcib2
em0: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 
0xece0-0xecff mem 0xfdec-0xfded,0xfdee-0xfdef irq 24 at device 
2.0 on pci2
em0: Ethernet address: 00:02:b3:d4:d3:a2
em0:  Speed:N/A  Duplex:N/A
pci1: base peripheral, interrupt controller at device 30.0 (no driver 
attached)
pcib3: ACPI PCI-PCI bridge at device 31.0 on pci1
pci3: ACPI PCI bus on pcib3
em1: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 
0xdce0-0xdcff mem 0xfdcc-0xfdcd,0xfdce-0xfdcf irq 28 at device 
1.0 on pci3
em1: Ethernet address: 00:0b:db:92:0a:e4
em1:  Speed:N/A  Duplex:N/A
pcib4: ACPI PCI-PCI bridge at device 3.0 on pci0
pci4: ACPI PCI bus on pcib4
pci4: base peripheral, interrupt controller at device 28.0 (no driver 
attached)
pcib5: ACPI PCI-PCI bridge at device 29.0 on pci4
pci5: ACPI PCI bus on pcib5
pci4: base peripheral, interrupt controller at device 30.0 (no driver 
attached)
pcib6: ACPI PCI-PCI bridge at device 31.0 on pci4
pci6: ACPI PCI bus on pcib6
bge0: Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002 mem 
0xfd8f-0xfd8f