Re: sunfire v120 gem interfaces

2015-11-20 Thread Ryan Freeman
On Fri, Nov 13, 2015 at 12:36:40PM +1000, David Gwynne wrote:
> 
> > On 13 Nov 2015, at 12:16, Ryan Freeman  wrote:
> > 
> > On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> >> any joy? i mean, failure?
> > 
> > Well I got something different.  I've noticed the failures only seem to happen
> > when my roommates arrive home.  I can use my stuff remotely all day from work
> > without a hitch, roommates come home and usually within an hr there is an
> > internet complaint.
> > 
> > Since I started using the little scripts to detect connection failure
> > and down/up the iface in question, things had been pretty good simply in the
> > fact that nobody could really notice before it fixed itself.
> > 
> > Today the machine dropped to ddb>!  of course i couldn't remember a damn
> > thing to type :(  i got trace, terribly sorry it wasn't more...
> > 
> > ddb> trace
> > extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
> > iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> > gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
> > gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
> > intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> > sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> > gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> > ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> > sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> > syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> > softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> > ddb>
> 
> that is interesting. if you're still in ddb, can you go sh panic?
> 
> if not, not biggy.
> 
> my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have 
> tweaks to gem(4) for mpsafety which might fix this. ill poke them to see if 
> they would share.

I scraped some more output from another panic.  Not running with the jmatthew
patch yet, though...


Connected to /dev/cuaU0 (speed 9600)

ddb> trace
extent_free(400012600c0, 0, 0, 0, 1fef078, 86fc) at extent_free+0x174
iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
gem_intr(400014ac000, c005, 2000, 0, 0, 8000) at gem_intr+0x154
intr_handler(e0017ec8, 4000117ae00, 1b5e78e1, 0, 800, 2) at intr_handler+0xc
sparc_interrupt(0, 400014b, 80206910, 40017d87c60, 40009f34cb0, 0) at sparc_interrupt+0x298
gem_ioctl(400014ac048, 400014ac000, 40017d87c60, 40017d87c60, 0, 400096ca950) at gem_ioctl+0x19c
ifioctl(0, 80206910, 40017d87c60, 400096ca950, 1012d74, 0) at ifioctl+0x38c
sys_ioctl(0, 40017d87db8, 40017d87df8, 0, 0, 14b) at sys_ioctl+0x190
syscall(40017d87ed0, 436, 198ac20888, 198ac2088c, 0, 0) at syscall+0x3c4
softtrap(3, 80206910, fffd8138, 0, 0, 1ff7fff6df8) at softtrap+0x19c
ddb> sh panic   
extent_free: extent `psycho0 dvma', region not within extent
ddb> ps 
   PID   PPID   PGRP    UID  S   FLAGS  WAIT    COMMAND
*22395   2599  32097      0  7     0x2          ifconfig
  2599  32097  32097      0  3    0x8a  pause   sh
 32097   1585  32097      0  3    0x8a  pause   sh
  1585  27132  27132      0  3    0x80  piperd  cron
 21846      1  21846     77  2    0x90          dhclient
 13160      1  13160      0  3    0x80  poll    dhclient
  5578   7747   5578   1000  3    0x83  ttyin   ksh
  7747  16002  16002   1000  3    0x90  select  sshd
 16002  28625  16002      0  3    0x92  poll    sshd
  4106    195    195      0  3    0x83  poll    pftop
   195  24715    195   1000  3    0x8b  pause   ksh
 24715   5976   5976   1000  3    0x90  select  sshd
  5976  28625   5976      0  3    0x92  poll    sshd
 28625      1  28625      0  3    0x80  select  sshd
 29463  19386  29463   1000  3    0x83  kqread  tail
 19386  24409  19386   1000  3    0x8b  pause   ksh
 24409   7564   7564   1000  3    0x90  select  sshd
  7564      1   7564      0  3    0x92
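
The sh panic line says the region handed to extent_free was not inside the
`psycho0 dvma' extent.  As a rough illustration of that kind of bounds test,
here is a small shell sketch.  It is not taken from the kernel: the function
name in_extent, its argument order, and the example addresses in the note
below are all hypothetical, and rejecting a zero-sized region is a choice made
here for simplicity.

```sh
#!/bin/sh
# in_extent START SIZE EX_START EX_END
# Succeeds when the region [START, START+SIZE-1] lies inside
# [EX_START, EX_END].  Arguments may be decimal or 0x-prefixed hex.
# Illustrative only; not the kernel's implementation.
in_extent() {
        start=$(($1))
        size=$(($2))
        ex_start=$(($3))
        ex_end=$(($4))
        # Assumption made for this sketch: an empty region never matches.
        [ "$size" -gt 0 ] || return 1
        end=$((start + size - 1))
        [ "$start" -ge "$ex_start" ] && [ "$end" -le "$ex_end" ]
}
```

With made-up bounds, in_extent 0x1800 0x100 0x1000 0x1fff succeeds, while a
region starting at 0 fails against the same bounds.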

Re: sunfire v120 gem interfaces

2015-11-12 Thread David Gwynne

> On 13 Nov 2015, at 12:16, Ryan Freeman  wrote:
> 
> On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
>> any joy? i mean, failure?
> 
> Well I got something different.  I've noticed the failures only seem to happen
> when my roommates arrive home.  I can use my stuff remotely all day from work
> without a hitch, roommates come home and usually within an hr there is an
> internet complaint.
> 
> Since I started using the little scripts to detect connection failure
> and down/up the iface in question, things had been pretty good simply in the
> fact that nobody could really notice before it fixed itself.
> 
> Today the machine dropped to ddb>!  of course i couldn't remember a damn
> thing to type :(  i got trace, terribly sorry it wasn't more...
> 
> ddb> trace
> extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
> iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
> gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
> intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> ddb>

that is interesting. if you're still in ddb, can you go sh panic?

if not, not biggy.

my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have tweaks 
to gem(4) for mpsafety which might fix this. ill poke them to see if they would 
share.

dlg


Re: sunfire v120 gem interfaces

2015-11-12 Thread Ryan Freeman
On Fri, Nov 13, 2015 at 12:36:40PM +1000, David Gwynne wrote:
> 
> > On 13 Nov 2015, at 12:16, Ryan Freeman  wrote:
> > 
> > On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> >> any joy? i mean, failure?
> > 
> > Well I got something different.  I've noticed the failures only seem to happen
> > when my roommates arrive home.  I can use my stuff remotely all day from work
> > without a hitch, roommates come home and usually within an hr there is an
> > internet complaint.
> > 
> > Since I started using the little scripts to detect connection failure
> > and down/up the iface in question, things had been pretty good simply in the
> > fact that nobody could really notice before it fixed itself.
> > 
> > Today the machine dropped to ddb>!  of course i couldn't remember a damn
> > thing to type :(  i got trace, terribly sorry it wasn't more...
> > 
> > ddb> trace
> > extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
> > iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
> > gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
> > gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
> > intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
> > sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
> > gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
> > ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
> > sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
> > syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
> > softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
> > ddb>
> 
> that is interesting. if you're still in ddb, can you go sh panic?
> 
> if not, not biggy.

Sadly, I am not.  As it is my router, I had to reboot to get back online to
send the mail.  If it triggers again I will make sure I include that.

> my gut feeling is our ring accounting is wonky. mpi@ and jmatthew@ have 
> tweaks to gem(4) for mpsafety which might fix this. ill poke them to see if 
> they would share.

I am willing to try anything! :)  I will reiterate that I am just running 5.8
stable (with mtier binpatches for errata); if it requires me to bump up to
-current, no biggie :)

> 
> dlg
> 

Re: sunfire v120 gem interfaces

2015-11-12 Thread Ryan Freeman
On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> any joy? i mean, failure?

Well I got something different.  I've noticed the failures only seem to happen
when my roommates arrive home.  I can use my stuff remotely all day from work
without a hitch, roommates come home and usually within an hr there is an
internet complaint.

Since I started using the little scripts to detect connection failure
and down/up the iface in question, things had been pretty good, simply
because nobody could really notice the outage before it fixed itself.
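
For anyone wanting to do the same, a minimal sketch of that kind of watchdog
follows.  It is not the script actually in use here: the function names, the
RUN dry-run variable and the probe address are illustrative assumptions, and
the down/up/dhclient sequence is the one from the original report.

```sh
#!/bin/sh
# Sketch only: cycle an interface when a probe host stops answering.
# RUN is prefixed to every state-changing command; RUN=echo gives a dry run.
RUN=${RUN:-}

# link_ok HOST: succeed if HOST answers any of three pings.
link_ok() {
        ping -c 3 -q "$1" >/dev/null 2>&1
}

# cycle_iface IFACE [dhcp]: down/up the interface, then renew the lease
# when the second argument is "dhcp" (gem0 in this thread uses dhclient).
cycle_iface() {
        $RUN ifconfig "$1" down
        $RUN ifconfig "$1" up
        if [ "${2:-}" = dhcp ]; then
                $RUN dhclient "$1"
        fi
}

# check_and_cycle IFACE HOST [dhcp]: one watchdog pass.
check_and_cycle() {
        if ! link_ok "$2"; then
                echo "$(date): no reply from $2, cycling $1" >&2
                cycle_iface "$1" "${3:-}"
        fi
}
```

Called once a minute from cron, for example as check_and_cycle gem0 with an
upstream address and dhcp as the third argument, this keeps outages short.
Setting RUN=echo prints the commands instead of running them.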

Today the machine dropped to ddb>!  of course i couldn't remember a damn
thing to type :(  i got trace, terribly sorry it wasn't more...

ddb> trace
extent_free(400012600c0, 0, 0, 0, 1fef078, 800012fa) at extent_free+0x174
iommu_dvmamap_unload(40001266300, 0, 4000129f080, 0, 0, 2) at iommu_dvmamap_unload+0x74
gem_rint(400014ac000, 40016ff, 7fff, e0017c48, 4000, 8000) at gem_rint+0x160
gem_intr(400014ac000, c00ca000, 2000, 0, 0, 8000) at gem_intr+0x154
intr_handler(e0017ec8, 4000117ae00, 4bca3020, 0, 800, 2) at intr_handler+0xc
sparc_interrupt(0, 400014b, 80206910, 400171b7c60, 40009ec0810, 0) at sparc_interrupt+0x298
gem_ioctl(400014ac048, 400014ac000, 400171b7c60, 400171b7c60, 0, 40009b73c10) at gem_ioctl+0x19c
ifioctl(0, 80206910, 400171b7c60, 40009b73c10, 1012d74, 0) at ifioctl+0x38c
sys_ioctl(0, 400171b7db8, 400171b7df8, 0, 0, 14b) at sys_ioctl+0x190
syscall(400171b7ed0, 436, bec8920888, bec892088c, 0, 0) at syscall+0x3c4
softtrap(3, 80206910, fffe3018, 0, 0, 1ff7fff6df8) at softtrap+0x19c
ddb>




Re: sunfire v120 gem interfaces

2015-11-10 Thread Ryan Freeman
On Tue, Nov 10, 2015 at 08:27:36PM +1000, David Gwynne wrote:
> any joy? i mean, failure?

Last night my script triggered three times, hooray ;)

unfortunately my eyes do not even notice much of a difference outside of
system load values in the systat output :(

gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
# 

8 users    Load 0.69 0.43 0.29    Mon Nov  9 20:31:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    56         129
                             2048    32        1025
lo0
gem0                         2048    18     4   124    18
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0


8 users    Load 0.44 0.39 0.29    Mon Nov  9 20:32:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    52         129
                             2048    25        1025
lo0
gem0                         2048    11     4   124    11
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0


8 users    Load 0.11 0.18 0.16    Mon Nov  9 21:54:11 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    55         129
                             2048    28        1025
lo0
gem0                         2048    18     4   124    18
gem1                         2048    10     4   124    10
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0

Re: sunfire v120 gem interfaces

2015-11-10 Thread David Gwynne
any joy? i mean, failure?


Re: sunfire v120 gem interfaces

2015-11-09 Thread Christian Weisgerber
Ryan Freeman:

> However, for some reason after sometimes mere hours -- othertimes days at a
> time,  the gem0 interface needs to be cycled: [...] starting to wonder
> if this machine is at its EOL and the network ports are dying :(

I see the same problem with the gem in my Blade 150.

-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: sunfire v120 gem interfaces

2015-11-08 Thread David Gwynne
can you get the ifconfig output when its locked up? and a copy of what
systat mb is showing?

cheers,
dlg

> On 9 Nov 2015, at 09:36, Ryan Freeman  wrote:
> 
> Hey tech@,
> 
> At my wits' end here.  I recently got a Sun Fire V120 from work for pretty cheap.
> Quite excited to have some non-x86 hardware, I set it up as a router.
> 
> However, sometimes after mere hours, other times after days, the gem0
> interface needs to be cycled:
> 
> ifconfig gem0 down
> ifconfig gem0 up
> dhclient gem0
> 
> No packets pass until that has been done.  At first I placed the blame
> squarely on the Hitron modem we have in the house from Shaw Cable, but now
> I've noticed the issue happen twice on the internal interface, gem1, as well.
> All the VLANs I have set up stop responding until gem1 is cycled.
> 
> gem1 is just used by a collection of vlan(4) interfaces, so traffic resumes
> immediately after gem1 is cycled down/up.
> 
> I've tried turning on ifconfig gem0 debug to catch anything weird, but there
> has been nothing of interest there.  Dmesg attached; I'm starting to wonder
> if this machine is at its EOL and the network ports are dying :(
> 
> This issue occurred with the 5.7 release as well.
> 
> dmesg:
> console is /pci@1f,0/pci@1,1/isa@7/serial@0,3f8
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2015 OpenBSD. All rights reserved.  http://www.OpenBSD.org
> 
> OpenBSD 5.8 (GENERIC) #0: Thu Oct 22 00:24:09 PDT 2015
>r...@void.inter.lan:/usr/src/sys/arch/sparc64/compile/GENERIC
> real mem = 1073741824 (1024MB)
> avail mem = 1039228928 (991MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V120 (UltraSPARC-IIe 648MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 3.3) @ 648 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-2, PCI bus 0
> psycho0: dvma map c000-dfff
> pci0 at psycho0
> ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x13
> pci1 at ppb0 bus 1
> ebus0 at pci1 dev 12 function 0 "Sun RIO EBus" rev 0x01
> "flashprom" at ebus0 addr 0-f not configured
> clock1 at ebus0 addr 0-1fff: mk48t59
> lom0 at ebus0 addr 20-23 ivec 0x2a: LOMlite2 rev 3.12
> alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x54: 512MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x55: 512MB SDRAM registered ECC PC133CL2
> ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> power0 at ebus1 addr 2000-2007 ivec 0x25
> com0 at ebus1 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus1 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7c6, address 00:03:ba:2b:47:70
> ukphy0 at gem0 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 0x7e4, version 1.0, legacy support
> pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom removable
> cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
> pciide0: channel 1 disabled (no drives)
> gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 0x7dc, address 00:03:ba:2b:47:71
> ukphy1 at gem1 phy 1: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0010dd, model 0x0002
> ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 0x7e6, version 1.0, legacy support
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> usb1 at ohci1: USB revision 1.0
> uhub1 at usb1 "Sun OHCI root hub" rev 1.00/1.00 addr 1
> ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x13
> pci2 at ppb1 bus 2
> siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 0x7e0, using 8K of on-board RAM
> scsibus2 at siop0: 16 targets, initiator 7
> sym0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9
> sd0 at scsibus0 targ 0 lun 0:  SCSI3 0/direct fixed serial.SEAGATE_ST336607LSUN36G_3JA0DGN8731804D9
> sd0: 34732MB, 512 bytes/sector, 71132959 sectors
> probe(siop0:1:0): Check Condition (error 0x70) on opcode 0x0
>SENSE KEY: Hardware Error
> ASC/ASCQ: Defect List Error
> FRU CODE: 0x7
> sym1 at scsibus2 targ 1 lun 0:  SCSI3 
> 0/direct fixed 

Re: sunfire v120 gem interfaces

2015-11-08 Thread Ryan Freeman
On Mon, Nov 09, 2015 at 10:07:31AM +1000, David Gwynne wrote:
> can you get the ifconfig output when its locked up? and a copy of what
> systat mb is showing?
> 
> cheers,
> dlg

Thanks David,

I have set up a script to try and capture this immediately when it happens.
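
A minimal sketch of what such a capture helper could look like, in case it is
useful to others.  It only wraps the two commands asked for above (ifconfig and
systat -b mb); the function name capture_state and the LOGDIR location are
assumptions made for illustration, not the script actually running here.

```sh
#!/bin/sh
# Sketch only: snapshot interface and mbuf state into a timestamped file
# so it can be attached to a report later.  LOGDIR is an assumed location.
LOGDIR=${LOGDIR:-/var/log/gem-capture}

# capture_state: write one snapshot and print the name of the file written.
capture_state() {
        out="$LOGDIR/$(date +%Y%m%d-%H%M%S).txt"
        mkdir -p "$LOGDIR" || return 1
        {
                echo "== ifconfig =="
                ifconfig
                echo "== systat -b mb =="
                systat -b mb
        } >"$out" 2>&1
        echo "$out"
}
```

The idea is to call capture_state from whatever detects the failure, before
the interface is cycled, so the snapshot shows the locked-up state.  Two
snapshots taken within the same second would overwrite each other.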

FWIW here is the output as it is now, working:

16:35 ryan@void:~$ ifconfig
lo0: flags=8049 mtu 32768
        priority: 0
        groups: lo
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
gem0: flags=8867 mtu 1500
        lladdr 00:03:ba:2b:47:70
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 96.54.13.103 netmask 0xfffffc00 broadcast 96.54.15.255
gem1: flags=8b63 mtu 1500
        lladdr 00:03:ba:2b:47:71
        priority: 0
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 10.16.1.30 netmask 0xffffffe0 broadcast 10.16.1.31
        inet6 fe80::203:baff:fe2b:4771%gem1 prefixlen 64 scopeid 0x2
        inet6 2001:470:b:6cf::1 prefixlen 64
enc0: flags=0<>
        priority: 0
        groups: enc
        status: active
vlan100: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: servers
        priority: 0
        vlan: 100 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.1.30 netmask 0xffffffe0 broadcast 10.21.1.31
        inet6 fe80::203:baff:fe2b:4771%vlan100 prefixlen 64 scopeid 0x5
        inet6 2001:470:eac8:666::1 prefixlen 64
vlan101: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: workstations
        priority: 0
        vlan: 101 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.8.254 netmask 0xffffff80 broadcast 10.21.8.255
        inet6 fe80::203:baff:fe2b:4771%vlan101 prefixlen 64 scopeid 0x6
        inet6 2001:470:eac8:a::1 prefixlen 64
vlan102: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: wireless
        priority: 0
        vlan: 102 parent interface: gem1
        groups: vlan
        status: active
        inet 10.21.9.254 netmask 0xffffff80 broadcast 10.21.9.255
        inet6 fe80::203:baff:fe2b:4771%vlan102 prefixlen 64 scopeid 0x7
        inet6 2001:470:eac8:b::1 prefixlen 64
vlan2: flags=8843 mtu 1500
        lladdr 00:03:ba:2b:47:71
        description: transit
        priority: 0
        vlan: 2 parent interface: gem1
        groups: vlan
        status: active
        inet 172.21.1.2 netmask 0xfffffffc broadcast 172.21.1.3
tun0: flags=51 mtu 1500
        priority: 0
        groups: tun
        status: down
        inet 10.21.2.1 --> 10.21.2.2 netmask 0xfffffffc
gif0: flags=8051 mtu 1280
        priority: 0
        groups: gif egress
        tunnel: inet 96.54.13.103 -> 216.218.226.238
        inet6 fe80::203:baff:fe2b:4770%gif0 ->  prefixlen 64 scopeid 0xa
        inet6 2001:470:a:6cf::2 -> 2001:470:a:6cf::1 prefixlen 128
pflow0: flags=41 mtu 1492
        priority: 0
        pflow: sender: 127.0.0.1 receiver: 127.0.0.1:9995 version: 5
        groups: pflow
pflog0: flags=141 mtu 33144
        priority: 0
        groups: pflog

16:36 ryan@void:~$ systat -b mb
8 users    Load 0.21 0.25 0.26    Sun Nov  8 16:37:12 2015

IFACE             LIVELOCKS  SIZE ALIVE   LWM   HWM   CWM
System                    0   256    48         129
                             2048    24        1025
lo0
gem0                         2048    11     4   124    11
gem1                         2048    12     4   124    12
enc0
vlan100
vlan101
vlan102
vlan2
tun0
gif0
pflow0
pflog0


Re: sunfire v120 gem interfaces

2015-11-08 Thread Alexander Hall
I had problems with my dual AC200 carp setup, in that the interfaces would 
periodically stop receiving packets. Transmission still worked though, so the 
carp wouldn't fail over... 

Machines are retired now, but I believe details exist in the archives 
somewhere. I also believe henning@ had similar issues in the past.

/Alexander 
