Re: CURRENT: re(4) crashing system

2016-11-20 Thread O. Hartmann
On Sun, 20 Nov 2016 16:43:52 +0900
YongHyeon PYUN wrote:

> On Sat, Nov 19, 2016 at 07:44:35PM +0100, O. Hartmann wrote:
> > On Mon, 7 Nov 2016 11:16:23 +0900
> > YongHyeon PYUN wrote:
> >   
> > > On Sun, Nov 06, 2016 at 01:20:36PM +0100, Hartmann, O. wrote:  
> > > > On Mon, 31 Oct 2016 11:12:22 +0900
> > > > YongHyeon PYUN  wrote:
> > > > 
> > > > > On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:
> > > > > > On Thu, 27 Oct 2016 10:00:04 +0900
> > > > > > YongHyeon PYUN  wrote:
> > > > > >   
> > > > > > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > > > > > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > > > > > YongHyeon PYUN  wrote:
> > > > > > > > 
> > > > > > > 
> > > > > > > [...]
> > > > > > >   
> > > > > > > > > I'm not sure, but it's likely the issue is related to
> > > > > > > > > EEE/Green Ethernet handling. EEE is a feature negotiated with the
> > > > > > > > > link partner. If you directly connect your laptop to a non-EEE
> > > > > > > > > capable link partner, like another re(4) box, without switches,
> > > > > > > > > you may be able to tell whether the issue is EEE/Green
> > > > > > > > > Ethernet related or not.
> > > > > > > >
> > > > > > > > Neither am I, since when I first discovered the problem with
> > > > > > > > CURRENT, which was the Friday before last week's Friday, there
> > > > > > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > > > > > introduced a serious bug, and I changed the NICs.
> > > > > > > >
> > > > > > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > > > > > which I use the Realtek NIC, now does well with Green IT
> > > > > > > > technology, but crashes on plugging/unplugging: not on every
> > > > > > > > event, but in at least one of ten.
> > > > > > >
> > > > > > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > > > > > UTP cable, was there active network traffic on the re(4) device?
> > > > > > > It would be helpful to know which event triggers the crash (e.g.
> > > > > > > unplugging or plugging). And would you show me the backtrace of the
> > > > > > > panic?
> > > > > > > > I guess the Green IT issue was more an unlucky guess of mine and
> > > > > > > > went hand in hand with the problem I face with CURRENT right
> > > > > > > > now on some older, non-UEFI machines.
> > > > > > > > 
> > > > > > > 
> > > > > > > Ok.
> > > > > > > 
> > > > > > > [...]  
> > > > > > > > 
> > > > > > > > As requested, here is the information about re0 and rgephy0 on the
> > > > > > > > laptop (Lenovo E540):
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > rgephy0:  PHY 1 on miibus0
> > > > > > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > > > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > > > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > > > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > > > > > 
> > > > > > > > re0:  port 0x3000-0x30ff mem
> > > > > > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > > > > > re0: Using 1 MSI-X message
> > > > > > > > re0: ASPM disabled
> > > > > > > > re0: Chip rev. 0x5080
> > > > > > > > re0: MAC rev. 0x0010
> > > > > > >
> > > > > > > This looks like an 8168GU controller.
> > > > > > > 
> > > > > > > [...]
> > > > > > >   
> > > > > > > > I use options netmap in kernel config, but the problem is also
> > > > > > > > present without this option - just for the record.
> > > > > > > > 
> > > > > > > 
> > > > > > > Yup, netmap(4) has nothing to do with the crash.
> > > > > > > 
> > > > > > > Thanks.  
> > > > > > 
> > > > > > Attached, you'll find the backtrace of the crash. This time it was
> > > > > > really easy - just one pull of the LAN cabling - and we are
> > > > > > happy :-/
> > > > > > 
> > > > > > Please let me know if you need something else. I will return to
> > > > > > normal operations (disabling debugging) because CURRENT is very
> > > > > > unstable at the moment on other hosts beyond r307157.
> > > > > >   
> > > > > 
> > > > > It seems the attachment was stripped.
> > > > 
> > > > This time I hope I got it right!
> > > > 
> > > > Attached you'll find the latest CURRENT's backtrace on the provoked
> > > > crash (plug and unplug).
> > > > 
> > > > I also saved the kernel and coredump, so if you need me to do further
> > > > investigations, please let me know.
> > > > 
> > > 
> > > Thanks a lot for the backtrace. This backtrace is not the one I
> > > expected, and I guess the issue is related to cached route removal
> > > on interface down. A quick look over the code didn't reveal the
> > > cause of the crash (I'm not familiar with that part of the code).
> > > Probably gnn@ has a better idea of what's going on here (CCed).
> > > 
> > > Thanks.  
> > 
> > In another thread I complained about constant crashes on several "older"
> > Intel archi

Re: CURRENT: re(4) crashing system

2016-11-19 Thread YongHyeon PYUN
On Sat, Nov 19, 2016 at 07:44:35PM +0100, O. Hartmann wrote:
> On Mon, 7 Nov 2016 11:16:23 +0900
> YongHyeon PYUN wrote:
> 
> > On Sun, Nov 06, 2016 at 01:20:36PM +0100, Hartmann, O. wrote:
> > > On Mon, 31 Oct 2016 11:12:22 +0900
> > > YongHyeon PYUN  wrote:
> > >   
> > > > On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:  
> > > > > On Thu, 27 Oct 2016 10:00:04 +0900
> > > > > YongHyeon PYUN  wrote:
> > > > > 
> > > > > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > > > > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > > > > YongHyeon PYUN  wrote:
> > > > > > >   
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > I'm not sure, but it's likely the issue is related to
> > > > > > > > EEE/Green Ethernet handling. EEE is a feature negotiated with the
> > > > > > > > link partner. If you directly connect your laptop to a non-EEE
> > > > > > > > capable link partner, like another re(4) box, without switches,
> > > > > > > > you may be able to tell whether the issue is EEE/Green
> > > > > > > > Ethernet related or not.
> > > > > > >
> > > > > > > Neither am I, since when I first discovered the problem with
> > > > > > > CURRENT, which was the Friday before last week's Friday, there
> > > > > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > > > > introduced a serious bug, and I changed the NICs.
> > > > > > >
> > > > > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > > > > which I use the Realtek NIC, now does well with Green IT
> > > > > > > technology, but crashes on plugging/unplugging: not on every
> > > > > > > event, but in at least one of ten.
> > > > > >
> > > > > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > > > > UTP cable, was there active network traffic on the re(4) device?
> > > > > > It would be helpful to know which event triggers the crash (e.g.
> > > > > > unplugging or plugging). And would you show me the backtrace of the
> > > > > > panic?
> > > > > > > I guess the Green IT issue was more an unlucky guess of mine and
> > > > > > > went hand in hand with the problem I face with CURRENT right
> > > > > > > now on some older, non-UEFI machines.
> > > > > > >   
> > > > > > 
> > > > > > Ok.
> > > > > > 
> > > > > > [...]
> > > > > > > 
> > > > > > > As requested, here is the information about re0 and rgephy0 on the
> > > > > > > laptop (Lenovo E540):
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > rgephy0:  PHY 1 on miibus0
> > > > > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > > > > 
> > > > > > > re0:  port 0x3000-0x30ff mem
> > > > > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > > > > re0: Using 1 MSI-X message
> > > > > > > re0: ASPM disabled
> > > > > > > re0: Chip rev. 0x5080
> > > > > > > re0: MAC rev. 0x0010
> > > > > >
> > > > > > This looks like an 8168GU controller.
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > I use options netmap in kernel config, but the problem is also
> > > > > > > present without this option - just for the record.
> > > > > > >   
> > > > > > 
> > > > > > Yup, netmap(4) has nothing to do with the crash.
> > > > > > 
> > > > > > Thanks.
> > > > > 
> > > > > Attached, you'll find the backtrace of the crash. This time it was
> > > > > really easy - just one pull of the LAN cabling - and we are
> > > > > happy :-/
> > > > > 
> > > > > Please let me know if you need something else. I will return to
> > > > > normal operations (disabling debugging) because CURRENT is very
> > > > > unstable at the moment on other hosts beyond r307157.
> > > > > 
> > > > 
> > > > It seems the attachment was stripped.  
> > > 
> > > This time I hope I got it right!
> > > 
> > > Attached you'll find the latest CURRENT's backtrace on the provoked
> > > crash (plug and unplug).
> > > 
> > > I also saved the kernel and coredump, so if you need me to do further
> > > investigations, please let me know.
> > >   
> > 
> > Thanks a lot for the backtrace. This backtrace is not the one I
> > expected, and I guess the issue is related to cached route removal
> > on interface down. A quick look over the code didn't reveal the
> > cause of the crash (I'm not familiar with that part of the code).
> > Probably gnn@ has a better idea of what's going on here (CCed).
> > 
> > Thanks.
> 
> In another thread I complained about constant crashes on several "older" Intel
> architectures (Ivy Bridge and earlier). It turned out that
> 
> option FLOWTABLE
> 
> in the kernel, which has been part of my custom kernels for a long time now, was
> the culprit on those systems. Commenting out that option solved the problem!
> 
> Interestingly, also commenting out this option from the ker

Re: CURRENT: re(4) crashing system

2016-11-19 Thread O. Hartmann
On Mon, 7 Nov 2016 11:16:23 +0900
YongHyeon PYUN wrote:

> On Sun, Nov 06, 2016 at 01:20:36PM +0100, Hartmann, O. wrote:
> > On Mon, 31 Oct 2016 11:12:22 +0900
> > YongHyeon PYUN  wrote:
> >   
> > > On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:  
> > > > On Thu, 27 Oct 2016 10:00:04 +0900
> > > > YongHyeon PYUN  wrote:
> > > > 
> > > > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > > > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > > > YongHyeon PYUN  wrote:
> > > > > >   
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > > I'm not sure, but it's likely the issue is related to
> > > > > > > EEE/Green Ethernet handling. EEE is a feature negotiated with the
> > > > > > > link partner. If you directly connect your laptop to a non-EEE
> > > > > > > capable link partner, like another re(4) box, without switches,
> > > > > > > you may be able to tell whether the issue is EEE/Green
> > > > > > > Ethernet related or not.
> > > > > >
> > > > > > Neither am I, since when I first discovered the problem with
> > > > > > CURRENT, which was the Friday before last week's Friday, there
> > > > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > > > introduced a serious bug, and I changed the NICs.
> > > > > >
> > > > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > > > which I use the Realtek NIC, now does well with Green IT
> > > > > > technology, but crashes on plugging/unplugging: not on every
> > > > > > event, but in at least one of ten.
> > > > >
> > > > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > > > UTP cable, was there active network traffic on the re(4) device?
> > > > > It would be helpful to know which event triggers the crash (e.g.
> > > > > unplugging or plugging). And would you show me the backtrace of the
> > > > > panic?
> > > > > > I guess the Green IT issue was more an unlucky guess of mine and
> > > > > > went hand in hand with the problem I face with CURRENT right
> > > > > > now on some older, non-UEFI machines.
> > > > > >   
> > > > > 
> > > > > Ok.
> > > > > 
> > > > > [...]
> > > > > > 
> > > > > > As requested, here is the information about re0 and rgephy0 on the
> > > > > > laptop (Lenovo E540):
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > rgephy0:  PHY 1 on miibus0
> > > > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > > > 
> > > > > > re0:  port 0x3000-0x30ff mem
> > > > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > > > re0: Using 1 MSI-X message
> > > > > > re0: ASPM disabled
> > > > > > re0: Chip rev. 0x5080
> > > > > > re0: MAC rev. 0x0010
> > > > >
> > > > > This looks like an 8168GU controller.
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > I use options netmap in kernel config, but the problem is also
> > > > > > present without this option - just for the record.
> > > > > >   
> > > > > 
> > > > > Yup, netmap(4) has nothing to do with the crash.
> > > > > 
> > > > > Thanks.
> > > > 
> > > > Attached, you'll find the backtrace of the crash. This time it was
> > > > really easy - just one pull of the LAN cabling - and we are
> > > > happy :-/
> > > > 
> > > > Please let me know if you need something else. I will return to
> > > > normal operations (disabling debugging) because CURRENT is very
> > > > unstable at the moment on other hosts beyond r307157.
> > > > 
> > > 
> > > It seems the attachment was stripped.  
> > 
> > This time I hope I got it right!
> > 
> > Attached you'll find the latest CURRENT's backtrace on the provoked
> > crash (plug and unplug).
> > 
> > I also saved the kernel and coredump, so if you need me to do further
> > investigations, please let me know.
> >   
> 
> Thanks a lot for the backtrace. This backtrace is not the one I
> expected, and I guess the issue is related to cached route removal
> on interface down. A quick look over the code didn't reveal the
> cause of the crash (I'm not familiar with that part of the code).
> Probably gnn@ has a better idea of what's going on here (CCed).
> 
> Thanks.

In another thread I complained about constant crashes on several "older" Intel
architectures (Ivy Bridge and earlier). It turned out that

option FLOWTABLE

in the kernel, which has been part of my custom kernels for a long time now, was
the culprit on those systems. Commenting out that option solved the problem!

Interestingly, after also commenting out this option in the kernel config of the
laptop in question in this thread, I have not been able (as of this writing) to
reproduce the crashes, so it might be that the same FLOWTABLE issue was being
triggered by plugging and/or unplugging the LAN cord.

Usually I was able to trigger the coredump after two or three rounds, 

Re: CURRENT: re(4) crashing system

2016-11-06 Thread YongHyeon PYUN
On Sun, Nov 06, 2016 at 01:20:36PM +0100, Hartmann, O. wrote:
> On Mon, 31 Oct 2016 11:12:22 +0900
> YongHyeon PYUN  wrote:
> 
> > On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:
> > > On Thu, 27 Oct 2016 10:00:04 +0900
> > > YongHyeon PYUN  wrote:
> > >   
> > > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:  
> > > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > > YongHyeon PYUN  wrote:
> > > > > 
> > > > 
> > > > [...]
> > > >   
> > > > > > I'm not sure, but it's likely the issue is related to
> > > > > > EEE/Green Ethernet handling. EEE is a feature negotiated with the
> > > > > > link partner. If you directly connect your laptop to a non-EEE
> > > > > > capable link partner, like another re(4) box, without switches,
> > > > > > you may be able to tell whether the issue is EEE/Green
> > > > > > Ethernet related or not.
> > > > >
> > > > > Neither am I, since when I first discovered the problem with
> > > > > CURRENT, which was the Friday before last week's Friday, there
> > > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > > introduced a serious bug, and I changed the NICs.
> > > > >
> > > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > > which I use the Realtek NIC, now does well with Green IT
> > > > > technology, but crashes on plugging/unplugging: not on every
> > > > > event, but in at least one of ten.
> > > >
> > > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > > UTP cable, was there active network traffic on the re(4) device?
> > > > It would be helpful to know which event triggers the crash (e.g.
> > > > unplugging or plugging). And would you show me the backtrace of the
> > > > panic?
> > > > > I guess the Green IT issue was more an unlucky guess of mine and
> > > > > went hand in hand with the problem I face with CURRENT right
> > > > > now on some older, non-UEFI machines.
> > > > > 
> > > > 
> > > > Ok.
> > > > 
> > > > [...]  
> > > > > 
> > > > > As requested, here is the information about re0 and rgephy0 on the
> > > > > laptop (Lenovo E540):
> > > > > 
> > > > > [...]
> > > > > 
> > > > > rgephy0:  PHY 1 on miibus0
> > > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > > 
> > > > > re0:  port 0x3000-0x30ff mem
> > > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > > re0: Using 1 MSI-X message
> > > > > re0: ASPM disabled
> > > > > re0: Chip rev. 0x5080
> > > > > re0: MAC rev. 0x0010
> > > >
> > > > This looks like an 8168GU controller.
> > > > 
> > > > [...]
> > > >   
> > > > > I use options netmap in kernel config, but the problem is also
> > > > > present without this option - just for the record.
> > > > > 
> > > > 
> > > > Yup, netmap(4) has nothing to do with the crash.
> > > > 
> > > > Thanks.  
> > > 
> > > Attached, you'll find the backtrace of the crash. This time it was
> > > really easy - just one pull of the LAN cabling - and we are
> > > happy :-/
> > > 
> > > Please let me know if you need something else. I will return to
> > > normal operations (disabling debugging) because CURRENT is very
> > > unstable at the moment on other hosts beyond r307157.
> > >   
> > 
> > It seems the attachment was stripped.
> 
> This time I hope I got it right!
> 
> Attached you'll find the latest CURRENT's backtrace on the provoked
> crash (plug and unplug).
> 
> I also saved the kernel and coredump, so if you need me to do further
> investigations, please let me know.
> 

Thanks a lot for the backtrace. This backtrace is not the one I
expected, and I guess the issue is related to cached route removal
on interface down. A quick look over the code didn't reveal the
cause of the crash (I'm not familiar with that part of the code).
Probably gnn@ has a better idea of what's going on here (CCed).

Thanks.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT: re(4) crashing system

2016-11-06 Thread Hartmann, O.
On Mon, 31 Oct 2016 11:12:22 +0900
YongHyeon PYUN  wrote:

> On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:
> > On Thu, 27 Oct 2016 10:00:04 +0900
> > YongHyeon PYUN  wrote:
> >   
> > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:  
> > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > YongHyeon PYUN  wrote:
> > > > 
> > > 
> > > [...]
> > >   
> > > > > I'm not sure, but it's likely the issue is related to
> > > > > EEE/Green Ethernet handling. EEE is a feature negotiated with the
> > > > > link partner. If you directly connect your laptop to a non-EEE
> > > > > capable link partner, like another re(4) box, without switches,
> > > > > you may be able to tell whether the issue is EEE/Green
> > > > > Ethernet related or not.
> > > >
> > > > Neither am I, since when I first discovered the problem with
> > > > CURRENT, which was the Friday before last week's Friday, there
> > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > introduced a serious bug, and I changed the NICs.
> > > >
> > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > which I use the Realtek NIC, now does well with Green IT
> > > > technology, but crashes on plugging/unplugging: not on every
> > > > event, but in at least one of ten.
> > >
> > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > UTP cable, was there active network traffic on the re(4) device?
> > > It would be helpful to know which event triggers the crash (e.g.
> > > unplugging or plugging). And would you show me the backtrace of the
> > > panic?
> > > > I guess the Green IT issue was more an unlucky guess of mine and
> > > > went hand in hand with the problem I face with CURRENT right
> > > > now on some older, non-UEFI machines.
> > > > 
> > > 
> > > Ok.
> > > 
> > > [...]  
> > > > 
> > > > As requested, here is the information about re0 and rgephy0 on the
> > > > laptop (Lenovo E540):
> > > > 
> > > > [...]
> > > > 
> > > > rgephy0:  PHY 1 on miibus0
> > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > 
> > > > re0:  port 0x3000-0x30ff mem
> > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > re0: Using 1 MSI-X message
> > > > re0: ASPM disabled
> > > > re0: Chip rev. 0x5080
> > > > re0: MAC rev. 0x0010
> > >
> > > This looks like an 8168GU controller.
> > > 
> > > [...]
> > >   
> > > > I use options netmap in kernel config, but the problem is also
> > > > present without this option - just for the record.
> > > > 
> > > 
> > > Yup, netmap(4) has nothing to do with the crash.
> > > 
> > > Thanks.  
> > 
> > Attached, you'll find the backtrace of the crash. This time it was
> > really easy - just one pull of the LAN cabling - and we are
> > happy :-/
> > 
> > Please let me know if you need something else. I will return to
> > normal operations (disabling debugging) because CURRENT is very
> > unstable at the moment on other hosts beyond r307157.
> >   
> 
> It seems the attachment was stripped.

This time I hope I got it right!

Attached you'll find the latest CURRENT's backtrace on the provoked
crash (plug and unplug).

I also saved the kernel and coredump, so if you need me to do further
investigations, please let me know.

Thanks in advance and kind regards,

oliver

core.txt.0
Description: Binary data

Re: CURRENT: re(4) crashing system

2016-11-03 Thread O. Hartmann
On Mon, 31 Oct 2016 11:12:22 +0900
YongHyeon PYUN wrote:

> On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:
> > On Thu, 27 Oct 2016 10:00:04 +0900
> > YongHyeon PYUN  wrote:
> >   
> > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:  
> > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > YongHyeon PYUN  wrote:
> > > > 
> > > 
> > > [...]
> > >   
> > > > > I'm not sure, but it's likely the issue is related to EEE/Green
> > > > > Ethernet handling. EEE is a feature negotiated with the link partner. If
> > > > > you directly connect your laptop to a non-EEE capable link partner,
> > > > > like another re(4) box, without switches, you may be able to tell
> > > > > whether the issue is EEE/Green Ethernet related or not.
> > > >
> > > > Neither am I, since when I first discovered the problem with
> > > > CURRENT, which was the Friday before last week's Friday, there was an
> > > > unlucky coincidence: I got the new switch, FreeBSD introduced a
> > > > serious bug, and I changed the NICs.
> > > >
> > > > The laptop, the last in the row of re(4)-equipped systems on which I
> > > > use the Realtek NIC, now does well with Green IT technology, but
> > > > crashes on plugging/unplugging: not on every event, but in at least
> > > > one of ten.
> > >
> > > Hmm, it seems you know how to trigger the issue. When you unplug the
> > > UTP cable, was there active network traffic on the re(4) device?
> > > It would be helpful to know which event triggers the crash (e.g.
> > > unplugging or plugging). And would you show me the backtrace of the panic?
> > >
> > > > I guess the Green IT issue was more an unlucky guess of mine and went
> > > > hand in hand with the problem I face with CURRENT right now on some
> > > > older, non-UEFI machines.
> > > > 
> > > 
> > > Ok.
> > > 
> > > [...]  
> > > > 
> > > > As requested, here is the information about re0 and rgephy0 on the laptop
> > > > (Lenovo E540):
> > > > 
> > > > [...]
> > > > 
> > > > rgephy0:  PHY 1 on miibus0
> > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> > > > 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > 
> > > > re0:  port 0x3000-0x30ff mem
> > > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > > re0: Using 1 MSI-X message
> > > > re0: ASPM disabled
> > > > re0: Chip rev. 0x5080
> > > > re0: MAC rev. 0x0010
> > >
> > > This looks like an 8168GU controller.
> > > 
> > > [...]
> > >   
> > > > I use options netmap in kernel config, but the problem is also
> > > > present without this option - just for the record.
> > > > 
> > > 
> > > Yup, netmap(4) has nothing to do with the crash.
> > > 
> > > Thanks.  
> > 
> > Attached, you'll find the backtrace of the crash. This time it was
> > really easy - just one pull of the LAN cabling - and we are happy :-/
> > 
> > Please let me know if you need something else. I will return to normal
> > operations (disabling debugging) because CURRENT is very unstable at the
> > moment on other hosts beyond r307157.
> >   
> 
> It seems the attachment was stripped.

[...]

Sorry for the late reply.
Indeed, someone forgot to attach the dump/core info, and that someone seems to
be me.

I have severe time constraints, but I will prepare another crash dump this
weekend with the most recent CURRENT.

My apologies for this,

kind regards,

Oliver




Re: CURRENT: re(4) crashing system

2016-10-30 Thread YongHyeon PYUN
On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:
> On Thu, 27 Oct 2016 10:00:04 +0900
> YongHyeon PYUN  wrote:
> 
> > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > YongHyeon PYUN  wrote:
> > >   
> > 
> > [...]
> > 
> > > > I'm not sure, but it's likely the issue is related to EEE/Green
> > > > Ethernet handling. EEE is a feature negotiated with the link partner. If
> > > > you directly connect your laptop to a non-EEE capable link partner,
> > > > like another re(4) box, without switches, you may be able to tell
> > > > whether the issue is EEE/Green Ethernet related or not.
> > >
> > > Neither am I, since when I first discovered the problem with
> > > CURRENT, which was the Friday before last week's Friday, there was an
> > > unlucky coincidence: I got the new switch, FreeBSD introduced a
> > > serious bug, and I changed the NICs.
> > >
> > > The laptop, the last in the row of re(4)-equipped systems on which I
> > > use the Realtek NIC, now does well with Green IT technology, but
> > > crashes on plugging/unplugging: not on every event, but in at least
> > > one of ten.
> >
> > Hmm, it seems you know how to trigger the issue. When you unplug the
> > UTP cable, was there active network traffic on the re(4) device?
> > It would be helpful to know which event triggers the crash (e.g.
> > unplugging or plugging). And would you show me the backtrace of the panic?
> >
> > > I guess the Green IT issue was more an unlucky guess of mine and went
> > > hand in hand with the problem I face with CURRENT right now on some
> > > older, non-UEFI machines.
> > >   
> > 
> > Ok.
> > 
> > [...]
> > > 
> > > As requested, here is the information about re0 and rgephy0 on the laptop
> > > (Lenovo E540):
> > > 
> > > [...]
> > > 
> > > rgephy0:  PHY 1 on miibus0
> > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> > > 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > 
> > > re0:  port 0x3000-0x30ff mem
> > > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > > re0: Using 1 MSI-X message
> > > re0: ASPM disabled
> > > re0: Chip rev. 0x5080
> > > re0: MAC rev. 0x0010
> >
> > This looks like an 8168GU controller.
> > 
> > [...]
> > 
> > > I use options netmap in kernel config, but the problem is also
> > > present without this option - just for the record.
> > >   
> > 
> > Yup, netmap(4) has nothing to do with the crash.
> > 
> > Thanks.
> 
> Attached, you'll find the backtrace of the crash. This time it was
> really easy - just one pull of the LAN cabling - and we are happy :-/
> 
> Please let me know if you need something else. I will return to normal
> operations (disabling debugging) because CURRENT is very unstable at the
> moment on other hosts beyond r307157.
> 

It seems the attachment was stripped.


Re: CURRENT: re(4) crashing system

2016-10-28 Thread Hartmann, O.
On Thu, 27 Oct 2016 10:00:04 +0900
YongHyeon PYUN  wrote:

> On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > On Tue, 25 Oct 2016 11:05:38 +0900
> > YongHyeon PYUN  wrote:
> >   
> 
> [...]
> 
> > > I'm not sure, but it's likely the issue is related to EEE/Green
> > > Ethernet handling. EEE is a feature negotiated with the link partner. If
> > > you directly connect your laptop to a non-EEE capable link partner,
> > > like another re(4) box, without switches, you may be able to tell
> > > whether the issue is EEE/Green Ethernet related or not.
> >
> > Neither am I, since when I first discovered the problem with
> > CURRENT, which was the Friday before last week's Friday, there was an
> > unlucky coincidence: I got the new switch, FreeBSD introduced a
> > serious bug, and I changed the NICs.
> >
> > The laptop, the last in the row of re(4)-equipped systems on which I
> > use the Realtek NIC, now does well with Green IT technology, but
> > crashes on plugging/unplugging: not on every event, but in at least
> > one of ten.
>
> Hmm, it seems you know how to trigger the issue. When you unplug the
> UTP cable, was there active network traffic on the re(4) device?
> It would be helpful to know which event triggers the crash (e.g.
> unplugging or plugging). And would you show me the backtrace of the panic?
>
> > I guess the Green IT issue was more an unlucky guess of mine and went
> > hand in hand with the problem I face with CURRENT right now on some
> > older, non-UEFI machines.
> >   
> 
> Ok.
> 
> [...]
> > 
> > As requested, here is the information about re0 and rgephy0 on the laptop
> > (Lenovo E540):
> > 
> > [...]
> > 
> > rgephy0:  PHY 1 on miibus0
> > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> > 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > 1000baseT-FDX-flow-master, auto, auto-flow
> > 
> > re0:  port 0x3000-0x30ff mem
> > 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff at device 0.0 on pci2
> > re0: Using 1 MSI-X message
> > re0: ASPM disabled
> > re0: Chip rev. 0x5080
> > re0: MAC rev. 0x0010
>
> This looks like an 8168GU controller.
> 
> [...]
> 
> > I use options netmap in kernel config, but the problem is also
> > present without this option - just for the record.
> >   
> 
> Yup, netmap(4) has nothing to do with the crash.
> 
> Thanks.

Attached, you'll find the backtrace of the crash. This time it was
really easy - just one pull of the LAN cable - and we are happy :-/

Please let me know if you need something else. I will return to normal
operations (disabling debugging) because CURRENT is very unstable at the
moment on other hosts beyond r307157.

Kind regards,

oh
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT: re(4) crashing system

2016-10-27 Thread O. Hartmann
On Thu, 27 Oct 2016 10:00:04 +0900
YongHyeon PYUN  wrote:

> On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> > On Tue, 25 Oct 2016 11:05:38 +0900
> > YongHyeon PYUN  wrote:
> >   
> 
> [...]
> 
> > > I'm not sure, but it's likely the issue is related to EEE/Green
> > > Ethernet handling. EEE is a feature negotiated with the link partner.
> > > If you directly connect your laptop to a non-EEE-capable link
> > > partner, like another re(4) box, without switches, you may be able
> > > to tell whether the issue is EEE/Green Ethernet related or not.
> > 
> > Me neither, since when I discovered a problem for the first time with
> > CURRENT, that was the Friday before last week's Friday, there was an
> > unlucky coincidence: I got the new switch, FreeBSD introduced a serious
> > bug and I changed the NICs.
> > 
> > The laptop, the last in the row of re(4)-equipped systems on which I
> > use the Realtek NIC, does well now with Green IT technology, but
> > crashes on plugging/unplugging - not on every event, but at least once
> > in ten.
> 
> Hmm, it seems you know how to trigger the issue. When you unplugged the
> UTP cable, was there active network traffic on the re(4) device?
> It would be helpful to know which event triggers the crash (e.g.
> unplugging or plugging). And would you show me a backtrace of the panic?

Yes, as I wrote, plugging and unplugging. Usually there is no traffic I'm
aware of; simply the attempt to renegotiate the connection (a hunch)
triggers the crash. I can force it by bringing the port on the switch up
and down.

Of course you can get a panic/backtrace, but I need the weekend. I
complained in another thread about being unable to get a core - I use
ELI-encrypted swap, so I shot myself in the foot at that point.
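For reference, a minimal sketch of how a crash dump is usually captured on FreeBSD, assuming an unencrypted swap or spare partition can receive it (the device name in the commented line is a hypothetical placeholder, not taken from this thread). At panic time the kernel writes the dump to the configured dump device, and savecore(8) copies it into dumpdir on the next boot, where kgdb(1) can turn it into a backtrace.

```sh
# /etc/rc.conf fragment (sketch, not taken from this thread)
dumpdev="AUTO"          # dump to the first suitable swap device in /etc/fstab
dumpdir="/var/crash"    # savecore(8) writes vmcore.N and info.N here at boot

# Assumption: if swap is encrypted, point dumpdev at an unencrypted
# partition instead (/dev/ada0p3 is a hypothetical placeholder):
#dumpdev="/dev/ada0p3"
```

dumpon(8) can also set the dump device at runtime, without waiting for a reboot.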

> 
> > I guess the Green IT issue is more an unlucky guess of mine and went
> > hand in hand with the problem I face with CURRENT right now on some
> > older, non-UEFI machines.
> >   
> 
> Ok.
> 
> [...]
> > 
> > As requested, the information about re0 and rgephy0 on the laptop
> > (Lenovo E540):
> > 
> > [...]
> > 
> > rgephy0:  PHY 1 on miibus0
> > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> > 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master,
> > 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> > 
> > re0:  port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff
> > at device 0.0 on pci2
> > re0: Using 1 MSI-X message
> > re0: ASPM disabled
> > re0: Chip rev. 0x5080
> > re0: MAC rev. 0x0010
> 
> This looks like an 8168GU controller.
> 
> [...]
> 
> > I use "options netmap" in the kernel config, but the problem is also
> > present without this option - just for the record.
> >   
> 
> Yup, netmap(4) has nothing to do with the crash.
> 
> Thanks.


Re: CURRENT: re(4) crashing system

2016-10-26 Thread YongHyeon PYUN
On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:
> On Tue, 25 Oct 2016 11:05:38 +0900
> YongHyeon PYUN  wrote:
> 

[...]

> > I'm not sure, but it's likely the issue is related to EEE/Green
> > Ethernet handling. EEE is a feature negotiated with the link partner.
> > If you directly connect your laptop to a non-EEE-capable link
> > partner, like another re(4) box, without switches, you may be able
> > to tell whether the issue is EEE/Green Ethernet related or not.
> 
> Me neither, since when I discovered a problem for the first time with
> CURRENT, that was the Friday before last week's Friday, there was an
> unlucky coincidence: I got the new switch, FreeBSD introduced a serious
> bug and I changed the NICs.
> 
> The laptop, the last in the row of re(4)-equipped systems on which I
> use the Realtek NIC, does well now with Green IT technology, but
> crashes on plugging/unplugging - not on every event, but at least once
> in ten.

Hmm, it seems you know how to trigger the issue. When you unplugged the
UTP cable, was there active network traffic on the re(4) device?
It would be helpful to know which event triggers the crash (e.g.
unplugging or plugging). And would you show me a backtrace of the panic?

> I guess the Green IT issue is more an unlucky guess of mine and went
> hand in hand with the problem I face with CURRENT right now on some
> older, non-UEFI machines.
> 

Ok.

[...]
> 
> As requested, the information about re0 and rgephy0 on the laptop
> (Lenovo E540):
> 
> [...]
> 
> rgephy0:  PHY 1 on miibus0
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master,
> 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> 
> re0:  port 0x3000-0x30ff mem 0xf0d04000-0xf0d04fff,0xf0d0-0xf0d03fff
> at device 0.0 on pci2
> re0: Using 1 MSI-X message
> re0: ASPM disabled
> re0: Chip rev. 0x5080
> re0: MAC rev. 0x0010

This looks like an 8168GU controller.

[...]

> I use "options netmap" in the kernel config, but the problem is also
> present without this option - just for the record.
> 

Yup, netmap(4) has nothing to do with the crash.

Thanks.


Re: CURRENT: re(4) crashing system

2016-10-24 Thread Hartmann, O.
On Tue, 25 Oct 2016 11:05:38 +0900
YongHyeon PYUN  wrote:

> On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> > On Mon, 24 Oct 2016 14:14:00 +0900
> > YongHyeon PYUN  wrote:
> >   
> > > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:  
> > > > I tried to report earlier here that CURRENT has some
> > > > serious problems right now, and one of those problems seems to
> > > > be triggered by the recent re(4) driver. The problem is also
> > > > present in recent 11-STABLE!
> > > > 
> > > > Below, you'll find pciconf output regarding the device on a
> > > > Lenovo E540 laptop I can test on and trigger the problem with.
> > > > 
> > > > The phenomenon is that this NIC does not negotiate 1000baseTX;
> > > > it always falls back to 100baseTX although the device
> > > > claims to be a 1 GBit capable device.
> > > > 
> > > > When I try to put the device manually into 1000baseTX mode via
> > > > 
> > > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4)
> > > > driver)
> > > > 
> > > > it is possible to crash the system. The system also crashes when
> > > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > > triggering this crash immediately.
> > > > 
> > > > I tried with several switches and routers capable of 1 GBit and
> > > > it seems to be independent of the network hardware in use.
> > > > 
> > > > I tried to capture a backtrace when the kernel crashes, but I
> > > > do not know how to save the kernel debugger output.
> > > > Although I configured debugging according to the handbook, there
> > > > is no coredump at all.
> > > > 
> > > > Advice is appreciated - if anybody is interested in solving
> > > > this.
> > > 
> > > There were several instability reports on re(4).  I vaguely guess
> > > it would be related to some missing initializations for certain
> > > controllers.  Unfortunately, there is no publicly available
> > > datasheet for those controllers and it's not likely I'll get access
> > > to it in the near future.  It seems the vendor's FreeBSD driver
> > > accesses lots of magic registers as well as loading DSP fixups.  I
> > > have no idea what it wants to do, and re(4) used to rely heavily on
> > > power-on default register values.  The engineering samples I have
> > > do not show instabilities, so it wouldn't be easy to identify the
> > > issue.
> > > 
> > > Probably the first step to address the issue would be identifying
> > > those chips and narrowing down the scope of guessing.  Would you
> > > show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> > > output is useless here since RealTek uses the same PCI ID for
> > > PCIe variants.
> > > 
> > > BTW, I was told that the vendor's FreeBSD driver seems to work
> > > fine for normal usage patterns.  The vendor's driver triggered an
> > > instant panic and lacked H/W offloading features in the past.  It
> > > might have changed though.
> > 
> > The problem with re(4) drivers arose again when I bought some
> > "green" equipment, mainly switches, which reduce power consumption on
> > short cables or unconnected ports. This brought down some servers
> > with re(4) chipsets immediately and I had no clue what happened. I
> > do not know whether this is a
> 
> I'm not sure, but it's likely the issue is related to EEE/Green
> Ethernet handling. EEE is a feature negotiated with the link partner.
> If you directly connect your laptop to a non-EEE-capable link
> partner, like another re(4) box, without switches, you may be able
> to tell whether the issue is EEE/Green Ethernet related or not.

Me neither, since when I discovered a problem for the first time with
CURRENT, that was the Friday before last week's Friday, there was an
unlucky coincidence: I got the new switch, FreeBSD introduced a serious
bug and I changed the NICs.

The laptop, the last in the row of re(4)-equipped systems on which I
use the Realtek NIC, does well now with Green IT technology, but
crashes on plugging/unplugging - not on every event, but at least once
in ten.

I guess the Green IT issue is more an unlucky guess of mine and went
hand in hand with the problem I face with CURRENT right now on some
older, non-UEFI machines.

> 
> > single case, so to speak, or whether this problem will arise for
> > others, too. On server hardware we exchanged all Realtek NICs for ones
> > from Intel, and luckily some server mainboards already have Intel
> > PHYs or NICs. The Broadcom devices we have on some older Fujitsu
> > hardware are also stable, working like a charm, even with the new
> > power-saving switches.
> 
> bge(4) also lacks EEE support (the publicly available datasheet is too
> sanitized).  bge(4) firmware probably does not announce EEE
> capability by default in link establishment, while recent re(4)
> devices seem to unconditionally announce EEE.  Generally, EEE
> handling requires a kind of handshake from MAC/PHY for link state
> changes.
> 
> > While we can swap the NIC on server or workstation platforms, it is
> > almost impossible on laptops 

Re: CURRENT: re(4) crashing system

2016-10-24 Thread YongHyeon PYUN
On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> On Mon, 24 Oct 2016 14:14:00 +0900
> YongHyeon PYUN  wrote:
> 
> > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > I tried to report earlier here that CURRENT has some serious
> > > problems right now, and one of those problems seems to be triggered by
> > > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > > 
> > > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > > laptop I can test on and trigger the problem with.
> > > 
> > > The phenomenon is that this NIC does not negotiate 1000baseTX; it
> > > always falls back to 100baseTX although the device claims to be a 1
> > > GBit capable device.
> > > 
> > > When I try to put the device manually into 1000baseTX mode via
> > > 
> > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > > 
> > > it is possible to crash the system. The system also crashes when
> > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > triggering this crash immediately.
> > > 
> > > I tried with several switches and routers capable of 1 GBit and it
> > > seems to be independent of the network hardware in use.
> > > 
> > > I tried to capture a backtrace when the kernel crashes, but I do not
> > > know how to save the kernel debugger output. Although I configured
> > > debugging according to the handbook, there is no coredump at all.
> > > 
> > > Advice is appreciated - if anybody is interested in solving this.
> > >   
> > 
> > There were several instability reports on re(4).  I vaguely guess
> > it would be related to some missing initializations for certain
> > controllers.  Unfortunately, there is no publicly available
> > datasheet for those controllers and it's not likely I'll get access
> > to it in the near future.  It seems the vendor's FreeBSD driver
> > accesses lots of magic registers as well as loading DSP fixups.  I
> > have no idea what it wants to do, and re(4) used to rely heavily on
> > power-on default register values.  The engineering samples I have
> > do not show instabilities, so it wouldn't be easy to identify the issue.
> > 
> > Probably the first step to address the issue would be identifying
> > those chips and narrowing down the scope of guessing.  Would you
> > show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> > output is useless here since RealTek uses the same PCI ID for
> > PCIe variants.
> > 
> > BTW, I was told that the vendor's FreeBSD driver seems to work fine
> > for normal usage patterns.  The vendor's driver triggered an instant
> > panic and lacked H/W offloading features in the past.  It might
> > have changed though.
> 
> The problem with re(4) drivers arose again when I bought some "green"
> equipment, mainly switches, which reduce power consumption on short cables or
> unconnected ports. This brought down some servers with re(4) chipsets
> immediately and I had no clue what happened. I do not know whether this is a

I'm not sure, but it's likely the issue is related to EEE/Green
Ethernet handling. EEE is a feature negotiated with the link partner.
If you directly connect your laptop to a non-EEE-capable link
partner, like another re(4) box, without switches, you may be able
to tell whether the issue is EEE/Green Ethernet related or not.

> single case, so to speak, or whether this problem will arise for others, too.
> On server hardware we exchanged all Realtek NICs for ones from Intel, and
> luckily some server mainboards already have Intel PHYs or NICs. The Broadcom
> devices we have on some older Fujitsu hardware are also stable, working like a
> charm, even with the new power-saving switches.
> 

bge(4) also lacks EEE support (the publicly available datasheet is too
sanitized).  bge(4) firmware probably does not announce EEE
capability by default in link establishment, while recent re(4)
devices seem to unconditionally announce EEE.  Generally, EEE
handling requires a kind of handshake from MAC/PHY for link state
changes.

> While we can swap the NIC on server or workstation platforms, it is almost
> impossible on laptops, and the number of laptops with Realtek chips seems to
> grow. It is a pity that the vendor of the chipsets refuses to support OSes
> other than Windows - or in some rare cases only Linux. After you wrote your
> answer, I checked the net for suitable drivers, and the situation seems bad
> for almost all OSes apart from commercial ones like Windooze and Apple OS X.
> 
> As soon as I get my hands on the laptop again, I'll send the requested
> information. I know that I played around with re(4) and rgephy(4) in the
> kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
> except that it offered some additional "media xxx-options-xxx", mostly
> appended with "flow" - but trying those also brought the system down, as
> did plugging or unplugging.

rgephy(4) will show the recognized PHY H/W model. Another piece of information
I'd like to know is the OUI information of the PH

Re: CURRENT: re(4) crashing system

2016-10-24 Thread O. Hartmann
On Mon, 24 Oct 2016 14:14:00 +0900
YongHyeon PYUN  wrote:

> On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > I tried to report earlier here that CURRENT has some serious
> > problems right now, and one of those problems seems to be triggered by
> > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > 
> > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > laptop I can test on and trigger the problem with.
> > 
> > The phenomenon is that this NIC does not negotiate 1000baseTX; it
> > always falls back to 100baseTX although the device claims to be a 1
> > GBit capable device.
> > 
> > When I try to put the device manually into 1000baseTX mode via
> > 
> > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > 
> > it is possible to crash the system. The system also crashes when
> > plugging/unplugging the LAN cord - I guess the renegotiation is
> > triggering this crash immediately.
> > 
> > I tried with several switches and routers capable of 1 GBit and it
> > seems to be independent of the network hardware in use.
> > 
> > I tried to capture a backtrace when the kernel crashes, but I do not
> > know how to save the kernel debugger output. Although I configured
> > debugging according to the handbook, there is no coredump at all.
> > 
> > Advice is appreciated - if anybody is interested in solving this.
> >   
> 
> There were several instability reports on re(4).  I vaguely guess
> it would be related to some missing initializations for certain
> controllers.  Unfortunately, there is no publicly available
> datasheet for those controllers and it's not likely I'll get access
> to it in the near future.  It seems the vendor's FreeBSD driver
> accesses lots of magic registers as well as loading DSP fixups.  I
> have no idea what it wants to do, and re(4) used to rely heavily on
> power-on default register values.  The engineering samples I have
> do not show instabilities, so it wouldn't be easy to identify the issue.
> 
> Probably the first step to address the issue would be identifying
> those chips and narrowing down the scope of guessing.  Would you
> show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> output is useless here since RealTek uses the same PCI ID for
> PCIe variants.
> 
> BTW, I was told that the vendor's FreeBSD driver seems to work fine
> for normal usage patterns.  The vendor's driver triggered an instant
> panic and lacked H/W offloading features in the past.  It might
> have changed though.

The problem with re(4) drivers arose again when I bought some "green"
equipment, mainly switches, which reduce power consumption on short cables or
unconnected ports. This brought down some servers with re(4) chipsets
immediately and I had no clue what happened. I do not know whether this is a
single case, so to speak, or whether this problem will arise for others, too.
On server hardware we exchanged all Realtek NICs for ones from Intel, and
luckily some server mainboards already have Intel PHYs or NICs. The Broadcom
devices we have on some older Fujitsu hardware are also stable, working like a
charm, even with the new power-saving switches.

While we can swap the NIC on server or workstation platforms, it is almost
impossible on laptops, and the number of laptops with Realtek chips seems to
grow. It is a pity that the vendor of the chipsets refuses to support OSes
other than Windows - or in some rare cases only Linux. After you wrote your
answer, I checked the net for suitable drivers, and the situation seems bad
for almost all OSes apart from commercial ones like Windooze and Apple OS X.

As soon as I get my hands on the laptop again, I'll send the requested
information. I know that I played around with re(4) and rgephy(4) in the
kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
except that it offered some additional "media xxx-options-xxx", mostly
appended with "flow" - but trying those also brought the system down, as
did plugging or unplugging. The last kernel I compiled was then without
rgephy(4) - the NIC worked as expected, but plugging/unplugging or some
power-down activity on a Netgear SoHo green-power switch brings the system
down as usual.


Re: CURRENT: re(4) crashing system

2016-10-23 Thread YongHyeon PYUN
On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> I tried to report earlier here that CURRENT has some serious
> problems right now, and one of those problems seems to be triggered by
> the recent re(4) driver. The problem is also present in recent 11-STABLE!
> 
> Below, you'll find pciconf output regarding the device on a Lenovo E540
> laptop I can test on and trigger the problem with.
> 
> The phenomenon is that this NIC does not negotiate 1000baseTX; it
> always falls back to 100baseTX although the device claims to be a 1
> GBit capable device.
> 
> When I try to put the device manually into 1000baseTX mode via
> 
> ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> 
> it is possible to crash the system. The system also crashes when
> plugging/unplugging the LAN cord - I guess the renegotiation is
> triggering this crash immediately.
> 
> I tried with several switches and routers capable of 1 GBit and it
> seems to be independent of the network hardware in use.
> 
> I tried to capture a backtrace when the kernel crashes, but I do not
> know how to save the kernel debugger output. Although I configured
> debugging according to the handbook, there is no coredump at all.
> 
> Advice is appreciated - if anybody is interested in solving this.
> 

There were several instability reports on re(4).  I vaguely guess
it would be related to some missing initializations for certain
controllers.  Unfortunately, there is no publicly available
datasheet for those controllers and it's not likely I'll get access
to it in the near future.  It seems the vendor's FreeBSD driver
accesses lots of magic registers as well as loading DSP fixups.  I
have no idea what it wants to do, and re(4) used to rely heavily on
power-on default register values.  The engineering samples I have
do not show instabilities, so it wouldn't be easy to identify the issue.

Probably the first step to address the issue would be identifying
those chips and narrowing down the scope of guessing.  Would you
show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
output is useless here since RealTek uses the same PCI ID for
PCIe variants.

BTW, I was told that the vendor's FreeBSD driver seems to work fine
for normal usage patterns.  The vendor's driver triggered an instant
panic and lacked H/W offloading features in the past.  It might
have changed though.
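A minimal sketch for collecting just the requested re(4)/rgephy(4) lines, assuming a standard FreeBSD install where the boot messages are kept in /var/run/dmesg.boot. nic_lines is a hypothetical helper name, and the pattern simply keeps lines that begin with an re(4) or rgephy(4) device name followed by a colon.

```sh
#!/bin/sh
# Hypothetical helper: print only re(4) and rgephy(4) probe/attach lines
# from boot messages read on standard input.
nic_lines() {
	grep -E '^(re|rgephy)[0-9]+:'
}

# Usage (commented out so the file can be sourced safely):
#   nic_lines < /var/run/dmesg.boot
#   dmesg | nic_lines
```

Reading /var/run/dmesg.boot rather than the live message buffer avoids losing the attach lines once the kernel buffer has wrapped.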