Re: CURRENT: re(4) crashing system

2016-10-24 Thread Hartmann, O.
On Tue, 25 Oct 2016 11:05:38 +0900
YongHyeon PYUN  wrote:

> On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> > On Mon, 24 Oct 2016 14:14:00 +0900
> > YongHyeon PYUN  wrote:
> >   
> > > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:  
> > > > I tried to report earlier here that CURRENT does have some
> > > > serious problems right now and one of those problems seems to
> > > > be triggered by the recent re(4) driver. The problem is also
> > > > present in recent 11-STABLE!
> > > > 
> > > > Below, you'll find pciconf output regarding the device on a
> > > > Lenovo E540 Laptop I can test on and trigger the problem.
> > > > 
> > > > The phenomenon is that this NIC does not negotiate 1000baseTX,
> > > > it is always falling back to 100baseTX although the device
> > > > claims to be a 1 GBit capable device.
> > > > 
> > > > When I try to put the device manually into 1000baseTX mode via
> > > > 
> > > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4)
> > > > driver)
> > > > 
> > > > it is possible to crash the system. The system also crashes when
> > > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > > triggering this crash immediately.
> > > > 
> > > > I tried with several switches and routers capable of 1 GBit and
> > > > it seems to be independent from the network hardware in use.
> > > > 
> > > > I tried to capture a backtrace when the kernel crashes, but I
> > > > do not know how to save the kernel debugger output.
> > > > Although I configured debugging according to the handbook,
> > > > there is no coredump at all.
> > > > 
> > > > Advice is appreciated - if anybody is interested in solving
> > > > this.
> > > 
> > > There were several instability reports on re(4).  I vaguely guess
> > > it would be related with some missing initializations for certain
> > > controllers.  Unfortunately, there is no publicly available
> > > datasheet for those controllers and it's not likely to get access
> > > to it in near future.  It seems vendor's FreeBSD driver accesses
> > > lots of magic registers as well as loading DSP fixups.  I have no
> > > idea what it wants to do and re(4) used to heavily rely on
> > > power-on default register values.  Engineering samples I have do
> > > not show instabilities so it wouldn't be easy to identify the
> > > issue.
> > > 
> > > Probably the first step to address the issue would be identifying
> > > those chips and narrowing down the scope of guessing.  Would you
> > > show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> > > output is useless here since RealTek uses the same PCI id for
> > > PCIe variants.
> > > 
> > > BTW, I was told that the vendor's FreeBSD driver seems to work
> > > fine for normal usage pattern.  The vendor's driver triggered an
> > > instant panic and lacked H/W offloading features in the past.  It
> > > might have changed though.  
> > 
> > The problems with re(4) drivers arose again when I bought some
> > "green" equipment, mainly switches, which reduce power emission on
> > short cables or non-connected ports. This brought down some servers
> > with re(4) chipsets immediately and I had no clue what happened. I
> > do not know whether this is a
> 
> I'm not sure, but it's likely the issue is related to EEE/Green
> Ethernet handling. EEE is a feature negotiated with the link
> partner. If you connect your laptop directly, without switches, to
> a non-EEE-capable link partner such as another re(4) box, you may be
> able to tell whether the issue is EEE/Green Ethernet related or not.

I'm not sure either. When I first discovered a problem with CURRENT,
on the Friday before last week's Friday, there was an unlucky
coincidence: I got the new switch, FreeBSD introduced a serious bug,
and I changed the NICs.

The laptop, the last in the row of re(4)-equipped systems on which I
use the Realtek NIC, now does well with Green IT technology, but
crashes on plugging/unplugging - not on each event, but at least one
time in ten.
I guess the Green IT issue was more an unlucky guess of mine that went
hand in hand with the problem I face with CURRENT right now on some
older, non-UEFI machines.

> 
> > single fate so to speak, or this problem will arise for others,
> > too. On server hardware we replaced all Realtek NICs with ones
> > from Intel, and luckily some server mainboards already have Intel
> > PHYs or NICs. The Broadcom devices we have on some older Fujitsu
> > hardware are also stable and work like a charm, even with the new
> > power saving switches.
> 
> bge(4) also lacks EEE support (the publicly available datasheet is
> too sanitized).  bge(4) firmware probably does not announce EEE
> capability by default in link establishment, while recent re(4)
> devices seem to unconditionally announce EEE.  Generally, EEE
> handling requires a kind of handshake for link state changes from
> the MAC/PHY.
> 
> > While we can swap on server or workstation platforms the NIC, it 

Re: CURRENT: re(4) crashing system

2016-10-24 Thread YongHyeon PYUN
On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> On Mon, 24 Oct 2016 14:14:00 +0900
> YongHyeon PYUN  wrote:
> 
> > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > I tried to report earlier here that CURRENT does have some serious
> > > problems right now and one of those problems seems to be triggered by
> > > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > > 
> > > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > > Laptop I can test on and trigger the problem.
> > > 
> > > The phenomenon is that this NIC does not negotiate 1000baseTX, it is
> > > always falling back to 100baseTX although the device claims to be a 1
> > > GBit capable device.
> > > 
> > > When I try to put the device manually into 1000baseTX mode via
> > > 
> > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > > 
> > > it is possible to crash the system. The system also crashes when
> > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > triggering this crash immediately.
> > > 
> > > I tried with several switches and routers capable of 1 GBit and it
> > > seems to be independent from the network hardware in use.
> > > 
> > > I tried to capture a backtrace when the kernel crashes, but I do not
> > > know how to save the kernel debugger output. Although I configured
> > > debugging according to the handbook, there is no coredump at all.
> > > 
> > > Advice is appreciated - if anybody is interested in solving this.
> > >   
> > 
> > There were several instability reports on re(4).  I vaguely guess
> > it would be related with some missing initializations for certain
> > controllers.  Unfortunately, there is no publicly available
> > datasheet for those controllers and it's not likely to get access
> > to it in near future.  It seems vendor's FreeBSD driver accesses
> > lots of magic registers as well as loading DSP fixups.  I have no
> > idea what it wants to do and re(4) used to heavily rely on power-on
> > default register values.  Engineering samples I have do not show
> > instabilities so it wouldn't be easy to identify the issue.
> > 
> > Probably the first step to address the issue would be identifying
> > those chips and narrowing down the scope of guessing.  Would you
> > show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> > output is useless here since RealTek uses the same PCI id for
> > PCIe variants.
> > 
> > BTW, I was told that the vendor's FreeBSD driver seems to work fine
> > for normal usage pattern.  The vendor's driver triggered an instant
> > panic and lacked H/W offloading features in the past.  It might
> > have changed though.
> 
> The problems with re(4) drivers arose again when I bought some "green"
> equipment, mainly switches, which reduce power emission on short cables or
> non-connected ports. This brought down some servers with re(4) chipsets
> immediately and I had no clue what happened. I do not know whether this is a

I'm not sure, but it's likely the issue is related to EEE/Green
Ethernet handling. EEE is a feature negotiated with the link
partner. If you connect your laptop directly, without switches, to
a non-EEE-capable link partner such as another re(4) box, you may be
able to tell whether the issue is EEE/Green Ethernet related or not.

> single fate so to speak, or this problem will arise for others, too. On
> server hardware we replaced all Realtek NICs with ones from Intel, and
> luckily some server mainboards already have Intel PHYs or NICs. The Broadcom
> devices we have on some older Fujitsu hardware are also stable and work like
> a charm, even with the new power saving switches.
> 

bge(4) also lacks EEE support (the publicly available datasheet is
too sanitized).  bge(4) firmware probably does not announce EEE
capability by default in link establishment, while recent re(4)
devices seem to unconditionally announce EEE.  Generally, EEE
handling requires a kind of handshake for link state changes from
the MAC/PHY.

> While we can swap the NIC on server or workstation platforms, it is almost
> impossible on laptops, and the number of laptops with Realtek chips seems to
> grow. It is a pity that the vendor of the chipsets refuses to support OSes
> other than Windows - or in some rare cases only Linux. After you wrote the
> answer, I checked the net for suitable drivers, and the situation seems bad
> for almost all OSes apart from commercial ones like Windooze and Apple OS X.
> 
> As soon as I get my hands on the laptop again, I'll send the requested
> information. I know that I played around with re(4) and rgephy(4) in the
> kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
> except that it offered some additional "media xxx-options-xxx", mostly
> appended with "flow" - but trying those also brought down the system, just
> like plugging or unplugging.

rgephy(4) will show the recognized PHY H/W model. Another piece of
information I'd like to know is OUI 

WRITE_FPDMA_QUEUED error when installing on MBP 2014

2016-10-24 Thread Lundberg, Johannes
Hi All

I read that people have successfully installed FreeBSD on 2014 MacBook Pros.

I just got a used machine (in excellent shape) and am trying to install
FreeBSD CURRENT from a USB memory stick alongside OS X and Linux.

Every time, unpacking fails with a WRITE_FPDMA_QUEUED timeout error.

I'm worried that the SSD may be damaged, so I would like to confirm whether
anyone knows of any known issues like this. The error is reported for ada
and not da, but could it be a source error (USB memory, downloaded memstick
image)?

A 'dd if=/dev/sda of=/dev/null' shows no errors on Linux. Will try FreeBSD
next.
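
For reference, the same kind of read test on FreeBSD could look like the sketch below. It is only an illustration: the helper name read_test and the device name /dev/ada0 are assumptions, so check which ada device is actually the internal SSD before running it. A clean sequential read does not rule out write-side problems such as the WRITE_FPDMA_QUEUED timeouts.

```shell
#!/bin/sh
# Sequentially read a whole device (or file) and discard the data.
# Any read error makes dd exit non-zero, which is what we check for.
read_test() {
    dev="${1:?usage: read_test /dev/adaN}"
    # bs is given in bytes (1 MiB) so the same line works with both
    # FreeBSD dd and GNU dd.
    if dd if="$dev" of=/dev/null bs=1048576; then
        echo "read of $dev completed without errors"
    else
        echo "read of $dev FAILED" >&2
        return 1
    fi
}

# Example (assumed device name, run as root; not executed here):
#   read_test /dev/ada0
```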

Grateful for any feedback.

Thanks!

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: r307877: buildkernel fails: x86/cpu_machdep.c:564:1: error: function definition is not allowed here {

2016-10-24 Thread Konstantin Belousov
On Mon, Oct 24, 2016 at 10:01:48PM +0200, Hartmann, O. wrote:
> r307877 fails to buildkernel with the error shown below:
> 
> [...]
> /usr/src/sys/x86/x86/cpu_machdep.c:564:1: error: function definition is
> not allowed here {
> ^
> /usr/src/sys/x86/x86/cpu_machdep.c:574:2: error: expected '}'
> }
>  ^
> /usr/src/sys/x86/x86/cpu_machdep.c:541:1: note: to match this '{'
> {
> ^
> 2 errors generated.

Should be fixed in r307880.
Thank you for the report.


Re: SVN r307866 compilation problem

2016-10-24 Thread Konstantin Belousov
On Mon, Oct 24, 2016 at 02:58:43PM -0400, Michael Butler wrote:
> It seems that compilation of -current fails in the case that KDB is not 
> defined.
> 
> I'm assuming that the following diff achieves what was intended:
> 
> imb@vm01:/usr/src/sys/x86/x86> svn diff
> Index: cpu_machdep.c
> ===
> --- cpu_machdep.c   (revision 307875)
> +++ cpu_machdep.c   (working copy)
> @@ -540,9 +540,9 @@
>   nmi_call_kdb(u_int cpu, u_int type, struct trapframe *frame, bool do_panic)
>   {
> 
> +#ifdef KDB
>  /* machine/parity/power fail/"kitchen sink" faults */
>  if (isa_nmi(frame->tf_err) == 0) {
> -#ifdef KDB
>  /*
>   * NMI can be hooked up to a pushbutton for debugging.
>   */
Um, no.  isa_nmi() should be checked and panic avoided regardless
of the panic_on_nmi setting, if no hw error was reported.  It is
#endif that was misplaced.

This and another change are committed as r307880.

Thank you for the report.


r307877: buildkernel fails: x86/cpu_machdep.c:564:1: error: function definition is not allowed here {

2016-10-24 Thread Hartmann, O.
r307877 fails to buildkernel with the error shown below:

[...]
/usr/src/sys/x86/x86/cpu_machdep.c:564:1: error: function definition is
not allowed here {
^
/usr/src/sys/x86/x86/cpu_machdep.c:574:2: error: expected '}'
}
 ^
/usr/src/sys/x86/x86/cpu_machdep.c:541:1: note: to match this '{'
{
^
2 errors generated.
*** Error code 1


SVN r307866 compilation problem

2016-10-24 Thread Michael Butler
It seems that compilation of -current fails in the case that KDB is not 
defined.


I'm assuming that the following diff achieves what was intended:

imb@vm01:/usr/src/sys/x86/x86> svn diff
Index: cpu_machdep.c
===
--- cpu_machdep.c   (revision 307875)
+++ cpu_machdep.c   (working copy)
@@ -540,9 +540,9 @@
 nmi_call_kdb(u_int cpu, u_int type, struct trapframe *frame, bool do_panic)
 {

+#ifdef KDB
 	/* machine/parity/power fail/"kitchen sink" faults */
 	if (isa_nmi(frame->tf_err) == 0) {
-#ifdef KDB
 		/*
 		 * NMI can be hooked up to a pushbutton for debugging.
 		 */


Re: FreeBSD 11.x grinds to a halt after about 48h of uptime

2016-10-24 Thread Ulrich Spörlein
On Sat, 2016-10-15 at 09:36:27 -0700, Kevin Oberman wrote:
> On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky 
> wrote:
> 
> > On 10/15/16 18:18, Ulrich Spörlein wrote:
> >
> >> Hey all, while 11.x is -STABLE now, this has been happening to my machine
> >> ever since I upgraded it to 11-CURRENT years ago. I have no idea when this
> >> started, actually, but what always happens is this:
> >>
> >> - System and X11 are up and running; I keep them running overnight as I'm
> >> too lazy to reboot and restart everything.
> >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> >> programs running
> >> - Coming back to the machine the next day (or the day after) it will
> >> exit the screensaver just fine and then either I can use it for a couple
> >> of seconds before it freezes, or it's pretty much dead already. The
> >> mouse cursor still moves for a bit, but then also freezes (so is this a
> >> GPU problem??)
> >>
> >> Now what I currently see on the screen is a clock widget stuck at 18:04
> >> but conky itself has last updated at 18:00:18 ...
> >>
> >> This time I had some SSH sessions from another machine to see some more
> >> useful things. There was nothing in various logs under /var/log (I also
> >> can't run dmesg anymore ...)
> >> I had top(1) running in a loop, this is the last output:
> >>
> >> last pid: 25633;  load averages:  0.27,  0.39,  0.36  up 1+23:03:28  18:00:12
> >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> >>
> >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> >>
> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> >>    11 root        8 155 ki31     0K   128K CPU0     0 364.6H 772.95% idle
> >>  3122 uqs        15  28    0  7113M  5861M uwait    0  94:44  13.96% chrome
> >>  2887 uqs        28  22    0  1394M   237M select   2 172:53   6.98% chrome
> >>  2890 uqs        11  21    0  1034M   178M select   5 231:21   1.95% chrome
> >>  1062 root        9  21    0   440M 47220K select   0  67:09   0.98% Xorg
> >>  3002 uqs        15  25    5  1159M   172M uwait    2  19:09   0.00% chrome
> >>  3139 uqs        17  25    5  1163M   156M uwait    2  16:15   0.00% chrome
> >>  3001 uqs        18  25    5  1639M   575M uwait    0  16:05   0.00% chrome
> >>    12 root       24 -64    -     0K   384K WAIT    -1  10:53   0.00% intr
> >>  3129 uqs        12  20    0  2820M  1746M uwait    6   8:36   0.00% chrome
> >>  2822 uqs         9  20    0   217M 81300K select   0   5:10   0.00% conky
> >>  3174 root        1  20    0 21532K  3188K select   0   4:20   0.00% systat
> >>  3130 uqs        16  20    0  1058M   131M uwait    4   3:03   0.00% chrome
> >>  2998 uqs        16  20    0  1110M   123M uwait    2   2:53   0.00% chrome
> >>  3165 uqs        10  20    0  1209M   215M uwait    6   2:52   0.00% chrome
> >>  3142 uqs        11  25    5  1344M   195M uwait    2   2:46   0.00% chrome
> >>  2876 uqs        19  20    0   580M 37164K select   3   2:42   0.00% clementine-player
> >>    20 root        2 -16    -     0K    32K psleep   6   2:25   0.00% pagedaemon
> >>
> >> I also had systat -vm running and it continued to update its screen ...
> >> for a short while, this is the last update before SSH died:
> >>
> >>
> >>Mem usage:  0k%Phy  5%Kmem
> >> Mem: KBREALVIRTUAL  VN PAGER   SWAP
> >> PAGER
> >> Tot   Share  TotShareFree   in   out in
> >>  out
> >> Act  11051k   67868 71051992   255448   61840  count
> >> All  11051k   67924 71058776   262100  pages
> >> Proc:
> >> Interrupts
> >>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Fltioflt   224
> >> total
> >>  25 730  11   724  109  404  101   13 cow   2
> >> ehci0 16
> >>   zfod  3
> >> ehci1 23
> >>  0.0%Sys   0.1%Intr  0.0%User  0.0%Nice 99.9%Idle ozfod16
> >> cpu0:timer
> >> ||||||||||   %ozfod
> >>  xhci0 264
> >>   daefr 3 em0
> >> 265
> >> 50 dtbuf  prcfr94
> >> hdac1 266
> >> Namei Name-cache   Dir-cache349167 desvn  totfr
> >>  ahci0 270
> >>Callshits   %hits   %349155 numvn  react 5
> >> cpu1:timer
> >>  121 121 100253501 frevn  pdwak 1
> >> cpu2:timer
> >> 

Re: CURRENT: re(4) crashing system

2016-10-24 Thread O. Hartmann
On Mon, 24 Oct 2016 14:14:00 +0900
YongHyeon PYUN  wrote:

> On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > I tried to report earlier here that CURRENT does have some serious
> > problems right now and one of those problems seems to be triggered by
> > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > 
> > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > Laptop I can test on and trigger the problem.
> > 
> > The phenomenon is that this NIC does not negotiate 1000baseTX, it is
> > always falling back to 100baseTX although the device claims to be a 1
> > GBit capable device.
> > 
> > When I try to put the device manually into 1000baseTX mode via
> > 
> > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > 
> > it is possible to crash the system. The system also crashes when
> > plugging/unplugging the LAN cord - I guess the renegotiation is
> > triggering this crash immediately.
> > 
> > I tried with several switches and routers capable of 1 GBit and it
> > seems to be independent from the network hardware in use.
> > 
> > I tried to capture a backtrace when the kernel crashes, but I do not
> > know how to save the kernel debugger output. Although I configured
> > debugging according to the handbook, there is no coredump at all.
> > 
> > Advice is appreciated - if anybody is interested in solving this.
> >   
> 
> There were several instability reports on re(4).  I vaguely guess
> it would be related with some missing initializations for certain
> controllers.  Unfortunately, there is no publicly available
> datasheet for those controllers and it's not likely to get access
> to it in near future.  It seems vendor's FreeBSD driver accesses
> lots of magic registers as well as loading DSP fixups.  I have no
> idea what it wants to do and re(4) used to heavily rely on power-on
> default register values.  Engineering samples I have do not show
> instabilities so it wouldn't be easy to identify the issue.
> 
> Probably the first step to address the issue would be identifying
> those chips and narrowing down the scope of guessing.  Would you
> show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> output is useless here since RealTek uses the same PCI id for
> PCIe variants.
> 
> BTW, I was told that the vendor's FreeBSD driver seems to work fine
> for normal usage pattern.  The vendor's driver triggered an instant
> panic and lacked H/W offloading features in the past.  It might
> have changed though.

The problems with re(4) drivers arose again when I bought some "green"
equipment, mainly switches, which reduce power emission on short cables or
non-connected ports. This brought down some servers with re(4) chipsets
immediately and I had no clue what happened. I do not know whether this is a
single fate so to speak, or whether this problem will arise for others, too.
On server hardware we replaced all Realtek NICs with ones from Intel, and
luckily some server mainboards already have Intel PHYs or NICs. The Broadcom
devices we have on some older Fujitsu hardware are also stable and work like
a charm, even with the new power saving switches.

While we can swap the NIC on server or workstation platforms, it is almost
impossible on laptops, and the number of laptops with Realtek chips seems to
grow. It is a pity that the vendor of the chipsets refuses to support OSes
other than Windows - or in some rare cases only Linux. After you wrote the
answer, I checked the net for suitable drivers, and the situation seems bad
for almost all OSes apart from commercial ones like Windooze and Apple OS X.

As soon as I get my hands on the laptop again, I'll send the requested
information. I know that I played around with re(4) and rgephy(4) in the
kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
except that it offered some additional "media xxx-options-xxx", mostly
appended with "flow" - but trying those also brought down the system, just
like plugging or unplugging.
The last kernel I compiled was then without rgephy(4) - the NIC worked as
expected, but plugging/unplugging or some power-down activity on a
Netgear SoHo green-power switch brings the system down as usual.
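
For anyone collecting the dmesg lines requested in this thread, a small filter along the following lines may help. It is a minimal sketch rather than anything posted by the participants: the function name filter_nic_lines is made up for illustration, and it assumes that the relevant kernel messages start with the device name followed by a colon (re0:, rgephy0:), as FreeBSD driver messages usually do.

```shell
#!/bin/sh
# Keep only kernel messages emitted by the re(4) NIC driver and the
# rgephy(4) PHY driver, i.e. lines starting with "re<N>:" or "rgephy<N>:".
# Reads dmesg-style text on stdin and writes matching lines to stdout.
filter_nic_lines() {
    grep -E '^(re|rgephy)[0-9]+:'
}

# Typical use on the affected machine (not executed here):
#   dmesg | filter_nic_lines
#   filter_nic_lines < /var/run/dmesg.boot
```

The second form reads the boot-time message buffer that FreeBSD saves in /var/run/dmesg.boot, which is useful when the attach messages have already scrolled out of the live buffer.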