Re: CURRENT: re(4) crashing system
On Tue, 25 Oct 2016 11:05:38 +0900 YongHyeon PYUN wrote:

> On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> > On Mon, 24 Oct 2016 14:14:00 +0900 YongHyeon PYUN wrote:
> > > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > > I tried to report earlier here that CURRENT does have some
> > > > serious problems right now and one of those problems seems to
> > > > be triggered by the recent re(4) driver. The problem is also
> > > > present in recent 11-STABLE!
> > > >
> > > > Below, you'll find pciconf output regarding the device on a
> > > > Lenovo E540 laptop I can test on and trigger the problem.
> > > >
> > > > The phenomenon is that this NIC does not negotiate 1000baseTX;
> > > > it always falls back to 100baseTX although the device
> > > > claims to be 1 GBit capable.
> > > >
> > > > When I try to put the device manually into 1000baseTX mode via
> > > >
> > > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > > >
> > > > it is possible to crash the system. The system also crashes when
> > > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > > triggering this crash immediately.
> > > >
> > > > I tried with several switches and routers capable of 1 GBit and
> > > > it seems to be independent of the network hardware in use.
> > > >
> > > > I tried to capture a backtrace when the kernel crashes, but I
> > > > do not know how to save the kernel debugger output.
> > > > Although I configured debugging according to the handbook, there
> > > > is no coredump at all.
> > > >
> > > > Advice is appreciated - if anybody is interested in solving
> > > > this.
> > >
> > > There were several instability reports on re(4). I vaguely guess
> > > it is related to some missing initializations for certain
> > > controllers. Unfortunately, there is no publicly available
> > > datasheet for those controllers and access to one is not likely
> > > in the near future. It seems the vendor's FreeBSD driver accesses
> > > lots of magic registers as well as loading DSP fixups. I have no
> > > idea what it wants to do, and re(4) used to rely heavily on
> > > power-on default register values. The engineering samples I have
> > > do not show instabilities, so it won't be easy to identify the
> > > issue.
> > >
> > > Probably the first step to address the issue would be identifying
> > > those chips and narrowing down the scope of guessing. Would you
> > > show me the dmesg output (re(4) and rgephy(4) only)? pciconf(8)
> > > output is useless here since RealTek uses the same PCI ID for
> > > PCIe variants.
> > >
> > > BTW, I was told that the vendor's FreeBSD driver seems to work
> > > fine for normal usage patterns. The vendor's driver triggered an
> > > instant panic and lacked H/W offloading features in the past. It
> > > might have changed though.
> >
> > The problems with re(4) drivers arose again when I bought some
> > "green" equipment, mainly switches, which reduce power emission on
> > short cables or non-connected ports. This brought down some servers
> > with re(4) chipsets immediately and I had no clue what happened. I
> > do not know whether this is an
>
> I'm not sure, but it's likely the issue is related to EEE/Green
> Ethernet handling. EEE is a feature negotiated with the link partner.
> If you directly connect your laptop to a non-EEE-capable link partner,
> like another re(4) box, without switches, you may be able to tell
> whether the issue is EEE/Green Ethernet related or not.

Neither am I, since when I discovered a problem with CURRENT for the
first time, that was the Friday before last week's Friday, there was an
unlucky coincidence: I got the new switch, FreeBSD introduced a serious
bug, and I changed the NICs. The laptop, the last in the row of
re(4)-equipped systems on which I use the Realtek NIC, does well now
with Green IT technology, but crashes on plugging/unplugging - not on
each event, but at least one time in ten. I guess the Green IT issue is
more an unlucky guess of mine and went hand in hand with the problem I
face with CURRENT right now on some older, non-UEFI machines.

> > isolated case, so to speak, or whether this problem will arise for
> > others, too. On server hardware, we replaced all Realtek NICs with
> > those from Intel, and luckily some server mainboards already have
> > Intel PHYs or NICs. The Broadcom devices we have on some older
> > Fujitsu hardware are also stable like a charm, even with the new
> > power-saving switches.
>
> bge(4) also lacks EEE support (the publicly available datasheet is
> too sanitized). bge(4) firmware probably does not announce EEE
> capability by default in link establishment, while recent re(4)
> devices seem to announce EEE unconditionally. Generally, EEE
> handling requires a kind of handshake for link state changes from
> MAC/PHY.
>
> > While we can swap the NIC on server or workstation platforms, it
Re: CURRENT: re(4) crashing system
On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> On Mon, 24 Oct 2016 14:14:00 +0900 YongHyeon PYUN wrote:
> > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > I tried to report earlier here that CURRENT does have some serious
> > > problems right now and one of those problems seems to be triggered by
> > > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > >
> > > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > > laptop I can test on and trigger the problem.
> > >
> > > The phenomenon is that this NIC does not negotiate 1000baseTX; it
> > > always falls back to 100baseTX although the device claims to be
> > > 1 GBit capable.
> > >
> > > When I try to put the device manually into 1000baseTX mode via
> > >
> > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > >
> > > it is possible to crash the system. The system also crashes when
> > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > triggering this crash immediately.
> > >
> > > I tried with several switches and routers capable of 1 GBit and it
> > > seems to be independent of the network hardware in use.
> > >
> > > I tried to capture a backtrace when the kernel crashes, but I do not
> > > know how to save the kernel debugger output. Although I configured
> > > debugging according to the handbook, there is no coredump at all.
> > >
> > > Advice is appreciated - if anybody is interested in solving this.
> >
> > There were several instability reports on re(4). I vaguely guess
> > it is related to some missing initializations for certain
> > controllers. Unfortunately, there is no publicly available
> > datasheet for those controllers and access to one is not likely
> > in the near future. It seems the vendor's FreeBSD driver accesses
> > lots of magic registers as well as loading DSP fixups. I have no
> > idea what it wants to do, and re(4) used to rely heavily on power-on
> > default register values. The engineering samples I have do not show
> > instabilities, so it won't be easy to identify the issue.
> >
> > Probably the first step to address the issue would be identifying
> > those chips and narrowing down the scope of guessing. Would you
> > show me the dmesg output (re(4) and rgephy(4) only)? pciconf(8)
> > output is useless here since RealTek uses the same PCI ID for
> > PCIe variants.
> >
> > BTW, I was told that the vendor's FreeBSD driver seems to work fine
> > for normal usage patterns. The vendor's driver triggered an instant
> > panic and lacked H/W offloading features in the past. It might
> > have changed though.
>
> The problems with re(4) drivers arose again when I bought some "green"
> equipment, mainly switches, which reduce power emission on short cables or
> non-connected ports. This brought down some servers with re(4) chipsets
> immediately and I had no clue what happened. I do not know whether this is an

I'm not sure, but it's likely the issue is related to EEE/Green
Ethernet handling. EEE is a feature negotiated with the link partner.
If you directly connect your laptop to a non-EEE-capable link partner,
like another re(4) box, without switches, you may be able to tell
whether the issue is EEE/Green Ethernet related or not.

> isolated case, so to speak, or whether this problem will arise for others,
> too. On server hardware, we replaced all Realtek NICs with those from Intel,
> and luckily some server mainboards already have Intel PHYs or NICs. The
> Broadcom devices we have on some older Fujitsu hardware are also stable like
> a charm, even with the new power-saving switches.

bge(4) also lacks EEE support (the publicly available datasheet is too
sanitized). bge(4) firmware probably does not announce EEE
capability by default in link establishment, while recent re(4)
devices seem to announce EEE unconditionally. Generally, EEE
handling requires a kind of handshake for link state changes from
MAC/PHY.

> While we can swap the NIC on server or workstation platforms, it is almost
> impossible on laptops, and the number of laptops with Realtek chips seems to
> grow. It is a pity that the vendor of the chipsets refuses to support OSes
> other than Windows - or in some rare cases only Linux. After you wrote the
> answer, I checked on the net for suitable drivers and the situation seems bad
> for almost all OSes apart from commercial ones like Windooze and Apple OS X.
>
> As soon as I get my hands on the laptop again, I'll send the requested
> information. I know that I played around with re(4) and rgephy(4) in the
> kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
> except that it offered some additional "media xxx-options-xxx" mostly appended
> with "flow" - but trying them also brought down the system, just as plugging
> or unplugging does.

rgephy(4) will show the recognized PHY H/W model. Another piece of information I'd like to know is OUI
WRITE_FPDMA_QUEUED error when installing on MBP 2014
Hi All,

I read that people have successfully installed FreeBSD on 2014 MacBook Pros. I just got a used machine (in excellent shape) and am trying to install FreeBSD-CURRENT from USB memory alongside OS X and Linux.

Every time, unpacking fails with a WRITE_FPDMA_QUEUED timeout error. I'm worried that the SSD may be damaged, so I'd like to confirm: does anyone know of any known issues like this? The error is reported for ada and not da, but could it be a source error (USB memory, downloaded memstick image)?

A 'dd if=/dev/sda of=/dev/null' shows no error on Linux. Will try FreeBSD next.

Grateful for any feedback. Thanks!
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: r307877: buildkernel fails: x86/cpu_machdep.c:564:1: error: function definition is not allowed here {
On Mon, Oct 24, 2016 at 10:01:48PM +0200, Hartmann, O. wrote:
> r307877 fails to buildkernel with the error shown below:
>
> [...]
> /usr/src/sys/x86/x86/cpu_machdep.c:564:1: error: function definition is not allowed here
> {
> ^
> /usr/src/sys/x86/x86/cpu_machdep.c:574:2: error: expected '}'
> }
>  ^
> /usr/src/sys/x86/x86/cpu_machdep.c:541:1: note: to match this '{'
> {
> ^
> 2 errors generated.

Should be fixed in r307880. Thank you for the report.
Re: SVN r307866 compilation problem
On Mon, Oct 24, 2016 at 02:58:43PM -0400, Michael Butler wrote:
> It seems that compilation of -current fails in the case that KDB is not
> defined.
>
> I'm assuming that the following diff achieves what was intended:
>
> imb@vm01:/usr/src/sys/x86/x86> svn diff
> Index: cpu_machdep.c
> ===
> --- cpu_machdep.c (revision 307875)
> +++ cpu_machdep.c (working copy)
> @@ -540,9 +540,9 @@
>  nmi_call_kdb(u_int cpu, u_int type, struct trapframe *frame, bool do_panic)
>  {
>
> +#ifdef KDB
>  	/* machine/parity/power fail/"kitchen sink" faults */
>  	if (isa_nmi(frame->tf_err) == 0) {
> -#ifdef KDB
>  		/*
>  		 * NMI can be hooked up to a pushbutton for debugging.
>  		 */

Um, no. isa_nmi() should be checked, and the panic avoided, regardless of the panic_on_nmi setting if no hw error was reported. It is the #endif that was misplaced. This and another change are committed as r307880. Thank you for the report.
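To illustrate the point about the misplaced #endif: when an #ifdef opens inside a block but its #endif lands on the far side of the closing brace, a build without KDB loses that brace, the function body never closes, and the compiler can end up reporting the next function as "function definition is not allowed here". The sketch below is not the code committed in r307880; it is a minimal stand-alone example, and every name in it (sim_isa_nmi, sim_nmi_handler, sim_panics, sim_debugger_calls) is a hypothetical stand-in rather than a kernel interface. It only shows the shape described in the reply: the hardware check stays unconditional, and the KDB-only part is bracketed by an #ifdef/#endif pair at a single brace level.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for illustration only; these are not the
 * kernel's isa_nmi(), kdb_trap(), or panic().
 */
int sim_panics;          /* number of simulated panics */
int sim_debugger_calls;  /* number of simulated debugger entries */

/* Assumption: a zero return means "no hardware error reported". */
int
sim_isa_nmi(int err)
{
	return (err);
}

void
sim_nmi_handler(int err, bool do_panic)
{
	/* The hardware check is compiled in with or without KDB. */
	if (sim_isa_nmi(err) == 0) {
#ifdef KDB
		/* Debugger-only code; #ifdef and #endif share one brace level. */
		sim_debugger_calls++;
#endif
	} else if (do_panic) {
		sim_panics++;
	}
}
```

Because both preprocessor lines sit between the same pair of braces, the braces balance whether or not KDB is defined, and a call with no hardware error never reaches the panic branch.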
r307877: buildkernel fails: x86/cpu_machdep.c:564:1: error: function definition is not allowed here {
r307877 fails to buildkernel with the error shown below:

[...]
/usr/src/sys/x86/x86/cpu_machdep.c:564:1: error: function definition is not allowed here
{
^
/usr/src/sys/x86/x86/cpu_machdep.c:574:2: error: expected '}'
}
 ^
/usr/src/sys/x86/x86/cpu_machdep.c:541:1: note: to match this '{'
{
^
2 errors generated.
*** Error code 1
SVN r307866 compilation problem
It seems that compilation of -current fails in the case that KDB is not defined.

I'm assuming that the following diff achieves what was intended:

imb@vm01:/usr/src/sys/x86/x86> svn diff
Index: cpu_machdep.c
===
--- cpu_machdep.c (revision 307875)
+++ cpu_machdep.c (working copy)
@@ -540,9 +540,9 @@
 nmi_call_kdb(u_int cpu, u_int type, struct trapframe *frame, bool do_panic)
 {

+#ifdef KDB
 	/* machine/parity/power fail/"kitchen sink" faults */
 	if (isa_nmi(frame->tf_err) == 0) {
-#ifdef KDB
 		/*
 		 * NMI can be hooked up to a pushbutton for debugging.
 		 */
Re: FreeBSD 11.x grinds to a halt after about 48h of uptime
On Sat, 2016-10-15 at 09:36:27 -0700, Kevin Oberman wrote:
> On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky wrote:
>
> > On 10/15/16 18:18, Ulrich Spörlein wrote:
> >
> >> Hey all, while 11.x is -STABLE now, this has happened to my machine ever
> >> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> >> started, actually, but what always happens is this:
> >>
> >> - System and X11 are up and running; I keep it running overnight as I'm
> >>   too lazy to reboot and restart everything.
> >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> >>   programs running
> >> - Coming back to the machine the next day (or the day after) it will
> >>   exit the screensaver just fine and then either I can use it for a couple
> >>   of seconds before it freezes, or it's pretty much dead already. The
> >>   mouse cursor still moves for a bit, but then also freezes (so is this a
> >>   GPU problem??)
> >>
> >> Now what I currently see on the screen is a clock widget stuck at 18:04
> >> but conky itself last updated at 18:00:18 ...
> >>
> >> This time I had some SSH sessions from another machine to see some more
> >> useful things. There was nothing in various logs under /var/log (I also
> >> can't run dmesg anymore ...)
> >> I had top(1) running in a loop, this is the last output:
> >>
> >> last pid: 25633;  load averages:  0.27,  0.39,  0.36  up 1+23:03:28  18:00:12
> >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> >>
> >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> >>
> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >>    11 root        8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
> >>  3122 uqs        15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
> >>  2887 uqs        28  22    0  1394M   237M select  2 172:53   6.98% chrome
> >>  2890 uqs        11  21    0  1034M   178M select  5 231:21   1.95% chrome
> >>  1062 root        9  21    0   440M 47220K select  0  67:09   0.98% Xorg
> >>  3002 uqs        15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
> >>  3139 uqs        17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
> >>  3001 uqs        18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
> >>    12 root       24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
> >>  3129 uqs        12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
> >>  2822 uqs         9  20    0   217M 81300K select  0   5:10   0.00% conky
> >>  3174 root        1  20    0 21532K  3188K select  0   4:20   0.00% systat
> >>  3130 uqs        16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
> >>  2998 uqs        16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
> >>  3165 uqs        10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
> >>  3142 uqs        11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
> >>  2876 uqs        19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
> >>    20 root        2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
> >>
> >> I also had systat -vm running and it continued to update its screen ...
> >> for a short while, this is the last update before SSH died:
> >>
> >>    Mem usage: 0k%Phy 5%Kmem
> >> Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
> >> Tot Share Tot Share Free in out in out
> >> Act 11051k 67868 71051992 255448 61840 count
> >> All 11051k 67924 71058776 262100 pages
> >> Proc: Interrupts
> >> r p d s w Csw Trp Sys Int Sof Flt ioflt 224 total
> >> 25 730 11 724 109 404 101 13 cow 2 ehci0 16
> >> zfod 3 ehci1 23
> >> 0.0%Sys 0.1%Intr 0.0%User 0.0%Nice 99.9%Idle ozfod 16 cpu0:timer
> >> |||||||||| %ozfod xhci0 264
> >> daefr 3 em0 265
> >> 50 dtbuf prcfr 94 hdac1 266
> >> Namei Name-cache Dir-cache 349167 desvn totfr ahci0 270
> >> Calls hits % hits % 349155 numvn react 5 cpu1:timer
> >> 121 121 100 253501 frevn pdwak 1 cpu2:timer
> >>
Re: CURRENT: re(4) crashing system
On Mon, 24 Oct 2016 14:14:00 +0900 YongHyeon PYUN wrote:

> On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > I tried to report earlier here that CURRENT does have some serious
> > problems right now and one of those problems seems to be triggered by
> > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> >
> > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > laptop I can test on and trigger the problem.
> >
> > The phenomenon is that this NIC does not negotiate 1000baseTX; it
> > always falls back to 100baseTX although the device claims to be
> > 1 GBit capable.
> >
> > When I try to put the device manually into 1000baseTX mode via
> >
> > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> >
> > it is possible to crash the system. The system also crashes when
> > plugging/unplugging the LAN cord - I guess the renegotiation is
> > triggering this crash immediately.
> >
> > I tried with several switches and routers capable of 1 GBit and it
> > seems to be independent of the network hardware in use.
> >
> > I tried to capture a backtrace when the kernel crashes, but I do not
> > know how to save the kernel debugger output. Although I configured
> > debugging according to the handbook, there is no coredump at all.
> >
> > Advice is appreciated - if anybody is interested in solving this.
>
> There were several instability reports on re(4). I vaguely guess
> it is related to some missing initializations for certain
> controllers. Unfortunately, there is no publicly available
> datasheet for those controllers and access to one is not likely
> in the near future. It seems the vendor's FreeBSD driver accesses
> lots of magic registers as well as loading DSP fixups. I have no
> idea what it wants to do, and re(4) used to rely heavily on power-on
> default register values. The engineering samples I have do not show
> instabilities, so it won't be easy to identify the issue.
>
> Probably the first step to address the issue would be identifying
> those chips and narrowing down the scope of guessing. Would you
> show me the dmesg output (re(4) and rgephy(4) only)? pciconf(8)
> output is useless here since RealTek uses the same PCI ID for
> PCIe variants.
>
> BTW, I was told that the vendor's FreeBSD driver seems to work fine
> for normal usage patterns. The vendor's driver triggered an instant
> panic and lacked H/W offloading features in the past. It might
> have changed though.

The problems with re(4) drivers arose again when I bought some "green"
equipment, mainly switches, which reduce power emission on short cables or
non-connected ports. This brought down some servers with re(4) chipsets
immediately and I had no clue what happened. I do not know whether this is an
isolated case, so to speak, or whether this problem will arise for others,
too. On server hardware, we replaced all Realtek NICs with those from Intel,
and luckily some server mainboards already have Intel PHYs or NICs. The
Broadcom devices we have on some older Fujitsu hardware are also stable like
a charm, even with the new power-saving switches.

While we can swap the NIC on server or workstation platforms, it is almost
impossible on laptops, and the number of laptops with Realtek chips seems to
grow. It is a pity that the vendor of the chipsets refuses to support OSes
other than Windows - or in some rare cases only Linux. After you wrote the
answer, I checked on the net for suitable drivers and the situation seems bad
for almost all OSes apart from commercial ones like Windooze and Apple OS X.

As soon as I get my hands on the laptop again, I'll send the requested
information. I know that I played around with re(4) and rgephy(4) in the
kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
except that it offered some additional "media xxx-options-xxx" mostly appended
with "flow" - but trying them also brought down the system, just as plugging
or unplugging does.

The last kernel I compiled was then without rgephy(4) - the NIC worked as
expected, but plugging/unplugging or having some power-down activities on a
Netgear SoHo green-power switch brings the system down as usual.