Re: random lockups (now suspecting zfs)
On Sat, 4 Nov 2023, Simon Burge wrote: Hi Greg, Greg Troxel wrote: On Fri, Oct 20, 2023 at 01:11:15PM -0400, Greg Troxel wrote: A different machine has locked up, running recent netbsd-10. I was doing pkgsrc rebuilds in zfs, in a dom0 with 4G of RAM, with 8G total physical. It has a private patch to reduce the amount of memory used for ARC, which has been working well. Are you still seeing the problem below even with limiting the amount of memory ARC can use? All 3 tmux windows show something like [ 373598.5266510] load: 0.00 cmd: bash 21965 [flt_noram5] 0.37u 2.89s 0% 6396k and I can switch among them and ^T, but trying to run top is stuck (in flt_noram5). I'll give it an hour or so, and have a look at the console. I've seen cc1plus processes wedged in either flt_noram or tstile after doing multiple builds, and a reboot is the only way out. I'm using ZFS for everything except swap and some mostly-unused media files that live on an FFS. So to me this feels like a locking botch in a rare path in zfs. This appears to be the case. Chuck Silvers has some understanding of the problem and I'm helping test, but at this stage there isn't a fix available. :/ It's interesting that you see the lockups during pkgsrc builds, i.e. a period where there is lots of file creation. We use zfs on backup systems that pull in data with rsync. During the initial runs (where every file is new) we usually get a couple of lockups, but during day-to-day operation (few changes) it is reliable. These are on physical and virtual machines running NetBSD 9 with the rule of thumb of 1GB RAM per TB of storage obeyed, but no patches besides setting MAXPHYS in the module to 32k for Xen. -- Stephen
Re: ipmi0: incorrect critical max
On Sat, 18 Mar 2023, Lloyd Parkes wrote: On 18/03/23 05:14, Stephen Borrill wrote: On an HP Microserver Gen10 Plus, I found that soon after booting, I get the following alert: ... Current CritMax WarnMax WarnMin CritMin Unit [ipmi0] 11-LOM-CORE: 59.253 0.000 110.471 degC Just out of interest, in the BIOS (RBSU) what is the Power Management / Power Regulator set to? It will have settings such as "Dynamic Power Savings Mode" and "OS Control Mode". I set it to Maximum I/O Performance (words may not match exactly, it is in a box waiting to be installed at a customer). -- Stephen
ipmi0: incorrect critical max
On an HP Microserver Gen10 Plus, I get the following alert soon after booting: ipmi0: critical over limit on '11-LOM-CORE' If powerd is running (the default), it shuts the machine down (so basically as soon as it hits multi-user). envstat shows that CritMax is zero: Current CritMax WarnMax WarnMin CritMin Unit [ipmi0] 11-LOM-CORE: 59.253 0.000 110.471 degC Seen on 9.3_STABLE, but also in 10 BETA. I suppose one simple fix would be to ensure that if CritMax is lower than WarnMax, it is set to the value of WarnMax. Any other things to look at? The machine won't be put into production for a few days, so it's a good time to experiment. I have already put the latest BIOS on the machine. -- Stephen
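A minimal sketch of the sanity check suggested above, assuming a hypothetical limits record (the struct and field names are illustrative, not the real envsys/ipmi(4) data structures, and values are in thousandths of a degree C purely for readability). When the BMC reports a critical maximum below the warning maximum (CritMax 0.000 vs WarnMax 110.471), the critical limit is treated as bogus and raised to the warning limit.

```c
#include <stdint.h>

/* Hypothetical per-sensor limits; not the real envsys structures.
 * Values are in thousandths of a degree C for illustration only. */
struct sensor_limits {
    int64_t crit_max;
    int64_t warn_max;
    int has_crit_max;   /* non-zero if the BMC supplied a CritMax */
    int has_warn_max;   /* non-zero if the BMC supplied a WarnMax */
};

/* A critical maximum below the warning maximum cannot be right, so
 * clamp it up to the warning maximum instead of letting powerd act on it. */
static void sanitize_crit_max(struct sensor_limits *l)
{
    if (l->has_crit_max && l->has_warn_max && l->crit_max < l->warn_max)
        l->crit_max = l->warn_max;
}
```

An alternative design would be to discard the bogus CritMax entirely (clear `has_crit_max`), so that only the warning limit is monitored; clamping keeps a shutdown threshold in place.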
Re: ixg wierdness
On Wed, 22 Dec 2021, Patrick Welche wrote: On Wed, Dec 22, 2021 at 01:34:25PM +0100, Hauke Fath wrote: On Wed, 22 Dec 2021 12:26:21 +, Patrick Welche wrote: The box in 53155 is Hauke's - also a Dell, but slightly different model. he@, not hauke@ -- no Dell boxes here. Sorry - Havard's! On the 51355 front, dholland asks if the 2 bnx hang issue is the same as 47229, and it looks like it. From the email threads quoted in 47229, the gist seems to be that the issue doesn't exist on /i386, just /amd64. I reported something similar on an IBM x3550M3 back in 2019, too: http://mail-index.netbsd.org/tech-net/2019/03/19/msg007302.html -- Stephen
Re: sdmmc_mem_enable failed with error 60
On Tue, Mar 24, 2020 at 03:48:03PM +, Patrick Welche wrote: Last time I played with my raspberry pi zero w, I couldn't see the network card and saw sdmmc_mem_enable failed with error 60 Now I'm seeing the same thing on a new amd64 laptop trying to use another new 32GB microsd card. I opened kern/54959 in the rpi0w case. The laptop has a rtsx0 at pci6 dev 0 function 0: Realtek Semiconductor RTS525A PCI-E Card Reader (rev. 0x01) rtsx0: interrupting at msi2 vec 0 sdmmc0 at rtsx0 I'm testing a Dell Latitude 3190 and its hard drive is on sdmmc0 so I have no storage: [ 1.016863] sdhc0 at pci0 dev 28 function 0: Intel Gemini Lake eMMC (rev. 0x06) [ 1.016863] sdhc0: interrupting at ioapic0 pin 39 [ 1.016863] sdhc0: SDHC 3.0, rev 16, SDMA, 20 kHz, embedded slot, HS SDR50 DDR50 SDR104 HS200 1.8V, re-tuning mode 1 (128s timer), 2048 byte blocks [ 1.016863] sdmmc0 at sdhc0 slot 0 [ 4.914910] sdmmc0: sdmmc_mem_enable failed with error 60 [ 4.924910] sdmmc0: autoconfiguration error: couldn't enable card: 60 Full dmesg (without SDMMC_DEBUG): http://www.netbsd.org/~sborrill/dmesg.lt3190 And acpidump: http://www.netbsd.org/~sborrill/acpidump.lt3190 -- Stephen
Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1
On Mon, 6 Jul 2020, Martin Husemann wrote: On Mon, Jul 06, 2020 at 05:07:51PM +0100, Mike Pumford wrote: A quick look around suggests that some of the very high end gaming ones don't. Also assuming users will actually be able to find a cable to actually hook up the motherboard COM port is optimistic. You would probably have to get one second hand these days and if I remember correctly there are 2 incompatible pinouts for the 10 pin header. :( I had no trouble ordering new ones late last year. If you wanted a branded one (!), there's a Lenovo server option with SKU 7Z17A02577 (I don't know about the pinout off-hand, but I have one here I could buzz through if anyone really cared). -- Stephen
Re: modload & xen and -current 9.99.60
On Fri, 8 May 2020, Manuel Bouyer wrote: On Fri, May 08, 2020 at 02:55:10PM +0200, Frank Kardel wrote: I checked to same kernel in an instance with memory=2048 and it just works. Using todays kernel also works woth memory=2048. Using memory=65536 for the xen instance gives a surprising familiar TEST-A# modload bpfjit [ 97.4727034] kobj_load, 444: [%M/bpfjit/bpfjit.kmod]: linker error: out of memory modload: bpfjit: Cannot allocate memory TEST-A# So it seems to be linked to available memory. The more you have the less you get for modload. It could be a variable overflow somewhere but I can't see how it relates to 64Gb. Does it work with 16Gb ? This sounds similar to the problem I reported a couple of weeks ago with exactly 16GB: http://mail-index.netbsd.org/port-xen/2020/04/17/msg009654.html Also could you try with a PVH or HVM guest ? These ones would use modules from /stand/amd64/ and not /stand/amd64-xen/ and should be close to native. I don't have a box with that much RAM to test ... -- Manuel Bouyer NetBSD: 26 ans d'experience feront toujours la difference --
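One purely illustrative way the overflow hypothesis could connect the two reports: 16 GiB and 64 GiB are both exact multiples of 2^32 bytes (4 GiB), so if some byte count were (hypothetically) held in a 32-bit variable, it would truncate to zero in both cases, while 2 GiB survives intact. This is arithmetic only, not a diagnosis of the actual kobj/modload code; `truncate_to_32` is a hypothetical helper.

```c
#include <stdint.h>

#define GIB ((uint64_t)1 << 30)

/* Hypothetical: the value a byte count would have after being stored in a
 * 32-bit unsigned variable (conversion reduces it modulo 2^32). */
static uint32_t truncate_to_32(uint64_t bytes)
{
    return (uint32_t)bytes;
}
```

Under this hypothesis, any memory size that is a multiple of 4 GiB would also misbehave, which would be a cheap way to test the idea.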
Re: xen & uefi
On Fri, 20 Mar 2020, Brad Spencer wrote: Patrick Welche writes: Is booting into xen from uefi meant to work? I have a slightly unorthdox set up, but get: NetBSD/x86 EFI Boot (x64), Revision 1.1 (Tue Jan 28 13:49:42 UTC 2020) (from) ... Start @ 0xce60 [1=0xce982000-0xce9820ec]... Trampoline space cannot be allocated; will try fallback. I didn't think that a DOM0 + UEFI worked anywhere very well at this point or at least was not a default... See https://wiki.xenproject.org/wiki/Xen_EFI for example... or ... https://ubuntuforums.org/showthread.php?t=2413434 It's been the default on XenServer (Citrix Hypervisor) for a long time (if you boot the installer from uEFI). -- Stephen
Re: XEN3_DOMU no longer shutting down or rebooting
On Fri, 1 Mar 2019, Chavdar Ivanov wrote: On 3/1/19 at 1:02 AM, Mathew, Cherry G. wrote: Would be could to know the dom0 versions it broke under, please. The DOMU is tCentOS 7.6, the virtualizer is XCP-NG v.7.6. That's not what Cherry's asking for. Try the output of xl dmesg on XCP-NG. Near the top, you'll see lines like: (XEN) [0.00] Xen version 4.7.6-6.3 (mockbuild@[unknown]) (gcc (GCC) 7.3.0) debug=n Fri Nov 9 15:37:51 UTC 2018 (XEN) [0.00] Latest ChangeSet: 9a6cc4f5c14b, pq 15428fd29b9a uname -a will give you something like: Linux xen01 4.4.0+10 #1 SMP Fri Aug 24 08:15:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux (the above are from fully-patched XenServer 7.6, so I'm mildly curious to see what XCP-NG is based on). -- Stephen
Re: mfii0 kudos to bouyer@ Was Re: dmesg | grep -c "not configured" = 240...
On Tue, 4 Dec 2018, Martin Husemann wrote: On Tue, Dec 04, 2018 at 07:17:59PM +, Mike Pumford wrote: One thing that surprised me was that I was testing with the USB install image but instead of landing in sysinst I ended up at a a login prompt which was unexpected. Could this be because the USB disk that was my root device ended up as sd23 and there is a hard coded sd0 somewhere in the install code? No hardcoded sd0, but maybe the boot device matching did not properly work for this case (depends on geometry and stuff that the bootloader gets from bios USB emulation or something). Interesting, I noticed exactly the same when booting a -current image to test the original mfii changes. In my case, sd0 and sd1 are the HW RAID arrays and sd2 is the USB stick: [ 8.177453] sd2 at scsibus1 target 0 lun 0: disk removable [ 8.177453] sd2: 3958 MB, 522 cyl, 255 head, 63 sec, 512 bytes/sect x 8105984 sectors [ 8.287566] boot device: sd2 [ 8.287566] root on sd2a dumps on sd2b -- Stephen
Re: mfii0 kudos to bouyer@ Was Re: dmesg | grep -c "not configured" = 240...
On Thu, 29 Nov 2018, Manuel Bouyer wrote: On Thu, Nov 29, 2018 at 03:56:37PM +, Stephen Borrill wrote: [snip] The other missing driver is handled by mpii in OpenBSD (SAS3408). Our mpii doesn't yet support any SAS3 cards. [ 1.048805] vendor 1000 product 00af (SAS mass storage, revision 0x01) at pci4 dev 0 function 0 not configured Do you have drives connected to this controller ? If so I can probably come up with a patch this week-end. The SAS3 has a sighly different interface, but from looking at the OpenBSD driver it's all in a single function. I cannot easily attach drives to it (it has external ports only, and I would need to drag it to our datacenter to connect it to something). Let's see what Mike Pumford's PCI IDs are. If I do go to the datacenter, however, I should also be able to test MegaRAID 3108 support (IBM ServeRAID M5210). Do you have a gut feel for how easy it would be to backport your mpii changes to -7 and -8? -- Stephen
Re: Running NetBSD-current in PV mode under Xen
On Thu, 29 Nov 2018, Chavdar Ivanov wrote: I was trying to respond to and old pr of mine - https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=47486 - and went through the installation of XCP-NG (after I found out about the existence of this project and that Citrix has apparently changed some licensing conditions after XenServer 7.2). Latest -current works in HVM mode (after switching the network adapter emulation to e1000, modulo the weird mouse behaviour under X, but I am not bothered about it). If I switch the system to PV mode following the method described in the above pr, the machine apparently starts and in some 10-15 seconds stops, without showing any console. I checked through the XCP-NG log files, but could not iodentify anythin obvious (I may have missed stuff - they are copious). I seem to recall a relatively recent discussion about NetBSD not working any more under AWS PV - some parameter needed to be modified, but could not find a reference; could that be related? /opt/xensource/libexec/xen-cmdline --set-xen pv-linear-pt=true See N.B. (3) here: https://www.precedence.co.uk/wiki/Support-KB-Citrix/XenServer-Hotfixes -- Stephen
Re: mfii0 kudos to bouyer@ Was Re: dmesg | grep -c "not configured" = 240...
On Mon, 26 Nov 2018, Stephen Borrill wrote: Thanks Manuel! [ 1.048805] mfii0 at pci11 dev 0 function 0: "RAID 930-8i 2GB Flash", firmware 50.3.0-1075, 2048MB cache [ 1.048805] mfii0: interrupting at ioapic4 pin 2 [ 1.048805] scsibus0 at mfii0: 64 targets, 8 luns per target [ 2.161214] scsibus0: waiting 2 seconds for devices to settle... [ 2.161214] mfii0: physical disk inserted id 18 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 19 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 20 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 21 enclosure 134 [ 4.163289] sd0 at scsibus0 target 0 lun 0: 5.03> disk fixed [ 4.163289] sd0: fabricating a geometry [ 4.163289] sd0: 2234 GB, 2287864 cyl, 64 head, 32 sec, 512 bytes/sect x 4685545472 sectors [ 4.163289] sd0: fabricating a geometry [ 4.163289] sd0: tagged queueing [ 4.163289] sd1 at scsibus0 target 1 lun 0: 5.03> disk fixed [ 4.163289] sd1: fabricating a geometry [ 4.163289] sd1: 744 GB, 761985 cyl, 64 head, 32 sec, 512 bytes/sect x 1560545280 sectors [ 4.163289] sd1: fabricating a geometry [ 4.163289] sd1: tagged queueing [32.192359] mfii0: critical limit on 'mfii0 BBU state' [32.192359] mfii0: normal state on 'mfii0:0' (online) [32.192359] mfii0: normal state on 'mfii0:1' (online) The other missing driver is handled by mpii in OpenBSD (SAS3408). Our mpii doesn't yet support any SAS3 cards. [ 1.048805] vendor 1000 product 00af (SAS mass storage, revision 0x01) at pci4 dev 0 function 0 not configured -- Stephen
mfii0 kudos to bouyer@ Was Re: dmesg | grep -c "not configured" = 240...
Thanks Manuel! [ 1.048805] mfii0 at pci11 dev 0 function 0: "RAID 930-8i 2GB Flash", firmware 50.3.0-1075, 2048MB cache [ 1.048805] mfii0: interrupting at ioapic4 pin 2 [ 1.048805] scsibus0 at mfii0: 64 targets, 8 luns per target [ 2.161214] scsibus0: waiting 2 seconds for devices to settle... [ 2.161214] mfii0: physical disk inserted id 18 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 19 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 20 enclosure 134 [ 2.161214] mfii0: physical disk inserted id 21 enclosure 134 [ 4.163289] sd0 at scsibus0 target 0 lun 0: disk fixed [ 4.163289] sd0: fabricating a geometry [ 4.163289] sd0: 2234 GB, 2287864 cyl, 64 head, 32 sec, 512 bytes/sect x 4685545472 sectors [ 4.163289] sd0: fabricating a geometry [ 4.163289] sd0: tagged queueing [ 4.163289] sd1 at scsibus0 target 1 lun 0: disk fixed [ 4.163289] sd1: fabricating a geometry [ 4.163289] sd1: 744 GB, 761985 cyl, 64 head, 32 sec, 512 bytes/sect x 1560545280 sectors [ 4.163289] sd1: fabricating a geometry [ 4.163289] sd1: tagged queueing [32.192359] mfii0: critical limit on 'mfii0 BBU state' [32.192359] mfii0: normal state on 'mfii0:0' (online) [32.192359] mfii0: normal state on 'mfii0:1' (online) # bioctl mfii0 show Volume Status Size Device/LabelLevel Stripe = 0 Online 2.2T System RAID 1N/A 65535 seconds 0:0 Online 2.2T 1:0.0 noencl 0:1 Online 2.2T 1:1.0 noencl 1 Online 744G WriteCache RAID 1N/A 65535 seconds 1:0 Online 745G 1:2.0 noencl 1:1 Online 745G 1:3.0 noencl dmesg | grep -c "not configured" = 239 :-) On Mon, 19 Feb 2018, Stephen Borrill wrote: So I've just got a Lenovo ThinkSystem SR630 and: # dmesg | grep -c "not configured" 240 http://www.netbsd.org/~sborrill/sr630.dmesg.txt Main issues are missing Ethernet (Intel X722) and RAID controller: vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 0 not configured vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 1 not configured vendor 8086 
product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 2 not configured vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 3 not configured vendor 1000 product 0016 (RAID mass storage, revision 0x01) at pci11 dev 0 function 0 not configured msaitoh@ - have you looked at the Intel X722 gigabit controllers? As for the RAID controller, we are missing support for all recent LSI/Symbios/Avago/Broadcom controllers meaning no support for lots of servers from Lenovo/HP, etc. OpenBSD's mfii supports most of these: https://www.precedence.co.uk/wiki/Support-KB-IBM/PCIIDs NetBSD has extended mfi to support a few variants, but OpenBSD has split the driver into mfi and mfii which makes porting more tricky. I tried OpenBSD 6.2 (last release), but the support for the RAID controller in this server was added after 6.2. On OpenBSD: # dmesg | grep -c "not configured" 350 -- Stephen
Re: M.2 SSDs and Marvell 88SE9230 SATA
On Tue, 3 Jul 2018, Stephen Borrill wrote: Any informed guesses whether M.2 SSDs will work if I buy them for my Lenovo server? Based on the following, I've determined they should be Marvell 88SE9230: https://discussions.citrix.com/topic/396920-unable-to-install-xen-server-75-onto-lenovo-m2-drive/ Only 88SE91XX is currently explicitly supported by ahcisata: ahcisata_pci.c: { PCI_VENDOR_MARVELL2, PCI_PRODUCT_MARVELL2_88SE91XX, pcidevs:product MARVELL2 88SE91XX 0x91a3 88SE91XX SATA pcidevs:product MARVELL2 88SE9215 0x9215 88SE9215 SATA pcidevs:product MARVELL2 88SE9220 0x9220 88SE9220 SATA pcidevs:product MARVELL2 88SE9230 0x9230 88SE9230 SATA pcidevs:product MARVELL2 88SE9235 0x9235 88SE9235 SATA I'm probably going to try nonetheless (how hard can it be to add???), but any hints based on experience would be useful. OK, so I fitted a pair of M.2 SSDs: https://lenovopress.com/lp0769-thinksystem-m2-drives-adapters Without doing anything, they appeared as JBODs (along with an extra virtual device): ahcisata2 at pci3 dev 0 function 0: vendor 1b4b product 9230 (rev.
0x11) ahcisata2: interrupting at ioapic0 pin 18 ahcisata2: 64-bit DMA ahcisata2: AHCI revision 1.20, 3 ports, 32 slots, CAP 0xc0309f02 atabus12 at ahcisata2 channel 0 atabus13 at ahcisata2 channel 1 atabus14 at ahcisata2 channel 2 wd0: wd0: drive supports 1-sector PIO transfers, LBA48 addressing wd0: 119 GB, 248085 cyl, 16 head, 63 sec, 512 bytes/sect x 250069680 sectors wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), NCQ (32 tags) wd0(ahcisata2:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags) wd1 at atabus13 drive 0 wd1: wd1: drive supports 1-sector PIO transfers, LBA48 addressing wd1: 119 GB, 248085 cyl, 16 head, 63 sec, 512 bytes/sect x 250069680 sectors wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), NCQ (32 tags) wd1(ahcisata2:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags) atapibus0 at atabus14: 1 targets uk0 at atapibus0 drive 0: processor fixed uk0: drive supports PIO mode 4, Ultra-DMA mode 4 (Ultra/66) uk0(ahcisata2:2:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA) By going into the uEFI setup, I could easily set up a RAID-1 and a new slightly smaller virtual wd0 appeared and wd1 disappeared: wd0 at atabus12 drive 0 wd0: wd0: drive supports 16-sector PIO transfers, LBA48 addressing wd0: 119 GB, 247954 cyl, 16 head, 63 sec, 512 bytes/sect x 249938560 sectors wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7, NCQ (32 tags) wd0(ahcisata2:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags) Performance isn't too shabby either. bonnie says: 119691 K/sec output char 137535 K/sec output block 130260 K/sec output rewrite 171789 K/sec input char 260260 K/sec input block 13679 seeks/sec So happy days! -- Stephen
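The 88SE9230 attaching "without doing anything" is consistent with ahcisata matching not only an explicit ID list but also any PCI function that advertises the standard AHCI class code (mass storage 0x01, SATA subclass 0x06, AHCI interface 0x01). The simplified, self-contained sketch below assumes exactly that two-path rule; `ahci_would_match` and `explicit_ids` are hypothetical names, and NetBSD's real ahcisata_pci_match() may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI class/subclass/interface codes for an AHCI SATA function. */
#define CLASS_MASS_STORAGE 0x01
#define SUBCLASS_SATA      0x06
#define INTERFACE_AHCI     0x01

struct pci_id { uint16_t vendor, product; };

/* Hypothetical explicit-ID table: 0x1b4b is the vendor in the dmesg above,
 * 0x91a3 is the 88SE91XX product ID from pcidevs. */
static const struct pci_id explicit_ids[] = {
    { 0x1b4b, 0x91a3 },  /* Marvell 88SE91XX */
};

static bool ahci_would_match(uint16_t vendor, uint16_t product,
                             uint8_t cls, uint8_t subcls, uint8_t iface)
{
    /* Path 1: the device says it is a standard AHCI controller. */
    if (cls == CLASS_MASS_STORAGE && subcls == SUBCLASS_SATA &&
        iface == INTERFACE_AHCI)
        return true;
    /* Path 2: non-standard class code, but explicitly listed by ID. */
    for (size_t i = 0; i < sizeof(explicit_ids) / sizeof(explicit_ids[0]); i++)
        if (explicit_ids[i].vendor == vendor &&
            explicit_ids[i].product == product)
            return true;
    return false;
}
```

In this model an explicit entry only matters for chips that do not report the AHCI class code, which would explain why no table change was needed for the 9230.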
M.2 SSDs and Marvell 88SE9230 SATA
Any informed guesses whether M.2 SSDs will work if I buy them for my Lenovo server? Based on the following, I've determined they should be Marvell 88SE9230: https://discussions.citrix.com/topic/396920-unable-to-install-xen-server-75-onto-lenovo-m2-drive/ Only 88SE91XX is currently explicitly supported by ahcisata: ahcisata_pci.c: { PCI_VENDOR_MARVELL2, PCI_PRODUCT_MARVELL2_88SE91XX, pcidevs:product MARVELL2 88SE91XX 0x91a3 88SE91XX SATA pcidevs:product MARVELL2 88SE9215 0x9215 88SE9215 SATA pcidevs:product MARVELL2 88SE9220 0x9220 88SE9220 SATA pcidevs:product MARVELL2 88SE9230 0x9230 88SE9230 SATA pcidevs:product MARVELL2 88SE9235 0x9235 88SE9235 SATA I'm probably going to try nonetheless (how hard can it be to add???), but any hints based on experience would be useful. -- Stephen
dmesg | grep -c "not configured" = 240...
So I've just got a Lenovo ThinkSystem SR630 and: # dmesg | grep -c "not configured" 240 http://www.netbsd.org/~sborrill/sr630.dmesg.txt The main issues are the missing Ethernet (Intel X722) and RAID controller: vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 0 not configured vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 1 not configured vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 2 not configured vendor 8086 product 37d2 (ethernet network, revision 0x09) at pci7 dev 0 function 3 not configured vendor 1000 product 0016 (RAID mass storage, revision 0x01) at pci11 dev 0 function 0 not configured msaitoh@ - have you looked at the Intel X722 gigabit controllers? As for the RAID controller, we are missing support for all recent LSI/Symbios/Avago/Broadcom controllers, meaning no support for lots of servers from Lenovo/HP, etc. OpenBSD's mfii supports most of these: https://www.precedence.co.uk/wiki/Support-KB-IBM/PCIIDs NetBSD has extended mfi to support a few variants, but OpenBSD has split the driver into mfi and mfii, which makes porting trickier. I tried OpenBSD 6.2 (the latest release), but the support for the RAID controller in this server was added after 6.2. On OpenBSD: # dmesg | grep -c "not configured" 350 -- Stephen
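Because `grep -c` counts lines, the four X722 functions above count as four. A small sketch that groups "not configured" lines by PCI vendor:product gives the number of distinct devices actually lacking a driver. The helpers `parse_unconfigured` and `count_distinct` are hypothetical, and the line format is assumed to match the dmesg excerpts quoted here (an optional timestamp prefix is tolerated).

```c
#include <stdio.h>
#include <string.h>

/* Extract vendor/product from a line such as
 * "vendor 8086 product 37d2 (ethernet network, ...) ... not configured".
 * Returns 1 on success, 0 if the line is not an unconfigured device. */
static int parse_unconfigured(const char *line, unsigned *vendor, unsigned *product)
{
    const char *p;

    if (strstr(line, "not configured") == NULL)
        return 0;
    p = strstr(line, "vendor ");
    if (p == NULL)
        return 0;
    return sscanf(p, "vendor %x product %x", vendor, product) == 2;
}

#define MAX_IDS 256

/* Count distinct vendor:product pairs among unconfigured-device lines. */
static size_t count_distinct(const char *const *lines, size_t n)
{
    unsigned seen[MAX_IDS];
    size_t nseen = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned v, p, key;
        size_t j;

        if (!parse_unconfigured(lines[i], &v, &p))
            continue;
        key = (v << 16) | p;          /* both IDs are 16-bit */
        for (j = 0; j < nseen; j++)
            if (seen[j] == key)
                break;
        if (j == nseen && nseen < MAX_IDS)
            seen[nseen++] = key;
    }
    return nseen;
}
```

Fed the five lines quoted above, this reports two distinct devices (8086:37d2 and 1000:0016).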
Re: DHCP client: dhclient vs dhcpcd ?
On Thu, 1 Feb 2018, Thomas Mueller wrote: On Wed, Jan 31, 2018 at 1:18 PM, KIRIHARA Masaharu wrote: NetBSD has two DHCP clients; dhclient(8) and dhcpcd(8). What's the difference? Which is better to use? On Wed, 31 Jan 2018 13:47:42 +0100, Benny Siegert responded: I agree that this is confusing. dhclient is the older tool, while dhcpcd has been created by a NetBSD developer, is newer and smaller. I have run into situations (on Google Compute Engine for instance) where dhclient was unable to interpret some of the more modern DHCP features. I recommend using dhcpcd :) I have read about NetBSD planning to drop dhclient in favor of dhcpcd. I have had installations where dhcpcd succeeded where dhclient failed, and (7.99.1 amd64) where dhclient succeeded where dhcpcd failed. Failure means not being able to set up the internet connection even if the command ran without error messages. I have also had a situation where neither dhcpcd nor dhclient could establish the internet connection, but I was able to connect by using ifconfig and route directly. I notice NetBSD's dhclient is very big while FreeBSD's dhclient is much smaller, like $ ls -l /sbin/dhclient -r-xr-xr-x 1 root wheel 100056 Jul 31 2017 /sbin/dhclient $ ls -l /media/zip0/sbin/dh* -r-xr-xr-x 1 root wheel 5352184 Jun 20 2017 /media/zip0/sbin/dhclient -r-xr-xr-x 1 root wheel 6221 Jun 20 2017 /media/zip0/sbin/dhclient-script -r-xr-xr-x 1 root wheel 299176 Jun 20 2017 /media/zip0/sbin/dhcpcd running from FreeBSD 11.1-STABLE where /media/zip0 is mount point for NetBSD 8.99.1 installation. Interesting. NetBSD-5/i386: # ls -l /sbin/dhclient -r-xr-xr-x 1 root wheel 353002 May 21 2010 /sbin/dhclient NetBSD-7/amd64: # ls -l /sbin/dhclient -r-xr-xr-x 1 root wheel 5056282 Jan 16 14:27 /sbin/dhclient Both dynamically linked, not stripped. -- Stephen
Re: The NPF firewall leaks! (was Re: in_cksum: out of data)
On Tue, 6 Dec 2016, Tom Ivar Helbekkmo wrote: Tom Ivar Helbekkmo writes: So far, I have just one improvement suggestion for npf: the ability to use sets instead of singletons in rules is great, but needs to be extended to letting sets of addresses and networks cross address families. I now have one more. I accidentally created a leak in my npf configuration, partially caused by looking at the example in the man page npf.conf(5). I've got several VLANs, one of them connected to the outside world, and the others to internal networks with various levels of trust. To limit access among them, I've configured npf to handle each VLAN by allowing all outbound traffic, statefully, while limiting inbound traffic to the particular connections I want to allow. The groups typically follow this pattern: group "vlan10" on $vlan10 { pass stateful out final all pass in final proto tcp to $somehost port $someservices pass in final proto udp to $somehost port $otherservices block return in final all } Can you spot the vulnerability? Some of the attack software that probes well-known ports to look for holes, will respond to a TCP RST by sending a new TCP SYN from the very same source port. Guess what npf does then? :) Yup, the TCP RST sent by the last line of the above example gets permitted out by the rule in the first line, updating the connection state -- and the next connection attempt is permitted. I had to change the above to this: group "vlan10" on $vlan10 { pass stateful out final proto tcp flags S/SAFR all pass out final proto tcp all pass stateful out final all pass in final proto tcp to $somehost port $someservices pass in final proto udp to $somehost port $otherservices block return in final all } It's fine and all, but I tend to think that the simplistic first version might automatically expand to the code in the second one. In fact, the documentation seems to agree with me: By default, a stateful rule implies SYN-only flag check ("flags S/SAFR") for the TCP packets.
It is not advisable to change this behavior; however, it can be overridden with the flags keyword. The code or the documentation needs to change. I vote for the code. :) Yep, I found it was pretty easy to naively end up with rules where if I added a block it was allowed and if I removed it, it was blocked... -- Stephen
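A minimal model of the documented "flags S/SAFR" check, assuming pf-style semantics for `flags X/Y` (of the bits in mask Y, exactly the bits in X must be set; bits outside the mask are ignored); npf.conf(5) remains the authority. It shows why, if the default SYN-only check really applied to `pass stateful out ... all`, the outbound RST generated by `block return` could not create or update state. The helper names are hypothetical; the flag values are the standard TCP header bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
#define TH_FIN  0x01
#define TH_SYN  0x02
#define TH_RST  0x04
#define TH_PUSH 0x08
#define TH_ACK  0x10

/* "flags value/mask": the masked bits must equal value exactly. */
static bool flags_match(uint8_t th_flags, uint8_t value, uint8_t mask)
{
    return (th_flags & mask) == value;
}

/* "flags S/SAFR": SYN set, ACK/FIN/RST clear; PSH etc. are ignored. */
static bool syn_only(uint8_t th_flags)
{
    return flags_match(th_flags, TH_SYN, TH_SYN | TH_ACK | TH_FIN | TH_RST);
}
```

A bare RST (or RST+ACK) fails `syn_only`, which is what the rewritten ruleset achieves explicitly: such packets fall through to the stateless `pass out final proto tcp all` rule instead of touching connection state.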
Re: Lenovo T500 hang - down to DRM
On Fri, 10 Jun 2016, Stephen Borrill wrote: On Sun, 5 Jun 2016, Michael van Elst wrote: On Sat, Jun 04, 2016 at 04:15:15PM +0100, Stephen Borrill wrote: Under Windows it is cool and I get 6-hour battery life... Does it have an ATI graphics card and also Intel graphics? Yes. And that's the root of the hanging problem. If Integrated graphics (Intel) or Discrete graphics (ATI) are selected in the BIOS, the machine boots. With Intel, DRM attaches very early on: pci0 at mainbus0 bus 0: configuration mode 1 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x2a40 (rev. 0x07) agp0 at pchb0: G4X-family chipset agp0: detected 32252k stolen memory agp0: aperture at 0xd000, size 0x1000 i915drmkms0 at pci0 dev 2 function 0: vendor 0x8086 product 0x2a42 (rev. 0x07) drm: Memory usable by graphics device = 512M drm: Supports vblank timestamp caching Rev 2 (21.10.2013). drm: Driver supports precise vblank timestamp query. i915drmkms0: interrupting at ioapic0 pin 16 (i915) intelfb0 at i915drmkms0 i915drmkms0: info: registered panic notifier With ATI, DRM attaches much, much later on: pad0: outputs: 44100Hz, 16-bit, stereo audio1 at pad0: half duplex, playback, capture boot device: wd0 root on wd0a dumps on wd0b root file system type: ffs drm: initializing kernel modesetting (RV635 0x1002:0x9591 0x17AA:0x2117). drm: register mmio base: 0xcfff drm: register mmio size: 65536 drm kern info: ATOM BIOS: M86M radeon0: info: VRAM: 256M 0x - 0x0FFF (256M used) radeon0: info: GTT: 512M 0x1000 - 0x2FFF Note that this is around the point where the machine hangs. If Switchable graphics is enabled in the BIOS (which is the default it likes to keep resetting back to), the hang occurs. My theory is that i915drmkms0 attaches early and then the later probe for radeon0 (and perhaps even trying to double up on DRM?) is causing the hang. When the device is hidden or disabled by the BIOS, it is OK. 
Between RC2 (OK) and RC3 (hang) there were a number of changes to the radeon and i915 drm code. Not had chance to test which yet. dmesgs here: http://dmesgd.nycbug.org/index.cgi?do=view=2980 http://dmesgd.nycbug.org/index.cgi?do=view=2981 -- Stephen
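Finding which of the RC2-to-RC3 changes introduced the hang is a bisection: with revisions numbered from a known-good one (RC2) to a known-bad one (RC3), each build-and-boot test halves the remaining range, so about log2(N) boots suffice for N changes. A sketch assuming the breakage is monotonic (every revision after the culprit also hangs); `threshold_is_bad` is a hypothetical stand-in for "build this revision and see whether it boots".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the first bad revision, given that `good` boots, `bad` hangs,
 * and every revision after the first bad one also hangs. */
static size_t first_bad(size_t good, size_t bad,
                        bool (*is_bad)(size_t rev, void *ctx), void *ctx,
                        unsigned *probes)
{
    assert(good < bad);
    *probes = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;  /* avoids overflow */
        ++*probes;                             /* one build-and-boot test */
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Hypothetical predicate: revisions at or beyond *ctx hang. */
static bool threshold_is_bad(size_t rev, void *ctx)
{
    return rev >= *(const size_t *)ctx;
}
```

With revisions 0 (RC2) to 9 (RC3) and the culprit at 5, this needs three test boots rather than up to eight when stepping through one change at a time.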
Re: Lenovo T500 hang - down to DRM
On Sun, 5 Jun 2016, Michael van Elst wrote: On Sat, Jun 04, 2016 at 04:15:15PM +0100, Stephen Borrill wrote: Under Windows it is cool and I get 6-hour battery life... Does it have an ATI graphics card and also Intel graphics? Yes. And that's the root of the hanging problem. If Integrated graphics (Intel) or Discrete graphics (ATI) are selected in the BIOS, the machine boots. With Intel, DRM attaches very early on: pci0 at mainbus0 bus 0: configuration mode 1 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x2a40 (rev. 0x07) agp0 at pchb0: G4X-family chipset agp0: detected 32252k stolen memory agp0: aperture at 0xd000, size 0x1000 i915drmkms0 at pci0 dev 2 function 0: vendor 0x8086 product 0x2a42 (rev. 0x07) drm: Memory usable by graphics device = 512M drm: Supports vblank timestamp caching Rev 2 (21.10.2013). drm: Driver supports precise vblank timestamp query. i915drmkms0: interrupting at ioapic0 pin 16 (i915) intelfb0 at i915drmkms0 i915drmkms0: info: registered panic notifier With ATI, DRM attaches much, much later on: pad0: outputs: 44100Hz, 16-bit, stereo audio1 at pad0: half duplex, playback, capture boot device: wd0 root on wd0a dumps on wd0b root file system type: ffs drm: initializing kernel modesetting (RV635 0x1002:0x9591 0x17AA:0x2117). drm: register mmio base: 0xcfff drm: register mmio size: 65536 drm kern info: ATOM BIOS: M86M radeon0: info: VRAM: 256M 0x - 0x0FFF (256M used) radeon0: info: GTT: 512M 0x1000 - 0x2FFF Note that this is around the point where the machine hangs. If Switchable graphics is enabled in the BIOS (which is the default it likes to keep resetting back to), the hang occurs. My theory is that i915drmkms0 attaches early and then the later probe for radeon0 (and perhaps even trying to double up on DRM?) is causing the hang. When the device is hidden or disabled by the BIOS, it is OK. 
Between RC2 (OK) and RC3 (hang) there were a number of changes to the radeon and i915 drm code. Not had chance to test which yet. -- Stephen
Re: Lenovo T500 hang
On Fri, 3 Jun 2016, Michael van Elst wrote: net...@precedence.co.uk (Stephen Borrill) writes: Does it run at a sensible temperature? Mine runs very hot and runs the battery down very quickly. Probably needs some cleaning and/or there is a problem with the fan and heat-pipes. Under Windows it is cool and I get 6-hour battery life... -- Stephen
Re: Lenovo T500 hang
On Thu, 2 Jun 2016, Roy Marples wrote: On 2016-06-01 10:54, Stephen Borrill wrote: Somewhere after 7.0_RC2, a problem started where the machine hangs right at the end of the kernel boot just before it prints the boot device: pad0: outputs: 44100Hz, 16-bit, stereo audio1 at pad0: half duplex, playback, capture *** HANGS HERE *** boot device: wd0 root on wd0a dumps on wd0b The only thing to do is to power the machine off. This happens with all later kernels (including -current). I've not had chance to bisect the sources yet as I need to use the machine for real work (and running NetBSD 7 on it also makes it so hot it burns my legs). I think a few users (and developers) have Lenovo T500 laptops. Does any recent NetBSD work for them? I have a T500 which is my main NetBSD dev machine. Do you have a dmesg? Perhaps yours is a model without switchable graphics or the 3G modem, etc. I haven't updated the kernel in a month or so, but it's been running -current very well. Does it run at a sensible temperature? Mine runs very hot and runs the battery down very quickly. I don't recall if it ever ran anything earlier than a 7.0 release though. I've been using mine since netbsd-5 and this problem has only just started. -- Stephen
Lenovo T500 hang
Somewhere after 7.0_RC2, a problem started where the machine hangs right at the end of the kernel boot just before it prints the boot device: pad0: outputs: 44100Hz, 16-bit, stereo audio1 at pad0: half duplex, playback, capture *** HANGS HERE *** boot device: wd0 root on wd0a dumps on wd0b The only thing to do is to power the machine off. This happens with all later kernels (including -current). I've not had chance to bisect the sources yet as I need to use the machine for real work (and running NetBSD 7 on it also makes it so hot it burns my legs). I think a few users (and developers) have Lenovo T500 laptops. Does any recent NetBSD work for them? -- Stephen
Re: USB scanners and PR 50340
On Fri, 18 Mar 2016, Gary Duzan wrote:
=>Dave Tyson writes:
=>
=>> I note that PR 50340 has been closed and with the latest pkgsrc
=>>under current (amd64) my Mustek 1200 UB scanner seems to work OK
=>>- but I have to comment out the uscanner device in the kernel and use
=>>it as a ugen device. It seems that this is the 'new world order'
=>>and the sane backend code to handle uscanner devices is deprecated.
=>>Given this is the case is there any point in still keeping the
=>>
=>> uscanner* at uhub? port ?
=>>
=>> in GENERIC?
=>
=>Quite possibly we should remove (comment out) uscanner in GENERIC.
=>ulpt is more controversial, but cups wants to use libusb too.
=>
=>> I am of the same opinion as the PR originator that it is easier
=>>to control access permissions with a uscanner device rather than
=>>having to open up a whole raft of ugen devices, but I guess the
=>>sane developers feel that using libusb makes support easier...
=>
=>Perhaps if we had something called uscanner that would match scanners
=>and that libusb would find, we could have the permissions management of
=>direct matching but the cope-with-the-rest-of-the-world benefit of
=>libusb.

Can we not build some sort of bus-like device to which both the specialized and generic devices can attach, and which prevents opening both at the same time?

An alternative is to have a method to detach the kernel driver so that you can revert to ugen access (and probably a method to reattach too). This applies to all USB devices (e.g. uvideo, umass, etc.). libusb has the following API, but we don't have the kernel support for it:

int libusb_kernel_driver_active(libusb_device_handle *dev, int interface_number)
    Determine if a kernel driver is active on an interface.
int libusb_detach_kernel_driver(libusb_device_handle *dev, int interface_number)
    Detach a kernel driver from an interface.
int libusb_attach_kernel_driver(libusb_device_handle *dev, int interface_number)
    Re-attach an interface's kernel driver, which was previously detached using libusb_detach_kernel_driver().
int libusb_set_auto_detach_kernel_driver(libusb_device_handle *dev, int enable)

Being able to detach kernel drivers would allow for USB remoting (e.g. http://usbip.sourceforge.net/ or Citrix Receiver). It would also aid development of drivers with rump. -- Stephen
Re: MegaRAID 3008/3108
On Thu, 4 Jun 2015, David Brownlee wrote: On 4 June 2015 at 08:03, Frank Kardel kar...@netbsd.org wrote: On 06/03/15 20:27, Christos Zoulas wrote: In article 20150603111042.4fad14b2@taliesin-2.local, Harry Waddell wadd...@caravaninfotech.com wrote: On Tue, 2 Jun 2015 16:13:07 +0100 (BST) Stephen Borrill net...@precedence.co.uk wrote: Anyone working on adding support for SYMBIOS MEGARAID 3108 (0x1000/0x005d) or 3008 (0x1000/0x005f)? These are supported in OpenBSD by the mfii driver which also supports the MEGARAID 2208 (0x1000/0x005b). In NetBSD, the mfi(4) driver was extended to support the 2208 (Thunderbolt) rather than adding a new driver. The 3008/3108 will require another MFI_IOP type (OpenBSD call it 25). -- Stephen I have a system with this on the motherboard, but I'm dropping an lsi 9261-i8 in because the newer cards are not supported. My vendor has told me that the 9261 is near EOL, so it would be really helpful if someone could add support for the newer LSI cards. Unfortunately, I don't have much experience in this area. Shouldn't be too hard to do... As long as someone has a card to test... christos One of our customer systems (Dell PowerEdge R730) has this card. I got it to work by adding the pciids to the driver, crudely adjusting the thunderbolt support to use EOM markers and removing the setting of a flag. I/O seemed to be working (installation was ok and the system was running fine). Issues left were: Abysmal I/O performance on SSDs (no non-SSDs were available) in the range of 5 - 40 Mb/sec, averaging around 20 Mb/sec. Checking other OSes gave: FreeBSD 10: 5 MB/sec, OpenBSD: 420 Mb/sec slowly decreasing, Linux SuSE 13.2: 525-490 MB/sec. So due to time constraints and it being a customer machine, we went for the fastest. Patches (mis-using the MFI_IOP type for thunderbolt) have been posted already. OpenBSD seems to have an additional change in the way i/o commands are handled.
Would it help to get a card or machine into the hands of someone with time to work on the driver? Maybe something like http://www.ebay.co.uk/itm/251453703470 ? I'm happy to throw something into the pot :) I'm interested in IBM ServeRAID M5210 (3108): http://www.redbooks.ibm.com/abstracts/tips1069.html And IBM ServeRAID M1215 (3008): http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/tips1174.html I can get these at trade prices if anyone is interested (also happy to provide hardware for someone to work with). -- Stephen
MegaRAID 3008/3108
Anyone working on adding support for SYMBIOS MEGARAID 3108 (0x1000/0x005d) or 3008 (0x1000/0x005f)? These are supported in OpenBSD by the mfii driver which also supports the MEGARAID 2208 (0x1000/0x005b). In NetBSD, the mfi(4) driver was extended to support the 2208 (Thunderbolt) rather than adding a new driver. The 3008/3108 will require another MFI_IOP type (OpenBSD call it 25). -- Stephen
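As an illustration of what "adding the pciids to the driver" involves, here is a C sketch of a match table keyed on PCI vendor/product that also records which controller family (IOP type) the attach code should use. Only the IDs come from the message; the table layout, enum iop_type and megaraid_lookup() are hypothetical and are not the real mfi(4) structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_SYMBIOS 0x1000  /* vendor ID quoted in the message */

/* Hypothetical controller families; names are illustrative only. */
enum iop_type { IOP_UNKNOWN = 0, IOP_TBOLT, IOP_25 };

struct megaraid_id {
    uint16_t vendor;
    uint16_t product;
    enum iop_type iop;    /* which register/command path the driver would use */
    const char *name;
};

static const struct megaraid_id megaraid_ids[] = {
    { PCI_VENDOR_SYMBIOS, 0x005b, IOP_TBOLT, "MegaRAID 2208 (Thunderbolt)" },
    { PCI_VENDOR_SYMBIOS, 0x005d, IOP_25,    "MegaRAID 3108" },
    { PCI_VENDOR_SYMBIOS, 0x005f, IOP_25,    "MegaRAID 3008" },
};

/* Return the matching table entry, or NULL if the device is not supported. */
static const struct megaraid_id *
megaraid_lookup(uint16_t vendor, uint16_t product)
{
    for (size_t i = 0; i < sizeof(megaraid_ids) / sizeof(megaraid_ids[0]); i++) {
        if (megaraid_ids[i].vendor == vendor &&
            megaraid_ids[i].product == product)
            return &megaraid_ids[i];
    }
    return NULL;
}
```

Keeping the IOP type in the match table means the 3008 and 3108 can share one new code path, as the message suggests, while the 2208 stays on the existing Thunderbolt path.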
PXE entry invalid, so PXE boot hangs
I'm trying to PXE boot an x86 box. The setup works fine on all my other kit, but on the problem one I see:

booting netbsd - starting in 0 seconds.
pxe_init: bad cksum (0xbc) for PXENV+ at 0x900d8
PXE BIOS Version 2.1
*hang*

I'm prepared for a flaky BIOS, but it does boot pxelinux and Citrix Provisioning Services OK.

Adding a few printfs, I see it found PXENV+ at two locations: 0x900d8 (rejected as bad checksum) and 0x8bb52. !PXE was found at 0x8baf2. As it's PXE BIOS 2.1, it ignores the PXENV+ info. The hang is because it never returns from this call:

pxe_call(PXENV_GET_CACHED_INFO);
http://nxr.netbsd.org/xref/src/sys/arch/i386/stand/pxeboot/pxe.c#380

pxelinux uses 5 methods in priority order to find the PXE structure. NetBSD only uses a memory scan, which is a combination of pxelinux's final two (it calls these plans D and E). pxelinux prints:

!PXE entry point found (we hope) at 8A44:0100 via plan A
http://git.kernel.org/cgit/boot/syslinux/syslinux.git/tree/core/fs/pxe/bios.c?id=a7f5892c4d85f3685708b8efb237c9c73a8b1ddf#n240

These addresses correspond to bangpxe_seg and bangpxe_off:
http://nxr.netbsd.org/xref/src/sys/arch/i386/stand/pxeboot/pxe.c#360

Printing these shows that bangpxe_seg is 0, not 0x8a44. Hardwiring bangpxe_seg to 0x8a44 gets the machine booting (well, the kernel panics later on, but that's a different story).

Therefore, it looks like the structure found by memory scanning is incorrect, and perhaps we should implement pxelinux's plans A and B (C being the int 0x1a function 0x5650 that we explicitly choose not to support). These involve reading pointers from offsets relative to InitStack. What does this correspond to in NetBSD? -- Stephen
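Two of the checks discussed above, sketched in C. A PXE structure (PXENV+ or !PXE) carries a checksum byte chosen so that all bytes of the structure sum to zero modulo 256, which is presumably the test behind the "bad cksum" message. The 8A44:0100 that pxelinux prints is a real-mode segment:offset pair, whose linear address is segment * 16 + offset, i.e. 0x8A440 + 0x100 = 0x8A540. The function names are illustrative and are not the ones in pxe.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A PXENV+ or !PXE structure is accepted only if all of its bytes
 * (including the checksum byte itself) add up to 0 modulo 256. */
static int pxe_checksum_ok(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    assert(p != NULL);
    for (size_t i = 0; i < len; i++)
        sum += p[i];           /* storing into uint8_t wraps modulo 256 */
    return sum == 0;
}

/* Real-mode segment:offset pair -> linear address (segment * 16 + offset). */
static uint32_t seg_off_to_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

For example, seg_off_to_linear(0x8A44, 0x0100) gives 0x8A540, whereas a segment of 0 (the value bangpxe_seg had here) puts the same offset at linear 0x100, in the real-mode interrupt vector table rather than in the PXE ROM.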
Re: Problems after updating from 6.1.4 to 6.1.5
On Sun, 19 Oct 2014, John Nemeth wrote: On Oct 20, 6:38am, Paul Goyette wrote: } On Sun, 19 Oct 2014, John Nemeth wrote: } } } * I have to load the kernel from an external partition using grub, and } }thus have to edit grub's menu.lst config file! } } } } * The booted kernel is independent of what is in /netbsd, so I currently } }have to manually gunzip(1) the kernel on the external partition and } }put the results in /netbsd } } Why would you be using grub when you're keeping the kernel } outside the NetBSD partition? All you need to do is add: } } kernel = path to kernel } } to your domU config. xentools is fully capable of loading a NetBSD } kernel, including one that is gzipped. An example from one of my } config files is: } } kernel = /usr/pkg/etc/xen/kernels/netbsd-7-XEN3_DOMU.gz } } This is not something that should be dependent on your dom0 as } xentools is supplied by the Xen project. } } Most likely, I don't know enough (approx zero) of the XEN environment to } get it right. I'm just following the explicit instructions from my DOM0 } provider. So, you're using a VPS. This can change everything depending on how they do things. Which raises the question, who is the provider? } As a quick summary, I initially boot up a common DOMU which runs some } variant of Linux. A customer-specific very small ext2fs partition is } mounted (based on my login information) on which I put my grub menu.lst } file and the kernel(s) I need to boot. This partition is accessible This tells me that they are most likely using pygrub or pvgrub. Basically these things dig around in the image for the domU to find a grub config file, which then tells p[yv]grub what needs to be extracted (i.e. it will copy the kernel {and initramfs for linux} out of the image and hand it to Xen). This also tells me that in your case, the menu.lst thing is necessary.
A couple of notes on how this relates to XenServer (and perhaps other xapi-based providers), more for the record than because they are necessarily of immediate relevance:

- if loading a kernel from the dom0, it has to be in a very specific location, otherwise you get annoyingly vague errors
- pygrub does not need menu.lst if the kernel path is set in the VM properties:

    PV-kernel ( RW):
    PV-ramdisk ( RW):
    PV-args ( RW):
    PV-legacy-args ( RW):
    PV-bootloader ( RW): pygrub
    PV-bootloader-args ( RW): --kernel=/netbsd

- pygrub does not need an MBR partition table; it will deal with just a disklabel with root starting at 0. However, if you've had an MBR partition on there in the past, ensure you wipe sector zero, not just the partition table, otherwise it will spot the 0xaa55 signature and insist on reading the non-existent MBR, leading to a "No partition" boot failure.

-- Stephen
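The MBR pitfall in the last point comes from where things live in sector zero: the four 16-byte partition entries occupy bytes 446 to 509, and the 0xaa55 signature occupies the final two bytes (stored as 0x55 then 0xAA). A small C sketch, with hypothetical helper names, shows why clearing only the partition table still leaves a sector that looks like an MBR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE     512
#define MBR_PART_OFFSET 446   /* four 16-byte partition entries: bytes 446..509 */
#define MBR_SIG_OFFSET  510   /* 0x55, 0xAA: the 0xaa55 signature, little-endian */

/* Roughly the test a boot loader applies before trusting an MBR. */
static int has_mbr_signature(const unsigned char sector[SECTOR_SIZE])
{
    assert(sector != NULL);
    return sector[MBR_SIG_OFFSET] == 0x55 && sector[MBR_SIG_OFFSET + 1] == 0xAA;
}

/* Clears only the partition entries; the signature survives. */
static void wipe_partition_table(unsigned char sector[SECTOR_SIZE])
{
    memset(sector + MBR_PART_OFFSET, 0, MBR_SIG_OFFSET - MBR_PART_OFFSET);
}

/* Clears the whole sector, signature included. */
static void wipe_sector(unsigned char sector[SECTOR_SIZE])
{
    memset(sector, 0, SECTOR_SIZE);
}
```

After wipe_partition_table() the sector still passes has_mbr_signature(), which is the state that makes pygrub go looking for a non-existent MBR partition; only wipe_sector() removes it.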
Re: KASSERT fail in uvm_page.c in latest -7
On Sat, 18 Oct 2014, Stephen Borrill wrote: Just updated my netbsd-7 sources and built and installed a new release from it (amd64). Now GENERIC will not boot; when trying to start init, it fails the KASSERT on line 1226 of uvm/uvm_page.c:

KASSERT(obj == NULL || mutex_owned(obj->vmobjlock))

Apologies for the noise, I think this was down to running an update build after the gcc changes, so that .o files were mixed between two different compilers. Anyway, wiping out obj and doing a clean build fixed it. -- Stephen
KASSERT fail in uvm_page.c in latest -7
Just updated my netbsd-7 sources and built and installed a new release from it (amd64). Now GENERIC will not boot; when trying to start init, it fails the KASSERT on line 1226 of uvm/uvm_page.c:

KASSERT(obj == NULL || mutex_owned(obj->vmobjlock))

-- Stephen