Re: 11.1 running on HyperV hn interface hangs
On Fri, 29 Sep 2017 13:31:22 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> On Fri, Sep 29, 2017 at 12:36 PM, Paul Koch <paul.k...@akips.com> wrote:
> > On Thu, 14 Sep 2017 09:54:56 +0800
> > Sepherosa Ziehau <se...@freebsd.org> wrote:
> >
> >> If you have any updates on this, please let me know. There is still
> >> time for 10.4.
> >
> > We are still playing around with this in the lab...
> >
> > Running similar setup as the customer
> >   Microsoft Windows Server 2012 R2 Datacentre (6.3.9600) Revision 16384
> >   Hyper-V 2012
> >
> > Two VM guests
> >   - 11.0-RELEASE
> >   - 11.1-p1
> >
> > We can not get the Hyper-V hn interface to lock up like the customer can
> > though.
> >
> > We can get the VMs to hang/stall regularly if the guests run ntpd - approx
> > every 15 mins, but no real obvious pattern to it. Disabling ntpd fixes
> > it.
>
> Hmm, by ntpd I think you mean ntp client? You will have to disable
> timesync if you run ntp client:
> sysctl hw.hvtimesync.sample_thresh=-1
> sysctl hw.hvtimesync.ignore_sync=1
>
> They interfere w/ each other.
>
> Or do you mean the network hanging triggered by "RXBUF ack failed"?
>
> Thanks,
> sephe

Yes, ntp client running on the VM guest. After finding it was unstable, we
concluded that there must be some type of interference. Best that we
automatically force ntp off when our software detects it is running in
Hyper-V.

We haven't been able to trigger the hn to hang like our customer can with
the RXBUF problem though. Different underlying hardware probably.

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 11.1 running on HyperV hn interface hangs
On Thu, 14 Sep 2017 09:54:56 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> If you have any updates on this, please let me know. There is still
> time for 10.4.

We are still playing around with this in the lab...

Running similar setup as the customer
  Microsoft Windows Server 2012 R2 Datacentre (6.3.9600) Revision 16384
  Hyper-V 2012

Two VM guests
  - 11.0-RELEASE
  - 11.1-p1

We can not get the Hyper-V hn interface to lock up like the customer can
though.

We can get the VMs to hang/stall regularly if the guests run ntpd - approx
every 15 mins, but no real obvious pattern to it. Disabling ntpd fixes it.

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
Re: 11.1 running on HyperV hn interface hangs
On Thu, 14 Sep 2017 09:54:56 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> If you have any updates on this, please let me know. There is still
> time for 10.4.

Still working on it. We are trying to replicate the FreeBSD 11.1 running in
a Hyper-V VM setup in our test lab. We have ping/snmp/netflow network
simulators that can create large amounts of real network traffic to see if
it reliably triggers the problem.

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
Re: 11.1 running on HyperV hn interface hangs
On Thu, 7 Sep 2017 13:51:11 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> Weird, your traffic pattern does not even belong to anything heavy.
> Sending is mainly UDP, which will never be able to saturate the TX
> buffer ring causing the RXBUF ACK sending failure. This is weird.

It's a bit tricky. The poller is very fast. We ping every device every
15 seconds, and collect every MIB object every 60 seconds. The poller
"rate limits" itself by dividing each minute into 100ms time slots and only
sends a specific number of ping/snmp packets in each time slot. The problem
is, it blasts the request packets out really fast at the start of each time
slot, and then sits in a receive loop until the next time slot comes around.
The requests are not paced over the 100ms, therefore it will blast out a lot
of packets in a few milliseconds.

We used to use a 1 second rate limiting time slot, and didn't interlace
ping/snmp requests, but we found certain interface types on Cisco 6509
switches couldn't keep up with back-to-back pings and would lose them.

> Anyhow, make sure to test this patch:
> 8762017-Sep-07 02:19 hn_inc_txbr.diff

Yep. Might take a bit of time to test though because we'll need to get the
customer to spin up a test VM on the same platform, and they are fairly
remote (Perth, Australia). We don't run any Microsoft servers/HyperV setups
in our lab.

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
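The difference between "burst at the start of the slot" and "paced across the slot" described in the message above can be sketched in a few lines of Python. This is only an illustration, not the AKIPS poller: the function names are hypothetical, and the 2500 packets/sec figure is an assumed example rate chosen for the arithmetic.

```python
def packets_per_slot(rate_per_second, slot_ms=100):
    """Number of packets that must be sent in each time slot."""
    return rate_per_second * slot_ms / 1000.0


def slots_per_minute(slot_ms=100):
    """How many time slots one minute is divided into."""
    return 60_000 // slot_ms


def send_offsets(n_packets, slot_ms=100, paced=False):
    """Offsets (in ms from the start of the slot) at which packets go out.

    paced=False models the behaviour described in the mail: everything is
    sent back-to-back at the start of the slot.
    paced=True spreads the same packets evenly over the whole slot.
    """
    if n_packets <= 0:
        return []
    if not paced:
        return [0.0] * n_packets
    gap = slot_ms / n_packets
    return [i * gap for i in range(n_packets)]


if __name__ == "__main__":
    per_slot = packets_per_slot(2500)  # assumed example rate
    print(slots_per_minute(), "slots/min,", per_slot, "packets/slot")
    print("burst:", send_offsets(4))
    print("paced:", send_offsets(4, paced=True))
```

With a 100ms slot the burst variant hands the driver the whole slot's worth of packets at once, which is the traffic shape the mail says the poller produces.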
Re: 11.1 running on HyperV hn interface hangs
On Thu, 7 Sep 2017 10:22:40 +0800
Sepherosa Ziehau wrote:

> Is it possible to tell me your workload? e.g. TX heavy or RX heavy.
> Enabled TSO or not. Details like how the send syscalls are issue will
> be interesting. And your Windows version, include the patch level,
> etc.
>
> Please try the following patch:
> https://people.freebsd.org/~sephe/hn_dec_txdesc.diff
>
> Thanks,
> sephe

Hi Sephe,

Here's a bit of an explanation of the environment...

AKIPS Network Monitor workload:
 - 22000 devices (routers/switches/APs/etc)
 - 123000 interfaces (60 snmp polling)
 - 131 netflow exporters
 - ~1500 pings per second
 - ~1000 snmp requests/responses per second (~1.9 million MIB object/min)
 - ~250 netflow packets/sec (~4500 flows/sec incoming)
 - ~130 syslog messages/sec (incoming)
 - ~200 snmp traps/sec (incoming)

The ping/snmp poller is a single monolithic process (no threads). Separate
processes for each of the syslog/trap/netflow collection.

SNMP requests are sent using the sendto() system call over a non-blocking
UDP socket for both IPv4 and v6. We set the UDP socket receive buffer size
to 4 Mbytes. Nothing really complex with it. Pings are interlaced with snmp
requests so we limit the bursty nature of small back-to-back packets
(eliminates issues with switch interfaces dropping bursts of packets).

Ping requests are sent using a raw icmp socket. We don't read the responses
from the icmp socket, instead we put the interface into promiscuous mode and
use the BPF info to measure the tx/rx RTT values.

Syslog daemon just listens on a UDP socket with a 4 Mbyte receive buffer.
Same with the snmp trap daemon.

Here's some links to performance graphs of the VM:

https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last2h.pdf
https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last24h.pdf
https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last7d.pdf

The OS was upgraded to 11.1p1 at 5pm on the 5th Sep. The hn0 interface hung
at 7:36pm.
The interface hung three times before we reverted to 11.0p9. It takes a few
hours after rebooting the VM before the interface hangs.

Microsoft Host is running Windows 2012 R2. Waiting for patch level info from
the customer.

I'll have to get the customer to spin up a new VM before trying your patch.

Here's some info (after a reboot of the VM)

Guest VM dmesg:

FreeBSD 11.1-RELEASE-p1 #0 r322350: Thu Aug 10 22:16:21 UTC 2017
r...@shed31.akips.com:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): text 80x25
Hyper-V Version: 6.3.9600 [SP18]
Features=0xe7f
PM Features=0x0 [C2]
Features3=0x7b2
Timecounter "Hyper-V" frequency 1000 Hz quality 2000
CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (2300.00-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2
Features=0x1f83fbff
Features2=0x80002001
AMD Features=0x20100800
AMD Features2=0x1
Hypervisor: Origin = "Microsoft Hv"
real memory = 34359738368 (32768 MB)
avail memory = 33325903872 (31782 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: unblocking device.
ioapic0: Changing APIC ID to 0
ioapic0 irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Timecounter "Hyper-V-TSC" frequency 1000 Hz quality 3000
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0x80f5b220, 0) error 19
nexus0
vtvga0: on motherboard
cryptosoft0: on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
attimer0: port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
vmbus0: on pcib0
pci0: on pcib0
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: at channel 0 on atapci0
ata1: at channel 1 on atapci0
pci0: at device 7.3 (no driver attached)
vgapci0: mem 0xf800-0xfbff irq 11 at device 8.0 on pci0
vgapci0: Boot video device
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer,
Re: 11.1 running on HyperV hn interface hangs
On Wed, 6 Sep 2017 12:02:43 +0100
Pete French wrote:

> > We recently moved our software from 11.0-p9 to 11.1-p1, but looks like
> > there is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012
> > R2) where the virtual hn0 interface hangs with the following kernel
> > messages:
> >
> > hn0: on vmbus0
> > hn0: Ethernet address: 00:15:5d:31:21:0f
> > hn0: link state changed to UP
> > ...
> > hn0: RXBUF ack retry
> > hn0: RXBUF ack failed
> > last message repeated 571 times
> >
> > It requires a restart of the HyperV VM.
> >
> > This is a customer production server (remote customer ~4000km away)
> > running fairly critical monitoring software, so we needed to roll it back
> > to 11.0-p9. We only have two customers running our software in HyperV, vs
> > lots in VMware and a handful on physical hardware.
> >
> > 11.0-p9 has been very stable. Has anyone seen this problem before with
> > 11.1 ?
>
> I don't run anything on local hyper-v anymore, but I do run a lot of
> stuff in Azure, and we haven't seen anything like this. I track STABLE
> for things though, updating after reading the commits and testing
> locally for a week or so, so the version I am running currently is
> r320175, which was part of 11.1-BETA2. I am going to upgrade to a more
> recent STABLE sometime this week or next though, will do that on a test
> machine and let you know how it goes.
>
> I seem to recall that there were some large changes to the hn code in
> August to add virtual function support. When does 11.1-p1 date from ?

Looks like 2017-08-10

    Paul.
11.1 running on HyperV hn interface hangs
Not sure if -stable is the right mailing list for this one.

We recently moved our software from 11.0-p9 to 11.1-p1, but looks like there
is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2) where
the virtual hn0 interface hangs with the following kernel messages:

  hn0: on vmbus0
  hn0: Ethernet address: 00:15:5d:31:21:0f
  hn0: link state changed to UP
  ...
  hn0: RXBUF ack retry
  hn0: RXBUF ack failed
  last message repeated 571 times

It requires a restart of the HyperV VM.

This is a customer production server (remote customer ~4000km away) running
fairly critical monitoring software, so we needed to roll it back to
11.0-p9. We only have two customers running our software in HyperV, vs lots
in VMware and a handful on physical hardware.

11.0-p9 has been very stable. Has anyone seen this problem before with
11.1 ?

11.1 is listed here
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-freebsd-virtual-machines-on-hyper-v

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
date -r {big number} results in segmentation fault
I had one of those in-a-hurry copy-n-paste errors...

  date -r 15038137211503813721
  Segmentation fault

Haven't had a chance to look into it yet though.

This was on 11.1-RELEASE-p1 #0 r322350

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
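For scale, the pasted value is the ten-digit epoch 1503813721 repeated twice, and it is larger than anything a signed 64-bit seconds counter can hold. The Python sketch below only checks that range; it assumes a signed 64-bit time_t and does not claim to identify the actual cause of the segmentation fault. The function names are illustrative.

```python
# Limits of a signed 64-bit seconds counter (assumption: time_t is signed 64-bit).
TIME_T_MAX = 2**63 - 1
TIME_T_MIN = -(2**63)


def fits_time_t(seconds):
    """True if the value can be represented in a signed 64-bit time_t."""
    return TIME_T_MIN <= seconds <= TIME_T_MAX


def parse_epoch(text):
    """Parse a decimal epoch string, rejecting values outside the time_t range."""
    value = int(text, 10)
    if not fits_time_t(value):
        raise ValueError("epoch value %s does not fit in a 64-bit time_t" % text)
    return value


if __name__ == "__main__":
    print(fits_time_t(1503813721))             # the intended value
    print(fits_time_t(15038137211503813721))   # the double-pasted value
```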
ZFS ARC and mmap/page cache coherency question
We are trying to understand a performance issue when syncing large mmap'ed
files on ZFS.

Example test box setup:
  FreeBSD 10.3-p5
  Intel i7-5820K 3.30GHz with 64G RAM
  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe

Read performance of a sequentially written large file on the pool is
typically around 950Mbytes/sec using dd.

Our software mmap's some large database files using MAP_NOSYNC, and we call
fsync() every 10 minutes when we know the file system is mostly idle. In our
test setup, the database files are 1.1G, 2G, 1.4G, 12G, 4.7G and ~20 small
files (under 10M). Most of the memory pages in the mmap'ed files are updated
every minute with massive amounts of collected ping and snmp mib values.

When the 10 minute fsync() occurs, gstat typically shows very few disk reads
and very high write speeds, which is what we expect. But, every 80 minutes
we process the data in the large mmap'ed files and store it in highly
compressed blocks of a ~300G file using pread/pwrite (i.e. not mmap'ed).
After that, the performance of the next fsync() of the mmap'ed files falls
off a cliff. We are assuming it is because the ARC has thrown away the
cached data of the mmap'ed files. gstat shows lots of read/write contention
and lots of things tend to stall waiting for disk.

Is this just a lack of ZFS ARC and page cache coherency ??

Is there a way to prime the ARC with the mmap'ed files again before we call
fsync() ?

We've tried cat and read() on the mmap'ed files but that doesn't seem to
touch the disk at all and the fsync() performance is still poor, so it looks
like the ARC is not being filled. msync() doesn't seem to be much different.
mincore() stats show the mmap'ed data is entirely incore and referenced.

    Paul.

--
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
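The access pattern in question (update pages through a shared mapping, then fsync() the file later) can be sketched in Python as below. It is a small stand-in, not the AKIPS code: the helper names, file size and payload are made up, and MAP_NOSYNC is looked up with getattr() because it is an assumption that the running Python build exposes that flag at all (where it is missing, the sketch falls back to a plain shared mapping).

```python
import mmap
import os
import tempfile


def update_and_sync(path, offset, payload):
    """Write payload into a shared mapping of path, then fsync() the file."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        # MAP_NOSYNC is FreeBSD-specific; use it only if this build exposes it.
        flags = mmap.MAP_SHARED | getattr(mmap, "MAP_NOSYNC", 0)
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        with mmap.mmap(fd, size, flags=flags, prot=prot) as mm:
            mm[offset:offset + len(payload)] = payload  # dirty the page
            # mm.flush() here would be the msync() equivalent.
        os.fsync(fd)  # the periodic sync described in the mail
    finally:
        os.close(fd)


def demo():
    """Create a small scratch file, update it via mmap, and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 8192)
        path = tmp.name
    try:
        update_and_sync(path, 4096, b"ping")
        with open(path, "rb") as fh:
            fh.seek(4096)
            return fh.read(4)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The sketch says nothing about ARC behaviour; it only pins down which calls are involved when reading the question above.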
10.2 - Process stuck in unkillable sleep
, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0x0, td_ma = 0x0, td_ma_cnt = 0, td_su = 0x0 }

    Paul.

--
Paul Koch | Founder, CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
Re: gptboot: unable to read backup GPT header - virtualbox guest with SAS controller
On Mon, 22 Jun 2015 08:28:10 -0600 (MDT)
Warren Block wbl...@wonkity.com wrote:

> On Mon, 22 Jun 2015, Paul Koch wrote:
>
> > We get the following error after installing 10.1-p12 in a VirtualBox
> > guest when setup with an emulated LSI / SAS controller and a 50G fixed
> > sized virtual disk:
> >
> > gptboot: error 1 lba 104857599
> > gptboot: unable to read backup GPT header
> >
> > Can't seem to find anyone who has this same issue. The problem does not
> > exist if we configure the guest with a SATA controller and same size
> > virtual disk.
>
> ...
>
> > The guest boots fine, but we always get the gptboot error. Is this
> > just a problem with the virtualbox SAS controller emulation where
> > gptboot can't retrieve the backup table ?
>
> That would be my first guess: an off-by-one error preventing the last
> block from being read. It's not clear which emulated controller was
> being used for the diskinfo output posted earlier. If it really was an
> off-by-one bug, the block count would differ depending on the
> controller.
>
> However, some controllers keep metadata on the drive, and report a
> reduced capacity, and that would have almost the same effect. Seems
> like there would be a complaint by the controller firmware about the
> contents of the metadata block, but maybe not by an emulated controller.
>
> If controller metadata is the problem, installing FreeBSD using the
> emulated controller in place should make sure the backup GPT is in the
> correct position, rather than switching to the SCSI controller after
> installing with, say, SATA.

It does look like gptboot couldn't access the last sector on the virtual
SAS disk. We were playing with expanding the size of the virtual disk...
Shutting down the VM, expanding the disk size, rebooting, and no gptboot
error. Running the appropriate gpart/zpool commands to take up the expanded
space, then reboot, and the gptboot error is back again.

We've been having similar/same issues with a customer who is running on
bare hardware using a Cisco C240 M4 using the mrsas driver, but it also
appears to exhibit the problem we were having where the backup partition
table does not get updated when the gpart bootonce flag is set so we can
boot from an alternate partition. Adding 'gpart recover ${disk}' to our
startup script gets around the problem.

    Paul.

--
Paul Koch | Founder, CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
bootonce not set in backup partition table - virtualbox guest with SAS controller
11e5 9e8d 0008 3f27 7893
120 0800 0080 07ff 0120
130 0800 0061 006b 0069 0070
140 0073 002d 0072 006f 006f 0074 0030
150
*
180 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
190 66da 9ac1 18ed 11e5 9e8d 0008 3f27 7893
1a0 0800 0120 07ff 01c0
1b0 0400 0061 006b 0069 0070
1c0 0073 002d 0072 006f 006f 0074 0031
1d0
*
200 7cba 516e 6ecf 11d6 f88f 0200 092d 2b71
210 1d30 9ac3 18ed 11e5 9e8d 0008 3f27 7893
220 0800 01c0 f7ff 063f
230 0061 006b 0069 0070
240 0073 002d 0068 006f 006d 0065
250
*

Dump of the backup partition table

# dd if=/dev/da0 bs=512 skip=104857567 count=32 2>/dev/null | hexdump
000 6b9d 83bd 7f41 11dc 0bbe 1500 b860 0f4f
010 b409 9abc 18ed 11e5 9e8d 0008 3f27 7893
020 0028 0427
030 0067 0070 0074 0062
040 006f 006f 0074
050
*
080 7cb5 516e 6ecf 11d6 f88f 0200 092d 2b71
090 3b28 9abe 18ed 11e5 9e8d 0008 3f27 7893
0a0 0428 0427 0080
0b0 0061 006b 0069 0070
0c0 0073 002d 0073 0077 0061 0070
0d0
*
100 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
110 c0b3 9abf 18ed 11e5 9e8d 0008 3f27 7893
120 0800 0080 07ff 0120
130 0800 0061 006b 0069 0070
140 0073 002d 0072 006f 006f 0074 0030
150
*
180 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
190 66da 9ac1 18ed 11e5 9e8d 0008 3f27 7893
1a0 0800 0120 07ff 01c0
1b0 0c00 0061 006b 0069 0070    different
1c0 0073 002d 0072 006f 006f 0074 0031
1d0
*
200 7cba 516e 6ecf 11d6 f88f 0200 092d 2b71
210 1d30 9ac3 18ed 11e5 9e8d 0008 3f27 7893
220 0800 01c0 f7ff 063f
230 0061 006b 0069 0070
240 0073 002d 0068 006f 006d 0065
250
*

The line at offset 1b0 is out by one bit. Looks like the bootonce flag is
only updated in the primary partition table and not the backup.

At this point our boot script tries to do

  gpart set -a bootme -i 4 da0
  gpart unset -a bootme -i 3 da0
  gpart unset -a bootonce -i 4 da0

but we of course get

  gpart: table 'da0' is corrupt: Operation not permitted

We currently have a workaround that does

  gpart recover da0

before doing the set/unset gpart commands.

Not sure what we are doing wrong because it all works fine if we configure
the vm guest using a SATA controller.

    Paul.

--
Paul Koch | Founder, CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
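To make the one-bit difference at offset 1b0 easier to read, here is a small Python sketch that decodes the top 16-bit word of a GPT entry's attribute field into flag names. The bit positions (bootme = bit 59, bootonce = bit 58, bootfailed = bit 57) are quoted from memory of FreeBSD's sys/sys/gpt.h and should be checked against the header; the sketch also assumes the rest of the 64-bit attribute field is zero and that the dumped word is its most significant 16 bits. The function names are illustrative.

```python
# FreeBSD-specific GPT attribute bits (assumed values; verify in sys/sys/gpt.h).
GPT_ENT_ATTR_BOOTME = 1 << 59
GPT_ENT_ATTR_BOOTONCE = 1 << 58
GPT_ENT_ATTR_BOOTFAILED = 1 << 57

FLAGS = (
    ("bootme", GPT_ENT_ATTR_BOOTME),
    ("bootonce", GPT_ENT_ATTR_BOOTONCE),
    ("bootfailed", GPT_ENT_ATTR_BOOTFAILED),
)


def decode_attrs(attr):
    """Return the names of the boot flags set in a 64-bit attribute value."""
    return [name for name, bit in FLAGS if attr & bit]


def attrs_from_top_word(word):
    """Decode flags from the top 16-bit word of the attribute field.

    hexdump prints 16-bit words, so the word shown last in the attribute
    field (for example 0x0400) holds bits 48..63 of the 64-bit value.
    """
    return decode_attrs(word << 48)


if __name__ == "__main__":
    print("primary 0x0400:", attrs_from_top_word(0x0400))
    print("backup  0x0c00:", attrs_from_top_word(0x0C00))
    print("entry 3 0x0800:", attrs_from_top_word(0x0800))
```

Under those assumptions the two tables disagree only in the bootme bit of the fourth entry, which fits the observation that one copy of the table was updated and the other was not.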
gptboot: unable to read backup GPT header - virtualbox guest with SAS controller
Hi,

We get the following error after installing 10.1-p12 in a VirtualBox guest
when setup with an emulated LSI / SAS controller and a 50G fixed sized
virtual disk:

  gptboot: error 1 lba 104857599
  gptboot: unable to read backup GPT header

Can't seem to find anyone who has this same issue. The problem does not
exist if we configure the guest with a SATA controller and same size virtual
disk.

Setup:
 - 10.1-RELEASE-p12 host
 - VirtualBox 4.3.28
 - 10.1-RELEASE-p12 guest

Guest info after boot...

# uname -a
FreeBSD shed65.akips.com 10.1-RELEASE-p12 FreeBSD 10.1-RELEASE-p12 #0 r284334: Sat Jun 13 05:45:13 UTC 2015 r...@shed21.akips.com:/usr/obj/usr/src/sys/GENERIC amd64

# diskinfo -v da0
da0
        512             # sectorsize
        53687091200     # mediasize in bytes (50G)
        104857600       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        6527            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
                        # Disk ident.

# gpart show
=>        34  104857533  da0  GPT  (50G)
          34       2014       - free -  (1.0M)
        2048       1024    1  freebsd-boot  (512K)
        3072    8388608    2  freebsd-swap  (4.0G)
     8391680       1024       - free -  (512K)
     8392704   10485760    3  freebsd-ufs  [bootme]  (5.0G)
    18878464   10485760    4  freebsd-ufs  (5.0G)
    29364224   75491328    5  freebsd-zfs  (36G)
   104855552       2015       - free -  (1.0M)

# gpart status
 Name  Status  Components
da0p1      OK  da0
da0p2      OK  da0
da0p3      OK  da0
da0p4      OK  da0
da0p5      OK  da0

The primary and backup GPT headers differ, which is expected we think.

Primary GPT header

# dd if=/dev/da0 bs=512 skip=1 count=1 2>/dev/null | hexdump
000 4645 2049 4150 5452 0001 005c
010 b85b a5f5 0001
020 063f 0022
030 ffde 063f 16e7 cafe 18d1 11e5
040 f697 0008 9b27 bd6a 0002
050 0080 0080 8856 ebca
060
*

Backup GPT header

# dd if=/dev/da0 bs=512 skip=104857599 count=1 2>/dev/null | hexdump
000 4645 2049 4150 5452 0001 005c
010 98b1 45c8 063f
020 0001 0022
030 ffde 063f 16e7 cafe 18d1 11e5
040 f697 0008 9b27 bd6a ffdf 063f
050 0080 0080 8856 ebca
060
*

MD5 of the primary and backup partition tables identical.

# dd if=/dev/da0 bs=512 skip=2 count=32 2>/dev/null | md5
8c4510e1854f3371c3241e8a4374dc2c

# dd if=/dev/da0 bs=512 skip=104857567 count=32 2>/dev/null | md5
8c4510e1854f3371c3241e8a4374dc2c

The guest boots fine, but we always get the gptboot error. Is this just a
problem with the virtualbox SAS controller emulation where gptboot can't
retrieve the backup table ?

Appears to fail in sys/boot/i386/common/drv.c in drvread().

We have another problem with the same setup, where setting the bootonce
flag is not reflected in the backup partition table, but I'll describe that
in a separate email.

    Paul.

--
Paul Koch | Founder, CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia
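The sector numbers used in the dd commands and reported by gptboot follow directly from the disk size. The Python sketch below redoes that arithmetic for the standard GPT layout (128 entries of 128 bytes, matching the "0080 0080" words in the headers above); the function name and the returned dictionary keys are illustrative only.

```python
def gpt_layout(mediasize_bytes, sector_size=512, entries=128, entry_size=128):
    """Compute where the GPT structures live on a disk of the given size."""
    total_sectors = mediasize_bytes // sector_size
    # Sectors needed for the partition entry array, rounded up.
    table_sectors = (entries * entry_size + sector_size - 1) // sector_size
    backup_header = total_sectors - 1          # very last sector of the disk
    backup_table = backup_header - table_sectors
    return {
        "total_sectors": total_sectors,
        "primary_header": 1,
        "primary_table": 2,
        "first_usable": 2 + table_sectors,
        "last_usable": backup_table - 1,
        "backup_table": backup_table,
        "backup_header": backup_header,
    }


if __name__ == "__main__":
    for name, lba in gpt_layout(53687091200).items():
        print("%-15s %d" % (name, lba))
```

For the 50G disk this gives a backup header at LBA 104857599, the same sector gptboot reports it cannot read, and a backup table starting at LBA 104857567, the skip value used for the md5 comparison.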
Re: flock incorrectly detects deadlock on 7-stable and current
On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
>
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls. flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios. Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> >    (errno 11)
> >
> > If we change 'd' to
> >    Process 3 requests unlock, then requests exclusive lock,
> > it works.
>
> Could you possibly try this patch and tell me if it helps:
>
> ==== //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 - /tank/projects/lockd/src/sys/kern/kern_lockf.c ====
> @@ -1370,6 +1370,18 @@
>                  }
>
>                  /*
> +                 * For flock type locks, we must first remove
> +                 * any shared locks that we hold before we sleep
> +                 * waiting for an exclusive lock.
> +                 */
> +                if ((lock->lf_flags & F_FLOCK) &&
> +                    lock->lf_type == F_WRLCK) {
> +                        lock->lf_type = F_UNLCK;
> +                        lf_activate_lock(state, lock);
> +                        lock->lf_type = F_WRLCK;
> +                }
> +
> +                /*
>                   * We are blocked. Create edges to each blocking lock,
>                   * checking for deadlock using the owner graph. For
>                   * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
>                  }
>
>                  /*
> -                 * For flock type locks, we must first remove
> -                 * any shared locks that we hold before we sleep
> -                 * waiting for an exclusive lock.
> -                 */
> -                if ((lock->lf_flags & F_FLOCK) &&
> -                    lock->lf_type == F_WRLCK) {
> -                        lock->lf_type = F_UNLCK;
> -                        lf_activate_lock(state, lock);
> -                        lock->lf_type = F_WRLCK;
> -                }
> -                /*
>                   * We have added edges to everything that blocks
>                   * us. Sleep until they all go away.
>                   */

Manually applied the patch to stable kern_lockf.c 1.57.2.1.

Ran the flock_test program on many of our architectures and it works fine.
Have also been testing our app on a single core i386 machine today with no
locking problems. Just setup a quad core -stable amd64 build and it also
appears to be running fine now.

Thanks

    Paul.
flock incorrectly detects deadlock on 7-stable and current
Hi,

We have been trying to track down a problem with one of our apps which does
a lot of flock(2) calls. flock returns errno 11 (Resource deadlock avoided)
under certain scenarios. Our app works fine on 7-Release, but fails on
7-stable and -current.

The problem appears to be when we have at least three processes doing
flock() on a file, and one is trying to upgrade a shared lock to an
exclusive lock but fails with a deadlock avoided.

Attached is a simple flock() test program.

 a. Process 1 requests and gets a shared lock
 b. Process 2 requests and blocks for an exclusive lock
 c. Process 3 requests and gets a shared lock
 d. Process 3 requests an upgrade to an exclusive lock but fails (errno 11)

If we change 'd' to
    Process 3 requests unlock, then requests exclusive lock,
it works.

The manual page says:

    A shared lock may be upgraded to an exclusive lock, and vice versa,
    simply by specifying the appropriate lock type; this results in the
    previous lock being released and the new lock applied (possibly after
    other processes have gained and released the lock).

The manual page doesn't mention that flock() can fail with a deadlock.

Our test environment is:
 - 8 core Intel machine running i386 stable
 - 4 core Intel machine running amd64 current (20080508)
 - 4 core Intel machine running amd64 stable (20080508)
 - 2 core AMD machine running i386 stable (20080418)
 - 2 core AMD machine running i386 stable (20080418)
 - single core (no hyperthreading) i386 stable (20080418)

There appear to have been changes to kern_lockf.c and other stuff around the
10th April to do with deadlock detection. We don't see the problem on
6.2-stable, 7-Release, or 7-stable pre ~10th April.

    Paul.
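The workaround mentioned in step 'd' (unlock first, then request the exclusive lock) looks like this in Python. It is a single-process sketch, not the attached flock_test program, and it does not reproduce the three-process deadlock report; the helper names are made up for the example.

```python
import fcntl
import os
import tempfile


def upgrade_via_unlock(fd):
    """Replace a held shared lock with an exclusive one in two explicit steps."""
    fcntl.flock(fd, fcntl.LOCK_UN)  # drop the shared lock first
    fcntl.flock(fd, fcntl.LOCK_EX)  # then wait for the exclusive lock


def demo():
    """Return True if the exclusive lock is held after the two-step upgrade."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_SH)
            upgrade_via_unlock(fd)
            # A second open of the same file is a separate lock owner, so a
            # non-blocking shared request should now be refused.
            other = os.open(tmp.name, os.O_RDWR)
            try:
                try:
                    fcntl.flock(other, fcntl.LOCK_SH | fcntl.LOCK_NB)
                except BlockingIOError:
                    return True
                return False
            finally:
                os.close(other)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print("exclusive lock held:", demo())
```

Between the unlock and the exclusive request another process can take the lock, which is the same window the manual page text describes for an in-place upgrade.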
Re: FreeBSD boots too fast on Dell PE850
On Saturday 19 August 2006 03:12, Alan Amesbury wrote:
> Thanks for the feedback and discussion! Alas, in terms of network
> configuration, I'm just a tenant; I have no direct control over the
> networking gear, nor direct visibility into how the switch is
> configured.
>
> A couple people wrote to me directly and suggested I 'send-pr' this, so
> I'll do so (hopefully later today).
>
> Thanks again!
>
> --
> Alan Amesbury
> University of Minnesota

This is a really old problem, actually two.

The first being the spanning tree problem where it can take a long time for
it to settle and your port to go into forwarding state. Adding a random
sleep doesn't help because - how long do you sleep for ? How we got around
this problem at various sites was, by modifying rc scripts, to check if a
default gateway was configured (typical), and ping it until a response was
received, or a large timeout occurred (eg. 5 minutes). That way, all other
network services like ntpdate and sendmail would have a better chance of
working. If your machine doesn't use a static IP, but instead dhcp, then
you will need to have a long timeout/retry on the dhcp requests.

The second problem we found was, various NICs would report that they were
active after doing auto negotiation, but no rx packets were being passed
in to the OS. Not sure if it was a hardware or driver issue, but we
discovered that by forcing a packet out the NIC via the bpf interface, it
would immediately start doing stuff. It was as if the auto negotiation had
not really completed fully until a packet was transmitted. This only
occurred on certain types of NICs, the newer ones.

This was a problem for us because we build something called a remote
network appliance (RNA) which is basically FreeBSD on a floppy and runs a
statistical lan analyser. The RNA might have many NICs in it, one with an
IP, the others just connected to network segments in promiscuous mode. Our
apps couldn't monitor any traffic because no packets had been sent out the
interfaces.

So, early in the boot process we force out a couple of Loopback packets and
everything works just fine.

Not sure if the second issue would be a problem for normal installations
though.

    Paul.
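The "ping the default gateway until it answers or a large timeout expires" approach from the first problem above can be sketched as follows. This is not the rc script modification itself: the function names are hypothetical, the probe is passed in so the loop can be exercised without a network, and ping_once() assumes a ping binary that accepts "-c 1".

```python
import subprocess
import time


def ping_once(host, wait_seconds=2.0):
    """Send one echo request; True if ping exits successfully in time."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=wait_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def wait_for_gateway(probe, timeout=300.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it succeeds or `timeout` seconds have passed.

    Returns True as soon as the gateway answers, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address standing in for the real gateway.
    ok = wait_for_gateway(lambda: ping_once("192.0.2.1"), timeout=5.0)
    print("gateway reachable:", ok)
```

The 300 second default mirrors the "eg. 5 minutes" figure in the mail; services that need the network would be started only after this returns.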
Re: Can't select/install kernels in custom install.cfg - 6.1RC2
On Fri, 5 May 2006 05:33 pm, Paul Koch wrote:
> Hi,
>
> I have just upgraded some of our product build machines to 6.1RC2 and
> I am having a few problems getting a custom install.cfg to work with
> the way sysinstall now selects/installs kernels.
>
> I am trying to select a custom distribution set using something like
> the following:
>
>   dists=base kernels
>   distKernel=GENERIC SMP
>   distSetCustom
>
> base gets installed fine but no kernels ever get installed.
>
> If I set any of the standard distribution sets (eg. distSetMinimal,
> or distSetEverything), then one or both of GENERIC/SMP kernels get
> installed.
>
> I am a bit lost. Did some googling, but couldn't find anything
> relevant.
>
> Paul.

Managed to work this out by reading sysinstall code. It is quite simple,
but not so obvious from the manual page.

There appear to be two methods of selecting which dists to install:

1. Setting the distributions by name using something like:

   ...
   dists=base lib32 GENERIC SMP
   distSetCustom
   ...
   installCommit

   The kernels have now been separated from base, into their own dist, but
   you don't specify dists=base kernels, instead you have to specify the
   subdistribution names. Each of the dists values is looked up and the
   appropriate bit set for the variables below.

2. Setting the distributions by setting several bit field environment
   variables:

   ...
   distMain=8193
   distSRC=0
   distX11=0
   distKernel=3
   ...
   installCommit

   The above variables are bit fields as per sysinstall/dist.h, but must be
   converted to decimal because they are read using atoi(3).

Both methods work fine, but you can't select everything within a
subdistribution using method 1 without putting every subdistribution name
in (eg. for src - sbase scontrib scrypt ...), whereas in method 2 you can
just turn all bits on (eg. 0xF, but in decimal). Best to refer to dist.h.

    Paul.
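Since the bit-field variables in method 2 have to be written in decimal, a tiny helper makes the conversion less error-prone. The Python sketch below only does the arithmetic; it deliberately does not say which bit stands for which distribution (dist.h remains the reference for that), and the function names are made up for the example.

```python
def mask_from_bits(*bit_positions):
    """Build a bit-field value from bit positions (0 = least significant)."""
    mask = 0
    for position in bit_positions:
        mask |= 1 << position
    return mask


def bits_of(value):
    """List the bit positions that are set in a decimal bit-field value."""
    return [i for i in range(value.bit_length()) if value & (1 << i)]


def as_install_cfg(name, mask):
    """Format a variable assignment with the value in decimal, as atoi(3) expects."""
    return "%s=%d" % (name, mask)


if __name__ == "__main__":
    print(bits_of(8193))                            # which bits distMain=8193 sets
    print(as_install_cfg("distKernel", mask_from_bits(0, 1)))
    print(as_install_cfg("distSRC", 0xF))           # "all bits on" for a 4-bit example
```

For example, 8193 is 0x2001, so distMain=8193 turns on exactly two bits, and the "0xF, but in decimal" case from the mail is written as 15.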
Can't select/install kernels in custom install.cfg - 6.1RC2
Hi,

I have just upgraded some of our product build machines to 6.1RC2 and I am having a few problems getting a custom install.cfg to work with the way sysinstall now selects/installs kernels.

I am trying to select a custom distribution set using something like the following:

  dists=base kernels
  distKernel=GENERIC SMP
  distSetCustom

base gets installed fine but no kernels ever get installed. If I set any of the standard distribution sets (eg. distSetMinimal, or distSetEverything), then one or both of GENERIC/SMP kernels get installed.

I am a bit lost. Did some googling, but couldn't find anything relevant.

Paul.
Re: BETA3 report
On Mon, 13 Mar 2006 02:41 pm, M. Warner Losh wrote:
> I just tried to boot the BETA3 disc1 iso on my Toshiba Satellite A64-S1762. I got a cryptic message: Too many holes in the physical address space, giving up rest of copyright message here PANIC (if I've gone into the BIOS at all), followed by a panic. Or I get a panic when probing the ohci device (either with or without a USB keyboard attached, the built-in keyboard is fried).
>
> Has anybody else seen this? Is there a debug kernel I can boot, or do I have to build one of my own?
>
> Warner

Hi,

Is this related to issue 2 in my previous post?

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153993+0+archive/2005/freebsd-stable/20051127.freebsd-stable

Paul.
Re: Automatic installation problem
On Sat, 10 Dec 2005 01:34 am, Tarasov Alexey wrote:
> Hello!
>
> I'm trying to make full automatical FreeBSD installation. I have the following lines in my install.cfg:
>
>   command=echo sshd_enable=YES >> /etc/rc.conf
>   system
>   command=echo 'pass' | /usr/sbin/pw useradd -u user -h 0 -G wheel
>   system
>
> But during installation I get some error messages:
>
>   DEBUG: dispatch: calling resword 'system'
>   echo sshd_enable=YES >> /etc/rc.conf: not found
>
> I am using FreeBSD 6.0-RELEASE.

Yep, something changed around 4.x in sysinstall so that rc.conf is not created until sysinstall is nearly finished. If you write to rc.conf during a scripted sysinstall, it will be overwritten.

You can write your stuff to /etc/rc.conf.local instead. It gets sourced via /etc/defaults/rc.conf:

  $ grep rc.conf /etc/defaults/rc.conf
  rc_conf_files=/etc/rc.conf /etc/rc.conf.local

After boot time, we have a post boot script which merges our stuff from rc.conf.local into rc.conf, just to keep it clean.

Just to be safe, we quote everything in install.cfg. As in:

  echo 'sshd_enable=YES' >> /etc/rc.conf.local

Paul.

--
Paul Koch
CTO Statseeker
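The post boot merge mentioned above is not shown in the message, so the following is only a rough Python sketch of what such a merge could look like, not the script actually used. It assumes simple NAME=value lines, lets the rc.conf.local value win (it is sourced second), and uses a hypothetical function name.

```python
import re

# A shell-style variable assignment at the start of a line (comments do not match).
_ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=")


def merge_rc_conf(rc_conf_text, rc_conf_local_text):
    """Return rc.conf text with assignments from rc.conf.local merged in.

    Assumption: rc.conf.local overrides rc.conf, because it is listed
    second in rc_conf_files. Comments and blank lines already in rc.conf
    are kept; only assignments are taken from rc.conf.local.
    """
    overrides = {}
    for line in rc_conf_local_text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            overrides[match.group(1)] = line.strip()

    merged = []
    seen = set()
    for line in rc_conf_text.splitlines():
        match = _ASSIGN.match(line)
        name = match.group(1) if match else None
        if name in overrides:
            if name not in seen:
                # Replace the first occurrence, drop any later duplicates.
                merged.append(overrides[name])
                seen.add(name)
        else:
            merged.append(line)

    # Append settings that only existed in rc.conf.local.
    merged.extend(v for k, v in overrides.items() if k not in seen)
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    rc = 'hostname="a"\n# comment\nsshd_enable="NO"\n'
    local = 'sshd_enable="YES"\nntpd_enable="YES"\n'
    print(merge_rc_conf(rc, local), end="")
```

Working on text rather than on the files directly keeps the sketch easy to test; a real script would read both files, write the merged result to /etc/rc.conf and then remove /etc/rc.conf.local.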
Re: 6.0-stable panic in ohci_softintr when using ucom/uftdi
On Wed, 7 Dec 2005 09:02 pm, Ian Dowse wrote:
> In message [EMAIL PROTECTED], Paul Koch writes:
> > As soon as the modem rings, the machine panics. I don't think the modem even gets time to pick up the line. This doesn't happen on a 5.4-stable box with 6 modems connected, but 5.4-stable appears to have other hanging issues, thus why I am trying out 6.0-stable.
> >
> > Recent changes to ohci.c (1.154.2.1) in this area (2 days ago) ??
> >
> > Should I raise a PR ?
>
> I'll see if I can reproduce this later - the only thing I can think of now that might be responsible is a USB transfer reuse issue that the old patch here might help with:
>
> http://people.freebsd.org/~iedowse/releng_5_xfer_reuse.diff
>
> Could you see if that makes any difference? It should apply to 6-stable even though the name says releng_5.
>
> Ian

I'll try it tomorrow morning when I get back to the office.

I compiled in USB_DEBUG and set all the usb debug sysctls to 100 and made it crash again. It is reproducible 100% of the time. Are you interested in seeing the verbose debug log messages at all ?

Paul.

--
Paul Koch
CTO Statseeker
6.0-stable panic in ohci_softintr when using ucom/uftdi
My setup is:

ASUS Pundit Booksize PC, Celeron 2.66GHz, 512M ram, 4 USB ports. Attached is one Netcomm USB modem which uses the ucom / uftdi drivers, and is configured for auto answer.

Source is 6.0 stable as of today 7th Dec 2005.

Devices in /dev that get created are:

  cuaU0 cuaU0.init cuaU0.lock ttyU0.init ttyU0.lock ttyd0

I have added the following to /etc/ttys:

  ttyU0 /usr/libexec/getty modem.230400 dialup on insecure

and this to /etc/gettytab:

  modem.230400|modems-230k:\
      :hw:np:sp#230400:

As soon as the modem rings, the machine panics. I don't think the modem even gets time to pick up the line. This doesn't happen on a 5.4-stable box with 6 modems connected, but 5.4-stable appears to have other hanging issues, thus why I am trying out 6.0-stable.

Recent changes to ohci.c (1.154.2.1) in this area (2 days ago) ??

Should I raise a PR ?

Paul.

# kgdb kernel.debug /var/crash/vmcore.2
kgdb: core file: /var/crash/vmcore.2
kgdb: kernel image: kernel.debug
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ucom0: open bulk out error (addr 2): IN_USE

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0474c36
stack pointer           = 0x28:0xd3133c7c
frame pointer           = 0x28:0xd3133cac
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (irq19: ohci0 ohci1+)
trap number             = 12
panic: page fault
Uptime: 47s
Dumping 446 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 447MB (114240 pages) 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc04c8a42 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc04c8cd8 in panic (fmt=0xc05f6b91 "%s") at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc05d7e34 in trap_fatal (frame=0xd3133c3c, eva=36) at /usr/src/sys/i386/i386/trap.c:836
#4  0xc05d7b9b in trap_pfault (frame=0xd3133c3c, usermode=0, eva=36) at /usr/src/sys/i386/i386/trap.c:744
#5  0xc05d77f9 in trap (frame=
      {tf_fs = -1027342328, tf_es = -1027342296, tf_ds = -1028849624, tf_edi = -1028264960, tf_esi = -1028043056, tf_ebp = -753714004, tf_isp = -753714072, tf_ebx = 2, tf_edx = -1028786304, tf_ecx = 0, tf_eax = -1027196800, tf_trapno = 12, tf_err = 0, tf_eip = -1069069258, tf_cs = 32, tf_eflags = 590466, tf_esp = -1068655605, tf_ss = -1028786304}) at /usr/src/sys/i386/i386/trap.c:434
#6  0xc05c7f3a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0474c36 in ohci_softintr (v=0xc2b8b000) at /usr/src/sys/dev/usb/ohci.c:1469
#8  0xc04887ab in usb_schedsoftintr (bus=0xc2b8b000) at /usr/src/sys/dev/usb/usb.c:871
#9  0xc0474762 in ohci_intr1 (sc=0xc2b8b000) at /usr/src/sys/dev/usb/ohci.c:1233
#10 0xc04745b4 in ohci_intr (p=0xc2b8b000) at /usr/src/sys/dev/usb/ohci.c:1162
#11 0xc04b45d9 in ithread_loop (arg=0xc2a8b400) at /usr/src/sys/kern/kern_intr.c:547
#12 0xc04b3860 in fork_exit (callout=0xc04b4480 <ithread_loop>, arg=0xc2a8b400, frame=0xd3133d38) at /usr/src/sys/kern/kern_fork.c:789
#13 0xc05c7f9c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
(kgdb)

Kernel Config:

machine i386
cpu I686_CPU
ident TERMITE6X
makeoptions DEBUG=-g
options SCHED_4BSD
options PREEMPTION
options INET
options FFS
options SOFTUPDATES
options UFS_ACL
options UFS_DIRHASH
options MD_ROOT
options GEOM_GPT
options COMPAT_43
options KTRACE
options SYSVSHM
options SYSVMSG
options SYSVSEM
options _KPOSIX_PRIORITY_SCHEDULING
options KBD_INSTALL_CDEV
options ADAPTIVE_GIANT
device apic
device pci
device agp
device ata
device atadisk
device atapicd
options ATA_STATIC_ID
device scbus
device da
device cd
device pass
device atkbdc
device atkbd
device psm
device vga
device splash
device sc
device pmtimer
device sio
device
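kgdb prints the trap frame registers as signed decimal integers, which makes them hard to compare with the hex addresses in the panic message. As a small aid, here is a Python sketch (the helper name is hypothetical) that reinterprets those values as unsigned 32-bit hex. Applied to the numbers above, tf_eip comes out as the reported instruction pointer (the address of frame #7, ohci_softintr), tf_ebp as the reported frame pointer, and eva=36 as the fault virtual address 0x24.

```python
def to_u32_hex(value):
    """Format a possibly negative 32-bit integer as unsigned hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Values copied from the trap frame in the backtrace above.
    frame = {
        "tf_eip": -1069069258,
        "tf_ebp": -753714004,
    }
    for name, value in frame.items():
        print(name, "=", to_u32_hex(value))
    # eva as passed to trap_fatal() and trap_pfault().
    print("eva", "=", to_u32_hex(36))
```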
Re: 6.0 kernel will not boot past atkbd0
On Fri, 25 Nov 2005 09:01 pm, Pete French wrote:
> > We have an array of new ASUS machines connected through a KVM. These machines panic when the kernel is probing for the mouse if ACPI is not loaded. If a mouse is not plugged in, no panic. If ACPI is loaded and the mouse is plugged in, it boots fine.
>
> I didn't try ACPI. I merely took the mouse driver out of the kernel. Interestingly this then let me boot single user, but would not boot multi user! I had to physically unplug the mouse in the end.
>
> Shall file a PR (unless someone else already did?)

For the ASUS machines, we think it is already covered in PR i386/69750.

Paul.
Re: 6.0 kernel will not boot past atkbd0
On Fri, 25 Nov 2005 01:18 am, Pete French wrote:
> > I have the same behavior on Compaq AP400 (2xPIII 700MHz). The problem disappeared when I disconnect the mouse.
>
> Ah! Now that is worth knowing - I didn't even think of trying that. So does anyone know why the mouse being connected causes it to not boot ? I can rebuild the kernel without the mouse driver, that might help...

We have an array of new ASUS machines connected through a KVM. These machines panic when the kernel is probing for the mouse if ACPI is not loaded. If a mouse is not plugged in, no panic. If ACPI is loaded and the mouse is plugged in, it boots fine.

Paul.
Re: Dell DRAC card snatches keyboard console
On Thu, 24 Nov 2005 02:32 am, Palle Girgensohn wrote:
> Hi!
>
> I just installed a Dell 2850 with a DRAC card and a PS/2 keyboard. The keyboard stopped working when entering multiuser mode, and I found an old email on current from September 2004, where Brooks said to comment away ukbd lines in devd.conf. I did, PS/2 keyboard works but the DRAC remote console does not. Is there a way to have both keyboards working?
>
> http://lists.freebsd.org/pipermail/freebsd-current/2004-September/038879.html
>
> Regards,
> Palle

We also came across this problem some time ago on 5.4. We could install ok and go into single user mode, but in multiuser mode the DRAC would become the console. We didn't fiddle with devd.conf, but instead put an entry in rc.conf of keyboard=/dev/kbd0. This allowed PS/2 keyboards to work, and if we plugged a usb keyboard in, it would also work. We didn't get to play with the DRAC though.

I don't have access to these machines anymore because they are now installed at a customer site. I'd be interested in knowing how to get both a locally attached keyboard and the DRAC going at the same time.

I recall the DRAC being a special card which contains an ethernet port and virtual keyboard/mouse/video. It is accessed remotely via IP and replaces the need for KVM switches/cabling. The card contains its own tiny OS and IP stack.

Paul.
Re: 6.0 Release - Pentium install panic and some questions
On Mon, 21 Nov 2005 07:24 pm, Kris Kennaway wrote:
> > Issue 1: Can't install on a Pentium P5 class machine:
> >
> > The install panics when installing the base stuff. No useful messages are displayed except the panic: page fault and rebooting in 15 seconds. The machines are 10 year old DEC Pentiums, 32 to 64M ram, IDE disks, etc. We have four of these in our test environment and appear to install and run FreeBSD-5.4 fine.
>
> Try disabling ACPI. Many old systems have buggy ACPI implementations. Sometimes this can be fixed by a BIOS upgrade.

A Pentium 150MHz aged machine wouldn't have ACPI, would it ?

I just tried going through the long floppy install on another one of these machines I have in my home test rack (they don't boot from the cdrom anymore), but stopped trying when the single IBM SCSI disk attached to an Adaptec controller was detected as da0, da1, da2, da3, da4 and da5 ! I'll try again on the machines in the office tomorrow.

> > hints ./device.hints
> > machine i386
> > cpu I586_CPU
> > cpu I686_CPU
> > ident RNA_KERNEL
> > options SCHED_ULE
>
> You probably want 4BSD, since ULE is slower on many workloads.

We used ULE on 5.4 because it used less space on the floppy image and there was really only one process doing much on the machine, but I did some playing this afternoon and see that on 6.0 ULE and 4BSD use up the same amount of space, so I am changing it back to 4BSD. I am not so stretched for space on the floppy anymore after getting rid of the second GLOBAL_OFFSET_TABLE text.

Paul.
Re: 6.0 Release - Pentium install panic and some questions
On Tue, 22 Nov 2005 07:03 am, Kris Kennaway wrote:
> On Mon, Nov 21, 2005 at 09:28:27PM +1000, Paul Koch wrote:
> > On Mon, 21 Nov 2005 07:24 pm, Kris Kennaway wrote:
> > > > Issue 1: Can't install on a Pentium P5 class machine:
> > > >
> > > > The install panics when installing the base stuff. No useful messages are displayed except the panic: page fault and rebooting in 15 seconds. The machines are 10 year old DEC Pentiums, 32 to 64M ram, IDE disks, etc. We have four of these in our test environment and appear to install and run FreeBSD-5.4 fine.
> > >
> > > Try disabling ACPI. Many old systems have buggy ACPI implementations. Sometimes this can be fixed by a BIOS upgrade.
> >
> > A Pentium 150Mhz aged machine wouldn't have ACPI, would it ?
>
> I don't know..nevertheless, please try it :)
>
> Kris

Ok, a bit of confusion here. When booting from floppy on these machines, the option is to Boot FreeBSD with ACPI enabled, while on other machines it says Boot FreeBSD with ACPI disabled. Looks like this is from beastie.4th.

We tried both options and it still panics when it is extracting base (ie. you can partition, newfs, etc... using sysinstall). It gets about 2% of the way through extracting base.

So, we tried going to the command line from the loader menu, unloaded all loaded modules, disabled further loading of ACPI, and then continued booting. The kernel is loaded into memory, we get the Too many holes in physical memory message and it panics immediately after the copyright message with a:

  Fatal Trap 12: page fault while in kernel mode
  fault virtual address: 0xb5
  fault code = supervisor read, page not present
  . some other guff
  panic: page fault
  uptime: 1s

This is the error condition we experienced with our RNA before bumping up the phys_avail[] in machdep.c.

I am planning to ditch all of these old Pentium machines because they are too much of a problem. I'll keep one for a little longer if you like to figure out what is going on.

Paul.
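The phys_avail[] sizing referred to in this thread (and spelled out in the original post below) is simple arithmetic. The following Python sketch only restates the poster's stated understanding, namely that the array holds start/end pairs and that the last pair is the null terminator; the function name is hypothetical and nothing here is taken from machdep.c itself.

```python
def usable_ranges(array_len):
    """Number of start/end address ranges that fit in an array of array_len entries.

    Assumption (from the post, not checked against the kernel source):
    entries come in (start, end) pairs and one pair is reserved as the
    zero terminator.
    """
    if array_len < 2 or array_len % 2:
        raise ValueError("array length must be an even number >= 2")
    return array_len // 2 - 1


if __name__ == "__main__":
    # Sizes mentioned in the thread: stock 6.0, the local patch, DragonFly.
    for size in (10, 12, 22):
        print(f"phys_avail[{size}] -> {usable_ranges(size)} ranges")
```

Under that assumption, a machine reporting five segments would not fit in the stock array of 10 entries, which is consistent with the "Too many holes" message on the Pentium boxes.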
6.0 Release - Pentium install panic and some questions
Hi,

we are having a number of issues with 6.0-Release.

Our setup: We have ~40 machines in a development test environment, ranging from P5/150MHz/32M ram/IDE, PII Celerons, P3, P4, single and dual processor setups.

Issue 1: Can't install on a Pentium P5 class machine:

The install panics when installing the base stuff. No useful messages are displayed except the panic: page fault and rebooting in 15 seconds. The machines are 10 year old DEC Pentiums, 32 to 64M ram, IDE disks, etc. We have four of these in our test environment and they appear to install and run FreeBSD-5.4 fine.

Issue 2: phys_avail[] array too small in i386/machdep.c P5 boxes ??

We have something which we call a Remote Network Appliance (RNA), which is basically a boot floppy with lots of stuff squeezed on it. The RNA uses a cut down kernel config (ie. no kernel source modifications) and various other in-house programs (eg. init/inetd/telnetd replacements), built into a 1Mbyte MD root. We have no problems using everything up until a 5.4-stable kernel but have various problems with 6.0-release.

When using 6.0, we get the following messages:

  Overlapping or non-montonic memory region, ignoring second region
  ...
  Too many holes in the physical address space, giving up
  ...
  Fatal trap 12: page fault while in kernel mode
  ...
  panic

Did a bit of searching and found that in Dragonfly phys_avail[] in i386/machdep.c has been bumped up because it is too small. Looking at 6.0 machdep.c, it looks like new dcons stuff has been added to it, and it blocks out some physical memory to use. Not sure if that has anything to do with it.

From my understanding, phys_avail[10] gives you room for four physical available address ranges (ie. 4 * start/end pair entries and null terminated). I bumped the number up to 12 (ie. gives me five address ranges) and we are off and going. 6.0 now boots on all our Pentium machines, but...

on 5.4-stable we got:

  physical memory: 67108864
  avail memory:    56156160

on 6.0 with phys_avail[12] we got:

  physical memory: 67108864
  avail memory:    63299584

more available memory for some reason ! Hmmm.

On most of our machines, when booting in verbose mode, the 5.4 kernel reports three phys_avail segments, but the Pentium boxes report four. On the patched 6.0, the Pentiums report five segments.

Unfortunately, the machine panics on Pentium machines when stress testing it (ie. by making it run out of memory). On 5.4-stable it would just kill user processes; under 6.0 it kills a few processes but quickly panics with a page not present error. At least 6.0 now boots and runs on a Pentium, whereas the standard install panics.

I can't get a dump of the RNA floppy panic because it has no swap or disk to write to, and there isn't enough room on the floppy to build a kernel with debugging stuff.

So, my question is... is it OK to bump phys_avail from 10 to 12 ?? or do we just ditch the Pentium as a supported platform ? Dragonfly have bumped it to 22, giving 10 segments.

The only other change we do is compile the kernel and world with -Os and -funit-at-a-time to reduce the resulting binary sizes.

fyi, A copy of the floppy image is at:

  http://www.statseeker.com/downloads/lanstat_fbsd60.bin

It also contains our realtime Statistical LAN Analyser. Instructions are at http://www.statseeker.com/download1.html

The following is the RNA kernel config:

hints ./device.hints
machine i386
cpu I586_CPU
cpu I686_CPU
ident RNA_KERNEL
options SCHED_ULE
options INET
options FFS
options MD_ROOT
options MD_ROOT_SIZE=1024
options COMPAT_FREEBSD4
options HZ=1000
options CLK_USE_I8254_CALIBRATION
options VM_KMEM_SIZE_SCALE
options NO_SWAPPING
options INIT_PATH=/rna-init
device apm
device pci
device vga
device fdc
device md
device mem
nodevice io
device atkbdc
device atkbd
device pty
device sc
options MAXCONS=2
options SC_HISTORY_SIZE=500
options SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
options SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
options SC_NO_CUTPASTE
options SC_NO_FONT_LOADING
options SC_NO_SYSMOUSE
options DEVICE_POLLING
device de
device em
device ixgb
device txp
device miibus
device bfe
device bge
device dc
device fxp
device lge
device nge
device pcn
device re
device sf
device sis
device sk
device ste
device ti
device tl
device tx
device vge
device vr
device wb
device xl
device loop
device