Re: 11.1 running on HyperV hn interface hangs

2017-09-28 Thread Paul Koch
On Fri, 29 Sep 2017 13:31:22 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> On Fri, Sep 29, 2017 at 12:36 PM, Paul Koch <paul.k...@akips.com> wrote:
> > On Thu, 14 Sep 2017 09:54:56 +0800
> > Sepherosa Ziehau <se...@freebsd.org> wrote:
> >  
> >> If you have any updates on this, please let me know.  There is still
> >> time for 10.4.  
> >
> > We are still playing around with this in the lab...
> >
> > Running similar setup as the customer
> >  Microsoft Windows Server 2012 R2 Datacentre (6.3.9600) Revision 16384
> >  Hyper-V 2012
> >
> > Two VM guests
> >  - 11.0-RELEASE
> >  - 11.1-p1
> >
> > We can not get the Hyper-V hn interface to lock up like the customer can
> > though.
> >
> > We can get the VMs to hang/stall regularly if the guests run ntpd - approx
> > every 15 mins, but no real obvious pattern to it.  Disabling ntpd fixes
> > it.  
> 
> Hmm, by ntpd I think you mean ntp client?  You will have to disable
> timesync if you run ntp client:
> sysctl hw.hvtimesync.sample_thresh=-1
> sysctl hw.hvtimesync.ignore_sync=1
> 
> They interfere w/ each other.
> 
> Or do you mean the network hanging triggered by "RXBUF ack failed"?
> 
> Thanks,
> sephe

Yes, the ntp client was running on the VM guest.  After finding it was
unstable, we concluded that there must be some type of interference.  It's
best that we automatically force ntp off when our software detects it is
running in Hyper-V.
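A minimal sketch of that detection step, assuming the guest's `kern.vm_guest` sysctl reports `hv` under Hyper-V (worth confirming on the target release). `should_disable_ntp()` and `read_vm_guest()` are hypothetical helpers, not AKIPS code. The sysctl lookup only compiles on FreeBSD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical policy helper: should our ntp client be forced off?
 * Assumes kern.vm_guest reports "hv" when running under Hyper-V. */
static bool should_disable_ntp(const char *vm_guest)
{
    return vm_guest != NULL && strcmp(vm_guest, "hv") == 0;
}

/* Read kern.vm_guest into buf (NUL-terminated). Returns 0 on success,
 * -1 on failure; always -1 when not built on FreeBSD. */
static int read_vm_guest(char *buf, size_t len)
{
#ifdef __FreeBSD__
    size_t n = len;
    if (len == 0 || sysctlbyname("kern.vm_guest", buf, &n, NULL, 0) != 0)
        return -1;
    buf[len - 1] = '\0';              /* defensive termination */
    return 0;
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}
```

At startup the software would skip launching the ntp client when `read_vm_guest(g, sizeof g) == 0 && should_disable_ntp(g)`. The alternative suggested above is to keep the ntp client and disable Hyper-V timesync with the two `hw.hvtimesync` sysctls instead.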

We haven't been able to trigger the hn interface hang that our customer sees
with the RXBUF problem, though.  Probably different underlying hardware.

Paul.
-- 
Paul Koch | Founder | CEO
AKIPS Network Monitor | akips.com
Brisbane, Australia

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 11.1 running on HyperV hn interface hangs

2017-09-28 Thread Paul Koch
On Thu, 14 Sep 2017 09:54:56 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> If you have any updates on this, please let me know.  There is still
> time for 10.4.

We are still playing around with this in the lab...

Running a similar setup to the customer:
 Microsoft Windows Server 2012 R2 Datacentre (6.3.9600) Revision 16384
 Hyper-V 2012

Two VM guests
 - 11.0-RELEASE
 - 11.1-p1

We cannot get the Hyper-V hn interface to lock up like the customer can,
though.

We can get the VMs to hang/stall regularly if the guests run ntpd (approx.
every 15 mins, but with no obvious pattern to it).  Disabling ntpd fixes it.

    Paul.


Re: 11.1 running on HyperV hn interface hangs

2017-09-14 Thread Paul Koch
On Thu, 14 Sep 2017 09:54:56 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> If you have any updates on this, please let me know.  There is still
> time for 10.4.

Still working on it.  We are trying to replicate the customer's setup (FreeBSD
11.1 running in a Hyper-V VM) in our test lab.  We have ping/snmp/netflow
network simulators that can create large amounts of real network traffic, to
see if that reliably triggers the problem.

    Paul.


Re: 11.1 running on HyperV hn interface hangs

2017-09-07 Thread Paul Koch
On Thu, 7 Sep 2017 13:51:11 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> Weird, your traffic pattern does not even belong to anything heavy.
> Sending is mainly UDP, which will never be able to saturate the TX
> buffer ring causing the RXBUF ACK sending failure.  This is weird.

It's a bit tricky. The poller is very fast. We ping every device every 15
seconds, and collect every MIB object every 60 seconds. The poller "rate
limits" itself by dividing each minute into 100ms time slots and only sends a
specific amount of pings/snmp packets in each time slot.  The problem is, it
blasts the request packets out really fast at the start of each time slot,
and then sits in a receive loop until the next time slot comes around.  The
requests are not paced over the 100ms, therefore it will blast out a lot
of packets in a few milliseconds.
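The difference between bursting and pacing can be shown with a small sketch. Instead of sending all of a slot's N requests back to back at the start of the 100 ms slot, each request gets its own offset inside the slot. For example, 150 requests in one slot would go out about 0.67 ms apart. `pace_offset_us()` is a hypothetical helper for illustration, not how the AKIPS poller is written.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_US 100000u   /* the 100 ms rate-limiting slot described above */

/* Microseconds after the start of a slot at which request i (0-based) of n
 * should be sent so the n requests are spread evenly across the slot.
 * Sending everything "as fast as possible" is equivalent to offset 0 for all i. */
static uint32_t pace_offset_us(uint32_t i, uint32_t n, uint32_t slot_us)
{
    if (n == 0)
        return 0;
    return (uint32_t)(((uint64_t)i * slot_us) / n);
}
```

A sender would sleep (or keep servicing its receive loop) until `slot_start + pace_offset_us(i, n, SLOT_US)` before each `sendto()`.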

We used to use a 1 second rate limiting time slot, and didn't interlace
ping/snmp requests, but we found certain interface types on Cisco 6509
switches couldn't keep up with back-to-back pings and would lose them.


> Anyhow, make sure to test this patch:
> 876  2017-Sep-07 02:19  hn_inc_txbr.diff

Yep.  Might take a bit of time to test though because we'll need to get the
customer to spin up a test VM on the same platform, and they are fairly
remote (Perth, Australia).  We don't run any Microsoft servers/HyperV setups
in our lab.

Paul.


Re: 11.1 running on HyperV hn interface hangs

2017-09-07 Thread Paul Koch
On Thu, 7 Sep 2017 10:22:40 +0800
Sepherosa Ziehau <se...@freebsd.org> wrote:

> Is it possible to tell me your workload?  e.g. TX heavy or RX heavy.
> Enabled TSO or not.  Details like how the send syscalls are issue will
> be interesting.  And your Windows version, include the patch level,
> etc.
> 
> Please try the following patch:
> https://people.freebsd.org/~sephe/hn_dec_txdesc.diff
> 
> Thanks,
> sephe

Hi Sephe,

Here's a bit of an explanation of the environment...

AKIPS Network Monitor workload:
- 22000 devices (routers/switches/APs/etc)
- 123000 interfaces (60 second snmp polling)
- 131 netflow exporters
- ~1500 pings per second
- ~1000 snmp requests/responses per second (~1.9 million MIB objects/min)
- ~250 netflow packets/sec (~4500 flows/sec incoming)
- ~130 syslog messages/sec (incoming)
- ~200 snmp traps/sec (incoming)

The ping/snmp poller is a single monolithic process (no threads).
Separate processes for each of the syslog/trap/netflow collection.

SNMP requests are sent using the sendto() system call over a non-blocking UDP
socket for both IPv4 and v6.  We set the UDP socket receive buffer size to
4 Mbytes.  Nothing really complex with it.
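A minimal sketch of that socket setup. It is not AKIPS's code, and `open_udp_socket()` and `send_request()` are hypothetical names. Two hedges apply:

- On FreeBSD, an SO_RCVBUF request larger than `kern.ipc.maxsockbuf` may be refused with ENOBUFS, so the sketch only warns in that case.
- On a non-blocking socket, `sendto()` can fail with EAGAIN or ENOBUFS when the kernel has no room. The sketch reports that case separately so the caller can defer the request.

```c
#define _DEFAULT_SOURCE            /* expose POSIX/BSD declarations on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Open a non-blocking UDP socket (AF_INET or AF_INET6) with a large receive
 * buffer. A refused SO_RCVBUF size is reported but not treated as fatal. */
static int open_udp_socket(int family, int rcvbuf_bytes)
{
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_bytes,
                   sizeof rcvbuf_bytes) != 0)
        fprintf(stderr, "SO_RCVBUF %d: %s\n", rcvbuf_bytes, strerror(errno));

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one request. Returns 1 if sent, 0 if the kernel had no room
 * (EAGAIN/EWOULDBLOCK/ENOBUFS: retry in a later slot), -1 on other errors. */
static int send_request(int fd, const void *buf, size_t len,
                        const struct sockaddr *to, socklen_t tolen)
{
    if (sendto(fd, buf, len, 0, to, tolen) >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
        return 0;
    return -1;
}
```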

Pings are interlaced with snmp requests to limit the bursty nature of small
back-to-back packets (this eliminates issues with switch interfaces dropping
bursts of packets).  Ping requests are sent using a raw icmp socket.  We
don't read the responses from the icmp socket; instead we put the interface
into promiscuous mode and use the BPF info to measure the tx/rx RTT values.
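A sketch of the interlacing idea. It merges the two work lists so that consecutive packets alternate between ping and snmp, instead of one kind going out in a run. Strict 1:1 alternation is an assumption made for illustration. With ~1500 pings/s against ~1000 snmp requests/s, a real poller would presumably interleave in proportion. `interlace()` and `struct req` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum req_kind { REQ_PING, REQ_SNMP };

struct req {
    enum req_kind kind;
    int id;               /* index into the (hypothetical) per-kind work list */
};

/* Interlace npings ping requests with nsnmp snmp requests as
 * ping, snmp, ping, snmp, ...; the longer list supplies the tail.
 * out must have room for npings + nsnmp entries; returns the count written. */
static size_t interlace(size_t npings, size_t nsnmp, struct req *out)
{
    size_t k = 0, p = 0, s = 0;
    while (p < npings || s < nsnmp) {
        if (p < npings)
            out[k++] = (struct req){ REQ_PING, (int)p++ };
        if (s < nsnmp)
            out[k++] = (struct req){ REQ_SNMP, (int)s++ };
    }
    return k;
}
```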

Syslog daemon just listens on a UDP socket with a 4 Mbyte receive buffer.
Same with the snmp trap daemon.


Here's some links to performance graphs of the VM:
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last2h.pdf
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last24h.pdf
 https://www.akips.com/downloads/hyperv-fbsd11.1p1/system-graphs-last7d.pdf

The OS was upgraded to 11.1p1 at 5pm on the 5th Sep.  The hn0 interface hung
at 7:36pm.  The interface hung three times before we reverted to 11.0p9.  It
takes a few hours after rebooting the VM before the interface hangs.


Microsoft Host is running Windows 2012 R2.  Waiting for patch level info from
the customer.

I'll have to get the customer to spin up a new VM before trying your patch.


Here's some info (after a reboot of the VM)

Guest VM dmesg:

FreeBSD 11.1-RELEASE-p1 #0 r322350: Thu Aug 10 22:16:21 UTC 2017
r...@shed31.akips.com:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM
4.0.0)
VT(vga): text 80x25
Hyper-V Version: 6.3.9600 [SP18]
  Features=0xe7f
  PM Features=0x0 [C2]
  Features3=0x7b2
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
CPU: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (2300.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  Features=0x1f83fbff
  Features2=0x80002001
  AMD Features=0x20100800
  AMD Features2=0x1
Hypervisor: Origin = "Microsoft Hv"
real memory  = 34359738368 (32768 MB)
avail memory = 33325903872 (31782 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: unblocking device.
ioapic0: Changing APIC ID to 0
ioapic0  irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19
nexus0
vtvga0:  on motherboard
cryptosoft0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
attimer0:  port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0:  port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
vmbus0:  on pcib0
pci0:  on pcib0
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0:  at channel 0 on atapci0
ata1:  at channel 1 on atapci0
pci0:  at device 7.3 (no driver attached)
vgapci0:  mem 0xf8000000-0xfbffffff irq 11 at device 8.0 on pci0
vgapci0: Boot video device
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, 

Re: 11.1 running on HyperV hn interface hangs

2017-09-06 Thread Paul Koch
On Wed, 6 Sep 2017 12:02:43 +0100
Pete French  wrote:

> > We recently moved our software from 11.0-p9 to 11.1-p1, but looks like
> > there is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012
> > R2) where the virtual hn0 interface hangs with the following kernel
> > messages:
> > 
> >   hn0:  on vmbus0
> >   hn0: Ethernet address: 00:15:5d:31:21:0f
> >   hn0: link state changed to UP
> >   ...
> >   hn0: RXBUF ack retry
> >   hn0: RXBUF ack failed
> >   last message repeated 571 times
> > 
> > It requires a restart of the HyperV VM.
> > 
> > This is a customer production server (remote customer ~4000km away)
> > running fairly critical monitoring software, so we needed to roll it back
> > to 11.0-p9. We only have two customers running our software in HyperV, vs
> > lots in VMware and a handful on physical hardware.
> > 
> > 11.0-p9 has been very stable.  Has anyone seen this problem before with
> > 11.1 ?  
> 
> 
> I don't run anything on local hyper-v anymore, but I do run a lot of 
> stuff in Azure, and we haven't seen anything like this. I track STABLE 
> for things though, updating after reading the commits and testing 
> locally for a week or so, so the version I am running currently is 
> r320175, which was part of 11.1-BETA2. I am going to upgrade to a more 
> recent STABLE sometime this week or next though, will do that on a test 
> machine and let you know how it goes.
> 
> I seem to recall that there were some large changes to the hn code in 
> August to add virtual function support. When does 11.1-p1 date from ?

Looks like 2017-08-10 

Paul.


11.1 running on HyperV hn interface hangs

2017-09-06 Thread Paul Koch

Not sure if -stable is the right mailing list for this one.

We recently moved our software from 11.0-p9 to 11.1-p1, but it looks like
there is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2)
where the virtual hn0 interface hangs with the following kernel messages:

 hn0:  on vmbus0
 hn0: Ethernet address: 00:15:5d:31:21:0f
 hn0: link state changed to UP
 ...
 hn0: RXBUF ack retry
 hn0: RXBUF ack failed
 last message repeated 571 times

It requires a restart of the HyperV VM.

This is a customer production server (remote customer ~4000km away) running
fairly critical monitoring software, so we needed to roll it back to 11.0-p9.
We only have two customers running our software in HyperV, vs lots in VMware
and a handful on physical hardware.

11.0-p9 has been very stable.  Has anyone seen this problem before with 11.1 ?

11.1 is listed as a supported guest here:
 https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-freebsd-virtual-machines-on-hyper-v

Paul.


date -r {big number} results in segmentation fault

2017-08-29 Thread Paul Koch

I had one of those in-a-hurry copy-n-paste errors...

 date -r 15038137211503813721
 Segmentation fault

Haven't had a chance to look into it yet though.
This was on 11.1-RELEASE-p1 #0 r322350
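The argument looks like the epoch time 1503813721 pasted twice. As a number it is far beyond the largest signed 64-bit `time_t` (9223372036854775807). The post doesn't establish the actual cause inside date(1). The sketch below shows defensive parsing for a `-r`-style argument under that caveat. It rejects values `strtoll()` can't represent (ERANGE), rejects trailing junk, and rejects times `localtime()` can't convert (it returns NULL). `parse_epoch()` is a hypothetical name, not date(1)'s code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Parse a seconds-since-epoch string into *out.
 * Returns 0 on success; -1 if it is not a number, has trailing junk,
 * does not fit in time_t, or cannot be converted to a calendar date. */
static int parse_epoch(const char *s, time_t *out)
{
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    time_t t = (time_t)v;
    if ((long long)t != v)          /* time_t narrower than long long */
        return -1;
    if (localtime(&t) == NULL)      /* year not representable in struct tm */
        return -1;
    *out = t;
    return 0;
}
```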

Paul.


ZFS ARC and mmap/page cache coherency question

2016-06-14 Thread Paul Koch

We are trying to understand a performance issue when syncing large mmap'ed
files on ZFS.

Example test box setup:
 FreeBSD 10.3-p5
 Intel i7-5820K 3.30GHz with 64G RAM
 6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe

Read performance of a sequentially written large file on the pool is
typically around 950Mbytes/sec using dd.

Our software mmap's some large database files using MAP_NOSYNC, and we call
fsync() every 10 minutes when we know the file system is mostly idle.  In
our test setup, the database files are 1.1G, 2G, 1.4G, 12G, 4.7G and ~20
small files (under 10M). Most of the memory pages in the mmap'ed files are
updated every minute with massive amounts of collected ping and snmp mib
values.  
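A minimal sketch of the access pattern described above. It is not AKIPS code, and `map_db()` and `flush_db()` are hypothetical names. The file is mapped shared with `MAP_NOSYNC`, pages are dirtied in place, and the data is flushed with `fsync()` at a chosen quiet time. Per FreeBSD's mmap(2), `MAP_NOSYNC` means dirty pages are written only when necessary (usually by the pager) rather than on the syncer's schedule. The flag is FreeBSD-specific, so elsewhere the sketch defines it to 0 and the mapping becomes an ordinary shared one.

```c
#define _DEFAULT_SOURCE            /* expose POSIX/BSD declarations on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0               /* FreeBSD-only; plain shared mapping elsewhere */
#endif

/* Map an existing, non-empty file read/write and shared. */
static void *map_db(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}

/* The periodic flush (every 10 minutes in the post). */
static int flush_db(int fd)
{
    return fsync(fd);
}
```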

When the 10 minute fsync() occurs, gstat typically shows very little disk
reads and very high write speeds, which is what we expect.  But, every 80
minutes we process the data in the large mmap'ed files and store it in highly
compressed blocks of a ~300G file using pread/pwrite (i.e. not mmap'ed).
After that, the performance of the next fsync() of the mmap'ed files falls
off a cliff.  We are assuming it is because the ARC has thrown away the
cached data of the mmap'ed files.  gstat shows lots of read/write contention
and lots of things tend to stall waiting for disk.

Is this just a lack of ZFS ARC and page cache coherency?

Is there a way to prime the ARC with the mmap'ed files again before we call
fsync() ?

We've tried cat and read() on the mmap'ed files, but that doesn't seem to
touch the disk at all and the fsync() performance is still poor, so it looks
like the ARC is not being filled.  msync() doesn't seem to be much different.
mincore() stats show the mmap'ed data is entirely incore and referenced.
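The residency check mentioned above can be reproduced with a helper like this. `resident_pages()` is a hypothetical name. The vector element type is `char` on FreeBSD but `unsigned char` on Linux, hence the typedef. As far as we understand it, mincore() only reports VM page residency and says nothing about which blocks the ARC currently holds. A fully resident mapping is therefore not evidence that the ARC is warm.

```c
#define _DEFAULT_SOURCE            /* expose mincore() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifdef __linux__
typedef unsigned char mincore_vec_t;   /* Linux prototype */
#else
typedef char mincore_vec_t;            /* FreeBSD prototype */
#endif

/* Count how many pages of [addr, addr+len) mincore() reports as resident.
 * addr must be page aligned (true for mmap() results). Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesz - 1) / pagesz;
    mincore_vec_t *vec = malloc(npages ? npages : 1);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long n = 0;
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1)        /* bit 0 = resident (MINCORE_INCORE on FreeBSD) */
            n++;
    free(vec);
    return n;
}
```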

Paul.


10.2 - Process stuck in unkillable sleep

2016-02-23 Thread Paul Koch
, 
  td_vnet = 0x0, 
  td_vnet_lpush = 0x0, 
  td_intr_frame = 0x0, 
  td_rfppwait_p = 0x0, 
  td_ma = 0x0, 
  td_ma_cnt = 0, 
  td_su = 0x0
}


Paul.

Re: gptboot: unable to read backup GPT header - virtualbox guest with SAS controller

2015-06-25 Thread Paul Koch
On Mon, 22 Jun 2015 08:28:10 -0600 (MDT)
Warren Block <wbl...@wonkity.com> wrote:

> On Mon, 22 Jun 2015, Paul Koch wrote:
>
> > We get the following error after installing 10.1-p12 in a VirtualBox guest
> > when setup with an emulated LSI / SAS controller and a 50G fixed sized
> > virtual disk:
> >
> >  gptboot: error 1 lba 104857599
> >  gptboot: unable to read backup GPT header
> >
> > Can't seem to find anyone who has this same issue.
> >
> > The problem does not exist if we configure the guest with a SATA controller
> > and same size virtual disk.
>
> ...
>
> > The guest boots fine, but we always get the gptboot error.
> >
> > Is this just a problem with the virtualbox SAS controller emulation where
> > gptboot can't retrieve the backup table ?
>
> That would be my first guess: an off-by-one error preventing the last 
> block from being read.  It's not clear which emulated controller was 
> being used for the diskinfo output posted earlier.  If it really was an 
> off-by-one bug, the block count would differ depending on the 
> controller.
>
> However, some controllers keep metadata on the drive, and report a 
> reduced capacity, and that would have almost the same effect.  Seems 
> like there would be a complaint by the controller firmware about the 
> contents of the metadata block, but maybe not by an emulated controller. 
> If controller metadata is the problem, installing FreeBSD using the 
> emulated controller in place should make sure the backup GPT is in the 
> correct position, rather than switching to the SCSI controller after 
> installing with, say, SATA.

It does look like gptboot couldn't access the last sector on the virtual SAS
disk. We were playing with expanding the size of the virtual disk...

Shutting down the VM, expanding the disk size, rebooting, and no gptboot error.

Running the appropriate gpart/zpool commands to take up the expanded space,
then reboot, and the gptboot error is back again.

We've been having similar issues with a customer who is running on bare
hardware (a Cisco C240 M4 using the mrsas driver).  It also appears to
exhibit the problem we were having where the backup partition table does
not get updated when the gpart bootonce flag is set so we can boot from an
alternate partition.  Adding 'gpart recover ${disk}' to our startup script
gets around the problem.

Paul.


bootonce not set in backup partition table - virtualbox guest with SAS controller

2015-06-21 Thread Paul Koch
 11e5 9e8d 0008 3f27 7893
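0000120 0800 0080 0000 0000 07ff 0120 0000 0000
0000130 0000 0000 0000 0800 0061 006b 0069 0070
0000140 0073 002d 0072 006f 006f 0074 0030 0000
0000150 0000 0000 0000 0000 0000 0000 0000 0000
*
0000180 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
0000190 66da 9ac1 18ed 11e5 9e8d 0008 3f27 7893
00001a0 0800 0120 0000 0000 07ff 01c0 0000 0000
00001b0 0000 0000 0000 0400 0061 006b 0069 0070
00001c0 0073 002d 0072 006f 006f 0074 0031 0000
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 7cba 516e 6ecf 11d6 f88f 0200 092d 2b71
0000210 1d30 9ac3 18ed 11e5 9e8d 0008 3f27 7893
0000220 0800 01c0 0000 0000 f7ff 063f 0000 0000
0000230 0000 0000 0000 0000 0061 006b 0069 0070
0000240 0073 002d 0068 006f 006d 0065 0000 0000
0000250 0000 0000 0000 0000 0000 0000 0000 0000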
*


Dump of the backup partition table

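# dd if=/dev/da0 bs=512 skip=104857567 count=32 2>/dev/null | hexdump
0000000 6b9d 83bd 7f41 11dc 0bbe 1500 b860 0f4f
0000010 b409 9abc 18ed 11e5 9e8d 0008 3f27 7893
0000020 0028 0000 0000 0000 0427 0000 0000 0000
0000030 0000 0000 0000 0000 0067 0070 0074 0062
0000040 006f 006f 0074 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
0000080 7cb5 516e 6ecf 11d6 f88f 0200 092d 2b71
0000090 3b28 9abe 18ed 11e5 9e8d 0008 3f27 7893
00000a0 0428 0000 0000 0000 0427 0080 0000 0000
00000b0 0000 0000 0000 0000 0061 006b 0069 0070
00000c0 0073 002d 0073 0077 0061 0070 0000 0000
00000d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
0000110 c0b3 9abf 18ed 11e5 9e8d 0008 3f27 7893
0000120 0800 0080 0000 0000 07ff 0120 0000 0000
0000130 0000 0000 0000 0800 0061 006b 0069 0070
0000140 0073 002d 0072 006f 006f 0074 0030 0000
0000150 0000 0000 0000 0000 0000 0000 0000 0000
*
0000180 7cb6 516e 6ecf 11d6 f88f 0200 092d 2b71
0000190 66da 9ac1 18ed 11e5 9e8d 0008 3f27 7893
00001a0 0800 0120 0000 0000 07ff 01c0 0000 0000
00001b0 0000 0000 0000 0c00 0061 006b 0069 0070   <-- different
00001c0 0073 002d 0072 006f 006f 0074 0031 0000
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 7cba 516e 6ecf 11d6 f88f 0200 092d 2b71
0000210 1d30 9ac3 18ed 11e5 9e8d 0008 3f27 7893
0000220 0800 01c0 0000 0000 f7ff 063f 0000 0000
0000230 0000 0000 0000 0000 0061 006b 0069 0070
0000240 0073 002d 0068 006f 006d 0065 0000 0000
0000250 0000 0000 0000 0000 0000 0000 0000 0000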
*


The line at offset 1b0 is out by one bit.  Looks like the bootonce
flag is only updated in the primary partition table and not the backup.
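To make the hexdump easier to read: bytes 0x30-0x37 of each 128-byte GPT entry hold the 64-bit attribute field. Offset 0x1b0 therefore starts the attributes of the entry at 0x180 (akips-root1). In hexdump's little-endian 16-bit words, the top word is the fourth word on that line: `0400` in the primary table and `0c00` in the backup. The sketch below decodes them. It assumes FreeBSD's `<sys/gpt.h>` bit assignments (bootme = bit 59, bootonce = bit 58, bootfailed = bit 57), which are worth checking against the header. `attr_from_words()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed from FreeBSD's <sys/gpt.h>. */
#define ATTR_BOOTFAILED (UINT64_C(1) << 57)
#define ATTR_BOOTONCE   (UINT64_C(1) << 58)
#define ATTR_BOOTME     (UINT64_C(1) << 59)

/* Rebuild the 64-bit attribute field from the four 16-bit words hexdump
 * prints for entry offsets 0x30-0x37 (little-endian, low word first). */
static uint64_t attr_from_words(const uint16_t w[4])
{
    return (uint64_t)w[0] | (uint64_t)w[1] << 16 |
           (uint64_t)w[2] << 32 | (uint64_t)w[3] << 48;
}
```

Under those assumptions, the primary entry has only bootonce set, while the backup still has bootme and bootonce. The two differ in exactly the bootme bit.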

At this point our boot script tries to do

 gpart set   -a bootme -i 4 da0
 gpart unset -a bootme -i 3 da0
 gpart unset -a bootonce -i 4 da0

but we of course get 
 gpart: table 'da0' is corrupt: Operation not permitted

We currently have a workaround that does
 gpart recover da0
before doing the set/unset gpart commands.


Not sure what we are doing wrong, because it all works fine if we configure
the VM guest with a SATA controller.

Paul.


gptboot: unable to read backup GPT header - virtualbox guest with SAS controller

2015-06-21 Thread Paul Koch
Hi,

We get the following error after installing 10.1-p12 in a VirtualBox guest
when setup with an emulated LSI / SAS controller and a 50G fixed sized
virtual disk:

 gptboot: error 1 lba 104857599
 gptboot: unable to read backup GPT header

Can't seem to find anyone who has this same issue.

The problem does not exist if we configure the guest with a SATA controller
and same size virtual disk.


Setup:
- 10.1-RELEASE-p12 host
- VirtualBox 4.3.28
- 10.1-RELEASE-p12 guest


Guest info after boot...

# uname -a
FreeBSD shed65.akips.com 10.1-RELEASE-p12 FreeBSD 10.1-RELEASE-p12 #0 r284334:
Sat Jun 13 05:45:13 UTC 2015 r...@shed21.akips.com:/usr/obj/usr/src/sys/GENERIC
amd64


# diskinfo -v da0
da0
512 # sectorsize
53687091200 # mediasize in bytes (50G)
104857600   # mediasize in sectors
0   # stripesize
0   # stripeoffset
6527# Cylinders according to firmware.
255 # Heads according to firmware.
63  # Sectors according to firmware.
# Disk ident.


# gpart show
=>        34  104857533  da0  GPT  (50G)
          34       2014       - free -  (1.0M)
        2048       1024    1  freebsd-boot  (512K)
        3072    8388608    2  freebsd-swap  (4.0G)
     8391680       1024       - free -  (512K)
     8392704   10485760    3  freebsd-ufs  [bootme]  (5.0G)
    18878464   10485760    4  freebsd-ufs  (5.0G)
    29364224   75491328    5  freebsd-zfs  (36G)
   104855552       2015       - free -  (1.0M)

 
# gpart status
 Name  Status  Components
da0p1  OK  da0
da0p2  OK  da0
da0p3  OK  da0
da0p4  OK  da0
da0p5  OK  da0


The primary and backup GPT headers differ, which is expected we think.

Primary GPT header

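# dd if=/dev/da0 bs=512 skip=1 count=1 2>/dev/null | hexdump
0000000 4645 2049 4150 5452 0000 0001 005c 0000
0000010 b85b a5f5 0000 0000 0001 0000 0000 0000
0000020 ffff 063f 0000 0000 0022 0000 0000 0000
0000030 ffde 063f 0000 0000 16e7 cafe 18d1 11e5
0000040 f697 0008 9b27 bd6a 0002 0000 0000 0000
0000050 0080 0000 0080 0000 8856 ebca 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000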
*

Backup GPT header

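# dd if=/dev/da0 bs=512 skip=104857599 count=1 2>/dev/null | hexdump
0000000 4645 2049 4150 5452 0000 0001 005c 0000
0000010 98b1 45c8 0000 0000 ffff 063f 0000 0000
0000020 0001 0000 0000 0000 0022 0000 0000 0000
0000030 ffde 063f 0000 0000 16e7 cafe 18d1 11e5
0000040 f697 0008 9b27 bd6a ffdf 063f 0000 0000
0000050 0080 0000 0080 0000 8856 ebca 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000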
*
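The header fields that are expected to differ between the two copies can be checked mechanically. The sketch below follows the UEFI GPT header layout: MyLBA at 0x18, AlternateLBA at 0x20, PartitionEntryLBA at 0x48, and entry count and size at 0x50/0x54. The two CRC32 fields also differ because they cover these values, but they are not checked here. `gpt_parse()` and `gpt_pair_ok()` are hypothetical names. The test values are taken from the two dumps above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The GPT header fields that legitimately differ between primary and backup. */
struct gpt_hdr {
    uint64_t my_lba;          /* offset 0x18: LBA of this header */
    uint64_t alternate_lba;   /* offset 0x20: LBA of the other header */
    uint64_t entries_lba;     /* offset 0x48: start of this copy's entry array */
    uint32_t num_entries;     /* offset 0x50 */
    uint32_t entry_size;      /* offset 0x54 */
};

/* Little-endian field readers (the on-disk byte order). */
static uint64_t le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse one 512-byte header sector; -1 if the "EFI PART" signature is missing. */
static int gpt_parse(const unsigned char *sec, struct gpt_hdr *h)
{
    if (memcmp(sec, "EFI PART", 8) != 0)
        return -1;
    h->my_lba = le64(sec + 0x18);
    h->alternate_lba = le64(sec + 0x20);
    h->entries_lba = le64(sec + 0x48);
    h->num_entries = le32(sec + 0x50);
    h->entry_size = le32(sec + 0x54);
    return 0;
}

/* Do the headers point at each other, with the backup entry array ending
 * immediately before the backup header? */
static int gpt_pair_ok(const struct gpt_hdr *pri, const struct gpt_hdr *bak,
                       uint32_t sector_size)
{
    uint64_t table_bytes = (uint64_t)bak->num_entries * bak->entry_size;
    uint64_t table_sectors = (table_bytes + sector_size - 1) / sector_size;
    return pri->my_lba == 1 &&
           pri->alternate_lba == bak->my_lba &&
           bak->alternate_lba == pri->my_lba &&
           bak->entries_lba + table_sectors == bak->my_lba;
}
```

With these dumps, the backup header sits at LBA 104857599 (0x063fffff), the last sector reported by diskinfo. Its entry array starts at 104857567, which is 32 sectors (128 entries of 128 bytes) immediately before it. So the on-disk layout itself looks consistent.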


The MD5 sums of the primary and backup partition tables are identical.

# dd if=/dev/da0 bs=512 skip=2 count=32 2>/dev/null | md5
8c4510e1854f3371c3241e8a4374dc2c

# dd if=/dev/da0 bs=512 skip=104857567 count=32 2>/dev/null | md5
8c4510e1854f3371c3241e8a4374dc2c


The guest boots fine, but we always get the gptboot error.

Is this just a problem with the virtualbox SAS controller emulation where
gptboot can't retrieve the backup table ?

Appears to fail in sys/boot/i386/common/drv.c in drvread().


We have another problem with the same setup: the bootonce flag is not
updated in the backup partition table, but I'll describe that in a
separate email.

Paul.


Re: flock incorrectly detects deadlock on 7-stable and current

2008-05-09 Thread Paul Koch
On Thu, 8 May 2008 06:37:00 pm Doug Rabson wrote:
> On 8 May 2008, at 09:12, Paul Koch wrote:
> > Hi,
> >
> > We have been trying to track down a problem with one of our apps
> > which does a lot of flock(2) calls.  flock returns errno 11
> > (Resource deadlock avoided) under certain scenarios.  Our app works
> > fine on 7-Release, but fails on 7-stable and -current.
> >
> > The problem appears to be when we have at least three processes
> > doing flock() on a file, and one is trying to upgrade a shared lock
> > to an exclusive lock but fails with a deadlock avoided.
> >
> > Attached is a simple flock() test program.
> >
> > a. Process 1 requests and gets a shared lock
> > b. Process 2 requests and blocks for an exclusive lock
> > c. Process 3 requests and gets a shared lock
> > d. Process 3 requests an upgrade to an exclusive lock but fails
> > (errno 11)
> >
> > If we change 'd' to
> >    Process 3 requests unlock, then requests exclusive lock, it
> > works.

> Could you possibly try this patch and tell me if it helps:
>
> //depot/user/dfr/lockd/sys/kern/kern_lockf.c#57 - /tank/projects/lockd/src/sys/kern/kern_lockf.c
> @@ -1370,6 +1370,18 @@
>          }
>
>          /*
> +         * For flock type locks, we must first remove
> +         * any shared locks that we hold before we sleep
> +         * waiting for an exclusive lock.
> +         */
> +        if ((lock->lf_flags & F_FLOCK) &&
> +            lock->lf_type == F_WRLCK) {
> +                lock->lf_type = F_UNLCK;
> +                lf_activate_lock(state, lock);
> +                lock->lf_type = F_WRLCK;
> +        }
> +
> +        /*
>           * We are blocked. Create edges to each blocking lock,
>           * checking for deadlock using the owner graph. For
>           * simplicity, we run deadlock detection for all
> @@ -1389,17 +1401,6 @@
>          }
>
>          /*
> -         * For flock type locks, we must first remove
> -         * any shared locks that we hold before we sleep
> -         * waiting for an exclusive lock.
> -         */
> -        if ((lock->lf_flags & F_FLOCK) &&
> -            lock->lf_type == F_WRLCK) {
> -                lock->lf_type = F_UNLCK;
> -                lf_activate_lock(state, lock);
> -                lock->lf_type = F_WRLCK;
> -        }
> -        /*
>           * We have added edges to everything that blocks
>           * us. Sleep until they all go away.
>           */

Manually applied the patch to stable kern_lockf.c  1.57.2.1.  Ran the 
flock_test program on many of our architectures and it works fine.

Have also been testing our app on a single core i386 machine today with 
no locking problems.  Just setup a quad core -stable amd64 build and it 
also appears to be running fine now.

Thanks

Paul.


flock incorrectly detects deadlock on 7-stable and current

2008-05-08 Thread Paul Koch
Hi,

We have been trying to track down a problem with one of our apps which 
does a lot of flock(2) calls.  flock returns errno 11 (Resource 
deadlock avoided) under certain scenarios.  Our app works fine on 
7-Release, but fails on 7-stable and -current.

The problem appears to be when we have at least three processes doing 
flock() on a file, and one is trying to upgrade a shared lock to an 
exclusive lock but fails with a deadlock avoided.

Attached is a simple flock() test program.

a. Process 1 requests and gets a shared lock
b. Process 2 requests and blocks for an exclusive lock
c. Process 3 requests and gets a shared lock
d. Process 3 requests an upgrade to an exclusive lock but fails (errno 
11)

If we change 'd' to
   Process 3 requests unlock, then requests exclusive lock, it works.
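The attached test program isn't included in the archive. The sketch below is a reconstruction of the a-d sequence under stated assumptions. Three forked processes are ordered with sleeps (the 100/200/400 ms values are arbitrary), and each opens the file itself because flock() locks belong to the open file description. On an unaffected kernel every step eventually succeeds. On the affected 7-stable/-current kernels, step d would be expected to fail with errno 11. `participant()` and `run_flock_scenario()` are hypothetical names, not the original program.

```c
#define _DEFAULT_SOURCE            /* expose flock/nanosleep/mkstemp on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* One participant; returns its exit status (0 = every flock() succeeded). */
static int participant(const char *path, int role)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return 1;
    int rc = 0;
    switch (role) {
    case 1:                     /* a. take a shared lock and hold it */
        if (flock(fd, LOCK_SH) != 0)
            rc = 1;
        sleep_ms(400);
        break;
    case 2:                     /* b. block waiting for an exclusive lock */
        sleep_ms(100);
        if (flock(fd, LOCK_EX) != 0)
            rc = 1;
        break;
    case 3:                     /* c. shared lock, then d. upgrade it */
        sleep_ms(200);
        if (flock(fd, LOCK_SH) != 0) {
            rc = 1;
        } else if (flock(fd, LOCK_EX) != 0) {
            /* The reported bug: EDEADLK, "Resource deadlock avoided". */
            fprintf(stderr, "upgrade failed: %s\n", strerror(errno));
            rc = 2;
        }
        break;
    }
    close(fd);                  /* closing the descriptor releases the lock */
    return rc;
}

/* Run the a-d sequence; returns 0 if all three participants succeeded. */
static int run_flock_scenario(const char *path)
{
    pid_t pids[3];
    for (int role = 1; role <= 3; role++) {
        pids[role - 1] = fork();
        if (pids[role - 1] < 0)
            return -1;
        if (pids[role - 1] == 0)
            _exit(participant(path, role));
    }
    int failed = 0;
    for (int i = 0; i < 3; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed;
}
```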


The manual page says:

A shared lock may be upgraded to an exclusive lock, and vice versa, 
simply by specifying the appropriate lock type; this results in the 
previous lock being released and the new lock applied (possibly after 
other processes have gained and released the lock).

The manual page doesn't mention that flock() can fail with a deadlock.


Our test environment is:
 - 8 core Intel machine running i386 stable
 - 4 core Intel machine running amd64 current (20080508)
 - 4 core Intel machine running amd64 stable  (20080508)
 - 2 core AMD machine running i386 stable (20080418)
 - 2 core AMD machine running i386 stable (20080418)
 - single core (no hyperthreading) i386 stable (20080418)

There appears to have been changes to kern_lockf.c and other stuff 
around the 10th April to do with deadlock detection.  We don't see the 
problem on 6.2-stable, 7-Release, or 7-stable pre ~10th April.

Paul.

Re: FreeBSD boots too fast on Dell PE850

2006-08-18 Thread Paul Koch
On Saturday 19 August 2006 03:12, Alan Amesbury wrote:
> Thanks for the feedback and discussion!  Alas, in terms of network
> configuration, I'm just a tenant; I have no direct control over the
> networking gear, nor direct visibility into how the switch is
> configured.
>
> A couple people wrote to me directly and suggested I 'send-pr' this,
> so I'll do so (hopefully later today).
>
> Thanks again!
>
>
> --
> Alan Amesbury
> University of Minnesota


This is a really old problem, actually two.

The first is the spanning tree problem, where it can take a long
time for spanning tree to settle and your port to go into forwarding state.
Adding a random sleep doesn't help, because how long do you sleep for?  How
we got around this problem at various sites was to modify the rc scripts
to check if a default gateway was configured (typical), and ping it
until a response was received or a large timeout occurred (e.g. 5
minutes).  That way, all other network services like ntpdate and
sendmail would have a better chance of working.
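The gateway check described above can be sketched as a generic wait loop with a deadline. The probe is left abstract because pinging needs a raw socket and root, so `probe_fn`, `countdown_probe()` and `wait_for_network()` are hypothetical. The 5-minute timeout from the text corresponds to `timeout_s = 300`.

```c
#define _DEFAULT_SOURCE            /* expose clock_gettime/nanosleep on glibc */
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Probe supplied by the caller, e.g. "did the default gateway answer a ping?" */
typedef bool (*probe_fn)(void *ctx);

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Call probe every interval_ms until it succeeds or timeout_s elapses.
 * Returns true if the probe succeeded, false on timeout. */
static bool wait_for_network(probe_fn probe, void *ctx,
                             long interval_ms, double timeout_s)
{
    double deadline = now_s() + timeout_s;
    for (;;) {
        if (probe(ctx))
            return true;
        if (now_s() >= deadline)
            return false;
        struct timespec ts = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
}

/* Example probe for testing: succeeds on the Nth call (ctx points to N). */
static bool countdown_probe(void *ctx)
{
    int *remaining = ctx;
    return --*remaining <= 0;
}
```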

If your machine doesn't use a static IP, but instead dhcp, then you will 
need to have a long timeout/retry on the dhcp requests.

The second problem we found was that various NICs would report that they
were active after doing auto negotiation, but no rx packets were
being passed into the OS.  Not sure if it was a hardware or driver
issue, but we discovered that by forcing a packet out the NIC via the
bpf interface, it would immediately start working.  It was as if the
auto negotiation had not really completed fully until a packet was
transmitted.  This only occurred on certain types of NICs, the newer
ones.  This was a problem for us because we build something called
a remote network appliance (RNA), which is basically FreeBSD on a
floppy and runs a statistical LAN analyser.  The RNA might have many
NICs in it, one with an IP, the others just connected to network
segments in promiscuous mode.  Our apps couldn't monitor any traffic
because no packets had been sent out the interfaces.  So, early in the
boot process we force out a couple of Loopback packets and everything
works just fine.
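A sketch of forcing a frame out through BPF, under several assumptions. The post doesn't say what the "Loopback packets" contained. This version builds a minimum-size (60-byte) Ethernet frame with EtherType 0x9000, the Ethernet loopback / Configuration Testing Protocol type. The frame is addressed to the interface's own MAC and has a zeroed payload. `build_loopback_frame()` and `bpf_send()` are hypothetical helpers. The BPF part (open `/dev/bpf`, `BIOCSETIF`, `write()`) only compiles on FreeBSD and needs root.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <unistd.h>
#endif

#define ETHERTYPE_LOOPBACK_CTP 0x9000   /* Ethernet Configuration Testing Protocol */
#define FRAME_LEN 60                    /* minimum Ethernet frame, excluding FCS */

/* Build a minimal frame with dst = src = mac and a zeroed payload. */
static size_t build_loopback_frame(unsigned char *buf, const unsigned char mac[6])
{
    memset(buf, 0, FRAME_LEN);
    memcpy(buf, mac, 6);                /* destination */
    memcpy(buf + 6, mac, 6);            /* source */
    buf[12] = ETHERTYPE_LOOPBACK_CTP >> 8;
    buf[13] = ETHERTYPE_LOOPBACK_CTP & 0xff;
    return FRAME_LEN;
}

/* Write one frame out of ifname via BPF. FreeBSD only; -1/ENOSYS elsewhere. */
static int bpf_send(const char *ifname, const unsigned char *frame, size_t len)
{
#ifdef __FreeBSD__
    int fd = open("/dev/bpf", O_RDWR);
    if (fd < 0)
        return -1;
    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strlcpy(ifr.ifr_name, ifname, sizeof ifr.ifr_name);
    int rc = -1;
    if (ioctl(fd, BIOCSETIF, &ifr) == 0 && write(fd, frame, len) == (ssize_t)len)
        rc = 0;
    close(fd);
    return rc;
#else
    (void)ifname;
    (void)frame;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}
```

Early in boot, something like `bpf_send("em1", frame, build_loopback_frame(frame, mac))` would be called once or twice per monitored interface. The interface name is illustrative.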

Not sure if the second issue would be a problem for normal installations 
though.

Paul.


Re: Can't select/install kernels in custom install.cfg - 6.1RC2

2006-05-13 Thread Paul Koch
On Fri, 5 May 2006 05:33 pm, Paul Koch wrote:
 Hi,

 I have just upgraded some of our product build machines to 6.1RC2 and
 I am having a few problems getting a custom install.cfg to work with
 the way sysinstall now selects/installs kernels.  I am trying to
 select a custom distribution set using something like the following:

  dists=base kernels
  distKernel=GENERIC SMP
  distSetCustom

 base gets installed fine but no kernels ever get installed.  If I set
 any of the standard distribution sets (eg. distSetMinimal, or
 distSetEverything), then one or both of GENERIC/SMP kernels get
 installed.

 I am a bit lost. Did some googling, but couldn't find anything
 relevant.

   Paul.

Managed to work this out by reading the sysinstall code.  It is quite 
simple, but not so obvious from the manual page.  There appear to be 
two methods of selecting which dists to install:

1. Setting the distributions by name using something like:

 ...
 dists=base lib32 GENERIC SMP
 distSetCustom
 ...
 installCommit

The kernels have now been separated from base into their own dist, 
but you don't specify dists=base kernels; instead you have to specify 
the subdistribution names.  Each of the dists values is looked up and 
the appropriate bit set in the variables below.
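A rough Python model of that lookup, assuming a simple name to (variable, bit) table. The table entries and bit values below are hypothetical placeholders, not the real definitions from sysinstall/dist.h:

```python
from typing import Dict, Tuple

# HYPOTHETICAL placeholder table: name -> (install.cfg variable, bit).
# The real names and bit values are defined in sysinstall/dist.h.
EXAMPLE_DISTS: Dict[str, Tuple[str, int]] = {
    "base": ("distMain", 0x0001),
    "lib32": ("distMain", 0x0002),
    "GENERIC": ("distKernel", 0x0001),
    "SMP": ("distKernel", 0x0002),
}


def dists_to_masks(
    dists: str, table: Dict[str, Tuple[str, int]] = EXAMPLE_DISTS
) -> Dict[str, int]:
    """Turn a space-separated dists= value into per-variable bit masks."""
    masks: Dict[str, int] = {}
    for name in dists.split():
        if name not in table:
            # This sketch rejects unknown names; sysinstall's own handling may differ.
            raise KeyError(name)
        var, bit = table[name]
        masks[var] = masks.get(var, 0) | bit
    return masks
```

With the placeholder table, dists_to_masks("base lib32 GENERIC SMP") yields {"distMain": 3, "distKernel": 3}, which shows why naming the kernel subdistributions (rather than "kernels") is what ends up setting kernel bits.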


2. Setting the distributions by setting several bit field environment 
variables:

 ...
 distMain=8193
 distSRC=0
 distX11=0
 distKernel=3
 ...
 installCommit

The above variables are bit fields as per sysinstall/dist.h, but must be 
converted to decimal because they are read using atoi(3).

Both methods work fine, but with method 1 you can't select all the 
subdistributions within a distribution without listing every 
subdistribution name (eg. for src: sbase scontrib scrypt ...), whereas in 
method 2 you can just turn all the bits on (eg. 0xF, but in decimal). 

Best to refer to dist.h.

Paul.


Can't select/install kernels in custom install.cfg - 6.1RC2

2006-05-05 Thread Paul Koch
Hi,

I have just upgraded some of our product build machines to 6.1RC2 and I 
am having a few problems getting a custom install.cfg to work with the 
way sysinstall now selects/installs kernels.  I am trying to select a 
custom distribution set using something like the following:

 dists=base kernels
 distKernel=GENERIC SMP
 distSetCustom

base gets installed fine but no kernels ever get installed.  If I set 
any of the standard distribution sets (eg. distSetMinimal, or 
distSetEverything), then one or both of GENERIC/SMP kernels get 
installed.

I am a bit lost. Did some googling, but couldn't find anything relevant.

Paul.


Re: BETA3 report

2006-03-12 Thread Paul Koch
On Mon, 13 Mar 2006 02:41 pm, M. Warner Losh wrote:
 I just tried to boot the BETA3 disc1 iso on my Toshiba Satellite
 A64-S1762.  I got a cryptic message:

 Too many holes in the physical address space, giving up
 <rest of copyright message here>
 PANIC
 (if I've gone into the BIOS at all), followed by a panic.

 Or I get a panic when probing the ohci device (either with or without
 a USB keyboard attached, the built-in keyboard is fried).

 Has anybody else seen this?  Is there a debug kernel I can boot, or
 do I have to build one of my own?

 Warner

Hi,

Is this related to issue 2 in my previous post:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153993+0+archive/2005/freebsd-stable/20051127.freebsd-stable

Paul.


Re: Automatic installation problem

2005-12-09 Thread Paul Koch
On Sat, 10 Dec 2005 01:34 am, Tarasov Alexey wrote:
 Hello!

 I'm trying to make a fully automatic FreeBSD installation. I have the
 following lines in my install.cfg:
  command=echo sshd_enable=YES >> /etc/rc.conf
  system
  command=echo 'pass' | /usr/sbin/pw useradd -u user -h 0 -G wheel
  system

 But during installation I get some error messages:
   DEBUG: dispatch: calling resword 'system'
   echo sshd_enable=YES >> /etc/rc.conf: not found

 I am using FreeBSD 6.0-RELEASE.

Yep, something changed around 4.x in sysinstall so that rc.conf is not 
created until near the end of the install.  If you write to rc.conf 
during a scripted sysinstall, then it will be overwritten.  You can write 
your stuff to /etc/rc.conf.local instead.  It gets sourced via 
/etc/defaults/rc.conf.

$ grep rc.conf /etc/defaults/rc.conf
rc_conf_files="/etc/rc.conf /etc/rc.conf.local"

After boot time, we have a post boot script which merges our stuff from 
rc.conf.local into rc.conf, just to keep it clean.
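A minimal sketch of such a merge step, assuming simple one-assignment-per-line rc.conf files (the actual post-boot script isn't shown). A variable already set in rc.conf has its first assignment replaced by the rc.conf.local value, since rc.conf.local is sourced later and would win anyway; variables rc.conf doesn't have yet are appended.

```python
import re
from typing import Dict, List

ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=")


def merge_rc_conf(rc_conf: str, rc_conf_local: str) -> str:
    """Fold rc.conf.local assignments into the text of rc.conf."""
    # Last assignment wins, just as when the shell sources the file.
    local: Dict[str, str] = {}
    for line in rc_conf_local.splitlines():
        m = ASSIGN.match(line)
        if m:
            local[m.group(1)] = line.strip()

    out: List[str] = []
    written = set()
    for line in rc_conf.splitlines():
        m = ASSIGN.match(line)
        name = m.group(1) if m else None
        if name in local:
            if name not in written:
                out.append(local[name])  # replace the first assignment in place
                written.add(name)
            continue  # drop later duplicates so the merged value stays in effect
        out.append(line)  # comments, blank lines, untouched variables

    out.extend(v for k, v in local.items() if k not in written)
    return "\n".join(out) + "\n"
```

After writing the merged text back to rc.conf, rc.conf.local could be emptied; that part is left out here.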

Just to be safe, we quote everything in install.cfg, as in:

 echo 'sshd_enable=YES' >> /etc/rc.conf.local

Paul.
-- 
Paul Koch
CTO
Statseeker


Re: 6.0-stable panic in ohci_softintr when using ucom/uftdi

2005-12-07 Thread Paul Koch
On Wed, 7 Dec 2005 09:02 pm, Ian Dowse wrote:
 In message [EMAIL PROTECTED], Paul Koch writes:
 As soon as the modem rings, the machine panics.  I don't think the
 modem even gets time to pick up the line.  This doesn't happen on
 a 5.4-stable box with 6 modems connected, but 5.4-stable appears
 to have other hanging issues, which is why I am trying out 6.0-stable.
 
 Recent changes to ohci.c (1.154.2.1) in this area (2 days ago) ??
 Should I raise a PR ?

 I'll see if I can reproduce this later - the only thing I can think
 of now that might be responsible is a USB transfer reuse issue that
 the old patch here might help with:

   http://people.freebsd.org/~iedowse/releng_5_xfer_reuse.diff

 Could you see if that makes any difference? It should apply to
 6-stable even though the name says releng_5.

 Ian

I'll try it tomorrow morning when I get back to the office.

I compiled in USB_DEBUG, set all the USB debug sysctls to 100 and 
made it crash again.  It is reproducible 100% of the time.  Are you 
interested in seeing the verbose debug log messages at all?

Paul. 
-- 
Paul Koch
CTO
Statseeker


6.0-stable panic in ohci_softintr when using ucom/uftdi

2005-12-06 Thread Paul Koch
My setup is:
ASUS Pundit Booksize PC, Celeron 2.66GHz, 512MB RAM, 4 USB ports.
Attached is one Netcomm USB modem which uses the ucom/uftdi 
drivers, and is configured for auto answer. 

Source is 6.0 stable as of today 7th Dec 2005.

Devices in /dev that get created are:
cuaU0
cuaU0.init
cuaU0.lock
ttyU0.init
ttyU0.lock
ttyd0

I have added the following to /etc/ttys:
 ttyU0 "/usr/libexec/getty modem.230400" dialup on insecure

and this to /etc/gettytab:
 modem.230400|modems-230k:\
:hw:np:sp#230400:

As soon as the modem rings, the machine panics.  I don't think the
modem even gets time to pick up the line.  This doesn't happen on 
a 5.4-stable box with 6 modems connected, but 5.4-stable appears 
to have other hanging issues, which is why I am trying out 6.0-stable.

Recent changes to ohci.c (1.154.2.1) in this area (2 days ago) ??
Should I raise a PR ?

Paul.


# kgdb kernel.debug /var/crash/vmcore.2
kgdb: core file: /var/crash/vmcore.2
kgdb: kernel image: kernel.debug
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ucom0: open bulk out error (addr 2): IN_USE


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0474c36
stack pointer           = 0x28:0xd3133c7c
frame pointer           = 0x28:0xd3133cac
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (irq19: ohci0 ohci1+)
trap number             = 12
panic: page fault
Uptime: 47s
Dumping 446 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 447MB (114240 pages) 431 415 399 383 367 351 335 319 303 287
  271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));

(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc04c8a42 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc04c8cd8 in panic (fmt=0xc05f6b91 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc05d7e34 in trap_fatal (frame=0xd3133c3c, eva=36)
    at /usr/src/sys/i386/i386/trap.c:836
#4  0xc05d7b9b in trap_pfault (frame=0xd3133c3c, usermode=0, eva=36)
    at /usr/src/sys/i386/i386/trap.c:744
#5  0xc05d77f9 in trap (frame=
      {tf_fs = -1027342328, tf_es = -1027342296, tf_ds = -1028849624,
      tf_edi = -1028264960, tf_esi = -1028043056, tf_ebp = -753714004,
      tf_isp = -753714072, tf_ebx = 2, tf_edx = -1028786304, tf_ecx = 0,
      tf_eax = -1027196800, tf_trapno = 12, tf_err = 0,
      tf_eip = -1069069258, tf_cs = 32, tf_eflags = 590466,
      tf_esp = -1068655605, tf_ss = -1028786304})
    at /usr/src/sys/i386/i386/trap.c:434
#6  0xc05c7f3a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0474c36 in ohci_softintr (v=0xc2b8b000)
    at /usr/src/sys/dev/usb/ohci.c:1469
#8  0xc04887ab in usb_schedsoftintr (bus=0xc2b8b000)
    at /usr/src/sys/dev/usb/usb.c:871
#9  0xc0474762 in ohci_intr1 (sc=0xc2b8b000)
    at /usr/src/sys/dev/usb/ohci.c:1233
#10 0xc04745b4 in ohci_intr (p=0xc2b8b000)
    at /usr/src/sys/dev/usb/ohci.c:1162
#11 0xc04b45d9 in ithread_loop (arg=0xc2a8b400)
    at /usr/src/sys/kern/kern_intr.c:547
#12 0xc04b3860 in fork_exit (callout=0xc04b4480 <ithread_loop>,
    arg=0xc2a8b400, frame=0xd3133d38) at /usr/src/sys/kern/kern_fork.c:789
#13 0xc05c7f9c in fork_trampoline ()
    at /usr/src/sys/i386/i386/exception.s:208
(kgdb)



Kernel Config:
machine i386
cpu I686_CPU
ident   TERMITE6X
makeoptions DEBUG=-g
 
options SCHED_4BSD
options PREEMPTION
options INET
options FFS
options SOFTUPDATES
options UFS_ACL
options UFS_DIRHASH
options MD_ROOT
options GEOM_GPT
options COMPAT_43
options KTRACE
options SYSVSHM
options SYSVMSG
options SYSVSEM
options _KPOSIX_PRIORITY_SCHEDULING 
options KBD_INSTALL_CDEV
options ADAPTIVE_GIANT
device  apic
device  pci
device  agp
 
device  ata
device  atadisk
device  atapicd
options ATA_STATIC_ID
 
device  scbus
device  da
device  cd
device  pass
 
device  atkbdc
device  atkbd
device  psm
device  vga
device  splash
device  sc
 
device  pmtimer
device  sio
device  

Re: 6.0 kernel will not boot past atkbd0

2005-11-25 Thread Paul Koch
On Fri, 25 Nov 2005 09:01 pm, Pete French wrote:
  We have an array of new ASUS machines connected through a KVM. 
  These machines panic when the kernel is probing for the mouse if
  ACPI is not loaded.  If a mouse is not plugged in, no panic.  If
  ACPI is loaded and the mouse is plugged in, it boots fine.

 I didn't try ACPI. I merely took the mouse driver out of the kernel.
 Interestingly this then let me boot single user, but would not boot
 multi user! I had to physically unplug the mouse in the end.

 Shall I file a PR (unless someone else already did?)

For the ASUS machines, we think it is already covered in PR i386/69750.

Paul.


Re: 6.0 kernel will not boot past atkbd0

2005-11-24 Thread Paul Koch
On Fri, 25 Nov 2005 01:18 am, Pete French wrote:
  I have the same behavior on Compaq AP400 (2xPIII 700MHz).
  The problem disappeared when I disconnect the mouse.

 Ah! Now that is worth knowing - I didn't even think of trying that.
 So does anyone know why the mouse being connected causes it to not
 boot ?

 I can rebuild the kernel without the mouse driver, that might help...

We have an array of new ASUS machines connected through a KVM.  These 
machines panic when the kernel is probing for the mouse if ACPI is not 
loaded.  If a mouse is not plugged in, there is no panic.  If ACPI is 
loaded and the mouse is plugged in, it boots fine.

Paul.


Re: Dell DRAC card snatches keyboard console

2005-11-23 Thread Paul Koch
On Thu, 24 Nov 2005 02:32 am, Palle Girgensohn wrote:
 Hi!

 I just installed a Dell 2850 with a DRAC card and a PS/2 keyboard.
 The keyboard stopped working when entering multiuser mode, and I
 found an old email on current from September 2004, where Brooks said
 to comment away ukbd lines in devd.conf. I did, PS/2 keyboard works
 but the DRAC remote console does not. Is there a way to have both
 keyboards working?

 http://lists.freebsd.org/pipermail/freebsd-current/2004-September/038879.html

 Regards,
 Palle

We also came across this problem some time ago on 5.4.  We could install 
ok and go into single user mode, but in multiuser mode the DRAC would 
become the console.  We didn't fiddle with devd.conf, but instead put an 
entry in rc.conf of keyboard="/dev/kbd0".  This allowed PS/2 keyboards 
to work, and if we plugged a USB keyboard in, it would also work.  We 
didn't get to play with the DRAC though.  I don't have access to these 
machines anymore because they are now installed at a customer site.

I'd be interested in knowing how to get both a locally attached keyboard 
and the DRAC going at the same time.  I recall the DRAC being a special 
card which contains an Ethernet port and a virtual keyboard/mouse/video. 
It is accessed remotely via IP and replaces the need for KVM 
switches/cabling.  The card contains its own tiny OS and IP stack.

Paul.


Re: 6.0 Release - Pentium install panic and some questions

2005-11-21 Thread Paul Koch
On Mon, 21 Nov 2005 07:24 pm, Kris Kennaway wrote:
  Issue 1: Can't install on a Pentium P5 class machine:
 
  The install panics when installing the base stuff. No useful
  messages are displayed except the panic: page fault and rebooting
  in 15 seconds. The machines are 10 year old DEC Pentiums, 32 to 64M
  ram, IDE disks, etc. We have four of these in our test environment
  and appear to install and run FreeBSD-5.4 fine.

 Try disabling ACPI.  Many old systems have buggy ACPI
 implementations. Sometimes this can be fixed by a BIOS upgrade.

A Pentium 150MHz-era machine wouldn't have ACPI, would it?

I just tried going through the long floppy install on another one of 
these machines I have in my home test rack (they don't boot from the 
cdrom anymore), but stopped trying when the single IBM SCSI disk 
attached to an Adaptec controller was detected as da0, da1, da2, da3, 
da4 and da5 !   I'll try again on the machines in the office tomorrow.

  hints   "./device.hints"
  machine i386
  cpu I586_CPU
  cpu I686_CPU
  ident   RNA_KERNEL
  options SCHED_ULE

 You probably want 4BSD, since ULE is slower on many workloads.

We used ULE on 5.4 because it used less space on the floppy image and 
there was really only one process doing much on the machine, but I did 
some playing this afternoon and see that on 6.0 ULE and 4BSD use up the 
same amount of space, so I am changing it back to 4BSD.  I am not so 
stretched for space on the floppy anymore after getting rid of the 
second GLOBAL_OFFSET_TABLE text.

Paul.


Re: 6.0 Release - Pentium install panic and some questions

2005-11-21 Thread Paul Koch
On Tue, 22 Nov 2005 07:03 am, Kris Kennaway wrote:
 On Mon, Nov 21, 2005 at 09:28:27PM +1000, Paul Koch wrote:
  On Mon, 21 Nov 2005 07:24 pm, Kris Kennaway wrote:
Issue 1: Can't install on a Pentium P5 class machine:
   
The install panics when installing the base stuff. No useful
messages are displayed except the panic: page fault and
rebooting in 15 seconds. The machines are 10 year old DEC
Pentiums, 32 to 64M ram, IDE disks, etc. We have four of these
in our test environment and appear to install and run
FreeBSD-5.4 fine.
  
   Try disabling ACPI.  Many old systems have buggy ACPI
   implementations. Sometimes this can be fixed by a BIOS upgrade.
 
  A Pentium 150Mhz aged machine wouldn't have ACPI, would it ?

 I don't know..nevertheless, please try it :)

 Kris

Ok, a bit of confusion here.  When booting from floppy on these 
machines, the option is "Boot FreeBSD with ACPI enabled", while on 
other machines it says "Boot FreeBSD with ACPI disabled".  It looks like 
this comes from beastie.4th.  We tried both options and it still panics 
while it is extracting base (ie. you can partition, newfs, etc. using 
sysinstall).  It gets about 2% of the way through extracting base.


So, we tried going to the command line from the loader menu, unloaded 
all loaded modules, disabled further loading of ACPI, then 
continued booting.  The kernel is loaded into memory, we get the "Too 
many holes in physical memory" message, and it panics immediately after 
the copyright message with:

Fatal trap 12: page fault while in kernel mode
fault virtual address: 0xb5
fault code = supervisor read, page not present
. some other guff
panic: page fault
uptime: 1s

This is the error condition we experienced with our RNA before bumping 
up the phys_avail[] in machdep.c


I am planning to ditch all of these old Pentium machines because they 
are too much of a problem.  I'll keep one for a little longer if you 
would like to figure out what is going on.

Paul.


6.0 Release - Pentium install panic and some questions

2005-11-20 Thread Paul Koch
Hi, we are having a number of issues with 6.0-Release.

Our setup: We have ~40 machines in a development test environment, 
ranging from P5/150MHz/32M RAM/IDE, PII Celerons, P3, P4, single and 
dual processor setups.


Issue 1: Can't install on a Pentium P5 class machine:

The install panics when installing the base stuff. No useful messages 
are displayed except "panic: page fault" and "rebooting in 15 
seconds". The machines are 10 year old DEC Pentiums, 32 to 64M RAM, IDE 
disks, etc. We have four of these in our test environment and they 
appear to install and run FreeBSD-5.4 fine.


Issue 2: phys_avail[] array too small in i386/machdep.c P5 boxes ??

We have something which we call a Remote Network Appliance (RNA), which 
is basically a boot floppy with lots of stuff squeezed on it. The RNA 
uses a cut down kernel config (ie. no kernel source modifications) and 
various other in-house programs (eg. init/inetd/telnetd replacements), 
built into a 1Mbyte MD root. We had no problems using everything up to 
a 5.4-stable kernel, but have various problems with 6.0-release.  
When using 6.0, we get the following messages:

Overlapping or non-monotonic memory region, ignoring second region
...
Too many holes in the physical address space, giving up
...
Fatal trap 12: page fault while in kernel mode
...
panic

Did a bit of searching and found that in DragonFly, phys_avail[] in 
i386/machdep.c has been bumped up because it is too small. Looking at 
the 6.0 machdep.c, it looks like new dcons stuff has been added, and it 
blocks out some physical memory for its own use. Not sure if that has 
anything to do with it.  From my understanding, phys_avail[10] gives you 
room for four physically available address ranges (ie. 4 start/end 
pairs, plus a terminating null pair).  I bumped the number up to 12 
(ie. five address ranges) and we are off and going.
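The capacity arithmetic can be sketched as a simplified Python model (not the machdep.c code itself): with the last start/end pair kept at zero as a terminator, an array of N entries holds (N - 2) / 2 ranges, so 10 gives 4, 12 gives 5 and DragonFly's 22 gives 10.

```python
from typing import List, Sequence, Tuple


def max_ranges(array_len: int) -> int:
    """Start/end pairs that fit when the last two entries stay zero as a terminator."""
    return (array_len - 2) // 2


def fill_phys_avail(ranges: Sequence[Tuple[int, int]], array_len: int) -> List[int]:
    """Copy (start, end) ranges into a flat, zero-terminated array."""
    if len(ranges) > max_ranges(array_len):
        # The kernel prints this message and stops adding ranges;
        # this model simply raises instead.
        raise OverflowError("Too many holes in the physical address space, giving up")
    avail = [0] * array_len
    for i, (start, end) in enumerate(ranges):
        avail[2 * i] = start
        avail[2 * i + 1] = end
    return avail
```

In this model, a machine whose BIOS reports five usable ranges fits in a 12-entry array but overflows a 10-entry one, which matches the behaviour described for the Pentium boxes.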

6.0 now boots on all our Pentium machines, but...

on 5.4-stable we got:
 physical memory: 67108864
 avail memory:    56156160

on 6.0 with phys_avail[12] we got:
 physical memory: 67108864
 avail memory:    63299584

more available memory for some reason !  Hmmm.
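For scale, the difference between the two "avail memory" figures works out as follows, assuming the usual 4 KiB i386 page size (67108864 bytes of physical memory is 64 MiB):

```latex
\begin{aligned}
63299584 - 56156160 &= 7143424 \text{ bytes} \\
7143424 / 4096 &= 1744 \text{ pages of 4 KiB} \\
7143424 / 1048576 &= 6.8125 \text{ MiB}
\end{aligned}
```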

On most of our machines, when booting in verbose mode, the 5.4 kernel 
reports three phys_avail segments, but the Pentium boxes report four. 
On the patched 6.0, the Pentiums report five segments.

Unfortunately, 6.0 panics on the Pentium machines when stress 
testing it (ie. by making it run out of memory).  On 5.4-stable it 
would just kill user processes; under 6.0 it kills a few processes but 
quickly panics with a page not present error.  At least 6.0 now boots 
and runs on a Pentium, whereas the standard install panics.  I can't 
get a dump of the RNA floppy panic because it has no swap or disk to 
write to, and there isn't enough room on the floppy to build a kernel 
with debugging stuff.

So, my question is... is it OK to bump phys_avail from 10 to 12?  Or do 
we just ditch the Pentium as a supported platform?
DragonFly have bumped it to 22, giving 10 segments.

The only other change we do is compile the kernel and world with -Os and 
-funit-at-a-time to reduce the resulting binary sizes.

fyi, A copy of the floppy image is at:
  http://www.statseeker.com/downloads/lanstat_fbsd60.bin
It also contains our realtime Statistical LAN Analyser. Instructions are 
at http://www.statseeker.com/download1.html


The following is the RNA kernel config:

hints   "./device.hints"
machine i386
cpu I586_CPU
cpu I686_CPU
ident   RNA_KERNEL
options SCHED_ULE
options INET
options FFS
options MD_ROOT
options MD_ROOT_SIZE=1024
options COMPAT_FREEBSD4
options HZ=1000
options CLK_USE_I8254_CALIBRATION
options VM_KMEM_SIZE_SCALE
options NO_SWAPPING
options INIT_PATH=/rna-init

device  apm
device  pci
device  vga
device  fdc
device  md
device  mem
nodevice io
device  atkbdc
device  atkbd
device  pty
device  sc
options MAXCONS=2
options SC_HISTORY_SIZE=500
options SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
options SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
options SC_NO_CUTPASTE
options SC_NO_FONT_LOADING
options SC_NO_SYSMOUSE

options DEVICE_POLLING

device  de
device  em
device  ixgb
device  txp

device  miibus
device  bfe
device  bge
device  dc
device  fxp
device  lge
device  nge
device  pcn
device  re

device  sf
device  sis
device  sk
device  ste
device  ti
device  tl
device  tx
device  vge
device  vr
device  wb
device  xl

device  loop
device