Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Jeremy Chadwick
On Tue, Sep 28, 2010 at 10:31:27PM -0700, Don Lewis wrote:
 On 28 Sep, Don Lewis wrote:
 
  Looking at the timestamps of things and comparing to my logs, I
  discovered that the last instance of ntp instability happened when I was
  running make index in /usr/ports.  I tried it again with entertaining
  results.  After a while, the machine became unresponsive.  I was logged
  in over ssh and it stopped echoing keystrokes.  In parallel I was
  running a script that echoed the date, the results of vmstat -i, and
  the results of ntpq -c pe.  The latter showed jitter and offset going
  insane.  Eventually make index finished and the machine was responsive
  again, but the time was way off and ntpd croaked because the necessary
  time correction was too large.  Nothing else anomalous showed up in the
  logs.  Hmm, about half an hour after ntpd died I started my CPU time
  accounting test and two minutes into that test I got a spew of calcru
  messages ...
 
 I tried this experiment again using a kernel with WITNESS and
 DEBUG_VFS_LOCKS compiled in, and pinging this machine from another.
 Things look normal for a while, then the ping times get huge for a while
 and then recover.
 
 64 bytes from 192.168.101.3: icmp_seq=1169 ttl=64 time=0.135 ms
 64 bytes from 192.168.101.3: icmp_seq=1170 ttl=64 time=0.141 ms
 64 bytes from 192.168.101.3: icmp_seq=1171 ttl=64 time=0.130 ms
 64 bytes from 192.168.101.3: icmp_seq=1172 ttl=64 time=0.131 ms
 64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms
 64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms
 64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms
 64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms
 64 bytes from 192.168.101.3: icmp_seq=1177 ttl=64 time=35229.632 ms
 64 bytes from 192.168.101.3: icmp_seq=1178 ttl=64 time=34228.791 ms
 64 bytes from 192.168.101.3: icmp_seq=1179 ttl=64 time=33227.953 ms
 64 bytes from 192.168.101.3: icmp_seq=1180 ttl=64 time=32227.091 ms
 64 bytes from 192.168.101.3: icmp_seq=1181 ttl=64 time=31226.262 ms
 64 bytes from 192.168.101.3: icmp_seq=1182 ttl=64 time=30225.425 ms
 64 bytes from 192.168.101.3: icmp_seq=1183 ttl=64 time=29224.597 ms
 64 bytes from 192.168.101.3: icmp_seq=1184 ttl=64 time=28223.757 ms
 64 bytes from 192.168.101.3: icmp_seq=1185 ttl=64 time=27222.918 ms
 64 bytes from 192.168.101.3: icmp_seq=1186 ttl=64 time=26222.086 ms
 64 bytes from 192.168.101.3: icmp_seq=1187 ttl=64 time=25221.164 ms
 64 bytes from 192.168.101.3: icmp_seq=1188 ttl=64 time=24220.407 ms
 64 bytes from 192.168.101.3: icmp_seq=1189 ttl=64 time=23219.575 ms
 64 bytes from 192.168.101.3: icmp_seq=1190 ttl=64 time=22218.737 ms
 64 bytes from 192.168.101.3: icmp_seq=1191 ttl=64 time=21217.905 ms
 64 bytes from 192.168.101.3: icmp_seq=1192 ttl=64 time=20217.066 ms
 64 bytes from 192.168.101.3: icmp_seq=1193 ttl=64 time=19216.228 ms
 64 bytes from 192.168.101.3: icmp_seq=1194 ttl=64 time=18215.333 ms
 64 bytes from 192.168.101.3: icmp_seq=1195 ttl=64 time=17214.503 ms
 64 bytes from 192.168.101.3: icmp_seq=1196 ttl=64 time=16213.720 ms
 64 bytes from 192.168.101.3: icmp_seq=1197 ttl=64 time=15210.912 ms
 64 bytes from 192.168.101.3: icmp_seq=1198 ttl=64 time=14210.044 ms
 64 bytes from 192.168.101.3: icmp_seq=1199 ttl=64 time=13209.194 ms
 64 bytes from 192.168.101.3: icmp_seq=1200 ttl=64 time=12208.376 ms
 64 bytes from 192.168.101.3: icmp_seq=1201 ttl=64 time=11207.536 ms
 64 bytes from 192.168.101.3: icmp_seq=1202 ttl=64 time=10206.694 ms
 64 bytes from 192.168.101.3: icmp_seq=1203 ttl=64 time=9205.816 ms
 64 bytes from 192.168.101.3: icmp_seq=1204 ttl=64 time=8205.014 ms
 64 bytes from 192.168.101.3: icmp_seq=1205 ttl=64 time=7204.186 ms
 64 bytes from 192.168.101.3: icmp_seq=1206 ttl=64 time=6203.294 ms
 64 bytes from 192.168.101.3: icmp_seq=1207 ttl=64 time=5202.510 ms
 64 bytes from 192.168.101.3: icmp_seq=1208 ttl=64 time=4201.677 ms
 64 bytes from 192.168.101.3: icmp_seq=1209 ttl=64 time=3200.851 ms
 64 bytes from 192.168.101.3: icmp_seq=1210 ttl=64 time=2200.013 ms
 64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms
 64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms
 64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms
 64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms
 64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms
 64 bytes from 192.168.101.3: icmp_seq=1216 ttl=64 time=56221.800 ms
 64 bytes from 192.168.101.3: icmp_seq=1217 ttl=64 time=55220.960 ms
 64 bytes from 192.168.101.3: icmp_seq=1218 ttl=64 time=54220.116 ms
 64 bytes from 192.168.101.3: icmp_seq=1219 ttl=64 time=53219.282 ms
 64 bytes from 192.168.101.3: icmp_seq=1220 ttl=64 time=52218.444 ms
 64 bytes from 192.168.101.3: icmp_seq=1221 ttl=64 time=51217.618 ms
 64 bytes from 192.168.101.3: icmp_seq=1222 ttl=64 time=50216.778 ms
 64 bytes from 192.168.101.3: icmp_seq=1223 ttl=64 time=49215.932 ms
 64 bytes from 192.168.101.3: icmp_seq=1224 
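
One way to read the trace above, as a minimal sketch: assuming ping sent one request per second (its default interval), request N left at roughly N seconds, so its reply arrived at about N plus the reported round-trip time. The function name and the embedded sample are illustrative only; the sample lines are copied from the output above.

```python
import re

# A few lines copied from the ping trace above.
SAMPLE = """\
64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms
64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms
64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms
64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms
64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms
64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms
"""

LINE_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def reply_arrival_times(text, interval_s=1.0):
    """Return (seq, estimated arrival time in seconds) for each reply.

    Assumption: request `seq` was sent at seq * interval_s seconds, so the
    reply arrived at that send time plus the reported round-trip time.
    """
    arrivals = []
    for match in LINE_RE.finditer(text):
        seq = int(match.group(1))
        rtt_s = float(match.group(2)) / 1000.0
        arrivals.append((seq, seq * interval_s + rtt_s))
    return arrivals


if __name__ == "__main__":
    for seq, arrival in reply_arrival_times(SAMPLE):
        print(f"seq={seq} arrived at ~{arrival:.3f} s")
```

On these lines every delayed reply works out to an arrival time of about 1212.2 s. That is consistent with a backlog of requests being answered all at once after a stall of roughly 38 seconds, rather than with 38 separate slow round trips.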

Re: fetch: Non-recoverable resolver failure

2010-09-29 Thread Jeremy Chadwick
On Tue, Sep 28, 2010 at 10:59:04PM +0200, Miroslav Lachman wrote:
 Jeremy Chadwick wrote:
 On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote:
 Hi,
 
 We are using the fetch command from cron to run PHP scripts periodically
 and sometimes cron sends error e-mails like this:
 
 fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable
 resolver failure
 
 [...]
 
 Note: the target domains are hosted on this server itself, and so is named.
 
 The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC
 
 Can somebody help me to diagnose this random fetch+resolver issue?
 
 The error in question comes from the resolver library returning
 EAI_FAIL.  This return code can be returned to all sorts of applications
 (not just fetch), although how each app handles it may differ.  So,
 chances are you really do have something going on upstream from you (one
 of the nameservers you use might not be available at all times), and it
 probably clears very quickly (before you have a chance to
 manually/interactively investigate it).
 
 The strange thing is that I have only one nameserver listed in
 resolv.conf and it is the local one! (127.0.0.1) (there were two
 remote nameservers, but I tried to switch to local one to rule out
 remote nameservers / network problems)
 
 You're probably going to have to set up a combination of scripts that do
 tcpdump logging, and ktrace -t+ -i (and probably -a) logging (ex. ktrace
 -t+ -i -a -f /var/log/ktrace.fetch.out fetch -qo ...) to find out what's
 going on behind the scenes.  The irregularity of the problem (re:
 sometimes) warrants such.  I'd recommend using something other than
 127.0.0.1 as your resolver if you need to do tcpdump.
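
Alongside the tcpdump and ktrace logging suggested above, a small probe can timestamp the failures so they can be lined up with the other logs. This is only a sketch: probe() and watch() are hypothetical helper names, port 443 is chosen because the failing URL is https, and the resolver argument exists purely so the function can be exercised without a real DNS failure.

```python
import socket
import time


def probe(hostname, port=443, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (ok, detail)."""
    try:
        answers = resolver(hostname, port)
    except socket.gaierror as exc:
        # exc.errno carries the EAI_* code. Per the explanation above,
        # EAI_FAIL is the code behind fetch's resolver failure message.
        kind = "EAI_FAIL" if exc.errno == socket.EAI_FAIL else "other"
        return False, f"{kind} ({exc.errno}): {exc.strerror}"
    return True, f"{len(answers)} answer(s)"


def watch(hostname, interval_s=5.0, rounds=3, resolver=socket.getaddrinfo):
    """Probe repeatedly and return one timestamped line per failure."""
    failures = []
    for i in range(rounds):
        ok, detail = probe(hostname, resolver=resolver)
        if not ok:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            failures.append(f"{stamp} {hostname}: {detail}")
        if i + 1 < rounds:
            time.sleep(interval_s)
    return failures
```

Run from the same host as the cron jobs, the timestamps can then be compared with the named log and the packet capture for the same second.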
 
 I will try it... there will be a lot of output as there are many
 cronjobs and relatively high traffic on the webserver. But the fetch
 resolver failure occurs only a few times a day.
 
 Providing contents of your /etc/resolv.conf, as well as details about
 your network configuration on the machine (specifically if any
 firewall stacks (pf or ipfw) are in place) would help too.  Some folks
 might want netstat -m output as well.
 
 There is nothing special in the network, the machine is Sun Fire
 X2100 M2 with bge1 NIC connected to Cisco Linksys switch (100Mbps
 port) with uplink (1Gbps port) connected to Cisco router with dual
 10Gbps connectivity. No firewalls in the path. There are more than
 10 other servers in the rack and we have no problems / error
 messages in logs from other services / daemons related to DNS.
 
 # cat /etc/resolv.conf
 nameserver 127.0.0.1
 
 
 /# netstat -m
 279/861/1140 mbufs in use (current/cache/total)
 257/553/810/25600 mbuf clusters in use (current/cache/total/max)
 257/313 mbuf+clusters out of packet secondary zone in use (current/cache)
 5/306/311/12800 4k (page size) jumbo clusters in use
 (current/cache/total/max)
 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
 603K/2545K/3149K bytes allocated to network (current/cache/total)
 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
 13/470/6656 sfbufs in use (current/peak/max)
 0 requests for sfbufs denied
 0 requests for sfbufs delayed
 3351782 requests for I/O initiated by sendfile
 0 calls to protocol drain routines
 
 
 (real IPs were replaced)
 
 # ifconfig bge1
 bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
 ether 00:1e:68:2f:71:ab
 inet 1.2.3.40 netmask 0xffffff80 broadcast 1.2.3.127
 inet 1.2.3.41 netmask 0xffffffff broadcast 1.2.3.41
 inet 1.2.3.42 netmask 0xffffffff broadcast 1.2.3.42
 media: Ethernet autoselect (100baseTX full-duplex)
 status: active
 
 
 NIC is:
 
 b...@pci0:6:4:1:class=0x02 card=0x534c108e
 chip=0x167814e4 rev=0xa3 hdr=0x00
 vendor = 'Broadcom Corporation'
 device = 'BCM5715C 10/100/100 PCIe Ethernet Controller'
 class  = network
 subclass   = ethernet
 
 
 There is PF with some basic rules, mostly blocking incoming
 packets, allowing all outgoing and scrubbing:
 
 scrub in on bge1 all fragment reassemble
 scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492
 fragment reassemble
 
 pass out on bge1 inet proto udp all keep state
 pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA
 modulate state
 pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA
 modulate state
 pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA
 modulate state
 
 modified PF options:
 
 set timeout { frag 15, interval 5 }
 set limit { frags 2500, states 5000 }
 set optimization aggressive
 set block-policy drop
 set loginterface bge1
 # Let loopback and internal interface traffic flow without restrictions
 set skip on lo0

Please also provide pfctl -s info output, in addition to uname -a
output (you 

Re: Still getting kmem exhausted panic

2010-09-29 Thread Andriy Gapon
on 29/09/2010 03:38 Artem Belevich said the following:
 On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon a...@icyb.net.ua wrote:
 BTW, have you seen my posts about UMA and ZFS on hackers@ ?
 I found it advantageous to use UMA for ZFS I/O buffers, but only after
 reducing the size of per-CPU caches for the zones with large-sized items.
 I further modified the code in my local tree to completely disable per-CPU
 caches for items > 32KB.
 
 Do you have updated patch disabling per-cpu caches for large items?
 I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050
 from -head to compile) and so far things look good. I'll re-enable UMA
 for ZFS and see how it flies in a couple of days.

I've just uploaded uma-3.diff.
It implements what uma-1.diff did, plus totally skips per-CPU caches for
items > 32KB, and also has code from uma-2.diff for flushing per-CPU caches
on significant memory shortage.

Will appreciate your feedback.
Thank you for testing!

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: cpu timer issues

2010-09-29 Thread Andriy Gapon
on 29/09/2010 06:49 Jurgen Weber said the following:
 Andriy
 
 You can find everything you are after here:
 
 http://pastebin.com/WH4V2W0F

Looks like this was with ACPI disabled?
Can you try to re-enable it?
Also, it doesn't look like the dmesg is verbose.


 On 28/09/10 8:07 PM, Andriy Gapon wrote:
 on 28/09/2010 10:54 Jurgen Weber said the following:
 # dmesg | grep Timecounter
 Timecounter i8254 frequency 1193182 Hz quality 0
 Timecounters tick every 1.000 msec
 # sysctl kern.timecounter.hardware
 kern.timecounter.hardware: i8254

 Only have one timer to choose from.

 Can you provide a little bit more of hard data than the above?
 Specifically, the following sysctls:
 kern.timecounter
 dev.cpu

 Output of vmstat -i.
 _Verbose_ boot dmesg.

 Please do not disable ACPI when taking this data.
 Preferably, upload it somewhere and post a link to it.
 


-- 
Andriy Gapon


Re: cpu timer issues

2010-09-29 Thread Jeremy Chadwick
On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:
 Andriy
 
 You can find everything you are after here:
 
 http://pastebin.com/WH4V2W0F

The information provided here shows ACPI is disabled in addition to the
boot not being verbose.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Don Lewis
On 29 Sep, Jeremy Chadwick wrote:

 Given all the information here, in addition to the other portion of the
 thread (indicating ntpd reports extreme offset between the system clock
 and its stratum 1 source), I would say the motherboard is faulty or
 there is a system device which is behaving badly (possibly something
 pertaining to interrupts, but I don't know how to debug this on a low
 level).

Possible, but I haven't run into any problems running -CURRENT on this
box with an SMP kernel.

 Can you boot verbosely and provide all of the output here or somewhere
 on the web?

http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt

 If possible, I would start by replacing the mainboard.  The board looks
 to be a consumer-level board (I see an nfe(4) controller, for example).

It's an Abit AN-M2 HD.  The RAM is ECC.  I haven't seen any machine
check errors in the logs.  I'll run prime95 as soon as I have a chance.



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Jeremy Chadwick
On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote:
 On 29 Sep, Jeremy Chadwick wrote:
 
  Given all the information here, in addition to the other portion of the
  thread (indicating ntpd reports extreme offset between the system clock
  and its stratum 1 source), I would say the motherboard is faulty or
  there is a system device which is behaving badly (possibly something
  pertaining to interrupts, but I don't know how to debug this on a low
  level).
 
 Possible, but I haven't run into any problems running -CURRENT on this
 box with an SMP kernel.
 
  Can you boot verbosely and provide all of the output here or somewhere
  on the web?
 
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt
 
  If possible, I would start by replacing the mainboard.  The board looks
  to be a consumer-level board (I see an nfe(4) controller, for example).
 
 It's an Abit AN-M2 HD.  The RAM is ECC.  I haven't seen any machine
 check errors in the logs.  I'll run prime95 as soon as I have a chance.

Thanks for the verbose boot.  Since it works on -CURRENT, can you
provide a verbose boot from that as well?  Possibly someone made some
changes between RELENG_8 and HEAD which fixed an issue, which could be
MFC'd.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Don Lewis
On 29 Sep, Jeremy Chadwick wrote:
 On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote:
 On 29 Sep, Jeremy Chadwick wrote:
 
  Given all the information here, in addition to the other portion of the
  thread (indicating ntpd reports extreme offset between the system clock
  and its stratum 1 source), I would say the motherboard is faulty or
  there is a system device which is behaving badly (possibly something
  pertaining to interrupts, but I don't know how to debug this on a low
  level).
 
 Possible, but I haven't run into any problems running -CURRENT on this
 box with an SMP kernel.
 
  Can you boot verbosely and provide all of the output here or somewhere
  on the web?
 
 http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt
 
  If possible, I would start by replacing the mainboard.  The board looks
  to be a consumer-level board (I see an nfe(4) controller, for example).
 
 It's an Abit AN-M2 HD.  The RAM is ECC.  I haven't seen any machine
 check errors in the logs.  I'll run prime95 as soon as I have a chance.
 
 Thanks for the verbose boot.  Since it works on -CURRENT, can you
 provide a verbose boot from that as well?  Possibly someone made some
 changes between RELENG_8 and HEAD which fixed an issue, which could be
 MFC'd.

Even when I saw the weird ntp stepping problem and the calcru messages,
the system was still stable enough to build hundreds of ports.  In the
most recent case, I built 800+ ports over several days without any other
hiccups.

It could also be a difference between SMP and !SMP.  I just found a bug
that causes an immediate panic if lock profiling is enabled on a !SMP
kernel.  This bug also exists in -CURRENT.  Here's the patch:

Index: sys/sys/mutex.h
===
RCS file: /home/ncvs/src/sys/sys/mutex.h,v
retrieving revision 1.105.2.1
diff -u -r1.105.2.1 mutex.h
--- sys/sys/mutex.h 3 Aug 2009 08:13:06 -   1.105.2.1
+++ sys/sys/mutex.h 29 Sep 2010 06:58:52 -
@@ -251,8 +251,11 @@
 #define _rel_spin_lock(mp) do {                                         \
        if (mtx_recursed((mp)))                                         \
                (mp)->mtx_recurse--;                                    \
-       else                                                            \
+       else {                                                          \
                (mp)->mtx_lock = MTX_UNOWNED;                           \
+               LOCKSTAT_PROFILE_RELEASE_LOCK(LS_MTX_SPIN_UNLOCK_RELEASE, \
+                       mp);                                            \
+       }                                                               \
        spinlock_exit();                                                \
 } while (0)
 #endif /* SMP */


After applying the above patch, I enabled lock profiling and got the
following results when I ran make index:
http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile.txt

I didn't see anything strange happening this time.  I don't know if I
got lucky, or the change in kernel options fixed the bug.



Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Andriy Gapon
on 29/09/2010 00:11 Don Lewis said the following:
 On 28 Sep, Don Lewis wrote:
 
 
 % vmstat -i
 interrupt                          total       rate
 irq0: clk                       60683442       1000
 irq1: atkbd0                           6          0
 irq8: rtc                        7765537        127
 irq9: acpi0                           13          0
 irq10: ohci0 ehci1+             10275064        169
 irq11: fwohci0 ahc+               132133          2
 irq12: psm0                           21          0
 irq14: ata0                        90982          1
 irq15: nfe0 ata1                   18363          0

 I'm not sure why I'm getting USB interrupts.  There aren't any USB
 devices plugged into this machine.
 
 Answer: irq 10 is also shared by vgapci0 and atapci1.

Just curious why Local APIC timer isn't being used for hardclock on your system.
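
As a rough cross-check of the vmstat -i figures quoted above, here is a minimal sketch that assumes the rate column is simply the total divided by uptime in seconds, and that irq0 really fired 1000 times per second for the whole uptime. If clock interrupts were lost during a stall, the uptime derived this way would come out low. The helper names are illustrative.

```python
# Totals copied from the vmstat -i output quoted above.
TOTALS = {
    "irq0: clk": 60683442,
    "irq8: rtc": 7765537,
    "irq10: ohci0 ehci1+": 10275064,
}
CLK_HZ = 1000  # rate reported for irq0: clk


def uptime_seconds(clk_total, clk_hz=CLK_HZ):
    """Uptime implied by the clock interrupt count (assumes no lost ticks)."""
    return clk_total / clk_hz


def rates(totals, uptime_s):
    """Untruncated interrupts per second for each source."""
    return {name: total / uptime_s for name, total in totals.items()}


if __name__ == "__main__":
    up = uptime_seconds(TOTALS["irq0: clk"])
    print(f"implied uptime: {up / 3600:.1f} hours")
    for name, rate in rates(TOTALS, up).items():
        print(f"{name:<22} {rate:10.2f} /s")
```

With these numbers the clock total implies about 16.9 hours of uptime, and the rtc line works out to just under 128 interrupts per second.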

-- 
Andriy Gapon


Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Don Lewis
On 29 Sep, Andriy Gapon wrote:
 on 29/09/2010 00:11 Don Lewis said the following:
 On 28 Sep, Don Lewis wrote:
 
 
 % vmstat -i
 interrupt                          total       rate
 irq0: clk                       60683442       1000
 irq1: atkbd0                           6          0
 irq8: rtc                        7765537        127
 irq9: acpi0                           13          0
 irq10: ohci0 ehci1+             10275064        169
 irq11: fwohci0 ahc+               132133          2
 irq12: psm0                           21          0
 irq14: ata0                        90982          1
 irq15: nfe0 ata1                   18363          0

 I'm not sure why I'm getting USB interrupts.  There aren't any USB
 devices plugged into this machine.
 
 Answer: irq 10 is also shared by vgapci0 and atapci1.
 
 Just curious why Local APIC timer isn't being used for hardclock on your 
 system.

I'm using the same kernel config as the one on a slower !SMP box which
I'm trying to squeeze as much performance out of as possible.  My kernel
config file contains these statements:
nooptions   SMP
nodevice    apic

Testing with an SMP kernel is on my TODO list.




Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Andriy Gapon
on 29/09/2010 11:56 Don Lewis said the following:
 I'm using the same kernel config as the one on a slower !SMP box which
 I'm trying to squeeze as much performance out of as possible.  My kernel
 config file contains these statements:
   nooptions   SMP
   nodevice    apic
 
 Testing with an SMP kernel is on my TODO list.

SMP or not, it's really weird to see apic disabled nowadays.

-- 
Andriy Gapon


Diskless/readonly root booting issues

2010-09-29 Thread Morgan Reed
Hi all,

I've been working on updating my semi-embedded images to
7.3-stable of late (I generally wait for .3+ releases).  It's been a
few years since the last time I did one of these, and I'm having some
issues getting my netboot test environment to behave itself.

I'm sure it's something simple but I've spent quite a bit of time
looking for answers and poking the system but no joy yet.

Basically I use a PXE-booted NFS root to test my reduced-footprint
image builds.  The boot is working, but init is attempting to remount /
rw (in spite of it being marked ro in fstab), which of course fails
because the directory is exported ro from the NFS server, at which
point the system dumps me to single-user mode:

=== OUTPUT ===

Starting file system checks:
udp: Netconfig database not found
Mounting root filesystem rw failed, startup aborted
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
Sep 30 09:60:02 init: /bin/sh on /etc/rc terminated abnormally, going
to single user mode
Enter full pathname of shell or RETURN for /bin/sh:



Relevant configs from the diskless root

== rc.conf ==

ifconfig_le0=DHCP

diskless_mount=/etc/rc.initdiskless

varsize=8192
varmfs=YES

tmpsize=8192
tmpmfs=YES

nfs_client_enable=YES

dumpdev=NO

=

rc.initdiskless is the version from /usr/share/examples/rc.initdiskless

== fstab ==

192.168.2.2:/usr/fbtest / nfs ro 0 0
proc /proc procfs rw 0 0



== loader.conf ==

verbose_loading=YES

autoboot_delay=2



Kernel is (obviously) built with NFS_ROOT and NFSCLIENT and is relatively
minimalist otherwise; I have also tested with GENERIC, same result.

I must be forgetting something simple in all of this.  I don't recall
it being terribly difficult to get this stuff working when I was doing
my original work with 6.3, though I don't recall the use of the
initdiskless script.  IIRC I was using rc.diskless2, which (again IIRC)
was later replaced by /etc/rc.d/diskless, but I've not been able to
find this script anywhere.

Any suggestions would be greatly appreciated at this point.

Thanks,

Morgan Reed


Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Vitaly Magerya
Chuck Swiger wrote:
  MCA: Bank 1, Status 0xe20001f5
  MCA: Global Cap 0x0005, Status 0x
  MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0
  MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error
 
 That is very likely to be a matter of luck.  If I translate this MCA right,
 it looks to be an uncorrected error in L1 data cache on the CPU.  Try to run
 something like prime95's torture test mode and see whether it fails 
 overnight

The test ran for 17 hours without any problems (or MCA messages), then I
put the laptop to sleep for 5 minutes, resumed the test, and after 30
minutes of it there are now two MCA messages in dmesg. Are they somehow
related, or is this a coincidence?


Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Chuck Swiger
Hi--

On Sep 29, 2010, at 8:42 AM, Vitaly Magerya wrote:
 The test ran for 17 hours without any problems (or MCA messages),

That part is good.  At least starting from normal operation, your laptop is
running stably under load.

 then I put the laptop to sleep for 5 minutes, resumed the test, and after 30
 minutes of it there are now two MCA messages in dmesg. Are they somehow
 related, or is this a coincidence?

I doubt repeated coincidences.  :-)  Is prime95 testing running stable after 
waking from sleep?

Regards,
-- 
-Chuck



Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Jeremy Chadwick
On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote:
 Hi--
 
 On Sep 29, 2010, at 8:42 AM, Vitaly Magerya wrote:
  The test ran for 17 hours without any problems (or MCA messages),
 
 That part is good.  At least starting from normal operation, your laptop is
 running stably under load.
 
  then I put the laptop to sleep for 5 minutes, resumed the test, and after 30
  minutes of it there are now two MCA messages in dmesg. Are they somehow
  related, or is this a coincidence?
 
 I doubt repeated coincidences.  :-)  Is prime95 testing running stable after 
 waking from sleep?

He's not running Prime95 (native Win32 app), he's running
ports/math/mprime under FreeBSD natively.  I don't know if this
application stresses hardware to the same degree Prime95 does; I've used
Prime95 many times to burn in new workstations.

The Thinkpad hardware he's on is "old" (note the quotes), so I
wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a
strange/odd MCA event as a result of going in/out of sleep state.  It
could be a general system bug of some sort as well (one which has no
repercussions).

Look at it this way: if his L1 cache was going bad, his system would be
freaking out doing literally anything (booting the kernel for example);
I'm under the impression Pentium M CPUs do not have ECC L1 cache.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Vitaly Magerya
Jeremy Chadwick wrote:
 On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote:
 Is prime95 testing running stable after waking from sleep?

Yes, 0 errors, 0 warnings.

 The Thinkpad hardware he's on is "old" (note the quotes), so I
 wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a
 strange/odd MCA event as a result of going in/out of sleep state.  It
 could be a general system bug of some sort as well (one which has no
 repercussions).

Well, since it causes no other visible problems, it might just as well
be a false alarm.


Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Jeremy Chadwick
On Wed, Sep 29, 2010 at 10:16:13AM -0700, Chuck Swiger wrote:
 On Sep 29, 2010, at 10:07 AM, Jeremy Chadwick wrote:
  On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote:
  
  I doubt repeated coincidences.  :-)  Is prime95 testing running stable 
  after waking from sleep?
  
  He's not running Prime95 (native Win32 app), he's running
  ports/math/mprime under FreeBSD natively.  I don't know if this
  application stresses hardware to the same degree Prime95 does; I've used
  Prime95 many times to burn in new workstations.
 
 It's doing the same math operations; something like mprime -t is the same 
 as the Win32 test mode per the docs:
 
  -t  Run the torture test.  Same as Options/Torture Test.
 
  The Thinkpad hardware he's on is "old" (note the quotes), so I
  wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a
  strange/odd MCA event as a result of going in/out of sleep state.  It
  could be a general system bug of some sort as well (one which has no
  repercussions).
 
 That sounds reasonable to me, but I'm wary of uncorrected errors which seem 
 to be reproducible to specific circumstances.
 
  Look at it this way: if his L1 cache was going bad, his system would be
  freaking out doing literally anything (booting the kernel for example);
  I'm under the impression Pentium M CPUs do not have ECC L1 cache.
 
 Sure, if the MCA report is reflecting a legitimate problem, and it was 
 happening more often than every few minutes, and it happened after a cold 
 reboot rather than after wakeup from sleep  :-)
 
 I place more faith in ~17 hours of Prime95/mprime working OK to validate that 
 the hardware is not obviously broken.

Oh, absolutely.  If anything my statement was indirectly agreeing with
your recommended test (sans being unsure how mprime behaved).  :-)

I wonder if there's CPU errata or something along those lines which
might explain the behaviour.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE

2010-09-29 Thread Jeremy Chadwick
On Wed, Sep 29, 2010 at 08:24:21PM +0300, Vitaly Magerya wrote:
 Jeremy Chadwick wrote:
  On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote:
  Is prime95 testing running stable after waking from sleep?
 
 Yes, 0 errors, 0 warnings.
 
  The Thinkpad hardware he's on is "old" (note the quotes), so I
  wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a
  strange/odd MCA event as a result of going in/out of sleep state.  It
  could be a general system bug of some sort as well (one which has no
  repercussions).
 
 Well, since it causes no other visible problems, it might just as well
 be a false alarm.

Highly possible.  If it bothers you to the point where you'd rather not
see it, you can disable MCA events by setting hw.mca.enabled=0 in
/boot/loader.conf.  I don't know of a way to conditionally ignore
certain MCAs.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



zfs send/receive: is this slow?

2010-09-29 Thread Dan Langille
It's taken about 15 hours to copy 800GB.  I'm sure there's some tuning I
can do.
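
For scale, that works out to roughly 15 MB/s on average.  A quick shell
sketch of the arithmetic, assuming 1 GB = 1024 MB and a steady rate (the
function name is only for illustration):

```sh
# Average transfer rate in whole MB/s, from GB copied and hours elapsed.
# Assumption: 1 GB = 1024 MB; integer arithmetic, so the result is rounded down.
avg_rate_mb_s() {
    echo $(( $1 * 1024 / ($2 * 3600) ))
}

avg_rate_mb_s 800 15    # prints 15
```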

The system is now running:

# zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

All the drives are <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device

from systat:

    1 users    Load  0.36  0.58  0.57                  Sep 29 13:47

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act   42012    7584   544044    11028  204492  count
All  962356    8736 1074363k    18220          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    141 cow    9951 total
             42       23k  668 3094 1951 2166  657    288 zfod        ohci0 ohci
                                                          ozfod       ohci2 ohci
13.6%Sys   0.8%Intr  0.2%User  0.0%Nice 85.5%Idle        %ozfod       ahc0 irq20
|    |    |    |    |    |    |    |    |    |    |       daefr       ahci0 22
===                                                   366 prcfr  2000 cpu0: time
                                        26 dtbuf    47129 totfr     3 em0 irq256
Namei     Name-cache   Dir-cache        10 desvn          react   892 siis0 257
   Calls    hits   %    hits   %     87983 numvn          pdwak  1056 siis1 259
    4608    4608 100                 24981 frevn          pdpgs  2000 cpu3: time
                                                          intrn  2000 cpu1: time
Disks  ada0  ada1  ada2  ada3  ada4  ada5  ada6   1355484 wire   2000 cpu2: time
KB/t  35.95 37.00 36.75 41.44 40.05 40.86 41.11     25936 act
tps     306   299   301   267   276   271   269   2452756 inact
MB/s  10.75 10.82 10.79 10.79 10.80 10.80 10.81     76664 cache
%busy    27    50    25    37    27    27    27    127828 free
                                                   427728 buf



$ zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.67T  5.02T    358     38  43.1M  1.96M
storage     7.67T  5.02T    317    475  39.4M  30.9M
storage     7.67T  5.02T    357    533  44.3M  34.4M
storage     7.67T  5.02T    371    556  46.0M  35.8M
storage     7.67T  5.02T    313    521  38.9M  28.7M
storage     7.67T  5.02T    309    457  38.4M  30.4M
storage     7.67T  5.02T    388    589  48.2M  37.8M
storage     7.67T  5.02T    377    581  46.8M  36.5M
storage     7.67T  5.02T    310    559  38.4M  30.4M
storage     7.67T  5.02T    430    611  53.4M  41.3M

$ zfs get all storage/compressed
NAME                PROPERTY         VALUE                  SOURCE
storage/compressed  type             filesystem             -
storage/compressed  creation         Tue Sep 28 20:35 2010  -
storage/compressed  used             856G                   -
storage/compressed  available        3.38T                  -
storage/compressed  referenced       44.8K                  -
storage/compressed  compressratio    1.60x                  -
storage/compressed  mounted          yes                    -
storage/compressed  quota            none                   default
storage/compressed  reservation      none                   default
storage/compressed  recordsize       128K                   default
storage/compressed  mountpoint       /storage/compressed    default
storage/compressed  sharenfs         off                    default
storage/compressed  checksum         on                     default
storage/compressed  compression      on                     local
storage/compressed  atime            on                     default
storage/compressed  devices          on                     default
storage/compressed  exec             on                     default
storage/compressed  setuid           on                     default
storage/compressed  readonly         off                    default
storage/compressed  jailed           off                    default
storage/compressed  snapdir          hidden                 default
storage/compressed  aclmode          groupmask              default
storage/compressed  aclinherit       restricted             default
storage/compressed  canmount         on                     default
storage/compressed  shareiscsi       off                    default
storage/compressed  xattr            off                    temporary
storage/compressed  copies           1                      default
storage/compressed  version          4                      -
storage/compressed  utf8only         off                    -
storage/compressed  normalization    none                   -
storage/compressed  casesensitivity  sensitive              -
storage/compressed  vscan

Re: zfs send/receive: is this slow?

2010-09-29 Thread Artem Belevich
On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille d...@langille.org wrote:
 It's taken about 15 hours to copy 800GB.  I'm sure there's some tuning I
 can do.

 The system is now running:

 # zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

Try piping zfs data through mbuffer (misc/mbuffer in ports). I've
found that it helps a lot to smooth out data flow and increase
send/receive throughput, even when send and receive happen on the same
host. Run it with a buffer large enough to accommodate a few seconds'
worth of write throughput for your target disks.
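
As a rough illustration of that sizing rule, here is a sketch that
multiplies an assumed write rate by the number of seconds to buffer and
prints a matching pipeline instead of running it.  The 40 MB/s and 5 s
figures, the 128k block size, the helper name, and the pool/src@snap and
pool/dst dataset names are all placeholders, not taken from your system;
-s and -m are mbuffer's block-size and total-buffer-size options.

```sh
# Buffer size in MB = write rate (MB/s) x seconds of data to hold.
# (Hypothetical helper; the inputs below are assumed example values.)
mbuffer_size_mb() {
    echo $(( $1 * $2 ))
}

size=$(mbuffer_size_mb 40 5)

# Print the pipeline rather than running it; dataset names are placeholders.
echo "zfs send pool/src@snap | mbuffer -s 128k -m ${size}M | zfs receive pool/dst"
```

With those example numbers the buffer comes out at 200M; plug in whatever
write rate your own target disks actually sustain.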

Here's an example:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

--Artem


Re: wifi issues under -stable

2010-09-29 Thread Jim Bryant

Warren Block wrote:

On Wed, 22 Sep 2010, Jim Bryant wrote:

I have two laptops, both are Compaq(HP) C300 series using the
motherboards with the 945GM chipset and using T7200 and T7600 Core2 Duos.

One (this one) has an Intel PRO/Wireless 3945ABG installed, which returns:


wpi0: Intel(R) PRO/Wireless 3945ABG irq 18 at device 0.0 on pci6
wpi0: Driver Revision 20071127
wpi0: 0x1000 bytes of rid 0x10 res 3 failed (0, 0x).
wpi0: could not allocate memory resource
device_attach: wpi0 attach returned 6

and the broadcom used in the other does the exact same thing.

I'm thinking that this isn't really a problem with the wifi, but may
be a mini-PCIe issue.

Don't know about the Intel, but some Broadcoms work.  Please show what
you're doing in /boot/loader.conf and /etc/rc.conf.

The other machine has configs pretty much the same as this one.  The
problem seems to be at the kernel level though, at probe time.


4:19:52pm  argus(14): cat /boot/loader.conf
beastie_disable="YES"   # Turn the beastie boot menu on and off
# Beginning of the block added by the VMware software
vmxnet_load="YES"
# End of the block added by the VMware software

4:20:02pm  argus(15): cat /etc/rc.conf

# -- sysinstall generated deltas -- # Tue Mar 23 21:07:50 2010
# Created: Tue Mar 23 21:07:50 2010
# Enable network daemons for user convenience.
# Please make all changes to this file, not to /etc/defaults/rc.conf.
# This file now contains just the overrides from /etc/defaults/rc.conf.
accounting_enable="YES"
gateway_enable="YES"
hostname="argus.root.com"
ifconfig_rl0="inet 192.168.0.2 192.168.0.1 netmask 255.255.255.0"
inetd_enable="YES"
ipv6_enable="NO"
keyrate="fast"
lpd_enable="YES"
moused_enable="YES"
moused_port="/dev/psm0"
moused_type="auto"
named_enable="YES"
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
router="/sbin/routed"
router_enable="YES"
router_flags="-q"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"
rwhod_enable="NO"
saver="NO"
scrnmap="NO"
sshd_enable="YES"
blanktime="300"
font8x14="cp437-8x14"
font8x16="cp437-8x16"
font8x8="cp437-8x8"
allscreens_flags="-c blink MODE_280"    # Set this vidcontrol mode for all virtual screens
allscreens_kbdflags="-r fast"   # Set this kbdcontrol mode for all virtual screens

fusefs_enable="YES"
mysql_enable="YES"
lircd_enable="YES"

4:23:36pm  argus(17): pciconf -lv
hos...@pci0:0:0:0:  class=0x06 card=0x30a5103c chip=0x27a08086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '955XM/945GM/PM/GMS/940GML Express Processor to DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
vgap...@pci0:0:2:0: class=0x03 card=0x30a5103c chip=0x27a28086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GU Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA
vgap...@pci0:0:2:1: class=0x038000 card=0x30a5103c chip=0x27a68086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GU Express Integrated Graphics Controller'
    class      = display
hd...@pci0:0:27:0:  class=0x040300 card=0x30a5103c chip=0x27d88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'IDT High Definition Audio Driver  (BA101897)'
    class      = multimedia
    subclass   = HDA
pc...@pci0:0:28:0:  class=0x060400 card=0x30a5103c chip=0x27d08086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:28:2:  class=0x060400 card=0x30a5103c chip=0x27d48086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
uh...@pci0:0:29:0:  class=0x0c0300 card=0x30a5103c chip=0x27c88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:1:  class=0x0c0300 card=0x30a5103c chip=0x27c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:2:  class=0x0c0300 card=0x30a5103c chip=0x27ca8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
eh...@pci0:0:29:7:  class=0x0c0320 card=0x30a5103c chip=0x27cc8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB 2.0 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pc...@pci0:0:30:0:  class=0x060401 card=0x30a5103c chip=0x24488086 rev=0xe1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 Family (ICH2/3/4/5/6/7/8/9-M) Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI

Re: cpu timer issues

2010-09-29 Thread Jurgen Weber
I do not understand what you mean by a verbose dmesg.  Looking at
the man page, there is no verbose option for dmesg other than the one
I already used (dmesg -a).

Once that is clarified I can reboot the backup machine and turn on ACPI
for you.


On 29/09/10 5:29 PM, Jeremy Chadwick wrote:

On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:

Andriy

You can find everything you are after here:

http://pastebin.com/WH4V2W0F


The information provided here shows ACPI is disabled in addition to the
boot not being verbose.



--
--
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001


Re: cpu timer issues

2010-09-29 Thread Jurgen Weber

Hi

I do not understand what you mean by a verbose dmesg.  Looking at
the man page, there is no verbose option for dmesg other than the one
I already used (dmesg -a).

Once that is clarified I can reboot the backup machine and turn on ACPI
for you.


Thanks

On 29/09/10 5:26 PM, Andriy Gapon wrote:

on 29/09/2010 06:49 Jurgen Weber said the following:

Andriy

You can find everything you are after here:

http://pastebin.com/WH4V2W0F


Looks like this was with ACPI disabled?
Can you try to re-enable it?
Also, it doesn't look like the dmesg is verbose.



On 28/09/10 8:07 PM, Andriy Gapon wrote:

on 28/09/2010 10:54 Jurgen Weber said the following:

# dmesg | grep Timecounter
Timecounter i8254 frequency 1193182 Hz quality 0
Timecounters tick every 1.000 msec
# sysctl kern.timecounter.hardware
kern.timecounter.hardware: i8254

Only have one timer to choose from.


Can you provide a little bit more of hard data than the above?
Specifically, the following sysctls:
kern.timecounter
dev.cpu

Output of vmstat -i.
_Verbose_ boot dmesg.

Please do not disable ACPI when taking this data.
Preferably, upload it somewhere and post a link to it.









Re: cpu timer issues

2010-09-29 Thread Jeremy Chadwick
On Thu, Sep 30, 2010 at 07:51:49AM +1000, Jurgen Weber wrote:
  I do not understand what you mean by a verbose dmesg.. looking
 at the man page there is no verbose option for dmesg except what I
 completed (dmesg -a).
 
 Once that is clarified I can reboot the backup machine and turn on
 ACPI for you.
 
 On 29/09/10 5:29 PM, Jeremy Chadwick wrote:
 On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:
 Andriy
 
 You can find everything you are after here:
 
 http://pastebin.com/WH4V2W0F
 
 The information provided here shows ACPI is disabled in addition to the
 boot not being verbose.

When the machine boots (when loader starts), you'll see the FreeBSD logo
with a menu of choices (boot, boot with ACPI disabled, single user mode,
etc.).  One of them is boot verbosely; I think it's #5, labelled "Boot
with verbose logging" or something like that.

Choose that.  That will cause your machine to boot with ACPI enabled, in
addition to booting verbosely.  There will be a LOT more information
printed on the screen during the boot process, and it should be visible
in /var/log/messages after the machine is started.  This is the
information we're looking for.

HTH!




Re: cpu timer issues

2010-09-29 Thread Andriy Gapon
on 30/09/2010 00:51 Jurgen Weber said the following:
 Hi
 
  I do not understand what you mean by a verbose dmesg.. looking at the man
 page there is no verbose option for dmesg except what I completed (dmesg -a).
 
 Once that is clarified I can reboot the backup machine and turn on ACPI for 
 you.

A verbose dmesg is produced when the kernel is booted with verbose logging.

Either boot -v at the loader prompt.
Or '5' (IIRC) in the loader menu.
Or nextboot -k kernel -o -v before rebooting.
Or boot_verbose="YES" in /boot/loader.conf.
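
Once the machine is up after a verbose boot, the items requested earlier
in the thread can be gathered into one file for uploading.  A minimal
sketch, assuming a stock FreeBSD system where the boot-time messages are
saved in /var/run/dmesg.boot; the function name and output path are
arbitrary:

```sh
# Collect the timecounter/CPU sysctls, interrupt counts and boot messages
# into a single file.  A failing command is noted in the file instead of
# aborting the whole run.  (Hypothetical helper name.)
collect_timer_info() {
    out=$1
    {
        echo "== sysctl kern.timecounter dev.cpu =="
        sysctl kern.timecounter dev.cpu || echo "(command failed)"
        echo "== vmstat -i =="
        vmstat -i || echo "(command failed)"
        echo "== /var/run/dmesg.boot =="
        cat /var/run/dmesg.boot || echo "(command failed)"
    } > "$out" 2>&1
    return 0
}

# Example: collect_timer_info /tmp/timer-info.txt
```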

-- 
Andriy Gapon


Re: cpu timer issues

2010-09-29 Thread Jurgen Weber

Gentlemen

Ah, OK.  Learn something new every day.  Fantastic.  The first time, the
machine stopped during the boot process, but that is OK; the second time
we had success.


http://pastebin.com/r4UWdN7U

I am not sure whether ACPI is on.  Jeremy, you mention below that it
should be enabled just by booting with this option, so let me know if
there are any problems there.


Thanks

Jurgen

On 30/09/10 7:56 AM, Jeremy Chadwick wrote:

On Thu, Sep 30, 2010 at 07:51:49AM +1000, Jurgen Weber wrote:

  I do not understand what you mean by a verbose dmesg.. looking
at the man page there is no verbose option for dmesg except what I
completed (dmesg -a).

Once that is clarified I can reboot the backup machine and turn on
ACPI for you.

On 29/09/10 5:29 PM, Jeremy Chadwick wrote:

On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:

Andriy

You can find everything you are after here:

http://pastebin.com/WH4V2W0F


The information provided here shows ACPI is disabled in addition to the
boot not being verbose.


When the machine boots (when loader starts), you'll see the FreeBSD logo
with a menu of choices (boot, boot with ACPI disabled, single user mode,
etc.).  One of them is boot verbosely; I think it's #5, labelled Boot
with verbose logging or something like that.

Choose that.  That will cause your machine to boot with ACPI enabled, in
addition to booting verbosely.  There will be a LOT more information
printed on the screen during the boot process, and it should be visible
in /var/log/messages after the machine is started.  This is the
information we're looking for.

HTH!





Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime

2010-09-29 Thread Don Lewis
On 29 Sep, Andriy Gapon wrote:
 on 29/09/2010 11:56 Don Lewis said the following:
 I'm using the same kernel config as the one on a slower !SMP box which
 I'm trying to squeeze as much performance out of as possible.  My kernel
 config file contains these statements:
  nooptions   SMP
  nodevice    apic
 
 Testing with an SMP kernel is on my TODO list.
 
 SMP or not, it's really weird to see apic disabled nowadays.

I tried enabling apic and got worse results.  I saw ping RTTs as high as
67 seconds.  Here's the timer info with apic enabled:

# sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-100)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 53633
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 7988816
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 1000
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 1341917999
kern.timecounter.tc.TSC.frequency: 2500014018
kern.timecounter.tc.TSC.quality: 800
kern.timecounter.invariant_tsc: 0

Here's the verbose boot info with apic:
http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt


I've also experimented with SMP as well as SCHED_4BSD (all previous
testing was with !SMP and SCHED_ULE).  I still see occasional problems
with SCHED_4BSD and !SMP, but so far I have not seen any problems with
SCHED_ULE and SMP.

I did manage to catch the problem with lock profiling enabled:
http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile_freeze.txt


I'm currently testing SMP some more to verify if it really avoids this
problem.



Re: zfs send/receive: is this slow?

2010-09-29 Thread Dan Langille

On 9/29/2010 3:57 PM, Artem Belevich wrote:

On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille d...@langille.org wrote:

It's taken about 15 hours to copy 800GB.  I'm sure there's some tuning I
can do.

The system is now running:

# zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula


Try piping zfs data through mbuffer (misc/mbuffer in ports). I've
found that it helps a lot to smooth out data flow and increase
send/receive throughput, even when send and receive happen on the same
host. Run it with a buffer large enough to accommodate a few seconds'
worth of write throughput for your target disks.


Thanks.  I just installed it.  I'll use it next time.  I don't want to 
interrupt this one.  I'd like to see how long it takes.  Then compare.



Here's an example:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/


That looks really good. Thank you.

--
Dan Langille - http://langille.org/