extremely strange results with mail or make after one day uptime

2010-06-10 Thread Harald Schmalzbauer

Hello all,

I have absolutely no idea how this happens:
After one or two days of uptime, 'make -j3 buildworld' just returns
without any error but without doing anything.
Also, sending mail via 'mail' produces mutilated output: no valid rcpt,
no subject.

When I reboot the machine everything is fine again.
Daily status reports also stop working; that's how I found this
absolutely mysterious, spooky thing.

And: sshguard destroys hosts.allow. It's suddenly empty.
The jails don't seem to suffer from that ghost.
My setup:
8.1-prerelease on amd64 with _read-only_ / (and unionfs /etc), and
dozens of ZFS filesystems for jails.


Where to start to debug?

Any hints highly appreciated.

-Harry



signature.asc
Description: OpenPGP digital signature


Re: extremely strange results with mail or make after one day uptime

2010-06-10 Thread Harald Schmalzbauer

Harald Schmalzbauer wrote on 10.06.2010 10:10 (localtime):

> Hello all,
>
> I have absolutely no idea how this happens:
> After one or two days of uptime, 'make -j3 buildworld' just returns
> without any error but without doing anything.
> Also, sending mail via 'mail' produces mutilated output: no valid rcpt,
> no subject.
>
> When I reboot the machine everything is fine again.
> Daily status reports also stop working; that's how I found this
> absolutely mysterious, spooky thing.
>
> And: sshguard destroys hosts.allow. It's suddenly empty.


It seems that any text handling routine goes crazy, because
/var/run/jail_XXX.id is also empty when starting new jails.
It worked at machine boot, since older running jails do have a number in
their id file. Only newly started/restarted jails have an empty id file.

Also, hosts.allow reproducibly gets emptied by sshguard.

Where's the part of FreeBSD doing such text manipulation?
Maybe that's also responsible for makefile parsing and thus explains the
'make' failure? Interestingly, 'make' without -j3 at least starts
the build, but reproducibly fails at different points while the src tree is
absolutely consistent. If I mount it elsewhere everything compiles fine.


And to emphasize: this mysterious behaviour of some base system components
appears only after some uptime. I haven't seen it the first day after
a fresh boot.


Anybody any ideas? I'm desperate because I don't know where I could start
to search/test.


Thanks in advance,

-Harry





tmpfs problem [Was: Re: extremely strange results with mail or make after one day uptime]

2010-06-10 Thread Harald Schmalzbauer

Harald Schmalzbauer wrote on 10.06.2010 11:17 (localtime):

> Harald Schmalzbauer wrote on 10.06.2010 10:10 (localtime):
>
>> Hello all,
>>
>> I have absolutely no idea how this happens:
>> After one or two days of uptime, 'make -j3 buildworld' just returns
>> without any error but without doing anything.
>> Also, sending mail via 'mail' produces mutilated output: no valid rcpt,
>> no subject.
>>
>> When I reboot the machine everything is fine again.
>> Daily status reports also stop working; that's how I found this
>> absolutely mysterious, spooky thing.
>>
>> And: sshguard destroys hosts.allow. It's suddenly empty.
>
> It seems that any text handling routine goes crazy, because
> /var/run/jail_XXX.id is also empty when starting new jails.
> It worked at machine boot, since older running jails do have a number in
> their id file. Only newly started/restarted jails have an empty id file.
>
> Also, hosts.allow reproducibly gets emptied by sshguard.
>
> Where's the part of FreeBSD doing such text manipulation?
> Maybe that's also responsible for makefile parsing and thus explains the
> 'make' failure? Interestingly, 'make' without -j3 at least starts
> the build, but reproducibly fails at different points while the src tree is
> absolutely consistent. If I mount it elsewhere everything compiles fine.
>
> And to emphasize: this mysterious behaviour of some base system components
> appears only after some uptime. I haven't seen it the first day after
> a fresh boot.
>
> Anybody any ideas? I'm desperate because I don't know where I could start
> to search/test.


Ok, luck seems to be with the stupid this day ;)

I identified tmpfs as the culprit.

'head /etc/hosts.allow' correctly returns two lines!
'head -n 2 /etc/hosts.allow > /tmp/test' results in an empty file.
'head -n 2 /etc/hosts.allow > /var/tmp/test' produces a file with the
expected two lines.
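
The comparison above can be generalized into a small check that writes known content through a shell redirect into a directory and reads it back. This is only a minimal sketch, not part of the original report: the function name `check_redirect` and the scratch file name are hypothetical, and it detects nothing beyond the symptom described here (redirected output arriving empty or altered).

```sh
# Hypothetical helper: write two known lines through a shell redirect into
# the given directory, read them back, and report whether they survived.
check_redirect() {
    dir=$1
    scratch="$dir/redirect-test.$$"
    expected=$(printf 'line one\nline two')
    # Errors (e.g. an unwritable directory) are silenced; they show up as BROKEN.
    { printf 'line one\nline two\n' > "$scratch"; } 2>/dev/null || true
    actual=$(cat "$scratch" 2>/dev/null) || true
    rm -f "$scratch"
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir"
    else
        echo "BROKEN: $dir"
    fi
}

# Example: compare the tmpfs-backed /tmp with /var/tmp.
# for d in /tmp /var/tmp; do check_redirect "$d"; done
```

Running it periodically (for instance from cron) would show roughly when after boot the tmpfs mount starts misbehaving, which may help narrow down the trigger.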


Ok, next time I'll better heed the developers' "experimental" warnings...

Is there anything I can do for the hackers? I haven't had that symptom
before on any other machine, and I've been using tmpfs for quite some time.
Maybe it's an interesting edge case.


Thanks,

-Harry





Re: Resizing GPT partitions

2010-06-10 Thread Stephane Dupille
Andrey V. Elsukov bu7c...@yandex.ru wrote:

Hi,

> Stephane Dupille wrote:
> Currently there is no easy way to do it. GPT holds information about
> first and last usable sectors. You can see them in your output:
> last: 20971486
> first: 34

I had the opportunity to boot that machine from the network (into Linux),
and parted fixed the GPT tables correctly. FreeBSD now reports the right
last usable sector:
last: 312581774
first: 34

And dmesg no longer says that the secondary GPT table is corrupt
or invalid.

(yeah, one problem fixed)
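
The relationship between disk size and the `last` value can be checked with a little arithmetic. This is a minimal sketch, assuming 512-byte sectors and the standard GPT layout of 128 partition entries of 128 bytes each, so that the backup table and backup header occupy the final 33 sectors. The function name `gpt_last_usable` is hypothetical, and the total sector counts in the usage comments are inferred from the `last` values quoted in this thread, not stated in it.

```sh
# Hypothetical helper: expected last usable LBA for a disk of the given size.
gpt_last_usable() {
    total_sectors=$1
    # LBAs start at 0, so the final sector is total - 1. The backup header
    # takes that sector and the backup partition table the 32 before it.
    echo $(( total_sectors - 1 - 33 ))
}

# A table written for a 10 GiB disk (20971520 sectors):
# gpt_last_usable 20971520     # prints 20971486
# A table covering 312581808 sectors:
# gpt_last_usable 312581808    # prints 312581774
```

Under these assumptions, the old `last` value corresponds to a table sized for a much smaller disk, which would be consistent with the secondary table being reported as invalid before parted rewrote it.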

Unfortunately, parted does not allow me to resize the last partition
because it does not know the type of the partition.

> You can look at freebsd-geom's mail list archive. There was a topic
> OCE and GPT with similar problem.

Yep, seen it. I applied your patch to resize the partition, but I didn't
manage to use it correctly.

# cd /usr/src
# patch < /root/patch
# cd sbin/geom/class/part/
# make
# make install

Did I apply the patch correctly? It does not seem to work:
# gpart resize -i 3 ad0
gpart: param 'size': Invalid argument


Thanks for your reply.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


calcru: runtime went backwards messages

2010-06-10 Thread Jansen Gotis
Hi, for the past couple of months since moving to RELENG_8 I've been
receiving calcru: runtime went backwards messages on the console.

My machine is a dual Pentium III 1.26GHz with an Intel SAI2 board.
Disabling EIST is not an option in my BIOS, and I've tried disabling
the ACPI timer as well as setting kern.timecounter.hardware=i8254.
I've also tried disabling cpufreq in my kernel configuration.

For what it's worth, I'm running base ntpd. I've also tried openntpd,
but no dice.

I did a binary search of the commit with which this started, and
apparently it's svn r204546, a summary of which can be seen here:
http://freshbsd.org/2010/03/02/01/56/55

The calcru messages appear whether vesa is loaded as a module
or compiled into the kernel.

If anyone needs more information, I'll be happy to provide it.


Best regards,
Jansen

= snippet of /var/log/messages relating to calcru messages =
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
3502 usec to 3297 usec for pid 1106 (mksh)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
36785 usec to 35858 usec for pid 1114 (csh)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
13438 usec to 12652 usec for pid 1113 (su)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
14956 usec to 14081 usec for pid  (mksh)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
3323 usec to 3128 usec for pid  (mksh)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from 610
usec to 574 usec for pid 549 (devd)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from 517
usec to 486 usec for pid 548 (dhclient)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
1912 usec to 1800 usec for pid 532 (dhclient)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
39738 usec to 37412 usec for pid 532 (dhclient)
Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
3369010 usec to 3334846 usec for pid 1 (init)


= /var/run/dmesg.boot =
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #0 r204546: Thu Jun 10 21:05:09 PHT 2010
jan...@hobbes.jansen.homenet:/usr/obj/usr/src/sys/LOCAL i386
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) III CPU family  1266MHz (1263.45-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x6b1  Stepping = 1
  
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 2148007936 (2048 MB)
avail memory = 2090995712 (1994 MB)
ACPI APIC Table: Intel  0278
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)
 cpu0 (BSP): APIC ID:  3
 cpu1 (AP): APIC ID:  0
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 Version 1.1 irqs 0-15 on motherboard
ioapic1 Version 1.1 irqs 16-31 on motherboard
kbd1 at kbdmux0
netsmb_dev: loaded
smbios0: System Management BIOS at iomem 0xf6e90-0xf6eae on motherboard
smbios0: Version: 2.3, BCD Revision: 2.3
acpi0: Intel 0278 on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 32-bit timer at 3.579545MHz port 0x508-0x50b on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
vgapci0: VGA-compatible display port 0x2000-0x20ff mem
0xfa00-0xfaff,0xfb00-0xfb000fff at device 2.0 on pci0
fxp0: Intel 82559 Pro/100 Ethernet port 0x2400-0x243f mem
0xfb001000-0xfb001fff,0xfb10-0xfb1f irq 18 at device 3.0 on
pci0
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface PHY 1 on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:03:47:a6:0d:4a
fxp0: [ITHREAD]
isab0: PCI-ISA bridge at device 15.0 on pci0
isa0: ISA bus on isab0
atapci0: ServerWorks CSB5 UDMA100 controller port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2440-0x244f at device 15.1 on
pci0
ata0: ATA channel 0 on atapci0
ata0: [ITHREAD]
ata1: ATA channel 1 on atapci0
ata1: [ITHREAD]
ohci0: OHCI (generic) USB controller mem 0xfb002000-0xfb002fff irq 9
at device 15.2 on pci0
ohci0: [ITHREAD]
usbus0: OHCI (generic) USB controller on ohci0
pcib1: ACPI Host-PCI bridge on acpi0
pci1: ACPI PCI bus on pcib1
atapci1: Promise PDC20575 SATA150 controller port
0x2480-0x24ff,0x2800-0x28ff mem
0xfb42-0xfb420fff,0xfb40-0xfb41 irq 20 at device 10.0 on
pci1
atapci1: [ITHREAD]
atapci1: [ITHREAD]
ata2: ATA channel 0 on atapci1
ata2: SIGNATURE: 0101
ata2: [ITHREAD]
ata3: ATA channel 1 on atapci1
ata3: [ITHREAD]
ata4: ATA channel 2 on atapci1
ata4: [ITHREAD]
atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 

Re: Resizing GPT partitions

2010-06-10 Thread Andrey V. Elsukov
Stephane Dupille wrote:
> And dmesg no longer says that the secondary GPT table is corrupt
> or invalid.

I plan to add a `gpart recover` feature in the near future.

> Yep, seen it. I applied your patch to resize the partition, but I didn't
> manage to use it correctly.
>
> # cd /usr/src
> # patch < /root/patch
> # cd sbin/geom/class/part/
> # make
> # make install
>
> Did I apply the patch correctly? It does not seem to work:
> # gpart resize -i 3 ad0
> gpart: param 'size': Invalid argument

The patch needs kernel support too. You can try downloading a livefs CD
snapshot of 9.0-CURRENT and using it.
http://pub.allbsd.org/FreeBSD-snapshots/

-- 
WBR, Andrey V. Elsukov





File system trouble with ICH9 controller

2010-06-10 Thread Robin Sommer
I'm running 8.0-RELEASE-p2 (amd64) on a larger number of Supermicro
SBI-7425C-T3 blades. Each of the blades has 2 x 500GB disks striped
into a single volume via the on-board ICH9 RAID controller. 

However, after running fine for a while (days), the blades crash
eventually with file system problems such as the one below.
Initially I thought that must be a bad disk, but by now 5 different
blades have shown similar problems so I'm suspecting some OS issue. 

Has anybody seen something similar before? Could this be an
incompatibility with the RAID controller (I haven't found much
recent on Google but there are a number of older threads indicating
that it might not be well supported. Not sure though whether those
still apply).  

Any other thoughts?

Thanks,

Robin

- syslog ---

Jun  9 10:00:02 user.crit blade19 kernel: ar0s1a[WRITE(offset=704187858944, 
length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: 
g_vfs_done():ar0s1a[WRITE(offset=704188219392, length=131072)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: 
g_vfs_done():ar0s1a[WRITE(offset=704188891136, length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: 
g_vfs_done():ar0s1a[WRITE(offset=704189382656, length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: 
g_vfs_done():ar0s1a[WRITE(offset=704189743104, length=131072)]
Jun  9 10:00:02 user.crit blade19 kernel: error = 5

- system information  --

# uname -a
FreeBSD blade5 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan  5 21:11:58 
UTC 2010 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

# pciconf -lv | grep SATA
device = '82801IB/IR/IH (ICH9 Family) SATA RAID Controller'

# atacontrol list
ATA channel 2:
Master:  ad4 ST9500325AS/0001SDM1 SATA revision 2.x
Slave:   no device present
ATA channel 3:
Master:  ad6 ST9500325AS/0001SDM1 SATA revision 2.x
Slave:   no device present

# dmesg | grep ata
atapci0: Intel ICH9 SATA300 controller port 
0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 
0xfcc0-0xfcc007ff irq 17 at device 31.2 on pci0
atapci0: [ITHREAD]
atapci0: AHCI called from vendor specific driver
atapci0: AHCI v1.20 controller with 6 3Gbps ports, PM supported
ata2: ATA channel 0 on atapci0
ata2: [ITHREAD]
ata3: ATA channel 1 on atapci0
ata3: [ITHREAD]
ata4: ATA channel 2 on atapci0
ata4: stopping AHCI engine failed
ata4: [ITHREAD]
ata5: ATA channel 3 on atapci0
ata5: stopping AHCI engine failed
ata5: [ITHREAD]
ata6: ATA channel 4 on atapci0
ata6: [ITHREAD]
ata7: ATA channel 5 on atapci0
ata7: [ITHREAD]
ad4: 476940MB Seagate ST9500325AS 0001SDM1 at ata2-master SATA300
ad6: 476940MB Seagate ST9500325AS 0001SDM1 at ata3-master SATA300
ar0: writing of DDF metadata is NOT supported yet
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master


-- 
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org 
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org



Re: File system trouble with ICH9 controller

2010-06-10 Thread Jeremy Chadwick
On Thu, Jun 10, 2010 at 09:29:19AM -0700, Robin Sommer wrote:
> I'm running 8.0-RELEASE-p2 (amd64) on a larger number of Supermicro
> SBI-7425C-T3 blades. Each of the blades has 2 x 500GB disks striped
> into a single volume via the on-board ICH9 RAID controller.
>
> However, after running fine for a while (days), the blades crash
> eventually with file system problems such as the one below.
> Initially I thought that must be a bad disk, but by now 5 different
> blades have shown similar problems so I'm suspecting some OS issue.
>
> Has anybody seen something similar before? Could this be an
> incompatibility with the RAID controller (I haven't found much
> recent on Google but there are a number of older threads indicating
> that it might not be well supported. Not sure though whether those
> still apply).
>
> Jun  9 10:00:02 user.crit blade19 kernel: ar0s1a[WRITE(offset=704187858944,
> length=114688)]error = 5
> Jun  9 10:00:02 user.crit blade19 kernel:
> g_vfs_done():ar0s1a[WRITE(offset=704188219392, length=131072)]error = 5
> Jun  9 10:00:02 user.crit blade19 kernel:
> g_vfs_done():ar0s1a[WRITE(offset=704188891136, length=114688)]error = 5
> Jun  9 10:00:02 user.crit blade19 kernel:
> g_vfs_done():ar0s1a[WRITE(offset=704189382656, length=114688)]error = 5
> Jun  9 10:00:02 user.crit blade19 kernel:
> g_vfs_done():ar0s1a[WRITE(offset=704189743104, length=131072)]
> Jun  9 10:00:02 user.crit blade19 kernel: error = 5

You're using Intel MatrixRAID.  Please stop[1]; you're living
dangerously.

The messages your kernel is spitting out could indicate a lot of
different things.  Tracking it down will take time.  So let's start with
this:

1) Provide output from gpart show ar0s1.  I'm curious about something
(likely a red herring, but I want to see).

2) Install sysutils/smartmontools and run smartctl -a /dev/adXX on
each of the disks which make up the RAID array.  I believe FreeBSD can
see the disks associated with the array (meaning you should have a few
adXX disks, in addition to an ar0 entry).  I can help you decode the
output, to see if any of the disks have actual problems that indicate
they could be going bad.

3) Remove use of MatrixRAID.  Alternatives include ccd, gstripe, gvinum,
or ZFS.  I would recommend ZFS if you run RELENG_8 instead of -RELEASE,
the system is amd64, and it has at least 4GB RAM.  Remove use of
MatrixRAID first, then see if the problem goes away.

4) If the problem still happens after this, there should be developers
who can help diagnose the problem.  Keeping MatrixRAID out of the
picture helps greatly.
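
Step 2 above can be scripted so that the same command is run against every member disk. This is a minimal sketch, not something from the original mail: the function name `smart_commands` is hypothetical, and the disk names `ad4` and `ad6` are taken from the `ar0: disk0 READY using ad4` and `ar0: disk1 READY using ad6` lines in the reported dmesg output. It prints the commands rather than running them, so they can be reviewed first.

```sh
# Hypothetical helper: print one smartctl invocation per member disk.
smart_commands() {
    for disk in "$@"; do
        echo "smartctl -a /dev/$disk"
    done
}

# Review the commands, then pipe them to sh to run them
# (requires sysutils/smartmontools and root privileges):
# smart_commands ad4 ad6
# smart_commands ad4 ad6 | sh
```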

More details: you might consider these opinions, but they're based on
personal experience (I've dealt many a time with MatrixRAID).  The
problem is not with the ICH9, given that most of our systems are
Supermicro (not blades but that doesn't matter) and use ICH9 with AHCI
(both with and without ahci.ko).  Intel ICHxx and ESBx controllers are
heavily tested on FreeBSD, both by users and developers.


[1]: http://en.wikipedia.org/wiki/Intel_Matrix_RAID

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: calcru: runtime went backwards messages

2010-06-10 Thread Jilles Tjoelker
On Fri, Jun 11, 2010 at 12:35:05AM +0800, Jansen Gotis wrote:
> Hi, for the past couple of months since moving to RELENG_8 I've been
> receiving calcru: runtime went backwards messages on the console.
>
> My machine is a dual Pentium III 1.26GHz with an Intel SAI2 board.
> Disabling EIST is not an option in my BIOS, and I've tried disabling
> the ACPI timer as well as setting kern.timecounter.hardware=i8254.
> I've also tried disabling cpufreq in my kernel configuration.
>
> For what it's worth, I'm running base ntpd. I've also tried openntpd,
> but no dice.
>
> I did a binary search of the commit with which this started, and
> apparently it's svn r204546, a summary of which can be seen here:
> http://freshbsd.org/2010/03/02/01/56/55
>
> The calcru messages appear whether vesa is loaded as a module
> or compiled into the kernel.
>
> If anyone needs more information, I'll be happy to provide it.
>
> = snippet of /var/log/messages relating to calcru messages =
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 3502 usec to 3297 usec for pid 1106 (mksh)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 36785 usec to 35858 usec for pid 1114 (csh)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 13438 usec to 12652 usec for pid 1113 (su)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 14956 usec to 14081 usec for pid  (mksh)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 3323 usec to 3128 usec for pid  (mksh)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from 610
> usec to 574 usec for pid 549 (devd)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from 517
> usec to 486 usec for pid 548 (dhclient)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 1912 usec to 1800 usec for pid 532 (dhclient)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 39738 usec to 37412 usec for pid 532 (dhclient)
> Jun 10 22:41:42 hobbes kernel: calcru: runtime went backwards from
> 3369010 usec to 3334846 usec for pid 1 (init)

This may well be a manifestation of a flaw (which is probably already
known) in how FreeBSD stores CPU time utilization. The time is
maintained in CPU ticks (CPU clock cycles), so if the clock frequency
changes, the values of existing processes will be wrong (a jump when
converted to seconds). When calcru detects this, it generates messages
like the above. If this analysis is right, the messages can be ignored,
but they indicate that CPU time statistics may be inaccurate.

I suppose fairly arbitrary changes can cause the messages to appear or
disappear.
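
The effect described above can be illustrated with a small calculation. This is a sketch under assumptions, not the kernel's calcru code: the function name `ticks_to_usec`, the tick count, and both tick rates are hypothetical round numbers chosen only for illustration.

```sh
# Hypothetical helper: convert an accumulated tick count to microseconds
# using whatever tick rate (in Hz) is currently believed to be correct.
ticks_to_usec() {
    ticks=$1
    rate_hz=$2
    echo $(( ticks * 1000000 / rate_hz ))
}

# The same accumulated ticks, converted before and after the assumed
# rate changes from 1.00 GHz to 1.25 GHz:
# ticks_to_usec 1000000000 1000000000    # prints 1000000
# ticks_to_usec 1000000000 1250000000    # prints 800000
```

The process has not lost any CPU time between the two conversions; only the divisor changed. A later reading that is smaller than an earlier one is exactly the condition that the "runtime went backwards" message reports.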

-- 
Jilles Tjoelker


Re: File system trouble with ICH9 controller

2010-06-10 Thread Robin Sommer

On Thu, Jun 10, 2010 at 11:17 -0700, you wrote:

> You're using Intel MatrixRAID.  Please stop[1]; you're living
> dangerously.

Thanks for your quick response. I don't need much in terms of
long-term data reliability on these machines (thus the RAID 0).
However, if MatrixRAID is unreliable even without further external
events (like disk problems/changes), I'll turn it off.

> 1) Provide output from gpart show ar0s1.  I'm curious about something
> (likely a red herring, but I want to see).

# gpart show ar0s1
=> 0  1952989857  ar0s1  BSD  (931G)
   0  1952989857  1  freebsd-ufs  (931G)
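
As a cross-check of the figures above, the sector count can be converted to GiB by hand. This is a minimal sketch, assuming 512-byte sectors (the output does not state the sector size); the function name `sectors_to_gib` is hypothetical.

```sh
# Hypothetical helper: convert a sector count to whole GiB.
sectors_to_gib() {
    sectors=$1
    sector_size=${2:-512}   # assumed default; pass another size if needed
    echo $(( sectors * sector_size / 1073741824 ))
}

# sectors_to_gib 1952989857    # prints 931
```

Under that assumption the result agrees with the (931G) that gpart prints for both the slice and the partition.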
   
> 2) Install sysutils/smartmontools and run smartctl -a /dev/adXX on
> each of the disks which make up the RAID array.

See below. 

Thanks,

Robin

- cut ---

# smartctl -a /dev/ad4
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.0-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.6 series
Device Model:     ST9500325AS
Serial Number:    6VE3R9QW
Firmware Version: 0001SDM1
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jun 10 12:16:49 2010 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status:  (   0) The previous self-test routine completed
without error or no self-test has ever 
been run.
Total time to complete Offline 
data collection: (   0) seconds.
Offline data collection
capabilities:(0x73) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:(0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:(0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine 
recommended polling time:(   1) minutes.
Extended self-test routine
recommended polling time:( 144) minutes.
Conveyance self-test routine
recommended polling time:(   2) minutes.
SCT capabilities:  (0x103b) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       37061718
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       453274
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1564
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       10
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   079   073   045    Old_age   Always       -       21 (Lifetime Min/Max 21/22)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 

Setting up X Terminals: What about audio, Pulseaudio or NAS?

2010-06-10 Thread Christian Walther
Hi,

I'm currently thinking about reconfiguring my wife's and my laptops as X
terminals. We use them most of the time, and my server is an Athlon X2
with 4GB RAM. The only thing I'm still wondering about is which audio
system to use. Both PulseAudio and NAS seem to be options. PulseAudio
seems to be more widely supported, but I have heard some bad things about
it, which suggest that the Network Audio System has been implemented more
cleanly and is thus easier to set up.

Has anybody used one or both of these ports and is willing to share their
experience?

Regards
Christian Walther


RELENG_7 em problems

2010-06-10 Thread Mike Tancsa

Hi Jack,
I am seeing some issues on RELENG_7 with a specific em nic

e...@pci0:13:0:0:class=0x02 card=0x108c15d9 
chip=0x108c8086 rev=0x03 hdr=0x00

vendor = 'Intel Corporation'
device = 'Intel Corporation 82573E Gigabit Ethernet 
Controller (Copper) (82573E)'

class  = network
subclass   = ethernet
cap 01[c8] = powerspec 2  supports D0 D3  current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)

If I disable tso, I am not able to make a tcp connection into the host

eg
0[psbgate1]# ifconfig em2
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
ether 00:30:48:9f:eb:80
inet 192.168.128.200 netmask 0xfff0 broadcast 192.168.128.207
media: Ethernet autoselect (100baseTX full-duplex)
status: active
0[psbgate1]# ifconfig em2 -tso
0[psbgate1]#


Looking at the pcap, the checksum is bad on the syn-ack.  If I 
re-enable tso, it seems to be ok


16:18:01.113297 IP (tos 0x10, ttl 64, id 6339, offset 0, flags [DF],
proto TCP (6), length 60) 192.168.128.196.54172 > 192.168.128.200.22:
S, cksum 0x4e79 (correct), 3313156149:3313156149(0) win 65535 <mss
1460,nop,wscale 3,sackOK,timestamp 3376174416 0>
16:18:01.123676 IP (tos 0x0, ttl 64, id 3311, offset 0, flags [DF],
proto TCP (6), length 60) 192.168.128.200.22 > 192.168.128.196.54172:
S, cksum 0x81c9 (incorrect (-> 0x51f2), 1373042663:1373042663(0) ack
3313156150 win 65535 <mss 1460,nop,wscale 3,sackOK,timestamp
1251567646 3376174416>



em2: Intel(R) PRO/1000 Network Connection 7.0.5 port 0x5000-0x501f 
mem 0xe820-0xe821 irq 16 at device 0.0 on pci13

em2: Using MSI interrupt
em2: [FILTER]
em2: Ethernet address: 00:30:48:9f:eb:80
pcib5: ACPI PCI-PCI bridge irq 16 at device 28.5 on pci0
pci14: ACPI PCI bus on pcib5
em3: Intel(R) PRO/1000 Network Connection 7.0.5 port 0x6000-0x601f 
mem 0xe830-0xe831 irq 17 at device 0.0 on pci14

em3: Using MSI interrupt
em3: [FILTER]
em3: Ethernet address: 00:30:48:9f:eb:81


Also, is there still the issue described in

http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052842.html

in RELENG_7?

---Mike



Mike Tancsa,  tel +1 519 651 3400
Sentex Communications,m...@sentex.net
Providing Internet since 1994www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike



Re: Setting up X Terminals: What about audio, Pulseaudio or NAS?

2010-06-10 Thread Freddie Cash
On Thu, Jun 10, 2010 at 2:47 PM, Christian Walther cptsa...@gmail.com wrote:

> I'm currently thinking about reconfiguring my wife's and my laptops as X
> terminals. We use them most of the time, and my server is an Athlon X2
> with 4GB RAM. The only thing I'm still wondering about is which audio
> system to use. Both PulseAudio and NAS seem to be options. PulseAudio
> seems to be more widely supported, but I have heard some bad things about
> it, which suggest that the Network Audio System has been implemented more
> cleanly and is thus easier to set up.
>
> Has anybody used one or both of these ports and is willing to share their
> experience?


How powerful are the laptops?  Are they good enough to run FreeBSD + X +
apps locally?  Do they have at least 1 GB of RAM?

If they have fast enough CPUs and enough RAM to run things locally, then
look into diskless booting via PXE instead of thin-client setups.  You get
all the benefits of thin-clients (central management as everything is on the
server, the benefits of having nothing installed locally so no moving parts,
etc)  ... along with the power of running apps locally, and minimising the
network load (only time network is used is to boot, and to load apps).  This
also allows for accelerated 3D and easy sound configuration.

If they aren't fast enough to support X locally, then you'll need to use
thin-client / X terminal setups.  NAS was created for just this purpose.  It
works in virtually the same way that X works across a network.  Definitely
look into NAS first.  Only if that fails, should you go down the dark,
twisted path of PulseAudio.  :)

We (School District 73 Kamloops/Thompson in BC, Canada) started out using
thin-client setups with P2 333 MHz clients with 256 MB of RAM.  That worked
well as a base to start from, but we quickly ran into issues with online
Flash games, 3D accelerated programs like Blender and CAD apps, full-motion
video, educational games like TuxMath, TenThumbs Typing Tutor, etc.
Within 3 years, we had started a migration to a diskless setup with apps
running locally.  We now run strictly diskless, even on teacher, office, and
admin desktops.  :)

-- 
Freddie Cash
fjwc...@gmail.com