It is rather difficult to determine what sort of response you are
expecting to this message, as it seems to cover several different (but
maybe related) topics, and include some exposition and supposition that do
not include clear questions.
On Wed, 11 Jun 2014, O. Hartmann wrote:
Running FreeBSD
Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun 9 22:07:15 CEST 2014
amd64
crashes without panic message and /var/crash/info.0 contains this message:
Dump header from device /dev/gpt/swap
Architecture: amd64
Architecture Version: 2
Dump Length: 968962048B (924 MB)
Blocksize: 512
Dumptime: Wed Jun 11 19:19:19 2014
Hostname: thor.sb211.zbv
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun 9 22:07:15 CEST 2014
r...@thor.sb211.zbv:/usr/obj/usr/src/sys/THOR
Panic String: ffs_alloccg: map corrupted
Dump Parity: 3034136388
Bounds: 0
Dump Status: good
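
As an aside, the panic string can be pulled out of an info file like the
one above with a small shell function. This is only a sketch: the function
name panic_string is made up for illustration, and it assumes the
"Panic String:" field appears exactly as in the header shown here.

```shell
#!/bin/sh
# panic_string FILE
# Print the value of the "Panic String:" field from a crash info file
# such as /var/crash/info.0. Hypothetical helper, not part of FreeBSD.
panic_string() {
    # Strip optional leading whitespace and the field name, print the rest.
    sed -n 's/^[[:space:]]*Panic String:[[:space:]]*//p' "$1"
}
```

It would be called as "panic_string /var/crash/info.0"; the path is the one
mentioned in the report and may differ on other systems.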
I'm very confused about the panic string, since it seems to tell me
something is bad with FFS/UFS.
ffs is encountering "bad" data while searching through the free block map.
I am not an ffs/ufs expert, but I think this could be the result of
corrupt data on-disk [from a previous crash?] that does not get cleaned up
by fsck. If that is the case, re-running newfs should clear things up.
Since this is /tmp which is, as you note, usually just ephemeral files,
that is probably one of the first things I would try.
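
Roughly, recreating /tmp could look like the sketch below. It only prints
the commands rather than running them, since newfs destroys everything on
the device. The function name rebuild_tmp is made up for illustration; the
device and mount point are taken from the report, and using soft updates
(-U) is an assumption about how the filesystem was originally created.

```shell
#!/bin/sh
# rebuild_tmp DEVICE MOUNTPOINT
# Print (do NOT execute) the commands that would recreate the filesystem.
# Hypothetical helper: review the output and run it by hand as root.
rebuild_tmp() {
    dev=$1
    mnt=$2
    echo "umount $mnt"       # may fail harmlessly if it is not mounted
    echo "newfs -U $dev"     # -U (soft updates) is an assumption
    echo "mount $mnt"        # relies on the existing /etc/fstab entry
    echo "chmod 1777 $mnt"   # a fresh filesystem root lacks the /tmp mode
}

rebuild_tmp /dev/gpt/tmp /tmp
```

Printing first is deliberate: a typo in the device name is much cheaper to
catch before newfs has run.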
More disturbing is the fact that the boot process into multi-user mode
stops with a complaint about an unclean /dev/gpt/tmp filesystem (mounted
on /tmp): the OS stops at the password prompt for single-user mode
maintenance.
If error(s) are encountered during the mounting of filesystems, the OS
always drops to single-user mode. There is no special-casing for /tmp or
anything else. See the calls to stop_boot() from
/etc/rc.d/mountcritlocal, etc.
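
The control flow is roughly as in the sketch below. This is a simplified
illustration, not the actual rc.d code: mount_all_or_stop and its argument
are made-up names, and the argument is a stand-in for the real mount call
so that the sketch can be run without touching any filesystem.

```shell
#!/bin/sh
# mount_all_or_stop MOUNT_CMD
# Illustration only; NOT the real /etc/rc.d/mountcritlocal logic.
# MOUNT_CMD is a hypothetical stand-in for the script's mount invocation.
mount_all_or_stop() {
    mount_cmd=$1
    if "$mount_cmd"; then
        echo "continuing to multi-user"
        return 0
    fi
    # The real scripts call stop_boot() at this point, whichever
    # filesystem failed; /tmp is treated like any other.
    echo "mount failed: dropping to single-user mode"
    return 1
}

mount_all_or_stop true          # simulated success
mount_all_or_stop false || :    # simulated failure
```

Note that nothing in this flow looks at which filesystem failed, which is
the point being made above.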
I cannot understand why the system stops and complains about a broken
/tmp filesystem. I consider /tmp in particular to be corrupt after a fault
anyway, and I'd like to ask whether there is a way to override this
corruption and force a repair and mount, even if the data contained in
/tmp is corrupt after the forced cleaning.
When using a tmpfs-backed /tmp there shouldn't be any stop/fault of that
kind, so it would be canonical to have the same for a hard-drive-backed
/tmp, or am I wrong?
I don't think you're obviously correct. You may not be wrong, but this is
not how the system is currently expected to behave; there would need to be
some discussion if it was to change.
It is not the first time that I have had this kind of crash under heavy
load (the box is an 8 GB system with these CPU specs:
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2999.72-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x10676 Family=0x6 Model=0x17 Stepping=6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8278880256 (7895 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <A_M_I_ OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
[...]
The not-so-funny part is that I have those crashes under heavy load very
frequently on ALL C2D systems (one E8400 as shown, another has a Q4400
CPU, but also 8 GB RAM, same motherboard). In all cases of a sudden crash,
/tmp gets corrupted and the system refuses to boot into multi-user mode,
complaining about the broken /tmp filesystem, which cannot be repaired
automatically.
Apart from this specific question about an unclean /tmp, this kind of
crash under heavy load on a specific hardware architecture with the most
recent CURRENT is puzzling (and has occurred several times within the past
8 weeks, each time with the same stupid blocking at the broken /tmp
partition). I also checked the hardware with tools like memtest86 to
ensure there is no faulty memory, but I cannot exclude some kind of CPU
overheating, since I realized that CLANG with -O3 (which is supposed to
optimise for vector units if available, if I'm right) increases the
average CPU temperature by about 3 to 5 degrees Celsius. This is more
obvious on a Dell Latitude E6510 with a first-generation Sandy Bridge
mobile CPU and FreeBSD 9.2/9.3: compiling the OS with gcc 4.2 (the base
compiler in that system), the temperature is 2 to 4 degrees lower than
when using CLANG 3.4.1 with -O3 enabled (reading the ACPI-reported
temperature via "sysctl -a | grep tempe"). This is funny, isn't it?
I don't feel like there is anything I can say in reply to this bit.
-Ben