It is rather difficult to determine what sort of response you are expecting to this message, as it seems to cover several different (but maybe related) topics and includes some exposition and supposition that do not pose clear questions.

On Wed, 11 Jun 2014, O. Hartmann wrote:

Running FreeBSD

Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun  9 22:07:15 CEST 2014 amd64

crashes without a panic message, and /var/crash/info.0 contains this message:

Dump header from device /dev/gpt/swap
 Architecture: amd64
 Architecture Version: 2
 Dump Length: 968962048B (924 MB)
 Blocksize: 512
 Dumptime: Wed Jun 11 19:19:19 2014
 Hostname: thor.sb211.zbv
 Magic: FreeBSD Kernel Dump
 Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun  9 22:07:15 CEST 2014
   r...@thor.sb211.zbv:/usr/obj/usr/src/sys/THOR
 Panic String: ffs_alloccg: map corrupted
 Dump Parity: 3034136388
 Bounds: 0
 Dump Status: good

I'm very confused about the panic string, since it seems to tell me something is bad with FFS/UFS.

ffs is encountering "bad" data while searching through the free block map. I am not an ffs/ufs expert, but I think this could be the result of corrupt data on-disk [from a previous crash?] that does not get cleaned up by fsck. If that is the case, re-running newfs should clear things up. Since this is /tmp, which is, as you note, usually just ephemeral files, that is probably one of the first things I would try.
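
For what it's worth, the panic string can be pulled out of an info file like the one quoted above mechanically rather than by eye. The following is only a minimal Python sketch, assuming the file keeps the "Key: value" layout shown in this message; parse_dump_info and SAMPLE are hypothetical names made up for illustration and are not part of any FreeBSD tool.

```python
def parse_dump_info(text):
    """Collect the 'Key: value' lines of a crash info file into a dict.

    Lines without a ': ' separator (such as the 'Dump header from device'
    line) are skipped. Only the first ': ' splits a line, so values that
    themselves contain colons are kept whole.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


# Hypothetical sample: a subset of the header quoted in this thread.
SAMPLE = """\
Dump header from device /dev/gpt/swap
 Architecture: amd64
 Dumptime: Wed Jun 11 19:19:19 2014
 Panic String: ffs_alloccg: map corrupted
 Dump Status: good
"""

info = parse_dump_info(SAMPLE)
print(info["Panic String"])
```

In practice the text would be read from the info file itself (here /var/crash/info.0) instead of the embedded sample.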


More disturbing is the fact that the boot process into multi-user mode stops at a complaint about an unclean /dev/gpt/tmp filesystem (mounted at /tmp): the OS stops at the password prompt for single-user mode maintenance.

If error(s) are encountered during the mounting of filesystems, the OS always drops to single-user mode. There is no special-casing for /tmp or anything else. See the calls to stop_boot() from /etc/rc.d/mountcritlocal, etc.

I cannot understand why the system stops and complains about a broken /tmp filesystem. I consider the contents of /tmp in particular to be corrupt after a fault anyway, and I'd like to ask whether there is a way to override this corruption check and force a repair and mount, even if the data contained in /tmp is still corrupt after the forced cleaning.

With a tmpfs-backed /tmp there shouldn't be any stop or fault of that kind, so it would be consistent to have the same behavior for a hard-drive-backed /tmp, or am I wrong?

I don't think you're obviously correct. You may not be wrong, but this is not how the system is currently expected to behave; there would need to be some discussion if it was to change.

It is not the first time that I have seen this kind of crash under heavy load. The box is an 8 GB system with these CPU specs:

FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (2999.72-MHz K8-class CPU)
 Origin="GenuineIntel"  Id=0x10676  Family=0x6  Model=0x17  Stepping=6
 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
 AMD Features=0x20100800<SYSCALL,NX,LM>
 AMD Features2=0x1<LAHF>
 TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8278880256 (7895 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <A_M_I_ OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
[...]

The not-so-funny part is that I have those crashes under heavy load very frequently on ALL C2D systems (one E8400 as shown; another has a Q4400 CPU, but also 8 GB RAM and the same motherboard). In all cases of a sudden crash, /tmp gets corrupted and the system refuses to boot into multi-user mode, complaining about the broken /tmp filesystem, which cannot be repaired automatically.

Apart from this specific question about an unclean /tmp, this kind of crash under heavy load on a specific hardware architecture with the most recent CURRENT is puzzling (and has occurred several times within the past 8 weeks, with the same stupid blocking at the broken /tmp partition). I also checked the hardware with tools like memtest86 to ensure there is no faulty memory, but I cannot exclude some kind of CPU overheating, since I noticed that CLANG with -O3 (which is supposed to optimise for vector units if available, if I'm right) increases the average CPU temperature by about 3 - 5 degrees Celsius. This is more obvious on a Dell Latitude E6510 with a first-generation Sandy Bridge mobile CPU and FreeBSD 9.2/9.3: when compiling the OS with gcc 4.2 (the base compiler in that system), the temperature is 2 - 4 degrees lower than when using CLANG 3.4.1 with -O3 enabled (reading the ACPI-reported temperature via "sysctl -a|grep tempe"). This is funny, isn't it?

I don't feel like there is anything I can say in reply to this bit.

-Ben
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
