Dear Mark,

Sorry for my late answer; I'm still in a hurry. Unfortunately, there are no log files left from October. However, we have had several similar events during the last week (and from time to time), which either coincide with shutdowns or end in shutdowns (I don't know which comes first). I could prepare the message logs if you think it's worthwhile, but not before tomorrow.

Best regards,

Karl



On 25.01.16 19:22, Mark Vitale wrote:
On Nov 4, 2015, at 11:50 AM, Karl Behler wrote:

Dear Mark and Ben,

Thanks for your response. We could not determine which component in our system 
may have caused the "umount".
But it has not happened again since then. I think we will move to a newer 
version of the client and then see what happens.

When you first reported this, I focused on the possible reasons for the shutdown, and 
inadvertently overlooked the panic/hang that you also reported.  But recently while doing 
some Solaris testing I discovered a bug in the OpenAFS Solaris shutdown code.  Unlike 
you, my shutdowns were intentional; but like you, I saw "Failed to flush 
vcache" messages and a panic after shutdown.   At that point I remembered your email 
and realized you had probably encountered the same panic I did.  However, the only way to 
be sure would be to look for the panic messages in your syslog.  If you still have those 
(from back in October), it would be helpful to see them.

Regardless, I'm able to duplicate the problem quite easily.

I've opened https://rt.central.org/rt/Ticket/Display.html?id=132689 and I am 
working on an upstream fix for this.

Regards,
--
Mark Vitale
Sine Nomine Associates


On Oct 16, 2015, at 10:46 AM, Karl Behler wrote:

We are experiencing unwanted "shutdown" events of our OpenAFS 1.6.9 clients 
under Solaris 10.

We have been running this client without problems since October of last year 
on ten Solaris desktop servers, which reboot regularly on weekends. Recently, 
however, nearly half of these servers suffered crash-like events in the middle 
of the week.

The log file (/var/adm/messages) contains kernel messages that look like a 
shutdown apparently initiated by afsd itself.
(In the following log, the actual event starts at Oct 16 11:54:47.)

Oct 16 11:35:39 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 23006 (thunderbird-bin), user 13471, fid 1108706165.12934.344145).
Oct 16 11:39:23 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 22054 (firefox-bin), user 6570, fid 1108604831.175334.13229850).
Oct 16 11:49:23 sxaug37 last message repeated 1 time
Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM
Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28e2f840
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2924b960
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28114c00
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27d49000
... several hundred similar messages
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2811dbc0
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28a53c60
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27e10460
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x289fad40
Oct 16 11:54:47 sxaug37 genunix: [ID 364168 kern.notice] BkG...
Oct 16 11:54:47 sxaug37 genunix: [ID 338304 kern.notice] CB...
Oct 16 11:54:47 sxaug37 genunix: [ID 543876 kern.notice] afs...
Oct 16 11:54:47 sxaug37 genunix: [ID 229921 kern.notice] CTrunc...
Oct 16 11:54:47 sxaug37 genunix: [ID 916331 kern.notice] AFSDB...
Oct 16 11:54:47 sxaug37 genunix: [ID 196290 kern.notice] RxEvent...
Oct 16 11:54:48 sxaug37 genunix: [ID 687192 kern.notice] UnmaskRxkSignals...
Oct 16 11:54:48 sxaug37 genunix: [ID 346748 kern.notice] RxListener...
Oct 16 11:54:48 sxaug37 genunix: [ID 890369 kern.notice] NetIfPoller...
Oct 16 11:54:48 sxaug37 genunix: [ID 288918 kern.notice] WARNING: not all blocks freed: large 0 small 217
Oct 16 11:54:48 sxaug37 genunix: [ID 646860 kern.notice]  ALL allocated tables...
Oct 16 11:54:48 sxaug37 genunix: [ID 773001 kern.notice] done
Oct 16 11:58:24 sxaug37 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_150401-28 64-bit
Oct 16 11:58:24 sxaug37 genunix: [ID 282658 kern.notice] Copyright (c) 1983, 2015, Oracle and/or its affiliates. All rights reserved.

Sometimes the system reboots immediately; at other times it stays in a state 
where all attempts to access AFS fail with an I/O error.
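
For anyone who wants to check their own /var/adm/messages for the same pattern, the following is a minimal sketch in Python. It is not part of OpenAFS; the names STAMP, unwrap and summarize are made up for this illustration. It assumes the message texts "shutting down of" and "Failed to flush vcache" appear exactly as in the excerpt above, and it rejoins records that a mail client has wrapped across lines.

```python
import re
from collections import Counter

# Syslog records start with a stamp and host, e.g. "Oct 16 11:54:47 sxaug37 ..."
STAMP = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) (\S+) (.*)$")


def unwrap(lines):
    """Join wrapped continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP.match(line):
            records.append(line)
        elif records and line.strip():
            # No timestamp: treat as the tail of the previous record.
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records


def summarize(lines):
    """Return (shutdown starts, count of failed vcache flushes per host/stamp)."""
    shutdowns = []
    failed = Counter()
    for record in unwrap(lines):
        stamp, host, rest = STAMP.match(record).groups()
        if "Failed to flush vcache" in rest:
            failed[(host, stamp)] += 1
        elif "shutting down of" in rest:
            shutdowns.append((host, stamp))
    return shutdowns, failed


if __name__ == "__main__":
    # Path taken from the message above; adjust for rotated log files.
    with open("/var/adm/messages", errors="replace") as log:
        starts, flush_failures = summarize(log)
    for host, stamp in starts:
        print(f"{host}: AFS shutdown started at {stamp}")
    for (host, stamp), count in sorted(flush_failures.items()):
        print(f"{host}: {count} failed vcache flushes at {stamp}")
```

The sketch only reports when a shutdown sequence started and how many flush failures were logged in each second; it does not identify what triggered the shutdown.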


--
Dr. Karl Behler 
CODAC & IT services ASDEX Upgrade
phon +49 89 3299-1351 fax 3299-961351

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
