--On Monday, December 06, 2004 07:48:52 PM +0100 Jeffrey Altman <[EMAIL PROTECTED]> wrote:
> The thing which is preventing the release of 1.3.7x as a stable 1.4
> tree is lack of deployment and testing by users. There has been very
> little feedback both positive or negative on the existing releases.
> Without this feedback it is very difficult for us to know whether or
> not it is ready.
I'd been holding back our feedback because 1.3.75 was imminent and we
thought some of the fixes listed might solve our problems. We've done testing
with 1.3.74 and 1.3.75. The clients are all Fedora Core 3 w/ patched
kernels to provide sys_call_table[]. We are experiencing the following
problems:
* Inability to unmount /usr/vice/cache (or / if it's not a separate
partition). This is 100% repeatable on all FC3 machines. The following
steps will always create this problem:
- Stop all processes and logout all users of AFS
- Stop all AFS processes and unload libafs kernel module
- lsof | grep -i afs reports nothing open
- umount /usr/vice/cache
This will always result in an error that /usr/vice/cache is busy:
# umount /usr/vice/cache
umount: /usr/vice/cache: device is busy
umount: /usr/vice/cache: device is busy
* Accessing an AFS volume over our VPN results in an immediate kernel
panic. The panic message reports many "Unable to handle kernel NULL
pointer dereference at virtual address" errors followed by "Recursive die()
failure, output suppressed" and "<0>Kernel panic - not syncing: Fatal
exception in interrupt". This is present only on 1 of 2 laptops running
FC3, but is 100% repeatable on the failing laptop.
* Copying large files (~450 MB) into AFS from non-AFS partitions results
in a kernel oops. The error reported is:
rxi_Start: xmit list overflowed<1>Unable to handle kernel paging request
at virtual address ffffffff
This problem is also 100% repeatable. 'fs getcache' does not report that
the cache is full. I've attached a file gti-largefile-copy-oops.txt that
is the "soft" kernel oops.
* Random cache consistency problems. A file will be present in the
filesystem and viewable on other machines but not on the FC3 host. 'fs
flush' does not always solve this problem; however, another client operating
on the same directory (e.g. touch hi) seems to unstick the client. We do
have one test case that seems to always generate this problem, but it's not
very portable for others to test as it requires our internal package
management software. Rudy Maceyko is going to test this with 1.3.75
shortly.
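The cache consistency symptom could be checked on a client with something like the following. This is only a sketch under stated assumptions: it assumes the OpenAFS 'fs' command is on the PATH, and the function name visible_after_flush and its injectable 'run' parameter are hypothetical helpers made up for illustration, not part of OpenAFS. It records whether a path is visible before any flush, after 'fs flush' on the parent directory, and after 'fs flushvolume'.

```python
import os
import subprocess


def visible_after_flush(path, run=subprocess.run):
    """Record whether `path` is visible before and after flushing the cache.

    `run` defaults to subprocess.run and is injectable so the logic can be
    exercised on a machine without AFS (hypothetical convenience).
    """
    parent = os.path.dirname(path) or "."
    result = {"before": os.path.exists(path)}

    # Discard cached data for the parent directory only.
    run(["fs", "flush", "-path", parent], check=False)
    result["after_flush"] = os.path.exists(path)

    # Discard cached data for the whole volume containing the directory.
    run(["fs", "flushvolume", "-path", parent], check=False)
    result["after_flushvolume"] = os.path.exists(path)

    return result
```

Running this on the FC3 host for a file that other clients can already see would show at which step, if any, the file becomes visible. It does not reproduce the problem by itself.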
These are our current problems with the 1.3.7x series. We have not
tested 1.3.7x on any other Linux release because we're focusing on moving
forward with Fedora 3 and RHEL 4 preparations. So I can't speak to these
problems existing on, for example, FC1.
We are building the RPMs with a modified specfile. We're working to
merge our changes back into the mainline spec file and provide that to the
community. I've attached all of the patches we're applying to the source
tree since they're all small. Their descriptions are:
openafs-1.2.11-no_old_gid_t.patch - Support for AMD 64
openafs-1.2.11-res_search.patch - resolver patch
openafs-1.3.75-afskvers-autoconf-fix.patch - Fix --with-afs-system
26syscall.patch - Hard-sets the build process to use sys_call_table
afs.initd.patch - Removes modload logic in favor of symlinks to /lib/modules
openafs-krb5-2.0-afsconf.patch - Fixes call to afsconf_AddKey() for afs-krb5
I've held off reporting this for a little bit because I've not had time to
properly test or debug any of these. Let me know what we can do to further
debug these problems.
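As one possible starting point for the unmount problem: lsof lists files held open by processes, so a clean lsof does not rule out leftover kernel-side state. The sketch below is an assumption about where to look, not a diagnosis. It checks /proc/modules-style text for a module named libafs (the name shown in the oops output) and /proc/mounts-style text for an AFS mount or a mount nested under the cache directory. All function names are hypothetical.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def parse_mounts(proc_mounts_text):
    """Return (mount_point, fstype) pairs from /proc/mounts-style text."""
    mounts = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def leftover_afs_state(proc_modules_text, proc_mounts_text,
                       cache_dir="/usr/vice/cache"):
    """List findings that could keep the cache partition busy."""
    findings = []
    if "libafs" in loaded_modules(proc_modules_text):
        findings.append("libafs is still listed in /proc/modules")

    prefix = cache_dir.rstrip("/") + "/"
    for mount_point, fstype in parse_mounts(proc_mounts_text):
        if fstype == "afs":
            findings.append("AFS is still mounted at " + mount_point)
        elif mount_point.startswith(prefix):
            findings.append("nested mount under cache dir: " + mount_point)
    return findings
```

On an affected host, the two arguments would be the contents of /proc/modules and /proc/mounts read just before the failing umount. An empty result would suggest the reference is held somewhere this check does not look.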
--
Jason McCormick
CERT Infrastructure Team
[EMAIL PROTECTED] ** 412-268-7961
Dec 10 14:48:48 gti kernel: rxi_Start: xmit list overflowed<1>Unable to handle kernel paging request at virtual address ffffffff
Dec 10 14:48:48 gti kernel: printing eip:
Dec 10 14:48:48 gti kernel: 12fac54c
Dec 10 14:48:48 gti kernel: *pde = 00002067
Dec 10 14:48:48 gti kernel: Oops: 0002 [#1]
Dec 10 14:48:48 gti kernel: Modules linked in: libafs(U) cisco_ipsec(U) i2c_dev i2c_core ipt_REJECT ipt_LOG ipt_state ip_conntrack iptable_filter ip_tables orinoco_cs orinoco hermes ds microcode dm_mod button battery ac ohci1394 ieee1394 yenta_socket pcmcia_core uhci_hcd snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore 3c59x floppy ext3 jbd
Dec 10 14:48:48 gti kernel: CPU: 0
Dec 10 14:48:48 gti kernel: EIP: 0060:[<12fac54c>] Tainted: P VLI
Dec 10 14:48:48 gti kernel: EFLAGS: 00010212 (2.6.9-1.681.CERT)
Dec 10 14:48:48 gti kernel: EIP is at osi_Panic+0x17/0x23 [libafs]
Dec 10 14:48:48 gti kernel: eax: 0000001f ebx: 12fc731e ecx: 12fc6fbc edx: 06a9ea5c
Dec 10 14:48:48 gti kernel: esi: 0326fc80 edi: 12ff81b0 ebp: 00000007 esp: 06a9ea58
Dec 10 14:48:48 gti kernel: ds: 007b es: 007b ss: 0068
Dec 10 14:48:48 gti kernel: Process cp (pid: 3640, threadinfo=06a9e000 task=0f24f7b0)
Dec 10 14:48:48 gti kernel: Stack: 12fc6fbc 00000020 12fd9420 00000000 12fe6ce0 12fa9119 00000000 100a83c0
Dec 10 14:48:48 gti kernel: 00000007 41b9fda0 00052259 41b9fd9f 000efdc9 0326fc80 06865d94 12ffa840
Dec 10 14:48:48 gti kernel: 12ffa1e0 12fab212 00000000 00001000 12fb596c 00000574 00001000 0000026c
Dec 10 14:48:48 gti kernel: Call Trace:
Dec 10 14:48:48 gti kernel: [<12fa9119>] rxi_Start+0x2dc/0x4f4 [libafs]
Dec 10 14:48:48 gti kernel: [<12fab212>] rxi_WriteProc+0x15c/0x350 [libafs]
Dec 10 14:48:48 gti kernel: [<12fb596c>] afs_osi_Read+0x4b/0x8f [libafs]
Dec 10 14:48:48 gti kernel: [<12f7eb30>] afs_UFSCacheStoreProc+0xe6/0x185 [libafs]
Dec 10 14:48:48 gti kernel: [<0218564b>] iget_locked+0x167/0x206
Dec 10 14:48:48 gti kernel: [<12f887dd>] afs_StoreAllSegments+0x8b3/0x1843 [libafs]
Dec 10 14:48:48 gti kernel: [<1286014e>] ext3_file_write+0x19/0x8b [ext3]
Dec 10 14:48:48 gti kernel: [<12fba9ad>] afs_linux_writepage_sync+0xb0/0x1b7 [libafs]
Dec 10 14:48:48 gti kernel: [<12fbaa28>] afs_linux_writepage_sync+0x12b/0x1b7 [libafs]
Dec 10 14:48:48 gti kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 10 14:48:48 gti kernel: [<12fbaac3>] afs_linux_updatepage+0xf/0x11 [libafs]
Dec 10 14:48:48 gti kernel: [<12fbab94>] afs_linux_commit_write+0xcf/0x167 [libafs]
Dec 10 14:48:48 gti kernel: [<02144825>] generic_file_buffered_write+0x301/0x48e
Dec 10 14:48:48 gti kernel: [<02128c29>] update_wall_time+0x9/0x31
Dec 10 14:48:48 gti kernel: [<02108bf4>] free_irq+0xf/0x1a0
Dec 10 14:48:48 gti kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 10 14:48:48 gti kernel: [<0215e907>] rw_vm+0x3ef/0x47a
Dec 10 14:48:48 gti kernel: [<02144ce8>] generic_file_aio_write_nolock+0x336/0x364
Dec 10 14:48:48 gti kernel: [<02144d9a>] generic_file_write_nolock+0x84/0x99
Dec 10 14:48:48 gti kernel: [<021c3fc2>] avc_has_perm+0x3b/0x45
Dec 10 14:48:48 gti kernel: [<12f9112b>] afs_CopyOutAttrs+0x1df/0x1e5 [libafs]
Dec 10 14:48:48 gti kernel: [<12fb70ec>] vcache2inode+0x21/0x27 [libafs]
Dec 10 14:48:48 gti kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 10 14:48:48 gti kernel: [<02144ed6>] generic_file_write+0x5a/0xbb
Dec 10 14:48:48 gti kernel: [<12fb789b>] afs_linux_write+0x48b/0x5b1 [libafs]
Dec 10 14:48:48 gti kernel: [<02145b31>] mempool_free+0x169/0x16d
Dec 10 14:48:48 gti kernel: [<02165c82>] vfs_write+0xb6/0xe2
Dec 10 14:48:48 gti kernel: [<02165d4c>] sys_write+0x3c/0x62
Dec 10 14:48:48 gti kernel: Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Dec 10 14:48:48 gti kernel: in_atomic():0[expected: 0], irqs_disabled():1
Dec 10 14:48:48 gti kernel: [<0211cbcb>] __might_sleep+0x7d/0x8a
Dec 10 14:48:48 gti kernel: [<0215e726>] rw_vm+0x20e/0x47a
Dec 10 14:48:48 gti kernel: [<12fac521>] rxi_GetHostUDPSocket+0x19/0x23 [libafs]
Dec 10 14:48:48 gti kernel: [<12fac521>] rxi_GetHostUDPSocket+0x19/0x23 [libafs]
Dec 10 14:48:48 gti kernel: [<0215ee70>] get_user_size+0x30/0x57
Dec 10 14:48:48 gti kernel: [<12fac521>] rxi_GetHostUDPSocket+0x19/0x23 [libafs]
Dec 10 14:48:48 gti kernel: [<0210682b>] show_registers+0x109/0x15e
Dec 10 14:48:48 gti kernel: [<02106a2f>] die+0x14a/0x241
Dec 10 14:48:48 gti kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 10 14:48:48 gti kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 10 14:48:48 gti kernel: [<02119733>] do_page_fault+0x3b5/0x511
Dec 10 14:48:48 gti kernel: [<12fac54c>] osi_Panic+0x17/0x23 [libafs]
Dec 10 14:48:48 gti kernel: [<0211b15f>] activate_task+0x53/0x5f
Dec 10 14:48:48 gti kernel: [<0211d27c>] autoremove_wake_function+0xd/0x2d
Dec 10 14:48:48 gti kernel: [<0211bbeb>] __wake_up_common+0x36/0x51
Dec 10 14:48:48 gti kernel: [<0211bc93>] __wake_up+0x8d/0xf2
Dec 10 14:48:48 gti kernel: [<0211937e>] do_page_fault+0x0/0x511
Dec 10 14:48:48 gti kernel: [<12fac54c>] osi_Panic+0x17/0x23 [libafs]
Dec 10 14:48:48 gti kernel: [<12fa9119>] rxi_Start+0x2dc/0x4f4 [libafs]
Dec 10 14:48:48 gti kernel: [<12fab212>] rxi_WriteProc+0x15c/0x350 [libafs]
Dec 10 14:48:48 gti kernel: [<12fb596c>] afs_osi_Read+0x4b/0x8f [libafs]
Dec 10 14:48:48 gti kernel: [<12f7eb30>] afs_UFSCacheStoreProc+0xe6/0x185 [libafs]
Dec 10 14:48:48 gti kernel: [<0218564b>] iget_locked+0x167/0x206
Dec 10 14:48:48 gti kernel: [<12f887dd>] afs_StoreAllSegments+0x8b3/0x1843 [libafs]
Dec 10 14:48:48 gti kernel: [<1286014e>] ext3_file_write+0x19/0x8b [ext3]
Dec 10 14:48:48 gti kernel: [<12fba9ad>] afs_linux_writepage_sync+0xb0/0x1b7 [libafs]
Dec 10 14:48:48 gti kernel: [<12fbaa28>] afs_linux_writepage_sync+0x12b/0x1b7 [libafs]
Dec 10 14:48:48 gti kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 10 14:48:48 gti kernel: [<12fbaac3>] afs_linux_updatepage+0xf/0x11 [libafs]
Dec 10 14:48:48 gti kernel: [<12fbab94>] afs_linux_commit_write+0xcf/0x167 [libafs]
Dec 10 14:48:48 gti kernel: [<02144825>] generic_file_buffered_write+0x301/0x48e
Dec 10 14:48:48 gti kernel: [<02128c29>] update_wall_time+0x9/0x31
Dec 10 14:48:48 gti kernel: [<02108bf4>] free_irq+0xf/0x1a0
Dec 10 14:48:48 gti kernel: [<0215222e>] follow_page_pte+0xec/0xfd
Dec 10 14:48:48 gti kernel: [<0215e907>] rw_vm+0x3ef/0x47a
Dec 10 14:48:48 gti kernel: [<02144ce8>] generic_file_aio_write_nolock+0x336/0x364
Dec 10 14:48:48 gti kernel: [<02144d9a>] generic_file_write_nolock+0x84/0x99
Dec 10 14:48:48 gti kernel: [<021c3fc2>] avc_has_perm+0x3b/0x45
Dec 10 14:48:48 gti kernel: [<12f9112b>] afs_CopyOutAttrs+0x1df/0x1e5 [libafs]
Dec 10 14:48:48 gti kernel: [<12fb70ec>] vcache2inode+0x21/0x27 [libafs]
Dec 10 14:48:48 gti kernel: [<0211d26f>] autoremove_wake_function+0x0/0x2d
Dec 10 14:48:48 gti kernel: [<02144ed6>] generic_file_write+0x5a/0xbb
Dec 10 14:48:48 gti kernel: [<12fb789b>] afs_linux_write+0x48b/0x5b1 [libafs]
Dec 10 14:48:48 gti kernel: [<02145b31>] mempool_free+0x169/0x16d
Dec 10 14:48:48 gti kernel: [<02165c82>] vfs_write+0xb6/0xe2
Dec 10 14:48:48 gti kernel: [<02165d4c>] sys_write+0x3c/0x62
Dec 10 14:48:48 gti kernel: Bad EIP value.
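A syslog-wrapped oops like the one above is easier to read once the repeated "Dec 10 14:48:48 gti kernel: " prefix is stripped and the stack frames are pulled out. The following is a minimal sketch of that, not an established tool: the names SYSLOG_PREFIX, FRAME, split_entries, and parse_frames are made up for illustration, the prefix pattern assumes the timestamp/host/"kernel:" layout seen in this log, and SAMPLE is a short excerpt copied from it.

```python
import re

# "Dec 10 14:48:48 gti kernel: " style prefix (layout assumed from this log).
SYSLOG_PREFIX = re.compile(
    r"[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ kernel: ")

# "[<12fa9119>] rxi_Start+0x2dc/0x4f4 [libafs]" style stack frame;
# the trailing "[module]" is absent for symbols in the core kernel.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?")


def split_entries(text):
    """Split flattened syslog text into one string per kernel message."""
    return [part.strip() for part in SYSLOG_PREFIX.split(text) if part.strip()]


def parse_frames(text):
    """Return every stack frame found in the text, in order of appearance."""
    frames = []
    for match in FRAME.finditer(text):
        addr, symbol, offset, size, module = match.groups()
        frames.append({
            "addr": addr,
            "symbol": symbol,
            "offset": int(offset, 16),
            "size": int(size, 16),
            "module": module,  # None for core kernel symbols
        })
    return frames


SAMPLE = (
    "Dec 10 14:48:48 gti kernel: EIP is at osi_Panic+0x17/0x23 [libafs] "
    "Dec 10 14:48:48 gti kernel: Call Trace: "
    "Dec 10 14:48:48 gti kernel: [<12fa9119>] rxi_Start+0x2dc/0x4f4 [libafs] "
    "Dec 10 14:48:48 gti kernel: [<12fab212>] rxi_WriteProc+0x15c/0x350 [libafs] "
    "Dec 10 14:48:48 gti kernel: [<0218564b>] iget_locked+0x167/0x206 "
)

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(frame["module"] or "kernel", frame["symbol"])
```

Filtering the parsed frames on module == "libafs" gives the AFS-side portion of the trace. Because the frame pattern allows any whitespace after the address, it also matches frames that the archive split across two lines.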
26syscall.patch
afs.initd.patch
openafs-1.2.11-no_old_gid_t.patch
openafs-1.2.11-res_search.patch
openafs-1.3.74-admin_tools.klog.patch
openafs-krb5-2.0-afsconf.patch
openafs-1.3.75-afskvers-autoconf-fix.patch
