If the mount fails (no root.afs and no dynroot, for instance), the client failure is a well-known one. The old Linux-AFS client did something clever which we should probably do too: fake a root and swap in the real one later.
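To make the "fake a root and swap later" idea concrete, here is a minimal sketch in Python. It is not taken from the old Linux-AFS client or from OpenAFS; the names `Volume`, `MountRoot`, `attach` and `is_fake` are hypothetical and only illustrate the shape of the approach: the mount always succeeds against an empty placeholder, and lookups are forwarded to the real root once it becomes available.

```python
class Volume:
    """Hypothetical stand-in for a real root volume such as root.afs."""

    def __init__(self, entries):
        self.entries = list(entries)

    def listdir(self):
        return list(self.entries)


class MountRoot:
    """Hypothetical mount root: starts out as a fake, empty directory and
    forwards to the real root volume once one has been attached."""

    def __init__(self):
        self._real = None  # no real root yet, so we are a placeholder

    def is_fake(self):
        return self._real is None

    def attach(self, volume):
        # Swap in the real root; later lookups go to the volume.
        self._real = volume

    def listdir(self):
        if self._real is None:
            return []  # placeholder root: mount works, nothing is visible
        return self._real.listdir()


root = MountRoot()                    # mount succeeds while root.afs is offline
print(root.is_fake(), root.listdir())
root.attach(Volume(["example.org"]))  # the real root became reachable later
print(root.is_fake(), root.listdir())
```

The point of the indirection is that the client never has to fail the mount itself; whether a real kernel client can swap the root this cheaply is an assumption of the sketch.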
I'm more interested in the server errors. Can you share anything from the logs?

On Tue, Jul 29, 2008 at 6:11 AM, Dr A V Le Blanc <[EMAIL PROTECTED]> wrote:
> I sent this to Russ Allbery, and he suggested that I send it to
> openafs-devel.
>
> We've got an old AFS cell, and I've been looking at moving to Debian
> lenny for the file and db servers, and I started experimenting with
> Russ's 1.5.50.dfsg1-1 version, which I used to create a new experimental
> cell. I've seen a number of problems:
>
> The fileserver and dbserver packages had a number of issues; I was able
> to create the cell and get a quorum for the vlserver and ptserver, but
> attempts to create a volume always ended with a communications failure,
> and nothing would ever make the volumes online and readable.
>
> When I attempted to start the client on the cell, it took a very long
> time, and then failed, presumably since root.afs wasn't online. But
> I was unable to stop and restart it, getting the message about the lack
> of memory which I describe below.
>
> By the way, attempting to run the afs-newcell script even with all the
> requirements satisfied (of course) failed.
>
> When I replaced the dbserver, fileserver, client and openafs-krb5 packages
> with openafs-1.4.7.dfsg1-2 packages, everything worked perfectly -- even
> when I still had the 1.5.50.dfsg1-1 module in the kernel. This seems
> to me to show that it was not a problem with firewalling or other
> communications issues.
>
> A typical message from a shutdown was this:
>
> Jul 24 11:39:26 scree kernel: [79231.987117] WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
> Jul 24 11:39:26 scree kernel: [79232.491466] WARNING: not all blocks freed: large 1 small 4
> Jul 24 11:39:26 scree kernel: [79232.491466] ALL allocated tables
>
> also I have this:
>
> Jul 24 13:15:48 scree kernel: [85788.248612] COLD shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
> Jul 24 13:15:48 scree kernel: [85788.871295] ALL allocated tables
> Jul 24 13:15:48 scree kernel: [85788.888977] slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
> Jul 24 13:15:48 scree kernel: [85788.993231] [<c0174519>] kmem_cache_destroy+0x6a/0xb6
> Jul 24 13:15:48 scree kernel: [85788.993261] [<f8b5c9da>] cleanup_module+0x1e/0x32 [openafs]
> Jul 24 13:15:48 scree kernel: [85788.993345] [<c0140dfa>] sys_delete_module+0x1a8/0x1f7
> Jul 24 13:15:48 scree kernel: [85788.993374] [<c01672e1>] remove_vma+0x3e/0x43
> Jul 24 13:15:48 scree kernel: [85788.993388] [<c0167fe4>] do_munmap+0x1ba/0x1d4
> Jul 24 13:15:48 scree kernel: [85788.993409] [<c0103982>] syscall_call+0x7/0xb
> Jul 24 13:15:48 scree kernel: [85788.993436] =======================
> Jul 24 13:21:11 scree kernel: [86141.296068] Symbol init_mm is marked as UNUSED, however this module is using it.
> Jul 24 13:21:11 scree kernel: [86141.296082] This symbol will go away in the future.
>
> and from a failed attempt to restart the client:
>
> Jul 24 13:21:11 scree kernel: [86141.298714] Found system call table at 0xfffffffe (exported)
> Jul 24 13:21:11 scree kernel: [86141.298720] Address 0xfffffffe is not writable.
> Jul 24 13:21:11 scree kernel: [86141.298725] System call hooks will not be installed; proceeding anyway
> Jul 24 13:21:11 scree kernel: [86141.298733] kmem_cache_create: duplicate cache afs_inode_cache
> Jul 24 13:21:11 scree kernel: [86141.382946] [<c0174623>] kmem_cache_create+0xbe/0x33b
> Jul 24 13:21:11 scree kernel: [86141.382987] [<f8b4e68e>] afs_init_inodecache+0x1b/0x2b [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383069] [<f8b4e69e>] init_once+0x0/0x7 [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383133] [<f892f025>] init_module+0x25/0x5f [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383193] [<c0140a85>] sys_init_module+0x1862/0x19e5
> Jul 24 13:21:11 scree kernel: [86141.383270] [<c01304d9>] find_task_by_vpid+0x0/0x19
> Jul 24 13:21:11 scree kernel: [86141.383331] [<c0103982>] syscall_call+0x7/0xb
> Jul 24 13:21:11 scree kernel: [86141.383368] =======================
>
> I have not saved logs from the salvager processes, but there didn't seem to me
> to be anything useful in them.
>
> I hope this is useful, and that someone can see what some of the problems are.
> Test builds of the kernel module show some peculiarities with other kernels,
> at least to the extent of giving a warning message about being unable to
> unload sunrpc. I'd be happy to do any experiments that might help illumine
> or solve this problem.
>
> -- Owen
> _______________________________________________
> OpenAFS-devel mailing list
> [email protected]
> https://lists.openafs.org/mailman/listinfo/openafs-devel
