OK, here's some more info: I added debugging code to the fileserver at each call to VTakeOffline that didn't already log something. The volume is being taken offline because rx_WritevAlloc is failing inside FetchData_RXStyle. Relevant logs below:

Thu Aug  3 19:15:36 2006 [102] FindClient: authenticating connection: authClass=0
Thu Aug  3 19:15:36 2006 [102] WhoAreYou success on 128.8.111.219:7001
Thu Aug  3 19:15:36 2006 [102] InitCallBackState3 success on 128.8.111.219:7001
Thu Aug  3 19:15:36 2006 [102] SAFS_FetchStatus,  Fid = 1970897351.1.1, Host 128.8.111.219:7001, Id 32766
Thu Aug  3 19:15:36 2006 [102] SAFS_FetchStatus returns 0
Thu Aug  3 19:15:36 2006 [102] SAFS_FetchStatus,  Fid = 1970897351.1348.1070438, Host 128.8.111.206:7001, Id 32766
Thu Aug  3 19:15:36 2006 [102] SAFS_FetchStatus returns 0
Thu Aug  3 19:15:37 2006 [102] SAFS_FetchStatus,  Fid = 1970897351.1348.1070438, Host 128.8.111.206:7001, Id 32766
Thu Aug  3 19:15:37 2006 [102] SAFS_FetchStatus returns 0
Thu Aug  3 19:15:37 2006 [102] SRXAFS_FetchData, Fid = 1970897351.1348.1070438
Thu Aug  3 19:15:37 2006 [102] SRXAFS_FetchData, Fid = 1970897351.1348.1070438, Host 128.8.111.206:7001, Id 32766
Thu Aug  3 19:15:37 2006 [102] FetchData_RXStyle: Pos 0, Len 524288
Thu Aug  3 19:15:37 2006 [102] FetchData_RXStyle: file size 1854506
Thu Aug 3 19:15:37 2006 [102] FetchData_RXStyle failed - rx_WritevAlloc returned <= 0
Thu Aug  3 19:15:37 2006 [102] VOffline: Volume 1970897351 (s.common.readonly) is now offline
Thu Aug  3 19:15:37 2006 [102]
Thu Aug  3 19:15:37 2006 [102] SRXAFS_FetchData returns 5
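
For anyone who wants to pull the same evidence out of a longer FileLog, here is a minimal Python sketch that reports, for each VOffline entry, the last non-empty message logged by the same thread. The regular expression only assumes the line shape seen in the excerpt above (timestamp, thread id in brackets, message); the function name is made up for illustration, and the "rx_WritevAlloc returned <= 0" message exists only because of the locally added debug code, so a stock fileserver would not print it.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "Thu Aug  3 19:15:37 2006 [102] message text"
LINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \[(\d+)\] ?(.*)$"
)

def last_message_before_offline(lines):
    """Return (thread_id, offline_message, previous_message) tuples.

    previous_message is the last non-empty message the same thread
    logged before the VOffline entry, or None if there was none.
    """
    last_by_thread = {}
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a log line in the assumed format
        _, thread_id, message = m.groups()
        if message.startswith("VOffline:"):
            hits.append((thread_id, message, last_by_thread.get(thread_id)))
        if message:
            last_by_thread[thread_id] = message
    return hits

# A few lines copied from the log excerpt above.
sample = [
    "Thu Aug  3 19:15:37 2006 [102] FetchData_RXStyle: Pos 0, Len 524288",
    "Thu Aug  3 19:15:37 2006 [102] FetchData_RXStyle: file size 1854506",
    "Thu Aug 3 19:15:37 2006 [102] FetchData_RXStyle failed - rx_WritevAlloc returned <= 0",
    "Thu Aug  3 19:15:37 2006 [102] VOffline: Volume 1970897351 (s.common.readonly) is now offline",
    "Thu Aug  3 19:15:37 2006 [102]",
    "Thu Aug  3 19:15:37 2006 [102] SRXAFS_FetchData returns 5",
]

hits = last_message_before_offline(sample)
for thread_id, offline, previous in hits:
    print(f"[{thread_id}] {offline}\n    preceded by: {previous}")
```

To run it against a real log, pass an open file object instead of `sample`, since the function accepts any iterable of lines.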

Kevin

On Thu, 3 Aug 2006, Kevin Hildebrand wrote:


Hello, we've been having problems recently with one of our volumes having most or all of its RO replications go offline at approximately the same time. The RW volume has remained stable, so it's only the ROs that we're having problems with.

This volume is released on an hourly basis, and normally has 3 RO replications. What's been happening is that at some point in between replications, the volume is taken offline:

FileLog:
Thu Aug 3 12:46:42 2006 VAttachVolume: volume salvage flag is ON for /vicepc//V1970897351.vol; volume needs salvage

VolserLog:
Thu Aug 3 12:46:42 2006 VAttachVolume: volume salvage flag is ON for /vicepc/V1970897351.vol; volume needs salvage

There is no other relevant entry in the logs as to WHY the volume is being taken offline. I'll be adding some debug code to the fileserver shortly to see if I can nail down where this is occurring, if no one else has any leads.

Here's the volume info:

# /usr/afs/bin/volinfo -volumeid 1970897351
Inode 219522: Good magic 78a1b2c5 and version 1
Inode 219523: Good magic 99776655 and version 1
Inode 219524: Good magic 88664433 and version 1
Volume header for volume 1970897351 (s.common.readonly)
stamp.magic = 78a1b2c5, stamp.version = 1
inUse = 0, inService = 1, blessed = 1, needsSalvaged = 1, dontSalvage = 0
type = 1 (readonly), uniquifier = 1070251, needsCallback = 0, destroyMe = 0
id = 1970897351, parentId = 1970897350, cloneId = 1970897351, backupId = 1970897352, restoredFromId = 0
maxquota = 200000, minquota = 0, maxfiles = 0, filecount = 1022, diskused = 125611
creationDate = 1154622174 (2006/08/03.12:22:54), copyDate = 1154622174 (2006/08/03.12:22:54)
backupDate = 1154577821 (2006/08/03.00:03:41), expirationDate = 0 (1969/12/31.19:00:00)
accessDate = 0 (1969/12/31.19:00:00), updateDate = 1154622150 (2006/08/03.12:22:30)
owner = 0, accountNumber = 0
dayUse = 36575; week = (0, 0, 0, 0, 0, 0, 0), dayUseDate = 1154540160 (2006/08/02.13:36:00)
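
As a rough cross-check of the timing, the creationDate in the volume header can be compared with the timestamp of the "needs salvage" log entry. The Python sketch below assumes the server clock is on US Eastern daylight time (UTC-4), which matches the local times volinfo prints next to the raw values. It also assumes the log timestamp is a usable upper bound: it marks when VAttachVolume noticed the salvage flag, not necessarily the moment the volume went offline.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs on US Eastern daylight time (UTC-4).
EDT = timezone(timedelta(hours=-4))

# creationDate of the RO clone, from the volinfo output above.
creation = datetime.fromtimestamp(1154622174, tz=EDT)

# Timestamp of the "volume salvage flag is ON" entry in FileLog.
salvage_noticed = datetime(2006, 8, 3, 12, 46, 42, tzinfo=EDT)

gap = salvage_noticed - creation
print(creation.strftime("%Y/%m/%d.%H:%M:%S"))
print(f"salvage flag seen {gap} after the release")
```

Under those assumptions the flag was noticed a little under 24 minutes after the most recent release, well inside the hourly release cycle.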

Thanks,

Kevin Hildebrand
University of Maryland, College Park
Project Glue
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
