Hello Everyone,

We recently installed OpenAFS 1.6.2 on one of our fileservers in preparation for migrating the rest of our cell to the latest 1.6.x release. One of the driving factors behind upgrading to 1.6.x is to support volumes larger than 2TB.

Currently, the rest of the servers in our cell are running a mixture of 1.4.x releases. The database servers are all running 1.4.5.

Like most other sites, we dump our volumes daily to disk using 'vos dump' so that they can be backed up by our enterprise backup system. While dumping the volumes on the fileserver running 1.6.2, we noticed that the volume status reported while a dump is in progress differs from what 1.4.x reports.
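For reference, our nightly job boils down to something like the loop below (volume names and the dump directory are illustrative, not our real configuration; '-time 0' requests a full dump):

```shell
# Sketch of the daily dump-to-disk loop. The command is echoed as a
# dry run; drop the echo-and-variable and call vos directly to execute.
DUMPDIR=/backup/afs-dumps
for vol in my.volume.6 my.volume.7 my.volume.8; do
    cmd="vos dump -id $vol -time 0 -file $DUMPDIR/$vol.dump -localauth"
    echo "$cmd"
done
```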

When a 'vos dump' is performed on a volume that lives on a 1.4.x fileserver, a 'vos ex' and 'vos listvol' have the following behavior:

root@fileserver02:# vos ex 537142259
        **** Volume 537142259 is busy ****

                RWrite: 537142257     Backup: 537142259
                number of sites -> 1
                   server fileserver02 partition /vicepb RW Site


root@fileserver02:# vos listvol localhost b -local

        Total number of volumes on server localhost partition /vicepb: 6
        my.volume.6                       537142257 RW   21191001 K On-line
        my.volume.7                       536995501 RW    2362268 K On-line
        my.volume.7.backup                536995532 BK    2362268 K On-line
        my.volume.8                       537089944 RW     268280 K On-line
        my.volume.8.backup                537089946 BK     268280 K On-line
        **** Volume 537142259 is busy ****


However, on a fileserver running 1.6.2, running 'vos ex' against the volume being dumped reports that the volume does not exist. Furthermore, a 'vos listvol' on the partition shows: '**** Could not attach volume 537466433 ****'.

root@fileserver05:# vos ex 537466433

        Could not fetch the information about volume 537466433 from the server
        : No such device
        Volume does not exist on server fileserver05 as indicated by the VLDB

        Dump only information from VLDB

        test.volume.5
            RWrite: 537466431     Backup: 537466433
             number of sites -> 1
               server fileserver05 partition /vicepa RW Site

root@fileserver05:# vos listvol localhost -local

        Total number of volumes on server localhost partition /vicepa: 6
        test.volume.3                     537465393 RW          4 K On-line
        test.volume.3.backup              537465395 BK          4 K On-line
        test.volume.4                     537465396 RW 1539693624 K On-line
        test.volume.4.backup              537465398 BK 1539693624 K On-line
        test.volume.5                     537466431 RW   99958788 K On-line
        **** Could not attach volume 537466433 ****


Was this change in behavior from 1.4.x to 1.6.x intentional, or are we encountering a bug? Could it be caused by our database servers still running 1.4.5?

We have scripts that periodically run 'vos listvol' across all our fileservers, looking for volumes that could not be attached or are offline; this is one of the ways we monitor the availability of our volumes. With the new behavior in 1.6.x, however, there is no easy way at first glance to distinguish an actual problem with a volume from a dump that is simply in progress.
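For reference, the check in our scripts amounts to something like the sketch below (the sample text stands in for a real 'vos listvol <server> <partition> -localauth' run; volume names are illustrative):

```shell
# Minimal sketch of our availability check. Under 1.4.x a dump in
# progress reports "is busy", which we can ignore; under 1.6.2 it
# reports "Could not attach", indistinguishable from a real failure.
listvol_output='Total number of volumes on server localhost partition /vicepa: 6
test.volume.3                     537465393 RW          4 K On-line
test.volume.5                     537466431 RW   99958788 K On-line
**** Could not attach volume 537466433 ****'

alerts=$(echo "$listvol_output" | grep -E 'Could not attach|Off-line')
echo "$alerts"
```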


Thanks.




_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
