Hi folks, this is a fairly involved question, sorry. I've already asked the vendor and the CentOS forums and gotten no useful advice, nor confirmation that anyone else has ever seen this problem. The main things I'm looking for are: a) has anyone seen anything like this before? and b) any suggestions for ameliorating it?

I'm seeing some strange behavior with hot-swap drives and am looking for help in further debugging or solving the issue.

Our setup: CentOS 5.5 x86_64, an Areca 1680x hardware RAID card, and an external enclosure with SATA drives.

Here is the behavior that is baffling me. On each SATA drive we create a single-drive volumeset and raidset via the card firmware, using up the entire drive. We set the channel/raid/LUN combo to a value that is unique across all of the drives we use. (That information is stored in the RAID card's "metadata" associated with the drive, so it stays the same across ejections, insertions, and reboots. It is possible to end up with more than one drive sharing the same channel/raid/LUN, which causes known problems with these cards, so we avoid it.) We label the drive with the date of initialization via the RAID card, format it (/dev/sdf, for instance) with ext3, and finally use it to make a backup. All of this works fine.

When we unmount and eject a drive, it is sometimes, but not always, the case that a different drive inserted into the enclosure next shows a "ghost" filesystem from the old drive. It is always the case that if we eject the drive corresponding to sdf, only another drive that becomes sdf will see the old filesystem, not sdg or sde. The Areca volumeset and raidset labels (examined via the Areca utility 'cli64', which queries the card firmware) change to reflect the new drive (which has an old backup on it and is labelled as such). In Linux, however, the new drive shows up with part of the old drive's filesystem and the old drive's statistics (total size and percentage of space consumed). So when I mount and browse the replacement drive, I see the root directory of the old drive, and "df" reports the same percentages as the old drive.
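For reference, the Linux side of our per-drive workflow is essentially the following. (The device name /dev/sdf and mount point /mnt/backup are placeholders, not our exact paths; the volumeset/raidset creation and labelling happen in the card firmware and aren't shown here.)

```shell
# Assumes the card firmware has already exported the drive as /dev/sdf.
# Device name and mount point are examples only.

mkfs.ext3 /dev/sdf            # format the whole exported device with ext3
mkdir -p /mnt/backup
mount /dev/sdf /mnt/backup    # mount it for the backup run

df -h /mnt/backup             # sanity-check size and usage
# ... write the backup onto /mnt/backup ...

umount /mnt/backup            # unmount before ejecting the drive
```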
It seems as though the filesystem metadata is cached somewhere, and the new SCSI device retains that cached information. But no matter how many times we eject and reinsert, or umount and remount, the "new" sdf never shows the filesystem that actually exists on the newly inserted drive.

To recap: Areca's cli64 utility shows the new (correct) disk info, while Linux (navigating in a shell or file browser to the mount point of the mounted volume) shows the old, wrong info. We effectively cannot use the drives to restore from, because their contents are hidden behind the ghost of the more recent backup. (I say "ghost" because although the drive appears full, none of the old files, and only a few of the directories, are actually on the filesystem.)

The only fix we've found is a reboot, after which the drive shows up with everything correct (Areca labels and filesystem both). We sadly cannot just unload and reload the card's driver, because it is constantly in use by other filesystems. Taking the erroneous drive out and putting it back in, even in a different slot, does not fix the issue; it still shows the ghost image. Deleting and recreating the volumeset/raidset also fixes it, but at the cost of losing the data that was on the drive, which we'd like to use! How the Areca driver and the SCSI subsystem interact is not entirely clear to me, but upon deleting the raidset the SCSI device goes away, and upon recreating it a new SCSI device appears (per the kernel logs in 'dmesg' output).

Have you heard of this happening before? Is there a way to fix it, or even to examine the problem in more detail? (Nothing unexpected is logged by the kernel.) I've asked Areca's tech support about this, but they say nobody else has reported similar problems and they have no suggestions.
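For anyone who wants to dig in, here is a sketch of the kind of diagnostics that seem relevant to the cached-metadata theory: comparing what the on-disk superblock says against what the kernel reports, flushing the block-layer cache, and forcing a SCSI-level rescan. The device name /dev/sdf, host0, and the sysfs paths are placeholders for our setup, and I have not confirmed that the rescan path behaves correctly with this card's driver; this is just the standard sysfs mechanism.

```shell
# All device names (/dev/sdf, host0) are placeholders.

# Read the ext3 superblock straight off the device: if the UUID/label here
# differs from what the mounted filesystem reports, the kernel is serving
# stale cached metadata rather than what is actually on the disk.
dumpe2fs -h /dev/sdf | grep -i 'uuid\|volume name'

# Flush the kernel's buffer cache for the device, then re-read;
# this may be enough for the kernel to see the real on-disk contents.
blockdev --flushbufs /dev/sdf

# More drastic: delete the SCSI device node entirely, then rescan the
# host so the drive is rediscovered from scratch, without a reboot or
# a driver reload.
echo 1 > /sys/block/sdf/device/delete
echo '- - -' > /sys/class/scsi_host/host0/scan
```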
The fact that Linux sees the old info while the card's firmware sees the new info suggests to me that this may be more of a general kernel issue than a problem with the Areca card, though I don't know whether the driver is to blame. Thanks for your advice. Please feel free to ask any other questions and I will get you answers ASAP!

_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
