hi folks,

this is a kind of involved question, sorry.  i've already asked the
vendor and the centos forums and gotten no useful advice or
confirmation that this problem has ever been seen before by anyone
else.  the main thing i'm looking for is:

a) has anyone ever seen anything like this before? and
b) any suggestions for ameliorating?


i'm seeing some strange behavior with hot-swap drives and am looking
for help in further debugging or solving the issue. here is our setup:

centos 5.5 x86_64, areca 1680x hardware raid card, external enclosure
with sata drives.

here is the behavior that is baffling me:

on each sata drive, we create a single-drive volumeset and raidset via
the card firmware, using the entire drive. we set the channel/raid/lun
combo to a value that is unique across all of the drives we use. (that
information is stored in the raid card's metadata on the drive, so it
survives ejections, insertions, and reboots; if two drives end up with
the same channel/raid/lun, that is known to cause problems with these
raid cards, so we avoid it.) we label the volumeset with the date of
initialization via the raid card, then format the drive (/dev/sdf, for
instance) with ext3 and finally use it to make a backup. all of this
works fine.
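for reference, here is a minimal sketch of the per-drive workflow just
described (the device path and mount point are illustrative examples,
not our exact values; the volumeset labelling happens in the card
firmware via cli64 and isn't shown here):

```shell
# hypothetical sketch of the per-drive prep; /dev/sdf and the
# mount point are examples only
prepare_backup_disk() {
    dev="$1"     # block device the kernel assigned, e.g. /dev/sdf
    mnt="$2"     # where to mount it, e.g. /mnt/backup-20100715
    mkfs.ext3 "$dev" &&          # one ext3 filesystem on the whole drive
    mkdir -p "$mnt" &&
    mount -t ext3 "$dev" "$mnt"  # ready for the backup run
}
```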

when we unmount & eject the drives, it is sometimes (but not always)
the case that the next drive inserted into the enclosure shows a
"ghost" filesystem from the old drive. it is always the case that if
we eject the drive corresponding to sdf, only another drive that
becomes sdf will see the old filesystem, never sdg or sde. the areca
volumeset and raidset labels (examined via the areca utility 'cli64',
which reads the card firmware) have changed to reflect the new drive
(which has an old backup on it and is labelled as such).

however, in linux, the new drive will show up with part of the old
drive's filesystem, and the old drive's statistics (total size and %
of space consumed on the filesystem). so when i mount and browse the
replacement drive, i see the root directory of the old drive, and "df"
reports the same percentages as the old drive. it seems like the
filesystem metadata is cached or something, and the new scsi device
retains that cached information somehow. but no matter how many times
we eject and insert, or umount and remount the "new" sdf, we cannot
seem to get to the existing filesystem on the newly inserted drive.
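to probe the caching hypothesis, something like the following could
show whether the on-disk superblock and the kernel's cached view of
the device disagree (a debugging sketch assuming the stock dumpe2fs,
blockdev, and blkid tools; the device path is an example):

```shell
# debugging sketch: read the superblock straight off the device,
# flush the kernel's cached blocks for it, and probe again
check_ghost() {
    dev="$1"                      # e.g. /dev/sdf
    dumpe2fs -h "$dev" | grep -i 'uuid\|volume'   # superblock as linux reads it
    blockdev --flushbufs "$dev"   # drop this device's buffer-cache pages
    blkid -c /dev/null "$dev"     # reprobe, bypassing the blkid cache file
}
```

if the uuid reported after the flush matches the new drive while the
mounted filesystem still shows the old one, that would point at stale
in-kernel filesystem state rather than stale device data.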

to recap: areca's CLI64 utility shows the new (correct) disk info,
while linux (navigating in a shell or file browser to the mount point
of the mounted volume) shows the stale, old info. we effectively
cannot restore from the drives because their contents are hidden
behind the ghost of the more recently ejected drive. (i say "ghost"
because although the drive appears full, none of the old drive's files
and only a few of its directories are actually present on the
filesystem.)

the only way we've found to fix it is a reboot, after which the drive
shows up with everything correct (areca labels and filesystem). we
sadly cannot just unload and reload the drivers for the cards because
they are constantly in use by other filesystems. taking the erroneous
drive out and putting it back in, even in a different slot, does not
fix the issue--it still has the ghost image.
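for concreteness, a 2.6 kernel does expose per-device scsi hotplug
knobs that don't require unloading the driver; a sketch of what that
looks like is below (the device name and host number are placeholders
to be filled in from /proc/scsi/scsi, and i can't confirm this
actually clears the ghost in our case):

```shell
# sketch: drop a single scsi device and rescan its host, without
# unloading the areca driver; "sdf" and "0" are placeholders
forget_and_rescan() {
    dev="$1"      # kernel name without /dev/, e.g. sdf
    host="$2"     # scsi host number for the areca card, e.g. 0
    echo 1 > "/sys/block/$dev/device/delete"             # detach just this device
    echo '- - -' > "/sys/class/scsi_host/host$host/scan" # rescan all channels/ids/luns
}
```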

deleting and recreating the volumeset/raidset also fixes it--but at
the cost of losing the data on the drive, which is exactly the data
we'd like to use! the way the areca driver and the scsi subsystem
interact is not entirely clear to me, but on deleting the raidset, the
scsi device goes away, and on recreating it, a new scsi device appears
(per the kernel logs in the 'dmesg' output).

have you heard of this happening before? and is there a way to fix
this, or even to examine the problem in more detail? (nothing unexpected is
logged by the kernel). i've asked areca's tech support about this but
they say nobody else has reported similar problems and they have no
suggestions. and the fact that linux seems to see the old info while
the card's firmware sees the new info suggests to me that it may be
more of a general kernel issue than an issue with the areca
card--though i don't know if the driver is to blame.

thanks for your advice. please feel free to ask any other questions
and i will get you answers ASAP!
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
