Hello,

I just found this list and am very excited that you all are here! I have a 
homemade ZFS server that serves as our poor man's Thumper (we named it 
thumpthis) and provides primarily NFS shares for our VMware environment. As is 
often the case, the server has developed a hardware problem mere days before I 
am ready to go live with a new replacement server (thumpthat). At first the 
problem appeared to be a bad drive, but now I am not so sure. I would like to 
sanity check my thought process with this list and see if anybody has some 
different ideas. Here is a quick timeline of the trouble:

1. I noticed the following when running a routine zpool status:

<snip>
          mirror    DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  REMOVED      0  368K     0
</snip>

2. I determined which drive appeared to be offline by watching drive lights and 
then rebooted the server.

3. Initially the drive appeared to be fine, and ZFS picked it back up and 
resilvered the mirror. About 30 minutes later I noticed that the same drive was 
again marked REMOVED.

4. I shut the server down and replaced the drive with a new, larger disk.

5. I ran zpool replace tank c3t3d0 and it happily went to work on the 
replacement drive. A few hours later the resilver was complete and all seemed 
well.

6. The next day, about 12 hours after installing the new drive, I found the 
same error again (here's the whole pool):

config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
          mirror    DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  REMOVED      0  370K     0
          mirror    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0

errors: No known data errors
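
Since the drive drops out again some time after each resilver, one way to pin 
down when it happens is to poll zpool status and log every device that is not 
ONLINE or that shows nonzero error counters. Below is a minimal sketch in 
Python that parses status text like the output above. The function names 
(parse_count, suspect_devices) are hypothetical and not part of any ZFS 
tooling, and the suffix multipliers are an approximation that only matters for 
telling zero from nonzero.

```python
import re

# A device row of `zpool status`: NAME, STATE, then READ/WRITE/CKSUM counters.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")

# Approximate multipliers for abbreviated counters such as "370K" (assumption).
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_count(text):
    """Convert a counter such as '0' or '370K' to an (approximate) integer."""
    if text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def suspect_devices(status_text):
    """Return (name, state, [read, write, cksum]) for every row that is not
    ONLINE or that has at least one nonzero error counter."""
    suspects = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match or match.group(1) == "NAME":
            continue  # not a device row, or the column header
        name, state = match.group(1), match.group(2)
        try:
            counts = [parse_count(c) for c in match.groups()[2:]]
        except ValueError:
            continue  # five-column line that is not a counter row
        if state != "ONLINE" or any(counts):
            suspects.append((name, state, counts))
    return suspects


if __name__ == "__main__":
    sample = """
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  REMOVED      0  370K     0
    """
    for name, state, counts in suspect_devices(sample):
        print(name, state, counts)
```

Run from cron with the real zpool status output piped in and a timestamp 
added, this would show whether the drop coincides with anything else on the 
box, and whether it follows the port after the controller swap.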

This is where I am now. Either my new hard drive is bad (not impossible) or I 
am looking at some other hardware failure, possibly the AOC-SAT2-MV8 controller 
card. I have a spare controller card (same make and model, purchased at the 
same time we built the server) and plan to swap it in tonight. Does that seem 
like the correct course of action? Are there any steps I can take beforehand to 
zero in on the problem? Any words of encouragement or wisdom?

Regards,
Chris Dunbar
Earthside, LLC
