[zfs-discuss] fault.fs.zfs.vdev.io

2009-10-21 Thread Matthew C Aycock
I have several of these messages from fmdump:

 fmdump -v -u 98abae95-8053-4cdc-d91a-dad89b125db4~
TIME UUID SUNW-MSG-ID
Sep 18 00:45:23.7621 98abae95-8053-4cdc-d91a-dad89b125db4 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io

Problem in: zfs://pool=mzfs/vdev=a414878cf09644a
   Affects: zfs://pool=mzfs/vdev=a414878cf09644a
   FRU: -
  Location: -

Oct 21 10:34:41.8014 98abae95-8053-4cdc-d91a-dad89b125db4 FMD-8000-4M Repaired
  100%  fault.fs.zfs.vdev.io

Problem in: zfs://pool=mzfs/vdev=a414878cf09644a
   Affects: zfs://pool=mzfs/vdev=a414878cf09644a
   FRU: -
  Location: -

I am trying to determine which of the four vdevs is involved. How do I translate vdev=a414878cf09644a to a cWtXdYsZ device name?
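
One way to map the vdev GUID in the fault back to a device (a sketch, not the only way): convert the hex GUID to decimal, then look for the matching guid/path pair in the cached pool configuration that zdb prints. zdb -l against an individual /dev/rdsk device also prints that device's label, including its guid, if you would rather check disks one at a time.

 # echo 'ibase=16; A414878CF09644A' | bc     (the hex GUID from the fault, as a decimal number)
 # zdb -C mzfs | egrep 'guid|path'           (find the child vdev whose guid matches)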


[zfs-discuss] ZFS Mirrors braindead?

2008-10-07 Thread Matthew C Aycock
I recently ran into a problem for the second time with ZFS mirrors. I mirror 
between two different physical arrays for some of my data. One array (an SE3511) 
had a catastrophic failure and became unresponsive. The ZFS in s10u3 just waits 
for the array to come back and hangs pretty much all I/O to the zpool. I was 
told by Sun service that there are enhancements in the upcoming S10 10/08 
release that will help.

My understanding of the code being delivered with S10 10/08 is that on 2-way 
mirrors (which is what I use), if this same situation occurs again, ZFS will 
allow reads to happen but writes will still be queued until the other half of 
the mirror comes back.
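
For reference, if the enhancement in question is the pool-level failmode property that arrives around that release, it only governs what happens once the pool as a whole loses access to its devices, not when a single mirror half drops out. A minimal sketch of inspecting and changing it, assuming a hypothetical pool named tank:

 # zpool get failmode tank
 # zpool set failmode=continue tank      (continue returns EIO to new writes but keeps reads going)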

Is it just me, or have we gone backwards? The whole point of mirroring is that 
if half the mirror goes away, we survive and can fix the problem with little to 
NO impact on the running system. Is this really true? With ZFS root also 
becoming available in S10 10/08, I would not want ZFS anywhere near my root 
filesystem if this is really the behavior.

Any information would be GREATLY appreciated!

BlueUmp


[zfs-discuss] ZFS Mirror Problem

2008-06-16 Thread Matthew C Aycock
Well, I have a zpool that contains four vdevs. Each vdev is a mirror of a T3B 
LUN and a corresponding LUN from an SE3511 brick. I did this since I was new to 
ZFS and wanted to ensure that my data would survive an array failure. It turns 
out that I was smart for doing this :)

I had a hardware failure on the SE3511 that caused the complete RAID-5 LUN on 
the SE3511 to die. (The first glance showed 6 drives failed :( ) However, I 
would have expected ZFS to detect the failed mirror halves and offline them, as 
ODS and VxVM would. To my shock, it basically hung the server. I eventually had 
to unmap the SE3511 LUNs and replace them with space I had available from 
another brick in the SE3511. I then did a zpool replace and ZFS resilvered the 
data.
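
For anyone hitting this later, the replace step looks roughly like the following; the pool and device names here are made up, not the ones from this system:

 # zpool replace mypool c2t1d0 c3t1d0    (failed SE3511 LUN, then its replacement)
 # zpool status mypool                   (watch the resilver progress)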

So, why did ZFS hang my server?

This is on Solaris 10 11/06, kernel patch 127111-05, and ZFS version 4.
 
 


[zfs-discuss] Odd behavior of NFS of ZFS versus UFS

2007-12-21 Thread Matthew C Aycock
I have a test cluster running HA-NFS that shares both UFS- and ZFS-based file 
systems. However, the behavior that I am seeing is a little perplexing.

The Setup: I have Sun Cluster 3.2 on a pair of SunBlade 1000s connecting to 
two T3B partner groups through a QLogic switch. All four bricks of the T3B are 
configured as RAID-5 with a hot spare. One brick from each pair is mirrored 
with VxVM 4.1, with a UFS file system on top of the mirror. I have mirrored the 
other two bricks via a zpool. I have configured an HAStoragePlus resource for 
the datadg VxVM disk group and another one for the hazfs zpool. Both are part 
of my single nfs-rg. All machines are connected via 100Mb/s switches.

I have a small test program that was created to detect a particular "problem" 
we were having. It's very simple, and I include the C code at the end. What it 
does is time the creation of a file, an 8k synchronous write, and the close of 
the file. If the elapsed time is greater than 1 second, it prints it. Very 
simple.

The Test: I have two identical SunBlade 2500s that each mount a file system, 
run a loop of iozone 500 followed by a 10-second sleep, and run nf (my test 
program) against the mounted file system. One does this on the ZFS-based file 
system and the other on the UFS-based one.

The Results: On the UFS-based filesystem, nf reports ZERO output; it never took 
more than a second to do the test. On the ZFS-based mount point I see multiple 
delays ranging from 2 to 6 seconds. So I reversed the roles of the machines and 
ran the test again, with virtually the same results.

The $1000 Question: Why would this happen?

The Code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>


int main(void) {

    char nbuff[32];
    char data[8192];
    int fd;
    time_t start, finish;
    char date[256];

    while (1) {
        start = time(0);
        /* create a uniquely named file, do one 8k synchronous write, close and remove it */
        sprintf(nbuff, "TEMP%d", rand());
        fd = open(nbuff, O_RDWR | O_CREAT | O_SYNC, 0777);
        write(fd, data, sizeof(data));
        close(fd);
        unlink(nbuff);
        finish = time(0);
        /* report any iteration that took longer than one second */
        if ((finish - start) > 1) {
            cftime(date, "%c", &start);
            fprintf(stderr, "%s elapsed=%ld\n", date, (long)(finish - start));
        }
        sleep(1);
    }
}
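
In case it saves someone a minute, the program needs nothing special to build (nf is just the name used in the test description above):

 # cc -o nf nf.c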
 
 


[zfs-discuss] HA-NFS AND HA-ZFS

2007-12-17 Thread Matthew C Aycock
We are currently running Sun Cluster 3.2 on Solaris 10u3, with UFS on VxVM 4.1 
for our shared file systems. However, I would like to migrate to HA-NFS on ZFS. 
Since there is no conversion path from UFS to ZFS other than copying, I would 
like to migrate on my own schedule. To do this I am planning to add a new zpool 
HAStoragePlus resource to my existing HA-NFS resource group. This way I can 
migrate data from my existing UFS to ZFS at my own pace and the clients will 
not know the difference.

I made sure that the zpool was available on both nodes of the cluster. I then 
created a new HAStoragePlus resource for the zpool. I updated my NFS resource 
to depend on both HAStoragePlus resources. I added the two test file systems to 
the current dfstab.nfs-rs file. I manually ran the shares and was able to mount 
the new ZFS file system. However, once the monitor ran, it re-shared (I guess), 
and now the ZFS-based filesystems are not available.

I read that you are not supposed to add ZFS-based file systems to the 
FileSystemMountPoints property. Any ideas?
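
For reference, a zpool is handed to HAStoragePlus through the Zpools property rather than FileSystemMountPoints. A minimal sketch, assuming a pool named hazfs and the nfs-rg/nfs-rs names above (the resource name hazfs-hasp-rs is made up):

 # clresource create -g nfs-rg -t SUNW.HAStoragePlus -p Zpools=hazfs hazfs-hasp-rs

The NFS resource then needs a dependency on that new HAStoragePlus resource, and the usual guidance is to leave sharenfs off on the datasets and keep the share entries in dfstab.nfs-rs so the HA-NFS monitor keeps them shared.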
 
 


[zfs-discuss] S10u4 in kernel sharetab

2007-10-24 Thread Matthew C Aycock
There was a lot of talk about ZFS and NFS shares being a problem when there is 
a large number of filesystems. There was a fix that in part included an 
in-kernel sharetab (I think :). Does anyone know if this has made it into S10u4?

Thanks,

BlueUmp
 
 


[zfs-discuss] ZFS Disk replacement/upgrade

2007-08-02 Thread Matthew C Aycock
I am playing with ZFS on a JetStor 516F with nine 1TB E-SATA drives. These are 
our first real tests with ZFS, and I am working out how to replace our HA-NFS 
UFS file systems with ZFS counterparts. One of the things I am concerned with 
is how to replace a disk array/vdev in a pool. It appears that is not possible 
at the moment.

For example, I have this array in which I want to replace the drives with 
bigger ones. I currently have 3 raidz vdevs and I am using about two thirds of 
the total space. So, to keep ahead of the curve, I want to replace the 1TB 
drives with 1.5TB drives.
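
For what it is worth, while a whole top-level vdev cannot be removed from a pool, the disks inside a raidz vdev can be swapped one at a time for larger ones; once the last member has resilvered, the extra capacity becomes usable (on releases of this era, typically after an export/import). A rough sketch with made-up pool and device names:

 # zpool replace tank c2t0d0 c2t9d0          (swap one 1TB member for a 1.5TB disk)
 # zpool status tank                         (wait for the resilver, then repeat for each disk)
 # zpool export tank && zpool import tank    (pick up the larger vdev size)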

Another example would be a pool with some older T3Bs and a newer SE3511. I want 
to remove the T3Bs from the pool and replace them with an expansion tray on the 
SE3511.

Any idea when I might be able to do this?

Matt
 
 


[zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS

2006-12-12 Thread Matthew C Aycock
We are currently working on a plan to upgrade our HA-NFS cluster, which uses 
HAStoragePlus and VxVM 3.2 on Solaris 9, to Solaris 10 and ZFS. Is there a 
known procedure or best practice for this? I have enough free disk space to 
recreate all the filesystems and copy the data if necessary, but would like to 
avoid copying if possible.

Also, I am considering what type of zpools to create. I have a SAN with T3Bs 
and SE3511s. Since neither of these can work as a JBOD (at least that is what I 
remember), I guess I am going to have to add the RAID-5 LUNs to the pool as 
mirrored pairs?
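
That layout would look roughly like the following; the device names are hypothetical, with one T3B LUN and one SE3511 LUN in each mirror pair:

 # zpool create tank mirror c4t0d0 c5t0d0 mirror c4t1d0 c5t1d0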

We are at the very start of this project and I was hoping for some guidance as 
to which direction to take.
 
 


[zfs-discuss] Re: [raidz] file not removed: No space left on device

2006-07-05 Thread Matthew C Aycock
Eric,

To ask the obvious but crucial question :) What is the best way to truncate the 
file on ZFS?
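
For what it is worth (and this may not be what Eric had in mind), the usual workaround on a full pool is to truncate the file in place from the shell, which frees its blocks without first needing room for an unlink transaction, and then remove it; the path below is just an example:

 # cat /dev/null > /tank/fs/bigfile
 # rm /tank/fs/bigfile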
 
 


[zfs-discuss] Re: nevada_41 and zfs disk partition

2006-06-20 Thread Matthew C Aycock
> 
> What does vmstat look like ?
> Also zpool iostat 1.
> 

              capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
tank 291M  9.65G  0 11   110K   694K
tank 301M  9.64G  0 32  0  87.9K
tank 301M  9.64G  0  0  0  0
tank 301M  9.64G 31  0  3.96M  0
tank 301M  9.64G  0 88  0  4.91M
tank 311M  9.63G 16 77  2.05M  2.64M
tank 311M  9.63G 31  0  3.88M  0
tank 311M  9.63G  0  0  0  0
tank 311M  9.63G 31 62  3.96M  3.88M
tank 321M  9.62G 15    101  1.90M  3.08M
tank 321M  9.62G  0  0  0  0
tank 321M  9.62G 31  0  3.96M  0
tank 321M  9.62G  0 88  0  4.47M

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr dd s1 -- --   in   sy   cs us sy id
 0 0 0 8395576 67320  0  69 224 0  0  0  0 104 0  0  0  578 3463 2210 16 17 67
 13 0 0 8395456 67192 1 109 16  0  0  0  0 70  0  0  0  466 1176 1055  7 73 20
 0 0 0 8395416 67112  0  21 16  0  0  0  0  2  0  0  0  327  809  452  2  2 96
 0 0 0 8395416 67112  0   3  0  0  0  0  0  0  0  0  0  370 1947  818  6  4 90
 0 0 0 8395416 67112  0   2  0  0  0  0  0  0  0  0  0  306 1358  672  8  3 89
 0 0 0 8395416 67112  0   4  0  0  0  0  0  0  0  0  0  338  822  409  1  1 98
 1 0 0 8395416 67112  0  10  0  0  0  0  0  0  0  0  0  320 3152 1415 20  8 72
 0 0 0 8396568 68200  0  16  0  0  0  0  0 12  0  0  0  381 1273  633  5  5 90
 0 0 0 8396568 68200  0   6  8  0  0  0  0  1  0  0  0  320 1613  620  4  3 93
 0 0 0 8396568 68192  0   0  0  0  0  0  0  0  0  0  0  352 1198  595  5  2 93
 0 0 0 8396568 68192  0   1  0  0  0  0  0  0  0  0  0  292  843  413  2  2 96
 0 0 0 8396568 68192  0   0  0  0  0  0  0  0  0  0  0  343  818  405  1  1 98
 0 0 0 8396568 68192  0   0  0  0  0  0  0  0  0  0  0  308  803  412  1  1 98
 0 0 0 8396568 68192  0   0  0  0  0  0  0  0  0  0  0  345 1236  471  2  3 95
 0 0 0 8396568 68192  0   0  0  0  0  0  0  0  0  0  0  296 1570  709  6  2 92
 0 0 0 8396568 68192 13 142  0  0  0  0  0  0  0  0  0  380 3134 1182 14  6 80
 0 0 0 8396568 68192  0   4  8  0  0  0  0  1  0  0  0  301 1034  536  5  4 91
 0 0 0 8396568 68184  0   0  0  0  0  0  0  0  0  0  0  343  811  417  1  2 97
 0 0 0 8396568 68184  0   0  0  0  0  0  0  0  0  0  0  310 1220  452  1  2 97
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr dd s1 -- --   in   sy   cs us sy id
 0 0 0 8396568 68176  0   0  0  0  0  0  0  1  0  0  0  373 1715  651  4  2 94
 0 0 0 8396568 68176  0   0  0  0  0  0  0  0  0  0  0  336 1739  647  3  2 95
 0 0 0 8396160 67272 51 334 565 0  0  0  0 60  0  0  0  558 4029 1651 10 14 76
 0 0 0 8396776 68184  3  99  0  0  0  0  0  0  0  0  0  357 1204  577  4  3 93
 0 0 0 8396776 68184  0   8  8  0  0  0  0  1  0  0  0  356 3497 1353 16  7 77
 0 0 0 8396776 68176  0   0  0  0  0  0  0  0  0  0  0  311 1128  477  2  1 97
 0 0 0 8396776 68176  0   6  0  0  0  0  0  0  0  0  0  357 1259  518  3  2 95
 0 0 0 8396776 68176  0   1  0  0  0  0  0  0  0  0  0  312 1166  495  2  1 97
 0 0 0 8396776 68176  0  50 71  0  0  0  0  9  0  0  0  366 1207  540 25  3 72

> Do you have any disk based swap ?
> 
Yes, there is an 8GB swap partition on the system and 2GB of RAM.

> One best practice we probably will be coming out with is to
> configure at least physmem of swap with ZFS (at least as of
> this release).
>
> The partly hung system could be this :
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
>   6429205 each zpool needs to monitor its throughput and throttle heavy writers
>
> The fix state is "in-progress".
> 
I will look at this.

> What throughput do you get for the full untar
> (untarred size / elapsed time)?
# tar xf thunderbird-1.5.0.4-source.tar  2.77s user 35.36s system 33% cpu 1:54.19

260MB / 114s =~ 2.28 MB/s on this IDE disk
 
 


[zfs-discuss] nevada_41 and zfs disk partition

2006-06-20 Thread Matthew C Aycock
I just installed build 41 of Nevada on a SunBlade 1500 with 2GB of RAM. I 
wanted to check out ZFS; with the delay of S10U2, I really could not wait any 
longer :)

I installed it on my system and created a zpool out of an approximately 40GB 
disk slice. I then wanted to build a version of Thunderbird that contains a 
local patch that we like, so I downloaded the source tarball. When I try to 
untar it on the ZFS filesystem, the machine is brought to its knees. At times 
it appears that the system has hung. A Sol10 version of top shows that most of 
the CPU time is in the kernel (not surprising).

The steps I used to create the pool/fs are basically the following:

# zpool create space /dev/dsk/c0t0d0s7
# zfs create space/src
# cd /space/src/
# gtar xzf thunderbird.tar.gz

Any ideas on how I can do a little debugging of this? Has anyone else seen this 
behavior?
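
One low-effort starting point, using the pool name from the steps above, is to watch pool and system activity while the untar runs and see where the time is going:

 # zpool iostat space 1
 # vmstat 5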
 
 