[zfs-discuss] fault.fs.zfs.vdev.io
I have several of these messages from fmdump:

# fmdump -v -u 98abae95-8053-4cdc-d91a-dad89b125db4
TIME                 UUID                                 SUNW-MSG-ID
Sep 18 00:45:23.7621 98abae95-8053-4cdc-d91a-dad89b125db4 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=mzfs/vdev=a414878cf09644a
           Affects: zfs://pool=mzfs/vdev=a414878cf09644a
               FRU: -
          Location: -
Oct 21 10:34:41.8014 98abae95-8053-4cdc-d91a-dad89b125db4 FMD-8000-4M Repaired
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=mzfs/vdev=a414878cf09644a
           Affects: zfs://pool=mzfs/vdev=a414878cf09644a
               FRU: -
          Location: -

I am trying to determine which of the four vdevs is involved. How do I translate vdev=a414878cf09644a to a cWtXdYsZ device name?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
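If I understand the FMA zfs scheme correctly, the vdev= string is the vdev GUID printed in hexadecimal, while zdb prints GUIDs in decimal, so the translation is a base conversion plus a search of the pool config. A sketch (the exact zdb output format is from memory and may need adjusting for your release):

```shell
# Convert the hex vdev id from the fault report to the decimal GUID
guid=$(printf '%d' 0xa414878cf09644a)
echo "$guid"    # decimal vdev GUID

# Then, as root on the affected host, search the cached pool
# configuration for that GUID to find the matching device path:
#   zdb -C mzfs | grep -A 2 "$guid"
```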
[zfs-discuss] ZFS Mirrors braindead?
I recently ran into a problem, for the second time, with ZFS mirrors. I mirror some of my data between two different physical arrays. One array (an SE3511) had a catastrophic failure and became unresponsive. ZFS in S10U3 just waits for the array to come back, hanging pretty much all I/O to the zpool. Sun service told me there are enhancements in the upcoming S10 10/08 release that will help. My understanding of the code being delivered in S10 10/08 is that on a two-way mirror (which is what I use), if this same situation occurs again, ZFS will allow reads, but writes will still be queued until the other half of the mirror comes back. Is it just me, or have we gone backwards? The whole point of mirroring is that if half the mirror dies, we survive and can fix the problem with little to NO impact on the running system. Is this really true? With ZFS root also becoming available in S10 10/08, I would not want it anywhere near my root filesystem if this is really the behavior. Any information would be GREATLY appreciated!

BlueUmp
[zfs-discuss] ZFS Mirror Problem
Well, I have a zpool that contains four vdevs. Each vdev is a mirror of a T3B LUN and a corresponding LUN from an SE3511 brick. I set it up this way because I was new to ZFS and wanted to ensure that my data would survive an array failure. It turns out I was smart for doing this :) I had a hardware failure on the SE3511 that caused the complete RAID-5 LUN on the SE3511 to die. (First glance showed 6 drives failed :( ) However, I would have expected ZFS to detect the failed mirror halves and offline them, as ODS and VxVM would. To my shock, it basically hung the server. I eventually had to unmap the SE3511 LUNs and replace them with space I had available from another brick in the SE3511. I then did a zpool replace and ZFS resilvered the data. So, why did ZFS hang my server? This is on Solaris 10 11/06, kernel patch 127111-05, and ZFS version 4.
[zfs-discuss] Odd behavior of NFS of ZFS versus UFS
I have a test cluster running HA-NFS that shares both UFS- and ZFS-based file systems. However, the behavior I am seeing is a little perplexing.

The Setup: I have Sun Cluster 3.2 on a pair of SunBlade 1000s connecting to two T3B partner groups through a QLogic switch. All four bricks of the T3Bs are configured as RAID-5 with a hot spare. One brick from each pair is mirrored with VxVM 4.1, with a UFS file system on top of the mirror. I have mirrored the other two bricks via a zpool. I have configured an HAStoragePlus resource for the datadg VxVM disk group and another one for the hazfs zpool. Both are part of my single nfs-rg. All machines are connected via 100Mb switches.

I have a small test program that was created to detect a particular "problem" we were having. It's very simple, and I will include the C code at the end. What it does is time the creation of a file, an 8k synchronous write, and the close of the file. If the time is greater than 1 second, it prints the elapsed time. Very simple.

The Test: I have two identical SunBlade 2500s that each mount a file system, then loop: run iozone 500, sleep 10 seconds, and run nf (my test program) on the mounted file system. One does this on the ZFS-based file system and the other on the UFS-based one.

The Results: On the UFS-based filesystem, nf reports ZERO output; it never took more than a second to do the test. On the ZFS-based mount point I see multiple delays ranging from 2 to 6 seconds. So I reversed the roles of the machines and ran the test again, with virtually the same results.

The $1000 Question: Why would this happen?
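For a quick check, one pass of the same test can be approximated from the shell. This sketch assumes GNU dd (for the oflag=sync option, which Solaris dd lacks; that is part of why nf is a C program):

```shell
# One pass of the nf check: time a single synchronous 8k write,
# and report only if it took more than a second.
start=$(date +%s)
dd if=/dev/zero of=TEMPFILE bs=8k count=1 oflag=sync 2>/dev/null
rm -f TEMPFILE
end=$(date +%s)
elapsed=$((end - start))
if [ "$elapsed" -gt 1 ]; then
    echo "elapsed=$elapsed"
fi
```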
The Code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(void)
{
        char nbuff[32];
        char data[8192];
        int fd;
        time_t start, finish;
        char date[256];

        while (1) {
                start = time(0);
                sprintf(nbuff, "TEMP%d", rand());
                /* create, synchronously write 8k, close, remove */
                fd = open(nbuff, O_RDWR | O_CREAT | O_SYNC, 0777);
                write(fd, data, sizeof (data));
                close(fd);
                unlink(nbuff);
                finish = time(0);
                if ((finish - start) > 1) {
                        cftime(date, "%c", &start);
                        fprintf(stderr, "%s elapsed=%d\n",
                            date, (int)(finish - start));
                }
                sleep(1);
        }
}
[zfs-discuss] HA-NFS AND HA-ZFS
We are currently running Sun Cluster 3.2 on Solaris 10u3, using UFS on VxVM 4.1 for our shared file systems. However, I would like to migrate to HA-NFS on ZFS. Since there is no conversion process from UFS to ZFS other than copying, I would like to migrate on my own schedule. To do this I am planning to add a new zpool HAStoragePlus resource to my existing HA-NFS resource group; that way I can migrate data from the existing UFS to ZFS over time and the clients will not know the difference. I made sure the zpool was available on both nodes of the cluster. I then created a new HAStoragePlus resource for the zpool and updated my NFS resource to depend on both HAStoragePlus resources. I added the two test file systems to the current dfstab.nfs-rs file. I manually ran the shares and was able to mount the new ZFS file system. However, once the fault monitor ran, it re-shared (I guess), and now the ZFS-based filesystems are no longer available. I read that you are not supposed to add ZFS-based file systems to the FileSystemMountPoints property. Any ideas?
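For reference, the relevant pieces look roughly like this; the mount points below are illustrative, not my actual paths, and my understanding (from the HA-NFS docs) is that the ZFS datasets should keep sharenfs=off so that only the cluster agent manages the shares:

```
# dfstab.nfs-rs -- one share line per exported file system,
# UFS and ZFS alike
share -F nfs -o rw /global/nfs/data    # existing UFS/VxVM file system
share -F nfs -o rw /hazfs/test1        # new ZFS file systems
share -F nfs -o rw /hazfs/test2

# and on the zpool side:
#   zfs set sharenfs=off hazfs/test1
#   zfs set sharenfs=off hazfs/test2
```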
[zfs-discuss] S10u4 in kernel sharetab
There was a lot of talk about ZFS and NFS shares being a problem when there is a large number of filesystems. There was a fix that, in part, included an in-kernel sharetab (I think :) Does anyone know if this has made it into S10u4?

Thanks,
BlueUmp
[zfs-discuss] ZFS Disk replacement/upgrade
I am playing with ZFS on a JetStor 516F with nine 1TB E-SATA drives. These are our first real tests with ZFS, and I am working out how to replace our HA-NFS UFS file systems with ZFS counterparts. One of the things I am concerned with is: how do I replace a disk array/vdev in a pool? It appears that is not possible at the moment. For example, I have this array in which I want to replace the drives with bigger ones. I currently have 3 raidz vdevs and I am using about two thirds of the total space. So, to keep ahead of the curve, I want to replace the 1TB drives with 1.5TB drives. Another example would be a pool with some older T3Bs and a newer SE3511: I want to remove the T3Bs from the pool and replace them with an expansion tray on the SE3511. Any idea when I might be able to do this?

Matt
[zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS
We are currently working on a plan to upgrade our HA-NFS cluster, which uses HAStoragePlus and VxVM 3.2 on Solaris 9, to Solaris 10 and ZFS. Is there a known procedure or best practice for this? I have enough free disk space to recreate all the filesystems and copy the data if necessary, but I would like to avoid copying if possible. I am also considering what type of zpools to create. I have a SAN with T3Bs and SE3511s. Since neither of these can work as a JBOD (at least, that is what I remember), I guess I am going to have to add the LUNs as a mirrored zpool of the RAID-5 LUNs? We are at the very start of this project, and I was hoping for some guidance as to what direction to take.
[zfs-discuss] Re: [raidz] file not removed: No space left on device
Eric,

To ask the obvious but crucial question :) What is the best way to truncate a file on ZFS?
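To illustrate what I mean by truncating rather than unlinking (since the premise of this thread is that the remove itself fails on a full raidz pool), shell redirection rewrites the file in place to zero length:

```shell
# Make a file, then truncate it to zero length in place;
# this frees its blocks without creating a new directory entry
echo "some data" > bigfile
: > bigfile
ls -l bigfile    # size column now shows 0
```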
[zfs-discuss] Re: nevada_41 and zfs disk partition
> What does vmstat look like?
> Also zpool iostat 1.

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         291M  9.65G      0     11   110K   694K
tank         301M  9.64G      0     32      0  87.9K
tank         301M  9.64G      0      0      0      0
tank         301M  9.64G     31      0  3.96M      0
tank         301M  9.64G      0     88      0  4.91M
tank         311M  9.63G     16     77  2.05M  2.64M
tank         311M  9.63G     31      0  3.88M      0
tank         311M  9.63G      0      0      0      0
tank         311M  9.63G     31     62  3.96M  3.88M
tank         321M  9.62G     15    101  1.90M  3.08M
tank         321M  9.62G      0      0      0      0
tank         321M  9.62G     31      0  3.96M      0
tank         321M  9.62G      0     88      0  4.47M

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi po fr de sr dd s1 -- --   in   sy   cs us sy id
 0 0 0 8395576 67320   0  69 224  0  0  0  0 104 0  0  0  578 3463 2210 16 17 67
13 0 0 8395456 67192   1 109  16  0  0  0  0  70 0  0  0  466 1176 1055  7 73 20
 0 0 0 8395416 67112   0  21  16  0  0  0  0   2 0  0  0  327  809  452  2  2 96
 0 0 0 8395416 67112   0   3   0  0  0  0  0   0 0  0  0  370 1947  818  6  4 90
 0 0 0 8395416 67112   0   2   0  0  0  0  0   0 0  0  0  306 1358  672  8  3 89
 0 0 0 8395416 67112   0   4   0  0  0  0  0   0 0  0  0  338  822  409  1  1 98
 1 0 0 8395416 67112   0  10   0  0  0  0  0   0 0  0  0  320 3152 1415 20  8 72
 0 0 0 8396568 68200   0  16   0  0  0  0  0  12 0  0  0  381 1273  633  5  5 90
 0 0 0 8396568 68200   0   6   8  0  0  0  0   1 0  0  0  320 1613  620  4  3 93
 0 0 0 8396568 68192   0   0   0  0  0  0  0   0 0  0  0  352 1198  595  5  2 93
 0 0 0 8396568 68192   0   1   0  0  0  0  0   0 0  0  0  292  843  413  2  2 96
 0 0 0 8396568 68192   0   0   0  0  0  0  0   0 0  0  0  343  818  405  1  1 98
 0 0 0 8396568 68192   0   0   0  0  0  0  0   0 0  0  0  308  803  412  1  1 98
 0 0 0 8396568 68192   0   0   0  0  0  0  0   0 0  0  0  345 1236  471  2  3 95
 0 0 0 8396568 68192   0   0   0  0  0  0  0   0 0  0  0  296 1570  709  6  2 92
 0 0 0 8396568 68192  13 142   0  0  0  0  0   0 0  0  0  380 3134 1182 14  6 80
 0 0 0 8396568 68192   0   4   8  0  0  0  0   1 0  0  0  301 1034  536  5  4 91
 0 0 0 8396568 68184   0   0   0  0  0  0  0   0 0  0  0  343  811  417  1  2 97
 0 0 0 8396568 68184   0   0   0  0  0  0  0   0 0  0  0  310 1220  452  1  2 97
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi po fr de sr dd s1 -- --   in   sy   cs us sy id
 0 0 0 8396568 68176   0   0   0  0  0  0  0   1 0  0  0  373 1715  651  4  2 94
 0 0 0 8396568 68176   0   0   0  0  0  0  0   0 0  0  0  336 1739  647  3  2 95
 0 0 0 8396160 67272  51 334 565  0  0  0  0  60 0  0  0  558 4029 1651 10 14 76
 0 0 0 8396776 68184   3  99   0  0  0  0  0   0 0  0  0  357 1204  577  4  3 93
 0 0 0 8396776 68184   0   8   8  0  0  0  0   1 0  0  0  356 3497 1353 16  7 77
 0 0 0 8396776 68176   0   0   0  0  0  0  0   0 0  0  0  311 1128  477  2  1 97
 0 0 0 8396776 68176   0   6   0  0  0  0  0   0 0  0  0  357 1259  518  3  2 95
 0 0 0 8396776 68176   0   1   0  0  0  0  0   0 0  0  0  312 1166  495  2  1 97
 0 0 0 8396776 68176   0  50  71  0  0  0  0   9 0  0  0  366 1207  540 25  3 72

> Do you have any disk based swap?

Yes, there is an 8GB swap partition on the system and 2GB of RAM.

> One best practice we probably will be coming out with is to
> configure at least physmem of swap with ZFS (at least as of
> this release).
>
> The partly hung system could be this:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
> 6429205 each zpool needs to monitor its throughput and throttle heavy writers
>
> The fix state is "in-progress".

I will look at this.

> What throughput do you get for the full untar (untarred size / elapsed time)?

# tar xf thunderbird-1.5.0.4-source.tar  2.77s user 35.36s system 33% cpu 1:54.19

260M / 114s =~ 2.28 MB/s on this IDE disk
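(For anyone checking the arithmetic in that last line: 1:54.19 elapsed is about 114 seconds, and 260M unpacked over 114 s works out as:)

```shell
awk 'BEGIN { printf "%.2f MB/s\n", 260 / 114 }'   # prints "2.28 MB/s"
```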
[zfs-discuss] nevada_41 and zfs disk partition
I just installed build 41 of Nevada on a SunBlade 1500 with 2GB of RAM. I wanted to check out ZFS; given the delay of S10U2, I really could not wait any longer :) I installed it on my system and created a zpool out of an approximately 40GB disk slice. I then wanted to build a version of Thunderbird that contains a local patch that we like, so I downloaded the source tarball. When I try to untar it on the ZFS filesystem, the machine comes to its knees. At times it appears that the system has hung. A Sol10 version of top shows that most of the CPU time is spent in the kernel (not surprising). The steps I used to create the pool/fs are basically the following:

# zpool create space /dev/dsk/c0t0d0s7
# zfs create space/src
# cd /space/src/
# gtar xzf thunderbird.tar.gz

Any ideas on how I can try to do a little debugging of this? Has anyone else seen this behavior?