Re: [zfs-discuss] Can I recover filesystem from an offline pool?
10 GB of memory + 5 days later, the pool was imported. This file server is a virtual machine. I had allocated 2 GB of memory and 2 CPU cores, assuming this was enough to manage 6 TB (6x 1 TB disks), while the pool I am trying to recover is only 700 GB, not the 6 TB pool I am trying to migrate. So I decided to borrow all available memory for this VM and increased it to 12 GB. What I immediately noticed is that the system did not hang as before, and the hard drive activity light continued to flash, so I left it running. Days later, the pool was imported. Here are some of my stats:

1. The system is running OpenSolaris snv_134.
2. The pool I am trying to recover/import shows about 150 GB of data with 50 GB allocated, deduped 3x, before the zfs send hung.
3. Memory usage during the import showed around 2 GB free, i.e. 10 GB used.
4. The zpool would wake up once in a while for only a second or so, with very low CPU usage; I remember seeing only 2 minutes of CPU time after 2 days.
5. During this whole import, all other zfs commands were blocked; I also could not open a new shell, and SSH would hang after the password was entered.

I assume this issue is all due to the new dedup feature and only happens to pools containing deduped datasets. Hopefully this issue is fixed and available for testing soon. I thought the memory available for the ARC cache was for performance only; I was surprised to learn that not having enough memory can actually hang the system when you are deleting a deduped dataset. I would consider this a major issue, as the memory requirement seems to depend on the characteristics of the deduped dataset, and there doesn't seem to be a good document explaining how much memory is needed. Hope this helps anyone considering testing the dedup feature. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
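For anyone trying to budget memory before attempting an import like this, a rough back-of-the-envelope sketch follows. The ~320 bytes per dedup-table (DDT) entry is a commonly quoted approximation, not an official figure, and the actual table can be inspected with `zdb -D <pool>` once a pool is imported.

```shell
# Rough DDT core-memory estimate for the pool described above.
# ASSUMPTION: ~320 bytes of core per DDT entry (an often-quoted
# approximation, not an official number).
DATA_BYTES=$((150 * 1024 * 1024 * 1024))   # ~150 GB of deduped data
RECORDSIZE=$((128 * 1024))                 # default 128K recordsize
echo "$DATA_BYTES $RECORDSIZE" | awk '{
    entries = $1 / $2
    printf "DDT entries: %d, est. core: %.0f MB\n", entries, entries * 320 / 1048576
}'
# Prints: DDT entries: 1228800, est. core: 375 MB
```

Smaller records or lots of small files mean more entries, which is one reason the requirement depends so strongly on the characteristics of the deduped dataset.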
Re: [zfs-discuss] zfs send/recv reliability
On Fri, May 28, 2010 at 10:05 AM, Gregory J. Benscoter gjb5...@arl.psu.edu wrote: I'm primarily concerned with the possibility of a bit flip. If this occurs, will the whole stream be lost? Or will only the file in which the bit flip occurred be degraded? Lastly, how does the reliability of this plan compare to more traditional backup tools like tar, cpio, etc.? You could run the stream through something like par2 and then save the resulting mess of files to tape. It *should* protect you from bit flips, but at the expense of increased tape size. -B -- Brandon High : bh...@freaks.com
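One possible shape of that par2 pipeline; the snapshot name, file paths, and the 10% redundancy figure below are made up for illustration:

```shell
# Sketch: wrap a zfs send stream in PAR2 recovery data before archiving.
# "tank/fs@backup" and the /staging paths are hypothetical.
zfs send tank/fs@backup > /staging/backup.zstream

# Create ~10% redundancy (-r10); this is the increased tape size
# mentioned above. Both files then go to tape.
par2 create -r10 /staging/backup.zstream.par2 /staging/backup.zstream

# After reading back from tape, verify/repair any bit flips, then receive:
par2 repair /staging/backup.zstream.par2
zfs receive tank/restored < /staging/backup.zstream
```

Without the par2 layer, a single flipped bit can make `zfs receive` reject the entire stream, which is what makes raw streams on tape riskier than tar/cpio archives, where damage is usually localized to one file.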
[zfs-discuss] [RESOLVED] Re: No mount all at boot
I had an empty directory /export/home created in root. It was preventing the mount. I just deleted it and all is OK. On Sun, May 30, 2010 at 5:40 PM, me dea...@gmail.com wrote: I was trying to expand the space of rpool. I didn't finish it, but after removing one (not in use) disk from the VM configuration, the system doesn't start (no X). After a shell login I found out that there is no home; zfs mount shows only: rpool/ROOT/opensolaris /. Home can be mounted manually without problems. What is wrong? -- Dmitry
Re: [zfs-discuss] expand zfs for OpenSolaris running inside vm
Thanks! It is exactly what I was looking for. On Sat, May 29, 2010 at 12:44 AM, Cindy Swearingen cindy.swearin...@oracle.com wrote: 2. Attaching a larger disk to the root pool and then detaching the smaller disk I like #2 best. See the "Replacing/Relabeling the Root Pool Disk" section in the ZFS troubleshooting wiki: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide The size of the pool has changed, and I updated the swap size too. Now I have detached the old disk. I did: installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0 On reboot it fails to start up: GRUB loads, the OS loading screen shows, and then the machine restarts :(. I loaded a rescue disc console but don't know what to do. -- Dmitry
[zfs-discuss] [RESOLVED] Re: expand zfs for OpenSolaris running inside vm
Reinstalling grub helped. What is the purpose of the dump slice? On Sun, May 30, 2010 at 9:05 PM, me dea...@gmail.com wrote: [snip] -- Dmitry
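For the archives, the full sequence that this thread converged on looks roughly as follows; device names are the ones from this thread and will differ on other systems:

```shell
# Sketch: grow a root pool by swapping in a larger disk.
# c1t0d0s0 = old small disk, c1t1d0s0 = new larger disk (names assumed).
zpool attach rpool c1t0d0s0 c1t1d0s0

# Wait for the resilver to complete before going further:
zpool status rpool

# Make the new disk bootable. Skipping or botching this step produces
# the GRUB-loads-then-reboots failure described above:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

# Only then remove the old disk from the pool:
zpool detach rpool c1t0d0s0
```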
Re: [zfs-discuss] zfs/lofi/share panic
On 05/27/10 05:16 PM, Dennis Clarke wrote: I just tried this with a UFS-based filesystem just for a lark. It never failed on UFS, regardless of the contents of /etc/dfs/dfstab. Guess I must now try this with a ZFS fs under that iso file.

Just tried it again with b134, *with* "share /mnt" in /etc/dfs/dfstab:

# mount -O -F hsfs /export/iso_images/moblin-2.1-PR-Final-ivi-201002090924.img /mnt
# ls /mnt
isolinux LiveOS
# unshare /mnt
/mnt: path doesn't exist
# share /mnt
# unshare /mnt
# share /mnt

Panic ensues (the following observed on the serial console); note that the dataset is not UFS!

May 30 13:35:44 host5 ufs: NOTICE: mount: not a UFS magic number (0x0)
panic[cpu1]/thread=30001f5f560: BAD TRAP: type=31 rp=2a1014769a0 addr=218 mmu_fsr=0 occurred in module nfssrv due to a NULL pointer dereference

Tried again after it rebooted, with /etc/dfs/dfstab edited to remove the "share /mnt":

# unshare /mnt
# mount -O -F hsfs /backups/icon/moblin-2.1-PR-Final-ivi-201002090924.img /mnt
# ls /mnt
isolinux LiveOS
# unshare /mnt
/mnt: bad path
# share /mnt
# unshare /mnt
# share /mnt

No panic. So the problem all along appears to be what happens if you mount -O onto an already-shared mountpoint. Deliberately sharing before mounting (but with nothing in /etc/dfs/dfstab) resulted in a slightly different panic (more like the ones documented in the CR):

panic[cpu1]/thread=30002345e0: BAD TRAP: type=34 rp=2a100f84460 addr=ff6f6c2f5267 mmu_fsr=0
unshare: alignment error

So CR6798273 should be amended to show the following. To reproduce:

share /mnt
mount -O some-image-file /mnt
share /mnt
unshare /mnt
share /mnt
unshare /mnt

A highly reproducible panic ensues. Workaround: make sure mountpoints are not shared before mounting iso images stored on a ZFS dataset. So the problem, now seen to be relatively trivial, isn't fixed, at least not in b134. For all of you who responded both off and on the list and motivated this experiment, much thanks.
Perhaps someone with access to a more recent build could try this, and if it still happens, update and reopen CR6798273, although it doesn't seem very important now. Regards -- Frank
[zfs-discuss] zpool/zfs list question
Hi all. Using zpool/zfs list -H gives me a good overview of things, and is easy to parse, except that the allocation and data sizes are reported in 'human readable' form. For scripting, this is somewhat suboptimal. Is there a way to report zpool/zfs stats in a fixed scale, like KiB or even bytes? Vennlige hilsener / Best regards, roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] zpool/zfs list question
On Sun, May 30, 2010 at 11:46 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote: Is there a way to report zpool/zfs stats in a fixed scale, like KiB or even bytes? Some (but not all) commands use -p. -p Use exact (parseable) numeric output. -B -- Brandon High : bh...@freaks.com
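Combined with -H, that gives output a script can split on tabs; a minimal sketch (on builds where -p is supported):

```shell
# Byte-exact, tab-separated, no header -- easy to parse in scripts:
zfs list -Hp -o name,used,avail

# Individual properties can be fetched the same way:
zfs get -Hp -o value used rpool
```

Without -p you are stuck parsing suffixed values like "15.9T", which loses precision as well as being awkward.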
[zfs-discuss] Disk space overhead (total volume size) by ZFS
I just wanted to make sure this is normal and expected. I fully expected that as the file system filled up I would see more disk space being used than with other file systems, due to ZFS's features, but what I didn't expect was for ~500-600 GB to be missing from the total volume size right at file system creation. Comparing two systems, one JFS and one ZFS, one raid6 and one raidz2, here are the differences I see:

ZFS:
r...@opensolaris: 11:22 AM :/data# df -k /data
Filesystem kbytes used avail capacity Mounted on
data 17024716800 258872352 16765843815 2% /data

JFS:
r...@sabayonx86-64: 11:22 AM :~# df -k /data2
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdd1 17577451416 2147912 17575303504 1% /data2

zpool list shows the raw capacity, right?

r...@opensolaris: 11:25 AM :/data# zpool list data
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
data 18.1T 278G 17.9T 1% 1.00x ONLINE -

OK, I would expect it to be rounded to 18.2, but that seems about right for 20 trillion bytes (which is what 20x 1 TB is):

r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print 20000000000000/1024/1024/1024/1024}'
18.1899

Now minus two drives for parity:

r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print 18000000000000/1024/1024/1024/1024}'
16.3709

Yet when running zfs list it also lists the amount of storage as significantly smaller:

r...@opensolaris: 11:23 AM :~# zfs list data
NAME USED AVAIL REFER MOUNTPOINT
data 164K 15.9T 56.0K /data

I would expect this to be 16.4T. Taking the df -k values, JFS gives me a total volume size of:

r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print 17577451416/1024/1024/1024}'
16.3703

and ZFS gives:

r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print 17024716800/1024/1024/1024}'
15.8555

So basically with JFS I see no decrease in total volume size, but a huge difference with ZFS. Is this normal/expected? Can anything be disabled so as not to lose 500-600 GB of space?
Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS
On Sun, May 30, 2010 at 2:37 PM, Sandon Van Ness san...@van-ness.com wrote: ZFS: r...@opensolaris: 11:22 AM :/data# df -k /data

'zfs list' is more accurate than df, since it will also show space used by snapshots, e.g.:

bh...@basestar:~$ df -h /export/home/bhigh
Filesystem size used avail capacity Mounted on
tank/export/home/bhigh 5.3T 8.2G 2.8T 1% /export/home/bhigh
bh...@basestar:~$ zfs list tank/export/home/bhigh
NAME USED AVAIL REFER MOUNTPOINT
tank/export/home/bhigh 51.0G 2.85T 8.16G /export/home/bhigh

zpool list shows the raw capacity right? Yes. It shows the raw capacity, including space that will be used for parity. Its USED column includes space used by all active datasets and snapshots. So basically with JFS I see no decrease in total volume size but a huge difference on ZFS. Is this normal/expected? Can anything be disabled to not lose 500-600 GB of space? Are you using any snapshots? They'll consume space. What is the recordsize, and what kind of data are you storing? Small blocks or lots of small files (< 128k) will have more overhead for metadata. -B -- Brandon High : bh...@freaks.com
Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS
On 05/30/2010 02:51 PM, Brandon High wrote: [snip] Yeah, I know all about the issues with snapshots and such, but this is a totally new/empty file system. It's basically over 500 gigabytes smaller right from the get-go, even before any data has ever been written to it. I would totally expect some numbers to be off on a used file system, but not so much on a completely brand-new one.
[zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds
Basically, for a few seconds at a time I can get very nice speeds through rsync (saturating a 1 gig link), around 112-113 megabytes/sec, which is about as good as I can expect after overhead. The problem is that every 5 seconds, when data is actually written to disk (physically looking at the disk LEDs), I see the activity on the machine sending data stall while the ZFS machine shows disk activity (writes). This problem brings the average write speed down from around 112-113 megabytes/sec to around 100 megabytes/sec (sometimes lower), lowering speeds by 10% or a bit more. I only really care because 10% can make a difference when you are copying terabytes of data to the machine. Anyway, here is what I am seeing on the Linux machine that is sending to the ZFS machine (iostat -xm 1 sdc3, 1-second averages; the CPU was mostly idle throughout, %idle 86-93, %iowait 0):

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdc3 17.00 0.00 496.00 0.00 112.00 0.00 462.45 0.42 0.84 0.16 7.90
sdc3 1.00 0.00 584.00 0.00 111.75 0.00 391.88 0.43 0.74 0.12 7.10
sdc3 4.00 0.00 557.00 0.00 112.00 0.00 411.81 0.43 0.77 0.13 7.10
sdc3 1.98 0.00 104.95 0.00 23.76 0.00 463.70 0.09 0.84 0.17 1.78
sdc3 9.00 0.00 538.00 0.00 112.69 0.00 428.97 0.39 0.72 0.14 7.50
sdc3 8.00 0.00 524.00 0.00 112.00 0.00 437.74 0.38 0.72 0.13 6.90
sdc3 1.00 0.00 493.00 0.00 111.29 0.00 462.31 0.39 0.80 0.16 7.90

Basically you can see it is reading at around full gig speed, and then reads drop down due to writes being stalled on the ZFS machine. On the ZFS end, with 10-second averages, I see approximately 100 MB/sec:

data 25.1G 18.1T 0 834 0 100M
data 26.2G 18.1T 0 833 0 100M
data 27.3G 18.1T 0 833 0 100M

Change this to 1-second averages and I see:

data 32.7G 18.1T 0 0 0 0
data 32.7G 18.1T 0 2.86K 0 360M
data 33.3G 18.1T 0 264 0 21.8M
data 33.3G 18.1T 0 0 0 0
data 33.3G 18.1T 0 0 0 0
data 33.3G 18.1T 0 0 0 0
data 33.3G 18.1T 0 2.94K 0 369M
data 33.8G 18.1T 0 375 0 35.1M
data 33.8G 18.1T 0 0 0 0
data 33.8G 18.1T 0 0 0 0
data 33.8G 18.1T 0 0 0 0
data 33.8G 18.1T 0 2.90K 0 365M
data 34.4G 18.1T 0 599 0 62.6M
data 34.4G 18.1T 0 0 0 0
data 34.4G 18.1T 0 0 0 0
data 34.4G 18.1T 0 0 0 0
data 34.4G 18.1T 0 2.10K 0 265M
data 34.9G 18.1T 0 1.77K 0 211M

I tried changing the txg sync time from 30 to 1 and that did make things smoother, but in general it lowered speeds (down to 90 megabytes/sec or so). Actually writing files to the array, I see well in excess of 112 megabytes/sec, so I would think I should be able to get this to go at full gig speeds without the small stalls:

r...@opensolaris: 11:36 AM :/data# dd bs=1M count=100000 if=/dev/zero of=./100gb.bin
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 233.257 s, 450 MB/s
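For reference, the txg sync time being tuned here can be set persistently from /etc/system. The tunable name varies by build (older bits use zfs_txg_synctime in seconds, later ones zfs_txg_synctime_ms in milliseconds); both lines below are a sketch, so use whichever your build actually has:

```shell
# /etc/system fragment -- pick the tunable your build actually has.
# Older builds (value in seconds):
#   set zfs:zfs_txg_synctime = 1
# Later builds (value in milliseconds):
#   set zfs:zfs_txg_synctime_ms = 1000

# Or poke it live on a running kernel (older-build name shown):
echo "zfs_txg_synctime/W0t1" | mdb -kw
```

A live mdb write does not survive a reboot, which makes it convenient for experimenting with values like the 30-to-1 change tried above before committing one to /etc/system.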
Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS
On Sun, May 30, 2010 at 23:37, Sandon Van Ness san...@van-ness.com wrote: [snip] This may be the answer: http://www.cuddletech.com/blog/pivot/entry.php?id=1013
Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS
On 05/30/2010 03:10 PM, Mattias Pantzare wrote: [snip] This may be the answer: http://www.cuddletech.com/blog/pivot/entry.php?id=1013

That is definitely interesting; however, I am seeing more than a 1.6% discrepancy. Using a newer df built from GNU coreutils, I use -B to specify a unit of 1 billion bytes, which is 1 GB on the hard-drive companies' scale.

On the raid6/JFS system:
r...@sabayonx86-64: 03:14 PM :~# df -B GB /data2
Filesystem 1GB-blocks Used Available Use% Mounted on
/dev/sdd1 18000 3 17998 1% /data2

On the ZFS system:
r...@opensolaris: 03:16 PM :/data# df -B GB /data
Filesystem 1GB-blocks Used Available Use% Mounted on
data 17434 1 17434 1% /data

Interestingly enough, I am seeing almost exactly double that: it is 3.14% by my calculations. Maybe this was changed in newer versions to have more of a reserve? I am running b134.
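The 3.14% figure can be reproduced directly from the two df -k totals quoted earlier in the thread:

```shell
# Discrepancy between the JFS total (17577451416 KB) and the
# ZFS total (17024716800 KB) reported by df -k above.
echo "17577451416 17024716800" | awk '{
    printf "ZFS reports %.2f%% less space than JFS\n", (1 - $2/$1) * 100
}'
# Prints: ZFS reports 3.14% less space than JFS
```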
Re: [zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds
On May 30, 2010, at 3:04 PM, Sandon Van Ness wrote: [snip] I have better luck tuning the zfs_txg_synctime_ms from 5000 to 1000 or less.
Re: [zfs-discuss] Zfs mirror boot hang at boot
On 5/29/10 12:54 AM -0700, Matt Connolly wrote: I'm running snv_134 on a 64-bit x86 motherboard with 2 SATA drives. The zpool rpool uses the whole disk of each drive. Can't be: zfs can't boot from a whole-disk pool on x86 (maybe sparc too). You have a single Solaris partition with the root pool on it. I am only being pedantic because "whole disk" has a special meaning to zfs, distinct from a single partition using the entire disk. ... If I detach a drive from the pool, then the system also correctly boots off a single connected drive. However, reattaching the 2nd drive causes a whole resilver to occur. By "detach" do you mean running zpool detach, or simply removing the drive physically without running any command? I suppose the former, because if you just removed it, I'd think you'd have the same non-booting problem. If that's right, then that is the expected behavior: zpool detach causes zfs to forget everything it knows about the device being detached. -frank
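The distinction shows up in the commands themselves: zpool offline keeps the device's configuration, so bringing it back only resilvers the changed data, while zpool detach discards it entirely (the device name below is assumed):

```shell
# Temporarily take one mirror side out WITHOUT forgetting it:
zpool offline rpool c1t1d0s0
# ... maintenance here ...
zpool online rpool c1t1d0s0   # only the delta since offlining is resilvered

# By contrast, detach makes zfs forget the device; re-attaching it
# later forces a full resilver:
zpool detach rpool c1t1d0s0
```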
Re: [zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds
On 05/30/2010 04:22 PM, Richard Elling wrote: If you want to decouple the txg commit completely, then you might consider using a buffer of some sort. I use mbuffer for pipes, but that may be tricky to use in an rsync environment. -- richard

I initially thought this was I/O-bound, but now I think it might be a CPU bottleneck causing the problem. When I write zeros using dd I can get quite good speed (~500 megabytes/sec), but it totally maxes both of my CPU cores when I do so (2-5% idle in top). The problem is that when it does the write burst, it takes CPU away from rsync, and that may be what actually causes the dip during writes: not the I/O activity itself but the CPU load generated by the writes. I verified this by making a non-raidz2 pool, just adding all the disks into a pool with nothing specified:

zpool create data c4t5000C500028BD5FCd0p0 c4t5000C50009A4D727d0p0 c4t5000C50009A46AF5d0p0 c4t5000C50009A515B0d0p0 c4t5000C500028A81BEd0p0 c4t5000C500028B44A1d0p0 c4t5000C500028B415Bd0p0 c4t5000C500028B23D2d0p0 c4t5000C5000CC3338Dd0p0 c4t5000C500027F59C8d0p0 c4t5000C50009DBF8D4d0p0 c4t5000C500027F3C1Fd0p0 c4t5000C5000DAF02F3d0p0 c4t5000C5000DA7ED4Ed0p0 c4t5000C5000DAEF990d0p0 c4t5000C5000DAEEF8Ed0p0 c4t5000C5000DAEB881d0p0 c4t5000C5000A121581d0p0 c4t5000C5000DAC848Fd0p0 c4t5000C50002770EE6d0p0

Once I did this, I got a nice stable 115 megabytes/sec over the network, so 11.5% better. So the problem appears to be that when it goes to write data it will use 100% of its CPU power (even if only for some number of milliseconds at a time), which stalls the network while it does this. This happens when it does parity calculations, but not when using a non-parity zpool. I don't think I can throttle the parity calculations at all, so I don't think there will be a fix for this, unfortunately =(. I can live with losing 10% off my rsync speed, though.