Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
Seems like this issue only occurs when MSI-X interrupts are enabled for the BCM5709 chips, or am I reading it wrong? If I type 'echo ::interrupts | mdb -k' and isolate the network-related bits, I get the following output:

IRQ  Vect IPL Bus Trg Type  CPU Share APIC/INT# ISR(s)
36   0x60 6   PCI Lvl Fixed 3   1     0x1/0x4   bnx_intr_1lvl
48   0x61 6   PCI Lvl Fixed 2   1     0x1/0x10  bnx_intr_1lvl

Does this imply that my system is not in a vulnerable configuration? Supposedly I'm losing some performance without MSI-X, but I'm not sure in which environments or workloads we would notice, since the load on this server is relatively low and the L2ARC serves data at greater than 100MB/s (wire speed) without stressing much of anything. The BIOS settings in our T610 are exactly as they arrived from Dell when we bought it over a year ago. Thoughts? --eric

Unfortunately I see IRQ type Fixed on a system that suffers from network issues with bnx. But yes, according to Red Hat material this has something to do with Nehalem C-states (power saving etc.) and/or MSI. If your system has been running for a year or so, I wouldn't expect this issue to come up; we have noted this issue mostly with R410/R710 systems manufactured in Q4/2009-Q1/2010 (different hw revisions?). Yours Markus Kovero ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
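(For anyone who wants to repeat this check: the one-liner below simply filters the mdb output quoted above down to the header and the bnx rows; the grep pattern is only an example and would need adjusting for other NIC drivers. The Type column shows whether the driver ended up on Fixed interrupts or on MSI/MSI-X.)

# echo ::interrupts | mdb -k | egrep 'IRQ|bnx'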
Re: [zfs-discuss] SSD sale on newegg
On Tue, Apr 06, 2010 at 05:22:25PM -0700, Carson Gaspar wrote: I just found an 8 GB SATA Zeus (Z4S28I) for £83.35 (~US$127) shipped to California. That should be more than large enough for my ZIL @home, based on zilstat. Transcend sells an 8 GByte SLC SSD for about 70 EUR. The specs are not awe-inspiring though (I used it in an embedded firewall). The web site says EOL, limited to current stock. http://www.dpieshop.com/stec-zeus-z4s28i-8gb-25-sata-ssd-solid-state-drive-industrial-temp-p-410.html Of course this seems _way_ too good to be true, but I decided to take the risk. -- Eugen* Leitl http://leitl.org __ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Hi list, If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running OSOL nv130. Powered off the machine, removed the F20 and powered back on. The machine boots OK and comes up normally with the following message in 'zpool status':

  pool: mypool
 state: FAULTED
status: An intent log record could not be read. Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

NAME     STATE    READ WRITE CKSUM
mypool   FAULTED     0     0     0  bad intent log

Nice! Running a later version of ZFS seems to lessen the need for ZIL-mirroring... With kind regards, Jeroen -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running OSOL nv130. Power off the machine, removed the F20 and power back on. Machine boots OK and comes up normally [...] Nice! Running a later version of ZFS seems to lessen the need for ZIL-mirroring... Yes, since zpool 19, which is not available in any version of solaris yet, and is not available in osol 2009.06 unless you update to developer builds. Since zpool 19, you have the ability to zpool remove log devices. And if a log device fails during operation, the system is supposed to fall back and just start using ZIL blocks from the main pool instead. So the recommendation for zpool < 19 would be *strongly* recommended: Mirror your log device if you care about using your pool. And the recommendation for zpool >= 19 would be ... don't mirror your log device. If you have more than one, just add them both unmirrored. I edited the ZFS Best Practices yesterday to reflect these changes. I always have a shade of doubt about things that are supposed to do something. Later this week, I am building an OSOL machine, updating it, adding an unmirrored log device, starting a sync-write benchmark (to ensure the log device is heavily in use), and then I'm going to yank out the log device and see what happens. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
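(By way of illustration of the zpool-19 behaviour described above; the pool and device names are hypothetical and this is only a sketch, not a transcript from a real system. Once the pool is at version 19 or later, a separate log device can be removed outright, and 'zpool status' will show the log section disappear.)

# zpool remove tank c3t0d0    # c3t0d0 is the dedicated log device in this example
# zpool status tank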
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 7 apr 2010, at 14.28, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running OSOL nv130. Power off the machine, removed the F20 and power back on. Machine boots OK and comes up normally [...] Nice! Running a later version of ZFS seems to lessen the need for ZIL-mirroring... Yes, since zpool 19, which is not available in any version of solaris yet, and is not available in osol 2009.06 unless you update to developer builds. Since zpool 19, you have the ability to zpool remove log devices. And if a log device fails during operation, the system is supposed to fall back and just start using ZIL blocks from the main pool instead. So the recommendation for zpool < 19 would be *strongly* recommended: Mirror your log device if you care about using your pool. And the recommendation for zpool >= 19 would be ... don't mirror your log device. If you have more than one, just add them both unmirrored. Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. For a file server, mail server, etc etc, where things are stored and supposed to be available later, you almost certainly want redundancy on your slog too. (There may be file servers where this doesn't apply, but they are special cases that should not be mentioned in the general documentation.) I edited the ZFS Best Practices yesterday to reflect these changes. I'd say that "In zpool version 19 or greater, it is recommended not to mirror log devices." is not very good advice and should be changed. /ragge ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
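(If you do want the slog mirrored, as Ragnar recommends for file and mail servers, the standard syntax is shown below; the pool and device names are hypothetical. An existing unmirrored log device can also be converted into a mirror afterwards with 'zpool attach'.)

# zpool add tank log mirror c4t0d0 c4t1d0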
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 07/04/2010 13:58, Ragnar Sundblad wrote: Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. For a file server, mail server, etc etc, where things are stored and supposed to be available later, you almost certainly want redundancy on your slog too. (There may be file servers where this doesn't apply, but they are special cases that should not be mentioned in the general documentation.) While I agree with you, I want to mention that it is all about understanding a risk. In this case, not only does your server have to crash in such a way that data has not been synced (sudden power loss, for example), but there would also have to be some data committed to the slog device(s) which was not yet written to the main pool, and when your server restarts your slog device would have to have died completely as well. Other than that you are fine even with an unmirrored slog device. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool < 19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >= 19 would be ... don't mirror your log device. If you have more than one, just add them both unmirrored. Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the device claims success. If the log device fails to read (oops!), then a mirror would be quite useful. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 07/04/2010 15:35, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool < 19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >= 19 would be ... don't mirror your log device. If you have more than one, just add them both unmirrored. Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the device claims success. If the log device fails to read (oops!), then a mirror would be quite useful. It is only read at boot if there is uncommitted data on it - during normal reboots zfs won't read data from the slog. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there is uncommitted data on it - during normal reboots zfs won't read data from the slog. How does zfs know if there is uncommitted data on the slog device without reading it? The minimal read would be quite small, but it seems that a read is still required. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Reclaiming Windows Partitions
I finally decided to get rid of my Windows XP partition as I rarely used it except to fire it up to install OS updates and virus signatures. I had some trouble locating information on how to do this, so I thought I'd document it here.

My system is a Toshiba Tecra M9. It had four partitions on it:

Partition 1 - NTFS Windows XP OS (Drive C:)
Partition 2 - NTFS Windows data partition (D:)
Partition 3 - FAT32
Partition 4 - Solaris2

Partitions 1 and 2 were laid down by my company's standard OS install. I had shrunk these using QTParted to enable me to install OpenSolaris. Partition 3 was set up to have a common file system mountable by OpenSolaris and Windows. There may be ways to do this with NTFS now, but this was a legacy from older Solaris installs. Partition 4 is my OpenSolaris ZFS install.

Step 1) Backed up all my data from Partition 3, and any files I needed from Partitions 1 and 2. I also had a current snapshot of my OpenSolaris partition (Partition 4).

Step 2) Delete Partitions 1, 2, and 3. I did this using the fdisk option in format under OpenSolaris.

format - Select Disk 0 (make note of the short drive name alias, mine was c4t0d0). You will receive a warning something like this:

[disk formatted]
/dev/dsk/c4t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M)

Then select fdisk from the FORMAT MENU. You will see something like this:

Total disk size is 14593 cylinders
Cylinder size is 16065 (512 byte) blocks

                                          Cylinders
Partition   Status    Type          Start   End     Length   %
=========   ======    ============  =====   =====   ======   ===
    1                 FAT32LBA      x       x       x
    2                 FAT32LBA      x       x
    3                 Win95 FAT32   5481    8157    2677     18
    4       Active    Solaris2      8158    14579   6422     44

SELECT ONE OF THE FOLLOWING:
1. Create a partition
2. Specify the active partition
3. Delete a partition
4. Change between Solaris and Solaris2 Partition IDs
5. Edit/View extended partitions
6. Exit (update disk configuration and exit)
7. Cancel (exit without updating disk configuration)
Enter Selection:

Delete partitions 1, 2 and 3 (don't forget to back them up before you do this). Using the fdisk menu, create a new Solaris2 partition for use by ZFS. When you are done you should see something like this:

Cylinder size is 16065 (512 byte) blocks

                                          Cylinders
Partition   Status    Type          Start   End     Length   %
=========   ======    ============  =====   =====   ======   ===
    1                 Solaris2      1       8157    8157     56
    4       Active    Solaris2      8158    14579   6422     44

Exit and update the disk configuration.

Step 3) Create the ZFS pool. First you can test whether zpool will be successful in creating the pool by using the -n option:

zpool create -n datapool c4t0d0p1

(I will make some notes about this disk name at the end.) It should report something like:

would create 'datapool' with the following layout:
  datapool
    c4t0d0p1

By default the zpool command will make a mount-point in your root / with the same name as your pool. If you don't want this you can change that in the create command (see the man page for details). Now issue the command without the -n option:

zpool create datapool c4t0d0p1

Now check to see if it is there:

zpool list

It should report something like this:

NAME       SIZE   ALLOC   FREE    CAP   DEDUP   HEALTH   ALTROOT
datapool   62G    30.7G   31.3G   49%   1.06x   ONLINE   -
rpool      49G    43.4G   5.65G   88%   1.00x   ONLINE   -

Step 4) Remember to take any of the mount parameters out of your /etc/vfstab file. You should be good to go at this point.

== Notes about disk/partition naming

In my case the disk is called c4t0d0. So how did I come up with c4t0d0p1? The whole disk name is c4t0d0p0. Each partition has the following naming convention:

Partition 1 = c4t0d0p1
Partition 2 = c4t0d0p2
Partition 3 = c4t0d0p3
Partition 4 = c4t0d0p4

The fdisk command does not renumber the partitions when you delete partitions. So in the end I had Partitions 1 and 4. Thanks to Srdjan Matovina for helping me sort this out, and as a second pair of eyes to
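(A small addition to Ron's Step 3, using the same pool and partition names as examples: the default mount point can be overridden at creation time with the -m option, and datasets can then be created on the new pool as usual. This is only a sketch of the alternative he alludes to, not part of his original procedure.)

# zpool create -m /export/data datapool c4t0d0p1
# zfs create datapool/archive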
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 04/07/10 09:19, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there is uncommitted data on it - during normal reboots zfs won't read data from the slog. How does zfs know if there is uncommitted data on the slog device without reading it? The minimal read would be quite small, but it seems that a read is still required. Bob If there's ever been synchronous activity then there is an empty tail block (stubby) that will be read even after a clean shutdown. Neil. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important consideration. But if you have either a system crash, or a failed log device, that don't happen at the same time, then your sync writes are safe, right up to the nanosecond. Using unmirrored nonvolatile log device on zpool >= 19. I'd say that "In zpool version 19 or greater, it is recommended not to mirror log devices." is not very good advice and should be changed. See above. Still disagree? If desired, I could clarify the statement, by basically pasting what's written above. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the device claims success. If the log device fails to read (oops!), then a mirror would be quite useful. An excellent point. BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really unreadable. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 04/07/10 10:18, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the device claims success. If the log device fails to read (oops!), then a mirror would be quite useful. An excellent point. BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really unreadable. A scrub will read the log blocks but only for unplayed logs. Because of the transient nature of the log and because it operates outside of the transaction group model, it's hard to read the in-flight log blocks to validate them. There have previously been suggestions to read slogs periodically. I don't know if there's a CR raised for this though. Neil. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Wed, 7 Apr 2010, Neil Perrin wrote: There have previously been suggestions to read slogs periodically. I don't know if there's a CR raised for this though. Roch wrote up CR 6938883 Need to exercise read from slog dynamically Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important consideration. But if you have either a system crash, or a failed log device, that don't happen at the same time, then your sync writes are safe, right up to the nanosecond. Using unmirrored nonvolatile log device on zpool >= 19. The point is that the slog is a write-only device, and a device which fails such that it acks each write, but fails to read the data that it wrote, could silently fail at any time during the normal operation of the system. It is not necessary for the slog device to fail at the exact same time that the system spontaneously reboots. I don't know if Solaris implements a background scrub of the slog as a normal course of operation which would cause a device with this sort of failure to be exposed quickly. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really unreadable. To make matters worse, an SSD with a large cache might satisfy such reads from its cache, so a scrub of the (possibly) tiny bit of pending synchronous writes may not validate anything. A lightly loaded slog should usually be empty. We already know that some (many?) SSDs are not very good about persisting writes to FLASH, even after acking a cache flush request. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS RaidZ recommendation
I have been searching this forum and just about every ZFS document I can find trying to find the answer to my questions, but I believe the answer I am looking for is not going to be documented and is probably best learned from experience. This is my first time playing around with OpenSolaris and ZFS. I am in the midst of replacing my home-based file server. This server hosts all of my media files, from MP3s to Blu-ray ISOs. I stream media from this file server to several media players throughout my house. The server consists of a Supermicro X6DHE-XG2 motherboard, 2 x 2.8GHz Xeon processors, 4 gigs of RAM and 2 Supermicro SAT2MV8 controllers. I have 14 1TB Hitachi hard drives connected to the controllers. My initial thought was to just create a single 14-drive RaidZ2 pool, but I have read over and over again that I should be limiting each array to a max of 9 drives. So then I would end up with 2 x 7-drive RaidZ arrays. To keep the pool size at 12TB I would have to give up my extra parity drive going to this 2-array setup, and that is concerning as I have no room for hot spares in this system. So in my mind I am left with only one other choice, which is going to 2 x RaidZ2 vdevs and losing an additional 2TB, so I am left with a 10TB ZFS pool. So my big question is: given that I am working with 4MB - 50GB files, is going with 14 spindles going to incur a huge performance hit? I was hoping to be able to saturate a single GigE link with this setup, but I am concerned the single large array won't let me achieve this. Ahh, decisions, decisions. Any advice would be greatly appreciated. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
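(For reference, the 2 x 7-drive RaidZ2 layout discussed above would be built with a single zpool command along these lines; the pool name and the cXtYdZ device names are placeholders, not the poster's actual devices.)

# zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0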
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Apr 7, 2010, at 10:19 AM, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... =19 would be ... if you don't mind loosing data written the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important consideration. But if you have either a system crash, or a failed log device, that don't happen at the same time, then your sync writes are safe, right up to the nanosecond. Using unmirrored nonvolatile log device on zpool = 19. The point is that the slog is a write-only device and a device which fails such that its acks each write, but fails to read the data that it wrote, could silently fail at any time during the normal operation of the system. It is not necessary for the slog device to fail at the exact same time that the system spontaneously reboots. I don't know if Solaris implements a background scrub of the slog as a normal course of operation which would cause a device with this sort of failure to be exposed quickly. You are playing against marginal returns. An ephemeral storage requirement is very different than permanent storage requirement. For permanent storage services, scrubs work well -- you can have good assurance that if you read the data once then you will likely be able to read the same data again with some probability based on the expected decay of the data. For ephemeral data, you do not read the same data more than once, so there is no correlation between reading once and reading again later. In other words, testing the readability of an ephemeral storage service is like a cat chasing its tail. IMHO, this is particularly problematic for contemporary SSDs that implement wear leveling. sidebar For clusters the same sort of problem exists for path monitoring. If you think about paths (networks, SANs, cups-n-strings) then there is no assurance that a failed transfer means all subsequent transfers will also fail. Some other permanence test is required to predict future transfer failures. s/fail/pass/g /sidebar Bottom line: if you are more paranoid, mirror the separate log devices and sleep through the night. Pleasant dreams! :-) -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
Daniel, Which Solaris release is this? I can't reproduce this on my lab system that runs the Solaris 10 10/09 release. See the output below. Thanks, Cindy # zfs destroy -r tank/test # zfs create -o compression=gzip tank/test # zfs snapshot tank/t...@now # zfs send -R tank/t...@now | zfs receive -vd rpool receiving full stream of tank/t...@now into rpool/t...@now received 249KB stream in 2 seconds (125KB/sec) # zfs list -r rpool NAMEUSED AVAIL REFER MOUNTPOINT rpool 39.4G 27.5G 47.1M /rpool rpool/ROOT 4.89G 27.5G21K legacy rpool/ROOT/s10s_u8wos_08a 4.89G 27.5G 4.89G / rpool/dump 1.50G 27.5G 1.50G - rpool/export 44K 27.5G23K /export rpool/export/home21K 27.5G21K /export/home rpool/snaps31.0G 27.5G 31.0G /rpool/snaps rpool/swap2G 29.5G16K - rpool/test 21K 27.5G21K /rpool/test rpool/t...@now 0 -21K - # zfs get compression rpool/test NAMEPROPERTY VALUE SOURCE rpool/test compression gzip local On 04/07/10 11:47, Daniel Bakken wrote: When I send a filesystem with compression=gzip to another server with compression=on, compression=gzip is not set on the received filesystem. I am using: zfs send -R promise1/arch...@daily.1 | zfs receive -vd sas The zfs manpage says regarding the -R flag: When received, all properties, snapshots, descendent file systems, and clones are preserved. Snapshots are preserved, but the compression property is not. Any ideas why this doesn't work as advertised? Thanks, Daniel Bakken ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 7 Apr 2010, Jason S wrote: To keep the pool size at 12TB i would have to give up my extra parity drive going to this 2 array setup and it is concerning as i have no room for hot spares in this system. So in my mind i am left with only one other choice and this is going to 2XRaidZ2 pools and loosing an additional 2 TB so i am left with a 10TB ZFS pool. I would go with a single pool with two raidz2 vdevs, even if you don't get the maximum possible space. Raidz is best avoided when using 1TB SATA disk drives because of the relatively high probability of data loss during a resilver and the long resilver times. I would trade the hot spare for the improved security of raidz2. The hot spare is more helpful for mirrored setups or raidz1, where the data reliability is more sensitive to how long it takes to recover a lost drive. Just buy a spare drive so that you can replace a failed drive expediently. So my big question is given that i am working with 4mb - 50gb files is going with 14 spindles going incur a huge performance hit? I was hoping to be able to saturate a single GigE link with this setup, but i am concerned the single large array wont let me achieve this. It is not difficult to saturate a gigabit link. It can be easily accomplished with just a couple of drives. The main factor is whether zfs's prefetch is aggressive enough. Each raidz2 vdev will offer the useful IOPS of a single disk drive, so from an IOPS standpoint the pool would behave like two drives. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
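(Once the pool is built, the per-vdev behaviour Bob describes is easy to observe: 'zpool iostat -v' breaks the I/O statistics down by vdev and by disk. The pool name and the 5-second interval are only examples.)

# zpool iostat -v tank 5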
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 2010-04-07 at 10:40 -0700, Jason S wrote: I have been searching this forum and just about every ZFS document i can find trying to find the answer to my questions. But i believe the answer i am looking for is not going to be documented and is probably best learned from experience. This is my first time playing around with open solaris and ZFS. I am in the midst of replacing my home based filed server. This server hosts all of my media files from MP3's to Blue Ray ISO's. I stream media from this file server to several media players throughout my house. The server consists of a Supermicro X6DHE-XG2 motherboard, 2 X 2.8ghz xeon processors, 4 gigs of ram and 2 Supermicro SAT2MV8 controllers. I have 14 1TB hitachi hard drives connected to the controllers. If you can at all afford it, upgrade your RAM to 8GB. More than anything else, I've found that additional RAM makes up for any other deficiencies with a ZFS setup. 4GB is OK, but 8GB is a pretty sweet spot for price/performance for a small NAS server. My initial thought was to just create a single 14 drive RaidZ2 pool, but i have read over and over again that i should be limiting each array to a max of 9 drives. So then i would end up with 2 X 7 drive RaidZ arrays. That's correct. You can certainly do a 14-drive Raidz2, but given how the access/storage pattern for data is in such a setup, you'll likely suffer noticeable slowness vs. a 2x7-drive setup. To keep the pool size at 12TB i would have to give up my extra parity drive going to this 2 array setup and it is concerning as i have no room for hot spares in this system. So in my mind i am left with only one other choice and this is going to 2XRaidZ2 pools and loosing an additional 2 TB so i am left with a 10TB ZFS pool. You've pretty much hit it right there. There is *one* other option: create a zpool of two raidz1 vdevs: one with 6 drives, and one with 7 drives. Then add a hot spare for the pool. That will give you most of the performance of a 2x7 setup, with the capacity of 11 disks. The tradeoff is that it's a bit less reliable, as you have to trust the ability of the hot spare to resilver before any additional drives fail in degraded array. For a home NAS, it's likely a reasonable bet, though. So my big question is given that i am working with 4mb - 50gb files is going with 14 spindles going incur a huge performance hit? I was hoping to be able to saturate a single GigE link with this setup, but i am concerned the single large array wont let me achieve this. Frankly, testing is the only way to be sure. :-) Writing files that large (and reading them back more frequently, I assume...) will tend to reduce the differences in performance between a 1x14 and 2x7 setup. One way to keep your 1Gb Ethernet saturated is to increase the RAM (as noted above). With 8GB of RAM, you should have enough buffer space in play to mask the differences in large file I/O between the 1x14 and 2x7 setups. 12GB or 16GB would most certainly erase pretty much any noticeable difference. For small random I/O, even with larger amounts of RAM, you'll notice some difference between the two setups - exactly how noticeable I can't say, and you'd have to try it to see, as it depends heavily on your access pattern. hh, decisions, decisions Any advice would be greatly appreciated. One thing Richard or Bob might be able to answer better is the tradeoff between getting a cheap/small SSD for L2ARC and buying more RAM. 
That is, I don't have a good feel for whether (for your normal usage case) it would be better to get 8GB more RAM, or buy something like a cheap 40-60GB SSD for use as an L2ARC (or some combination of the two). SSDs in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will likely cost. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
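(Erik's 6+7 raidz1-plus-hot-spare alternative would look roughly like this; again, all pool and device names are hypothetical placeholders.)

# zpool create tank \
    raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
    spare c3t0d0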
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
jr == Jeroen Roodhart j.r.roodh...@uva.nl writes: jr Running OSOL nv130. Power off the machine, removed the F20 and jr power back on. Machines boots OK and comes up normally with jr the following message in 'zpool status': yeah, but try it again and this time put rpool on the F20 as well and try to import the pool from a LiveCD: if you lose zpool.cache at this stage, your pool is toast. /end repeat mode ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 7 Apr 2010, Erik Trimble wrote: One thing Richard or Bob might be able to answer better is the tradeoff between getting a cheap/small SSD for L2ARC and buying more RAM. That is, I don't have a good feel for whether (for your normal usage case), it would be better to get 8GB of more RAM, or buy something like a cheap 40-60GB SSD for use as an L2ARC (or some combinations of the two). SSDs in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will likely cost. If the storage is primarily used for single-user streamed video playback, data caching will have little value (data is accessed only once) and there may even be value to disabling data caching entirely (but cache metadata). The only useful data caching would be to support file prefetch. If data caching is disabled then the total RAM requirement may be reduced. If the storage will serve other purposes as well, then retaining the caching and buying more RAM is a wise idea. Zfs has a design weakness in that any substantial writes during streaming playback may temporarily interrupt (hiccup) the streaming playback. This weakness seems to be inherent to zfs, although there are tuning options to reduce its effect. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
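(One way to do what Bob describes, assuming a build recent enough to have the per-dataset cache properties, and with a hypothetical dataset name: keep metadata in the ARC but stop caching file data for the dataset that holds single-pass streaming media.)

# zfs set primarycache=metadata tank/media
# zfs get primarycache,secondarycache tank/media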
Re: [zfs-discuss] ZFS RaidZ recommendation
Thank you for the replies, guys! I was actually already planning to get another 4 gigs of RAM for the box right away anyway, but thank you for mentioning it! As there appear to be a couple of ways to skin the cat here, I think I am going to try both a 14-spindle RaidZ2 and a 2 x 7 RaidZ2 configuration and see what the performance is like. I have a few days of grace before I need to have this server ready for duty. Something I forgot to note in my original post is that the performance numbers I am concerned with are primarily going to be during reads. There could be, at any one point, 4 media players attempting to stream media from this server. The media players all have 100Mb interfaces, so as long as I can reliably stream 400Mb/s it should be OK (this is assuming all the media players were playing high-bitrate Blu-ray streams at one time). Any writing to this array would happen pretty infrequently, and I normally schedule any file transfers for the wee hours of the morning anyway. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 7 apr 2010, at 18.13, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >= 19 would be ... if you don't mind losing data written the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important consideration. But if you have either a system crash, or a failed log device, that don't happen at the same time, then your sync writes are safe, right up to the nanosecond. Using unmirrored nonvolatile log device on zpool >= 19. Right, but if you have a power or a hardware problem, chances are that more things really break at the same time, including the slog device(s). I'd say that "In zpool version 19 or greater, it is recommended not to mirror log devices." is not very good advice and should be changed. See above. Still disagree? If desired, I could clarify the statement, by basically pasting what's written above. I believe that for a mail server, NFS server (to be spec compliant), general purpose file server and the like, where the last written data is as important as older data (maybe even more), it would be wise to have at least as good redundancy on the slog as on the data disks. If one can stand the (pretty small) risk of losing the last transaction group before a crash, at the moment typically up to the last 30 seconds of changes, you may have less redundancy on the slog. (And if you don't care at all, like on a web cache perhaps, you could of course disable the zil altogether - that is kind of the other end of the scale, which puts this in perspective.) As Robert M so wisely and simply put it: "It is all about understanding a risk." I think the documentation should help people take educated decisions, though I am not right now sure how to put the words to describe this in an easily understandable way. /ragge ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
Hello, More for my own edification than to help Jason (sorry Jason!) I would like to clarify something. If read performance is paramount, am I correct in thinking RAIDZ is not the best way to go? Would not the ZFS equivalent of RAID 10 (striped mirror sets) offer better read performance? In this case, I realize that Jason also needs to maximize the space he has in order to store all of those legitimately copied Blu-Ray movies. ;-) Regards, Chris On Apr 7, 2010, at 3:09 PM, Jason S wrote: Thank you for the replies guys! I was actually already planning to get another 4 gigs of ram for the box right away anyway, but thank you for mentioning it! As there appears to be a couple ways to skin the cat here i think i am going to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance is like. I have a fews days of grace before i need to have this server ready for duty. Something i forgot to note in my original post is the performance numbers i am concerned with are going to be during reads primarily. There could be at any one point 4 media players attempting to stream media from this server. The media players all have 100mb interfaces so as long i can can reliable stream 400mb/s it should be ok (this is assuming all the media players were playing high bitrate Blueray streams at one time). Any writing to this array would happen pretty infrequently and i normally schedule any file transfers for the wee hours of the morning anyway. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
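(For comparison, the RAID-10-style layout Chris is asking about is simply a pool made of mirror vdevs; a sketch with hypothetical device names, pairing disks across the two controllers, extended in the same pattern for the remaining pairs.)

# zpool create tank \
    mirror c1t0d0 c2t0d0 \
    mirror c1t1d0 c2t1d0 \
    mirror c1t2d0 c2t2d0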
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
GreenBytes (USA) sells OpenSolaris based storage appliances Web site: www.getgreenbytes.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On 04/ 7/10 03:09 PM, Jason S wrote: I was actually already planning to get another 4 gigs of ram for the box right away anyway, but thank you for mentioning it! As there appears to be a couple ways to skin the cat here i think i am going to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance is like. I have a fews days of grace before i need to have this server ready for duty. Just curious, what are you planning to boot from? AFAIK you can't boot ZFS from anything much more complicated than a mirror. Cheers -- Frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
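(For the record, making the boot/root pool redundant is done by attaching a second device to rpool; the slice names are examples, and on x86 the GRUB boot blocks also have to be installed on the new disk by hand.)

# zpool attach rpool c4t0d0s0 c4t1d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t1d0s0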
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 7 Apr 2010, Chris Dunbar wrote: More for my own edification than to help Jason (sorry Jason!) I would like to clarify something. If read performance is paramount, am I correct in thinking RAIDZ is not the best way to go? Would not the ZFS equivalent of RAID 10 (striped mirror sets) offer better read performance? In this case, I realize that Jason also needs to Striped mirror vdevs are assured to offer peak performance. One would (naively) think that the striping in a raidz2 would allow it to offer more sequential performance, but zfs's sequential file prefetch allows mirrors to offer about the same level of sequential performance. With the mirror setup, 128K blocks are pulled from each disk whereas with the raidz setup, the 128K block is split across the drives constituting a vdev. Zfs is very good at ramping up prefetch for large sequential files. Due to this, raidz2 should be seen as a way to improve storage efficiency and data reliability, and not so much as a way to improve sequential performance. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 12:09 PM, Jason S j.sin...@shaw.ca wrote: I was actually already planning to get another 4 gigs of ram for the box right away anyway, but thank you for mentioning it! As there appears to be a couple ways to skin the cat here i think i am going to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance is like. I have a fews days of grace before i need to have this server ready for duty. Don't bother with the 14-drive raidz2. I can attest to just how horrible the performance of a single, large raidz2 vdev is: atrocious. Especially when it comes time to scrub or resilver. You'll end up thrashing all the disks, taking close to a week to resilver a dead drive (if you can actually get it to complete), and pulling your hair out in frustration. Our original configuration in our storage servers used a single 24-drive raidz2 vdev using 7200 RPM SATA drives. Worked, not well, but it worked ... until the first drive died. After 3 weeks, the resilver still hadn't finished, the backup processes weren't completing overnight due to the resilver process, and things just went downhill. We redid the pool using 3x raidz2 vdevs using 8 drives each, and things are much better. (If I had to re-do it again today, I'd use 4x raidz2 vdevs using 6 drives each.) The more vdevs you can add to a pool, the better the raw I/O performance of the pool will be. Go with lots of smaller vdevs. With 14 drives, play around with the following:

2x raidz2 vdevs using 7 drives each
3x raidz2 vdevs using 5 drives each (with two hot-spares, or a mirror vdev for root?)
4x raidz2 vdevs using 4 drives each (with one hot-spare, perhaps?)
4x raidz1 vdevs using 4 drives each (maybe not enough redundancy?)
5x mirror vdevs using 3 drives each (maybe too much lost space for redundancy?)
7x mirror vdevs using 2 drives each

You really need to decide which is more important: raw storage space or raw I/O throughput. They're almost (not quite, but almost) mutually exclusive. Something i forgot to note in my original post is the performance numbers i am concerned with are going to be during reads primarily. There could be at any one point 4 media players attempting to stream media from this server. The media players all have 100mb interfaces so as long i can can reliable stream 400mb/s it should be ok (this is assuming all the media players were playing high bitrate Blueray streams at one time). Any writing to this array would happen pretty infrequently and i normally schedule any file transfers for the wee hours of the morning anyway. -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 12:29 PM, Frank Middleton f.middle...@apogeect.comwrote: On 04/ 7/10 03:09 PM, Jason S wrote: I was actually already planning to get another 4 gigs of ram for the box right away anyway, but thank you for mentioning it! As there appears to be a couple ways to skin the cat here i think i am going to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance is like. I have a fews days of grace before i need to have this server ready for duty. Just curious, what are you planning to boot from? AFAIK you can't boot ZFS from anything much more complicated than a mirror. The OP mentioned OpenSolaris, so can't comment on what can/can't be booted from on that OS. However, FreeBSD 8 can boot from a mirror pool, a raidz1 pool, and a raidz2 pool. So it's not a limitation in ZFS itself. :) -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
Ahh, thank you for the reply Bob, that is the info I was after. It looks like I will be going with the 2 x 7 RaidZ2 option. And just to clarify, as far as expanding this pool in the future, my only option is to add another 7-spindle RaidZ2 array, correct? Thanks for all the help, guys! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
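(Expanding later would then be a plain 'zpool add' of another raidz2 vdev to the existing pool; pool and device names below are hypothetical.)

# zpool add tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0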
Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?
On Wed, Apr 7, 2010 at 2:20 PM, Jeremy Archer j4rc...@gmail.com wrote: GreenBytes (USA) sells OpenSolaris based storage appliances Web site: www.getgreenbytes.com http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Unless something has changed recently, they were using their own modified, and non-open-source version of ZFS. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
I am booting from a single 74gig WD raptor attached to the motherboards onboard SATA port. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7 at 12:41, Jason S wrote: And just to clarify as far as expanding this pool in the future my only option is to add another 7 spindle RaidZ2 array correct? That is correct, unless you want to use the -f option to force-allow an asymmetric expansion of your pool. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 2010-04-07 at 12:41 -0700, Jason S wrote: Ahh, Thank you for the reply Bob, that is the info i was after. It looks like i will be going with the 2 X 7 RaidZ2 option. And just to clarify as far as expanding this pool in the future my only option is to add another 7 spindle RaidZ2 array correct? Thanks for all the help guys ! You can add arbitrary-sized vdevs to a pool, but you can't add any drives to an existing raidz[123] vdev. You can even add things like a mirrored vdev to a pool consisting of several raidz[123] vdevs. :-) Thus, it would certainly be possible to add, say, a 4-drive raidz1 to your 2x7 pool. It wouldn't perform quite the same as a 3x7 pool, but it still would perform better than the 2x7 pool. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
I worked around the problem by first creating a filesystem of the same name with compression=gzip on the target server. Like this:

zfs create sas/archive
zfs set compression=gzip sas/archive

Then I used zfs receive with the -F option:

zfs send -vR promise1/arch...@daily.1 | zfs receive -vFd sas

And now I have gzip compression enabled locally:

zfs get compression sas/archive
NAME         PROPERTY     VALUE  SOURCE
sas/archive  compression  gzip   local

Not pretty, but it works. Daniel Bakken On Wed, Apr 7, 2010 at 12:51 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hi Daniel, I tried to reproduce this by sending from a b130 system to a s10u9 system, which vary in pool versions, but this shouldn't matter. I've been sending/receiving streams between latest build systems and older s10 systems for a long time. The zfs send -R option to send a recursive snapshot and all properties integrated into b77 so that isn't your problem either. The above works as expected. See below. I also couldn't find any recent bugs related to this, but bug searching is not an exact science. Mystified as well... Cindy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
Freddie, now you have brought up another question :) I had always assumed that I would just use OpenSolaris for this file server build, as I had not actually done any research in regard to other operating systems that support ZFS. Does anyone have any advice as to whether I should be considering FreeBSD instead of OpenSolaris? Both operating systems are somewhat foreign to me, as I come from the Windows domain with a little bit of Linux experience as well. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
The receive side is running build 111b (2009.06), so I'm not sure if your advice actually applies to my situation. Daniel Bakken On Tue, Apr 6, 2010 at 10:57 PM, Tom Erickson thomas.erick...@oracle.com wrote: After build 128, locally set properties override received properties, and this would be the expected behavior. In that case, the value was received and you can see it like this:

% zfs get -o all compression tank
NAME  PROPERTY     VALUE  RECEIVED  SOURCE
tank  compression  on     gzip      local
%

You could make the received value the effective value (clearing the local value) like this:

% zfs inherit -S compression tank
% zfs get -o all compression tank
NAME  PROPERTY     VALUE  RECEIVED  SOURCE
tank  compression  gzip   gzip      received
%

If the receive side is below the version that supports received properties, then I would expect the receive to set compression=gzip. After build 128 'zfs receive' prints an error message for every property it fails to set. Before that version, 'zfs receive' is silent when it fails to set a property so long as everything else is successful. I might check whether I have permission to set compression with 'zfs allow'. You could pipe the send stream to zstreamdump to verify that compression=gzip is in the send stream, but I think before build 125 you will not have zstreamdump. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Locking snapshots when using zfs send
I just bought a new set of disks, and want to move my primary data store over to the new disks. I created a new pool fine, and now I'm trying to use zfs send -R | zfs receive to transfer the data. Here's the error I got:

  $ pfexec zfs send -Rpv h...@next | pfexec zfs receive -duvF temp
  sending from @ to h...@sync
  receiving full stream of h...@sync into t...@sync
  sending from @sync to h...@zfs-auto-snap:frequent-2010-04-05-22:00
  warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-05-22:00': no such pool or dataset
  sending from @zfs-auto-snap:frequent-2010-04-05-22:00 to h...@zfs-auto-snap:frequent-2010-04-06-00:00
  warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-06-00:00': no such pool or dataset
  sending from @zfs-auto-snap:frequent-2010-04-06-00:00 to h...@zfs-auto-snap:frequent-2010-04-06-11:45
  warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-06-11:45': no such pool or dataset
  sending from @zfs-auto-snap:frequent-2010-04-06-11:45 to h...@next
  warning: cannot send 'h...@next': incremental source (@zfs-auto-snap:frequent-2010-04-06-11:45) does not exist
  cannot receive new filesystem stream: invalid backup stream

This process took about 12 hours to do, so it's frustrating that (apparently) snapshots disappearing causes the replication to fail. Perhaps some sort of locking should be implemented to prevent snapshots that will be needed from being destroyed. In the meantime, I disabled all the zfs/auto-snapshot* services. Should this be enough to prevent the send process from failing again, or are there other steps I should take? Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 1:22 PM, Jason S j.sin...@shaw.ca wrote: now you have brought up another question :) I had always assumed that i would just used open solaris for this file server build, as i had not actually done any research in regards to other operatin systems that support ZFS. Does anyone have any advice as to wether i should be considering FreeBSD instead of Open Solaris? Both operating systems are somewhat foriegn to me as i come from the windows domain with a little bit of linux experience as well. If you want access to the latest and greatest ZFS features as soon as they are available, you'll need to use OpenSolaris (currently at ZFSv22 or newer). If you don't mind waiting up to a year for new ZFS features, you can use FreeBSD (currently at ZFSv13 in 7.3 and 8.0). Hardware support for enterprise server gear may be better in OSol. Hardware support for general server gear should be about the same. Hardware support for desktop gear may be better in FreeBSD. Each has fancy software features that the other doesn't (GEOM, HAST, IPFW/PF, Jails, etc in FreeBSD; Zones, Crossbow, whatever that fancy admin framework is called, integrated iSCSI, integrated CIFS, etc in OSol). I'm biased toward FreeBSD, but that's because I've never used OSol. Anything is better than Linux. ;) -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 7 Apr 2010, Jason S wrote: systems that support ZFS. Does anyone have any advice as to wether i should be considering FreeBSD instead of Open Solaris? Both operating systems are somewhat foriegn to me as i come from the FreeBSD zfs does clearly work, although it is an older version of zfs (version 13) than comes with the latest Solaris 10 (version 15), or development OpenSolaris. Zfs is better integrated into Solaris than it is in FreeBSD since it was designed for Solaris. While I have not used FreeBSD zfs, my experience with Solaris 10 and FreeBSD is that Solaris 10 (and later) is an extremely feature-rich system which can take considerable time to figure out if you really want to use all of those features (but you don't have to). FreeBSD is simpler because it does not do as much. FreeBSD boots extremely fast. If your only interest is with zfs, my impression is that in a year or two it will not really matter if you are using Solaris or FreeBSD because FreeBSD will have an updated zfs (with deduplication) and will be more mature than it is now. Today zfs is more mature and stable in Solaris. Solaris NFS is clearly more mature and performant than in FreeBSD. OpenSolaris native CIFS is apparently quite a good performer. I find that Solaris 10 with Samba works well for me. Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite valuable in that it allows you to upgrade the OS without more than a few minutes of down-time and with a quick fall-back if things don't work as expected. It is more straightforward to update a FreeBSD install from source code because that is the way it is normally delivered. Sometimes this is useful in order to incorporate a fix as soon as possible without needing to wait for someone to produce binaries. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
Since I already have OpenSolaris installed on the box, I probably won't jump over to FreeBSD. However, someone has suggested that I look into www.nexenta.org, and I must say it is quite interesting. Someone correct me if I am wrong, but it looks like it is OpenSolaris-based and has basically everything I am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if the GUI is easy enough to use. Does anyone have any experience or pointers with this NAS software? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Locking snapshots when using zfs send
On Wed, Apr 7, 2010 at 1:32 PM, Will Murnane will.murn...@gmail.com wrote: This process took about 12 hours to do, so it's frustrating that (apparently) snapshots disappearing causes the replication to fail. Perhaps some sort of locking should be implemented to prevent snapshots that will be needed from being destroyed. What release of opensolaris are you using? Recent versions have the ability to place holds on snapshots, and doing a send will automatically place holds on the snapshots. zfs hold tank/foo/b...@now zfs release tank/foo/b...@now -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Locking snapshots when using zfs send
On Wed, Apr 7, 2010 at 17:51, Brandon High bh...@freaks.com wrote: On Wed, Apr 7, 2010 at 1:32 PM, Will Murnane will.murn...@gmail.com wrote: This process took about 12 hours to do, so it's frustrating that (apparently) snapshots disappearing causes the replication to fail. Perhaps some sort of locking should be implemented to prevent snapshots that will be needed from being destroyed. What release of opensolaris are you using? Recent versions have the ability to place holds on snapshots, and doing a send will automatically place holds on the snapshots. This is on b134:

  $ pfexec pkg image-update
  No updates available for this image.

There is a zfs hold command available, but checking for holds on the snapshot I'm trying to send (I started it again, to see if disabling automatic snapshots helped) doesn't show anything:

  $ zfs holds -r h...@next
  $ echo $?
  0

and applying a recursive hold to that snapshot doesn't seem to hold all its children:

  $ pfexec zfs hold -r keep h...@next
  $ zfs holds -r h...@next
  NAME                   TAG   TIMESTAMP
  huge/homes/d...@next   keep  Wed Apr 7 18:02:09 2010
  h...@next              keep  Wed Apr 7 18:02:09 2010
  $ zfs list -r -t all huge | grep next
  h...@next                 204K  -  2.80T  -
  huge/back...@next            0  -  42.0K  -
  huge/ho...@next              0  -  42.9M  -
  huge/homes/cnl...@next   59.9K  -   165G  -
  huge/homes/d...@next         0  -  42.0K  -
  huge/homes/svnb...@next      0  -  46.4M  -
  huge/homes/w...@next     23.9M  -  95.7G  -

Suggestions? Comments? Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote: Since i already have Open Solaris installed on the box, i probably wont jump over to FreeBSD. However someone has suggested to me to look into www.nexenta.org and i must say it is quite interesting. Someone correct me if i am wrong but it looks like it is Open Solaris based and has basically everything i am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if this GUI is easy enough to use. Does anyone have any experience or pointers with this NAS software? -- This message posted from opensolaris.org _ I wouldn't waste your time. My last go round lacp was completely broken for no apparent reason. The community is basically non-existent. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Apr 7, 2010, at 16:47, Bob Friesenhahn wrote: Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite valuable in that it allows you to upgrade the OS without more than a few minutes of down-time and with a quick fall-back if things don't work as expected. It is more straightforward to update a FreeBSD install from source code because that is the way it is normally delivered. Sometimes this is useful in order to incorporate a fix as soon as possible without needing to wait for someone to produce binaries. If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc It's not as comprehensive as FreeBSD Ports (21,500 and counting), but it has the major stuff and is quite good. I'd also look into the FreeBSD Handbook: http://freebsd.org/handbook ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
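If you do go the pkgsrc route on (Open)Solaris, getting started is roughly a matter of bootstrapping it from the source tree. A minimal sketch, with an illustrative prefix and example package (adjust paths to taste):

  # bootstrap pkgsrc into /usr/pkg (sketch)
  cd /usr/pkgsrc/bootstrap
  ./bootstrap --prefix=/usr/pkg
  # then build something from the tree with the bootstrapped bmake
  cd /usr/pkgsrc/net/rsync
  /usr/pkg/bin/bmake install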
Re: [zfs-discuss] compression property not received
Here is the info from zstreamdump -v on the sending side:

  BEGIN record
    hdrtype = 2
    features = 0
    magic = 2f5bacbac
    creation_time = 0
    type = 0
    flags = 0x0
    toguid = 0
    fromguid = 0
    toname = promise1/arch...@daily.1
  nvlist version: 0
    tosnap = daily.1
    fss = (embedded nvlist)
      nvlist version: 0
      0xcfde021e56c8fc = (embedded nvlist)
        nvlist version: 0
        name = promise1/archive
        parentfromsnap = 0x0
        props = (embedded nvlist)
          nvlist version: 0
          mountpoint = /promise1/archive
          compression = 0xa
          dedup = 0x2
        (end props)

I assume that compression = 0xa means gzip. I wonder if the dedup property is causing the receiver (build 111b) to disregard all other properties, since the receiver doesn't support dedup. Dedup was enabled in the past on the sending filesystem, but is now disabled for reasons of sanity. I'd like to try the dtrace debugging, but it would destroy the progress I've made so far transferring the filesystem. Thanks, Daniel On Wed, Apr 7, 2010 at 12:52 AM, Tom Erickson thomas.erick...@oracle.com wrote: The advice regarding received vs local properties definitely does not apply. You could still confirm the presence of the compression property in the send stream with zstreamdump, since the send side is running build 129. To debug the receive side I might dtrace the zap_update() function with the fbt provider, something like zfs send -R promise1/arch...@daily.1 | dtrace -c 'zfs receive -vd sas' \ -n 'fbt::zap_update:entry / stringof(args[2]) == compression || \ stringof(args[2]) == compression$recvd / { self-trace = 1; }' \ -n 'fbt::zap_update:return / self-trace / { trace(args[1]); \ self-trace = 0; }' and look for non-zero return values. I'd also redirect 'zdb -vvv poolname' to a file and search it for compression to check the value in the ZAP. I assume you have permission to set the compression property on the receive side, but I'd check anyway. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Apr 7, 2010, at 3:24 PM, Tim Cook wrote: On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote: Since i already have Open Solaris installed on the box, i probably wont jump over to FreeBSD. However someone has suggested to me to look into www.nexenta.org and i must say it is quite interesting. Someone correct me if i am wrong but it looks like it is Open Solaris based and has basically everything i am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if this GUI is easy enough to use. Does anyone have any experience or pointers with this NAS software? -- This message posted from opensolaris.org _ I wouldn't waste your time. My last go round lacp was completely broken for no apparent reason. The community is basically non-existent. [richard pinches himself... yep, still there :-)] NexentaStor version 3.0 is based on b134 so it has the same basic foundation as the yet-unreleased OpenSolaris 2010.next. For an easy-to-use NAS box for the masses, it is much more friendly and usable than a basic OpenSolaris or Solaris 10 release. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, 7 Apr 2010, David Magda wrote: It is more straightforward to update a FreeBSD install from source code because that is the way it is normally delivered. Sometimes this is useful in order to incorporate a fix as soon as possible without needing to wait for someone to produce binaries. If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc But this does not update the OS kernel. It is for application packages. I did have to apply a source patch to the FreeBSD kernel the last time around. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 4:27 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Wed, 7 Apr 2010, David Magda wrote: It is more straightforward to update a FreeBSD install from source code because that is the way it is normally delivered. Sometimes this is useful in order to incorporate a fix as soon as possible without needing to wait for someone to produce binaries. If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc But this does not update the OS kernel. It is for application packages. I did have to apply a source patch to the FreeBSD kernel the last time around. This is getting a bit off-topic regarding ZFS, but you only need to patch the FreeBSD kernel if you don't want to wait for an errata/security notice to be made. If you can wait, then you can just use the freebsd-update tool to do a binary update of just the affected (files), or even to the next (major or minor) release. Not sure what the equivalent process would be on (Open)Solaris (or if you even can do a patch/source update). However, I believe the mention of pkgsrc was for use on OSol. There's very little reason to use pkgsrc on FreeBSD. -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
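For reference, the binary-update path Freddie mentions looks roughly like this on FreeBSD; the release number is only an example:

  # patch the currently installed release from binaries
  freebsd-update fetch
  freebsd-update install
  # or move to a newer release entirely
  freebsd-update -r 8.0-RELEASE upgrade
  freebsd-update install
  # (a release upgrade involves a reboot and re-running install afterwards)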
Re: [zfs-discuss] ZFS RaidZ recommendation
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Dunbar like to clarify something. If read performance is paramount, am I correct in thinking RAIDZ is not the best way to go? Would not the ZFS equivalent of RAID 10 (striped mirror sets) offer better read performance? In this case, I realize that Jason also needs to maximize the space he has in order to store all of those legitimately copied Blu-Ray movies. ;-) During my testing, for sequential reads using 6 disks, I got these numbers (normalized as a multiple of single-disk performance):

  Stripe of 3 mirrors:  10.89
  Raidz, 6 disks:        9.84
  Raidz2, 6 disks:       7.17

Any number greater than 2 would max out a GigE link. The main performance advantage of the stripe of mirrors is in random reads, which aren't very significant for Jason's case. http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of David Magda If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc Am I mistaken? I thought pkgsrc was for netbsd. For solaris/opensolaris, I would normally say opencsw or blastwave. (And in some circumstances, sunfreeware.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
We have found the problem. The mountpoint property on the sender was at one time changed from the default, then later changed back to defaults using zfs set instead of zfs inherit. Therefore, zfs send included these local non-default properties in the stream, even though the local properties are effectively set at defaults. This caused the receiver to stop processing subsequent properties in the stream because the mountpoint isn't valid on the receiver. I tested this theory with a spare zpool. First I used zfs inherit mountpoint promise1/archive to remove the local setting (which was exactly the same value as the default). This time the compression=gzip property was correctly received. It seems like a bug to me that one failed property in a stream prevents the rest from being applied. I should have used zfs inherit, but it would be best if zfs receive handled failures more gracefully, and attempted to set as many properties as possible. Thanks to Cindy and Tom for their help. Daniel On Wed, Apr 7, 2010 at 2:31 AM, Tom Erickson thomas.erick...@oracle.comwrote: Now I remember that 'zfs receive' used to give up after the first property it failed to set. If I'm remembering correctly, then, in this case, if the mountpoint was invalid on the receive side, 'zfs receive' would not even try to set the remaining properties. I'd try the following in the source dataset: zfs inherit mountpoint promise1/archive to clear the explicit mountpoint and prevent it from being included in the send stream. Later set it back the way it was. (Soon there will be an option to take care of that; see CR 6883722 want 'zfs recv -o prop=value' to set initial property values of received dataset.) Then see if you receive the compression property successfully. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
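A quick way to spot this sort of thing before sending is to list only the properties with a 'local' source on the sender; anything shown there will ride along in a zfs send -R stream even if its value happens to match the default. A sketch, reusing the dataset name from the thread (the output will vary):

  # show explicitly set properties on the source filesystem
  zfs get -s local all promise1/archive
  # clear a local setting that merely duplicates the default
  zfs inherit mountpoint promise1/archive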
Re: [zfs-discuss] compression property not received
Daniel Bakken wrote: When I send a filesystem with compression=gzip to another server with compression=on, compression=gzip is not set on the received filesystem. I am using: zfs send -R promise1/arch...@daily.1 | zfs receive -vd sas The zfs manpage says regarding the -R flag: When received, all properties, snapshots, descendent file systems, and clones are preserved. Snapshots are preserved, but the compression property is not. Any ideas why this doesn't work as advertised?

After build 128, locally set properties override received properties, and this would be the expected behavior. In that case, the value was received and you can see it like this:

  % zfs get -o all compression tank
  NAME  PROPERTY     VALUE  RECEIVED  SOURCE
  tank  compression  on     gzip      local
  %

You could make the received value the effective value (clearing the local value) like this:

  % zfs inherit -S compression tank
  % zfs get -o all compression tank
  NAME  PROPERTY     VALUE  RECEIVED  SOURCE
  tank  compression  gzip   gzip      received
  %

If the receive side is below the version that supports received properties, then I would expect the receive to set compression=gzip. After build 128 'zfs receive' prints an error message for every property it fails to set. Before that version, 'zfs receive' is silent when it fails to set a property so long as everything else is successful. I might check whether I have permission to set compression with 'zfs allow'. You could pipe the send stream to zstreamdump to verify that compression=gzip is in the send stream, but I think before build 125 you will not have zstreamdump. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
Daniel Bakken wrote: The receive side is running build 111b (2009.06), so I'm not sure if your advice actually applies to my situation. The advice regarding received vs local properties definitely does not apply. You could still confirm the presence of the compression property in the send stream with zstreamdump, since the send side is running build 129. To debug the receive side I might dtrace the zap_update() function with the fbt provider, something like zfs send -R promise1/arch...@daily.1 | dtrace -c 'zfs receive -vd sas' \ -n 'fbt::zap_update:entry / stringof(args[2]) == compression || \ stringof(args[2]) == compression$recvd / { self-trace = 1; }' \ -n 'fbt::zap_update:return / self-trace / { trace(args[1]); \ self-trace = 0; }' and look for non-zero return values. I'd also redirect 'zdb -vvv poolname' to a file and search it for compression to check the value in the ZAP. I assume you have permission to set the compression property on the receive side, but I'd check anyway. Tom On Tue, Apr 6, 2010 at 10:57 PM, Tom Erickson thomas.erick...@oracle.com mailto:thomas.erick...@oracle.com wrote: After build 128, locally set properties override received properties, and this would be the expected behavior. In that case, the value was received and you can see it like this: % zfs get -o all compression tank NAME PROPERTY VALUE RECEIVED SOURCE tank compression ongzip local % You could make the received value the effective value (clearing the local value) like this: % zfs inherit -S compression tank % zfs get -o all compression tank NAME PROPERTY VALUE RECEIVED SOURCE tank compression gzip gzip received % If the receive side is below the version that supports received properties, then I would expect the receive to set compression=gzip. After build 128 'zfs receive' prints an error message for every property it fails to set. Before that version, 'zfs receive' is silent when it fails to set a property so long as everything else is successful. I might check whether I have permission to set compression with 'zfs allow'. You could pipe the send stream to zstreamdump to verify that compression=gzip is in the send stream, but I think before build 125 you will not have zstreamdump. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Locking snapshots when using zfs send
On Apr 7, 2010, at 5:06 PM, Will Murnane wrote: This is on b134: $ pfexec pkg image-update No updates available for this image. There is a zfs hold command available, but checking for holds on the snapshot I'm trying to send (I started it again, to see if disabling automatic snapshots helped) doesn't show anything: $ zfs holds -r h...@next $ echo $? 0 and applying a recursive hold to that snapshot doesn't seem to hold all its children: $ pfexec zfs hold -r keep h...@next Hmm, I made a number of fixes in build 132 related to destroying snapshots while sending replication streams. I'm unable to reproduce the 'zfs holds -r' issue on build 133. I'll try build 134, but I'm not aware of any changes in that area. -Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
Daniel Bakken wrote: Here is the info from zstreamdump -v on the sending side: BEGIN record hdrtype = 2 features = 0 magic = 2f5bacbac creation_time = 0 type = 0 flags = 0x0 toguid = 0 fromguid = 0 toname = promise1/arch...@daily.1 nvlist version: 0 tosnap = daily.1 fss = (embedded nvlist) nvlist version: 0 0xcfde021e56c8fc = (embedded nvlist) nvlist version: 0 name = promise1/archive parentfromsnap = 0x0 props = (embedded nvlist) nvlist version: 0 mountpoint = /promise1/archive compression = 0xa dedup = 0x2 (end props) I assume that compression = 0xa means gzip. Yep, that's ZIO_COMPRESS_GZIP_6, the default gzip. I wonder if the dedup property is causing the receiver (build 111b) to disregard all other properties, since the receiver doesn't support dedup. Dedup was enabled in the past on the sending filesystem, but is now disabled for reasons of sanity. Now I remember that 'zfs receive' used to give up after the first property it failed to set. If I'm remembering correctly, then, in this case, if the mountpoint was invalid on the receive side, 'zfs receive' would not even try to set the remaining properties. You could try 'zfs get mountpoint' (or 'zdb -vvv poolname file' and search the file for 'mountpoint') to see if that was set. I'd like to try the dtrace debugging, but it would destroy the progress I've made so far transferring the filesystem. Maybe you could try receiving into a new pool that you can later throw away. zpool create bogustestpool c0t0d0 zfs send -R promise1/arch...@daily.1 | zfs receive -vd bogustestpool I'd try the following in the source dataset: zfs inherit mountpoint promise1/archive to clear the explicit mountpoint and prevent it from being included in the send stream. Later set it back the way it was. (Soon there will be an option to take care of that; see CR 6883722 want 'zfs recv -o prop=value' to set initial property values of received dataset.) Then see if you receive the compression property successfully. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression property not received
Daniel Bakken wrote: We have found the problem. The mountpoint property on the sender was at one time changed from the default, then later changed back to defaults using zfs set instead of zfs inherit. Therefore, zfs send included these local non-default properties in the stream, even though the local properties are effectively set at defaults. This caused the receiver to stop processing subsequent properties in the stream because the mountpoint isn't valid on the receiver. I tested this theory with a spare zpool. First I used zfs inherit mountpoint promise1/archive to remove the local setting (which was exactly the same value as the default). This time the compression=gzip property was correctly received. It seems like a bug to me that one failed property in a stream prevents the rest from being applied. I should have used zfs inherit, but it would be best if zfs receive handled failures more gracefully, and attempted to set as many properties as possible. Yes, that was fixed in build 128. Thanks to Cindy and Tom for their help. Glad to hear we identified the problem. Sorry for the trouble. Tom ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Apr 7, 2010, at 19:58, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Magda If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc Am I mistaken? I thought pkgsrc was for netbsd. For solaris/opensolaris, I would normally say opencsw or blastwave. (And in some circumstances, sunfreeware.) It was originally created by the NetBSD project (as a fork of the FreeBSD Ports system), but like everything else they seem to do, it's multi-platform: BSDs, Linux, Darwin/OS X, IRIX, AIX, Interix, QNX, HP-UX, and Solaris. AFAIK you can do cross-compiling as well (i.e., use Pkgsrc on Linux/AMD to compile a package for IRIX/MIPS). Pkgsrc currently has 9500 packages; Blastwave 4500; OpenCSW about 2300 AFAICT; FreeBSD Ports, 21500. YMMV. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 5:59 PM, Richard Elling richard.ell...@gmail.comwrote: On Apr 7, 2010, at 3:24 PM, Tim Cook wrote: On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote: Since i already have Open Solaris installed on the box, i probably wont jump over to FreeBSD. However someone has suggested to me to look into www.nexenta.org and i must say it is quite interesting. Someone correct me if i am wrong but it looks like it is Open Solaris based and has basically everything i am looking for (NFS and CIFS sharing). I am downloading it right now and am going to install it on another machine to see if this GUI is easy enough to use. Does anyone have any experience or pointers with this NAS software? -- This message posted from opensolaris.org _ I wouldn't waste your time. My last go round lacp was completely broken for no apparent reason. The community is basically non-existent. [richard pinches himself... yep, still there :-)] NexentaStor version 3.0 is based on b134 so it has the same basic foundation as the yet-unreleased OpenSolaris 2010.next. For an easy-to-use NAS box for the masses, it is much more friendly and usable than a basic OpenSolaris or Solaris 10 release. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com **Unless of course you were looking for any community support or basic LACP functionality. ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
On Wed, Apr 7, 2010 at 4:58 PM, Edward Ned Harvey solar...@nedharvey.comwrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of David Magda If you're going to go with (Open)Solaris, the OP may also want to look into the multi-platform pkgsrc for third-party open source software: http://www.pkgsrc.org/ http://en.wikipedia.org/wiki/Pkgsrc Am I mistaken? I thought pkgsrc was for netbsd. For solaris/opensolaris, I would normally say opencsw or blastwave. (And in some circumstances, sunfreeware.) pkgsrc is available for several Unix-like systems. NetBSD is just the origin of it, and the main development environment. It's even available for MacOSX, DragonFlyBSD, Linux distros, and more. -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RaidZ recommendation
Go with the 2x7 raidz2. When you start to really run out of space, replace the drives with bigger ones. You will run out of space eventually regardless; this way you can replace 7 at a time, not 14 at a time. With luck, each replacement will last you long enough that the next replacement will come when the next generation of drive sizes is at the price sweet-spot. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
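The replace-in-place growth Dan describes might look something like the following; the device names are hypothetical, and on builds that have the autoexpand pool property you turn it on so the vdev grows once every member disk is larger:

  # swap one disk at a time and let each resilver finish before the next
  zpool replace tank c1t5d0 c2t5d0   # old device, new device
  zpool replace tank c1t5d0          # or just the old device, if the larger disk takes its slot
  zpool status tank                  # watch the resilver
  # after all 7 disks in the raidz2 vdev have been replaced with larger ones
  zpool set autoexpand=on tank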