Re: [zfs-discuss] Sonnet Tempo SSD supported?
On Mon, Dec 03, 2012 at 06:28:17PM -0500, Peter Tripp wrote:

> Hi Eugen,
>
> Whether it's compatible entirely depends on the chipset of the SATA
> controller.

This is what I was trying to find out. I guess I just have to test it
empirically.

> Basically that card is just a dual-port 6 Gbps PCIe SATA controller with
> the space to mount one ($149) or two ($299) 2.5-inch disks. Sonnet, a
> Mac-focused company, offers it as a way to better utilize existing Mac
> Pros already in the field without an external box. Mac Pros only have
> 3 Gbps SATA2 and a 4x3.5-inch drive backplane, but nearly all have a free
> full-length PCIe slot. This product only makes sense if you're trying to
> run OpenIndiana on a Mac Pro, which in my experience is more trouble than
> it's worth, but to each their own I guess.

My application is to stick two SSDs into a SunFire X2100 M2 without
resorting to splicing into power cables and mounting the SSDs in some
random location with double-sided sticky tape. Depending on hardware
support I'll run either OpenIndiana or Linux with a ZFS hybrid pool
(2x SATA drives as a mirrored pool).

> If you can confirm the chipset you might get lucky and have it be a
> supported chip. The big chip is labelled PLX, but I can't read the
> markings, and I wasn't aware PLX made any PCIe SATA controllers (PCIe
> and USB/SATA bridges, sure, but not straight controllers), so that may
> not even be the chip we care about.

http://www.profil-marketing.com/uploads/tx_lipresscenter/Sonnet_Tempo_SSD_Pro_01.jpg

Either way I'll know the hardware support situation soon enough.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How can I copy a ZFS filesystem back and forth
Thanks for the help Chris!

Cheers,
Fritz

You wrote:

> > original and rename the new one, or zfs send or ?? Can I do a send and
> > receive into a filesystem with attributes set as I want, or does the
> > receive keep the same attributes as well? Thank you.
>
> That will work. Just create the new filesystem with the attributes you
> want and send/recv the latest snapshot. As the data is received, the
> gzip compression will be applied. Since the new filesystem already
> exists, you will have to do a zfs receive -Fv to force it.
>
> --chris
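For the archives, a minimal sketch of the procedure Chris describes; the pool and dataset names (tank/old, tank/new) are made up for illustration:

```shell
# Create the target filesystem with the properties you want, e.g. gzip:
zfs create -o compression=gzip tank/new

# Snapshot the source and send it into the pre-existing target; -F forces
# the receive to roll back/overwrite the existing filesystem, -v is verbose:
zfs snapshot tank/old@migrate
zfs send tank/old@migrate | zfs receive -Fv tank/new
```

Since a plain (non -p) send does not carry dataset properties, the received data is written through the target's own settings, so the gzip compression is applied as the stream lands.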
Re: [zfs-discuss] Sonnet Tempo SSD supported?
On Dec 4, 2012, Eugen Leitl wrote:

> Either way I'll know the hardware support situation soon enough.

Have you tried contacting Sonnet?

-Gary
Re: [zfs-discuss] Sonnet Tempo SSD supported?
On Tue, Dec 04, 2012 at 03:38:07AM -0800, Gary Driggs wrote:

> On Dec 4, 2012, Eugen Leitl wrote:
>
> > Either way I'll know the hardware support situation soon enough.
>
> Have you tried contacting Sonnet?

No, but I did some digging. It *might* be a Marvell 88SX7042, which would
then be supported by Linux, but not by Solaris:

http://www.nexentastor.org/boards/1/topics/2383
Re: [zfs-discuss] Sonnet Tempo SSD supported?
On Tue, Dec 04, 2012 at 11:07:17AM +0100, Eugen Leitl wrote:

> On Mon, Dec 03, 2012 at 06:28:17PM -0500, Peter Tripp wrote:
>
> > If you can confirm the chipset you might get lucky and have it be a
> > supported chip. The big chip is labelled PLX, but I can't read the
> > markings, and I wasn't aware PLX made any PCIe SATA controllers (PCIe
> > and USB/SATA bridges, sure, but not straight controllers), so that may
> > not even be the chip we care about.
> >
> > http://www.profil-marketing.com/uploads/tx_lipresscenter/Sonnet_Tempo_SSD_Pro_01.jpg

I see a Marvell 88SE9182 on that Sonnet.
Re: [zfs-discuss] ZFS QoS and priorities
On Nov 29, 2012, at 1:56 AM, Jim Klimov jimkli...@cos.ru wrote:

> I've heard a claim that ZFS relies too much on RAM caching, but
> implements no sort of priorities (indeed, I've seen no knobs to tune
> those) - so that if the storage box receives many different types of IO
> requests with different administrative weights in the view of admins, it
> can not really throttle some IOs to boost others, when such IOs have to
> hit the pool's spindles.

Caching has nothing to do with QoS in this context. *All* modern
filesystems cache to RAM; otherwise they are unusable.

> For example, I might want to have corporate webshop-related databases
> and appservers be the fastest storage citizens, then some corporate CRM
> and email, then various lower-priority zones and VMs, and at the bottom
> of the list - backups.

Please read the papers on the ARC and how it deals with MFU and MRU cache
types. You can adjust these policies using the primarycache and
secondarycache properties at the dataset level.

> AFAIK, such requests would now hit the ARC, then the disks if needed -
> in no particular order. Well, can the order be made particular with the
> current ZFS architecture, i.e. by setting some datasets to have a
> certain NICEness or another priority mechanism?

ZFS has a priority-based I/O scheduler that works at the DMU level.
However, there is no system call interface in UNIX that transfers
priority or QoS information (e.g. read() or write()) into the file system
VFS interface. So the granularity of priority control is by zone or
dataset.

-- richard

richard.ell...@richardelling.com
+1-760-896-4422
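For what it's worth, the per-dataset cache policy knobs Richard mentions can be set like this (the pool and dataset names are hypothetical):

```shell
# Cache only metadata (not file data) in RAM for the backup dataset:
zfs set primarycache=metadata tank/backups

# Keep the low-priority zones out of the L2ARC entirely:
zfs set secondarycache=none tank/zones

# Valid values for both properties are: all | metadata | none
zfs get primarycache,secondarycache tank/backups tank/zones
```

This only controls what the ARC/L2ARC will hold per dataset; it is not an I/O priority mechanism in the sense Jim is asking about.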
Re: [zfs-discuss] Digging in the bowels of ZFS
On 2012-12-03 18:23, Jim Klimov wrote:

> On 2012-12-02 05:42, Jim Klimov wrote:
>
> > So... here are some applied questions:
>
> Well, I am ready to reply to a few of my own questions now :)

Continuing the desecration of my deceased files' resting grounds...

> 2) Do I understand correctly that for the offset definition, sectors in
> a top-level VDEV (which is all of my pool) are numbered in rows per
> component disk? Like this:
>
>    0  1  2  3  4  5
>    6  7  8  9 10 11 ...
>
> That is, offset % setsize = disknum? If true, does such a numbering
> scheme apply all over the TLVDEV, so that for my block on a 6-disk
> raidz2 set its sectors start at (roughly rounded) offset_from_DVA / 6
> on each disk, right?
>
> 3) Then, if I read the ZFS on-disk spec correctly, the sectors of the
> first disk holding anything from this block would contain the raid-algo1
> permutations of the four data sectors, the sectors of the second disk
> would contain the raid-algo2 for those 4 sectors, and the remaining 4
> disks would contain the data sectors?

My understanding was correct. For posterity: in the earlier set-up
example I had an uncompressed 128 KB block residing at the address
DVA[0]=0:590002c1000:3. Counting in my disks' 4 KB sectors, this is
0x590002c1000 / 0x1000 = 0x590002C1 = 1493172929, the logical offset into
TLVDEV number 0 (the only one in this pool). Given that this TLVDEV is a
6-disk raidz2 set, the expected offset on each component drive is
1493172929 / 6 = 248862154 remainder 5, starting after the ZFS header
(two labels and a reservation, amounting to 4 MB = 1024 4 KB sectors). So
this block's allocation covers eight 4 KB sectors per disk, starting at
248862154 + 1024 on disk 5 and at 248862155 + 1024 on disks 0, 1, 2, 3
and 4.
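The arithmetic above can be sketched as follows; a minimal model assuming a 6-disk raidz2 TLVDEV with 4 KB sectors (ashift=12) and the standard 4 MB front label/reservation area:

```python
NDISKS = 6
SECTOR = 0x1000                              # 4 KB sectors (ashift=12)
LABEL_SECTORS = 4 * 1024 * 1024 // SECTOR    # 4 MB of labels/reservation

def dva_to_disk_offsets(dva_offset_bytes, nsectors):
    """Map a DVA byte offset to (disk, per-disk sector) pairs, using the
    row-major sector numbering across the top-level vdev described above."""
    first = dva_offset_bytes // SECTOR       # logical sector in the TLVDEV
    out = []
    for i in range(nsectors):
        logical = first + i
        disk = logical % NDISKS              # offset % setsize = disknum
        row = logical // NDISKS              # row on that component disk
        out.append((disk, row + LABEL_SECTORS))
    return out

# The 128 KB block from the post: DVA byte offset 0x590002c1000, whose
# allocation begins on disk 5 at sector 248862154 + 1024:
layout = dva_to_disk_offsets(0x590002c1000, 8)
print(layout[0])   # -> (5, 248863178)
```

This reproduces the numbers in the post: the first sector lands on disk 5 at row 248862154 (plus the 1024-sector label area), the next five on disks 0-4 at row 248862155.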
As my further tests showed, the sector columns (not rows, as I had
expected after reading the docs) from disks 1, 2, 3 and 4 do recombine
into the original userdata (the sha256 checksum matches), so disks 5 and
0 should hold the two parities, however those are calculated:

  # for D in 1 2 3 4; do
      dd bs=4096 count=8 conv=noerror,sync \
        if=/dev/dsk/c7t${D}d0s0 of=b1d${D}.img skip=248863179
    done

  # for D in 1 2 3 4; do
      for R in 0 1 2 3 4 5 6 7; do
        dd if=/pool/test3/b1d${D}.img bs=4096 skip=$R count=1
      done
    done > /tmp/d

Note that the latter can be greatly simplified as cat, which works to the
same effect and is faster:

  # cat /pool/test3/b1d?.img > /tmp/d

However, I kept the more detailed notation to use in experiments later on.

That is, the original 128 KB block was cut into 4 pieces (my 4 data
drives in the 6-disk raidz2 set), and each 32 KB strip was stored on a
separate drive. Nice descriptive pictures in some presentations had
suggested to me that the original block is stored sector by sector,
rotating onto the next disk - the set of 4 data sectors with 2 parity
sectors in my case being a single stripe for RAID purposes. That directly
suggested that incomplete stripes, such as the ends of files or whole
small files, would still have the two parity sectors plus a handful of
data sectors. Reality differs.

For undersized allocations, i.e. of compressed data, it is possible to
see P-sizes not divisible by 4 (data disks) in 4 KB sectors; however,
some sectors apparently do get wasted, because the A-size in the DVA is
divisible by 6*4 KB. With columnar allocation across disks it is easier
to see why full stripes have to be used:

  p1 p2 d1 d2 d3 d4
   .  ,  1  5  9 13
   .  ,  2  6 10 14
   .  ,  3  7 11  x
   .  ,  4  8 12  x

In this illustration a 14-sector-long block is saved, with x being the
empty leftovers, on which we can't really save (as would be the case with
the other allocation, which is likely less efficient for CPU and IOs).
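On the "how are the parities calculated" question: the first RAID-Z parity column is a plain bytewise XOR of the data columns (the second, raidz2 parity uses Galois-field arithmetic and is not reproduced here). A minimal sketch of checking and using such a parity column; the toy data stands in for the b1dN.img sector images:

```python
def xor_columns(columns):
    """Bytewise XOR of equally-sized byte strings."""
    out = bytearray(len(columns[0]))
    for col in columns:
        for i, b in enumerate(col):
            out[i] ^= b
    return bytes(out)

# Toy demonstration with 4 "data disks" of one tiny sector each:
d = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]),
     bytes([9, 10, 11, 12]), bytes([13, 14, 15, 16])]

# First parity column is the XOR of all data columns:
p1 = xor_columns(d)

# XOR parity lets us rebuild any single missing data column from the
# parity plus the surviving columns:
rebuilt_d2 = xor_columns([p1, d[0], d[1], d[3]])
assert rebuilt_d2 == d[2]
```

Against real sector images, one would XOR the four data-column files and compare the result to the suspected p1 column read from disk; recovering a second lost column requires the Reed-Solomon style second parity.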
The metadata blocks do have A-sizes of 0x3000 (2 parity + 1 data) at
least, which on 4 KB-sectored disks is still quite a lot for these
miniature data objects - but not as sad as 6*4 KB would have been ;)

It also seems that the instinctive desire to have raidzN sets of 4*M+N
disks (i.e. 6-disk raidz2, 11-disk raidz3, etc.), which was discussed
over and over on the list a couple of years ago, may still be valid with
typical block sizes being powers of two... Even though gurus said that
this should not matter much. For IOPS - maybe not. For wasted space -
likely...

I'm almost ready to go and test Q2 and Q3; however, the questions which
regard usable tools (and what data should be fed into such tools?) are
still on the table. Some OLD questions remain raised, just in case
anyone answers them:

3b) The redundancy algos should in fact cover the other redundancy disks
too (in order to sustain the loss of any 2 disks), correct? (...)

4) Where are the redundancy algorithms specified? Is there any simple
tool that would recombine a given algo-N redundancy sector with some
other 4 sectors from a 6-sector stripe in order to try and recalculate
the sixth sector's contents? (Perhaps part of some unit tests?)

7) Is there a command-line tool to do lzjb compressions and