Re: [OpenIndiana-discuss] slicing a disk for ZIL
From: Jonathan Adams [mailto:t12nsloo...@gmail.com]
> I would suggest that on smaller systems you wouldn't bother messing
> with the ZIL :)

It doesn't matter how big or small the system is; it matters how the
system is used. If you have a small database server and some database
clients on the LAN, and you disable the ZIL on the server, you are
begging for data loss. The same goes for an NFS server, or any other
stateful service.
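For the record, on ZFS versions that have the sync property,
"disabling the ZIL" is a per-dataset setting rather than a kernel
tunable - a minimal sketch, with a hypothetical pool/dataset name:

  # zfs set sync=disabled tank/db    (sync writes are now acknowledged
                                      before reaching stable storage -
                                      the data-loss scenario above)
  # zfs set sync=standard tank/db    (restores the default behaviour)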
Re: [OpenIndiana-discuss] slicing a disk for ZIL
From: Sebastian Gabler [mailto:sequoiamo...@gmx.net]
> As far as I am aware, the loss of the ZIL is no longer fatal for the
> pool in current zfs versions. As well, even with a corrupted ZIL,
> the pool should be recoverable. Hence, I was so far not worried too
> much. Should I better be?

If you lose your log device (and the system crashes) on a current
system, you don't lose your whole pool. But you *do* lose the last few
seconds of sync writes on that pool. So as long as you can afford that
loss, AND you don't have any clients depending on the state of the
server to remain consistent, then it's OK.
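A hedged illustration of that recovery path, assuming a zpool build
that supports importing with a missing log device (the -m flag) and
using hypothetical names:

  # zpool import -m mypool         (import despite the lost SLOG; the
                                    unreplayed sync writes are
                                    discarded)
  # zpool remove mypool <log-dev>  (then drop the dead log device,
                                    referenced by name or GUID)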
[OpenIndiana-discuss] slicing a disk for ZIL
Hi,

I have bought and installed an Intel SSD 313 20 GB to use as ZIL for
one or many pools. I am running OpenIndiana on an x86 platform, no
SPARC. As 4 GB should suffice, I am considering partitioning the drive
in order to assign each partition to one pool (at the moment there are
2 pools on the server, but I could expand that in the future).

After some reading, I am still confused about slicing and
partitioning. What do I actually need to do to achieve the wanted
effect of having up to 4 partitions on that SSD?

Thanks for any enlightenment.

With best regards,

Sebastian
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On 2012-11-29 12:43, Sebastian Gabler wrote:
> I have bought and installed an Intel SSD 313 20 GB to use as ZIL for
> one or many pools. [snip] What do I actually need to do to achieve
> the wanted effect of having up to 4 partitions on that SSD?

Well, basically, you need to partition the SSD :) There are several
options available:

1) Use traditional MBR (Master Boot Record) partitions - up to four
primary partitions, one of which can allocate a range as an extended
partition and store more logical partitions inside. In this case you
would address your zfs log devices as cXtYdZpN; in case of an IDE
connection (which you don't want) you'd see cXdZpN. Here N is the
partition number, with 0 being the partition table/whole disk itself,
so your logs would be N=1,2,3,4 if you only use primary partitions
(the whole disk should then be divided between them). Extended
partitions are addressed as N=5..15.

2) Use MBR with Solaris slices - define one Solaris partition in the
MBR spanning the whole disk, and define Solaris slices inside it. Note
that slice #2 is traditionally reserved to represent the whole disk
for backup tools. It may cause zfs tools to report conflicts (slice
overlaps), but you can ignore them. In this case your zfs log devices
would be cXtYdZsN, with N being the slice number (0..7; 2 is reserved
but can be reused if required). AFAIK you can't define more than one
Solaris slice set on one HDD, though you can use the other partitions
for other filesystems. Maybe you can mix slices and partitions as
components for the pool and its log/cache devices, but I won't bet on
this.

3) Use a GPT (GUID Partition Table, aka EFI) label. There you
explicitly define many partitions, all primary, that can be used by
your pools and their components (logs/caches). I think the Solaris
format utility detects a GPT table and uses (and displays) those
partitions in place of the slices of traditional MBR+slice labels.
I.e. with GPT you don't have slices, and usually don't need them - you
can have many more GPT partitions than slices. When you dedicate a
whole disk to ZFS for use in a non-root pool, it essentially gets a
GPT table with an 8 MB reserved tail partition, and the rest is one
ZFS partition. As of now, GPT-labeled disks can not be used for
bootable root pools with Solaris-derived OSes (they may be usable with
other distros, and another GRUB in particular).

I think you can not use a separate ZIL with root pools - for mostly
the same reasons that only mirrors/single disks are allowed there (the
single device passed from BIOS/bootloader must be sufficient to boot
the OS image). Maybe you can attach an L2ARC cache to an rpool,
though.

Also note that when using a sliced/partitioned disk with ZFS, you
might need to ensure that disk write caching is enabled for the
device. This can be scripted as a bootup init-script calling format,
if required (see the sketch below); I think I or Edward Ned Harvey
published variants of such scripts on the zfs-discuss mailing list
this year.

Finally, you can use the (g)parted and/or fdisk utilities to label a
disk with a particular table and its contents. For slices or
partitions accessible to Solaris, and to configure options like the
cache, you then use the format utility and follow its text menus.
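A rough sketch of the write-cache init hack mentioned above - not one
of the published scripts, just the idea of driving format's
expert-mode menus non-interactively (the disk name is an example, and
the exact menu prompts may differ between builds):

  #!/bin/sh
  # Re-enable the SSD's volatile write cache after boot; format -e
  # exposes it under cache -> write_cache -> enable.
  format -e -d c3t31d0 <<EOF
  cache
  write_cache
  enable
  quit
  quit
  quit
  EOF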
HTH,
//Jim Klimov
Re: [OpenIndiana-discuss] slicing a disk for ZIL
From: Sebastian Gabler [mailto:sequoiamo...@gmx.net]
> I have bought and installed an Intel SSD 313 20 GB to use as ZIL for
> one or many pools. [snip] What do I actually need to do to achieve
> the wanted effect of having up to 4 partitions on that SSD?

Everybody seems to want to do this, and I can see why. If you look at
the storage capacity of an SSD, you're like, "Hey, this thing has 64G
and I only need 1 or 2 or 4G; that leaves all the rest unused." But
you forget to think to yourself, "Hey, this thing has a 6Gbit bus, and
I'm trying to use it all to boost the performance of some other pool."

The only situation where I think it's a good idea to slice the SSD and
use it for more than one slog device is: if you have two pools, and
you know you're not planning to write to them simultaneously. "I have
a job that reads from pool A and writes to pool B, and then it will
read from pool B and write to pool A, and so forth." But this is
highly contrived, and I seriously doubt it's what you're doing (until
you say that's what you're doing).

The better thing is to swallow it and accept 60G wasted on your slog
device. It's not there for storage capacity - you bought it for speed.
And there isn't excess speed going to waste.
Re: [OpenIndiana-discuss] slicing a disk for ZIL
Hi Jim,

thanks for the information. I managed to set up 4 partitions on the
disk with fdisk:

             Total disk size is 3116 cylinders
             Cylinder size is 12544 (512 byte) blocks

                                               Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 Solaris2          1    779      779    25
          2                 Solaris2        780   1558      779    25
          3                 Solaris2       1559   2336      778    25
          4                 Solaris2       2337   3114      778    25

Now, in /dev/rdsk and /dev/dsk, I see:

# ls -sh /dev/dsk/ | grep c3t31d0p
 512 c3t31d0p0
 512 c3t31d0p1
 512 c3t31d0p2
 512 c3t31d0p3
 512 c3t31d0p4

So, there are 5 partitions. I am not sure which of them count, if any,
and I have not found a way to verify that each of the partitions is
about 4 GB. "partition> print" says:

Current partition table (original):
Total disk cylinders available: 777 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders       Size       Blocks
  0 unassigned    wm      0               0          (0/0/0)         0
  1 unassigned    wm      0               0          (0/0/0)         0
  2     backup    wu      0 - 776         4.65GB     (777/0/0) 9746688
  3 unassigned    wm      0               0          (0/0/0)         0
  4 unassigned    wm      0               0          (0/0/0)         0
  5 unassigned    wm      0               0          (0/0/0)         0
  6 unassigned    wm      0               0          (0/0/0)         0
  7 unassigned    wm      0               0          (0/0/0)         0
  8       boot    wu      0 -   0         6.12MB     (1/0/0)     12544
  9 unassigned    wm      0               0          (0/0/0)         0

I am still confused how to proceed further.

WBR,

Sebastian
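A hedged aside: one way to verify those MBR partition sizes from the
shell - assuming this fdisk supports dumping the table to stdout with
-W - is to print the table and multiply each partition's sector count
by 512:

  # fdisk -W - /dev/rdsk/c3t31d0p0

The last column of each partition line (Numsect) is its length in
512-byte sectors; e.g. 779 cylinders * 12544 sectors/cylinder =
9771776 sectors, * 512 bytes = ~4.65 GB.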
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On Thu, 29 Nov 2012, Edward Ned Harvey (openindiana) wrote:
>> From: Sebastian Gabler [mailto:sequoiamo...@gmx.net]
>> I have bought and installed an Intel SSD 313 20 GB to use as ZIL
>> for one or many pools. [snip]

Beware: the Intel 313 SSD seems to have no power-loss protection:

http://ark.intel.com/products/66290/Intel-SSD-313-Series-24GB-mSATA-3Gbs-25nm-SLC

The ZIL relies on this feature.

> Everybody seems to want to do this, and I can see why. [snip] The
> better thing is to swallow it and accept 60G wasted on your slog
> device. It's not there for storage capacity - you bought it for
> speed. And there isn't excess speed going to waste.

Well, it depends on usage. I sliced up a 120 GB Intel 320 into 8 GB
for ZIL and 50 GB for L2ARC. On my multi-user system, which doesn't
have that many users, most of the time it's either heavy NFS write
activity or heavy read activity; seldom both. ZIL slices for multiple
pools may show the same behaviour.

- Michael
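For the record, a split like Michael's maps onto two separate adds
against slices of the same SSD - a sketch with hypothetical pool and
slice names:

  # zpool add tank log c0t1d0s0      (the 8 GB slice as dedicated ZIL)
  # zpool add tank cache c0t1d0s1    (the 50 GB slice as L2ARC)

Unlike top-level data vdevs, both log and cache devices can later be
removed with zpool remove if the layout needs to change.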
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On 2012-11-29 15:59, Edward Ned Harvey (openindiana) wrote:
> [snip] The better thing is to swallow it and accept 60G wasted on
> your slog device. It's not there for storage capacity - you bought
> it for speed. And there isn't excess speed going to waste.

Well, there is some truth to this :) But what is the alternative?

For example, on smaller systems you only have one PCI(*) bus, and that
would be the bottleneck even if you add several SSDs to the box, so
you lose little by using a few SSDs for many tasks right away. Also,
with their screaming IOPS, many storage vendors limit the supported
SSD setups to about 4 devices per system (or per JBOD shelf, at most),
and even that often with a dedicated controller HBA for the SSDs.

However, if you only add a SLOG to one pool, your other pools remain
slow on sync writes - which you might not care about with some
workloads, or might not want with others. And, mind you all, a
dedicated ZIL device only accelerates sync writes, which are
acknowledged to clients as "yes, I've instantly saved your data in a
reliable fashion!" The presence of such IOs can be researched with
DTrace scripts (see the sketch below) to justify the purchase of a
SLOG - or the lack of need for one - for a particular storage box and
its de-facto IO patterns.

If your box uses several pools but is limited to one SLOG device (or
two, in a mirrored SLOG), you might still prefer to benefit from
having pieces of it added to the several pools. Even if your systems
have some sort of streaming sync IO (which for some reason won't be
streamed right into the pool anyway), your ZIL writes would still be
3Gbit/s for a couple of pools sharing the device. And if the
(networked) IO is bursty, as it often is, then it is likely that each
pool would get the full bandwidth to its part of the SLOG in those
microseconds it needs to spool its blocks.

My cents,
//Jim
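A minimal sketch of such a measurement, assuming the illumos fbt
provider and its probe on the kernel's zil_commit() routine; it counts
which processes force ZIL commits until you press Ctrl-C:

  # dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'

Large counts under nfsd, a database or an iSCSI target suggest a SLOG
would be exercised; near-zero counts suggest the money is better spent
elsewhere.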
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On 29 November 2012 15:27, Jim Klimov jimkli...@cos.ru wrote:
>> The better thing is to swallow it and accept 60G wasted on your
>> slog device. [snip]
>
> Well, there is some truth to this :) But what is the alternative?
> For example, on smaller systems you only have one PCI(*) bus and
> that would be a bottleneck, even if you add several SSDs onto the
> box, so you lose little by using a few SSDs for many tasks right
> away.

I would suggest that on smaller systems you wouldn't bother messing
with the ZIL :)

Jon
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On 2012-11-29 16:16, Sebastian Gabler wrote:
> Hi Jim, thanks for the information.

You're welcome :)

> I managed to set up 4 partitions on the disk with fdisk:
> [snip]

So, this is an MBR table. With GPT/EFI, fdisk would likely show a
trivial table like this:

             Total disk size is 7832 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                               Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0   7832     7833   100

...and format's partition> print would detail it thus:

Total disk sectors available: 125812701 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector    Size        Last Sector
  0        usr    wm      256             59.99GB     125812701
  1 unassigned    wm      0               0           0
  2 unassigned    wm      0               0           0
  3 unassigned    wm      0               0           0
  4 unassigned    wm      0               0           0
  5 unassigned    wm      0               0           0
  6 unassigned    wm      0               0           0
  8   reserved    wm      125812702       8.00MB      125829085

> Now, in /dev/rdsk and /dev/dsk, I see:
> # ls -sh /dev/dsk/ | grep c3t31d0p
> [snip]
> So, there are 5 partitions. I am not sure which of them count, if
> any, and I have not found a way to verify that each of the
> partitions is about 4 GB.

As I wrote, p0 is the partition table (the whole disk's header), and
p1-p4 are your data partitions. As for the sizes, you can calculate
them from the data shown by your fdisk output, or look at slice 2's
size in your partition> print output. Your partitions here are 777
cylinders in size, with each cylinder being 12544 * 512 bytes =
6.125 MB (according to fdisk); your partition sizes thus amount to
4759 MB = 4.65 GB each (in power-of-two sizing of MB).

> partition> print says:
> Current partition table (original):
> Total disk cylinders available: 777 + 2 (reserved cylinders)
> [snip]

Being Solaris partitions, these are likely labelled with a slice
table. But you can only access one Solaris slice table on your disk,
with the x86 default layout shown above.

> I am still confused how to proceed further.

I hope that you can use these partitions directly as your pool
components, i.e.:

# zpool add mypool log c3t31d0p1

It MAY be possible that ZFS would refuse this, at least on some
builds; in that case (or in any case) it might be a safer bet to
create a single MBR partition spanning the whole disk and several
4.6 GB slices inside it (skipping number 2), and then use those slices
as log devices (sketched below):

# zpool add mypool log c3t31d0s1

If you do manage to add some log devices before such a reorganization
of the disk layout, be sure to remove those devices from the pools
first. And be extra careful to use the keywords (log or cache, as
appropriate) when adding devices to pools - otherwise you risk
extending the pool itself with a new top-level component, especially
if you use force for any reason - and top-level components can not be
removed.

Anyhow, if at all possible, do back up your data before trying all of
this :)
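If ZFS does refuse the pN devices, the slice-based fallback would look
roughly like this - a sketch, not a verbatim transcript, since
format's partition menu is interactive and its prompts vary by build:

  # fdisk /dev/rdsk/c3t31d0p0       (redo the MBR first: one Solaris2
                                     partition spanning the whole disk)
  # format -d c3t31d0
  format> partition
  partition> 0                      (define slice 0 as ~4.6 GB, then
                                     slices 1, 3, 4 likewise, leaving
                                     slice 2 as the backup slice)
  partition> label
  partition> quit
  format> quit
  # zpool add mypool1 log c3t31d0s0
  # zpool add mypool2 log c3t31d0s1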
Re: [OpenIndiana-discuss] slicing a disk for ZIL
Am 29.11.2012 16:48, schrieb openindiana-discuss-requ...@openindiana.org:
> Beware: the Intel 313 SSD seems to have no power-loss protection:
> http://ark.intel.com/products/66290/Intel-SSD-313-Series-24GB-mSATA-3Gbs-25nm-SLC
> The ZIL relies on this feature.
(from Michael)

> Anyhow, if at all possible, do back up your data before trying all
> of this :)
(from Jim)

I have chosen a device without a supercap because my environment is
based on non-enterprise HW anyway. The server has only a single PSU,
with the option to get a second one. The shelf where the ZIL disk
resides has dual PSUs connected. The whole thing is connected to a
single-phase UPS. The data that will be affected are for test
databases.

As far as I am aware, the loss of the ZIL is no longer fatal for the
pool in current zfs versions. As well, even with a corrupted ZIL, the
pool should be recoverable. Hence, I was so far not worried too much.
Should I better be?

BR

Sebastian
Re: [OpenIndiana-discuss] slicing a disk for ZIL
On 2012-11-29 17:29, Sebastian Gabler wrote:
> The data that will be affected are for test databases. As far as I
> am aware, the loss of the ZIL is no longer fatal for the pool in
> current zfs versions. As well, even with a corrupted ZIL, the pool
> should be recoverable. Hence, I was so far not worried too much.
> Should I better be?

For test databases - probably little worry. If your storage's power
goes down, your test database server would likely reboot too, and
restart its work from the available consistent on-disk state (which
might not include the latest data that the clients thought was saved).
If you can reboot every device and client which relies on the same
view of the data, and if you can safely ignore losing a few
transactions, this setup should be OK. The potential problems are for
production systems which can not lose transactions (i.e. e-shop or
bank databases), and for systems which rely on files being saved to
disk (mail servers, document workflow, etc.).

As for possible ZIL problems, they are as follows: the separate SLOG
saves the sync parts of the ZFS transactions. When a transaction group
(TXG) closes, the same data and more (the async data) is saved to the
main pool from dirty pages cached in RAM. Thus the dedicated ZIL
device is normally only written to, and very rarely read (after an
unclean shutdown, during the following pool import). If it breaks
during normal work, ZFS detects this state and falls back to using
blocks on the main pool for ZIL tasks - as it does normally without a
SLOG. The real problem arises if the ZIL device is unmirrored and
breaks during such a poweroff - then the last group of transactions
(possibly a couple of groups) which were not yet spooled to the main
pool disks can get lost.

Anyhow, any async data is also lost in a crash - but the storage makes
no guarantees about preserving that across power loss and other
breakages either, and clients should not expect async data to be
safely stored (hence the explicit sync IO requests, or a flush after
async IO).

//Jim
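To see whether a dedicated log device is actually absorbing sync
traffic, per-vdev statistics help - a sketch with a hypothetical pool
name:

  # zpool iostat -v tank 5

The logs section of that output shows the SLOG's bandwidth: its write
column carries the ZIL traffic, and its read column should stay near
zero except during a pool import after a crash.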