Re: Software raid over iSCSI
Hi! A good new year to everyone. I'd like to add this on MBR partitioning: I think that in the next few years GPT (GUID Partition Table) will become more common (64-bit EFI-based systems use that partitioning scheme). In GPT there are no longer sectors, tracks, or heads. There are just blocks; that is, LBA (Logical Block Addressing) is used everywhere. GPT also supports really large disks. Regards, Ulrich

On 23 Dec 2008 at 2:38, Eric wrote: Andrew, I'll have a look at that discussion. Thank you! I don't think I have time to wait for those patches to come. We commonly use vanilla kernels, and I bet it will take some time before those patches come down the distribution hill, so I'll have to apply these tweaks manually. Anyway, I think these changes could cause some struggles for sysadmins. And I bet there's some crappy disk-recovery software around that won't like those partition tables. This takes me back to about 10 years ago, when Norton Disk Doctor destroyed 3 of my partitions =P. That was one important lesson: Computers are always right, software not! Kind regards, Eric
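A minimal Python sketch of Ulrich's point that GPT describes a disk purely in logical block numbers. It assumes the GPT header layout from the UEFI specification (a little-endian header stored at LBA 1, whose first 92 bytes hold the fields listed in the comment); `parse_gpt_header` and `GptHeader` are hypothetical names, and the CRC32 fields are read but not verified.

```python
import struct
from typing import NamedTuple

# First 92 bytes of a GPT header, little-endian (UEFI specification):
# signature, revision, header size, header CRC32, reserved,
# current LBA, backup LBA, first usable LBA, last usable LBA,
# disk GUID, partition-entry array start LBA, entry count, entry size,
# partition-entry array CRC32.
GPT_HEADER = struct.Struct("<8sIIII QQQQ 16s QIII")


class GptHeader(NamedTuple):
    current_lba: int
    backup_lba: int
    first_usable_lba: int
    last_usable_lba: int
    entries_lba: int
    num_entries: int
    entry_size: int


def parse_gpt_header(raw: bytes) -> GptHeader:
    """Parse the GPT header stored in LBA 1 (CRCs are not checked here)."""
    (sig, _rev, hdr_size, _hdr_crc, _reserved,
     current, backup, first, last,
     _disk_guid, entries, count, entry_size, _entries_crc) = GPT_HEADER.unpack_from(raw)
    if sig != b"EFI PART":
        raise ValueError("missing 'EFI PART' signature")
    if hdr_size < GPT_HEADER.size:
        raise ValueError("header size field too small")
    # Every location in the header is a logical block number:
    # no cylinders, heads or sectors appear anywhere.
    return GptHeader(current, backup, first, last, entries, count, entry_size)


# Synthetic header for a hypothetical 2,000,000,000-block disk.
demo = GPT_HEADER.pack(b"EFI PART", 0x00010000, 92, 0, 0,
                       1, 1_999_999_999, 34, 1_999_999_966,
                       bytes(16), 2, 128, 128, 0)
hdr = parse_gpt_header(demo)
print(hdr.first_usable_lba, hdr.last_usable_lba)
```

The usable area of the disk is given only as a first and last block number, so there is no geometry to get wrong; alignment becomes a matter of choosing partition start LBAs.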
Re: Software raid over iSCSI
From another list: the mail below is a proposal to change the default partition alignment for disks from 512 bytes to 4096 bytes. I think that once implemented (in a few years' / days' time), it will make some of the alignment problems due to the 512-byte MSDOS partition table go away. The complete thread has some references to how the SCSI code determines the geometry: http://news.gmane.org/group/gmane.linux.utilities.util-linux-ng/last=

Subject: Changing the default CHS used by Linux partition editors
From: Theodore Ts'o ty...@mit.edu
To: util-linux...@vger.kernel.org, Eric Sandeen sand...@redhat.com, Ric Wheeler rwhee...@redhat.com, James Bottomley james.bottom...@hansenpartnership.com, Jeff Garzik jgar...@redhat.com, Curtis Gedak ged...@gmail.com
Date: 2008-12-12 00:30

I attended the IDEMA (International Disk Drive Equipment and Materials Association) conference today to give a talk about Linux, and during one of the breaks I got buttonholed by someone who asked me if I could help make sure Linux would be able to deal with the upcoming HDD sector size move from 512 to 4096 bytes.

Just coincidentally, I ran across the following article from Slashdot, "Which Operating System Is Best For Solid-State Disks": http://www.computerworld.com/action/article.do?command=viewArticleBasictaxonomyName=StoragearticleId=9123140taxonomyId=19pageNumber=1

Quoting from that article, Justin Sykes from Micron Technologies stated: "NAND [flash memory] fundamentally has native 4K block sizes. Anything that's not aligned to a 4K block creates extra challenges," Sykes said. "There ends up being background operations to garbage-collect that empty space [in larger file blocks] that isn't fully utilized. And, so that activity is chewing up your bandwidth in the background, and it adds extra wear to the NAND [flash memory]."

I fully expect that perhaps someone from SanDisk or Intel will pop up and say that this just means Micron's SSDs suck; *our* SSDs won't have this problem.
Perhaps; but HDDs won't be going away any time soon[1], and they will be moving to a 4k block size in the next few years.

So what's the problem? The main problem seems to be that, by default, we are using partition tables that leave the partitions not aligned on 4k boundaries, because of the default HDD geometry used by our partition tools and returned by the HDIO_GETGEO ioctl:

    Disk /dev/sda: 255 heads, 63 sectors, 38913 cylinders

    Nr AF  Hd Sec  Cyl  Hd Sec  Cyl     Start       Size ID
     1 80   1   1    0 254  63  121        63    1959867 83
     2 00   0   1  122 254  63  619   1959930    8000370 82
     3 00   0   1  620 254  63 1023   9960300  615177045 05
     4 00   0   0    0   0   0    0         0          0 00
     5 00   1   1  620 254  63 1023        63  615176982 8e

For pretty much all modern systems (certainly any drive using the SATA interface), the boot loader no longer needs to use the original CHS INT13 interface, so what we pick for the CHS geometry doesn't matter as far as bootloaders are concerned. Linux only uses LBAs, so the bottom line is that, aside from controlling the alignment of partitions, CHS values don't really matter. For SSDs and HDDs that use a 4k internal sector size, being 4k-aligned makes a big difference because it avoids read-modify-write cycles. We can achieve this easily if we simply use a CHS geometry of 56 sectors/track instead of 63.

So, I would propose that we change the default geometry used by the partitioning tools in util-linux-ng, gparted, etc. so that the default is 56 sectors/track; furthermore, to catch those partitioning tools that use the HDIO_GETGEO ioctl, that we change the fantasy geometry generated in drivers/scsi/scsicam.c:scsicam_bios_param() and drivers/ata/libata-scsi.c to also use a 255/56 head/sector geometry.

Does this make sense? Am I missing some fatal flaw? Should I send patches?
- Ted

[1] There was an absolutely brilliant presentation at the IDEMA conference from Steve Hetzler, an IBM Fellow from Almaden Research Lab, that used an economic argument based on the capital cost of the fabs and what would happen if one were to move *all* of the world's silicon fabs to generating flash for SSDs: this would only satisfy 18% of the HDD market. The total size of the HDD market by revenue is $35 billion, and the value of the output of the Si fabs today is $280 billion. So are we going to give up $280 billion worth of revenue from the current products of today's available fabs in order to displace 18% of the $35 billion HDD market? What about building new fabs? Well, building new fabs sufficient to create enough flash to replace all of the HDD market would cost approximately one trillion dollars. A single 45nm fab is $3-4 billion, and a 22nm fab will probably cost $7-8 billion. (This is just the cost to *build* the fab; it ignores the materials and
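The alignment problem Ted describes can be checked with a little arithmetic on the start sectors in his partition listing. A minimal Python sketch, assuming 512-byte logical sectors, a 4096-byte physical sector, and that logical partition 5 starts 63 sectors into extended partition 3; the helper names are hypothetical:

```python
SECTOR = 512   # logical sector size seen by Linux
PHYS = 4096    # assumed internal (physical) sector size of the drive


def is_4k_aligned(start_lba: int) -> bool:
    """True if a partition starting at start_lba begins on a 4096-byte boundary."""
    return (start_lba * SECTOR) % PHYS == 0


def phys_sectors_touched(offset: int, length: int) -> int:
    """How many physical sectors a write of `length` bytes at byte `offset` spans."""
    first = offset // PHYS
    last = (offset + length - 1) // PHYS
    return last - first + 1


# Absolute start LBAs from the listing in Ted's mail (255 heads, 63 sectors/track).
# Assumption: logical partition 5 starts 63 sectors into extended partition 3.
starts_63 = {1: 63, 2: 1959930, 3: 9960300, 5: 9960300 + 63}
for nr, lba in starts_63.items():
    print(nr, lba, is_4k_aligned(lba))   # all four print False

# With 255 heads and 56 sectors/track, tracks (56) and cylinders
# (255 * 56 = 14280) are multiples of 8 sectors, so such starts are aligned.
print(is_4k_aligned(56), is_4k_aligned(122 * 255 * 56))   # True True

# A 4 KiB block at the start of partition 1 straddles two physical sectors,
# so the drive must read-modify-write both; at LBA 64 it fits in one.
print(phys_sectors_touched(63 * SECTOR, 4096), phys_sectors_touched(64 * SECTOR, 4096))  # 2 1
```

Every start in the 63-sector layout comes out unaligned, while any start on a 56-sector track boundary is aligned; the last line shows the cost Ted mentions, since one misaligned 4 KiB block spans two physical sectors.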
Re: Software raid over iSCSI
Andrew, I'll have a look at that discussion. Thank you! I don't think I have time to wait for those patches to come. We commonly use vanilla kernels, and I bet it will take some time before those patches come down the distribution hill, so I'll have to apply these tweaks manually. Anyway, I think these changes could cause some struggles for sysadmins. And I bet there's some crappy disk-recovery software around that won't like those partition tables. This takes me back to about 10 years ago, when Norton Disk Doctor destroyed 3 of my partitions =P. That was one important lesson: Computers are always right, software not! Kind regards, Eric
Re: Software raid over iSCSI
On Tue, Dec 23, 2008 at 9:13 AM, Andrew McGill list2...@lunch.za.net wrote: From another list: the mail below is a proposal to change the default partition alignment for disks from 512 bytes to 4096 bytes. I think that once implemented (in a few years' / days' time), it will make some of the alignment problems due to the 512-byte MSDOS partition table go away. The complete thread has some references to how the SCSI code determines the geometry: http://news.gmane.org/group/gmane.linux.utilities.util-linux-ng/last=

Thanks for the link. This thread contains an interesting suggestion, namely choosing 255 heads and 56 sectors per track (a 255/56 head/sector geometry). This choice guarantees that partitions starting on a track boundary are aligned at 4096-byte boundaries. Bart.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
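Why 56 works and 63 does not, assuming 512-byte logical sectors (so a 4096-byte boundary falls every 8 sectors) and partitions that start on a track boundary, i.e. at LBA $kS$ for some integer $k$, where $S$ is the sectors-per-track value and $H = 255$ is the head count:

```latex
\begin{aligned}
&\text{Track-aligned start: } \mathrm{LBA} = k\,S,\qquad
  \text{4096-byte aligned} \iff 8 \mid \mathrm{LBA} \\
&S = 63 = 3^2 \cdot 7:\quad \gcd(63, 8) = 1 \;\Rightarrow\; \bigl(8 \mid 63k \iff 8 \mid k\bigr)
  \quad (\text{so LBA } 63,\ k = 1,\ \text{is misaligned}) \\
&S = 56 = 2^3 \cdot 7:\quad 8 \mid 56k \ \text{for every } k,\qquad
  H\,S = 255 \cdot 56 = 14280 = 8 \cdot 1785
\end{aligned}
```

With 63 sectors per track only every eighth track boundary happens to be aligned, whereas with 56 every track and every cylinder boundary is.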
Re: Software raid over iSCSI
like this? http://groups.google.com/group/open-iscsi/browse_thread/thread/4caaf406fe8165ab/f021bae5e175ed59#f021bae5e175ed59

2008/12/22 Eric ericvanblokl...@gmail.com
Hello, I'm going to set up a software RAID over iSCSI. While I probably should ask my question to the RAID people, my guess was that someone here might have experience with this. I'm going to use the following topology: There will be 2 storage servers exporting targets. Other physical machines will connect to a target on each storage server to create a software RAID 1. Virtual machines will use these RAIDs as their disks. Apart from the standard optimizations, there's one I haven't been able to find any information on. It has been suggested that initiators using the disks (in this case the virtualized machines) should use a head and cylinder count that is divisible by 2. Is this suggestion correct; does it (still) apply to iSCSI? Now for the actual question: I don't know exactly how Linux software RAID works internally, but it sounds logical to me that the software RAID should be aware of the head and cylinder count as well. So, when creating RAID partitions on the targets, I should also modify the head and cylinder count on these. Is this a correct assumption, and will software RAID use the values advertised in the partition table? Any answers or suggestions are appreciated. Thanks in advance. Kind regards, Eric

-- Thanks regards,|73 Yuri Huang
Re: Software raid over iSCSI
Yuri, assuming the disks used by your RAID are targets originating from different servers, then yes.

On Dec 22, 11:27 am, Yuri yuri.li...@gmail.com wrote: like this? http://groups.google.com/group/open-iscsi/browse_thread/thread/4caaf4...
Re: Software raid over iSCSI
We'd be using iSCSI Enterprise Target (iscsitarget.sourceforge.net). Of course, the target implementation to use in this setup is open for discussion.

On Dec 22, 12:00 pm, Bart Van Assche bart.vanass...@gmail.com wrote: On Mon, Dec 22, 2008 at 11:19 AM, Eric ericvanblokl...@gmail.com wrote: Apart from the standard optimizations, there's one I haven't been able to find any information on. It has been suggested that initiators using the disks (in this case the virtualized machines) should use a head and cylinder count that is divisible by 2. Is this suggestion correct; does it (still) apply to iSCSI?

This depends entirely on which target implementation you are using. Some iSCSI targets perform better with aligned partitions, and other iSCSI targets do not need aligned partitions to work at full speed. Bart.
Re: Software raid over iSCSI
On 22 Dec 2008 at 2:19, Eric wrote: [...] It has been suggested that initiators using the disks (in this case the virtualized machines) should use a head and cylinder count that is divisible by 2. Is this suggestion correct; does it (still) apply to iSCSI?

Hi! I think since ZBR (Zone Bit Recording) the number of sectors per cylinder is variable. Thus it makes no sense for any higher-level disk software to try to deal with heads or cylinders. Since ATA (about 1990) only the controller on the disk knows the tracks, heads, and cylinders. The rest is just logical. Therefore SCSI (and now LBA) just uses logical block numbers.

[...] So, when creating RAID partitions on the targets, I should also modify the head and cylinder count on these. Is this a correct assumption, and will software RAID use the values advertised in the partition table?

Only for MS-DOS compatibility do you need C/H/S addressing. The rest doesn't care AFAIK. Regards, Ulrich
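To illustrate Ulrich's answer: C/H/S values are just a fixed arithmetic mapping onto LBAs for whatever geometry the tools assume. A Python sketch of the usual conversion, LBA = (C × heads + H) × sectors + (S − 1), defaulting to the 255/63 geometry from Ted's partition listing; the function names are illustrative, not from any particular tool:

```python
from typing import Tuple


def chs_to_lba(c: int, h: int, s: int, heads: int = 255, sectors: int = 63) -> int:
    """Map a cylinder/head/sector address to a logical block address.

    CHS sector numbers start at 1; cylinders and heads start at 0.
    """
    if not (0 <= h < heads and 1 <= s <= sectors):
        raise ValueError("address outside the assumed geometry")
    return (c * heads + h) * sectors + (s - 1)


def lba_to_chs(lba: int, heads: int = 255, sectors: int = 63) -> Tuple[int, int, int]:
    """Inverse mapping for the same geometry."""
    c, rest = divmod(lba, heads * sectors)
    h, s0 = divmod(rest, sectors)
    return c, h, s0 + 1


# Values taken from the partition listing in Ted's mail (255 heads, 63 sectors).
print(chs_to_lba(0, 1, 1))                   # 63, start of partition 1
print(chs_to_lba(122, 0, 1))                 # 1959930, start of partition 2
print(lba_to_chs(9960300 + 615177045 - 1))   # (38912, 254, 63), true end of partition 3
```

The last line shows that partition 3 really ends on cylinder 38912; the listing prints 1023 because the MBR's CHS fields have only 10 bits for the cylinder, which is one more reason tools rely on the LBA fields instead.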
Re: Software raid over iSCSI
On Mon, Dec 22, 2008 at 1:18 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I think since ZBR (Zone Bit Recording) the number of sectors per cylinder is variable. Thus it makes no sense for any higher-level disk software to try to deal with heads or cylinders. Since ATA (about 1990) only the controller on the disk knows the tracks, heads, and cylinders. The rest is just logical. Therefore SCSI (and now LBA) just uses logical block numbers.

At least with IET, changing the heads/sector values for the exported disk does improve speed. See also http://www.mail-archive.com/open-iscsi@googlegroups.com/msg01664.html. Bart.
Re: Software raid over iSCSI
Bart, thanks for the definitive answer and the link to a great thread. I need one more answer: I have to set the heads and cylinders on the disk partitions of the virtualized servers. Now I assume I also have to set heads and cylinders on the RAID partitions exported by the targets. Is this assumption correct? Thanks, Eric
Re: Software raid over iSCSI
On Mon, Dec 22, 2008 at 2:17 PM, Eric ericvanblokl...@gmail.com wrote: Thanks for the definitive answer and the link to a great thread. I need one more answer: I have to set the heads and cylinders on the disk partitions of the virtualized servers. Now I assume I also have to set heads and cylinders on the RAID partitions exported by the targets. Is this assumption correct?

I'm not 100% sure, but the CHS layout of the disk itself (not the partitions) might be the layout you have to tune. If I remember correctly, IET performance is suboptimal if partition boundaries are not aligned with page boundaries. In other words: partition boundaries should be a multiple of 4096 bytes. Bart.
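One rough way to check Bart's rule of thumb on a Linux initiator is to read each partition's start sector from sysfs (`/sys/block/<disk>/<partition>/start`, reported in 512-byte units) and test it against a 4096-byte boundary. This is a sketch under those assumptions; `misaligned_partitions` is a hypothetical helper, and taking the sysfs root as a parameter is only there so it can be exercised against a fake directory tree.

```python
import os
from typing import Dict

ALIGN = 4096        # page-sized boundary suggested in the thread
SYSFS_SECTOR = 512  # sysfs "start" values are counted in 512-byte sectors


def misaligned_partitions(disk: str, sysfs_root: str = "/sys/block") -> Dict[str, int]:
    """Return {partition name: start offset in bytes} for partitions of `disk`
    whose start is not a multiple of ALIGN."""
    disk_dir = os.path.join(sysfs_root, disk)
    bad = {}
    for name in sorted(os.listdir(disk_dir)):
        start_file = os.path.join(disk_dir, name, "start")
        if not os.path.isfile(start_file):
            continue  # not a partition directory (e.g. "queue", "holders")
        with open(start_file) as f:
            start_bytes = int(f.read()) * SYSFS_SECTOR
        if start_bytes % ALIGN:
            bad[name] = start_bytes
    return bad
```

For example, `misaligned_partitions("sdb")` on an initiator would list any partition on the imported disk that starts off a 4 KiB boundary; an empty dict means all starts are aligned.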
Re: Software raid over iSCSI
Bart, my use of the term "partitions" is a terrible abuse, of course; I mean the CHS layout. I have a semi-production test setup where I will apply these changes and see what happens. The setup is far from ideal because I have some cheap-ass 3Com switches which don't support jumbo frames or 802.3ad dynamic link aggregation. However, the storage is already generating some load without link saturation, so we should see something happen. I'll post my findings in about 2 weeks. Have a nice holiday! Eric