Hello,

I did some measurements on the impact that unaligned partitions/slices have on the "new" hard drives that use 4kB sectors on disk and export them as 512B sectors. [1] My tests were done on a Western Digital WD10EARS. [2]

CONCLUSION:
Having unaligned partitions/slices on those disks leads to a noticeable performance penalty under real-world workloads.

IMPLICATIONS:
1. The rounding of unit sizes to cylinder boundaries by disklabel has to be evaluated.
2. A FAQ entry for the "advanced format" disks is needed to tell people to set the XP jumper (more on that later). If disklabel is not modified, that entry would also have to explain the alignment implications and "how to use a calculator".

TEST RESULTS:

- sequential write/read speeds ---
dd bs    |  aligned  | unaligned | wd10eads*
         |           |           |
4k write |  97433116 |  86349673 |  80762241  (bytes/sec)
64k w    | 101273894 |  85616298 |  81234814
1m w     |  98291974 |  79201231 |  83113302
         |           |           |
4k read  | 103706513 | 104434701 |  82723667
64k r    | 105136468 | 104453140 |  85552816
1m r     | 104228605 | 104921901 |  85650289

(* The wd10eads is the previous generation to the wd10ears, with 32MB cache and the usual 512B on-disk sectors. That disk is in a different system! That system is not idle, so the actual numbers might be higher.)

- extracting a source tree ---
aligned   :  6m26.31s
unaligned : 14m30.30s

- build kernel / make obj / make build ---
           | aligned   | unaligned
kernel     |  2m27.94s |  2m48.12s
make obj   |  0m28.51s |  1m01.41s
make build | 36m07.27s | 70m51.58s

EXPLANATIONS (or whatever :):
Those numbers are kinda scary. I would not have expected such bad results for the builds from the earlier sequential rw tests I sent to m...@.

(Just to make it clear: if the partitions/slices are not aligned, the disk has to read every 4k sector it wants to write to before it can actually do that. The 64MB of cache helps to alleviate that up to some point.)

This drive has an "XP legacy jumper" (same as the WD15EARS and WD20EARS). It is intended to be used for Windows XP systems with a single partition over the whole drive. XP uses the same 63 sector offset as OpenBSD does. Setting this jumper transparently aligns the 63 sectors in front of a 4k sector boundary.
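The alignment condition can be checked with a bit of arithmetic. The following is only a minimal sketch: it assumes that the jumper simply shifts every exported sector by one 512B sector (which is what the "off by one" remark in the TESTS section implies), and it uses the slice offsets from the disklabel output further down. The function name `is_aligned` and the `jumper_shift` parameter are made up for illustration.

```python
# Sketch: check whether slice offsets fall on a 4k physical sector boundary.
# Assumption: with the XP jumper set, the drive shifts each exported
# 512B sector by +1, so exported sector 63 lands on physical offset 64.

EXPORTED_SECTOR = 512      # bytes per sector as seen by the OS
PHYSICAL_SECTOR = 4096     # bytes per sector on the platter
SECTORS_PER_4K = PHYSICAL_SECTOR // EXPORTED_SECTOR   # 8

# Offsets (in 512B sectors) taken from the disklabel output in this mail.
OFFSETS = {"a": 63, "b": 20000063, "d": 22000063, "e": 42000063, "f": 62000063}


def is_aligned(offset, jumper_shift=0):
    """Return True if the offset starts on a 4k physical sector boundary."""
    return (offset + jumper_shift) % SECTORS_PER_4K == 0


for name, offset in OFFSETS.items():
    print(name, offset,
          "no jumper:", is_aligned(offset),
          "jumper:", is_aligned(offset, jumper_shift=1))
```

Under that assumption every slice in the test layout is misaligned without the jumper and aligned with it, because all offsets are of the form 8*n + 63.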
When that jumper is set, slices inside the partition only have to be a multiple of 8 sectors in size. The issue is with disklabel rounding down to the nearest cylinder boundary. This messes up the nice multiplication by 1024, which would otherwise lead to a size divisible by 8. The rounding down is always done when using units, but not when requesting a size without a unit, i.e. in sectors. So slices can be aligned that way "by hand".

That rounding to cylinders is not needed, afaik. So without it, a simple "rtfaq! set the damn jumper!" would be enough to get the best performance out of such hard disks.

Below you can find more info about my test setup and the test outputs.

Cheers,
- Robert

[1] http://www.wdc.com/advformat
[2] http://www.wdc.com/en/products/products.asp?driveid=763

TESTS:

"aligned" == XP jumper set
"unaligned" == XP jumper NOT set*
(* without the jumper, the partition/slices are off by one 512B sector.)

I installed a snapshot I had on hand (see dmesg) and went from there. (Fresh install without the jumper.) The source tree used is -current from some hours ago. I sync'ed before every test. Disk layout and ramdisk were the same in both scenarios.

- dmesg ---
OpenBSD 4.6-current (GENERIC.MP) #40: Tue Dec 29 01:02:20 MST 2009
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3488088064 (3326MB)
avail mem = 3388391424 (3231MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xf0730 (61 entries)
bios0: vendor American Megatrends Inc. version "1104" date 09/11/2009
bios0: ASUSTeK Computer INC.
P5QL-E
acpi0 at bios0: rev 2
acpi0: tables DSDT FACP APIC MCFG OEMB HPET OSFR
acpi0: wakeup devices P0P2(S4) P0P3(S4) P0P1(S4) UAR1(S4) PS2K(S4) PS2M(S4) EUSB(S4) USBE(S4) P0P5(S4) P0P6(S4) P0P7(S4) P0P8(S4) P0P9(S4) GBEC(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) P0P4(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 3325.54 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu0: 2MB 64b/line 8-way L2 cache
cpu0: apic clock running at 266MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz, 3325.06 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,EST,TM2,CX16,xTPR,NXE,LONG
cpu1: 2MB 64b/line 8-way L2 cache
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (P0P2)
acpiprt2 at acpi0: bus -1 (P0P3)
acpiprt3 at acpi0: bus 5 (P0P1)
acpiprt4 at acpi0: bus -1 (P0P5)
acpiprt5 at acpi0: bus 3 (P0P8)
acpiprt6 at acpi0: bus 2 (P0P9)
acpiprt7 at acpi0: bus 4 (P0P4)
acpicpu0 at acpi0
acpicpu1 at acpi0
aibs at acpi0 not configured
acpibtn0 at acpi0: PWRB
cpu0: unknown Enhanced SpeedStep CPU, msr 0x061a4c1f06004c1f
cpu0: using only highest and lowest power states
cpu0: Enhanced SpeedStep 3325 MHz: speeds: 15200, 1200 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel G45 Host" rev 0x02
ppb0 at pci0 dev 1 function 0 "Intel G45 PCIE" rev 0x02: apic 2 int 16 (irq 10)
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "ATI Radeon HD 4850" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
azalia0 at pci1 dev 0 function 1 "ATI Radeon HD 48xx HD Audio" rev 0x00: apic 2 int 17 (irq 11)
azalia0: no supported codecs
azalia0: initialization failure, detaching
uhci0 at pci0 dev 26 function 0 "Intel 82801JI USB" rev 0x00: apic 2 int 16 (irq 10)
uhci1 at pci0 dev 26 function 1 "Intel 82801JI USB" rev 0x00: apic 2 int 21 (irq 14)
uhci2 at pci0 dev 26 function 2 "Intel 82801JI USB" rev 0x00: apic 2 int 18 (irq 15)
ehci0 at pci0 dev 26 function 7 "Intel 82801JI USB" rev 0x00: apic 2 int 18 (irq 15)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb1 at pci0 dev 28 function 0 "Intel 82801JI PCIE" rev 0x00: apic 2 int 17 (irq 11)
pci2 at ppb1 bus 4
ppb2 at pci0 dev 28 function 4 "Intel 82801JI PCIE" rev 0x00: apic 2 int 17 (irq 11)
pci3 at ppb2 bus 3
jmb0 at pci3 dev 0 function 0 "JMicron JMB363 IDE/SATA" rev 0x03
ahci0 at jmb0: apic 2 int 16 (irq 10), AHCI 1.0
scsibus0 at ahci0: 32 targets
pciide0 at jmb0: DMA, channel 0 wired to native-PCI, channel 1 wired to native-PCI
pciide0: using apic 2 int 16 (irq 10) for native-PCI interrupt
atapiscsi0 at pciide0 channel 0 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <TOSHIBA, DVD-ROM SD-M1712, J004> ATAPI 5/cdrom removable
atapiscsi1 at pciide0 channel 0 drive 1
scsibus2 at atapiscsi1: 2 targets
cd1 at scsibus2 targ 0 lun 0: <_NEC, DVD_RW ND-3500AG, 2.18> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
cd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
ppb3 at pci0 dev 28 function 5 "Intel 82801JI PCIE" rev 0x00: apic 2 int 16 (irq 10)
pci4 at ppb3 bus 2
ale0 at pci4 dev 0 function 0 "Attansic Technology L1E" rev 0xb0: AR8121, apic 2 int 17 (irq 11), address 00:22:15:00:12:34
atphy0 at ale0 phy 0: F1 10/100/1000 PHY, rev. 9
uhci3 at pci0 dev 29 function 0 "Intel 82801JI USB" rev 0x00: apic 2 int 23 (irq 3)
uhci4 at pci0 dev 29 function 1 "Intel 82801JI USB" rev 0x00: apic 2 int 19 (irq 5)
uhci5 at pci0 dev 29 function 2 "Intel 82801JI USB" rev 0x00: apic 2 int 18 (irq 15)
ehci1 at pci0 dev 29 function 7 "Intel 82801JI USB" rev 0x00: apic 2 int 23 (irq 3)
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb4 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x90
pci5 at ppb4 bus 5
"Creative Labs SoundBlaster Audigy LS" rev 0x00 at pci5 dev 1 function 0 not configured
"AT&T/Lucent FW322 1394" rev 0x70 at pci5 dev 3 function 0 not configured
pcib0 at pci0 dev 31 function 0 "Intel 82801JIR LPC" rev 0x00
ahci1 at pci0 dev 31 function 2 "Intel 82801JI AHCI" rev 0x00: apic 2 int 19 (irq 5), AHCI 1.2
scsibus3 at ahci1: 32 targets
sd0 at scsibus3 targ 0 lun 0: <ATA, WDC WD10EARS-00Y, 80.0> SCSI3 0/direct fixed
sd0: 953869MB, 512 bytes/sec, 1953525168 sec total
ichiic0 at pci0 dev 31 function 3 "Intel 82801JI SMBus" rev 0x00: apic 2 int 18 (irq 15)
iic0 at ichiic0
iic0: addr 0x1e 01=01 02=01 10=0f 11=01 12=01 13=0f 20=05 21=01 22=01 23=05 31=01 32=01 words 00=0001 01=0101 02=0100 03=0000 04=0000 05=0000 06=0000 07=0000
iic0: addr 0x20 01=80 02=17 03=7f 10=00 19=b0 20=20 21=00 25=20 26=b2 38=74 39=03 4a=64 6a=2c 78=02 79=08 7a=00 7b=00 7e=82 80=00 8b=31 8c=bb 96=8d 99=41 9a=98 9b=01 d0=00 d1=03 d2=72 d3=72 d4=03 d5=02 d6=01 d7=9b d8=6b d9=00 da=00 db=00 dc=00 dd=00 de=00 df=00 e0=00 e1=00 e2=10 e3=10 e4=10 e5=10 e6=10 e7=10 e8=10 e9=10 ea=10 ec=07 ee=00 f1=08 f5=02 f6=02 f9=00 fa=00 fb=50 words 00=ffff 01=8037 02=1766 03=7fff 04=ffff 05=ffff 06=ffff 07=ffff
spdmem0 at iic0 addr 0x50: 2GB DDR2 SDRAM non-parity PC2-6400CL5
spdmem1 at iic0 addr 0x52: 2GB DDR2 SDRAM non-parity PC2-6400CL5
usb2 at uhci0: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci1: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci2: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb5 at uhci3: USB revision 1.0
uhub5 at usb5 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb6 at uhci4: USB revision 1.0
uhub6 at usb6 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb7 at uhci5: USB revision 1.0
uhub7 at usb7 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lm0 at isa0 port 0x290/8: W83627DHG
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
mtrr: Pentium Pro MTRR support
uhidev0 at uhub7 port 1 configuration 1 interface 0 "Razer Razer Copperhead Laser Mouse" rev 1.10/21.00 addr 2
uhidev0: iclass 3/0
ums0 at uhidev0: 7 buttons, Z dir
wsmouse0 at ums0 mux 0
uhidev1 at uhub7 port 1 configuration 1 interface 1 "Razer Razer Copperhead Laser Mouse" rev 1.10/21.00 addr 2
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 modifier keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
root on sd0a swap on sd0b dump on sd0b

(( fwiw, jacob: that azalia is just disabled. the iic0 should be an asus ai booster. still have to modify the live driver for that soundblaster 5.1 vx.
))

- fdisk ---
Disk: sd0       geometry: 121601/255/63 [1953525168 Sectors]
Offset: 0       Signature: 0xAA55
            Starting         Ending         LBA Info:
 #: id      C   H   S -      C   H   S [       start:        size ]
-------------------------------------------------------------------------------
 0: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
 1: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
 2: 00      0   0   0 -      0   0   0 [           0:           0 ] unused
*3: A6      0   1   1 - 121600 254  63 [          63:  1953520002 ] OpenBSD

- disklabel ---
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: WDC WD10EARS-00Y
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 121601
total sectors: 1953525168
rpm: 3600
interleave: 1
boundstart: 63
boundend: 1953520065
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:         20000000               63  4.2BSD   2048 16384    1 # /
  b:          2000000         20000063    swap
  c:       1953525168                0  unused
  d:         20000000         22000063  4.2BSD   2048 16384    1 # /usr
  e:         20000000         42000063  4.2BSD   2048 16384    1 # /usr/obj
  f:         20000000         62000063  4.2BSD   2048 16384    1 # /usr/src

- ramdisk ---
# fgrep ramdisk /etc/fstab
swap /ramdisk mfs rw,nodev,nosuid,-s=2000000 0 0

### aligned partition/slices ###

- sequential write/read ---
# dd if=/dev/zero of=/tmp/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 22.040 secs (97433116 bytes/sec)
# dd if=/dev/zero of=/tmp/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 21.204 secs (101273894 bytes/sec)
# dd if=/dev/zero of=/tmp/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 21.848 secs (98291974 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 20.707 secs (103706513 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 20.425 secs (105136468 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 20.603 secs (104228605 bytes/sec)

- extract source tarball ---
# ls -l /ramdisk
total 298752
-rw-r--r--  1 root  wheel  152866732 Jan  6 19:19 src.tgz
# cd /usr/src
# time tar xzf /ramdisk/src.tgz
    6m26.31s real     0m3.72s user     0m7.49s system

- build kernel / make obj / make build ---
# cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
# time ( make depend && make )
[ ... ]
    2m27.94s real     2m1.78s user     0m23.48s system
# cd /usr/src && time make obj
[ ... ]
    0m28.51s real     0m2.41s user     0m5.43s system
# cd /usr/src && time make build
[ ... ]
   36m7.27s real    19m31.87s user     7m31.80s system

### unaligned partition/slices (no jumper) ###

- sequential write/read ---
# dd if=/dev/zero of=/tmp/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 24.869 secs (86349673 bytes/sec)
# dd if=/dev/zero of=/tmp/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 25.082 secs (85616298 bytes/sec)
# dd if=/dev/zero of=/tmp/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 27.114 secs (79201231 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 20.562 secs (104434701 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 20.559 secs (104453140 bytes/sec)
# dd of=/dev/null if=/tmp/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 20.467 secs (104921901 bytes/sec)

- extract source tarball ---
# ls -l /ramdisk
total 298752
-rw-r--r--  1 root  wheel  152866732 Jan  6 21:12 src.tgz
# cd /usr/src
# time tar xzf /ramdisk/src.tgz
   14m30.30s real     0m4.15s user     0m6.15s system

- build kernel / make obj / make build ---
# time ( make depend && make )
[ ... ]
    2m48.12s real     2m1.03s user     0m24.14s system
# cd /usr/src && time make obj
[ ... ]
    1m1.41s real     0m2.14s user     0m5.99s system
# cd /usr/src && time make build
[ ... ]
   70m51.58s real    19m31.95s user     7m27.84s system

### wd10eads (just for comparison) ###

- sequential write/read ---
# dd if=/dev/zero of=/wd10eads/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 26.590 secs (80762241 bytes/sec)
# dd if=/dev/zero of=/wd10eads/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 26.435 secs (81234814 bytes/sec)
# dd if=/dev/zero of=/wd10eads/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 25.838 secs (83113302 bytes/sec)
# dd of=/dev/null if=/wd10eads/testfile.4k bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 25.959 secs (82723667 bytes/sec)
# dd of=/dev/null if=/wd10eads/testfile.64k bs=64k count=32768
32768+0 records in
32768+0 records out
2147483648 bytes transferred in 25.101 secs (85552816 bytes/sec)
# dd of=/dev/null if=/wd10eads/testfile.1m bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 25.072 secs (85650289 bytes/sec)

- extract source tarball ---
# cd /wd10eads/
# time tar xzf /ramdisk/src.tgz
    1m33.80s real     0m8.01s user     0m12.15s system

(( If you read this far, have a cookie and wonder with me about that quick extraction... The system this drive is in has the same board, but everything else is slower and was not idle when measured... ))
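
PS: The cylinder rounding issue mentioned in the explanations can be illustrated with some arithmetic. This is a rough sketch only: it assumes disklabel rounds a size given with a unit down to a multiple of sectors/cylinder (16065 in the disklabel output above), as described in this mail, and does not reproduce disklabel's actual code. The 10 gigabyte request and the helper names are hypothetical.

```python
# Sketch: why rounding a slice size down to a cylinder boundary breaks
# 4k alignment. Assumption: sizes given with a unit are rounded down to
# a multiple of sectors/cylinder, as described in the mail.

SECTOR_BYTES = 512
SECTORS_PER_CYLINDER = 16065   # 255 tracks/cylinder * 63 sectors/track
SECTORS_PER_4K = 8             # 4096 / 512


def sectors_from_gigabytes(gigabytes):
    """Convert a size in gigabytes (1024^3 bytes) to 512B sectors."""
    return gigabytes * 1024 * 1024 * 1024 // SECTOR_BYTES


def round_down_to_cylinder(sectors):
    """Round a sector count down to a whole number of cylinders."""
    return sectors - sectors % SECTORS_PER_CYLINDER


requested = sectors_from_gigabytes(10)        # hypothetical 10G slice
rounded = round_down_to_cylinder(requested)

print("requested:", requested, "remainder mod 8:", requested % SECTORS_PER_4K)
print("rounded:  ", rounded, "remainder mod 8:", rounded % SECTORS_PER_4K)
```

Since 16065 is odd, a whole number of cylinders is a multiple of 8 sectors only for every 8th cylinder count, so under this assumption the next slice usually no longer starts on a 4k boundary. Entering the size directly in sectors (a multiple of 8) avoids that.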