On 2/20/06, Raju Uppalapati <[EMAIL PROTECTED]> wrote:
> When copying or removing directories with lots of content, it feels like
> it takes more time to complete the same task on Solaris than on Linux.
> Is Solaris really slower than Linux for file system related tasks?
> Is the slowness due to some default security and data consistency
> features in Solaris?
> Can I change any config on my Solaris system to make these operations
> quicker?
Give me some data to work with here.

On the exact same hardware, install Red Hat Enterprise Linux AS 64-bit and then create 10 or 20 million files of varying sizes with varying path name lengths. Then perform a cp -rp foo bar with it. Time that ten times. In each case ensure that you start with a fresh filesystem. Then repeat that procedure, but this time perform a rm -rf foo. Again repeat the procedure ten times and record the data with millisecond accuracy. Then repeat all of the above with a ZFS volume.

Then install Solaris 10 Update 1 on that exact same hardware and arrange for the test filesystem to be in more or less the same cylinder range on the physical test disk. Repeat all of the above procedures. Then install Solaris Express and set up ZFS. Repeat all of the above. Then install Microsoft Windows 2003 Advanced Server. Repeat the tests again. Then install Nexenta and perform these tests again.

Then arrange for some fibre storage arrays and perform those tests again with ten-disk stripes, with multiple fibre controllers, and with multipath redundant features on all OS options. For example, take an A5200 array ( get one anywhere .. they are cheap now ).

Be sure to have a look at my crucible file IO load tester at
http://www.blastwave.org/dclarke/crucible and get it like so :

# wget http://www.blastwave.org/dclarke/crucible/crucible_sparc
--14:50:33--  http://www.blastwave.org/dclarke/crucible/crucible_sparc
           => `crucible_sparc'
Resolving www.blastwave.org... 207.61.151.12
Connecting to www.blastwave.org|207.61.151.12|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17,080 (17K) [text/plain]

100%[====================================>] 17,080        --.--K/s

14:50:33 (296.27 KB/s) - `crucible_sparc' saved [17080/17080]

# mv crucible_sparc ./crucible
# chmod 755 crucible
#

Then create a place to run a quick test on a fresh disk :

# format -e
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
Specify disk (enter its number): 0
selecting c1t0d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> pa


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> pr
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm    1649 -  3709       10.00GB    (2061/0/0)   20972736
  1        var    wm    3710 -  5770       10.00GB    (2061/0/0)   20972736
  2     backup    wm       0 - 14086       68.35GB    (14087/0/0) 143349312
  3       swap    wu       0 -  1648        8.00GB    (1649/0/0)   16780224
  4 unassigned    wm    5771 -  5783       64.59MB    (13/0/0)       132288
  5 unassigned    wm       0                0         (0/0/0)            0
  6 unassigned    wm       0                0         (0/0/0)            0
  7       home    wm    5784 -  7844       10.00GB    (2061/0/0)   20972736

partition> 5
Part      Tag    Flag     Cylinders         Size            Blocks
  5 unassigned    wm       0                0         (0/0/0)            0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 7845
Enter partition size[0b, 0c, 7845e, 0.00mb, 0.00gb]: 32g
`32.00gb' is out of range
Enter partition size[0b, 0c, 7845e, 0.00mb, 0.00gb]: 24g
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm    1649 -  3709       10.00GB    (2061/0/0)   20972736
  1        var    wm    3710 -  5770       10.00GB    (2061/0/0)   20972736
  2     backup    wm       0 - 14086       68.35GB    (14087/0/0) 143349312
  3       swap    wu       0 -  1648        8.00GB    (1649/0/0)   16780224
  4 unassigned    wm    5771 -  5783       64.59MB    (13/0/0)       132288
  5 unassigned    wm    7845 - 12791       24.00GB    (4947/0/0)   50340672
  6 unassigned    wm       0                0         (0/0/0)            0
  7       home    wm    5784 -  7844       10.00GB    (2061/0/0)   20972736

partition> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]:
Ready to label disk, continue? yes

partition> q
format> q
#
# newfs -v -b 8192 -i 8192 -f 8192 -m 2 -s 50340672 /dev/rdsk/c1t0d0s5
newfs: construct a new file system /dev/rdsk/c1t0d0s5: (y/n)?
y
mkfs -F ufs /dev/rdsk/c1t0d0s5 50340672 -1 -1 8192 8192 203 2 167 8192 t 0 -1 8 128 n
Warning: 3264 sector(s) in last cylinder unallocated
/dev/rdsk/c1t0d0s5:     50340672 sectors in 8194 cylinders of 48 tracks, 128 sectors
        24580.4MB in 513 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
..........
super-block backups for last 10 cylinder groups at:
 49453984, 49552416, 49650848, 49749280, 49847712, 49946144, 50044576,
 50143008, 50241440, 50331680
#

Now mount that with UFS logging ( remember to test all of the above both with
and without UFS logging, or journaling ) :

# mount -F ufs -o logging /dev/dsk/c1t0d0s5 /mnt
# mkdir /mnt/test

Now do a quickie dump test to that fresh disk :

# time -p mkfile -v 4096m /mnt/test/4g.dat
/mnt/test/4g.dat 4294967296 bytes
real 66.12
user 0.49
sys 20.43
# bc
scale=9
4096/66.12
61.947973381

So there you get about 60MB/sec to the disk, which is an Ultra320 SCSI disk in
a Netra V240.

Let's try a quick creation of 26^3 files, okay ? That's 17576 files, and we
will create them all with integer multiples of the filesystem block size, then
go back for a second pass and append a block fragment, and then a third pass
and append another really small fragment, and test the speed of each file IO
operation with a buffer flush. And we ensure that we use the
microsecond-accurate high-resolution timers for all of this too, okay ?
Something like this :

[ trivial code idea, filled out with the write and error checks it needs ]

    start_proc_hrt = gethrtime();
    fp = fopen( filename, "w" );
    if ( fp == NULL ) {
        perror( "fopen" );
        exit( 1 );
    }
    if ( fwrite( buf, 1, buf_len, fp ) != buf_len ) {
        fprintf( stderr, "short write on %s\n", filename );
        exit( 1 );
    }
    fflush_err = fflush( fp );
    if ( fflush_err != 0 ) {
        fprintf( stderr, "fflush error %i\n", fflush_err );
        exit( 1 );
    }
    fclose( fp );
    end_proc_hrt = gethrtime();

First we try a fresh filesystem for each test so that we don't get polluted
data. We are being scientific here ..
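For anyone without crucible handy, the shape of that first creation pass can be sketched in plain shell. This is a scaled-down stand-in, not the real tester: the [a-d][a-d] layout, the 8192-byte file size, and the /tmp location are all assumptions for illustration, and a real measurement would point TESTDIR at a freshly newfs'ed and mounted slice and wrap the run in time -p.

```shell
#!/bin/sh
# Scaled-down sketch of the crucible creation pass: build a
# [a-d][a-d] directory layout ( the real tester uses [a-z][a-z] )
# and drop one filesystem-block-sized file in each directory.
# TESTDIR is an assumption ; point it at a fresh filesystem for a
# real measurement.
TESTDIR=${TESTDIR:-/tmp/cru_mini}
rm -rf "$TESTDIR"
mkdir -p "$TESTDIR" || exit 1
for a in a b c d ; do
    for b in a b c d ; do
        d="$TESTDIR/$a$b"
        mkdir -p "$d"
        # one 8192-byte file per directory, one filesystem block
        dd if=/dev/zero of="$d/file.dat" bs=8192 count=1 2>/dev/null
    done
done
# sanity check : 4 * 4 = 16 directories, 16 files
find "$TESTDIR" -type f | wc -l
```

Shell granularity is far too coarse for the per-file timing above; that part needs the high-resolution timer in C.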
# umount /dev/dsk/c1t0d0s5
# newfs -v -b 8192 -i 8192 -f 8192 -m 2 -s 50340672 /dev/rdsk/c1t0d0s5
newfs: /dev/rdsk/c1t0d0s5 last mounted as /mnt
newfs: construct a new file system /dev/rdsk/c1t0d0s5: (y/n)? y
mkfs -F ufs /dev/rdsk/c1t0d0s5 50340672 -1 -1 8192 8192 203 2 167 8192 t 0 -1 8 128 n
Warning: 3264 sector(s) in last cylinder unallocated
/dev/rdsk/c1t0d0s5:     50340672 sectors in 8194 cylinders of 48 tracks, 128 sectors
        24580.4MB in 513 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
..........
super-block backups for last 10 cylinder groups at:
 49453984, 49552416, 49650848, 49749280, 49847712, 49946144, 50044576,
 50143008, 50241440, 50331680
# mount -F ufs -o logging /dev/dsk/c1t0d0s5 /mnt
# mkdir /mnt/test
# time -p /root/crucible /mnt/test

**************************************************************
  crucible : cru-ci-ble (kroo'se-bel) noun.
             1. A vessel for melting materials at high temperatures.
             2. A severe test, as of patience or belief; a trial.
                        [ Dennis Clarke [EMAIL PROTECTED] ]
**************************************************************

TEST 1 ) file write.
  Building file structure at /mnt/test/
  This test will create 26^3 = 17576 files of exactly 65536 bytes each.
  This amounts to 1151860736 bytes = 1098.5Mb of file data.

  RT = 21.590421 sec
  17576 files   avg = 0.001228 sec   total = 21.580843 sec
  io avg = 50.901627 MB/sec

TEST 2 ) file append 2048 bytes.
  Appending to file structure at /mnt/test/
  This test will append 2048 bytes to the files that were created in TEST 1.

  RT = 4.915591 sec
  17576 files   avg = 0.000279 sec   total = 4.909125 sec
  io avg = 6.992718 MB/sec

TEST 3 ) file append 749 bytes
  Appending to file structure at /mnt/test/
  This test will append 749 bytes to the files that were created in TEST 1.
  RT = 3.770303 sec
  17576 files   avg = 0.000214 sec   total = 3.764416 sec
  io avg = 3.335065 MB/sec

RT = 30.276818 sec
real 30.28
user 12.04
sys 8.18
#

So there we see a whole new world of data when we thrash the file IO a little
bit. We see 50 MB/sec to a single disk. Let's check the filesystem to see if
we really did get those files :

# find /mnt/test -type f | wc -l
   17576

The files are dumped into directory structures of the form [a-z][a-z], thus we
have 26^2 top-level directories. Let's now time the removal of them :

# du -skd /mnt/test
1270896 /mnt/test
# time -p rm -rf /mnt/test/[a-z][a-z]
real 3.00
user 0.15
sys 1.46
#

hmmmmm .. 3 seconds. Well, there is probably some shell time in there
expanding those directory paths. This is a pretty basic little server. Thus :

# uname -a
SunOS mail-g1 5.10 Generic_118822-25 sun4u sparc SUNW,Netra-240
# prtconf -v | grep Memory
Memory size: 4096 Megabytes
# psrinfo -v
Status of virtual processor 0 as of: 02/20/2006 15:07:08
  on-line since 02/16/2006 22:34:32.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 02/20/2006 15:07:08
  on-line since 02/16/2006 22:34:30.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
# cat /etc/release
                      Solaris 10 1/06 s10s_u1wos_19a SPARC
          Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
                       Use is subject to license terms.
                          Assembled 07 December 2005
#

Now then .. let's move on towards fibre storage, and we will start with a
single fibre array with 22 disks, okay ?

# luxadm display mirror0

                         SENA DISK STATUS
SLOT   FRONT DISKS       (Node WWN)         REAR DISKS        (Node WWN)
0      On (O.K.)         2000002037979c97   On (O.K.)         2000002037979d77
1      On (O.K.)         2000002037979d07   On (O.K.)         2000002037979d88
2      On (O.K.)         200000203719d128   On (O.K.)         200000203719c3a4
3      On (O.K.)         2000002037979de4   On (O.K.)         200000203719d31c
4      On (O.K.)         20000020374f67e0   On (O.K.)         2000002037979d74
5      On (O.K.)         2000002037979d78   On (O.K.)
       2000002037979dd9
6      On (O.K.)         2000002037979d71   On (O.K.)         2000002037979db9
7      On (O.K.)         200000203719c591   On (O.K.)         2000002037979d31
8      On (O.K.)         2000002037979deb   On (O.K.)         2000002037979d84
9      On (O.K.)         2000002037979d1f   On (O.K.)         2000002037877afa
10     On (O.K.)         2000002037979c5c   On (O.K.)         2000002037979d83

                         SUBSYSTEM STATUS
FW Revision:1.09   Box ID:0   Node WWN:508002000006a2e8   Enclosure Name:mirror0
Power Supplies (0,2 in front, 1 in rear)
        0 O.K.(rev.-02)   1 O.K.(rev.-02)   2 O.K.(rev.-02)
Fans (0 in front, 1 in rear)
        0 O.K.(rev.-04)   1 O.K.(rev.-00)
ESI Interface board(IB) (A top, B bottom)
        A: O.K.(rev.-04)
           GBIC module (1 on left, 0 on right in IB)
             0 O.K.(mod.-05)
             1 Not Installed
        B: O.K.(rev.-04)
           GBIC module (1 on left, 0 on right in IB)
             0 Not Installed
             1 Not Installed
Disk backplane (0 in front, 1 in rear)
        Front Backplane: O.K.(rev.-03)
           Temperature sensors (on front backplane)
             0:37°C 1:39°C 2:39°C 3:39°C 4:39°C 5:39°C 6:39°C
             7:39°C 8:39°C 9:39°C 10:37°C (All temperatures are NORMAL.)
        Rear Backplane: O.K.(rev.-03)
           Temperature sensors (on rear backplane)
             0:37°C 1:37°C 2:39°C 3:37°C 4:39°C 5:39°C 6:39°C
             7:39°C 8:39°C 9:40°C 10:39°C (All temperatures are NORMAL.)
Interconnect assembly O.K.(rev.-02)
Loop configuration
        Loop A is configured as a single loop.
        Loop B is configured as a single loop.
Language                USA English
#

Want me to go on and on and on ?

I can tell you that I have tested the file IO speed of Solaris with various
configs for years and years and years, in a very methodical and pedantic
fashion. I have recorded data and measured and benchmarked. Every single
bloody time .. it's so boring .. Solaris is faster on the same hardware. Of
course, to be fair, I have only been doing this for 15 years .. so maybe Linux
is faster in some way on some hardware somewhere with a specific
configuration. But out of the box, plain jane Solaris rocks every time.
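The repeated cp/rm timing procedure from the top of this message can be sketched as a small Bourne shell harness. This is a scaled-down stand-in under stated assumptions: BASE defaults to a /tmp directory rather than a freshly newfs'ed slice, it builds only 200 small files rather than millions, and it times with whole seconds via date where the real harness must re-create the filesystem between passes and record millisecond-accurate numbers.

```shell
#!/bin/sh
# Scaled-down sketch of the timed cp -rp / rm -rf loop.
# BASE is an assumption ; a real run would newfs and remount the
# target slice before every pass and use a high-resolution timer
# instead of whole-second date arithmetic.
BASE=${BASE:-/tmp/cp_rm_bench}
rm -rf "$BASE"
mkdir -p "$BASE/foo" || exit 1

# populate foo with 200 files of varying sizes ( 0 .. 199 bytes )
i=0
while [ "$i" -lt 200 ] ; do
    dd if=/dev/zero of="$BASE/foo/f$i" bs=1 count="$i" 2>/dev/null
    i=`expr $i + 1`
done

# three passes of timed copy then timed remove
pass=1
while [ "$pass" -le 3 ] ; do
    t0=`date +%s`
    cp -rp "$BASE/foo" "$BASE/bar"
    t1=`date +%s`
    rm -rf "$BASE/bar"
    t2=`date +%s`
    echo "pass $pass : cp `expr $t1 - $t0` sec , rm `expr $t2 - $t1` sec"
    pass=`expr $pass + 1`
done
```

At this toy scale every pass will report 0 seconds; the structure of the loop, fresh tree in, timed copy, timed remove, is the point.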
Just for giggles, by the way, that single fibre path array up there clocks in
at 96MB/sec, which means I most likely saturated the 1Gb/sec fibre controller.
That's "actual" file IO and not some big dumb dump to disk that proves
nothing. I am still working on the crucible code, and I have revisions of it
that create a LOT more files and then randomly seek all over the directory
structure, appending and truncating and doing various other things.

Dennis Clarke
_______________________________________________
opensolaris-discuss mailing list
[email protected]
