On 2/20/06, Raju Uppalapati <[EMAIL PROTECTED]> wrote:
> When copying or removing directories with lots of content, it feels like it 
> takes more time to complete the same task on Solaris than on Linux.
> Is Solaris really slower than Linux for file system related tasks?
> Is the slowness due to some default security and data consistency features in 
> Solaris?
> Can I change any config on my Solaris system to make these operations quicker?
>

Give me some data to work with here.

On the exact same hardware, install Red Hat Enterprise Linux AS 64-bit
and then create 10 or 20 million files of varying sizes with varying
path name lengths and then perform a cp -rp foo bar with it.  Time
that ten times.  In each case ensure that you start with a fresh
filesystem.

Then repeat that procedure but this time you will perform a rm -rf foo.
Again repeat this procedure ten times and record the data with
millisecond accuracy.

Then repeat all of the above with a ZFS volume.

Then install Solaris 10 Update 1 on that exact same hardware and
arrange for the test filesystem to be in more or less the same
cylinder range on the physical test disk.

Repeat all above procedures.

Then install Solaris Express and set up ZFS.

Repeat all of the above.

Then install Microsoft Windows 2003 Advanced Server.

Repeat the test again.

Then install Nexenta and perform these tests again.

Then arrange for some fibre storage arrays and perform those tests
again with ten disk stripes and with multiple fibre controllers and
with multipath redundant features on all OS options.

For example .. take an A5200 array ( get one anywhere .. they are cheap now )
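The "time it ten times with millisecond accuracy" step above can be sketched roughly like this. This is only a minimal illustration and not part of crucible: the names now_ms, time_command_ms and mean_ms are made up here, and it assumes a POSIX system with clock_gettime() and a /bin/sh for system() to use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Milliseconds on a monotonic clock, so wall-clock adjustments
 * cannot skew a run.  (Hypothetical helper name.) */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Run one shell command (e.g. "cp -rp foo bar") and return its elapsed
 * time in milliseconds, or -1.0 if it could not be run or exited non-zero. */
double time_command_ms(const char *cmd)
{
    assert(cmd != NULL);
    double start = now_ms();
    int status = system(cmd);
    double elapsed = now_ms() - start;
    return (status == 0) ? elapsed : -1.0;
}

/* Arithmetic mean of n samples. */
double mean_ms(const double *samples, int n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}
```

Call time_command_ms() once per run, rebuild the filesystem between runs, and keep all ten samples rather than only the mean, so that outliers stay visible.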

Be sure to have a look at my crucible file IO load tester, at

http://www.blastwave.org/dclarke/crucible

get it like so :

# wget http://www.blastwave.org/dclarke/crucible/crucible_sparc
--14:50:33--  http://www.blastwave.org/dclarke/crucible/crucible_sparc
           => `crucible_sparc'
Resolving www.blastwave.org... 207.61.151.12
Connecting to www.blastwave.org|207.61.151.12|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17,080 (17K) [text/plain]

100%[====================================>] 17,080        --.--K/s

14:50:33 (296.27 KB/s) - `crucible_sparc' saved [17080/17080]

# mv crucible_sparc ./crucible
# chmod 755 crucible
#

Then create a place to run a quick test on a fresh disk :

# format -e
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /[EMAIL PROTECTED],600000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
Specify disk (enter its number): 0
selecting c1t0d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> pa


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> pr
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm    1649 -  3709       10.00GB    (2061/0/0)   20972736
  1        var    wm    3710 -  5770       10.00GB    (2061/0/0)   20972736
  2     backup    wm       0 - 14086       68.35GB    (14087/0/0) 143349312
  3       swap    wu       0 -  1648        8.00GB    (1649/0/0)   16780224
  4 unassigned    wm    5771 -  5783       64.59MB    (13/0/0)       132288
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7       home    wm    5784 -  7844       10.00GB    (2061/0/0)   20972736

partition> 5
Part      Tag    Flag     Cylinders         Size            Blocks
  5 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 7845
Enter partition size[0b, 0c, 7845e, 0.00mb, 0.00gb]: 32g
`32.00gb' is out of range
Enter partition size[0b, 0c, 7845e, 0.00mb, 0.00gb]: 24g
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm    1649 -  3709       10.00GB    (2061/0/0)   20972736
  1        var    wm    3710 -  5770       10.00GB    (2061/0/0)   20972736
  2     backup    wm       0 - 14086       68.35GB    (14087/0/0) 143349312
  3       swap    wu       0 -  1648        8.00GB    (1649/0/0)   16780224
  4 unassigned    wm    5771 -  5783       64.59MB    (13/0/0)       132288
  5 unassigned    wm    7845 - 12791       24.00GB    (4947/0/0)   50340672
  6 unassigned    wm       0                0         (0/0/0)             0
  7       home    wm    5784 -  7844       10.00GB    (2061/0/0)   20972736

partition> label
[0] SMI Label
[1] EFI Label
Specify Label type[0]:
Ready to label disk, continue? yes

partition> q


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#

# newfs -v -b 8192 -i 8192 -f 8192 -m 2 -s 50340672 /dev/rdsk/c1t0d0s5
newfs: construct a new file system /dev/rdsk/c1t0d0s5: (y/n)? y
mkfs -F ufs /dev/rdsk/c1t0d0s5 50340672 -1 -1 8192 8192 203 2 167 8192 t 0 -1 8 128 n
Warning: 3264 sector(s) in last cylinder unallocated
/dev/rdsk/c1t0d0s5:     50340672 sectors in 8194 cylinders of 48 tracks, 128 sectors
        24580.4MB in 513 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
..........
super-block backups for last 10 cylinder groups at:
 49453984, 49552416, 49650848, 49749280, 49847712, 49946144, 50044576,
 50143008, 50241440, 50331680
#

Mount that with UFS logging ( remember to test all of the above with
and without UFS logging or journaling ) :

# mount -F ufs -o logging /dev/dsk/c1t0d0s5 /mnt
# mkdir /mnt/test

Now do a quickie dump test to that fresh disk :

# time -p mkfile -v 4096m /mnt/test/4g.dat
/mnt/test/4g.dat 4294967296 bytes

real 66.12
user 0.49
sys 20.43
# bc
scale=9
4096/66.12
61.947973381
#

So there you get about 60MB/sec to the disk which is an Ultra320 SCSI
disk in a Netra V240.

Let's try a quick creation of 26^3 files, okay ?  That's 17576 files.
We will create them all with integer multiples of the filesystem block
size, then go back for a second pass and append a block fragment, and
then a third pass to append another really small fragment.  We test
the speed of each file IO operation with a buffer flush, and we use
the microsecond accurate high resolution timers for all of this too.

something like this :
[ trivial code idea ]
start_proc_hrt = gethrtime();
fp = fopen ( filename, "w" );
if ( fp == NULL )
    {
        perror ( "fopen" );
        exit (1);
    }
/* ... write the block(s) for this pass to fp here ... */
fflush_err = fflush ( fp );
if ( fflush_err != 0 )
    {
        fprintf ( stderr, "fflush error %i\n", fflush_err );
        exit (1);
    }
fclose ( fp );
end_proc_hrt = gethrtime();
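For anyone who wants to try the idea on a box without gethrtime(), here is a rough, self-contained sketch of the same open / write / flush / close timing. It is not the crucible source. The function name timed_write_ns is invented for illustration, and POSIX clock_gettime() is swapped in for the Solaris gethrtime() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write nbytes of filler to path, flush the stdio buffer, and return the
 * elapsed time in nanoseconds for open + write + flush + close.
 * Returns -1 on any I/O error.  (Hypothetical helper, not from crucible.) */
long long timed_write_ns(const char *path, size_t nbytes)
{
    assert(path != NULL);
    char *buf = malloc(nbytes ? nbytes : 1);
    if (buf == NULL)
        return -1;
    memset(buf, 'x', nbytes);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        free(buf);
        return -1;
    }
    int ok = fwrite(buf, 1, nbytes, fp) == nbytes;
    ok = (fflush(fp) == 0) && ok;   /* push stdio's buffer to the kernel */
    ok = (fclose(fp) == 0) && ok;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    if (!ok)
        return -1;
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

Keep in mind that fflush() only hands the data to the kernel. If you want the time to include the data reaching the disk you would need something like fsync() on the file descriptor before the close.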

First we try a fresh filesystem for each test such that we don't get
polluted data.  We are being scientific here ..

# umount /dev/dsk/c1t0d0s5
# newfs -v -b 8192 -i 8192 -f 8192 -m 2 -s 50340672 /dev/rdsk/c1t0d0s5
newfs: /dev/rdsk/c1t0d0s5 last mounted as /mnt
newfs: construct a new file system /dev/rdsk/c1t0d0s5: (y/n)? y
mkfs -F ufs /dev/rdsk/c1t0d0s5 50340672 -1 -1 8192 8192 203 2 167 8192 t 0 -1 8 128 n
Warning: 3264 sector(s) in last cylinder unallocated
/dev/rdsk/c1t0d0s5:     50340672 sectors in 8194 cylinders of 48 tracks, 128 sectors
        24580.4MB in 513 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
..........
super-block backups for last 10 cylinder groups at:
 49453984, 49552416, 49650848, 49749280, 49847712, 49946144, 50044576,
 50143008, 50241440, 50331680
# mount -F ufs -o logging /dev/dsk/c1t0d0s5 /mnt
# mkdir /mnt/test
# time -p /root/crucible /mnt/test

**************************************************************

   crucible : cru-ci-ble (kroo'se-bel) noun.

              1. A vessel for melting materials at
                 high temperatures.

              2. A severe test, as of patience or belief;
                 a trial.

                [ Dennis Clarke [EMAIL PROTECTED] ]


**************************************************************

TEST 1 ) file write.
Building file structure at /mnt/test/

This test will create 26^3 = 17576 files of
exactly 65536 bytes each.   This amounts to
1151860736 bytes = 1098.5Mb of file data.

    RT = 21.590421 sec

  17576 files   avg = 0.001228 sec      total = 21.580843 sec   io avg = 50.901627 MB/sec

TEST 2 ) file append 2048 bytes.
Appending to file structure at /mnt/test/

This test will append 2048 bytes to the files
that were created in TEST 1.

    RT = 4.915591 sec

  17576 files   avg = 0.000279 sec      total = 4.909125 sec    io avg = 6.992718 MB/sec

TEST 3 ) file append 749 bytes
Appending to file structure at /mnt/test/

This test will append 749 bytes to the files
that were created in TEST 1.

    RT = 3.770303 sec

  17576 files   avg = 0.000214 sec      total = 3.764416 sec    io avg = 3.335065 MB/sec



    RT = 30.276818 sec


real 30.28
user 12.04
sys 8.18
#
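The "io avg" figures in that output can be checked by hand. They work out if you take the bytes written in each test, divide by 2^20, and divide again by the "total" seconds. The sketch below just redoes that arithmetic; the function name io_avg_mb_per_sec is made up here, and the formula is inferred from the numbers, not taken from the crucible source.

```c
#include <assert.h>

/* MB/sec as the output above appears to report it:
 * (files * bytes_per_file) / 2^20 / total seconds.
 * ASSUMPTION: formula inferred from the printed figures. */
double io_avg_mb_per_sec(long long files, long long bytes_per_file,
                         double total_sec)
{
    assert(total_sec > 0.0);
    double megabytes = (double)(files * bytes_per_file) / (1024.0 * 1024.0);
    return megabytes / total_sec;
}
```

With the values printed above, io_avg_mb_per_sec(17576, 65536, 21.580843) comes to about 50.90, the 2048 byte append over 4.909125 sec to about 6.99, and the 749 byte append over 3.764416 sec to about 3.34. The append rates look small only because each file gets a tiny write, so per-file overhead dominates the time.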

So there we see a whole new world of data when we thrash the file IO a
little bit. We see 50 MB/sec to a single disk.  Let's check the
filesystem to see if we really did get those files :


# find /mnt/test -type f | wc -l
   17576

The files are dumped into directory structures of the form [a-z][a-z]
thus we have 26^2 top level directories.  Let's now time the removal
of them :

# du -skd /mnt/test
1270896 /mnt/test
# time -p rm -rf /mnt/test/[a-z][a-z]

real 3.00
user 0.15
sys 1.46
#

hmmmmm .. 3 seconds.  Well, there is probably some shell time in there
expanding that [a-z][a-z] glob into directory paths.
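To make the layout concrete: 26^2 two-letter directories holding 26^3 files means 26 files per directory. One way such a layout could be generated is sketched below. The single-letter file names are purely an assumption for illustration, since the actual names crucible uses inside each directory are not shown here, and layout_path is a made-up function name.

```c
#include <assert.h>
#include <stdio.h>

#define LETTERS 26
#define TOTAL_FILES (LETTERS * LETTERS * LETTERS)   /* 26^3 = 17576 */

/* Map a file index in [0, 17576) to a relative path such as "aa/a".
 * ASSUMPTION: one single-letter file name per slot inside each
 * two-letter directory; the real tool's file naming may differ.
 * Returns 0 on success, -1 on a bad index or too-small buffer. */
int layout_path(int index, char *buf, size_t buflen)
{
    assert(buf != NULL);
    if (index < 0 || index >= TOTAL_FILES || buflen < 5)
        return -1;
    int d1 = index / (LETTERS * LETTERS);    /* first directory letter  */
    int d2 = (index / LETTERS) % LETTERS;    /* second directory letter */
    int f  = index % LETTERS;                /* file letter             */
    snprintf(buf, buflen, "%c%c/%c", 'a' + d1, 'a' + d2, 'a' + f);
    return 0;
}
```

Index 0 maps to aa/a and index 17575 maps to zz/z, so a loop from 0 to TOTAL_FILES - 1 visits every slot exactly once.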

This is a pretty basic little server.  Thus :

# uname -a
SunOS mail-g1 5.10 Generic_118822-25 sun4u sparc SUNW,Netra-240
# prtconf -v | grep Memory
Memory size: 4096 Megabytes
# psrinfo -v
Status of virtual processor 0 as of: 02/20/2006 15:07:08
  on-line since 02/16/2006 22:34:32.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 02/20/2006 15:07:08
  on-line since 02/16/2006 22:34:30.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
# cat /etc/release
                       Solaris 10 1/06 s10s_u1wos_19a SPARC
           Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 07 December 2005
#

Now then .. let's move on towards fibre storage and we will start with
a single fibre array with 22 disks okay ?

# luxadm display mirror0

                                   SENA
                                 DISK STATUS
SLOT   FRONT DISKS       (Node WWN)          REAR DISKS        (Node WWN)
0      On (O.K.)         2000002037979c97    On (O.K.)         2000002037979d77
1      On (O.K.)         2000002037979d07    On (O.K.)         2000002037979d88
2      On (O.K.)         200000203719d128    On (O.K.)         200000203719c3a4
3      On (O.K.)         2000002037979de4    On (O.K.)         200000203719d31c
4      On (O.K.)         20000020374f67e0    On (O.K.)         2000002037979d74
5      On (O.K.)         2000002037979d78    On (O.K.)         2000002037979dd9
6      On (O.K.)         2000002037979d71    On (O.K.)         2000002037979db9
7      On (O.K.)         200000203719c591    On (O.K.)         2000002037979d31
8      On (O.K.)         2000002037979deb    On (O.K.)         2000002037979d84
9      On (O.K.)         2000002037979d1f    On (O.K.)         2000002037877afa
10     On (O.K.)         2000002037979c5c    On (O.K.)         2000002037979d83
                                SUBSYSTEM STATUS
FW Revision:1.09   Box ID:0   Node WWN:508002000006a2e8   Enclosure Name:mirror0
Power Supplies (0,2 in front, 1 in rear)
        0 O.K.(rev.-02) 1 O.K.(rev.-02) 2 O.K.(rev.-02)
Fans (0 in front, 1 in rear)
        0 O.K.(rev.-04) 1 O.K.(rev.-00)
ESI Interface board(IB) (A top, B bottom)
        A: O.K.(rev.-04)
                GBIC module (1 on left, 0 on right in IB)
                0 O.K.(mod.-05)
                1 Not Installed
        B: O.K.(rev.-04)
                GBIC module (1 on left, 0 on right in IB)
                0 Not Installed
                1 Not Installed
Disk backplane (0 in front, 1 in rear)
        Front Backplane: O.K.(rev.-03)
          Temperature sensors (on front backplane)
          0:37ºC 1:39ºC 2:39ºC 3:39ºC 4:39ºC 5:39ºC
          6:39ºC 7:39ºC 8:39ºC 9:39ºC 10:37ºC  (All temperatures are NORMAL.)
        Rear Backplane:  O.K.(rev.-03)
          Temperature sensors (on rear backplane)
          0:37ºC 1:37ºC 2:39ºC 3:37ºC 4:39ºC 5:39ºC
          6:39ºC 7:39ºC 8:39ºC 9:40ºC 10:39ºC  (All temperatures are NORMAL.)
Interconnect assembly
        O.K.(rev.-02)
Loop  configuration
        Loop A is configured as a single loop.
        Loop B is configured as a single loop.
Language        USA English
#

Want me to go on and on and on ?

I can tell you that I have tested the file IO speed of Solaris with
various configs for years and years and years.  In a very methodical
and pedantic fashion.  I have recorded data and measured and
benchmarked.

Every single bloody time .. it's so boring .. Solaris is faster on the
same hardware.

Of course, to be fair, I have only been doing this for 15 years .. so
maybe Linux is faster in some way on some hardware somewhere with a
specific configuration.  But out of the box plain jane Solaris rocks
every time.

Just for giggles, by the way, that single fibre path array up there
clocks in at 96MB/sec, which most likely means I saturated the 1Gb/sec
fibre controller.  That's "actual" file IO and not some big dumb
dump to disk that proves nothing.
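A rough sanity check on that saturation claim: 1 Gbit/sec Fibre Channel signals at 1.0625 Gbaud with 8b/10b coding, so every 10 bits on the wire carry one data byte. The sketch below works out the resulting ceiling and how close 96MB/sec comes to it. The function names are invented for illustration, frame and protocol overhead are ignored, and it assumes the measured "MB" means 2^20 bytes.

```c
#include <assert.h>

/* Payload bandwidth of a link using 8b/10b line coding, in bytes per
 * second: each 10 bits on the wire carry one data byte. */
double payload_bytes_per_sec_8b10b(double line_rate_baud)
{
    assert(line_rate_baud > 0.0);
    return line_rate_baud / 10.0;
}

/* Fraction of that ceiling reached by a measured rate given in MB/sec.
 * ASSUMPTION: 1 MB is taken as 2^20 bytes. */
double link_utilisation(double measured_mb_per_sec, double line_rate_baud)
{
    double measured = measured_mb_per_sec * 1024.0 * 1024.0;
    return measured / payload_bytes_per_sec_8b10b(line_rate_baud);
}
```

With a line rate of 1.0625e9 the ceiling is 106.25 million bytes/sec, and 96MB/sec is roughly 95% of it before any framing overhead is counted, which fits with a single loop being the bottleneck.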

I am still working on the crucible code and I have revisions of it
that create a LOT more files and then randomly seek all over the
directory structure, append, truncate and do various other things.

Dennis Clarke
_______________________________________________
opensolaris-discuss mailing list