"Se stie" ca nu se foloseste ext4 pentru baze de date ...
Exista bench-uri publice cu mai multe baze de date si mai multe fs-uri.
Ca intotdeauna insa depinde mult si de tipicul tau de operare.
Din familia extx, ext2 tin minte ca facea fata cel mai bine.
Dupa ce necesarul de stocare creste, multi jura ca xfs-ul
face fata (si) mai bine.

Why not run some benchmarks against the database itself, rather than
with dd or hdparm, which both measure something else entirely?
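
For example, something along these lines (a minimal sketch; the db
copy, the workload.sql script and the credentials are placeholders,
not taken from your setup):

# start from a cold cache so repeated runs are comparable
sync && echo 3 > /proc/sys/vm/drop_caches
# time a representative workload through Firebird itself
time isql-fb -user SYSDBA -password masterkey /srv/bench/test.fdb -i workload.sql

Run the same thing on an xfs (or ext2) partition and you are comparing
what actually matters for your case.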


peace,
_bogdan_



On Sun, Apr 14, 2013 at 12:23 PM, petrescs <[email protected]> wrote:

> Hi,
>
> I can't figure out an I/O bottleneck on a database server. I have a
> script that makes some write-intensive changes to a Firebird db of
> roughly 5GB, but based on other investigations I'm inclined to believe
> the problem is not in the db/script/indexes etc., but rather somewhere
> in the filesystem (ext4).
>
> Hopefully relevant information below. The system has no custom tuning
> or parameters (nobarrier, noatime, writeback, commit, vm.dirty_*) -
> everything is at the defaults on a Debian squeeze amd64. There is
> plenty of memory, the process does not need swap, the CPU is barely
> used and is not waiting on IO either, the run is local so it is not
> waiting on the network, ionice has no effect, and so on.
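>
> For reference, this is the kind of tuning being ruled out here (values
> purely illustrative, not recommendations):
>
> # /etc/fstab - relaxed ext4 options; nobarrier trades crash safety for speed
> /dev/md0  /  ext4  noatime,nobarrier,data=writeback,commit=60  0  1
>
> # sysctl - let dirty pages accumulate longer before writeback kicks in
> sysctl -w vm.dirty_ratio=40
> sysctl -w vm.dirty_background_ratio=10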
>
> The only suspicious thing seems to be reported by strace: the pwrite()
> calls take very long compared to the write() calls, and I can't tell
> whether that is acceptable or not. I can say, though, that pwrite
> always writes 8K at a time (the db page size) whereas the write()
> calls are for much smaller amounts (128 bytes), but even that does not
> explain the time difference: 8K is 64 times larger than 128 bytes, yet
> a pwrite call takes roughly 1400 times longer than a write call (see
> the strace output below).
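>
> One thing worth checking (a sketch; the pid and fd numbers are the
> ones from the traces below):
>
> # show the time spent in each individual call, not just the -c totals
> strace -T -e trace=pwrite,write -p 6146
>
> # check whether fd 8 (the db file) was opened with O_SYNC - Firebird's
> # "forced writes" setting typically does this, and synchronous writes
> # alone would explain slow pwrite() next to buffered write(); the
> # flags field here is printed in octal
> cat /proc/6146/fdinfo/8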
>
> Any suggestion for further tests is welcome; it is also possible that
> I misread some parameter in the iostat/vmstat output.
>
> Thanks,
> Silviu
>
> 2 x Seagate Cheetah 15K.5 73GB 15K 3.0Gbps Serial SCSI / SAS Hard Drive
> ST373454SS in software RAID1
>
> # uname -a
> Linux sab09 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64
> GNU/Linux
>
> # lsb_release -a
> No LSB modules are available.
> Distributor ID:    Debian
> Description:    Debian GNU/Linux 6.0.5 (squeeze)
> Release:    6.0.5
> Codename:    squeeze
>
>
> # mdadm --query --detail /dev/md0
> /dev/md0:
>         Version : 1.2
>   Creation Time : Wed Jul 27 19:29:10 2011
>      Raid Level : raid1
>      Array Size : 66404280 (63.33 GiB 68.00 GB)
>   Used Dev Size : 66404280 (63.33 GiB 68.00 GB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Apr 13 01:21:38 2013
>           State : active
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : sab09:0  (local to host sab09)
>            UUID : 1ecf91cd:38ebe28d:e9691655:6656a7d6
>          Events : 763
>
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
>
> # tune2fs -l /dev/md0
> tune2fs 1.41.12 (17-May-2010)
> Filesystem volume name:   <none>
> Last mounted on:          /
> Filesystem UUID:          18efff63-743e-4411-b826-574bad5d51dc
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      has_journal ext_attr resize_inode dir_index
> filetype needs_recovery extent flex_bg sparse_super large_file huge_file
> uninit_bg dir_nlink extra_isize
> Filesystem flags:         signed_directory_hash
> Default mount options:    (none)
> Filesystem state:         clean
> Errors behavior:          Continue
> Filesystem OS type:       Linux
> Inode count:              4153344
> Block count:              16601070
> Reserved block count:     830053
> Free blocks:              12170173
> Free inodes:              4122661
> First block:              0
> Block size:               4096
> Fragment size:            4096
> Reserved GDT blocks:      1020
> Blocks per group:         32768
> Fragments per group:      32768
> Inodes per group:         8192
> Inode blocks per group:   512
> Flex block group size:    16
> Filesystem created:       Wed Jul 27 19:31:08 2011
> Last mount time:          Mon Jul 23 20:48:17 2012
> Last write time:          Tue Feb 14 18:22:01 2012
> Mount count:              9
> Maximum mount count:      31
> Last checked:             Tue Feb 14 18:22:01 2012
> Check interval:           15552000 (6 months)
> Next check after:         Sun Aug 12 19:22:01 2012
> Lifetime writes:          3306 GB
> Reserved blocks uid:      0 (user root)
> Reserved blocks gid:      0 (group root)
> First inode:              11
> Inode size:              256
> Required extra isize:     28
> Desired extra isize:      28
> Journal inode:            8
> Default directory hash:   half_md4
> Directory Hash Seed:      a934d2d2-5f44-4850-8bb9-94ce3f425c45
> Journal backup:           inode blocks
>
> # strace -c -p 6146
> Process 6146 attached - interrupt to quit
> ^CProcess 6146 detached
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  99.67    0.064003          26      2454           pwrite
>   0.14    0.000093           0      8707           pread
>   0.07    0.000048           0      7782           read
>   0.07    0.000045           0      6618           write
>   0.05    0.000029           0     10021           lseek
>   0.00    0.000000           0         6           fstat
>   0.00    0.000000           0       752           rt_sigaction
>   0.00    0.000000           0       846           rt_sigprocmask
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.064218                 37186           total
>
> # strace -e pwrite -p 6146
> pwrite(8, "\5\000909\1\0\0\0\0\0\0\0\0\0\0\303d\0\0\237\0#\1\0\0\0\0\0\0\0\0"..., 8192, 5040791552) = 8192
> [...]
>
> # strace -e write -p 6146
> write(12, "\0\376\0\0\0\0\0\0007\1\0\0\0\0\0\0\r\0Ianuarie 2006\0"..., 128) = 128
> [...]
>
> # cat /proc/6146/io
> rchar: 498882100537
> wchar: 140103265784
> syscr: 138456608
> syscw: 82931241
> read_bytes: 557056
> write_bytes: 131345989632
> cancelled_write_bytes: 0
>
> # free -m
>              total       used       free     shared    buffers     cached
> Mem:         12046      11584        462          0          6      11286
> -/+ buffers/cache:        291      11754
> Swap:        10311          4      10307
>
> # iostat 1 /dev/md0 (averaged over the course of the run)
> [...]
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.49    0.00    0.16    1.96    0.00   97.39
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> md0             355.00         0.00      1888.00          0       1888
> [...]
>
> # iostat -x 1 /dev/md0
> [...]
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.65    0.00    0.26    1.56    0.00   97.53
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> md0               0.00     0.00    0.00  365.00     0.00  1952.00     5.35     0.00    0.00   0.00   0.00
> [...]
>
> # vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0   4848 472976   6492 11557044    0    0     2     5    0    0  1  0 99  0
>  0  0   4848 473092   6492 11557044    0    0     0   888  997 1192  0  0 98  2
>  0  0   4848 473092   6492 11557044    0    0     0  1032 1064 1375  0  0 97  3
>  1  0   4848 473092   6500 11557036    0    0     0   876  965 1140  0  0 97  2
>  0  1   4848 473092   6500 11557044    0    0     0  1032 1085 1387  0  0 98  2
>  0  0   4848 473092   6500 11557044    0    0     0   912 1015 1216  1  0 97  3
>  0  1   4848 473092   6500 11557044    0    0     0   944 1000 1267  0  0 98  2
>  0  0   4848 473092   6500 11557044    0    0     0  1040 1083 1429  0  0 98  2
>  0  0   4848 473092   6508 11557040    0    0     0   868  965 1144  0  0 98  2
>  0  0   4848 473092   6508 11557044    0    0     0   968 1032 1322  0  0 97  2
> ^C
_______________________________________________
RLUG mailing list
[email protected]
http://lists.lug.ro/mailman/listinfo/rlug
