Hi Lindsay. First node:
root@nodo01:~# pveperf /var/lib/ceph/osd/ceph-0
CPU BOGOMIPS:      40000.76
REGEX/SECOND:      954062
HD SIZE:           1857.11 GB (/dev/sdb1)
BUFFERED READS:    152.38 MB/sec
AVERAGE SEEK TIME: 15.07 ms
FSYNCS/SECOND:     48.11
DNS EXT:           51.79 ms
DNS INT:           62.76 ms (domain.test2)

root@nodo01:~# pveperf /var/lib/ceph/osd/ceph-1
CPU BOGOMIPS:      40000.76
REGEX/SECOND:      972176
HD SIZE:           1857.11 GB (/dev/sdd1)
BUFFERED READS:    141.72 MB/sec
AVERAGE SEEK TIME: 18.91 ms
FSYNCS/SECOND:     41.38
DNS EXT:           23.32 ms
DNS INT:           79.97 ms (domain.test2)

root@nodo01:~# pveperf /var/lib/ceph/osd/ceph-2
CPU BOGOMIPS:      40000.76
REGEX/SECOND:      956704
HD SIZE:           1857.11 GB (/dev/sde1)
BUFFERED READS:    157.24 MB/sec
AVERAGE SEEK TIME: 14.97 ms
FSYNCS/SECOND:     43.48
DNS EXT:           20.50 ms
DNS INT:           130.27 ms (domain.test2)

--------
Second node:

root@nodo02:~# pveperf /var/lib/ceph/osd/ceph-3
CPU BOGOMIPS:      39999.04
REGEX/SECOND:      965952
HD SIZE:           1857.11 GB (/dev/sdb1)
BUFFERED READS:    147.61 MB/sec
AVERAGE SEEK TIME: 22.60 ms
FSYNCS/SECOND:     42.29
DNS EXT:           45.84 ms
DNS INT:           54.82 ms (futek.it)

root@nodo02:~# pveperf /var/lib/ceph/osd/ceph-4
CPU BOGOMIPS:      39999.04
REGEX/SECOND:      956254
HD SIZE:           1857.11 GB (/dev/sdc1)
BUFFERED READS:    143.70 MB/sec
AVERAGE SEEK TIME: 15.33 ms
FSYNCS/SECOND:     47.33
DNS EXT:           20.91 ms
DNS INT:           20.76 ms (futek.it)

root@nodo02:~# pveperf /var/lib/ceph/osd/ceph-5
CPU BOGOMIPS:      39999.04
REGEX/SECOND:      996038
HD SIZE:           1857.11 GB (/dev/sdd1)
BUFFERED READS:    150.55 MB/sec
AVERAGE SEEK TIME: 15.83 ms
FSYNCS/SECOND:     52.12
DNS EXT:           20.69 ms
DNS INT:           21.33 ms (futek.it)

--------
Third node:

root@nodo03:~# pveperf /var/lib/ceph/osd/ceph-6
CPU BOGOMIPS:      40001.56
REGEX/SECOND:      988544
HD SIZE:           1857.11 GB (/dev/sdb1)
BUFFERED READS:    125.93 MB/sec
AVERAGE SEEK TIME: 18.15 ms
FSYNCS/SECOND:     43.85
DNS EXT:           40.32 ms
DNS INT:           22.03 ms (futek.it)

root@nodo03:~# pveperf /var/lib/ceph/osd/ceph-7
CPU BOGOMIPS:      40001.56
REGEX/SECOND:      963925
HD SIZE:           1857.11 GB (/dev/sdc1)
BUFFERED READS:    111.99 MB/sec
AVERAGE SEEK TIME: 18.33 ms
FSYNCS/SECOND:     26.52
DNS EXT:           26.22 ms
DNS INT:           20.57 ms (futek.it)

root@nodo03:~# pveperf /var/lib/ceph/osd/ceph-8
CPU BOGOMIPS:      40001.56
REGEX/SECOND:      998566
HD SIZE:           1857.11 GB (/dev/sdd1)
BUFFERED READS:    149.53 MB/sec
AVERAGE SEEK TIME: 14.75 ms
FSYNCS/SECOND:     43.25
DNS EXT:           15.37 ms
DNS INT:           55.12 ms (futek.it)

The only anomaly I can see is that OSD ceph-7 gets roughly half the fsyncs/second of the others (also when I test it again).

These servers have this controller:

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)

On the other production cluster (Dell 2950 with PERC5i) I get much better fsyncs/second; I think that is because of the controller's cache:

root@proxmox-1:~# pveperf /var/lib/ceph/osd/ceph-0
CPU BOGOMIPS:      37238.64
REGEX/SECOND:      901248
HD SIZE:           925.55 GB (/dev/sdb1)
BUFFERED READS:    101.61 MB/sec
AVERAGE SEEK TIME: 17.39 ms
FSYNCS/SECOND:     1817.31
DNS EXT:           43.65 ms
DNS INT:           2.87 ms (panservice.it)

02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 5

On the test cluster, with only one 500 GB HD and this controller:

00:1f.2 IDE interface: Intel Corporation NM10/ICH7 Family SATA Controller [IDE mode] (rev 01)

I get this result:

root@nodo1:~# pveperf /var/lib/ceph/osd/ceph-1
CPU BOGOMIPS:      8532.80
REGEX/SECOND:      818860
HD SIZE:           460.54 GB (/dev/sdb1)
BUFFERED READS:    71.33 MB/sec
AVERAGE SEEK TIME: 22.75 ms
FSYNCS/SECOND:     53.49
DNS EXT:           69.40 ms
DNS INT:           2.34 ms (panservice.it)

So the first and third clusters give similar fsyncs/second results, but very different OSD latency. I'll look into a possible controller-related issue.
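To compare the runs without eyeballing them, here is a minimal Python sketch (assuming Python 3.9+) that pulls FSYNCS/SECOND out of pasted pveperf output, converts each rate into an average time per fsync, and flags OSDs below 75% of the median. The regex, the helper names (parse_runs, ms_per_fsync, flag_slow) and the 75% threshold are my own illustrative assumptions, not part of pveperf or Ceph.

```python
import re
import statistics

# Pairs each "pveperf .../ceph-N" invocation with the FSYNCS/SECOND value
# that follows it in pasted terminal output (assumed pveperf layout).
RUN_RE = re.compile(r"pveperf\s+\S*?/(ceph-\d+).*?FSYNCS/SECOND:\s*([\d.]+)", re.S)


def parse_runs(transcript: str) -> dict[str, float]:
    """Map OSD directory name (e.g. 'ceph-7') to its FSYNCS/SECOND value."""
    return {m.group(1): float(m.group(2)) for m in RUN_RE.finditer(transcript)}


def ms_per_fsync(fsyncs_per_second: float) -> float:
    """Average duration of one fsync in milliseconds (1000 / rate)."""
    return 1000.0 / fsyncs_per_second


def flag_slow(rates: dict[str, float], ratio: float = 0.75) -> list[str]:
    """OSDs whose rate is below `ratio` x the median rate (hypothetical threshold)."""
    median = statistics.median(rates.values())
    return sorted(name for name, rate in rates.items() if rate < ratio * median)


if __name__ == "__main__":
    # Two runs abbreviated from the output above; paste a full transcript instead.
    sample = (
        "root@nodo03:~# pveperf /var/lib/ceph/osd/ceph-6\n"
        "FSYNCS/SECOND:     43.85\n"
        "root@nodo03:~# pveperf /var/lib/ceph/osd/ceph-7\n"
        "FSYNCS/SECOND:     26.52\n"
    )
    print(parse_runs(sample))  # {'ceph-6': 43.85, 'ceph-7': 26.52}

    # FSYNCS/SECOND for all nine OSDs, copied from the runs above.
    rates = {
        "ceph-0": 48.11, "ceph-1": 41.38, "ceph-2": 43.48,
        "ceph-3": 42.29, "ceph-4": 47.33, "ceph-5": 52.12,
        "ceph-6": 43.85, "ceph-7": 26.52, "ceph-8": 43.25,
    }
    for name, rate in sorted(rates.items()):
        print(f"{name}: {rate:6.2f} fsync/s -> {ms_per_fsync(rate):5.1f} ms per fsync")
    print("below 75% of median:", flag_slow(rates))  # ['ceph-7']
    # The PERC5i node for comparison.
    print(f"PERC5i node: {ms_per_fsync(1817.31):.2f} ms per fsync")  # 0.55
```

At about 43 fsyncs/second each synchronous write costs roughly 23 ms on the AHCI nodes, versus about 0.55 ms on the PERC5i node. That is consistent with (though not proof of) the controller write-cache explanation.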
Thanks,
Fabrizio

----- Original Message -----
From: "Lindsay Mathieson" <[email protected]>
To: [email protected], "Fabrizio Cuseo" <[email protected]>
Sent: Thursday, 15 January 2015 13:17:07
Subject: Re: [PVE-User] High ceph OSD latency

On Thu, 15 Jan 2015 11:25:44 AM Fabrizio Cuseo wrote:
> What is strange is that in the OSD tree I see high latency: typically apply
> latency is between 5 and 25, but commit latency is between 150 and 300
> (and sometimes 500-600), with 5-10 op/s and a few B/s rd/wr (I have only 3
> VMs, and only 1 is working now, so the cluster is really unloaded).
>
> I am using a pool with 3 copies, and I have increased pg_num to 256 (the
> default value of 64 is too low), but OSD latency stays the same with a
> different pg_num value.
>
> I have other clusters (similar configuration: Dell 2950, dual Ethernet
> for Ceph and Proxmox, 4 x OSD with 1 TB drives, PERC5i controller) with
> several VMs, and their commit and apply latency is 1-2 ms.
>
> Another cluster (test cluster) with 3 x Dell PE860, with only 1 OSD per
> node, has better latency (10-20 ms).
>
> What can I check?

POOMA U, but if you have one drive or controller that is marginal or failing, it can slow down the whole cluster. It might be worthwhile benching individual OSDs.

--
---
Fabrizio Cuseo - mailto:[email protected]
General Management - Panservice InterNetWorking
Professional Services for Internet and Networking
Panservice is a member of AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it mailto:[email protected]
National toll-free number: 800 901492
