Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-11 Thread Steven Vacaroaia
Hello again, I have reinstalled the cluster and noticed that with 2 servers it works as expected; adding the 3rd one tanks performance IRRESPECTIVE of which server is the 3rd one. I have tested it with only 1 OSD per server in order to eliminate any balancing issues. This seems to indicate an
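One way to run this kind of per-server isolation test is to weight the spare OSDs out of the CRUSH map so that only one OSD per host takes data; a sketch, where osd.6 is a hypothetical id and 0.57458 is the per-OSD CRUSH weight seen later in the thread:

  # take an OSD out of data placement without stopping the daemon
  ceph osd crush reweight osd.6 0
  # restore it afterwards with its original CRUSH weight
  ceph osd crush reweight osd.6 0.57458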

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-11 Thread Steven Vacaroaia
[root@osd01 ~]# ceph osd pool ls detail -f json-pretty
[
    {
        "pool_name": "rbd",
        "flags": 1,
        "flags_names": "hashpspool",
        "type": 1,
        "size": 2,
        "min_size": 1,
        "crush_rule": 0,
        "object_hash": 2,
        "pg_num": 128,

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-11 Thread Konstantin Shalygin
On 04/11/2018 07:48 PM, Steven Vacaroaia wrote: Thanks for the suggestion but, unfortunately, having the same number of OSDs did not solve the issue. Here it is with 2 OSDs per server, 3 servers - identical servers and OSD configuration: ceph osd pool ls detail, ceph osd crush rule dump k

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-11 Thread Steven Vacaroaia
Thanks for the suggestion but, unfortunately, having the same number of OSDs did not solve the issue. Here it is with 2 OSDs per server, 3 servers - identical servers and OSD configuration:
[root@osd01 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME  STATUS REWEIGHT PRI-AFF
-1       4.02173 root default

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Konstantin Shalygin
ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE USE    AVAIL %USE  VAR  PGS TYPE NAME
-1       3.44714        - 588G 80693M 509G      0    0   - root default
-9       0.57458        - 588G 80693M 509G  13.39 1.13   - host osd01
 5   hdd 0.57458   1.0   588G 80693M 509G  13.39 1.13  64

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Steven Vacaroaia
Hi, with osd debug increased to 5/5 I am seeing lots of these in ceph-osd.5.log (newly added OSD). Anyone know what it means?
2018-04-10 16:05:33.317451 7f33610be700  5 osd.5 300 heartbeat: osd_stat(43897 MB used, 545 GB avail, 588 GB total, peers [0,1,2,3,4] op hist
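For reference, a common way to raise the OSD debug level at runtime, as done above (a sketch targeting the osd.5 discussed in this message):

  # bump debug_osd to 5/5 on the running daemon
  ceph tell osd.5 injectargs '--debug-osd 5/5'
  # or persist it in ceph.conf under [osd]:  debug osd = 5/5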

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Steven Vacaroaia
I've just added another server (same specs) with one OSD and the behavior is the same - bad performance, cur MB/s 0. So it is not a server issue, since I am getting the same behavior with 2 different servers, and I checked the network with iperf3 - no issues
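The iperf3 check referred to here typically looks like the following (hostnames are illustrative):

  # on the receiving node
  iperf3 -s
  # from another node, test the dedicated cluster network for 30 seconds
  iperf3 -c osd03 -t 30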

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Steven Vacaroaia
Hi, thanks for providing guidance. VD0 is the SSD drive; many people suggested not enabling WB for the SSD so that the cache can be used for the HDDs, where it is needed more. The setup is 3 identical DELL R620 servers (OSD01, OSD02, OSD04), 10 GB separate networks, 600 GB Enterprise HDD, 320 GB Enterprise SSD, Bluestore,

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Kai Wagner
Is this just from one server or from all servers? Just wondering why VD 0 is using WriteThrough compared to the others. If that's the setup for the OSDs, you already have a cache setup problem. On 10.04.2018 13:44, Mohamad Gebai wrote:
> megacli -LDGetProp -cache -Lall -a0
>
> Adapter 0-VD
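If the WriteThrough policy on VD 0 turns out to be unintended, switching a virtual drive's cache policy with MegaCli looks roughly like this (a sketch; flag spelling can vary between MegaCli versions, so verify against yours):

  # set WriteBack on virtual drive 0, adapter 0
  megacli -LDSetProp -WB -L0 -a0
  # verify the change
  megacli -LDGetProp -cache -L0 -a0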

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Mohamad Gebai
Just to be clear about the issue: you have a 3-server setup and performance is good. You add a server (with 1 OSD?) and performance goes down, is that right? Can you give us more details? What's your complete setup? How many OSDs per node, bluestore/filestore, WAL/DB setup, etc.? You're talking

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-09 Thread Steven Vacaroaia
The disk controller seems fine; any other suggestions would be really appreciated.
megacli -AdpBbuCmd -aAll
BBU status for Adapter: 0
BatteryType: BBU
Voltage: 3925 mV
Current: 0 mA
Temperature: 17 C
Battery State: Optimal
BBU Firmware Status:
Charging Status : None
Voltage

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-06 Thread David Turner
First and foremost, have you checked your disk controller? Of most import would be your cache battery. Any time I have a single node acting up, the controller is Suspect #1. On Thu, Apr 5, 2018 at 11:23 AM Steven Vacaroaia wrote:
> Hi,
>
> I have a strange issue - OSDs from
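Checking the cache battery on a PERC/MegaRAID controller is usually done with the BBU status query (the exact flags may differ by MegaCli version):

  # query battery backup unit status on all adapters
  megacli -AdpBbuCmd -GetBbuStatus -aAll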

[ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-05 Thread Steven Vacaroaia
Hi, I have a strange issue - OSDs from a specific server are introducing a huge performance issue. This is a brand new installation on 3 identical servers - DELL R620 with PERC H710, bluestore DB and WAL on SSD, 10GB dedicated private/public networks. When I add the OSDs I see gaps like below
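The "cur MB/s 0" gaps described later in the thread look like rados bench output; a typical way to reproduce them against the rbd pool shown elsewhere in this thread (a sketch):

  # 60-second write benchmark against the rbd pool, keeping objects for a read pass
  rados bench -p rbd 60 write --no-cleanup
  # sequential read pass over the objects written above
  rados bench -p rbd 60 seq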