Re: [ceph-users] Is 12.2.1 really stable? Does anybody have a production cluster with Luminous Bluestore?
Hi,

We're running 12.2.1 in production and are facing some memory & CPU issues:

http://tracker.ceph.com/issues/4?next_issue_id=3_issue_id=5
http://tracker.ceph.com/issues/21933

On Wed, Nov 22, 2017 at 6:38 PM, Vasu Kulkarni wrote:
> On Wed, Nov 22, 2017 at 8:29 AM, magicb...@gmail.com wrote:
> > Hi
> >
> > We have a Ceph Jewel cluster running, but in our lab environment, when we
> > try to upgrade to 12.2.0, we are facing a problem with cephx/auth and MGR.
> >
> > See these bugs:
> >
> > - http://tracker.ceph.com/issues/22096
> > - http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020396.html
>
> The issue has come up multiple times on the ceph-users list; check the tracker:
> http://tracker.ceph.com/issues/20950
> It's fixed/verified in 12.2.2 but not in 12.2.1; 12.2.2 is not released yet
> and is still in the backports stage.
> A workaround is also discussed here for now:
> https://www.spinics.net/lists/ceph-devel/msg37911.html
>
> > Thanks.
> > J.
> >
> > On 16/11/17 15:14, Konstantin Shalygin wrote:
> >>
> >> Hi cephers.
> >> Some thoughts...
> >> At this time my cluster is on Kraken 11.2.0 and works smoothly, with
> >> FileStore and RBD only. I want to upgrade to Luminous 12.2.1 and move to
> >> Bluestore, because this cluster is about to double in size with new disks,
> >> so this is the best opportunity to migrate to Bluestore.
> >>
> >> On the ML I found two problems:
> >> 1. Increased memory usage, which should be fixed upstream
> >> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021676.html).
> >> 2. OSDs dropping and the cluster going offline
> >> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022494.html).
> >> I don't know whether those were Bluestore or FileStore OSDs.
> >>
> >> The first case I can safely survive - the hosts have enough memory to go
> >> to Bluestore, and for further growth I can wait until the next stable
> >> release. The second case really scares me. As I understood it, the
> >> clusters with this problem are not in production for now.
> >>
> >> By this point I have completed all the preparations for the update, and
> >> now I need to figure out whether I should update to 12.2.1 or wait for the
> >> next stable release, because my cluster is in production and I can't
> >> afford a failure. Or I can upgrade and keep using FileStore until the next
> >> release; this is acceptable for me.
> >>
> >> Thanks.
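For reference, the cephx/MGR problem seen after a Jewel-to-Luminous upgrade usually comes down to pre-Luminous keys lacking the new 'mgr' capability. A rough sketch of the commonly cited workaround follows; it is not taken from this thread, and the exact entities and caps should be checked against tracker 20950 and the spinics post above before applying:

    # Show the caps the existing (Jewel-era) key currently has.
    ceph auth get client.admin

    # Re-grant the same caps with 'mgr allow *' added. This overwrites the
    # entity's caps, so copy the mon/osd/mds values from the output above.
    ceph auth caps client.admin \
        mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'

    # The ceph-mgr daemons also need their own keys; check that one exists
    # for each mgr instance (the name after 'mgr.' is deployment-specific).
    ceph auth get mgr.<mgr-id>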
Re: [ceph-users] High osd cpu usage
Hi,

Yes, I'm using Bluestore. There is no I/O on the Ceph cluster - it's totally
idle. All of the CPU usage is from OSDs that have no workload on them.

Thanks!

On Thu, Nov 9, 2017 at 9:37 AM, Vy Nguyen Tan <vynt.kensh...@gmail.com> wrote:
> Hello,
>
> I don't think that is normal behavior in Luminous. I'm testing 3 nodes; each
> node has 3 x 1TB HDD, 1 SSD for WAL + DB, an E5-2620 v3, 32GB of RAM, and a
> 10Gbps NIC.
>
> I use fio for I/O performance measurements. When I ran "fio
> --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
> --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw
> --rwmixread=75" I got the following %CPU per ceph-osd:
>
>   2452 ceph  20  0  2667088 1.813g 15724 S 22.8 5.8 34:41.02 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
>   2178 ceph  20  0  2872152 2.005g 15916 S 22.2 6.4 43:22.80 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>   1820 ceph  20  0  2713428 1.865g 15064 S 13.2 5.9 34:19.56 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
>
> Are you using Bluestore? How many IOPS and how much disk throughput do you
> get with your cluster?
>
> Regards,
>
> On Wed, Nov 8, 2017 at 8:13 PM, Alon Avrahami <alonavrahami@gmail.com> wrote:
>
>> Hello Guys
>>
>> We have a fresh 'luminous' (12.2.0)
>> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), installed using
>> ceph-ansible.
>>
>> [...]
>>
>> All ceph-osd processes are using ~50% CPU and I can't figure out why they
>> are so busy. Is this "normal" behavior in Luminous?
>>
>> Thanks,
>> Alon
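A rough diagnostic sketch for a case like this - not taken from this thread, just the usual next steps - is to look at per-thread CPU and then ask the busy OSD what it thinks it is doing through its admin socket. The PID 36713 and osd.1 below are placeholders lifted from the top output; substitute your own, and run the "ceph daemon" commands on the node hosting that OSD:

    # Per-thread CPU for one busy OSD (thread names hint at what is spinning).
    top -H -p 36713

    # Ask the OSD itself via the admin socket: current and recent operations,
    # plus its internal perf counters.
    ceph daemon osd.1 dump_ops_in_flight
    ceph daemon osd.1 dump_historic_ops
    ceph daemon osd.1 perf dump | less

    # Confirm the cluster really is idle: no client I/O, scrub, or recovery.
    ceph -s
    ceph osd pool stats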
[ceph-users] High osd cpu usage
Hello Guys

We have a fresh 'luminous' (12.2.0)
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), installed using
ceph-ansible.

The cluster contains 6 nodes (Intel server board S2600WTTR) with 96 OSDs and
3 mons in total. Each node has 64G of RAM and an Intel(R) Xeon(R) CPU E5-2620
v4 @ 2.10GHz (32 cores), with 16 x 1.6TB Dell SSD drives (SSDSC2BB016T7R).

The main usage is RBDs for our OpenStack environment (Ocata).

We're at the beginning of our production tests, and it looks like the OSDs are
too busy even though we generate almost no IOPS at this stage. All ceph-osd
processes are using ~50% CPU and I can't figure out why they are so busy:

top - 07:41:55 up 49 days, 2:54, 2 users, load average: 6.85, 6.40, 6.37
Tasks: 518 total, 1 running, 517 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.8 us, 4.3 sy, 0.0 ni, 80.3 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used, 1557116 buff/cache
KiB Swap: 3997692 total, 3997692 free, 0 used. 18020584 avail Mem

  PID USER  PR NI    VIRT    RES    SHR S  %CPU %MEM      TIME+ COMMAND
36713 ceph  20  0 3869588 2.826g  28896 S  47.2  4.5    6079:20 ceph-osd
53981 ceph  20  0 3998732 2.666g  28628 S  45.8  4.2    5939:28 ceph-osd
55879 ceph  20  0 3707004 2.286g  28844 S  44.2  3.6    5854:29 ceph-osd
46026 ceph  20  0 3631136 1.930g  29100 S  43.2  3.1    6008:50 ceph-osd
39021 ceph  20  0 4091452 2.698g  28936 S  42.9  4.3    5687:39 ceph-osd
47210 ceph  20  0 3598572 1.871g  29092 S  42.9  3.0    5759:19 ceph-osd
52763 ceph  20  0 3843216 2.410g  28896 S  42.2  3.8    5540:11 ceph-osd
49317 ceph  20  0 3794760 2.142g  28932 S  41.5  3.4    5872:24 ceph-osd
42653 ceph  20  0 3915476 2.489g  28840 S  41.2  4.0    5605:13 ceph-osd
41560 ceph  20  0 3460900 1.801g  28660 S  38.5  2.9    5128:01 ceph-osd
50675 ceph  20  0 3590288 1.827g  28840 S  37.9  2.9    5196:58 ceph-osd
37897 ceph  20  0 4034180 2.814g  29000 S  34.9  4.5    4789:10 ceph-osd
50237 ceph  20  0 3379780 1.930g  28892 S  34.6  3.1    4846:36 ceph-osd
48608 ceph  20  0 3893684 2.721g  28880 S  33.9  4.3    4752:43 ceph-osd
40323 ceph  20  0 4227864 2.959g  28800 S  33.6  4.7    4712:36 ceph-osd
44638 ceph  20  0 3656780 2.437g  28896 S  33.2  3.9    4793:58 ceph-osd
61639 ceph  20  0  527512 114300  20988 S   2.7  0.2    2722:03 ceph-mgr
31586 ceph  20  0  765672 304140  21816 S   0.7  0.5  409:06.09 ceph-mon
   68 root  20  0       0      0      0 S   0.3  0.0    3:09.69 ksoftirqd/12

strace doesn't show anything suspicious:

root@ecprdbcph10-opens:~# strace -p 36713
strace: Process 36713 attached
futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL

The Ceph logs don't reveal anything either. Is this "normal" behavior in
Luminous? Looking through older threads I can only find one about time gaps,
which is not our case.

Thanks,
Alon
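One common way to chase down an OSD that burns CPU while the cluster is idle - a suggestion, not something from this thread - is to sample the process with perf and see which functions dominate. A minimal sketch, assuming the PID 36713 from the top output above and that linux-tools plus the ceph debug symbols are installed so the report shows useful names:

    # Live view of where the process is spending CPU cycles.
    perf top -p 36713

    # Or record ~30 seconds with call graphs and inspect the report offline.
    perf record -g -p 36713 -- sleep 30
    perf report --stdio | head -50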