Re: [ceph-users] Is the 12.2.1 really stable? Anybody have production cluster with Luminous Bluestore?

2017-11-23 Thread Alon Avrahami
Hi,

We're running 12.2.1 in production and are facing some memory and CPU issues; see:

http://tracker.ceph.com/issues/4?next_issue_id=3_issue_id=5

http://tracker.ceph.com/issues/21933
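
If the memory growth turns out to be the Bluestore cache, one knob worth trying
is capping the per-OSD cache size. A minimal sketch, assuming the 12.2.x option
names (verify them against your release before relying on this):

# ceph.conf on the OSD hosts: cap the Bluestore cache per OSD (values in bytes)
[osd]
bluestore_cache_size_ssd = 1073741824   # 1 GiB (smaller than the SSD default)
bluestore_cache_size_hdd = 536870912    # 512 MiB for HDD-backed OSDs

# optionally push it to running OSDs; an OSD restart may still be needed
# before the cache actually shrinks
ceph tell osd.* injectargs '--bluestore_cache_size_ssd 1073741824'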


On Wed, Nov 22, 2017 at 6:38 PM, Vasu Kulkarni  wrote:

> On Wed, Nov 22, 2017 at 8:29 AM, magicb...@gmail.com wrote:
> > Hi
> >
> > We have a Ceph Jewel cluster running, but in our lab environment, when we
> > try to upgrade to 12.2.0 we are facing a problem with cephx/auth and MGR.
> >
> > See these bugs:
> >
> > - http://tracker.ceph.com/issues/22096
> > - http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020396.html
>
> The issue has come up multiple times on the ceph-users list; check tracker
> http://tracker.ceph.com/issues/20950
> It's fixed/verified in 12.2.2 but not in 12.2.1; 12.2.2 is not
> released yet and is still in the backports state.
> For now, a workaround is also discussed here:
> https://www.spinics.net/lists/ceph-devel/msg37911.html
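
Independent of the workaround linked above, one sanity check that has helped
with cephx/MGR trouble after a Jewel upgrade is verifying the mgr's auth caps.
A hedged sketch, assuming a mgr id that matches the hostname (adjust the id,
and check the linked thread before treating this as the actual fix):

# show the caps the cluster currently holds for the mgr key
ceph auth get mgr.<hostname>

# the caps documented for Luminous mgr daemons; re-apply them if missing
ceph auth caps mgr.<hostname> mon 'allow profile mgr' osd 'allow *' mds 'allow *'

# restart the mgr on that host afterwards
systemctl restart ceph-mgr@<hostname>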
>
> >
> >
> > Thanks.
> > J.
> >
> >
> >
> > On 16/11/17 15:14, Konstantin Shalygin wrote:
> >>
> >> Hi cephers.
> >> Some thoughts...
> >> At this time my cluster is on Kraken 11.2.0 and works smoothly with
> >> FileStore and RBD only.
> >> I want to upgrade to Luminous 12.2.1 and move to Bluestore, because this
> >> cluster is about to double in size with new disks, so it is the best
> >> opportunity to migrate to Bluestore.
> >>
> >> On the ML I found two problems:
> >> 1. Increased memory usage, which should be fixed upstream
> >> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021676.html).
> >> 2. OSDs dropping and the cluster going offline
> >> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022494.html).
> >> I don't know whether these were Bluestore or FileStore OSDs.
> >>
> >> The first case I can safely survive: the hosts have enough memory to go to
> >> Bluestore, and for further growth I can wait until the next stable release.
> >> The second case really scares me. As I understand it, the clusters with
> >> this problem are not in production for now.
> >>
> >> By this point I have completed all the preparations for the update, and
> >> now I need to figure out whether I should update to 12.2.1 or wait for the
> >> next stable release, because my cluster is in production and I can't
> >> afford a failure. Or I can upgrade and keep using FileStore until the next
> >> release; that is acceptable for me.
> >>
> >> Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High osd cpu usage

2017-11-09 Thread Alon Avrahami
Hi,

Yes, I'm using Bluestore.
There is no I/O on the Ceph cluster; it's totally idle.
All the CPU usage is from OSDs that don't have any workload on them.
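
To double-check that nothing is hitting the cluster and to see what the OSDs
themselves report, a minimal sketch using the standard CLI and the admin
socket (run the daemon commands on the node hosting the OSD; osd.1 is just an
example id):

# cluster-wide view of client and recovery traffic
ceph -s
ceph osd pool stats

# per-OSD internal counters and recent slow ops via the admin socket
ceph daemon osd.1 perf dump
ceph daemon osd.1 dump_historic_ops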

Thanks!

On Thu, Nov 9, 2017 at 9:37 AM, Vy Nguyen Tan <vynt.kensh...@gmail.com>
wrote:

> Hello,
>
> I think that is not normal behavior in Luminous. I'm testing 3 nodes; each
> node has 3 x 1TB HDD, 1 SSD for WAL + DB, an E5-2620 v3, 32GB of RAM and a
> 10Gbps NIC.
>
> I use fio for I/O performance measurements. When I run "fio
> --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
> --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw
> --rwmixread=75" I get the %CPU per ceph-osd shown below:
>
>2452 ceph  20   0 2667088 1.813g  15724 S  22.8  5.8  34:41.02
> /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
>2178 ceph  20   0 2872152 2.005g  15916 S  22.2  6.4  43:22.80
> /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
>1820 ceph  20   0 2713428 1.865g  15064 S  13.2  5.9  34:19.56
> /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
>
> Are you using Bluestore? How many IOPS and how much disk throughput did you
> get with your cluster?
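
For a rough IOPS/throughput baseline to compare numbers against, a minimal
sketch using rados bench against a throwaway pool (the pool name here is
hypothetical; don't point it at a pool holding production data):

# 30 seconds of writes, keeping the objects so the read phase has data
rados bench -p benchpool 30 write --no-cleanup

# 30 seconds of random reads against those objects, then remove them
rados bench -p benchpool 30 rand
rados -p benchpool cleanup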
>
>
> Regards,
>
> On Wed, Nov 8, 2017 at 8:13 PM, Alon Avrahami <alonavrahami@gmail.com>
> wrote:
>
>> Hello Guys
>>
>> We have a fresh 'luminous' cluster: 12.2.0
>> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc),
>> installed using ceph-ansible.
>>
>> The cluster consists of 6 nodes (Intel server board S2600WTTR), with 96
>> OSDs and 3 mons.
>>
>> Each node has 64G of memory and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>> (32 cores). Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for
>> a total of 96 OSDs and 3 mons.
>>
>> The main usage is RBDs for our OpenStack environment (Ocata).
>>
>> We're at the beginning of our production tests and it looks like the OSDs
>> are too busy, although we aren't generating many IOPS at this stage (almost
>> nothing).
>> All ceph-osds are using ~50% CPU and I can't figure out why they are so
>> busy:
>>
>> top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40,
>> 6.37
>>
>> Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
>> %Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,
>> 0.0 st
>> KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116
>> buff/cache
>> KiB Swap:  3997692 total,  3997692 free,0 used. 18020584 avail Mem
>>
>> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
>> COMMAND
>>   36713 ceph  20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20
>> ceph-osd
>>   53981 ceph  20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28
>> ceph-osd
>>   55879 ceph  20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29
>> ceph-osd
>>   46026 ceph  20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50
>> ceph-osd
>>   39021 ceph  20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39
>> ceph-osd
>>   47210 ceph  20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19
>> ceph-osd
>>   52763 ceph  20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11
>> ceph-osd
>>   49317 ceph  20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24
>> ceph-osd
>>   42653 ceph  20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13
>> ceph-osd
>>   41560 ceph  20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01
>> ceph-osd
>>   50675 ceph  20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58
>> ceph-osd
>>   37897 ceph  20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10
>> ceph-osd
>>   50237 ceph  20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36
>> ceph-osd
>>   48608 ceph  20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43
>> ceph-osd
>>   40323 ceph  20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36
>> ceph-osd
>>   44638 ceph  20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58
>> ceph-osd
>>   61639 ceph  20   0  527512 114300  20988 S   2.7  0.2   2722:03
>> ceph-mgr
>>   31586 ceph  20   0  765672 304140  21816 S   0.7  0.5 409:06.09
>> ceph-mon
>>  68 root  20   0   0  0  0 S   0.3  0.0   3:09.69
>> ksoftirqd/12
>>
>> strace  doesn't show anything suspicious
>>
>> root@ecprdbcph10-opens:~# strace -p 36713
>> strace: Process 36713 attached
>> futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
>>
>> Ceph logs don't reveal anything either.
>> Is this "normal" behavior in Luminous?
>> Looking through older threads, I can only find one about time gaps, which
>> is not our case.
>>
>> Thanks,
>> Alon
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High osd cpu usage

2017-11-08 Thread Alon Avrahami
Hello Guys

We have a fresh 'luminous' cluster: 12.2.0
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc),
installed using ceph-ansible.

The cluster consists of 6 nodes (Intel server board S2600WTTR), with 96 OSDs
and 3 mons.

Each node has 64G of memory and an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
(32 cores). Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a
total of 96 OSDs and 3 mons.

The main usage is RBDs for our OpenStack environment (Ocata).

We're at the beginning of our production tests and it looks like the OSDs are
too busy, although we aren't generating many IOPS at this stage (almost
nothing).
All ceph-osds are using ~50% CPU and I can't figure out why they are so busy:

top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37

Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,
0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
KiB Swap:  3997692 total,  3997692 free,0 used. 18020584 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
   68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12

strace  doesn't show anything suspicious

root@ecprdbcph10-opens:~# strace -p 36713
strace: Process 36713 attached
futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
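
Since strace only shows the main thread parked on a futex, looking at the
individual OSD threads is more informative. A hedged sketch, assuming
perf/linux-tools is installed (thread names and ids will differ per host):

# per-thread CPU for one OSD process; the thread names (e.g. tp_osd_tp,
# bstore_kv_sync, msgr-worker-*) hint at where the time is going
top -H -p 36713

# sample where the process is burning CPU
perf top -p 36713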

Ceph logs don't reveal anything either.
Is this "normal" behavior in Luminous?
Looking through older threads, I can only find one about time gaps, which is
not our case.
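
If nothing shows up at the default log levels, raising the OSD debug levels at
runtime sometimes reveals what the background work is. A minimal sketch (the
subsystems and levels here are just a starting point; remember to turn them
back down afterwards):

# note the current values first
ceph daemon osd.0 config show | grep -E 'debug_(osd|bluestore)'

# raise the levels at runtime, then watch /var/log/ceph/ceph-osd.0.log
ceph tell osd.0 injectargs '--debug_osd 10/10 --debug_bluestore 10/10'

# revert when done (1/5 shown here; use the values noted above)
ceph tell osd.0 injectargs '--debug_osd 1/5 --debug_bluestore 1/5'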

Thanks,
Alon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com