[ceph-users] Is anyone else still experiencing memory issues with 12.2.2 and BlueStore?

2018-02-10 Thread Tzachi Strul
Hi,
I know that 12.2.2 should have fixed all memory leak issues with BlueStore,
but we are still experiencing some odd behavior.

Our OSDs flap once in a while... sometimes it doesn't stop until we restart
all OSDs on the same server, or on all nodes. In our syslog we see messages
like "failed: Cannot allocate memory" from all kinds of processes.

In addition, we sometimes get this error when running ceph commands:
Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages

It looks like a memory leak: when we restart all OSDs, this behavior stops
for a few hours/days.
We have 8 OSD servers, each with 16 SSD disks and 64GB of RAM. The BlueStore
cache is set to the default (3GB for SSD).
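
For scale, a rough back-of-the-envelope (our numbers, nothing measured): 16
OSDs x 3GB of BlueStore cache is already ~48GB before any per-OSD overhead
(pglog, osdmaps, allocator metadata), which leaves little of the 64GB for the
rest of the node. A sketch of how the cache could be shrunk, assuming the
12.2.x option names; the 1 GiB value is only an example, and the change may
require an OSD restart to take effect:

# runtime, all OSDs in the cluster (example value: 1 GiB)
ceph tell osd.* injectargs '--bluestore_cache_size_ssd 1073741824'

# persistent, in ceph.conf under [osd]
# bluestore cache size ssd = 1073741824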

The result is that our cluster is almost constantly rebuilding, which impacts
performance.

root@ecprdbcph10-opens:~# ceph daemon osd.1 dump_mempools
{
    "bloom_filter": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_alloc": {
        "items": 5105472,
        "bytes": 5105472
    },
    "bluestore_cache_data": {
        "items": 68868,
        "bytes": 1934663680
    },
    "bluestore_cache_onode": {
        "items": 152640,
        "bytes": 102574080
    },
    "bluestore_cache_other": {
        "items": 16920009,
        "bytes": 371200513
    },
    "bluestore_fsck": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_txc": {
        "items": 3,
        "bytes": 2160
    },
    "bluestore_writing_deferred": {
        "items": 33,
        "bytes": 265015
    },
    "bluestore_writing": {
        "items": 19,
        "bytes": 6403820
    },
    "bluefs": {
        "items": 303,
        "bytes": 12760
    },
    "buffer_anon": {
        "items": 32958,
        "bytes": 14087657
    },
    "buffer_meta": {
        "items": 68996,
        "bytes": 6071648
    },
    "osd": {
        "items": 187,
        "bytes": 2255968
    },
    "osd_mapbl": {
        "items": 0,
        "bytes": 0
    },
    "osd_pglog": {
        "items": 514238,
        "bytes": 152438172
    },
    "osdmap": {
        "items": 35699,
        "bytes": 823040
    },
    "osdmap_mapping": {
        "items": 0,
        "bytes": 0
    },
    "pgmap": {
        "items": 0,
        "bytes": 0
    },
    "mds_co": {
        "items": 0,
        "bytes": 0
    },
    "unittest_1": {
        "items": 0,
        "bytes": 0
    },
    "unittest_2": {
        "items": 0,
        "bytes": 0
    },
    "total": {
        "items": 22899425,
        "bytes": 2595903985
    }
}
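
The mempool total above (~2.6GB) only covers what the OSD itself accounts for;
the process can hold considerably more. A sketch of the commands used to
compare that accounting with what the process actually occupies (osd.1 is just
the example from above, and the heap commands assume a tcmalloc build):

ceph daemon osd.1 dump_mempools    # per-pool accounting, as pasted above
ceph tell osd.1 heap stats         # tcmalloc view: in-use vs freed-but-unreleased
ceph tell osd.1 heap release       # ask tcmalloc to return freed pages to the OS
ps -o rss,cmd -C ceph-osd          # actual resident memory per OSD process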


Any help would be appreciated.
Thank you


-- 

*Tzachi Strul*

*Storage DevOps *// *Kenshoo*

*Office* +972 73 2862-368 // *Mobile* +972 54 755 1308

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Stuck pgs (activating+remapped) and slow requests after adding OSD node via ceph-ansible

2018-01-07 Thread Tzachi Strul
Hi all,
We have a 5-node Ceph cluster (Luminous 12.2.1) installed via ceph-ansible.
All servers have 16 x 1.5TB SSD disks.
3 of these servers also act as MON+MGR.
We don't have separate cluster and public networks; each node has 4 NICs
bonded together (40G) that carry both cluster and public traffic (we know
it's not ideal and plan to change it).

Last week we added another node to the cluster (another 16 x 1.5TB SSD),
using the latest stable release of ceph-ansible.
After OSD activation the cluster started rebalancing and the problems began:
1. The cluster entered HEALTH_ERR state
2. 67 PGs stuck in activating+remapped
3. A lot of blocked/slow requests
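
(A sketch of the sort of commands used to see which PGs are stuck and why;
the PG id below is a placeholder, not one of ours:)

ceph health detail             # lists stuck PGs and blocked requests
ceph pg dump_stuck inactive    # PGs stuck in activating/remapped states
ceph pg 1.2f query             # per-PG detail: acting set, blocking peers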

This cluster serves OpenStack volumes; almost all OpenStack instances hit
100% disk utilization and hung, and eventually cinder-volume crashed.

In the end, after restarting several OSDs, the problem resolved and the
cluster returned to HEALTH_OK.

Our configuration already has:
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery op priority = 1
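
(For completeness, a sketch of how such values can be checked on a running OSD
via its admin socket, or pushed at runtime; osd.0 is an arbitrary example:)

ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'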

In addition, we see a lot of bad mappings, for example:
bad mapping rule 0 x 52 num_rep 8 result [32,5,78,25,96,59,80]
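
Note that the result above contains only 7 OSDs for num_rep 8, i.e. CRUSH
gives up before finding a full set. This can be reproduced offline against the
CRUSH map, which at least shows whether rule 0 and the current tunables are
the problem (a sketch; file names are examples):

ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 0 --num-rep 8 --min-x 0 --max-x 1000 --show-bad-mappings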

What could be the cause, and what can I do to avoid this situation?
We need to add another 9 OSD servers and can't afford downtime.

Any help would be appreciated. Thank you very much


Our ceph configuration:

[mgr]
mgr_modules = dashboard zabbix

[global]
cluster network = *removed for security reasons*
fsid = *removed for security reasons*
mon host = *removed for security reasons*
mon initial members = *removed for security reasons*
mon osd down out interval = 900
osd pool default size = 3
public network =  *removed for security resons*

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok  # must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log  # must be writable by QEMU and allowed by SELinux or AppArmor

[osd]
osd backfill scan max = 16
osd backfill scan min = 4
osd bluestore cache size = 104857600  **Due to 12.2.1 bluestore memory leak bug**
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1


--

*Tzachi Strul*

*Storage DevOps *// *Kenshoo*

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] application not enabled on pool - OpenStack pools

2017-09-14 Thread Tzachi Strul
Hi All,
We have finished installing a fresh new cluster, version 12 (Luminous), using
ceph-ansible.
I know that since version 12 we need to associate pools with an application.
I have only OpenStack-related pools in this configuration:
images
volumes
vms
backup

I just wanted to make sure: all these pools should be associated with the rbd
application, am I right?
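
If so, this is roughly what we plan to run (a minimal sketch, assuming exactly
the pool names listed above):

for pool in images volumes vms backup; do
    ceph osd pool application enable "$pool" rbd
done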

Thanks


-- 

*Tzachi Strul*

*Storage Ops *// *Kenshoo*

*Office* +972 73 2862-368 // *Mobile* +972 54 755 1308

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Integrating Ceph with OpenStack with cephx disabled

2017-06-14 Thread Tzachi Strul
Hi,
We have a Ceph cluster that we want to integrate with OpenStack.
We disabled cephx.
We noticed that when we integrate Ceph with libvirt, it doesn't work unless
we use the client.cinder key when we import secret.xml.
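
(For reference, the cephx-style flow being referred to looks roughly like
this; a sketch only, and the secret UUID is a placeholder:)

virsh secret-define --file secret.xml
virsh secret-set-value --secret "$SECRET_UUID" \
    --base64 "$(ceph auth get-key client.cinder)"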

Are we doing something wrong, or is it impossible to implement this without
cephx enabled?

Thank you

-- 

*Tzachi Strul*

*Storage DevOps *// *Kenshoo*

*Office* +972 73 2862-368 // *Mobile* +972 54 755 1308

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com