Hi David,

I'm Praveen. We had a similar problem with Hammer 0.94.2; it appeared when we
created a new cluster with an erasure-coded pool (10+5 configuration).

Root cause:

The high memory usage in our case was caused by PG logs. The number of PG log
entries is higher for erasure-coded pools than for replicated pools, so we
started running out of memory when we created the new cluster with
erasure-coded pools.

Solution:

Ceph provides configuration options to control the number of PG log entries.
You can try setting these values in your cluster and check your OSD memory
usage; this should also improve OSD boot-up time. Below are the config
parameters and the values we use:

  osd max pg log entries = 600
  osd min pg log entries = 200
  osd pg log trim min = 200
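
For anyone wanting to try the same, a minimal sketch of where these go,
assuming the stock /etc/ceph/ceph.conf layout (the injectargs form applies the
values at runtime but does not persist across restarts):

  # /etc/ceph/ceph.conf on each OSD node (restart the OSDs afterwards)
  [osd]
  osd max pg log entries = 600
  osd min pg log entries = 200
  osd pg log trim min = 200

  # or inject into all running OSDs without a restart (non-persistent)
  ceph tell osd.* injectargs '--osd_max_pg_log_entries 600 --osd_min_pg_log_entries 200 --osd_pg_log_trim_min 200'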

Other Information:

We dug into this problem for some time before figuring out the root cause, so
we are fairly sure there are no memory leaks in Ceph Hammer 0.94.2.

Regards,
Praveen

Date: Fri, 7 Oct 2016 16:04:03 +1100
From: David Burns <dbu...@fetchtv.com.au>
To: ceph-us...@ceph.com
Subject: [ceph-users] Hammer OSD memory usage very high
Message-ID: <c5d65c91-1abf-4a7a-bab5-b88785a0a...@fetchtv.com.au>
Content-Type: text/plain; charset=UTF-8

Hello all,

We have a small 160TB Ceph cluster used only as a test S3 storage repository
for media content.

Problem
Since upgrading from Firefly to Hammer we are experiencing very high OSD
memory use of 2-3 GB per TB of OSD storage (typically 6-10 GB per OSD).
We have had to increase swap space to bring the cluster to a basic
functional state. Clearly this significantly impacts system performance
and precludes starting all OSDs simultaneously.

Hardware
4 x storage nodes with 16 OSDs per node. The OSD nodes are reasonably specced
SMC storage servers with dual Xeon CPUs; storage is 16 x 3TB SAS disks in each
node.
Installed RAM is 72GB (2 nodes) & 80GB (2 nodes). (We note that the
installed RAM is at least 50% higher than the Ceph recommended 1 GB RAM per
TB of storage.)

Software
OSD node OS is CentOS 6.8 (with updates). One node has been updated to
CentOS 7.2 - no change in memory usage was observed.

"ceph -v" -> ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
(all Ceph packages downloaded from download.ceph.com)

The cluster has achieved status HEALTH_OK, so we don't believe this relates
to increased memory due to recovery.

History
Emperor 0.72.2 -> Firefly 0.80.10 -> Hammer 0.94.6 -> Hammer 0.94.7 ->
Hammer 0.94.9

Per-process OSD memory is observed to increase substantially during the
load_pgs phase.

Use of "ceph tell 'osd.*' heap release? has minimal effect - there is no
substantial memory in the heap or cache freelists.
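
A quick way to inspect this per OSD, assuming the OSDs are linked against
tcmalloc (osd.0 is just an example):

  ceph tell osd.0 heap stats      # dump allocator stats, including heap/cache freelist sizes
  ceph tell osd.0 heap release    # ask tcmalloc to return free pages to the OS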

More information can be found in bug #17228
(http://tracker.ceph.com/issues/17228).

Any feedback or guidance towards further understanding the high memory usage
would be welcome.

Thanks

David

