Re: [ceph-users] mimic + cephmetrics + prometheus - working ?
I'm not the expert when it comes to cephmetrics, but I think (at least until very recently) cephmetrics relies on other exporters besides the mgr module and the node_exporter.

On Mon, Aug 27, 2018 at 01:11:29PM -0400, Steven Vacaroaia wrote:
Hi, has anyone been able to use Mimic + cephmetrics + prometheus?
I am struggling to make it fully functional, as it appears the data provided by node_exporter has different names than the ones Grafana expects. As a result, only certain dashboards are populated (the Ceph-specific ones) while others show "no data points" (the server-specific ones). Any advice/suggestions/troubleshooting tips will be greatly appreciated.

Example: the Grafana latency-by-server panel uses node_disk_read_time_ms, but node_exporter does not provide it:

curl http://osd01:9100/metrics | grep node_disk_read_time
# HELP node_disk_read_time_seconds_total The total number of milliseconds spent by all reads.
# TYPE node_disk_read_time_seconds_total counter
node_disk_read_time_seconds_total{device="dm-0"} 8910.801
node_disk_read_time_seconds_total{device="sda"} 0.525
node_disk_read_time_seconds_total{device="sdb"} 14221.732
node_disk_read_time_seconds_total{device="sdc"} 0.465
node_disk_read_time_seconds_total{device="sdd"} 0.46
node_disk_read_time_seconds_total{device="sde"} 0.017
node_disk_read_time_seconds_total{device="sdf"} 455.064
node_disk_read_time_seconds_total{device="sr0"} 0

-- Jan Fajerski, Engineer Enterprise Storage, SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
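[Note: the mismatch above comes from node_exporter 0.16 renaming its metrics to base units (*_seconds_total). If fixing the dashboards is not an option, one possible workaround is a Prometheus recording rule that derives the old millisecond-based names from the new counters. This is only a rough sketch, assuming Prometheus 2.x; the file path, rule group name and the write-time counterpart are assumptions, not taken from cephmetrics.]

cat <<'EOF' >> /etc/prometheus/rules/cephmetrics-compat.yml
groups:
- name: cephmetrics-compat
  rules:
  # expose the old dashboard names, converted from seconds to milliseconds
  - record: node_disk_read_time_ms
    expr: node_disk_read_time_seconds_total * 1000
  - record: node_disk_write_time_ms
    expr: node_disk_write_time_seconds_total * 1000
EOF
# make sure the file is listed under rule_files in prometheus.yml, then reload:
killall -HUP prometheus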
Re: [ceph-users] mimic - troubleshooting prometheus
The prometheus plugin currently skips histogram perf counters; their representation in Ceph is not compatible with Prometheus' approach (iirc). However, I believe most, if not all, of those perf counters are also exported as long running averages. Look for metric pairs named some_metric_name_sum and some_metric_name_count.

HTH, Jan

On Fri, Aug 24, 2018 at 01:47:40PM -0400, Steven Vacaroaia wrote:
Hi, any ideas/suggestions for troubleshooting prometheus?
What logs/commands are available to find out why OSD-server-specific data (IOPS, disk and network data) is not scraped, but cluster-specific data (pools, capacity, etc.) is?
Increasing the log level for the MGR showed only the following:

2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type
2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_r_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_out_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_rw_latency_in_bytes_histogram, type
2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring osd.op_w_latency_in_bytes_histogram, type

-- Jan Fajerski, Engineer Enterprise Storage, SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
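[Note: as a rough sketch of how those long running averages can be turned into a latency graph. The metric names and the mgr port 9283 below are assumptions based on what the mgr prometheus module typically exports, so check your own /metrics output first.]

# confirm the _sum/_count pairs exist on the mgr endpoint (mgr-host is a placeholder)
curl -s http://mgr-host:9283/metrics | grep 'op_w_latency'
# then, in Prometheus/Grafana, divide the rates to get an average latency per OSD:
#   rate(ceph_osd_op_w_latency_sum[5m]) / rate(ceph_osd_op_w_latency_count[5m])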
Re: [ceph-users] How to secure Prometheus endpoints (mgr plugin and node_exporter)
Hi Martin, hope this is still useful, despite the lag.

On Fri, Jun 29, 2018 at 01:04:09PM +0200, Martin Palma wrote:
> Since Prometheus uses a pull model over HTTP for collecting metrics, what
> are the best practices to secure these HTTP endpoints?
> - With a reverse proxy with authentication?

This is currently the recommended way to secure prometheus traffic with TLS or authentication. See also https://prometheus.io/docs/introduction/faq/#why-don-t-the-prometheus-server-components-support-tls-or-authentication-can-i-add-those for more info. However, native support for TLS and authentication has just been put on the roadmap in August.

> - Export the node_exporter only on the cluster network? (not usable for
>   the mgr plugin and for nodes like mons, mdss, ...)
> - No security at all?
>
> Best, Martin

-- Jan Fajerski, Engineer Enterprise Storage, SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
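[Note: for completeness, a minimal reverse-proxy sketch along those lines -- nginx with TLS and basic auth in front of a node_exporter that only listens locally. All ports, paths and certificate locations are assumptions; adapt to your environment.]

htpasswd -c /etc/nginx/.prom_htpasswd prometheus
cat <<'EOF' > /etc/nginx/conf.d/node_exporter_proxy.conf
server {
    listen 9443 ssl;
    ssl_certificate     /etc/nginx/ssl/node01.crt;
    ssl_certificate_key /etc/nginx/ssl/node01.key;
    location /metrics {
        auth_basic           "prometheus";
        auth_basic_user_file /etc/nginx/.prom_htpasswd;
        # node_exporter bound to 127.0.0.1:9100 only
        proxy_pass           http://127.0.0.1:9100/metrics;
    }
}
EOF
systemctl reload nginx
# the matching scrape job in prometheus.yml then uses
#   scheme: https, plus basic_auth: { username: ..., password: ... }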
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
Hi Eugen. Just tried everything again here, removing the /sda4 partitions and letting either salt-run proposal-populate or salt-run state.orch ceph.stage.configure try to find the free space on the partitions to work with: unsuccessful again. :(

Just to make things clear: are you telling me that it is completely impossible to have a Ceph "volume" on non-dedicated devices, sharing space with, for instance, the nodes' swap, boot or main partition? And that the only possible way to have a functioning Ceph distributed filesystem is to have, in each node, at least one disk dedicated to the operating system and another, independent disk dedicated to the Ceph filesystem? That would be an awful drawback for our plans if true, but if there is no other way, we will have to just give up. Just, please, answer these two questions clearly before we capitulate? :(

Anyway, thanks a lot, once again,
Jones

On Mon, Sep 3, 2018 at 5:39 AM Eugen Block wrote:
> Hi Jones,
>
> I still don't think creating an OSD on a partition will work. The
> reason is that SES creates an additional partition per OSD, resulting
> in something like this:
>
> vdb      253:16  0    5G 0 disk
> ├─vdb1   253:17  0  100M 0 part /var/lib/ceph/osd/ceph-1
> └─vdb2   253:18  0  4,9G 0 part
>
> Even with external block.db and block.wal on additional devices you would
> still need two partitions for the OSD. I'm afraid with your setup this
> can't work.
>
> Regards,
> Eugen
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
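[Note on the general question: as far as I know, Ceph itself does not strictly require a dedicated disk -- an OSD can be created by hand on a spare partition or LV that shares a disk with the OS, it is just not recommended for production and the DeepSea/SES workflow discussed above may not support it. A rough hand-rolled sketch outside of that workflow; device and VG/LV names are examples only:]

pvcreate /dev/sda4
vgcreate ceph-vg /dev/sda4
lvcreate -l 100%FREE -n osd-data ceph-vg
ceph-volume lvm create --bluestore --data ceph-vg/osd-data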
[ceph-users] How to setup Ceph OSD auto boot up on node reboot
Hi, I created a Ceph cluster manually (not using ceph-deploy). When I reboot a node, the OSDs don't come back up because the OS doesn't know that it needs to bring them up. I am running this on Ubuntu 16.04. Is there a standardized way to start the Ceph OSDs on node reboot? "sudo start ceph-osd-all" isn't working well, and I don't like the idea of putting "sudo start ceph-osd id=1" for each OSD in an rc file. I need to do this for both Hammer (Ubuntu 14.04) and Luminous (Ubuntu 16.04).

-- Thanks, Pardhiv Karri
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
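[Note: a rough sketch of what usually works for manually created clusters; the OSD ids and paths below are examples, not taken from this setup.]

# Luminous on Ubuntu 16.04 (systemd): enable the per-OSD units and the targets
systemctl enable ceph-osd@1 ceph-osd@2
systemctl enable ceph-osd.target ceph.target

# Hammer on Ubuntu 14.04 (upstart): iirc the upstart jobs only start OSDs whose
# data dir carries the "upstart" marker file, so for manually created OSDs:
touch /var/lib/ceph/osd/ceph-1/upstart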
Re: [ceph-users] CephFS small files overhead
You need to re-deploy OSDs for bluestore_min_alloc_size to take effect. > On 4.09.2018, at 18:31, andrew w goussakovski wrote: > > Hello > > We are trying to use cephfs as storage for web graphics, such as > thumbnails and so on. > Is there any way to reduse overhead on storage? On test cluster we have > 1 fs, 2 pools (meta and data) with replica size = 2 > > objects: 1.02 M objects, 1.1 GiB > usage: 144 GiB used, 27 GiB / 172 GiB avail > > So we have (144/2)/1.1*100%=6500% overhead. > > ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic > (stable) > osd storage - bluestore (changing bluestore_min_alloc_size makes no > visible effect) > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
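[Note: a rough per-OSD redeploy sketch, assuming the new value is already in ceph.conf before the OSD is recreated. The OSD id, device and 4k value are examples only, and whether a smaller alloc size is a good idea for HDDs is a separate trade-off.]

# ceph.conf on the OSD host, before recreating:
#   [osd]
#   bluestore min alloc size hdd = 4096
systemctl stop ceph-osd@5
ceph osd destroy 5 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdb
ceph-volume lvm create --bluestore --data /dev/sdb --osd-id 5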
Re: [ceph-users] CephFS small files overhead
You could probably cut the overhead in half with the inline data feature: http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data However, that is an experimental feature. CephFS is unfortunately not very good at storing lots of small files in a storage-efficient manner :( Paul 2018-09-04 17:31 GMT+02:00 andrew w goussakovski : > Hello > > We are trying to use cephfs as storage for web graphics, such as > thumbnails and so on. > Is there any way to reduse overhead on storage? On test cluster we have > 1 fs, 2 pools (meta and data) with replica size = 2 > > objects: 1.02 M objects, 1.1 GiB > usage: 144 GiB used, 27 GiB / 172 GiB avail > > So we have (144/2)/1.1*100%=6500% overhead. > > ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic > (stable) > osd storage - bluestore (changing bluestore_min_alloc_size makes no > visible effect) > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
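[Note: if you want to experiment with inline data anyway, enabling it is a one-liner; syntax from memory, and depending on the release it may additionally require a --yes-i-really-really-mean-it style confirmation since the feature is experimental.]

ceph fs set cephfs inline_data true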
[ceph-users] v12.2.8 Luminous released
We're glad to announce the next point release in the Luminous v12.2.X stable release series. This release contains a range of bugfixes and stability improvements across all the components of ceph. For detailed release notes with links to tracker issues and pull requests, refer to the blog post at http://ceph.com/releases/v12-2-8-released/ Upgrade Notes from previous luminous releases - When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats from 12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please read the notes at https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6 For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed the regression and introduced a workaround option `osd distrust data digest = true`, but 12.2.7 clusters still generated health warnings like :: [ERR] 11.288 shard 207: soid 11:1155c332:::rbd_data.207dce238e1f29.0527:head data_digest 0xc8997a5b != data_digest 0x2ca15853 12.2.8 improves the deep scrub code to automatically repair these inconsistencies. Once the entire cluster has been upgraded and then fully deep scrubbed, and all such inconsistencies are resolved; it will be safe to disable the `osd distrust data digest = true` workaround option. Changelog - * bluestore: set correctly shard for existed Collection (issue#24761, pr#22860, Jianpeng Ma) * build/ops: Boost system library is no longer required to compile and link example librados program (issue#25054, pr#23202, Nathan Cutler) * build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664, pr#22848, Sage Weil, David Zafman) * build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064, pr#23179, Kyr Shatskyy) * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437, pr#22864, Dan Mick) * build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, pr#22844, Ilya Dryomov) * build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van der Ster) * cephfs-journal-tool: Fix purging when importing an zero-length journal (issue#24239, pr#22980, yupeng chen, zhongyan gu) * cephfs: MDSMonitor: uncommitted state exposed to clients/mdss (issue#23768, pr#23013, Patrick Donnelly) * ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan) * ceph-volume add a __release__ string, to help version-conditional calls (issue#25170, pr#23331, Alfredo Deza) * ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784, issue#24957, pr#23350, Andrew Schoen) * ceph-volume: do not use stdin in luminous (issue#25173, issue#23260, pr#23367, Alfredo Deza) * ceph-volume enable the ceph-osd during lvm activation (issue#24152, pr#23394, Dan van der Ster, Alfredo Deza) * ceph-volume expand on the LVM API to create multiple LVs at different sizes (issue#24020, pr#23395, Alfredo Deza) * ceph-volume lvm.activate conditional mon-config on prime-osd-dir (issue#25216, pr#23397, Alfredo Deza) * ceph-volume lvm.batch remove non-existent sys_api property (issue#34310, pr#23811, Alfredo Deza) * ceph-volume lvm.listing only include devices if they exist (issue#24952, pr#23150, Alfredo Deza) * ceph-volume: process.call with stdin in Python 3 fix (issue#24993, pr#23238, Alfredo Deza) * ceph-volume: PVolumes.get() should return one PV when using name or uuid (issue#24784, pr#23329, Andrew Schoen) * ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374, Andrew Schoen) * ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311, pr#23813, Alfredo Deza) * ceph-volume tests/functional 
run lvm list after OSD provisioning (issue#24961, pr#23147, Alfredo Deza) * ceph-volume: unmount lvs correctly before zapping (issue#24796, pr#23128, Andrew Schoen) * ceph-volume: update batch documentation to explain filestore strategies (issue#34309, pr#23825, Alfredo Deza) * change default filestore_merge_threshold to -10 (issue#24686, pr#22814, Douglas Fuller) * client: add inst to asok status output (issue#24724, pr#23107, Patrick Donnelly) * client: fixup parallel calls to ceph_ll_lookup_inode() in NFS FASL (issue#22683, pr#23012, huanwen ren) * client: increase verbosity level for log messages in helper methods (issue#21014, pr#23014, Rishabh Dave) * client: update inode fields according to issued caps (issue#24269, pr#22783, "Yan, Zheng") * common: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh (issue#23492, pr#23025, Sage Weil) * common/DecayCounter: set last_decay to current time when decoding decay counter (issue#24440, pr#22779, Zhi Zhang) * doc: ceph-bluestore-tool manpage not getting rendered correctly (issue#24800, pr#23177, Nathan Cutler) * filestore: add pgid in filestore pg dir split log message (issue#24878, pr#23454, Vikhyat Umrao) * let "ceph status" use base 10 when printing numbers not sizes (issue#22095, pr#22680, Jan Fajerski, Kefu Chai) * librados: fix
Re: [ceph-users] Luminous RGW errors at start
This was the issue: the pool could not be created because it would have exceeded the new (Luminous) limit on PGs per OSD.

On Tue, Sep 4, 2018 at 10:35 AM David Turner wrote:
> I was confused about what could be causing this until Janne's email. I think
> they're correct that the cluster is preventing pool creation due to too
> many PGs per OSD. Double check how many PGs you have in each pool and what
> your defaults are for that.
>
> On Mon, Sep 3, 2018 at 7:19 AM Janne Johansson wrote:
>> Did you change the default pg_num or pgp_num so the pools that did show
>> up made it go past the mon_max_pg_per_osd ?
>>
>> Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford <rstanford8...@gmail.com>:
>>> I installed a new Luminous cluster. Everything is fine so far. Then I
>>> tried to start RGW and got this error:
>>>
>>> 2018-08-31 15:15:41.998048 7fc350271e80 0 rgw_init_ioctx ERROR:
>>> librados::Rados::pool_create returned (34) Numerical result out of range
>>> (this can be due to a pool or placement group misconfiguration, e.g. pg_num
>>> < pgp_num or mon_max_pg_per_osd exceeded)
>>> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage
>>> provider (RADOS)
>>>
>>> I notice that the only pools that exist are the data and index RGW
>>> pools (no user or log pools like on Jewel). What is causing this?
>>>
>>> Thank you
>>> R
>>
>> --
>> May the most significant bit of your life be positive.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
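[Note: for anyone hitting the same thing, a quick sketch of how to check the remaining PG headroom before RGW tries to create its pools. The pg_num values and the 300 limit below are examples only, and raising the limit instead of lowering pool pg_num has its own trade-offs.]

ceph osd df               # the PGS column shows PGs per OSD
ceph osd pool ls detail   # pg_num of the existing pools
# either lower the defaults used for the auto-created RGW pools in ceph.conf,
#   osd pool default pg_num = 8
#   osd pool default pgp_num = 8
# or raise the guard rail on the mons (and persist it in ceph.conf as well):
ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'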
Re: [ceph-users] Luminous new OSD being over filled
Instead of manually weighting the OSDs, you can use the mgr module to slowly add the OSDs and balance your cluster at the same time. I believe you can control the module by telling it a maximum percent of misplaced objects, or other similar metrics, to control adding in the OSD, while also preventing your cluster from being poorly balanced. On Mon, Sep 3, 2018 at 12:08 PM David C wrote: > Hi Marc > > I like that approach although I think I'd go in smaller weight increments. > > Still a bit confused by the behaviour I'm seeing, it looks like I've got > things weighted correctly. Redhat's docs recommend doing an OSD at a time > and I'm sure that's how I've done it on other clusters in the past although > they would have been running older versions. > > Thanks, > > On Mon, Sep 3, 2018 at 1:45 PM Marc Roos wrote: > >> >> >> I am adding a node like this, I think it is more efficient, because in >> your case you will have data being moved within the added node (between >> the newly added osd's there). So far no problems with this. >> >> Maybe limit your >> ceph tell osd.* injectargs --osd_max_backfills=X >> Because pg's being moved are taking space until the move is completed. >> >> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node) >> sudo -u ceph ceph osd crush reweight osd.24 1 >> sudo -u ceph ceph osd crush reweight osd.25 1 >> sudo -u ceph ceph osd crush reweight osd.26 1 >> sudo -u ceph ceph osd crush reweight osd.27 1 >> sudo -u ceph ceph osd crush reweight osd.28 1 >> sudo -u ceph ceph osd crush reweight osd.29 1 >> >> And then after recovery >> >> sudo -u ceph ceph osd crush reweight osd.23 2 >> sudo -u ceph ceph osd crush reweight osd.24 2 >> sudo -u ceph ceph osd crush reweight osd.25 2 >> sudo -u ceph ceph osd crush reweight osd.26 2 >> sudo -u ceph ceph osd crush reweight osd.27 2 >> sudo -u ceph ceph osd crush reweight osd.28 2 >> sudo -u ceph ceph osd crush reweight osd.29 2 >> >> Etc etc >> >> >> -Original Message- >> From: David C [mailto:dcsysengin...@gmail.com] >> Sent: maandag 3 september 2018 14:34 >> To: ceph-users >> Subject: [ceph-users] Luminous new OSD being over filled >> >> Hi all >> >> >> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a >> time. I've only added one so far but it's getting too full. >> >> The drive is the same size (4TB) as all others in the cluster, all OSDs >> have crush weight of 3.63689. Average usage on the drives is 81.70% >> >> >> With the new OSD I start with a crush weight 0 and steadily increase. >> It's currently crush weight 3.0 and is 94.78% full. If I increase to >> 3.63689 it's going to hit too full. >> >> >> It's been a while since I've added a host to an existing cluster. Any >> idea why the drive is getting too full? Do I just have to leave this one >> with a lower crush weight and then continue adding the drives and then >> eventually even out the crush weights? >> >> Thanks >> David >> >> >> >> >> >> >> ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
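[Note: a rough sketch of turning the balancer on under Luminous; the mode and config-key names are from memory, so double check against `ceph balancer status` on your release.]

ceph mgr module enable balancer
ceph balancer mode crush-compat
ceph config-key set mgr/balancer/max_misplaced 0.01   # throttle to ~1% misplaced at a time
ceph balancer on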
Re: [ceph-users] Luminous RGW errors at start
I was confused what could be causing this until Janne's email. I think they're correct that the cluster is preventing pool creation due to too many PGs per OSD. Double check how many PGs you have in each pool and what your defaults are for that. On Mon, Sep 3, 2018 at 7:19 AM Janne Johansson wrote: > Did you change the default pg_num or pgp_num so the pools that did show up > made it go past the mon_max_pg_per_osd ? > > > Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford < > rstanford8...@gmail.com>: > >> >> I installed a new Luminous cluster. Everything is fine so far. Then I >> tried to start RGW and got this error: >> >> 2018-08-31 15:15:41.998048 7fc350271e80 0 rgw_init_ioctx ERROR: >> librados::Rados::pool_create returned (34) Numerical result out of range >> (this can be due to a pool or placement group misconfiguration, e.g. pg_num >> < pgp_num or mon_max_pg_per_osd exceeded) >> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage provider >> (RADOS) >> >> I notice that the only pools that exist are the data and index RGW pools >> (no user or log pools like on Jewel). What is causing this? >> >> Thank you >> R >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > > -- > May the most significant bit of your life be positive. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Luminous - journal setting
Are you planning on using bluestore or filestore? The settings for filestore haven't changed. If you're planning to use bluestore there is a lot of documentation in the ceph docs as well as a wide history of questions like this on the ML. On Mon, Sep 3, 2018 at 5:24 AM M Ranga Swami Reddy wrote: > Hi - I am using the Ceph Luminous release. here what are the OSD > journal settings needed for OSD? > NOTE: I used SSDs for journal till Jewel release. > > Thanks > Swami > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
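[Note: if the new OSDs will be BlueStore, the rough equivalent of the old SSD journal is putting the RocksDB/WAL part on the SSD; a sketch with example device names follows. The filestore syntax is indeed unchanged.]

ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1   # SSD/NVMe partition or LV for RocksDB (and WAL)
# filestore, for comparison:
#   ceph-volume lvm create --filestore --data /dev/sdb --journal /dev/nvme0n1p1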
[ceph-users] CephFS small files overhead
Hello,

We are trying to use CephFS as storage for web graphics, such as thumbnails and so on. Is there any way to reduce the overhead on storage? On a test cluster we have 1 fs and 2 pools (meta and data) with replica size = 2:

objects: 1.02 M objects, 1.1 GiB
usage: 144 GiB used, 27 GiB / 172 GiB avail

So we have (144/2)/1.1*100% ≈ 6500% overhead.

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
osd storage - bluestore (changing bluestore_min_alloc_size makes no visible effect)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] data_extra_pool for RGW Luminous still needed?
On 09/03/2018 10:07 PM, Nhat Ngo wrote:
> Hi all,
>
> I am new to Ceph and we are setting up a new RadosGW and Ceph storage
> cluster on Luminous. We are using only EC for our `buckets.data` pool at
> the moment.
>
> However, I just read the Red Hat Ceph Object Gateway for Production article
> and it mentions that an extra `buckets.non-ec` pool is needed for multipart
> uploads, because multipart upload parts must be stored without EC. EC only
> applies to whole objects, not partial uploads.
>
> Does this still hold true for Luminous? The data layout document for Ceph
> does not make any mention of a non-ec pool:
> http://docs.ceph.com/docs/luminous/radosgw/layout/
>
> Thanks,
> Nhat Ngo | DevOps Engineer
> Cloud Research Team, University of Melbourne, 3010, VIC
> Email: nhat.n...@unimelb.edu.au

Hi Nhat,

The data extra pool is still necessary for multipart uploads, yes. This extra non-ec pool is only used for the 'multipart metadata' object that tracks which parts have been written, though - the object data for each part is still written to the normal data pool, so it can take advantage of erasure coding.

Casey
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
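[Note: to see which pool a zone uses for that, the placement target carries a data_extra_pool entry; a quick check below. The zone, placement and pool names are just the usual defaults and may differ in your setup.]

radosgw-admin zone get --rgw-zone=default
# look for something like:
#   "placement_pools": [ { "key": "default-placement",
#       "val": { ..., "data_extra_pool": "default.rgw.buckets.non-ec", ... } } ]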
Re: [ceph-users] Degraded data redundancy: NUM pgs undersized
Hello Lothar,

Thanks for your reply.

Am 04.09.2018 um 11:20 schrieb Lothar Gesslein:
> By pure chance 15 pgs are now actually replicated to all 3 osds, so they
> have enough copies (clean). But the placement is "wrong", it would like
> to move the data to different osds (remapped) if possible.

That seems to be correct. I've added a third bucket of type datacenter and moved one host bucket so that each datacenter has one host with one osd. The PGs were rebalanced (if that is the correct term) and the status changed to HEALTH_OK with all PGs active+clean.

Now I moved the host in dc2 to another datacenter and removed dc2 from the CRUSH map. Now I have all PGs active+clean+remapped. So now your next statement applies:

> It replicated to 2 osds in the initial placement but wasn't able to find a
> suitable third osd. Then by increasing pgp_num it recalculated the
> placement, again selected two osds and moved the data there. It won't
> remove the data from the "wrong" osd until it has a new place for it, so
> you end up with three copies, but remapped pgs.

Ok, I think I got this.

>> 3. What's wrong here and what do I have to do to get the cluster back
>> to active+clean, again?
>
> I guess you want to have "two copies in dc1, one copy in dc2"? If you stay
> with only 3 osds that is the only way to distribute 3 objects anyway, so
> you don't need any crush rule.
>
> If your cluster is going to grow to at least 2 osds in each dc, you can
> go with http://cephnotes.ksperis.com/blog/2017/01/23/crushmap-for-2-dc/
>
> You will need at least two osds in each dc for this, because it is random
> (with respect to the weights) in which dc the 2 copies will be placed and
> which gets the remaining copy.

I don't get why I need to have at least two osds in each dc, because I thought that when I only have three osds it is implicitly clear where to write the two copies. In case I have two osds in each dc I would never know on which side the two copies of my three replicas are.

Let's try an example to check whether my understanding is correct: I have two datacenters, dcA and dcB, with two osds each. Due to the random placement, two copies of object A are written in dcA and one in dcB. For the next object B, two copies are written in dcB and one in dcA. In case I have two osds in dcA and only one in dcB, the two copies of an object are written to dcA every time and only one copy to dcB.

Did I get it right?

Best regards,
Joerg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"
On Sun, Sep 2, 2018 at 3:01 PM, David Wahler wrote: > On Sun, Sep 2, 2018 at 1:31 PM Alfredo Deza wrote: >> >> On Sun, Sep 2, 2018 at 12:00 PM, David Wahler wrote: >> > Ah, ceph-volume.log pointed out the actual problem: >> > >> > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path >> > or an existing device is needed >> >> That is odd, is it possible that the error log wasn't the one that >> matched what you saw on ceph-deploy's end? >> >> Usually ceph-deploy will just receive whatever ceph-volume produced. > > I tried again, running ceph-volume directly this time, just to see if > I had mixed anything up. It looks like ceph-deploy is correctly > reporting the output of ceph-volume. The problem is that ceph-volume > only writes the relevant error message to the log file, and not to its > stdout/stderr. > > Console output: > > rock64@rockpro64-1:~/my-cluster$ sudo ceph-volume --cluster ceph lvm > create --bluestore --data /dev/storage/foobar > Running command: /usr/bin/ceph-authtool --gen-print-key > Running command: /usr/bin/ceph --cluster ceph --name > client.bootstrap-osd --keyring > /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new > e7dd6d45-b556-461c-bad1-83d98a5a1afa > --> Was unable to complete a new OSD, will rollback changes > Running command: /usr/bin/ceph --cluster ceph --name > client.bootstrap-osd --keyring > /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1 > --yes-i-really-mean-it > stderr: no valid command found; 10 closest matches: > [...etc...] > > ceph-volume.log: > > [2018-09-02 18:49:21,415][ceph_volume.main][INFO ] Running command: > ceph-volume --cluster ceph lvm create --bluestore --data > /dev/storage/foobar > [2018-09-02 18:49:21,423][ceph_volume.process][INFO ] Running > command: /usr/bin/ceph-authtool --gen-print-key > [2018-09-02 18:49:26,664][ceph_volume.process][INFO ] stdout > AQCxMIxb+SezJRAAGAP/HHtHLVbciSQnZ/c/qw== > [2018-09-02 18:49:26,668][ceph_volume.process][INFO ] Running > command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new > e7dd6d45-b556-461c-bad1-83d98a5a1afa > [2018-09-02 18:49:27,685][ceph_volume.process][INFO ] stdout 1 > [2018-09-02 18:49:27,686][ceph_volume.process][INFO ] Running > command: /bin/lsblk --nodeps -P -o > NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL > /dev/storage/foobar > [2018-09-02 18:49:27,707][ceph_volume.process][INFO ] stdout > NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE="" > MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G" > STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw" > ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" SCHED="" > TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" > PKNAME="" PARTLABEL="" > [2018-09-02 18:49:27,708][ceph_volume.process][INFO ] Running > command: /bin/lsblk --nodeps -P -o > NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL > /dev/storage/foobar > [2018-09-02 18:49:27,720][ceph_volume.process][INFO ] stdout > NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE="" > MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G" > STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw" > ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" 
SCHED="" > TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" > PKNAME="" PARTLABEL="" > [2018-09-02 18:49:27,720][ceph_volume.devices.lvm.prepare][ERROR ] lvm > prepare was unable to complete > Traceback (most recent call last): > File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", > line 216, in safe_prepare > self.prepare(args) > File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py", > line 16, in is_root > return func(*a, **kw) > File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", > line 283, in prepare > block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid) > File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py", > line 206, in prepare_device > raise RuntimeError(' '.join(error)) > RuntimeError: Cannot use device (/dev/storage/foobar). A vg/lv path or > an existing device is needed > [2018-09-02 18:49:27,722][ceph_volume.devices.lvm.prepare][INFO ] > will rollback OSD ID creation > [2018-09-02 18:49:27,723][ceph_volume.process][INFO ] Running > command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1 > --yes-i-really-mean-it > [2018-09-02 18:49:28,425][ceph_volume.process][INFO ] stderr no valid > command found; 10 closest matches: > [...etc...] This is a bug. Thanks for di
Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7
On Tue, Sep 4, 2018 at 3:59 AM, Wolfgang Lendl wrote: > is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering > from high frequent osd crashes. > my hopes are with 12.2.9 - but hope wasn't always my best strategy 12.2.8 just went out. I think that Adam or Radoslaw might have some time to check those logs now > > br > wolfgang > > On 2018-08-30 19:18, Alfredo Deza wrote: >> On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl >> wrote: >>> Hi Alfredo, >>> >>> >>> caught some logs: >>> https://pastebin.com/b3URiA7p >> That looks like there is an issue with bluestore. Maybe Radoslaw or >> Adam might know a bit more. >> >> >>> br >>> wolfgang >>> >>> On 2018-08-29 15:51, Alfredo Deza wrote: On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl wrote: > Hi, > > after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm experiencing > random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not > affected. > I destroyed and recreated some of the SSD OSDs which seemed to help. > > this happens on centos 7.5 (different kernels tested) > > /var/log/messages: > Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) ** > Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 > thread_name:bstore_kv_final > Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general > protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in > libtcmalloc.so.4.4.5[7f8a997a8000+46000] > Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, > code=killed, status=11/SEGV > Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state. > Aug 29 10:24:08 systemd: ceph-osd@2.service failed. > Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, > scheduling restart. > Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2... > Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2. > Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data > /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal > Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) ** > Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp > Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection > ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in > libtcmalloc.so.4.4.5[7f5f430cd000+46000] > Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, > code=killed, status=11/SEGV > Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state. > Aug 29 10:24:35 systemd: ceph-osd@0.service failed These systemd messages aren't usually helpful, try poking around /var/log/ceph/ for the output on that one OSD. If those logs aren't useful either, try bumping up the verbosity (see http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time ) > did I hit a known issue? > any suggestions are highly appreciated > > > br > wolfgang > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >>> -- >>> Wolfgang Lendl >>> IT Systems & Communications >>> Medizinische Universität Wien >>> Spitalgasse 23 / BT 88 /Ebene 00 >>> A-1090 Wien >>> Tel: +43 1 40160-21231 >>> Fax: +43 1 40160-921200 >>> >>> > > -- > Wolfgang Lendl > IT Systems & Communications > Medizinische Universität Wien > Spitalgasse 23 / BT 88 /Ebene 00 > A-1090 Wien > Tel: +43 1 40160-21231 > Fax: +43 1 40160-921200 > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] MDS does not always failover to hot standby on reboot
It's mds_beacon_grace. Set that on the monitor to control the replacement of laggy MDS daemons, and usually also set it to the same value on the MDS daemon as it's used there for the daemon to hold off on certain tasks if it hasn't seen a mon beacon recently. John On Mon, Sep 3, 2018 at 9:26 AM William Lawton wrote: > > Which configuration option determines the MDS timeout period? > > > > William Lawton > > > > From: Gregory Farnum > Sent: Thursday, August 30, 2018 5:46 PM > To: William Lawton > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] MDS does not always failover to hot standby on > reboot > > > > Yes, this is a consequence of co-locating the MDS and monitors — if the MDS > reports to its co-located monitor and both fail, the monitor cluster has to > go through its own failure detection and then wait for a full MDS timeout > period after that before it marks the MDS down. :( > > > > We might conceivably be able to optimize for this, but there's not a general > solution. If you need to co-locate, one thing that would make it better > without being a lot of work is trying to have the MDS connect to one of the > monitors on a different host. You can do that by just restricting the list of > monitors you feed it in the ceph.conf, although it's not a guarantee that > will *prevent* it from connecting to its own monitor if there are failures or > reconnects after first startup. > > -Greg > > On Thu, Aug 30, 2018 at 8:38 AM William Lawton > wrote: > > Hi. > > > > We have a 5 node Ceph cluster (refer to ceph -s output at bottom of email). > During resiliency tests we have an occasional problem when we reboot the > active MDS instance and a MON instance together i.e. dub-sitv-ceph-02 and > dub-sitv-ceph-04. We expect the MDS to failover to the standby instance > dub-sitv-ceph-01 which is in standby-replay mode, and 80% of the time it does > with no problems. However, 20% of the time it doesn’t and the MDS_ALL_DOWN > health check is not cleared until 30 seconds later when the rebooted > dub-sitv-ceph-02 and dub-sitv-ceph-04 instances come back up. 
> > > > When the MDS successfully fails over to the standby we see in the ceph.log > the following: > > > > 2018-08-25 00:30:02.231811 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 50 : > cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN) > > 2018-08-25 00:30:02.237389 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 52 : > cluster [INF] Standby daemon mds.dub-sitv-ceph-01 assigned to filesystem > cephfs as rank 0 > > 2018-08-25 00:30:02.237528 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 54 : > cluster [INF] Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is > offline) > > > > When the active MDS role does not failover to the standby the MDS_ALL_DOWN > check is not cleared until after the rebooted instances have come back up > e.g.: > > > > 2018-08-25 03:30:02.936554 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 55 : > cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN) > > 2018-08-25 03:30:04.235703 mon.dub-sitv-ceph-05 mon.2 10.18.186.208:6789/0 > 226 : cluster [INF] mon.dub-sitv-ceph-05 calling monitor election > > 2018-08-25 03:30:04.238672 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 56 : > cluster [INF] mon.dub-sitv-ceph-03 calling monitor election > > 2018-08-25 03:30:09.242595 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 57 : > cluster [INF] mon.dub-sitv-ceph-03 is new leader, mons > dub-sitv-ceph-03,dub-sitv-ceph-05 in quorum (ranks 0,2) > > 2018-08-25 03:30:09.252804 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 62 : > cluster [WRN] Health check failed: 1/3 mons down, quorum > dub-sitv-ceph-03,dub-sitv-ceph-05 (MON_DOWN) > > 2018-08-25 03:30:09.258693 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 63 : > cluster [WRN] overall HEALTH_WARN 2 osds down; 2 hosts (2 osds) down; 1/3 > mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05 > > 2018-08-25 03:30:10.254162 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 64 : > cluster [WRN] Health check failed: Reduced data availability: 2 pgs inactive, > 115 pgs peering (PG_AVAILABILITY) > > 2018-08-25 03:30:12.429145 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 66 : > cluster [WRN] Health check failed: Degraded data redundancy: 712/2504 objects > degraded (28.435%), 86 pgs degraded (PG_DEGRADED) > > 2018-08-25 03:30:16.137408 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 67 : > cluster [WRN] Health check update: Reduced data availability: 1 pg inactive, > 69 pgs peering (PG_AVAILABILITY) > > 2018-08-25 03:30:17.193322 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 68 : > cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data > availability: 1 pg inactive, 69 pgs peering) > > 2018-08-25 03:30:18.432043 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 69 : > cluster [WRN] Health check update: Degraded data redundancy: 1286/2572 > objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED) > > 2018
Re: [ceph-users] Degraded data redundancy: NUM pgs undersized
On 09/04/2018 09:47 AM, Jörg Kastning wrote:
> My questions are:
>
> 1. What does active+undersized actually mean? I did not find anything
> about it in the documentation on docs.ceph.com.

http://docs.ceph.com/docs/master/rados/operations/pg-states/

active: Ceph will process requests to the placement group.
undersized: The placement group has fewer copies than the configured pool replication level.

Your crush map/rules and osds do not allow all pgs to be placed on three "independent" osds, so pgs have fewer copies than configured.

> 2. Why were only 15 PGs remapped after I corrected the mistake with the
> wrong pgp_num value?

By pure chance 15 pgs are now actually replicated to all 3 osds, so they have enough copies (clean). But the placement is "wrong", it would like to move the data to different osds (remapped) if possible.

It replicated to 2 osds in the initial placement but wasn't able to find a suitable third osd. Then by increasing pgp_num it recalculated the placement, again selected two osds and moved the data there. It won't remove the data from the "wrong" osd until it has a new place for it, so you end up with three copies, but remapped pgs.

> 3. What's wrong here and what do I have to do to get the cluster back
> to active+clean, again?

I guess you want to have "two copies in dc1, one copy in dc2"? If you stay with only 3 osds that is the only way to distribute 3 objects anyway, so you don't need any crush rule.

What your crush rule is currently expressing is "in the default root, select n buckets (where n is the pool size, 3 in this case) of type datacenter, select one leaf (meaning osd) in each datacenter". You only have 2 datacenter buckets, so that will only ever select 2 osds.

If your cluster is going to grow to at least 2 osds in each dc, you can go with http://cephnotes.ksperis.com/blog/2017/01/23/crushmap-for-2-dc/

I would translate this crush rule as "in the default root, select 2 buckets of type datacenter, select n-1 (where n is the pool size, so here 3-1 = 2) leafs in each datacenter" (a sketch of such a rule follows below).

You will need at least two osds in each dc for this, because it is random (with respect to the weights) in which dc the 2 copies will be placed and which gets the remaining copy.

Best regards,
Lothar

-- Lothar Gesslein, Linux Consultant, Mail: gessl...@b1-systems.de
B1 Systems GmbH, Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
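[Note: the rule described above boils down to something like the following sketch. It is not a verbatim copy of the linked article; ids and size limits are examples. With pool size 3, the first two copies land in one datacenter and the third in the other, which is why each dc needs at least two hosts/osds.]

rule replicated_2dc {
        id 2
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}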
[ceph-users] osd_journal_aio=false and performance
Hi guys, I ran a few tests and I see that performance is better with osd_journal_aio=false for LV journals.

Setup: 2 servers x 4 OSDs (SATA HDD + journal on an SSD LV), 12.2.5, filestore

cluster:
  id: ce305aae-4c56-41ec-be54-529b05eb45ed
  health: HEALTH_OK
services:
  mon: 2 daemons, quorum a,b
  mgr: a(active), standbys: b
  osd: 8 osds: 8 up, 8 in
data:
  pools: 1 pools, 512 pgs
  objects: 0 objects, 0 bytes
  usage: 904 MB used, 11440 GB / 11441 GB avail
  pgs: 512 active+clean

0 objects before each test. I used rados bench from two servers in parallel, i.e. 2x of:

rados bench -p test 30 write -b 1M -t 32

I ran each test four times.

file journal on an XFS FS, dio=1, aio=0:
  Average IOPS: 102.5
  Average Latency(s): 0.30

LV journal, dio=1, aio=1:
  Average IOPS: 96.5
  Average Latency(s): 32.5

LV journal, dio=1, aio=0:
  Average IOPS: 104
  Average Latency(s): 0.30

Is it safe to disable aio on LV journals? Does it make sense?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
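[Note: for anyone reproducing this, the ceph.conf fragment matching the "aio off" runs would be something like the following. Option names are from memory (the subject calls it osd_journal_aio, but iirc the filestore journal options are journal_dio/journal_aio); this only applies to filestore journals.]

[osd]
journal dio = true
journal aio = false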
Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7
is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering from high frequent osd crashes. my hopes are with 12.2.9 - but hope wasn't always my best strategy br wolfgang On 2018-08-30 19:18, Alfredo Deza wrote: > On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl > wrote: >> Hi Alfredo, >> >> >> caught some logs: >> https://pastebin.com/b3URiA7p > That looks like there is an issue with bluestore. Maybe Radoslaw or > Adam might know a bit more. > > >> br >> wolfgang >> >> On 2018-08-29 15:51, Alfredo Deza wrote: >>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl >>> wrote: Hi, after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm experiencing random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not affected. I destroyed and recreated some of the SSD OSDs which seemed to help. this happens on centos 7.5 (different kernels tested) /var/log/messages: Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) ** Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in libtcmalloc.so.4.4.5[7f8a997a8000+46000] Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, code=killed, status=11/SEGV Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state. Aug 29 10:24:08 systemd: ceph-osd@2.service failed. Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling restart. Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2... Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2. Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) ** Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in libtcmalloc.so.4.4.5[7f5f430cd000+46000] Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state. Aug 29 10:24:35 systemd: ceph-osd@0.service failed >>> These systemd messages aren't usually helpful, try poking around >>> /var/log/ceph/ for the output on that one OSD. >>> >>> If those logs aren't useful either, try bumping up the verbosity (see >>> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time >>> ) did I hit a known issue? any suggestions are highly appreciated br wolfgang ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> -- >> Wolfgang Lendl >> IT Systems & Communications >> Medizinische Universität Wien >> Spitalgasse 23 / BT 88 /Ebene 00 >> A-1090 Wien >> Tel: +43 1 40160-21231 >> Fax: +43 1 40160-921200 >> >> -- Wolfgang Lendl IT Systems & Communications Medizinische Universität Wien Spitalgasse 23 / BT 88 /Ebene 00 A-1090 Wien Tel: +43 1 40160-21231 Fax: +43 1 40160-921200 signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Degraded data redundancy: NUM pgs undersized
Good morning folks,

As a newbie to Ceph, yesterday was the first time I configured my CRUSH map, added a CRUSH rule and created my first pool using this rule. Since then I get the status HEALTH_WARN with the following output:

~~~
$ sudo ceph status
  cluster:
    id: 47c108bd-db66-4197-96df-cadde9e9eb45
    health: HEALTH_WARN
            Degraded data redundancy: 128 pgs undersized
            1 pools have pg_num > pgp_num
  services:
    mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
    mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
    osd: 3 osds: 3 up, 3 in
  data:
    pools: 1 pools, 128 pgs
    objects: 0 objects, 0 bytes
    usage: 3088 MB used, 3068 GB / 3071 GB avail
    pgs: 128 active+undersized
~~~

The pool was created by running `sudo ceph osd pool create joergsfirstpool 128 replicated replicate_datacenter`. I've figured out that I forgot to set the value for the key pgp_num accordingly, so I've done that by running `sudo ceph osd pool set joergsfirstpool pgp_num 128`. As you can see in the following output, 15 PGs were remapped but 113 still remain active+undersized:

~~~
$ sudo ceph status
  cluster:
    id: 47c108bd-db66-4197-96df-cadde9e9eb45
    health: HEALTH_WARN
            Degraded data redundancy: 113 pgs undersized
  services:
    mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
    mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
    osd: 3 osds: 3 up, 3 in; 15 remapped pgs
  data:
    pools: 1 pools, 128 pgs
    objects: 0 objects, 0 bytes
    usage: 3089 MB used, 3068 GB / 3071 GB avail
    pgs: 113 active+undersized
         15 active+clean+remapped
~~~

My questions are:

1. What does active+undersized actually mean? I did not find anything about it in the documentation on docs.ceph.com.
2. Why were only 15 PGs remapped after I corrected the mistake with the wrong pgp_num value?
3. What's wrong here and what do I have to do to get the cluster back to active+clean, again?
For further information you could find my current CRUSH map below:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ccp-tcnm01 {
        id -5           # do not change unnecessarily
        id -6 class hdd # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 1.000
}
host ccp-tcnm03 {
        id -7           # do not change unnecessarily
        id -8 class hdd # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 1.000
}
datacenter dc1 {
        id -9           # do not change unnecessarily
        id -12 class hdd        # do not change unnecessarily
        # weight 2.000
        alg straw2
        hash 0  # rjenkins1
        item ccp-tcnm01 weight 1.000
        item ccp-tcnm03 weight 1.000
}
host ccp-tcnm02 {
        id -3           # do not change unnecessarily
        id -4 class hdd # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 1.000
}
datacenter dc3 {
        id -10          # do not change unnecessarily
        id -11 class hdd        # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item ccp-tcnm02 weight 1.000
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd # do not change unnecessarily
        # weight 3.000
        alg straw2
        hash 0  # rjenkins1
        item dc1 weight 2.000
        item dc3 weight 1.000
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule replicate_datacenter {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type datacenter
        step emit
}
# end crush map

Best regards,
Joerg
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com