Re: [ceph-users] Down monitors after adding mds node
On Mon, Oct 3, 2016 at 6:29 AM, Adam Tygart wrote: > I put this in the #ceph-dev on Friday, > > (gdb) print info > $7 = (const MDSMap::mds_info_t &) @0x5fb1da68: { > global_id = { boost::totally_ordered2 boost::detail::empty_base > >> = > { boost::equality_comparable1 boost::totally_ordered2 boost::detail::empty_base > > >> = > { boost::totally_ordered2 boost::detail::empty_base > >> = > { boost::detail::empty_base >> = > { boost::equality_comparable2 boost::detail::empty_base > >> = > { boost::detail::empty_base >> = > {> = {}, fields>}, }, }, }, data fields>}, }, t = 1055992652}, name = "mormo", > rank = -1, inc = 0, > state = MDSMap::STATE_STANDBY, state_seq = 2, addr = {type = 0, > nonce = 8835, {addr = {ss_family = 2, __ss_align = 0, __ss_padding = > '\000' }, addr4 = {sin_family = 2, sin_port = > 36890, > sin_addr = {s_addr = 50398474}, sin_zero = > "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port = > 36890, sin6_flowinfo = 50398474, sin6_addr = {__in6_u = { > __u6_addr8 = '\000' , __u6_addr16 = {0, > 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = > 0}}}, laggy_since = {tv = {tv_sec = 0, tv_nsec = 0}}, > standby_for_rank = 0, standby_for_name = "", standby_for_fscid = > 328, standby_replay = true, export_targets = std::set with 0 elements, > mds_features = 1967095022025} > (gdb) print target_role > $8 = {rank = 0, fscid = } > > It looks like target_role.fscid was somehow optimized out. Thanks for this, let's switch discussion to the ticket (I think I know what's wrong now). John > > -- > Adam > > On Sun, Oct 2, 2016 at 4:26 PM, Gregory Farnum wrote: >> On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart wrote: >>> The wip-fixup-mds-standby-init branch doesn't seem to allow the >>> ceph-mons to start up correctly. I disabled all mds servers before >>> starting the monitors up, so it would seem the pending mdsmap update >>> is in durable storage. Now that the mds servers are down, can we clear >>> the mdsmap of active and standby servers while initializing the mons? >>> I would hope that, now that all the versions are in sync, a bad >>> standby_for_fscid would not be possible with new mds servers starting. >> >> Looks like my first guess about the run-time initialization being >> confused was wrong. :( >> Given that, we're pretty befuddled. But I commented on irc: >> >>>if you've still got a core dump, can you go up a frame (to >>>MDSMonitor::maybe_promote_standby) and check the values of target_role.rank >>>and target_role.fscid, and how that compares to info.standby_for_fscid, >>>info.legacy_client_fscid, and info.standby_for_rank? >> >> That might pop up something and isn't accessible in the log you >> posted. We also can't see an osdmap or dump; if you could either >> extract and print that or get a log which includes it that might show >> up something. >> >> I don't think we changed the mds<-> protocol or anything in the point >> releases, so the different package version *shouldn't* matter...right, >> John? ;) >> -Greg >> >>> >>> -- >>> Adam >>> >>> On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum wrote: On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart wrote: > Hello all, > > Not sure if this went through before or not, as I can't check the > mailing list archives. > > I've gotten myself into a bit of a bind. I was prepping to add a new > mds node to my ceph cluster. e.g. ceph-deploy mds create mormo > > Unfortunately, it started the mds server before I was ready. My > cluster was running 10.2.1, and the newly deployed mds is 10.2.3. 
> > This caused 3 of my 5 monitors to crash. Since I immediately realized > the mds was a newer version, I took that opportunity to upgrade my > monitors to 10.2.3. Three of the 5 monitors continue to crash. And it > looks like they are crashing when trying to apply a pending mdsmap > update. > > The log is available here: > http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz > > I have attempted (making backups of course) to extract the monmap from > a working monitor and inserting it into a broken one. No luck, and > backup was restored. > > Since I had 2 working monitors, I backed up the monitor stores, > updated the monmaps to remove the broken ones and tried to restart > them. I then tried to restart the "working" ones. They then failed in > the same way. I've now restored my backups of those monitors. > > I need to get these monitors back up post-haste. > > If you've got any ideas, I would be grateful. I'm not sure but it looks like it's now too late to keep the problem out of the durable storage, but if you try again make sure you turn off the MDS first. It sort of looks like you've managed to get a failed MDS with an invalid fscid (ie, a cephfs filesystem ID). ...or maybe just a terrible coding mistake.
Re: [ceph-users] Down monitors after adding mds node
Sent before I was ready, oops. How might I get the osdmap from a down cluster? -- Adam On Mon, Oct 3, 2016 at 12:29 AM, Adam Tygart wrote: > I put this in the #ceph-dev on Friday, > > (gdb) print info > $7 = (const MDSMap::mds_info_t &) @0x5fb1da68: { > global_id = { boost::totally_ordered2 boost::detail::empty_base > >> = > { boost::equality_comparable1 boost::totally_ordered2 boost::detail::empty_base > > >> = > { boost::totally_ordered2 boost::detail::empty_base > >> = > { boost::detail::empty_base >> = > { boost::equality_comparable2 boost::detail::empty_base > >> = > { boost::detail::empty_base >> = > {> = {}, fields>}, }, }, }, data fields>}, }, t = 1055992652}, name = "mormo", > rank = -1, inc = 0, > state = MDSMap::STATE_STANDBY, state_seq = 2, addr = {type = 0, > nonce = 8835, {addr = {ss_family = 2, __ss_align = 0, __ss_padding = > '\000' }, addr4 = {sin_family = 2, sin_port = > 36890, > sin_addr = {s_addr = 50398474}, sin_zero = > "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port = > 36890, sin6_flowinfo = 50398474, sin6_addr = {__in6_u = { > __u6_addr8 = '\000' , __u6_addr16 = {0, > 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = > 0}}}, laggy_since = {tv = {tv_sec = 0, tv_nsec = 0}}, > standby_for_rank = 0, standby_for_name = "", standby_for_fscid = > 328, standby_replay = true, export_targets = std::set with 0 elements, > mds_features = 1967095022025} > (gdb) print target_role > $8 = {rank = 0, fscid = } > > It looks like target_role.fscid was somehow optimized out. > > -- > Adam > > On Sun, Oct 2, 2016 at 4:26 PM, Gregory Farnum wrote: >> On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart wrote: >>> The wip-fixup-mds-standby-init branch doesn't seem to allow the >>> ceph-mons to start up correctly. I disabled all mds servers before >>> starting the monitors up, so it would seem the pending mdsmap update >>> is in durable storage. Now that the mds servers are down, can we clear >>> the mdsmap of active and standby servers while initializing the mons? >>> I would hope that, now that all the versions are in sync, a bad >>> standby_for_fscid would not be possible with new mds servers starting. >> >> Looks like my first guess about the run-time initialization being >> confused was wrong. :( >> Given that, we're pretty befuddled. But I commented on irc: >> >>>if you've still got a core dump, can you go up a frame (to >>>MDSMonitor::maybe_promote_standby) and check the values of target_role.rank >>>and target_role.fscid, and how that compares to info.standby_for_fscid, >>>info.legacy_client_fscid, and info.standby_for_rank? >> >> That might pop up something and isn't accessible in the log you >> posted. We also can't see an osdmap or dump; if you could either >> extract and print that or get a log which includes it that might show >> up something. >> >> I don't think we changed the mds<-> protocol or anything in the point >> releases, so the different package version *shouldn't* matter...right, >> John? ;) >> -Greg >> >>> >>> -- >>> Adam >>> >>> On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum wrote: On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart wrote: > Hello all, > > Not sure if this went through before or not, as I can't check the > mailing list archives. > > I've gotten myself into a bit of a bind. I was prepping to add a new > mds node to my ceph cluster. e.g. ceph-deploy mds create mormo > > Unfortunately, it started the mds server before I was ready. My > cluster was running 10.2.1, and the newly deployed mds is 10.2.3. 
> > This caused 3 of my 5 monitors to crash. Since I immediately realized > the mds was a newer version, I took that opportunity to upgrade my > monitors to 10.2.3. Three of the 5 monitors continue to crash. And it > looks like they are crashing when trying to apply a pending mdsmap > update. > > The log is available here: > http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz > > I have attempted (making backups of course) to extract the monmap from > a working monitor and inserting it into a broken one. No luck, and > backup was restored. > > Since I had 2 working monitors, I backed up the monitor stores, > updated the monmaps to remove the broken ones and tried to restart > them. I then tried to restart the "working" ones. They then failed in > the same way. I've now restored my backups of those monitors. > > I need to get these monitors back up post-haste. > > If you've got any ideas, I would be grateful. I'm not sure but it looks like it's now too late to keep the problem out of the durable storage, but if you try again make sure you turn off the MDS first. It sort of looks like you've managed to get a failed MDS with an invalid fscid (ie, a cephfs filesystem ID). ...or maybe just a terrible coding mistake. As mentioned on irc, wip-fixup-mds-standby-init should fix it.
Re: [ceph-users] Give up on backfill, remove slow OSD
On 22. sep. 2016 09:16, Iain Buclaw wrote: Hi, I currently have an OSD that has been backfilling data off it for a little over two days now, and it's gone from approximately 68 PGs to 63. As data is still being read from, and written to it by clients whilst I'm trying to get it out of the cluster, this is not helping it at all. I figured that it's probably best just to cut my losses and just force it out entirely so that all new writes and reads to those PGs get redirected elsewhere to a functional disk, and the rest of the recovery can proceed without being blocked heavily by this one disk. Granted that objects and files have a 1:1 relationship, I can just rsync the data to a new server and write it back into ceph afterwards. Now, I know that as soon as I bring down this OSD, the entire cluster will stop operating. So what's the most swift method of telling the cluster to forget about this disk and everything that may be stored on it. Thanks It should normally not get new writes to it if you want to remove it from the cluster. I assume you did something wrong here. How did you take the osd out of the cluster? Generally my procedure for a working osd is something like: 1. ceph osd crush reweight osd.X 0 2. ceph osd tree - check that the osd in question actually has 0 weight (first number after ID) and that the host weight has been reduced accordingly. 3. ls /var/lib/ceph/osd/ceph-X/current ; periodically wait for the osd to drain, there should be no PG directories n.xxx_head or n.xxx_TEMP. This will take a while depending on the size of the osd. In reality I just wait until the disk usage graph settles, then double-check with ls. 4. Once empty, I mark the osd out, stop the process, and remove the osd from the cluster as written in the documentation: - ceph auth del osd.x - ceph osd crush remove osd.x - ceph osd rm osd.x PS: if your cluster stops operating when an osd goes down, you have something else fundamentally wrong; you should look into that as a separate case as well. Kind regards, Ronny Aasen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
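For anyone skimming the archive, here is a minimal sketch of Ronny's drain-then-remove sequence as shell commands. The OSD id (5), the data path, and the service command are examples; adjust them to your cluster and init system.

  # 1. stop mapping new data to the OSD, but leave it up/in so it can drain
  ceph osd crush reweight osd.5 0
  ceph osd tree                        # confirm the weight column now shows 0 for osd.5

  # 2. wait until no PG directories (n.xxx_head / n.xxx_TEMP) remain
  ls /var/lib/ceph/osd/ceph-5/current

  # 3. once drained: take it out, stop the daemon, and remove it from the cluster
  ceph osd out osd.5
  service ceph stop osd.5              # or: systemctl stop ceph-osd@5, depending on the distro
  ceph auth del osd.5
  ceph osd crush remove osd.5
  ceph osd rm osd.5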
Re: [ceph-users] CephFS: No space left on device
Hi John, Many thanks for your reply. I will try to play with the mds tunables and report back to you ASAP. So far I see that the mds log contains a lot of errors of the following kind: 2016-10-02 11:58:03.002769 7f8372d54700 0 mds.0.cache.dir(100056ddecd) _fetched badness: got (but i already had) [inode 10005729a77 [2,head] ~mds0/stray1/10005729a77 auth v67464942 s=196728 nl=0 n(v0 b196728 1=1+0) (iversion lock) 0x7f84acae82a0] mode 33204 mtime 2016-08-07 23:06:29.776298 2016-10-02 11:58:03.002789 7f8372d54700 -1 log_channel(cluster) log [ERR] : loaded dup inode 10005729a77 [2,head] v68621 at /users/mykola/mms/NCSHNO/final/120nm-uniform-h8200/j002654.out/m_xrange192-320_yrange192-320_016232.dump, but inode 10005729a77.head v67464942 already exists at ~mds0/stray1/10005729a77 Those folders within mds.0.cache.dir that got badness report a size of 16EB on the clients. rm on them fails with 'Directory not empty'. As for the "Client failing to respond to cache pressure", I have 2 kernel clients on 4.4.21, 1 on 4.7.5 and 16 fuse clients always running the most recent release version of ceph-fuse. The funny thing is that every single client misbehaves from time to time. I am aware of quite a bit of discussion about this issue on the ML, but cannot really follow how to debug it. Regards, -Mykola On 2 October 2016 at 22:27, John Spray wrote: > On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik > wrote: > > After upgrading to 10.2.3 we frequently see messages like > > From which version did you upgrade? > > > 'rm: cannot remove '...': No space left on device > > > > The folders we are trying to delete contain approx. 50K files 193 KB > each. > > My guess would be that you are hitting the new > mds_bal_fragment_size_max check. This limits the number of entries > that the MDS will create in a single directory fragment, to avoid > overwhelming the OSD with oversized objects. It is 100000 by default. > This limit also applies to "stray" directories where unlinked files > are put while they wait to be purged, so you could get into this state > while doing lots of deletions. There are ten stray directories that > get a roughly even share of files, so if you have more than about one > million files waiting to be purged, you could see this condition. > > The "Client failing to respond to cache pressure" messages may play a > part here -- if you have misbehaving clients then they may cause the > MDS to delay purging stray files, leading to a backlog. If your > clients are by any chance older kernel clients, you should upgrade > them. You can also unmount/remount them to clear this state, although > it will reoccur until the clients are updated (or until the bug is > fixed, if you're running latest clients already). > > The high level counters for strays are part of the default output of > "ceph daemonperf mds.<id>" when run on the MDS server (the "stry" and > "purg" columns). You can look at these to watch how fast the MDS is > clearing out strays. If your backlog is just because it's not doing > it fast enough, then you can look at tuning mds_max_purge_files and > mds_max_purge_ops to adjust the throttles on purging. Those settings > can be adjusted without restarting the MDS using the "injectargs" > command (http://docs.ceph.com/docs/master/rados/operations/ > control/#mds-subsystem) > > Let us know how you get on.
> > John > > > > The cluster state and storage available are both OK: > > > > cluster 98d72518-6619-4b5c-b148-9a781ef13bcb > > health HEALTH_WARN > > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > > pressure > > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > > pressure > > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > > pressure > > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > > pressure > > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > > pressure > > monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0} > > election epoch 11, quorum 0 000-s-ragnarok > > fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active} > > osdmap e20203: 16 osds: 16 up, 16 in > > flags sortbitwise > > pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects > > 23048 GB used, 6745 GB / 29793 GB avail > > 1085 active+clean > >2 active+clean+scrubbing > >1 active+clean+scrubbing+deep > > > > > > Has anybody experienced this issue so far? > > > > Regards, > > -- > > Mykola > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > -- Mykola ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Down monitors after adding mds node
On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart wrote: > The wip-fixup-mds-standby-init branch doesn't seem to allow the > ceph-mons to start up correctly. I disabled all mds servers before > starting the monitors up, so it would seem the pending mdsmap update > is in durable storage. Now that the mds servers are down, can we clear > the mdsmap of active and standby servers while initializing the mons? > I would hope that, now that all the versions are in sync, a bad > standby_for_fscid would not be possible with new mds servers starting. Looks like my first guess about the run-time initialization being confused was wrong. :( Given that, we're pretty befuddled. But I commented on irc: >if you've still got a core dump, can you go up a frame (to >MDSMonitor::maybe_promote_standby) and check the values of target_role.rank >and target_role.fscid, and how that compares to info.standby_for_fscid, >info.legacy_client_fscid, and info.standby_for_rank? That might pop up something and isn't accessible in the log you posted. We also can't see an osdmap or dump; if you could either extract and print that or get a log which includes it that might show up something. I don't think we changed the mds<-> protocol or anything in the point releases, so the different package version *shouldn't* matter...right, John? ;) -Greg > > -- > Adam > > On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum wrote: >> On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart wrote: >>> Hello all, >>> >>> Not sure if this went through before or not, as I can't check the >>> mailing list archives. >>> >>> I've gotten myself into a bit of a bind. I was prepping to add a new >>> mds node to my ceph cluster. e.g. ceph-deploy mds create mormo >>> >>> Unfortunately, it started the mds server before I was ready. My >>> cluster was running 10.2.1, and the newly deployed mds is 10.2.3. >>> >>> This caused 3 of my 5 monitors to crash. Since I immediately realized >>> the mds was a newer version, I took that opportunity to upgrade my >>> monitors to 10.2.3. Three of the 5 monitors continue to crash. And it >>> looks like they are crashing when trying to apply a pending mdsmap >>> update. >>> >>> The log is available here: >>> http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz >>> >>> I have attempted (making backups of course) to extract the monmap from >>> a working monitor and inserting it into a broken one. No luck, and >>> backup was restored. >>> >>> Since I had 2 working monitors, I backed up the monitor stores, >>> updated the monmaps to remove the broken ones and tried to restart >>> them. I then tried to restart the "working" ones. They then failed in >>> the same way. I've now restored my backups of those monitors. >>> >>> I need to get these monitors back up post-haste. >>> >>> If you've got any ideas, I would be grateful. >> >> I'm not sure but it looks like it's now too late to keep the problem >> out of the durable storage, but if you try again make sure you turn >> off the MDS first. >> >> It sort of looks like you've managed to get a failed MDS with an >> invalid fscid (ie, a cephfs filesystem ID). >> >> ...or maybe just a terrible coding mistake. As mentioned on irc, >> wip-fixup-mds-standby-init should fix it. 
I've created a ticket as >> well: http://tracker.ceph.com/issues/17466 >> -Greg >> >> >>> >>> -- >>> Adam >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
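For completeness, the frame inspection Greg asks for looks roughly like this in gdb. The binary path, core file name and frame number are examples; pick the MDSMonitor::maybe_promote_standby frame from your own backtrace.

  gdb /usr/bin/ceph-mon /path/to/core
  (gdb) bt                              # find the MDSMonitor::maybe_promote_standby frame
  (gdb) frame 3                         # substitute the frame number from the backtrace
  (gdb) print target_role.rank
  (gdb) print target_role.fscid
  (gdb) print info.standby_for_rank
  (gdb) print info.standby_for_fscid
  (gdb) print info.legacy_client_fscid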
Re: [ceph-users] Blog post about Ceph cache tiers - feedback welcome
Hi Sascha, Good article, you might want to add a small section about these two variables, osd_agent_max_high_ops and osd_agent_max_ops. They control how many concurrent flushes happen at the high/low thresholds. I.e. you can set the low one to 1 to minimise the impact on client IO. Also the target_max_bytes is calculated on a per PG basis, so the value is divided across PGs. As data distribution is not equal across all PGs you can get into a situation where you are getting cache full warnings, even though the total cache utilisation is below the target_max_bytes, so leave plenty of headroom. Nick > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Sascha Vogt > Sent: 02 October 2016 20:59 > To: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org > Subject: [ceph-users] Blog post about Ceph cache tiers - feedback welcome > > Hi all, > > as it took quite a while until we got our Ceph cache working (and we're still > hit by some unexpected things, see the thread Ceph with > cache pool - disk usage / cleanup), I thought it might be good to write a > summary of what I (believe to) know up to this point. > > Any feedback, especially corrections, is highly welcome! > > http://maybebuggy.de/post/ceph-cache-tier/ > > Greetings > -Sascha- > > PS: Posted to ceph-devel as well, just in case a developer spots some > mistakes. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
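A hedged sketch of how those knobs can be inspected and changed at runtime. The pool name (hot-cache) and osd.0 are examples, and the osd_agent_* names are taken from Nick's mail, so verify them against your own version first.

  # confirm the agent option names/values your OSDs actually know about
  ceph daemon osd.0 config show | grep osd_agent

  # reduce concurrent flushes at the low threshold to spare client IO
  ceph tell osd.* injectargs '--osd_agent_max_ops 1'

  # target_max_bytes is split across PGs, so keep headroom below raw capacity
  ceph osd pool get hot-cache target_max_bytes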
Re: [ceph-users] CephFS: No space left on device
On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik wrote: > After upgrading to 10.2.3 we frequently see messages like From which version did you upgrade? > 'rm: cannot remove '...': No space left on device > > The folders we are trying to delete contain approx. 50K files 193 KB each. My guess would be that you are hitting the new mds_bal_fragment_size_max check. This limits the number of entries that the MDS will create in a single directory fragment, to avoid overwhelming the OSD with oversized objects. It is 100000 by default. This limit also applies to "stray" directories where unlinked files are put while they wait to be purged, so you could get into this state while doing lots of deletions. There are ten stray directories that get a roughly even share of files, so if you have more than about one million files waiting to be purged, you could see this condition. The "Client failing to respond to cache pressure" messages may play a part here -- if you have misbehaving clients then they may cause the MDS to delay purging stray files, leading to a backlog. If your clients are by any chance older kernel clients, you should upgrade them. You can also unmount/remount them to clear this state, although it will reoccur until the clients are updated (or until the bug is fixed, if you're running latest clients already). The high level counters for strays are part of the default output of "ceph daemonperf mds.<id>" when run on the MDS server (the "stry" and "purg" columns). You can look at these to watch how fast the MDS is clearing out strays. If your backlog is just because it's not doing it fast enough, then you can look at tuning mds_max_purge_files and mds_max_purge_ops to adjust the throttles on purging. Those settings can be adjusted without restarting the MDS using the "injectargs" command (http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem) Let us know how you get on. John > The cluster state and storage available are both OK: > > cluster 98d72518-6619-4b5c-b148-9a781ef13bcb > health HEALTH_WARN > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > pressure > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > pressure > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > pressure > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > pressure > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache > pressure > monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0} > election epoch 11, quorum 0 000-s-ragnarok > fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active} > osdmap e20203: 16 osds: 16 up, 16 in > flags sortbitwise > pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects > 23048 GB used, 6745 GB / 29793 GB avail > 1085 active+clean >2 active+clean+scrubbing >1 active+clean+scrubbing+deep > > > Has anybody experienced this issue so far? > > Regards, > -- > Mykola > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
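As a concrete, hedged example of the commands John mentions (the MDS daemon name mds.myhost and the numbers are placeholders, not recommendations):

  # watch the stray and purge counters (the "stry" and "purg" columns)
  ceph daemonperf mds.myhost

  # check the per-fragment entry limit that produces ENOSPC
  ceph daemon mds.myhost config get mds_bal_fragment_size_max

  # loosen the purge throttles at runtime if strays are simply draining too slowly
  ceph tell mds.myhost injectargs '--mds_max_purge_files 128 --mds_max_purge_ops 16384'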
Re: [ceph-users] Ceph with Cache pool - disk usage / cleanup | writeup
Hi all, just a quick writeup. Over the last two days I was able to evict a lot of those 0-byte files by setting "target_max_objects" to 2 million. After we hit that limit I set it to 10 million for now. So target_dirty_ratio of 0.6 would mean evicting should start at around 6 million objects. target_full_ratio is set to 0.9, so overall no more than 9 million objects should exist in the cache. Remember we started at 109 million total and 24 million dirty. Now I still have quite some 0-byte files left over in our cache pool (see listing at the end), but we'll see how they develop over the next days. Having set the limit so low, we evicted nearly the whole cache (from 9 TB total storage space only 800 GB remained). Luckily the difference from the original question is now down to around 50 GB (quite some savings from the 860 GB we started with ;) ) ceph df detail now lists 2.3 million objects and 1.7 million dirty. Thanks a lot Christian and Burkhard for all the help and clarifications; your information has been preserved in a blog post (see the other post to this mailing list). Greetings -Sascha- File count (total and 0-bytes per OSD): OSD-20 total: 315998 OSD-20 0-bytes: 301835 OSD-21 total: 224645 OSD-21 0-bytes: 212026 OSD-22 total: 208189 OSD-22 0-bytes: 196139 OSD-23 total: 357256 OSD-23 0-bytes: 342350 OSD-24 total: 232800 OSD-24 0-bytes: 220466 OSD-25 total: 235298 OSD-25 0-bytes: 222985 OSD-26 total: 236957 OSD-26 0-bytes: 224345 OSD-27 total: 265974 OSD-27 0-bytes: 252538 OSD-28 total: 253577 OSD-28 0-bytes: 241265 OSD-29 total: 255774 OSD-29 0-bytes: 242891 OSD-30 total: 209818 OSD-30 0-bytes: 198581 OSD-31 total: 276357 OSD-31 0-bytes: 262294 OSD-32 total: 239600 OSD-32 0-bytes: 226639 OSD-33 total: 245248 OSD-33 0-bytes: 232712 OSD-34 total: 267156 OSD-34 0-bytes: 253815 OSD-35 total: 250241 OSD-35 0-bytes: 237709 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
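For reference, the pool-level settings used above correspond roughly to these commands; the pool name (cache) is an example and the values simply mirror the numbers in this writeup.

  # force the backlog down first, then relax the limit again
  ceph osd pool set cache target_max_objects 2000000
  ceph osd pool set cache target_max_objects 10000000

  # dirty/full thresholds, relative to target_max_objects / target_max_bytes
  ceph osd pool set cache cache_target_dirty_ratio 0.6
  ceph osd pool set cache cache_target_full_ratio 0.9

  # watch the object and dirty counts come down
  ceph df detail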
[ceph-users] Blog post about Ceph cache tiers - feedback welcome
Hi all, as it took quite a while until we got our Ceph cache working (and we're still hit by some unexpected things, see the thread Ceph with cache pool - disk usage / cleanup), I thought it might be good to write a summary of what I (believe to) know up to this point. Any feedback, especially corrections, is highly welcome! http://maybebuggy.de/post/ceph-cache-tier/ Greetings -Sascha- PS: Posted to ceph-devel as well, just in case a developer spots some mistakes. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New Cluster OSD Issues
Hi Pankaj, On 30/09/16 17:31, Garg, Pankaj wrote: I just created a new cluster with 0.94.8 and I’m getting this message: 2016-09-29 21:36:47.065642 mon.0 [INF] disallowing boot of OSD osd.35 10.22.21.49:6844/9544 because the osdmap requires CEPH_FEATURE_SERVER_JEWEL but the osd lacks CEPH_FEATURE_SERVER_JEWEL This is really bizarre. All the OSDs are down due to this. Can someone shed any light? Are you sure you used Hammer (0.94.x) in all places? Looks like your monitor daemon is on Jewel already (or at least that's what I'm reading from the printed info log). Greetings -Sascha- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
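A quick, hedged way to confirm which version each daemon is actually running (the daemon ids are examples):

  ceph --version                  # the locally installed package
  ceph tell mon.a version         # ask a monitor over the wire
  ceph tell osd.35 version        # ask the refused OSD, if it is reachable
  ceph daemon osd.35 version      # via the admin socket on the OSD's host
  dpkg -l | grep ceph             # or: rpm -qa | grep ceph, to check packages per host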
[ceph-users] CephFS: No space left on device
After upgrading to 10.2.3 we frequently see messages like 'rm: cannot remove '...': No space left on device The folders we are trying to delete contain approx. 50K files 193 KB each. The cluster state and storage available are both OK: cluster 98d72518-6619-4b5c-b148-9a781ef13bcb health HEALTH_WARN mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0} election epoch 11, quorum 0 000-s-ragnarok fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active} osdmap e20203: 16 osds: 16 up, 16 in flags sortbitwise pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects 23048 GB used, 6745 GB / 29793 GB avail 1085 active+clean 2 active+clean+scrubbing 1 active+clean+scrubbing+deep Has anybody experienced this issue so far? Regards, -- Mykola ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] unfound objects blocking cluster, need help!
Hi, Do you understand why removing that osd led to unfound objects? Do you have the ceph.log from yesterday? Cheers, Dan On 2 Oct 2016 10:18, "Tomasz Kuzemko" wrote: > > Forgot to mention Ceph version - 0.94.5. > > I managed to fix this. By chance I found that when an OSD for a blocked PG is starting, there is a few-second time window (after load_pgs) in which it accepts commands related to the blocked PG. So first I managed to capture "ceph pg PGID query" this way. Then I tried to issue "ceph pg missing_lost delete" and it worked too. After deleting all unfound objects this way cluster finally unblocked. Before that I exported all blocked PGs so hopefully I will be able to recover those 17 objects to a near-latest state. > > Hope this helps anyone who might run into the same problem. > > > 2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko : >> >> Hi, >> >> I have a production cluster on which 1 OSD on a failing disk was slowing the whole cluster down. I removed the OSD (osd.87) like usual in such case but this time it resulted in 17 unfound objects. I no longer have the files from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on 10 of those objects. >> >> On the remaining objects 7 the command blocks. When I try to do "ceph pg PGID query" on this PG it also blocks. I suspect this is same reason why mark_unfound blocks. >> >> Other client IO to PGs that have unfound objects are also blocked. When trying to query the OSDs which has the PG with unfound objects, "ceph tell" blocks. >> >> I tried to mark the PG as complete using ceph-objectstore-tool but it did not help as the PG is in fact complete but for some reason blocks. >> >> I tried recreating an empty osd.87 and importing the PG exported from other replica but it did not help. >> >> Can someone help me please? This is really important. >> >> ceph pg dump: >> https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump >> >> ceph osd dump: >> https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump >> >> ceph health detail: >> https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail >> >> >> -- >> Pozdrawiam, >> Tomasz Kuzemko >> tom...@kuzemko.net > > > > > -- > Pozdrawiam, > Tomasz Kuzemko > tom...@kuzemko.net > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] unfound objects blocking cluster, need help!
Forgot to mention Ceph version - 0.94.5. I managed to fix this. By chance I found that when an OSD for a blocked PG is starting, there is a few-second time window (after load_pgs) in which it accepts commands related to the blocked PG. So first I managed to capture "ceph pg PGID query" this way. Then I tried to issue "ceph pg missing_lost delete" and it worked too. After deleting all unfound objects this way cluster finally unblocked. Before that I exported all blocked PGs so hopefully I will be able to recover those 17 objects to a near-latest state. Hope this helps anyone who might run into the same problem. 2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko : > Hi, > > I have a production cluster on which 1 OSD on a failing disk was slowing > the whole cluster down. I removed the OSD (osd.87) like usual in such case > but this time it resulted in 17 unfound objects. I no longer have the files > from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on > 10 of those objects. > > On the remaining objects 7 the command blocks. When I try to do "ceph pg > PGID query" on this PG it also blocks. I suspect this is same reason why > mark_unfound blocks. > > Other client IO to PGs that have unfound objects are also blocked. When > trying to query the OSDs which has the PG with unfound objects, "ceph tell" > blocks. > > I tried to mark the PG as complete using ceph-objectstore-tool but it did > not help as the PG is in fact complete but for some reason blocks. > > I tried recreating an empty osd.87 and importing the PG exported from > other replica but it did not help. > > Can someone help me please? This is really important. > > ceph pg dump: > https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/ > 206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump > > ceph osd dump: > https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/ > d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump > > ceph health detail: > https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/ > 59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail > > > -- > Pozdrawiam, > Tomasz Kuzemko > tom...@kuzemko.net > -- Pozdrawiam, Tomasz Kuzemko tom...@kuzemko.net ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
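For anyone who runs into the same situation, the salvage steps described above look roughly like this; the PG id (3.1a7), OSD id and paths are examples, and the syntax is the Hammer-era ceph-objectstore-tool.

  # export the stuck PG from the stopped OSD's data directory, as a backup
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-87 \
      --journal-path /var/lib/ceph/osd/ceph-87/journal \
      --op export --pgid 3.1a7 --file /root/pg-3.1a7.export

  # during the short window after load_pgs while the PG still answers:
  ceph pg 3.1a7 query
  ceph pg 3.1a7 mark_unfound_lost delete    # or "revert", if older object versions exist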
[ceph-users] Again: Unknown error (95->500) when creating buckets or putting files to RGW after upgrade from Infernalis to Jewel
Dear all, I have exactly the same problem as reported in the thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011599.html. I have OpenStack Mitaka and upgraded radosgw from Infernalis to Jewel. I am testing Swift, i.e. creation of containers and objects (I have not yet tried creating S3 buckets and uploading "files"). I ran the script described here https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30567.html and afterwards ran "radosgw-admin period update" and "radosgw-admin period commit". Before posting any config files or logs: I manage to create Swift containers OK, but when I try to upload a file (create an object) I get the error reported in the above thread. I have checked the ceph.conf file against the ones in other threads; it's using rgw frontends = "civetweb port=9000". In my client.rgw.XXX.log: 2016-10-02 08:14:42.799570 7f6102ffd700 0 WARNING: set_req_state_err err_no=95 resorting to 500 2016-10-02 08:14:42.799660 7f6102ffd700 2 req 1:0.613066:swift:PUT /swift/v1/testMD/curly:put_obj:op status=-95 2016-10-02 08:14:42.799666 7f6102ffd700 2 req 1:0.613073:swift:PUT /swift/v1/testMD/curly:put_obj:http status=500 2016-10-02 08:14:42.799674 7f6102ffd700 1 == req done req=0x7f6102ff7710 op status=-95 http_status=500 == The container "testMD" was successfully created earlier, and I think all interaction with my Keystone v3 is OK. So my question to the list, and especially to Maciej Naruszewicz: do you have the Swift and/or S3 APIs of Ceph Jewel (the radosgw) working? Was there a solution to Maciej's (and my) threads? TIA. I can provide more detailed conf or logs. Best, Mario David ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
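In case it helps narrow this down, a hedged sketch of things worth checking; the rgw instance name (client.rgw.gateway01) is an example, not taken from Mario's setup.

  # raise rgw logging on the running gateway, then retry the object PUT
  ceph daemon client.rgw.gateway01 config set debug_rgw 20

  # after any zone/zonegroup changes, update and commit the period (Jewel)
  radosgw-admin period update --commit

  # inspect the zone configuration and its placement pools
  radosgw-admin zone get --rgw-zone=default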