Re: [ceph-users] Down monitors after adding mds node

2016-10-02 Thread John Spray
On Mon, Oct 3, 2016 at 6:29 AM, Adam Tygart  wrote:
> I put this in the #ceph-dev on Friday,
>
> (gdb) print info
> $7 = (const MDSMap::mds_info_t &) @0x5fb1da68: {
>   global_id = {<boost strong-typedef base classes elided>, t = 1055992652}, name = "mormo",
> rank = -1, inc = 0,
>   state = MDSMap::STATE_STANDBY, state_seq = 2, addr = {type = 0,
> nonce = 8835, {addr = {ss_family = 2, __ss_align = 0, __ss_padding =
> '\000' }, addr4 = {sin_family = 2, sin_port =
> 36890,
> sin_addr = {s_addr = 50398474}, sin_zero =
> "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port =
> 36890, sin6_flowinfo = 50398474, sin6_addr = {__in6_u = {
> __u6_addr8 = '\000' , __u6_addr16 = {0,
> 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id =
> 0}}}, laggy_since = {tv = {tv_sec = 0, tv_nsec = 0}},
>   standby_for_rank = 0, standby_for_name = "", standby_for_fscid =
> 328, standby_replay = true, export_targets = std::set with 0 elements,
> mds_features = 1967095022025}
> (gdb) print target_role
> $8 = {rank = 0, fscid = <optimized out>}
>
> It looks like target_role.fscid was somehow optimized out.

Thanks for this, let's switch discussion to the ticket (I think I know
what's wrong now).

John

>
> --
> Adam
>
> On Sun, Oct 2, 2016 at 4:26 PM, Gregory Farnum  wrote:
>> On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart  wrote:
>>> The wip-fixup-mds-standby-init branch doesn't seem to allow the
>>> ceph-mons to start up correctly. I disabled all mds servers before
>>> starting the monitors up, so it would seem the pending mdsmap update
>>> is in durable storage. Now that the mds servers are down, can we clear
>>> the mdsmap of active and standby servers while initializing the mons?
>>> I would hope that, now that all the versions are in sync, a bad
>>> standby_for_fscid would not be possible with new mds servers starting.
>>
>> Looks like my first guess about the run-time initialization being
>> confused was wrong. :(
>> Given that, we're pretty befuddled. But I commented on irc:
>>
>>>if you've still got a core dump, can you go up a frame (to 
>>>MDSMonitor::maybe_promote_standby) and check the values of target_role.rank 
>>>and target_role.fscid, and how that compares to info.standby_for_fscid, 
>>>info.legacy_client_fscid, and info.standby_for_rank?
>>
>> That might pop up something and isn't accessible in the log you
>> posted. We also can't see an osdmap or dump; if you could either
>> extract and print that or get a log which includes it that might show
>> up something.
>>
> I don't think we changed the mds<->mon protocol or anything in the point
>> releases, so the different package version *shouldn't* matter...right,
>> John? ;)
>> -Greg
>>
>>>
>>> --
>>> Adam
>>>
>>> On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum  wrote:
 On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart  wrote:
> Hello all,
>
> Not sure if this went through before or not, as I can't check the
> mailing list archives.
>
> I've gotten myself into a bit of a bind. I was prepping to add a new
> mds node to my ceph cluster. e.g. ceph-deploy mds create mormo
>
> Unfortunately, it started the mds server before I was ready. My
> cluster was running 10.2.1, and the newly deployed mds is 10.2.3.
>
> This caused 3 of my 5 monitors to crash. Since I immediately realized
> the mds was a newer version, I took that opportunity to upgrade my
> monitors to 10.2.3. Three of the 5 monitors continue to crash. And it
> looks like they are crashing when trying to apply a pending mdsmap
> update.
>
> The log is available here:
> http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz
>
> I have attempted (making backups of course) to extract the monmap from
> a working monitor and insert it into a broken one. No luck, and the
> backup was restored.
>
> Since I had 2 working monitors, I backed up the monitor stores,
> updated the monmaps to remove the broken ones and tried to restart
> them. I then tried to restart the "working" ones. They then failed in
> the same way. I've now restored my backups of those monitors.
>
> I need to get these monitors back up post-haste.
>
> If you've got any ideas, I would be grateful.

 I'm not sure, but it looks like it's now too late to keep the problem
 out of the durable storage; if you try again, make sure you turn
 off the MDS first.

 It sort of looks like you've managed to get a failed MDS with an
 invalid fscid (ie, a cephfs filesystem ID).

 ...or maybe just a terrible coding mistake. 

Re: [ceph-users] Down monitors after adding mds node

2016-10-02 Thread Adam Tygart
Sent before I was ready, oops.

How might I get the osdmap from a down cluster?

--
Adam


Re: [ceph-users] Give up on backfill, remove slow OSD

2016-10-02 Thread Ronny Aasen

On 22. sep. 2016 09:16, Iain Buclaw wrote:

Hi,

I currently have an OSD that has been backfilling data off it for a
little over two days now, and it's gone from approximately 68 PGs to
63.

As data is still being read from, and written to it by clients whilst
I'm trying to get it out of the cluster, this is not helping it at
all.  I figured that it's probably best just to cut my losses and just
force it out entirely so that all new writes and reads to those PGs
get redirected elsewhere to a functional disk, and the rest of the
recovery can proceed without being blocked heavily by this one disk.

Granted that objects and files have a 1:1 relationship, I can just
rsync the data to a new server and write it back into ceph afterwards.

Now, I know that as soon as I bring down this OSD, the entire cluster
will stop operating.  So what's the most swift method of telling the
cluster to forget about this disk and everything that may be stored on
it?

Thanks




The OSD should normally not be getting new writes if you want to remove it 
from the cluster, so I assume something went wrong here. How did you 
take the OSD out of the cluster?



generally my procedure for a working osd is something like
1. ceph osd crush reweight osd.X 0

2. ceph osd tree
   check that the osd in question actually has 0 weight (the first number
after the ID) and that the host weight has been reduced accordingly.


3. ls /var/lib/ceph/osd/ceph-X/current ; periodically
   wait for the osd to drain; there should be no PG directories 
n.xxx_head or n.xxx_TEMP left. This will take a while depending on the size of 
the osd. In reality I just wait until the disk usage graph settles, then 
double-check with ls.


4. once empty I mark the osd out, stop the process, and remove the osd 
from the cluster as described in the documentation (a rough scripted version 
of the whole procedure is sketched below):

 - ceph auth del osd.x
 - ceph osd crush remove osd.x
 - ceph osd rm osd.x
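
A rough scripted version of the above, for reference only (the OSD id,
mount path and sleep interval are placeholders to adjust, and it assumes
a systemd host; use the init script otherwise):

  #!/bin/bash
  ID=12    # placeholder: the OSD being retired, i.e. osd.12

  # 1. stop mapping new data to it
  ceph osd crush reweight osd.$ID 0

  # 2./3. wait until backfilling has emptied it (no *_head / *_TEMP dirs left)
  while ls /var/lib/ceph/osd/ceph-$ID/current | grep -qE '_(head|TEMP)$'; do
      sleep 600
  done

  # 4. once empty: mark out, stop the daemon, remove it from crush/auth/osdmap
  ceph osd out $ID
  systemctl stop ceph-osd@$ID
  ceph auth del osd.$ID
  ceph osd crush remove osd.$ID
  ceph osd rm osd.$ID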



PS: if your cluster stops operating when an osd goes down, you have 
something else fundamentally wrong. You should look into this as a 
separate case as well.


kind regards
Ronny Aasen





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: No space left on device

2016-10-02 Thread Mykola Dvornik
Hi John,

Many thanks for your reply. I will try to play with the mds tunables and
report back to you ASAP.

So far I see that the mds log contains a lot of errors of the following kind:

2016-10-02 11:58:03.002769 7f8372d54700  0 mds.0.cache.dir(100056ddecd)
_fetched  badness: got (but i already had) [inode 10005729a77 [2,head]
~mds0/stray1/10005729a77 auth v67464942 s=196728 nl=0 n(v0 b196728 1=1+0)
(iversion lock) 0x7f84acae82a0] mode 33204 mtime 2016-08-07 23:06:29.776298

2016-10-02 11:58:03.002789 7f8372d54700 -1 log_channel(cluster) log [ERR] :
loaded dup inode 10005729a77 [2,head] v68621 at
/users/mykola/mms/NCSHNO/final/120nm-uniform-h8200/j002654.out/m_xrange192-320_yrange192-320_016232.dump,
but inode 10005729a77.head v67464942 already exists at
~mds0/stray1/10005729a77

Those folders within mds.0.cache.dir that got badness report a size of 16EB
on the clients. rm on them fails with 'Directory not empty'.

As for the "Client failing to respond to cache pressure", I have 2 kernel
clients on 4.4.21, 1 on 4.7.5 and 16 fuse clients always running the most
recent release version of ceph-fuse. The funny thing is that every single
client misbehaves from time to time. I am aware of quite a bit of discussion
about this issue on the ML, but cannot really follow how to debug it.

Regards,

-Mykola

On 2 October 2016 at 22:27, John Spray  wrote:

> On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik
>  wrote:
> > After upgrading to 10.2.3 we frequently see messages like
>
> From which version did you upgrade?
>
> > 'rm: cannot remove '...': No space left on device
> >
> > The folders we are trying to delete contain approx. 50K files 193 KB
> each.
>
> My guess would be that you are hitting the new
> mds_bal_fragment_size_max check.  This limits the number of entries
> that the MDS will create in a single directory fragment, to avoid
> overwhelming the OSD with oversized objects.  It is 100000 by default.
> This limit also applies to "stray" directories where unlinked files
> are put while they wait to be purged, so you could get into this state
> while doing lots of deletions.  There are ten stray directories that
> get a roughly even share of files, so if you have more than about one
> million files waiting to be purged, you could see this condition.
>
> The "Client failing to respond to cache pressure" messages may play a
> part here -- if you have misbehaving clients then they may cause the
> MDS to delay purging stray files, leading to a backlog.  If your
> clients are by any chance older kernel clients, you should upgrade
> them.  You can also unmount/remount them to clear this state, although
> it will reoccur until the clients are updated (or until the bug is
> fixed, if you're running latest clients already).
>
> The high level counters for strays are part of the default output of
> "ceph daemonperf mds." when run on the MDS server (the "stry" and
> "purg" columns).  You can look at these to watch how fast the MDS is
> clearing out strays.  If your backlog is just because it's not doing
> it fast enough, then you can look at tuning mds_max_purge_files and
> mds_max_purge_ops to adjust the throttles on purging.  Those settings
> can be adjusted without restarting the MDS using the "injectargs"
> command (http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem)
>
> Let us know how you get on.
>
> John
>
>
> > The cluster state and storage available are both OK:
> >
> > cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
> >  health HEALTH_WARN
> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> > pressure
> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> > pressure
> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> > pressure
> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> > pressure
> > mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> > pressure
> >  monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
> > election epoch 11, quorum 0 000-s-ragnarok
> >   fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
> >  osdmap e20203: 16 osds: 16 up, 16 in
> > flags sortbitwise
> >   pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
> > 23048 GB used, 6745 GB / 29793 GB avail
> > 1085 active+clean
> >2 active+clean+scrubbing
> >1 active+clean+scrubbing+deep
> >
> >
> > Has anybody experienced this issue so far?
> >
> > Regards,
> > --
> >  Mykola
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>



-- 
 Mykola
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Down monitors after adding mds node

2016-10-02 Thread Gregory Farnum
On Sat, Oct 1, 2016 at 7:19 PM, Adam Tygart  wrote:
> The wip-fixup-mds-standby-init branch doesn't seem to allow the
> ceph-mons to start up correctly. I disabled all mds servers before
> starting the monitors up, so it would seem the pending mdsmap update
> is in durable storage. Now that the mds servers are down, can we clear
> the mdsmap of active and standby servers while initializing the mons?
> I would hope that, now that all the versions are in sync, a bad
> standby_for_fscid would not be possible with new mds servers starting.

Looks like my first guess about the run-time initialization being
confused was wrong. :(
Given that, we're pretty befuddled. But I commented on irc:

>if you've still got a core dump, can you go up a frame (to 
>MDSMonitor::maybe_promote_standby) and check the values of target_role.rank 
>and target_role.fscid, and how that compares to info.standby_for_fscid, 
>info.legacy_client_fscid, and info.standby_for_rank?
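
A minimal gdb run for those checks might look like this (the core path and
frame number are placeholders; pick the maybe_promote_standby frame from the
backtrace, and the matching debug symbols need to be installed so the members
are not optimized out):

  gdb --batch /usr/bin/ceph-mon /path/to/core.ceph-mon \
      -ex 'bt' \
      -ex 'frame 1' \
      -ex 'print target_role.rank' \
      -ex 'print target_role.fscid' \
      -ex 'print info.standby_for_rank' \
      -ex 'print info.standby_for_fscid' \
      -ex 'print info.legacy_client_fscid'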

That might pop up something and isn't accessible in the log you
posted. We also can't see an osdmap or dump; if you could either
extract and print that or get a log which includes it that might show
up something.
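
For the osdmap specifically, one possible route is ceph-monstore-tool from
the ceph-test package, run against a stopped monitor's store (the store path
below is an assumption, and whether "get osdmap" is supported depends on the
installed version):

  MON_STORE=/var/lib/ceph/mon/ceph-hobbit01   # placeholder store path

  ceph-monstore-tool $MON_STORE get osdmap -- --out /tmp/osdmap
  osdmaptool --print /tmp/osdmap

  ceph-monstore-tool $MON_STORE get monmap -- --out /tmp/monmap
  monmaptool --print /tmp/monmap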

I don't think we changed the mds<->mon protocol or anything in the point
releases, so the different package version *shouldn't* matter...right,
John? ;)
-Greg

>
> --
> Adam
>
> On Fri, Sep 30, 2016 at 3:49 PM, Gregory Farnum  wrote:
>> On Fri, Sep 30, 2016 at 11:39 AM, Adam Tygart  wrote:
>>> Hello all,
>>>
>>> Not sure if this went through before or not, as I can't check the
>>> mailing list archives.
>>>
>>> I've gotten myself into a bit of a bind. I was prepping to add a new
>>> mds node to my ceph cluster. e.g. ceph-deploy mds create mormo
>>>
>>> Unfortunately, it started the mds server before I was ready. My
>>> cluster was running 10.2.1, and the newly deployed mds is 10.2.3.
>>>
>>> This caused 3 of my 5 monitors to crash. Since I immediately realized
>>> the mds was a newer version, I took that opportunity to upgrade my
>>> monitors to 10.2.3. Three of the 5 monitors continue to crash. And it
>>> looks like they are crashing when trying to apply a pending mdsmap
>>> update.
>>>
>>> The log is available here:
>>> http://people.cis.ksu.edu/~mozes/hobbit01.mon-20160930.log.gz
>>>
>>> I have attempted (making backups of course) to extract the monmap from
>>> a working monitor and insert it into a broken one. No luck, and the
>>> backup was restored.
>>>
>>> Since I had 2 working monitors, I backed up the monitor stores,
>>> updated the monmaps to remove the broken ones and tried to restart
>>> them. I then tried to restart the "working" ones. They then failed in
>>> the same way. I've now restored my backups of those monitors.
>>>
>>> I need to get these monitors back up post-haste.
>>>
>>> If you've got any ideas, I would be grateful.
>>
>> I'm not sure, but it looks like it's now too late to keep the problem
>> out of the durable storage; if you try again, make sure you turn
>> off the MDS first.
>>
>> It sort of looks like you've managed to get a failed MDS with an
>> invalid fscid (ie, a cephfs filesystem ID).
>>
>> ...or maybe just a terrible coding mistake. As mentioned on irc,
>> wip-fixup-mds-standby-init should fix it. I've created a ticket as
>> well: http://tracker.ceph.com/issues/17466
>> -Greg
>>
>>
>>>
>>> --
>>> Adam
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blog post about Ceph cache tiers - feedback welcome

2016-10-02 Thread Nick Fisk
Hi Sascha,

Good article. You might want to add a small section about these two variables:

osd_agent_max_high_ops
osd_agent_max_ops

They control how many concurrent flushes happen at the high/low thresholds, i.e. 
you can set the low one to 1 to minimise the impact
on client IO.

Also, target_max_bytes is calculated on a per-PG basis, so the value is 
divided across PGs. As data distribution is not equal
across all PGs you can get into a situation where you are getting cache full 
warnings, even though the total cache utilisation is
below target_max_bytes, so leave it plenty of headroom.
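
For example (pool name and numbers are placeholders, not recommendations):

  # throttle the low-speed tiering agent to one op to minimise client impact
  ceph tell osd.* injectargs '--osd_agent_max_ops 1 --osd_agent_max_high_ops 4'

  # and keep target_max_bytes comfortably below the real cache capacity,
  # since it is enforced per PG
  ceph osd pool set hot-cache target_max_bytes 800000000000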

Nick


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Sascha Vogt
> Sent: 02 October 2016 20:59
> To: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
> Subject: [ceph-users] Blog post about Ceph cache tiers - feedback welcome
> 
> Hi all,
> 
> as it took quite a while until we got our Ceph cache working (and we're still 
> hit by some unexpected things, see the thread Ceph with
> cache pool - disk usage / cleanup), I thought it might be good to write a 
> summary of what I believe I know up to this point.
> 
> Any feedback, especially corrections is highly welcome!
> 
> http://maybebuggy.de/post/ceph-cache-tier/
> 
> Greetings
> -Sascha-
> 
> PS: Posted to ceph-devel as well, just in case a developer spots some 
> mistakes.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: No space left on device

2016-10-02 Thread John Spray
On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik
 wrote:
> After upgrading to 10.2.3 we frequently see messages like

From which version did you upgrade?

> 'rm: cannot remove '...': No space left on device
>
> The folders we are trying to delete contain approx. 50K files 193 KB each.

My guess would be that you are hitting the new
mds_bal_fragment_size_max check.  This limits the number of entries
that the MDS will create in a single directory fragment, to avoid
overwhelming the OSD with oversized objects.  It is 100000 by default.
This limit also applies to "stray" directories where unlinked files
are put while they wait to be purged, so you could get into this state
while doing lots of deletions.  There are ten stray directories that
get a roughly even share of files, so if you have more than about one
million files waiting to be purged, you could see this condition.

The "Client failing to respond to cache pressure" messages may play a
part here -- if you have misbehaving clients then they may cause the
MDS to delay purging stray files, leading to a backlog.  If your
clients are by any chance older kernel clients, you should upgrade
them.  You can also unmount/remount them to clear this state, although
it will reoccur until the clients are updated (or until the bug is
fixed, if you're running latest clients already).

The high level counters for strays are part of the default output of
"ceph daemonperf mds." when run on the MDS server (the "stry" and
"purg" columns).  You can look at these to watch how fast the MDS is
clearing out strays.  If your backlog is just because it's not doing
it fast enough, then you can look at tuning mds_max_purge_files and
mds_max_purge_ops to adjust the throttles on purging.  Those settings
can be adjusted without restarting the MDS using the "injectargs"
command 
(http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem)
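
For example, something like the following (the daemon name is taken from the
fsmap in your status output below; the numbers are placeholders, not
recommendations):

  # watch stray/purge activity on the active MDS ("stry" and "purg" columns)
  ceph daemonperf mds.000-s-ragnarok

  # check the current throttles via the admin socket on the MDS host
  ceph daemon mds.000-s-ragnarok config get mds_max_purge_files
  ceph daemon mds.000-s-ragnarok config get mds_max_purge_ops

  # raise them at runtime without restarting the MDS
  ceph tell mds.000-s-ragnarok injectargs '--mds_max_purge_files 500 --mds_max_purge_ops 16384'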

Let us know how you get on.

John


> The cluster state and storage available are both OK:
>
> cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
>  health HEALTH_WARN
> mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> pressure
> mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> pressure
> mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> pressure
> mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> pressure
> mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
> pressure
>  monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
> election epoch 11, quorum 0 000-s-ragnarok
>   fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
>  osdmap e20203: 16 osds: 16 up, 16 in
> flags sortbitwise
>   pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
> 23048 GB used, 6745 GB / 29793 GB avail
> 1085 active+clean
>2 active+clean+scrubbing
>1 active+clean+scrubbing+deep
>
>
> Has anybody experienced this issue so far?
>
> Regards,
> --
>  Mykola
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph with Cache pool - disk usage / cleanup | writeup

2016-10-02 Thread Sascha Vogt

Hi all,

just a quick writeup. Over the last two days I was able to evict a lot 
of those 0-byte files by setting "target_max_objects" to 2 million.


After we hit that limit I set it to 10 million for now. So a 
cache_target_dirty_ratio of 0.6 means flushing should start at around 6 
million dirty objects, and with cache_target_full_ratio set to 0.9 no more 
than about 9 million objects should remain in the cache before eviction 
kicks in. Remember we started at 109 million total and 24 million dirty.
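
For reference, those knobs are set per cache pool; the pool name below is a
placeholder and the values simply mirror the ones mentioned above:

  ceph osd pool set cache-pool target_max_objects 10000000
  ceph osd pool set cache-pool cache_target_dirty_ratio 0.6
  ceph osd pool set cache-pool cache_target_full_ratio 0.9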


Now I still have quite some 0-bytes left over in our cache pool (see 
listing at the end), but we'll see how they develop over the next days.


Having set the limit so low, we evicted nearly the whole cache (of 9 
TB total storage space only 800 GB remained). Luckily the difference 
from the original question is now down to around 50 GB (quite a 
saving from the 860 GB we started with ;) )


ceph df detail now lists 2.3 million objects and 1.7 million dirty.

Thanks a lot Christian and Burkhard for all the help and clarifications; 
your information has been preserved in a blog post (see my other post 
to this mailing list).


Greetings
-Sascha-

File count (total and 0-bytes per OSD):

OSD-20 total: 315998
OSD-20 0-bytes: 301835
OSD-21 total: 224645
OSD-21 0-bytes: 212026
OSD-22 total: 208189
OSD-22 0-bytes: 196139
OSD-23 total: 357256
OSD-23 0-bytes: 342350
OSD-24 total: 232800
OSD-24 0-bytes: 220466
OSD-25 total: 235298
OSD-25 0-bytes: 222985
OSD-26 total: 236957
OSD-26 0-bytes: 224345
OSD-27 total: 265974
OSD-27 0-bytes: 252538
OSD-28 total: 253577
OSD-28 0-bytes: 241265
OSD-29 total: 255774
OSD-29 0-bytes: 242891
OSD-30 total: 209818
OSD-30 0-bytes: 198581
OSD-31 total: 276357
OSD-31 0-bytes: 262294
OSD-32 total: 239600
OSD-32 0-bytes: 226639
OSD-33 total: 245248
OSD-33 0-bytes: 232712
OSD-34 total: 267156
OSD-34 0-bytes: 253815
OSD-35 total: 250241
OSD-35 0-bytes: 237709


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Blog post about Ceph cache tiers - feedback welcome

2016-10-02 Thread Sascha Vogt

Hi all,

as it took quite a while until we got our Ceph cache working (and we're 
still hit by some unexpected things, see the thread Ceph with cache 
pool - disk usage / cleanup), I thought it might be good to write a 
summary of what I believe I know up to this point.


Any feedback, especially corrections is highly welcome!

http://maybebuggy.de/post/ceph-cache-tier/

Greetings
-Sascha-

PS: Posted to ceph-devel as well, just in case a developer spots some 
mistakes.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New Cluster OSD Issues

2016-10-02 Thread Sascha Vogt

Hi Pankaj,

On 30/09/16 17:31, Garg, Pankaj wrote:

I just created a new cluster with 0.94.8 and I’m getting this message:

2016-09-29 21:36:47.065642 mon.0 [INF] disallowing boot of OSD osd.35
10.22.21.49:6844/9544 because the osdmap requires
CEPH_FEATURE_SERVER_JEWEL but the osd lacks CEPH_FEATURE_SERVER_JEWEL

This is really bizarre. All the OSDs are down due to this. Can someone
shed any light?
Are you sure you used Hammer (0.94.x) in all places? Looks like your 
monitor daemon is on Jewel already (or at least that's what I'm reading 
from the printed info log).
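
A quick way to double-check what is actually running ("ceph tell" needs
reachable mons and running daemons; otherwise check the binaries on each
node):

  ceph tell mon.* version
  ceph tell osd.* version      # will not answer for OSDs that are down

  # on each node:
  ceph --version
  ceph-osd --version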


Greetings
-Sascha-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS: No space left on device

2016-10-02 Thread Mykola Dvornik
After upgrading to 10.2.3 we frequently see messages like

'rm: cannot remove '...': No space left on device

The folders we are trying to delete contain approx. 50K files 193 KB each.

The cluster state and storage available are both OK:

cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
 health HEALTH_WARN
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
pressure
mds0: Client XXX.XXX.XXX.XXX failing to respond to cache
pressure
 monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
election epoch 11, quorum 0 000-s-ragnarok
  fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
 osdmap e20203: 16 osds: 16 up, 16 in
flags sortbitwise
  pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
23048 GB used, 6745 GB / 29793 GB avail
1085 active+clean
   2 active+clean+scrubbing
   1 active+clean+scrubbing+deep


Has anybody experienced this issue so far?

Regards,
-- 
 Mykola
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-02 Thread Dan van der Ster
Hi,

Do you understand why removing that osd led to unfound objects? Do you have
the ceph.log from yesterday?

Cheers, Dan

On 2 Oct 2016 10:18, "Tomasz Kuzemko"  wrote:
>
> Forgot to mention Ceph version - 0.94.5.
>
> I managed to fix this. By chance I found that when an OSD for a blocked
PG is starting, there is a few-second time window (after load_pgs) in which
it accepts commands related to the blocked PG. So first I managed to
capture "ceph pg PGID query" this way. Then I tried to issue "ceph pg
missing_lost delete" and it worked too. After deleting all unfound objects
this way cluster finally unblocked. Before that I exported all blocked PGs
so hopefully I will be able to recover those 17 objects to a near-latest
state.
>
> Hope this helps anyone who might run into the same problem.
>
>
> 2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko :
>>
>> Hi,
>>
>> I have a production cluster on which 1 OSD on a failing disk was slowing
the whole cluster down. I removed the OSD (osd.87) like usual in such case
but this time it resulted in 17 unfound objects. I no longer have the files
from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on
10 of those objects.
>>
> On the remaining 7 objects the command blocks. When I try to do "ceph
pg PGID query" on this PG it also blocks. I suspect this is the same reason
why mark_unfound blocks.
>>
> Other client IO to PGs that have unfound objects is also blocked. When
trying to query the OSDs which have the PG with unfound objects, "ceph tell"
blocks.
>>
>> I tried to mark the PG as complete using ceph-objectstore-tool but it
did not help as the PG is in fact complete but for some reason blocks.
>>
>> I tried recreating an empty osd.87 and importing the PG exported from
other replica but it did not help.
>>
>> Can someone help me please? This is really important.
>>
>> ceph pg dump:
>>
https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump
>>
>> ceph osd dump:
>>
https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump
>>
>> ceph health detail:
>>
https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail
>>
>>
>> --
>> Pozdrawiam,
>> Tomasz Kuzemko
>> tom...@kuzemko.net
>
>
>
>
> --
> Pozdrawiam,
> Tomasz Kuzemko
> tom...@kuzemko.net
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-02 Thread Tomasz Kuzemko
Forgot to mention Ceph version - 0.94.5.

I managed to fix this. By chance I found that when an OSD for a blocked PG
is starting, there is a few-second time window (after load_pgs) in which it
accepts commands related to the blocked PG. So first I managed to capture
"ceph pg PGID query" this way. Then I tried to issue "ceph pg missing_lost
delete" and it worked too. After deleting all unfound objects this way
cluster finally unblocked. Before that I exported all blocked PGs so
hopefully I will be able to recover those 17 objects to a near-latest state.

Hope this helps anyone who might run into the same problem.
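
A rough sketch of catching that window (the PG id, OSD id and timings are
placeholders, and the window may well behave differently on other versions):

  PGID=1.2f   # placeholder
  OSD=87      # placeholder

  systemctl restart ceph-osd@$OSD    # or the distro's init script
  for i in $(seq 1 60); do
      if timeout 5 ceph pg $PGID query > /tmp/pg-$PGID.query 2>/dev/null; then
          ceph pg $PGID mark_unfound_lost delete
          break
      fi
      sleep 1
  done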


2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko :

> Hi,
>
> I have a production cluster on which 1 OSD on a failing disk was slowing
> the whole cluster down. I removed the OSD (osd.87) like usual in such case
> but this time it resulted in 17 unfound objects. I no longer have the files
> from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on
> 10 of those objects.
>
> On the remaining 7 objects the command blocks. When I try to do "ceph pg
> PGID query" on this PG it also blocks. I suspect this is the same reason why
> mark_unfound blocks.
>
> Other client IO to PGs that have unfound objects is also blocked. When
> trying to query the OSDs which have the PG with unfound objects, "ceph tell"
> blocks.
>
> I tried to mark the PG as complete using ceph-objectstore-tool but it did
> not help as the PG is in fact complete but for some reason blocks.
>
> I tried recreating an empty osd.87 and importing the PG exported from
> other replica but it did not help.
>
> Can someone help me please? This is really important.
>
> ceph pg dump:
> https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/
> 206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump
>
> ceph osd dump:
> https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/
> d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump
>
> ceph health detail:
> https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/
> 59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail
>
>
> --
> Pozdrawiam,
> Tomasz Kuzemko
> tom...@kuzemko.net
>



-- 
Pozdrawiam,
Tomasz Kuzemko
tom...@kuzemko.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Again: Unknown error (95->500) when creating buckets or putting files to RGW after upgrade from Infernalis to Jewel

2016-10-02 Thread Mario David
dear all
I have exactly the same problem as reported in thread
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011599.html

I have OpenStack Mitaka and upgraded radosgw from Infernalis to Jewel.
I am testing Swift container and object creation (I have not yet tried
creating S3 buckets and uploading "files").

I ran the script described here:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30567.html

after which I ran:
radosgw-admin period update
radosgw-admin period commit

Before posting any config files or logs: I can create swift containers OK,
but when I try to upload a file ("create object") I get the error reported
in the thread above.

I have checked the ceph.conf file against the ones in other threads;
it's using
rgw frontends = "civetweb port=9000"

in my client.rgw.XXX.log

2016-10-02 08:14:42.799570 7f6102ffd700  0 WARNING: set_req_state_err
err_no=95 resorting to 500
2016-10-02 08:14:42.799660 7f6102ffd700  2 req
1:0.613066:swift:PUT /swift/v1/testMD/curly:put_obj:op status=-95
2016-10-02 08:14:42.799666 7f6102ffd700  2 req
1:0.613073:swift:PUT /swift/v1/testMD/curly:put_obj:http status=500
2016-10-02 08:14:42.799674 7f6102ffd700  1 == req done
req=0x7f6102ff7710 op status=-95 http_status=500 ==

The container "testMD" was successfully created earlier, 
and I think all interactions with my keystone v3 are OK.

So my question to the list, and especially to Maciej Naruszewicz:
do you have the swift and/or s3 APIs of ceph jewel (the radosgw) working?

Was there a solution to Maciej's (and my) threads?

tia

I can provide more detailed conf or logs 

best
Mario David

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com