Re: [ceph-users] recovery process stops

2014-10-25 Thread Harald Rößler
Does anyone have an idea how to resolve the situation? Thanks for any advice. Kind regards, Harald Rößler. On 23.10.2014 at 18:56, Harald Rößler harald.roess...@btd.de wrote: @Wido: sorry, I don't understand 100% what you mean; I generated some output which may help. Ok, the pool: pool 3 'bcf' rep size

Re: [ceph-users] recovery process stops

2014-10-23 Thread Harald Rößler
Hi all, the procedure does not work for me; I still have 47 active+remapped PGs. Does anyone have an idea how to fix this issue? @Wido: my cluster now has less than 80% usage - thanks for your advice. Harry. On 21.10.2014 at 22:38, Craig Lewis wrote:

Re: [ceph-users] recovery process stops

2014-10-23 Thread Wido den Hollander
On 10/23/2014 05:33 PM, Harald Rößler wrote: Hi all, the procedure does not work for me; I still have 47 active+remapped PGs. Does anyone have an idea how to fix this issue? If you look at those PGs using ceph osd pg dump, what is their prefix? They should start with a number, and that number
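Wido's suggestion — read the pool id off the stuck PGs — can be sketched as a small parser over `ceph pg dump` output. This is a minimal sketch, not part of the thread; it assumes only that each PG line starts with a pgid of the form `<pool>.<hex>` and mentions the PG state somewhere on the line.

```python
# Minimal sketch: group stuck PGs by pool id. Assumes each PG line in
# `ceph pg dump` output starts with a pgid like "3.1f" (pool 3) and
# contains the state string somewhere on the line.
from collections import Counter

def pools_of_stuck_pgs(pg_dump_lines):
    counts = Counter()
    for line in pg_dump_lines:
        fields = line.split()
        if not fields or "." not in fields[0]:
            continue  # header or non-PG line
        if "remapped" in line or "degraded" in line:
            pool = fields[0].split(".", 1)[0]
            counts[pool] += 1
    return dict(counts)

# Illustrative (simplified) dump lines, not real thread output:
sample = [
    "pg_stat objects state up acting",
    "3.1f 120 active+remapped [14,20] [14,20,0]",
    "3.2a 98 active+remapped [7,20] [7,20,14]",
    "0.05 10 active+clean [1,2,3] [1,2,3]",
]
print(pools_of_stuck_pgs(sample))  # {'3': 2}
```

If every stuck PG shares one pool prefix (here pool 3, matching the 'bcf' pool shown later in the thread), the problem is scoped to that pool's CRUSH rule or the OSDs serving it.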

Re: [ceph-users] recovery process stops

2014-10-23 Thread Harald Rößler
@Wido: sorry, I don't understand 100% what you mean; I generated some output which may help. Ok, the pool: pool 3 'bcf' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 832 pgp_num 832 last_change 8000 owner 0 all remapped PGs have a temp entry: pg_temp 3.1 [14,20,0] pg_temp

Re: [ceph-users] recovery process stops

2014-10-21 Thread Harald Rößler
Hi all, thank you for your support; the file system is no longer degraded. Now I have a negative degraded count :-) 2014-10-21 10:15:22.303139 mon.0 [INF] pgmap v43376478: 3328 pgs: 3281 active+clean, 47 active+remapped; 1609 GB data, 5022 GB used, 1155 GB / 6178 GB avail; 8034B/s rd, 3548KB/s

Re: [ceph-users] recovery process stops

2014-10-21 Thread Craig Lewis
That will fix itself over time. remapped just means that Ceph is moving the data around. It's normal to see PGs in the remapped and/or backfilling state after OSD restarts. They should go down steadily over time. How long depends on how much data is in the PGs, how fast your hardware is, how

Re: [ceph-users] recovery process stops

2014-10-21 Thread Harald Rößler
After more than 10 hours it is still the same situation; I don't think it will fix itself over time. How can I find out what the problem is? On 21.10.2014 at 17:28, Craig Lewis cle...@centraldesktop.com wrote: That will fix itself over time. remapped just means that Ceph

Re: [ceph-users] recovery process stops

2014-10-21 Thread Robert LeBlanc
I've had issues magically fix themselves overnight after waiting/trying things for hours. On Tue, Oct 21, 2014 at 1:02 PM, Harald Rößler harald.roess...@btd.de wrote: After more than 10 hours it is still the same situation; I don't think it will fix itself over time. How can I find out what the problem is?

Re: [ceph-users] recovery process stops

2014-10-21 Thread Craig Lewis
In that case, take a look at ceph pg dump | grep remapped. In the up or acting column, there should be one or two common OSDs between the stuck PGs. Try restarting those OSD daemons. I've had a few OSDs get stuck scheduling recovery, particularly around toofull situations. I've also had
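Craig's tip can be mechanized: collect the OSD sets of every remapped PG and keep the OSDs that appear in all of them. A minimal sketch, not from the thread; it assumes the up/acting sets appear as bracketed lists such as `[14,20,0]` on each `ceph pg dump | grep remapped` line.

```python
# Minimal sketch: find OSDs that appear in every remapped PG's up/acting
# sets -- candidates for a daemon restart. Assumes bracketed OSD lists
# like "[14,20,0]" on each line.
import re
from collections import Counter

def common_osds(remapped_lines):
    counts = Counter()
    for line in remapped_lines:
        osds = set()
        for group in re.findall(r"\[([\d,]+)\]", line):
            osds.update(int(x) for x in group.split(","))
        counts.update(osds)  # count each OSD once per PG
    return sorted(osd for osd, n in counts.items() if n == len(remapped_lines))

# Illustrative lines, not real thread output:
lines = [
    "3.1 active+remapped [14,20] [14,20,0]",
    "3.7 active+remapped [7,20] [7,20,14]",
]
print(common_osds(lines))  # [14, 20]
```

Here osd.14 and osd.20 show up in every stuck PG, so those two daemons would be the ones to restart first.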

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 02:45 PM, Harald Rößler wrote: Dear all, I have at the moment an issue with my cluster: the recovery process stops. See this: 2 active+degraded+remapped+backfill_toofull 156 pgs backfill_toofull You have one or more OSDs which are too full, and that causes recovery to stop. If

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
I think it's because you have too-full OSDs, as the warning message says. I had a similar problem recently and I did: ceph osd reweight-by-utilization But first read what this command does. It solved the problem for me. 2014-10-20 14:45 GMT+02:00 Harald Rößler harald.roess...@btd.de: Dear all, I have

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I had some OSDs which were near full; after that I tried to fix the problem with ceph osd reweight-by-utilization, but this did not help. After that I set the near-full ratio to 88% with the idea that the remapping would fix the issue. A restart of the OSDs doesn't help either. At the same

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 04:43 PM, Harald Rößler wrote: Yes, I had some OSDs which were near full; after that I tried to fix the problem with ceph osd reweight-by-utilization, but this did not help. After that I set the near-full ratio to 88% with the idea that the remapping would fix the issue. Also a

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 05:10 PM, Harald Rößler wrote: Yes, tomorrow I will get the replacement for the failed disk; getting a new node with many disks will take a few days. Any other idea? If the disks are all full, then, no. Sorry to say this, but it comes down to poor capacity management. Never let

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I agree 100%, but currently every disk has a maximum of 86% usage; there should be a way to recover the cluster. Setting the near-full ratio higher than 85% should only be a short-term solution. New disks with higher capacity are already ordered; I just don't like the degraded situation, for a week
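The 86% figure explains the standoff: in releases of that era, an OSD was refused as a backfill target once it crossed `osd_backfill_full_ratio` (assumed here to default to 0.85 — verify against your version), so a disk at 86% still blocks recovery even though it sits below the 88% near-full ratio Harald set. The arithmetic, as a sketch:

```python
# Sketch: an OSD refuses to be a backfill target once its utilization
# reaches osd_backfill_full_ratio (default assumed to be 0.85 here).
def backfill_allowed(used, total, backfill_full_ratio=0.85):
    return used / total < backfill_full_ratio

print(backfill_allowed(86, 100))  # False: 86% full -> backfill_toofull
print(backfill_allowed(80, 100))  # True: backfill can proceed
```

Raising the near-full warning threshold alone therefore changes nothing; the backfill ratio (or the actual utilization) is what gates recovery.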

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
You can set a lower weight on the full OSDs, or try changing the osd_near_full_ratio parameter in your cluster from 85 to, for example, 89. But I don't know what can go wrong when you do that. 2014-10-20 17:12 GMT+02:00 Wido den Hollander w...@42on.com: On 10/20/2014 05:10 PM, Harald Rößler wrote:

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, tomorrow I will get the replacement for the failed disk; getting a new node with many disks will take a few days. Any other idea? Harald Rößler. On 20.10.2014 at 16:45, Wido den Hollander w...@42on.com wrote: On 10/20/2014 04:43 PM, Harald Rößler wrote: Yes, I had some OSDs which were