Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Jens Dueholm Christensen
Sent: Wednesday, January 11, 2017 2:50 PM To: Shinobu Kinjo Cc: Ceph Users Subject: Re: [ceph-users] PGs stuck active+remapped and osds lose data?! Yes, but all I want to know is whether my way of changing the tunables is right or not. > On 11.01.2017 at 13:11, Shinobu Kinjo wrote <ski...@redh

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Marcus Müller
Yes, but all I want to know is whether my way of changing the tunables is right or not. > On 11.01.2017 at 13:11, Shinobu Kinjo wrote: > > Please refer to Jens's message. > > Regards, > >> On Wed, Jan 11, 2017 at 8:53 PM, Marcus Müller >>

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Shinobu Kinjo
Please refer to Jens's message. Regards, On Wed, Jan 11, 2017 at 8:53 PM, Marcus Müller wrote: > Ok, thank you. I thought I have to set ceph to a tunables profile. If I’m > right, then I just have to export the current crush map, edit it and import > it again, like:

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Marcus Müller
Ok, thank you. I thought I have to set ceph to a tunables profile. If I’m right, then I just have to export the current crush map, edit it and import it again, like: ceph osd getcrushmap -o /tmp/crush crushtool -i /tmp/crush --set-choose-total-tries 100 -o /tmp/crush.new ceph osd setcrushmap -i
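The command sequence Marcus describes is cut off in the archive; as a reference, a sketch of the full export/modify/import cycle (the `/tmp` paths and the final filename are assumptions, and injecting a new map can trigger data movement):

```shell
# Export the current (binary) CRUSH map
ceph osd getcrushmap -o /tmp/crush

# Rewrite a single tunable in place, no decompile needed
crushtool -i /tmp/crush --set-choose-total-tries 100 -o /tmp/crush.new

# Inject the modified map back into the cluster
# (expect backfill/remapping while the new mappings apply)
ceph osd setcrushmap -i /tmp/crush.new
```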

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-11 Thread Brad Hubbard
Your current problem has nothing to do with clients and neither does choose_total_tries. Try setting just this value to 100 and see if your situation improves. Ultimately you need to take a good look at your cluster configuration and how your crush map is configured to deal with that

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Shinobu Kinjo
Yeah, Sam is correct. I've not looked at the crushmap. But I should have noticed what the trouble is by looking at `ceph osd tree`. That's my bad, sorry for that. Again please refer to: http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/ Regards, On Wed, Jan 11, 2017 at

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Marcus Müller
Hi Sam, another idea: I have two HDDs here and already wanted to add them to ceph5, so I would need a new crush map anyway. Could this problem be solved by doing that? > On 10.01.2017 at 17:50, Samuel Just wrote: > > Shinobu isn't correct, you have 9/9 osds up and running.

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Marcus Müller
Ok, thanks. Then I will change the tunables. As far as I can see, this would already help me: ceph osd crush tunables bobtail Even if we run ceph hammer, this would work according to the documentation, am I right? And: I’m using librados for our clients (hammer too); could this change create
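For context, a hedged sketch of how one might inspect and change the tunables profile on a Hammer-era cluster (switching profiles changes PG mappings, so it triggers data movement and requires clients at least as new as the chosen profile):

```shell
# Show the tunables currently in effect
# (choose_total_tries, profile name, required client features)
ceph osd crush show-tunables

# Switch to the bobtail profile; this remaps PGs and moves data,
# and requires clients that support bobtail tunables or newer
ceph osd crush tunables bobtail
```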

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Samuel Just
Shinobu isn't correct, you have 9/9 osds up and running. up does not equal acting because crush is having trouble fulfilling the weights in your crushmap and the acting set is being padded out with an extra osd which happens to have the data to keep you up to the right number of replicas. Please
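Sam's distinction shows up directly in the `pg query` output quoted later in this thread: a PG is remapped when its acting set differs from its up set. A trivial illustration with the sets from pg 9.7 (values hand-copied from the thread, not live cluster output):

```shell
# pg 9.7 from this thread: CRUSH could only map two OSDs ("up"),
# so the acting set is padded with OSD 0, which still holds the data.
up="7 3"
acting="7 3 0"

# acting differs from up => the PG is active+remapped, not degraded
if [ "$up" != "$acting" ]; then
    echo "remapped: up=[$up] acting=[$acting]"
fi
```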

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Marcus Müller
Ok, I understand, but how can I debug why they are not running as they should? I thought everything was fine because `ceph -s` said they are up and running. I would suspect a problem with the crush map. > On 10.01.2017 at 08:06, Shinobu Kinjo wrote: > > e.g., > OSD7

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
e.g., OSD7 / 3 / 0 are in the same acting set. They should be up, if they are properly running.

# 9.7
> "up": [ 7, 3 ],
> "acting": [ 7, 3, 0 ],

Here is an example:
"up": [ 1, 0, 2 ],
"acting": [ 1, 0,

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Marcus Müller
> > That's not perfectly correct. > > OSD.0/1/2 seem to be down. Sorry, but where do you see this? I think this indicates that they are up: osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs? > On 10.01.2017 at 07:50, Shinobu Kinjo wrote: > > On Tue, Jan 10, 2017 at

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller wrote:
> All osds are currently up:
>
> health HEALTH_WARN
>        4 pgs stuck unclean
>        recovery 4482/58798254 objects degraded (0.008%)
>        recovery 420522/58798254 objects misplaced

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Marcus Müller
All osds are currently up:

health HEALTH_WARN
       4 pgs stuck unclean
       recovery 4482/58798254 objects degraded (0.008%)
       recovery 420522/58798254 objects misplaced (0.715%)
       noscrub,nodeep-scrub flag(s) set
monmap e9: 5 mons at
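As a sanity check, the degraded and misplaced percentages in that HEALTH_WARN output are simply the quoted object counts divided by the total number of object instances:

```shell
# Recompute the percentages shown in the health output above
awk 'BEGIN {
    total = 58798254
    printf "degraded:  %.3f%%\n", 100 * 4482   / total
    printf "misplaced: %.3f%%\n", 100 * 420522 / total
}'
# degraded:  0.008%
# misplaced: 0.715%
```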

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
Looking at the ``ceph -s`` you originally provided, all OSDs are up. > osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs But looking at ``pg query``, OSD.0 / 1 are not up. Is that perhaps related to this?: > Ceph1, ceph2 and ceph3 are vms on one physical host Are those OSDs running on vm

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Shinobu Kinjo
> pg 9.7 is stuck unclean for 512936.160212, current state active+remapped, last acting [7,3,0]
> pg 7.84 is stuck unclean for 512623.894574, current state active+remapped, last acting [4,8,1]
> pg 8.1b is stuck unclean for 513164.616377, current state active+remapped, last acting [4,7,2]

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Christian Wuerdig
On Tue, Jan 10, 2017 at 10:22 AM, Marcus Müller wrote: > Trying google with "ceph pg stuck in active and remapped" points to a > couple of post on this ML typically indicating that it's a problem with the > CRUSH map and ceph being unable to satisfy the mapping rules.

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Brad Hubbard
There is currently a thread about this very issue on the ceph-devel mailing list (check the archives for "PG stuck unclean after rebalance-by-weight" in the last few days). Have a read of http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/ and try bumping choose_total_tries up
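The approach described in that blog post can also be done by decompiling the map to text and editing the tunable by hand; a sketch, assuming the map is staged under `/tmp` (the edit step is shown as a comment):

```shell
# Fetch and decompile the CRUSH map to editable text
ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt

# In /tmp/crush.txt, raise the tunable, e.g. change
#   tunable choose_total_tries 50
# to
#   tunable choose_total_tries 100

# Recompile and inject the modified map
crushtool -c /tmp/crush.txt -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new
```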

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Marcus Müller
> Trying google with "ceph pg stuck in active and remapped" points to a couple > of posts on this ML, typically indicating that it's a problem with the CRUSH > map and ceph being unable to satisfy the mapping rules. Your ceph -s output > indicates that you're using replication of size 3 in your

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Christian Wuerdig
On Tue, Jan 10, 2017 at 8:23 AM, Marcus Müller wrote: > Hi all, > > Recently I added a new node with new osds to my cluster, which, of course > resulted in backfilling. At the end, there are 4 pgs left in the state 4 > active+remapped and I don’t know what to do. > >

[ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Marcus Müller
Hi all, Recently I added a new node with new osds to my cluster, which of course resulted in backfilling. At the end, there are 4 pgs left in the state active+remapped and I don’t know what to do. Here is how my cluster currently looks: ceph -s health HEALTH_WARN 4
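For anyone landing on this thread in the same state, the first diagnostic steps suggested over the course of the discussion boil down to a few standard commands (hammer-era CLI; pg id 9.7 is taken from the output quoted above):

```shell
# List the PGs stuck unclean, then query one to compare "up" vs "acting"
ceph pg dump_stuck unclean
ceph pg 9.7 query

# Show how CRUSH distributes weight across hosts and OSDs
ceph osd tree

# Show the tunables in effect (choose_total_tries etc.)
ceph osd crush show-tunables
```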