Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg, Thanks for the info; hope this will be solved in the upcoming minor updates of kraken. Regarding k+1, I will take your feedback to our architect team to increase this to k+2 and revert the pool to a normal state. Thanks, Muthu On 1 February 2017 at 02:01, Shinobu Kinjo
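For reference, moving from k+1 to k+2 means creating a new profile and a new pool, since an existing pool's erasure-code profile cannot be changed; a minimal sketch, with placeholder names not taken from the thread:

  # define a 3+2 profile and create a new EC pool from it
  ceph osd erasure-code-profile set ec-3-2 k=3 m=2
  ceph osd pool create cdvr_ec_32 1024 1024 erasure ec-3-2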

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Shinobu Kinjo
On Wed, Feb 1, 2017 at 3:38 AM, Gregory Farnum wrote: > On Tue, Jan 31, 2017 at 9:06 AM, Muthusamy Muthiah wrote: >> Hi Greg, >> >> the problem is in kraken: when a pool is created with an EC profile, min_size equals the erasure size. >> >> For

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg, The problem is that in kraken, when a pool is created with an EC profile, min_size equals the erasure size. For a 3+1 profile, the pool status is as follows: pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
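To confirm this behaviour, the pool and its backing profile can be inspected directly; a minimal sketch (the profile name is a placeholder, it is not shown in the dump above):

  # min_size as currently set on the pool
  ceph osd pool get cdvr_ec min_size
  # k, m and plugin of the profile the pool was created from
  ceph osd erasure-code-profile get <profile-name>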

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg, Following are the test outcomes on EC profile (n = k + m): 1. Kraken filestore and bluestore with m=1: recovery does not start. 2. Jewel filestore and bluestore with m=1: recovery happens. 3. Kraken bluestore, all default configuration and m=1: no recovery. 4.
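A sketch of how one of these cases (Kraken with m=1) could be reproduced; pool name, PG counts and OSD id are assumptions, not taken from the thread:

  # create a 3+1 test pool, stop one OSD, then watch whether recovery starts
  ceph osd erasure-code-profile set test-3-1 k=3 m=1
  ceph osd pool create testpool 128 128 erasure test-3-1
  systemctl stop ceph-osd@1     # run on the host carrying osd.1
  ceph -s                       # check PG states and recovery activity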

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg, We can now see that the same problem exists for kraken with filestore as well. Attached are the requested osdmap and crushmap. OSD.1 was stopped using the following procedure, and the OSD map for a PG is displayed. ceph osd dump | grep cdvr_ec 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the
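The attached maps were presumably captured with something along these lines (file names are placeholders):

  # export the current OSD map and CRUSH map
  ceph osd getmap -o osdmap.bin
  ceph osd getcrushmap -o crushmap.bin
  # optional: render them in readable form
  osdmaptool --print osdmap.bin
  crushtool -d crushmap.bin -o crushmap.txt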

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-30 Thread Gregory Farnum
You might also check out "ceph osd tree" and the crush dump and make sure they look the way you expect. On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum wrote: > On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah wrote: >> Hi All, >> >> Also tried
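The checks suggested here correspond to the following commands; the output is worth comparing against the intended CRUSH layout:

  # OSD/host hierarchy with up/down and in/out state per OSD
  ceph osd tree
  # full CRUSH map (buckets, rules) dumped as JSON
  ceph osd crush dump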

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-30 Thread Gregory Farnum
On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah wrote: > Hi All, > > Also tried EC profile 3+1 on a 5-node cluster with bluestore enabled. When an OSD is down the cluster goes to ERROR state even though the cluster is n+1. No recovery happening. > > health

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-29 Thread Muthusamy Muthiah
Hi All, We also tried EC profile 3+1 on a 5-node cluster with bluestore enabled. When an OSD is down, the cluster goes to ERROR state even though the cluster is n+1. No recovery happens. health HEALTH_ERR 75 pgs are stuck inactive for more than 300 seconds 75 pgs incomplete
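To see why the PGs are incomplete rather than just degraded, the usual next step is to query one of them (the PG id below is a placeholder):

  # list the stuck PGs and their states
  ceph health detail
  ceph pg dump_stuck inactive
  # peering history and any OSDs blocking a specific PG
  ceph pg <pgid> query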

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-23 Thread Muthusamy Muthiah
Hi Greg, We use EC 4+1 on a 5-node cluster in production deployments with filestore, and it does recover and peer when one OSD goes down. After a few minutes, another OSD on the node hosting the faulty OSD takes over the PGs temporarily and all PGs go to the active+clean state. Cluster also
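This temporary takeover is the normal remapping behaviour and can be watched while an OSD is down, for example with:

  # live cluster events, including recovery and backfill progress
  ceph -w
  # per-PG state with up and acting OSD sets
  ceph pg dump pgs_brief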

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-20 Thread Shinobu Kinjo
`ceph pg dump` should show you something like: * active+undersized+degraded ... [NONE,3,2,4,1] 3 [NONE,3,2,4,1] Sam, Am I wrong? Or is it up to something else? On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum wrote: > I'm pretty sure the default configs won't let an
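For context, the two bracketed lists are the PG's up and acting OSD sets, with the primary OSD id between them; NONE marks the shard whose OSD is missing. The same information can be pulled for a single PG with (PG id is a placeholder):

  # up/acting sets for one PG
  ceph pg map <pgid>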

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-20 Thread Gregory Farnum
I'm pretty sure the default configs won't let an EC PG go active with only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not certain). Running an "n+1" EC config is just not a good idea. For testing you could probably adjust this with the equivalent of min_size for EC pools, but I
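A sketch of the adjustment mentioned here: for a 4+1 pool, min_size could be lowered to k so I/O continues with one OSD down. This is a testing aid only, since it removes the extra shard of safety margin:

  # allow the EC pool to serve I/O with only k=4 shards available (testing only)
  ceph osd pool set cdvr_ec min_size 4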

[ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-20 Thread Muthusamy Muthiah
Hi, We are validating kraken 11.2.0 with bluestore on a 5-node cluster with EC 4+1. When an OSD is down, peering does not happen and the ceph health status moves to ERR state after a few minutes. This was working in previous development releases. Any additional configuration required in v11.2.0
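For reference, a 4+1 bluestore test pool of the kind described could be set up roughly as follows; profile and pool names are placeholders, and the failure-domain option is omitted since its name differs between releases:

  # 4+1 erasure-code profile and a pool built on it
  ceph osd erasure-code-profile set ec-4-1 k=4 m=1
  ceph osd pool create cdvr_ec 1024 1024 erasure ec-4-1
  # take one OSD down and watch the health state
  systemctl stop ceph-osd@<id>
  ceph -s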