Hi Greg,
Thanks for the info; I hope this will be solved in an upcoming minor
update of kraken.
Regarding k+1, I will take your feedback to our architecture team so that
we can increase this to k+2 and bring the pool back to a normal state.
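A minimal sketch of what that change could look like on our side (the
profile and pool names below are placeholders, and since an EC pool's k/m
cannot be changed in place, data would have to be migrated to the new pool):

  # hypothetical 4+2 profile, replacing the current k+1 layout
  ceph osd erasure-code-profile set cdvr_k4m2 k=4 m=2
  # new pool created from that profile; pg counts mirror the existing pool
  ceph osd pool create cdvr_ec_k4m2 1024 1024 erasure cdvr_k4m2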
Thanks,
Muthu
On 1 February 2017 at 02:01, Shinobu Kinjo
On Wed, Feb 1, 2017 at 3:38 AM, Gregory Farnum wrote:
> On Tue, Jan 31, 2017 at 9:06 AM, Muthusamy Muthiah
> wrote:
>> Hi Greg,
>>
>> the problem is in kraken: when a pool is created with an EC profile, min_size
>> equals the erasure size.
>>
>> For
Hi Greg,
the problem is in kraken: when a pool is created with an EC profile,
min_size equals the erasure size.
For a 3+1 profile, the following is the pool status:
pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
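As a hedged workaround sketch (not confirmed in this thread as the intended
fix), min_size can be checked and lowered back to k on the affected pool:

  ceph osd pool get cdvr_ec min_size     # currently 4, i.e. k+m
  ceph osd pool set cdvr_ec min_size 3   # back to k, matching the Jewel behaviour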
Hi Greg,
Following are the test outcomes on EC profiles (n = k + m):
1. Kraken filestore and bluestore with m=1: recovery does not start.
2. Jewel filestore and bluestore with m=1: recovery happens.
3. Kraken bluestore with all default configuration and m=1: no recovery.
4.
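A simple way to watch whether recovery starts in each of these cases:

  ceph -w   # live cluster log; shows recovery/backfill activity if it begins
  ceph -s   # summary; degraded object counts should drop once recovery runs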
Hi Greg,
Now we can see that the same problem exists for kraken filestore also.
Attached are the requested osdmap and crushmap.
OSD.1 was stopped using the following procedure, and the OSD map for a PG
is displayed.
ceph osd dump | grep cdvr_ec
2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the
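A sketch of that procedure (the PG id and output paths here are
illustrative, not the exact ones used):

  systemctl stop ceph-osd@1              # stop OSD.1 on its host (systemd deployments)
  ceph osd dump | grep cdvr_ec           # pool line, as shown above
  ceph pg map 2.10                       # up/acting sets for one PG of the pool
  ceph osd getmap -o /tmp/osdmap         # binary osdmap for the attachment
  ceph osd getcrushmap -o /tmp/crushmap  # binary crushmap for the attachment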
You might also check out "ceph osd tree" and crush dump and make sure
they look the way you expect.
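Concretely, something like:

  ceph osd tree        # host/OSD hierarchy, weights and up/down state
  ceph osd crush dump  # full crush map (buckets and rules) as JSON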
On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum wrote:
> On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
> wrote:
>> Hi All,
>>
>> Also tried
On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
wrote:
> Hi All,
>
> Also tried EC profile 3+1 on a 5 node cluster with bluestore enabled. When
> an OSD is down, the cluster goes to ERROR state even though the cluster is
> n+1. No recovery is happening.
>
> health
Hi All,
Also tried EC profile 3+1 on a 5 node cluster with bluestore enabled. When
an OSD is down, the cluster goes to ERROR state even though the cluster is
n+1. No recovery is happening.
health HEALTH_ERR
75 pgs are stuck inactive for more than 300 seconds
75 pgs incomplete
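If useful, a sketch of how those stuck/incomplete PGs can be drilled into
(the PG id below is illustrative):

  ceph health detail           # lists the individual incomplete/stuck PGs
  ceph pg dump_stuck inactive  # stuck PG ids with their up/acting sets
  ceph pg 2.1f query           # peering state and blocking info for one PG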
Hi Greg,
We use EC 4+1 on a 5 node cluster in production deployments with filestore,
and it does recovery and peering when one OSD goes down. After a few minutes,
another OSD from the node where the faulty OSD exists will take over the PGs
temporarily and all PGs go to active+clean state. Cluster also
`ceph pg dump` should show you something like:
* active+undersized+degraded ... [NONE,3,2,4,1] 3 [NONE,3,2,4,1]
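A quick way to pull just those rows out of a full dump (pgs_brief columns
are PG, state, up set, up primary, acting set, acting primary):

  ceph pg dump pgs_brief | grep undersized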
Sam,
Am I wrong? Or is it up to something else?
On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum wrote:
> I'm pretty sure the default configs won't let an
I'm pretty sure the default configs won't let an EC PG go active with
only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
certain). Running an "n+1" EC config is just not a good idea.
For testing you could probably adjust this with the equivalent of
min_size for EC pools, but I
Hi,
We are validating kraken 11.2.0 with bluestore on a 5 node cluster with EC
4+1.
When an OSD is down, peering is not happening and the ceph health status
moves to ERR state after a few minutes. This was working in previous
development releases. Is any additional configuration required in v11.2.0?
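For reference, the version and EC profile in use can be confirmed with the
following (the profile name is a placeholder for whichever one the pool was
created from):

  ceph -v                                       # should report 11.2.0 (kraken)
  ceph osd erasure-code-profile ls              # list defined EC profiles
  ceph osd erasure-code-profile get <profile>   # k, m and failure domain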