Re: [ceph-users] OMAP warning ( again )

2018-09-01 Thread Brent Kennedy
I found this discussion between Wido and Florian ( two really good ceph folks 
), but it doesn’t seem to deep dive into sharding ( something I would like to 
know more about ).

https://www.spinics.net/lists/ceph-users/msg24420.html  

None of my clusters are using multi-site sync ( was thinking about it though ). 
 

Matt:  I see you are talking about this bug report:  
http://tracker.ceph.com/issues/34307  

Not sure how I could even find the retired bucket index you are referring to.  
I think I understand what sharding is doing, but as far as the inner workings, 
I am still a newb.

-Brent


-Original Message-
From: Matt Benjamin [mailto:mbenj...@redhat.com] 
Sent: Saturday, September 1, 2018 2:56 PM
To: Brent Kennedy 
Cc: Will Marley ; ceph-users 

Subject: Re: [ceph-users] OMAP warning ( again )

Apparently it is the case presently that when dynamic resharding completes, the 
retired bucket index shards need to be manually deleted.  We plan to change 
this, but it's worth checking for such objects.  Alternatively, though, look 
for other large omap "objects", e.g., sync-error.log, if you are using 
multi-site sync.

Matt

On Sat, Sep 1, 2018 at 1:47 PM, Brent Kennedy  wrote:
> I didn’t want to attempt anything until I had more information.  I 
> have been tied up with secondary stuff, so we are just monitoring for 
> now.  The only thing I could find was a setting to make the warning go 
> away, but that doesn’t seem like a good idea as it was identified as 
> an issue that should be addressed.  What perplexes me is that in 
> newest version of Luminous, it should be automatically resharding.  
> Noted here:  https://ceph.com/releases/v12-2-0-luminous-released/
>
> "RGW now supports dynamic bucket index sharding. This has to be enabled via 
> the rgw dynamic resharding configurable. As the number of objects in a bucket 
> grows, RGW will automatically reshard the bucket index in response.  No user 
> intervention or bucket size capacity planning is required."
>
> The documentation, says its enabled by default though, noted here: 
> http://docs.ceph.com/docs/master/radosgw/dynamicresharding
>
> I ran a few of the commands on the buckets to check and the sharding says -1, 
> so what I believe to be the case is that dynamic resharding is only enabled 
> by default on new buckets created in luminous, its not applied to existing 
> buckets.  So now the question is, can I enable dynamic resharding on a 
> particular bucket and will my cluster go nuts if the bucket is huge.  I would 
> rather the system figure out the sharding though as I am unsure what the 
> shard numbers would be should I have to tell the system to do it manually.
>
> There is an open bug report that I have been following: 
> https://tracker.ceph.com/issues/24457 ( appears we are not the only ones ).
>
> We need defined remediation steps for this. I was thinking of 
> hitting up the IRC room since RGW folks don’t seem to be around :(
>
> -Brent
>
> -Original Message-
> From: Will Marley [mailto:will.mar...@ukfast.co.uk]
> Sent: Friday, August 31, 2018 6:08 AM
> To: Brent Kennedy 
> Subject: RE: [ceph-users] OMAP warning ( again )
>
> Hi Brent,
>
> We're currently facing a similar issue. Did a manual reshard repair this for 
> you? Or do you have any more information to hand regarding a solution with 
> this? We're currently sat around scratching our heads about this as there 
> doesn't seem to be much documentation available.
>
> Kind Regards
>
> Will
> Ceph & Openstack Engineer
>
> UKFast.Net Limited, Registered in England Company Registration Number 
> 3845616 Registered office: UKFast Campus, Birley Fields, Manchester 
> M15 5QJ
>
> -----Original Message-
> From: ceph-users  On Behalf Of 
> Brent Kennedy
> Sent: 01 August 2018 23:08
> To: 'Brad Hubbard' 
> Cc: 'ceph-users' 
> Subject: Re: [ceph-users] OMAP warning ( again )
>
> Ceph health detail gives this:
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool '.rgw.buckets.index'
> Search the cluster log for 'Large omap object found' for more details.
>
> The ceph.log file on the monitor server only shows the 1 large omap objects 
> message.
>
> I looked further into the issue again and remembered it was related to bucket 
> sharding.  I then remembered that in Luminous it was supposed to dynamic. I 
> went through the process this time of checking to see what the shards were 
> set to for one of the buckets we have and the max shards is still set to 0.  
> The blog posting about it says that there isn’t anything we have to do, but I 
> am wondering if the same is true for clusters that were upgraded to luminous 
> from

Re: [ceph-users] OMAP warning ( again )

2018-09-01 Thread Matt Benjamin
Apparently it is the case presently that when dynamic resharding
completes, the retired bucket index shards need to be manually
deleted.  We plan to change this, but it's worth checking for such
objects.  Alternatively, though, look for other large omap "objects",
e.g., sync-error.log, if you are using multi-site sync.

Matt

On Sat, Sep 1, 2018 at 1:47 PM, Brent Kennedy  wrote:
> I didn’t want to attempt anything until I had more information.  I have been 
> tied up with secondary stuff, so we are just monitoring for now.  The only 
> thing I could find was a setting to make the warning go away, but that 
> doesn’t seem like a good idea as it was identified as an issue that should be 
> addressed.  What perplexes me is that in newest version of Luminous, it 
> should be automatically resharding.  Noted here:  
> https://ceph.com/releases/v12-2-0-luminous-released/
>
> "RGW now supports dynamic bucket index sharding. This has to be enabled via 
> the rgw dynamic resharding configurable. As the number of objects in a bucket 
> grows, RGW will automatically reshard the bucket index in response.  No user 
> intervention or bucket size capacity planning is required."
>
> The documentation, says its enabled by default though, noted here: 
> http://docs.ceph.com/docs/master/radosgw/dynamicresharding
>
> I ran a few of the commands on the buckets to check and the sharding says -1, 
> so what I believe to be the case is that dynamic resharding is only enabled 
> by default on new buckets created in luminous, its not applied to existing 
> buckets.  So now the question is, can I enable dynamic resharding on a 
> particular bucket and will my cluster go nuts if the bucket is huge.  I would 
> rather the system figure out the sharding though as I am unsure what the 
> shard numbers would be should I have to tell the system to do it manually.
>
> There is an open bug report that I have been following: 
> https://tracker.ceph.com/issues/24457 ( appears we are not the only ones ).
>
> We need defined remediation steps for this. I was thinking of hitting up 
> the IRC room since RGW folks don’t seem to be around :(
>
> -Brent
>
> -Original Message-
> From: Will Marley [mailto:will.mar...@ukfast.co.uk]
> Sent: Friday, August 31, 2018 6:08 AM
> To: Brent Kennedy 
> Subject: RE: [ceph-users] OMAP warning ( again )
>
> Hi Brent,
>
> We're currently facing a similar issue. Did a manual reshard repair this for 
> you? Or do you have any more information to hand regarding a solution with 
> this? We're currently sat around scratching our heads about this as there 
> doesn't seem to be much documentation available.
>
> Kind Regards
>
> Will
> Ceph & Openstack Engineer
>
> UKFast.Net Limited, Registered in England Company Registration Number 3845616 
> Registered office: UKFast Campus, Birley Fields, Manchester M15 5QJ
>
> -----Original Message-
> From: ceph-users  On Behalf Of Brent 
> Kennedy
> Sent: 01 August 2018 23:08
> To: 'Brad Hubbard' 
> Cc: 'ceph-users' 
> Subject: Re: [ceph-users] OMAP warning ( again )
>
> Ceph health detail gives this:
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool '.rgw.buckets.index'
> Search the cluster log for 'Large omap object found' for more details.
>
> The ceph.log file on the monitor server only shows the 1 large omap objects 
> message.
>
> I looked further into the issue again and remembered it was related to bucket 
> sharding.  I then remembered that in Luminous it was supposed to dynamic. I 
> went through the process this time of checking to see what the shards were 
> set to for one of the buckets we have and the max shards is still set to 0.  
> The blog posting about it says that there isn’t anything we have to do, but I 
> am wondering if the same is true for clusters that were upgraded to luminous 
> from older versions.
>
> Do I need to run this: radosgw-admin reshard add --bucket= 
> --num-shards=  for every bucket to get that going?
>
> When I look at a bucket ( BKTEST ), it shows num_shards as 0:
> root@ukpixmon1:/var/log/ceph# radosgw-admin metadata get 
> bucket.instance:BKTEST:default.7320.3
> {
> "key": "bucket.instance:BKTEST:default.7320.3",
> "ver": {
> "tag": "_JFn84AijvH8aWXWXyvSeKpZ",
> "ver": 1
> },
> "mtime": "2018-01-10 18:50:07.994194Z",
> "data": {
> "bucket_info": {
> "bucket": {
> "name": "BKTEST",
> "marker": "default.7320.3

Re: [ceph-users] OMAP warning ( again )

2018-09-01 Thread Brent Kennedy
I didn’t want to attempt anything until I had more information.  I have been 
tied up with secondary stuff, so we are just monitoring for now.  The only 
thing I could find was a setting to make the warning go away, but that doesn’t 
seem like a good idea as it was identified as an issue that should be 
addressed.  What perplexes me is that in newest version of Luminous, it should 
be automatically resharding.  Noted here:  
https://ceph.com/releases/v12-2-0-luminous-released/ 

"RGW now supports dynamic bucket index sharding. This has to be enabled via the 
rgw dynamic resharding configurable. As the number of objects in a bucket 
grows, RGW will automatically reshard the bucket index in response.  No user 
intervention or bucket size capacity planning is required."

The documentation, says its enabled by default though, noted here: 
http://docs.ceph.com/docs/master/radosgw/dynamicresharding 

I ran a few of the commands on the buckets to check and the sharding says -1, 
so what I believe to be the case is that dynamic resharding is only enabled by 
default on new buckets created in luminous, its not applied to existing 
buckets.  So now the question is, can I enable dynamic resharding on a 
particular bucket and will my cluster go nuts if the bucket is huge.  I would 
rather the system figure out the sharding though as I am unsure what the shard 
numbers would be should I have to tell the system to do it manually.

There is an open bug report that I have been following: 
https://tracker.ceph.com/issues/24457 ( appears we are not the only ones ).

We need defined remediation steps for this. I was thinking of hitting up 
the IRC room since RGW folks don’t seem to be around :(  

-Brent

-Original Message-
From: Will Marley [mailto:will.mar...@ukfast.co.uk] 
Sent: Friday, August 31, 2018 6:08 AM
To: Brent Kennedy 
Subject: RE: [ceph-users] OMAP warning ( again )

Hi Brent,

We're currently facing a similar issue. Did a manual reshard repair this for 
you? Or do you have any more information to hand regarding a solution with 
this? We're currently sat around scratching our heads about this as there 
doesn't seem to be much documentation available.

Kind Regards

Will
Ceph & Openstack Engineer

UKFast.Net Limited, Registered in England Company Registration Number 3845616 
Registered office: UKFast Campus, Birley Fields, Manchester M15 5QJ

-Original Message-
From: ceph-users  On Behalf Of Brent Kennedy
Sent: 01 August 2018 23:08
To: 'Brad Hubbard' 
Cc: 'ceph-users' 
Subject: Re: [ceph-users] OMAP warning ( again )

Ceph health detail gives this:
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

The ceph.log file on the monitor server only shows the 1 large omap objects 
message.

I looked further into the issue again and remembered it was related to bucket 
sharding.  I then remembered that in Luminous it was supposed to dynamic. I 
went through the process this time of checking to see what the shards were set 
to for one of the buckets we have and the max shards is still set to 0.  The 
blog posting about it says that there isn’t anything we have to do, but I am 
wondering if the same is true for clusters that were upgraded to luminous from 
older versions.

Do I need to run this: radosgw-admin reshard add --bucket= 
--num-shards=  for every bucket to get that going?

When I look at a bucket ( BKTEST ), it shows num_shards as 0:
root@ukpixmon1:/var/log/ceph# radosgw-admin metadata get 
bucket.instance:BKTEST:default.7320.3
{
"key": "bucket.instance:BKTEST:default.7320.3",
"ver": {
"tag": "_JFn84AijvH8aWXWXyvSeKpZ",
"ver": 1
},
"mtime": "2018-01-10 18:50:07.994194Z",
"data": {
"bucket_info": {
"bucket": {
"name": "BKTEST",
"marker": "default.7320.3",
"bucket_id": "default.7320.3",
"tenant": "",
"explicit_placement": {
"data_pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index"
}
},
"creation_time": "2016-03-09 17:23:50.00Z",
"owner": "zz",
"flags": 0,
"zonegroup": "default",
"placement_rule": "default-placement",
"has_instance_obj": "true",
"quota": {
"enabled": fal

Re: [ceph-users] OMAP warning ( again )

2018-08-01 Thread Brad Hubbard
rgw is not really my area but I'd suggest before you do *anything* you
establish which object it is talking about.

On Thu, Aug 2, 2018 at 8:08 AM, Brent Kennedy  wrote:
> Ceph health detail gives this:
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool '.rgw.buckets.index'
> Search the cluster log for 'Large omap object found' for more details.
>
> The ceph.log file on the monitor server only shows the 1 large omap objects 
> message.
>
> I looked further into the issue again and remembered it was related to bucket 
> sharding.  I then remembered that in Luminous it was supposed to dynamic. I 
> went through the process this time of checking to see what the shards were 
> set to for one of the buckets we have and the max shards is still set to 0.  
> The blog posting about it says that there isn’t anything we have to do, but I 
> am wondering if the same is true for clusters that were upgraded to luminous 
> from older versions.
>
> Do I need to run this: radosgw-admin reshard add --bucket= 
> --num-shards=  for every bucket to get that going?
>
> When I look at a bucket ( BKTEST ), it shows num_shards as 0:
> root@ukpixmon1:/var/log/ceph# radosgw-admin metadata get 
> bucket.instance:BKTEST:default.7320.3
> {
> "key": "bucket.instance:BKTEST:default.7320.3",
> "ver": {
> "tag": "_JFn84AijvH8aWXWXyvSeKpZ",
> "ver": 1
> },
> "mtime": "2018-01-10 18:50:07.994194Z",
> "data": {
> "bucket_info": {
> "bucket": {
> "name": "BKTEST",
> "marker": "default.7320.3",
> "bucket_id": "default.7320.3",
> "tenant": "",
> "explicit_placement": {
> "data_pool": ".rgw.buckets",
> "data_extra_pool": ".rgw.buckets.extra",
> "index_pool": ".rgw.buckets.index"
> }
> },
> "creation_time": "2016-03-09 17:23:50.00Z",
> "owner": "zz",
> "flags": 0,
> "zonegroup": "default",
> "placement_rule": "default-placement",
> "has_instance_obj": "true",
> "quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1024,
> "max_size_kb": 0,
> "max_objects": -1
> },
> "num_shards": 0,
> "bi_shard_hash_type": 0,
> "requester_pays": "false",
> "has_website": "false",
> "swift_versioning": "false",
> "swift_ver_location": "",
> "index_type": 0,
> "mdsearch_config": [],
> "reshard_status": 0,
>         "new_bucket_instance_id": ""
>
> When I run that shard setting to change the number of shards:
> "radosgw-admin reshard add --bucket=BKTEST --num-shards=2"
>
> Then run to get the status:
> "radosgw-admin reshard list"
>
> [
> {
> "time": "2018-08-01 21:58:13.306381Z",
> "tenant": "",
> "bucket_name": "BKTEST",
> "bucket_id": "default.7320.3",
> "new_instance_id": "",
> "old_num_shards": 1,
> "new_num_shards": 2
> }
> ]
>
> If it was 0, why does it say old_num_shards was 1?
>
> -Brent
>
> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Tuesday, July 31, 2018 9:07 PM
> To: Brent Kennedy 
> Cc: ceph-users 
> Subject: Re: [ceph-users] OMAP warning ( again )
>
> Search the cluster log for 'Large omap object found' for more details.
>
> On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy  wrote:
>> Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning
>> message, then upgraded to 12.2.7 and the message went away.  I just
>> added four OSDs to balance out the cluster ( we had some servers with
>> fewer drives in them; jbod config ) 

Re: [ceph-users] OMAP warning ( again )

2018-08-01 Thread Brent Kennedy
Ceph health detail gives this:
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

The ceph.log file on the monitor server only shows the 1 large omap objects 
message.

I looked further into the issue again and remembered it was related to bucket 
sharding.  I then remembered that in Luminous it was supposed to dynamic. I 
went through the process this time of checking to see what the shards were set 
to for one of the buckets we have and the max shards is still set to 0.  The 
blog posting about it says that there isn’t anything we have to do, but I am 
wondering if the same is true for clusters that were upgraded to luminous from 
older versions.

Do I need to run this: radosgw-admin reshard add --bucket= 
--num-shards=  for every bucket to get that going?

When I look at a bucket ( BKTEST ), it shows num_shards as 0:
root@ukpixmon1:/var/log/ceph# radosgw-admin metadata get 
bucket.instance:BKTEST:default.7320.3
{
"key": "bucket.instance:BKTEST:default.7320.3",
"ver": {
"tag": "_JFn84AijvH8aWXWXyvSeKpZ",
"ver": 1
},
"mtime": "2018-01-10 18:50:07.994194Z",
"data": {
"bucket_info": {
"bucket": {
"name": "BKTEST",
"marker": "default.7320.3",
"bucket_id": "default.7320.3",
"tenant": "",
"explicit_placement": {
"data_pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index"
}
},
"creation_time": "2016-03-09 17:23:50.00Z",
"owner": "zz",
"flags": 0,
"zonegroup": "default",
"placement_rule": "default-placement",
"has_instance_obj": "true",
"quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1024,
"max_size_kb": 0,
"max_objects": -1
},
"num_shards": 0,
"bi_shard_hash_type": 0,
"requester_pays": "false",
"has_website": "false",
"swift_versioning": "false",
"swift_ver_location": "",
"index_type": 0,
"mdsearch_config": [],
"reshard_status": 0,
"new_bucket_instance_id": ""

When I run that shard setting to change the number of shards:
"radosgw-admin reshard add --bucket=BKTEST --num-shards=2"

Then run to get the status:
"radosgw-admin reshard list"

[
{
"time": "2018-08-01 21:58:13.306381Z",
"tenant": "",
"bucket_name": "BKTEST",
"bucket_id": "default.7320.3",
"new_instance_id": "",
"old_num_shards": 1,
"new_num_shards": 2
}
]

If it was 0, why does it say old_num_shards was 1?

-Brent

-Original Message-
From: Brad Hubbard [mailto:bhubb...@redhat.com] 
Sent: Tuesday, July 31, 2018 9:07 PM
To: Brent Kennedy 
Cc: ceph-users 
Subject: Re: [ceph-users] OMAP warning ( again )

Search the cluster log for 'Large omap object found' for more details.

On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy  wrote:
> Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning 
> message, then upgraded to 12.2.7 and the message went away.  I just 
> added four OSDs to balance out the cluster ( we had some servers with 
> fewer drives in them; jbod config ) and now the “1 large omap objects” 
> warning message is back.  I did some googlefoo to try to figure out 
> what it means and then how to correct it, but the how to correct it is a bit 
> vague.
>
>
>
> We use rados gateways for all storage, so everything is in the 
> .rgw.buckets pool, which I gather from research is why we are getting 
> the warning message ( there are millions of objects in there ).
>
>
>
> Is there an if/then process to clearing this error message?
>
>
>
> Regards,
>
> -Brent
>
>
>
> Existing Clusters:
>
> Test: Luminous 12.2.7 with 3 osd servers, 1 mon/man, 1 gateway ( all 
> virtual
> )
>
> US Production: Firefly with 4 osd servers, 3 mons, 3 gateways behind 
> haproxy LB
>
> UK Production: Luminous 12.2.7 with 8 osd servers, 3 mons/man, 3 
> gateways behind haproxy LB
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
Brad

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OMAP warning ( again )

2018-07-31 Thread Brad Hubbard
Search the cluster log for 'Large omap object found' for more details.

On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy  wrote:
> Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning
> message, then upgraded to 12.2.7 and the message went away.  I just added
> four OSDs to balance out the cluster ( we had some servers with fewer drives
> in them; jbod config ) and now the “1 large omap objects” warning message is
> back.  I did some googlefoo to try to figure out what it means and then how
> to correct it, but the how to correct it is a bit vague.
>
>
>
> We use rados gateways for all storage, so everything is in the .rgw.buckets
> pool, which I gather from research is why we are getting the warning message
> ( there are millions of objects in there ).
>
>
>
> Is there an if/then process to clearing this error message?
>
>
>
> Regards,
>
> -Brent
>
>
>
> Existing Clusters:
>
> Test: Luminous 12.2.7 with 3 osd servers, 1 mon/man, 1 gateway ( all virtual
> )
>
> US Production: Firefly with 4 osd servers, 3 mons, 3 gateways behind haproxy
> LB
>
> UK Production: Luminous 12.2.7 with 8 osd servers, 3 mons/man, 3 gateways
> behind haproxy LB
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OMAP warning ( again )

2018-07-31 Thread Brent Kennedy
Upgraded from 12.2.5 to 12.2.6, got a "1 large omap objects" warning
message, then upgraded to 12.2.7 and the message went away.  I just added
four OSDs to balance out the cluster ( we had some servers with fewer drives
in them; jbod config ) and now the "1 large omap objects" warning message is
back.  I did some googlefoo to try to figure out what it means and then how
to correct it, but the how to correct it is a bit vague.

 

We use rados gateways for all storage, so everything is in the .rgw.buckets
pool, which I gather from research is why we are getting the warning message
( there are millions of objects in there ).

 

Is there an if/then process to clearing this error message?

 

Regards,

-Brent

 

Existing Clusters:

Test: Luminous 12.2.7 with 3 osd servers, 1 mon/man, 1 gateway ( all virtual
)

US Production: Firefly with 4 osd servers, 3 mons, 3 gateways behind haproxy
LB

UK Production: Luminous 12.2.7 with 8 osd servers, 3 mons/man, 3 gateways
behind haproxy LB

 

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com