Re: [ceph-users] Using same name for rgw / beast web front end

2019-09-11 Thread Casey Bodley

Hi Eric,

boost::beast is a low-level c++ http protocol library that's hosted at 
https://github.com/boostorg/beast. Radosgw uses this library, along with 
boost::asio, as the basis for its 'beast frontend'. The motivation 
behind this frontend is its flexible threading model and support for 
asynchronous networking. The civetweb server's thread-per-connection 
model has been a major limitation for scalability. With ongoing work to 
extend the asynchrony of the beast frontend into radosgw's request 
processing, it'll be able to handle more concurrent requests with fewer 
thread/memory resources than civetweb.
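For reference, a minimal sketch of enabling the beast frontend in ceph.conf 
(the section name and port here are placeholders, not taken from this thread):

    [client.rgw.gateway1]
        rgw frontends = beast port=8080

Switching back to civetweb is just a matter of changing the frontend name in 
the same option, e.g. "civetweb port=8080".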


In nautilus though, the beast frontend still has this 
thread-per-connection limitation, and its performance is comparable to 
civetweb's. I expect we'll make more noise about it once we're further 
along with the async refactoring and start to show the real wins. I hope 
that helps to clarify the situation!


Casey

On 9/11/19 12:56 PM, Eric Choi wrote:

Replying to my own question:

2. Beast is not a web front end, so it would be an apples-to-oranges 
comparison.  I just couldn't find any blogs / docs about it at first 
(found it here: https://github.com/ceph/Beast)


Still unsure about the first question..

On Tue, Sep 10, 2019 at 4:45 PM Eric Choi wrote:


Hi there, we have been using ceph for a few years now, and it's only
now that I've noticed we have been using the same name for all RGW
hosts, resulting in the following when you run ceph -s:

rgw: 1 daemon active (..)

despite having more than 10 RGW hosts.

* What are the side effects of doing this? Is this a no-no? I can
see the metrics (ceph daemon ... perf dump) can be wrong; are the
metrics tracked independently (per host)?

* My second question (maybe this should be a separate email!) is
about the comparison between Beast and Civetweb. We only recently
upgraded to Nautilus, so Beast became available as an option to
us.  I couldn't find any blogs / docs comparing these 2
frontends.  Is there any recommended reading, or could someone give
me an overview?

Much appreciated!



-- 


Eric Choi
Senior Software Engineer 2 | Core Platform







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW configuration parameters

2019-07-30 Thread Casey Bodley


On 7/30/19 3:03 PM, Thomas Bennett wrote:

Hi Casey,

Thanks for your reply.

Just to make sure I understand correctly - would that only be the case if the 
S3 object size for the put/get is a multiple of your rgw_max_chunk_size?


whenever the object size is larger than a single chunk




Kind regards,
Tom

On Tue, 30 Jul 2019 at 16:57, Casey Bodley <cbod...@redhat.com> wrote:


Hi Thomas,

I see that you're familiar with rgw_max_chunk_size, which is the most
object data that radosgw will write in a single osd request. Each
PutObj
and GetObj request will issue multiple osd requests in parallel,
up to
these configured window sizes. Raising these values can potentially
improve throughput at the cost of increased memory usage.

On 7/30/19 10:36 AM, Thomas Bennett wrote:
> Does anyone know what these parameters are for? I'm not 100% sure I
> understand what a window is in the context of rgw objects:
>
>   * rgw_get_obj_window_size
>   * rgw_put_obj_min_window_size
>
> The code points to throttling I/O. But some more info would be
useful.
>
> Kind regards,
> Tom
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Thomas Bennett

Storage Engineer at SARAO

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW configuration parameters

2019-07-30 Thread Casey Bodley

Hi Thomas,

I see that you're familiar with rgw_max_chunk_size, which is the most 
object data that radosgw will write in a single osd request. Each PutObj 
and GetObj request will issue multiple osd requests in parallel, up to 
these configured window sizes. Raising these values can potentially 
improve throughput at the cost of increased memory usage.
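As a rough sketch only (the values below are made up, not a recommendation), 
raising the windows would look like this in ceph.conf:

    [client.rgw.gateway1]
        # both default to 16 MiB; larger windows allow more osd requests in flight
        rgw get obj window size = 33554432
        rgw put obj min window size = 33554432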


On 7/30/19 10:36 AM, Thomas Bennett wrote:
Does anyone know what these parameters are for? I'm not 100% sure I 
understand what a window is in the context of rgw objects:


  * rgw_get_obj_window_size
  * rgw_put_obj_min_window_size

The code points to throttling I/O. But some more info would be useful.

Kind regards,
Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Large OMAP Objects in zone.rgw.log pool

2019-07-25 Thread Casey Bodley
What ceph version is this cluster running? Luminous or later should not 
be writing any new meta.log entries when it detects a single-zone 
configuration.


I'd recommend editing your zonegroup configuration (via 'radosgw-admin 
zonegroup get' and 'put') to set both log_meta and log_data to false, 
then commit the change with 'radosgw-admin period update --commit'.


You can then delete any meta.log.* and data_log.* objects from your log 
pool using the rados tool.
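A sketch of that sequence, assuming a log pool named 'us-prd-1.rgw.log' as in 
the report below (the log_meta/log_data fields live in each zone entry of the 
zonegroup JSON):

    radosgw-admin zonegroup get > zonegroup.json
    # edit zonegroup.json: set "log_meta": "false" and "log_data": "false"
    radosgw-admin zonegroup set --infile zonegroup.json
    radosgw-admin period update --commit
    # once the gateways have picked up the new period:
    rados -p us-prd-1.rgw.log ls | grep -E '^(meta\.log|data_log)' | \
        while read obj; do rados -p us-prd-1.rgw.log rm "$obj"; done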


On 7/25/19 2:30 PM, Brett Chancellor wrote:

Casey,
  These clusters were setup with the intention of one day doing multi 
site replication. That has never happened. The cluster has a single 
realm, which contains a single zonegroup, and that zonegroup contains 
a single zone.


-Brett

On Thu, Jul 25, 2019 at 2:16 PM Casey Bodley <mailto:cbod...@redhat.com>> wrote:


Hi Brett,

These meta.log objects store the replication logs for metadata
sync in
multisite. Log entries are trimmed automatically once all other zones
have processed them. Can you verify that all zones in the multisite
configuration are reachable and syncing? Does 'radosgw-admin sync
status' on any zone show that it's stuck behind on metadata sync?
That
would prevent these logs from being trimmed and result in these large
omap warnings.

On 7/25/19 1:59 PM, Brett Chancellor wrote:
> I'm having an issue similar to
>
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033611.html .

> I don't see where any solution was proposed.
>
> $ ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
>     1 large objects found in pool 'us-prd-1.rgw.log'
>     Search the cluster log for 'Large omap object found' for
more details.
>
> $ grep "Large omap object" /var/log/ceph/ceph.log
> 2019-07-25 14:58:21.758321 osd.3 (osd.3) 15 : cluster [WRN]
Large omap
> object found. Object:
> 51:61eb35fe:::meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19:head
> Key count: 3382154 Size (bytes): 611384043
>
> $ rados -p us-prd-1.rgw.log listomapkeys
> meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19 |wc -l
> 3382154
>
> $ rados -p us-prd-1.rgw.log listomapvals
> meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19
> This returns entries from almost every bucket, across multiple
> tenants. Several of the entries are from buckets that no longer
exist
> on the system.
>
> $ ceph df |egrep 'OBJECTS|.rgw.log'
>     POOL                  ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>     us-prd-1.rgw.log      51    758 MiB    228        758 MiB    0        102 TiB
>
> Thanks,
>
> -Brett
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Large OMAP Objects in zone.rgw.log pool

2019-07-25 Thread Casey Bodley

Hi Brett,

These meta.log objects store the replication logs for metadata sync in 
multisite. Log entries are trimmed automatically once all other zones 
have processed them. Can you verify that all zones in the multisite 
configuration are reachable and syncing? Does 'radosgw-admin sync 
status' on any zone show that it's stuck behind on metadata sync? That 
would prevent these logs from being trimmed and result in these large 
omap warnings.
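For example, the state can be checked from a gateway host in each zone with 
plain commands like:

    radosgw-admin sync status
    radosgw-admin metadata sync status

If any zone reports that metadata sync is behind, that zone is what's holding 
the logs.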


On 7/25/19 1:59 PM, Brett Chancellor wrote:
I'm having an issue similar to 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033611.html . 
I don't see where any solution was proposed.


$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'us-prd-1.rgw.log'
    Search the cluster log for 'Large omap object found' for more details.

$ grep "Large omap object" /var/log/ceph/ceph.log
2019-07-25 14:58:21.758321 osd.3 (osd.3) 15 : cluster [WRN] Large omap 
object found. Object: 
51:61eb35fe:::meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19:head 
Key count: 3382154 Size (bytes): 611384043


$ rados -p us-prd-1.rgw.log listomapkeys 
meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19 |wc -l

3382154

$ rados -p us-prd-1.rgw.log listomapvals 
meta.log.e557cf47-46df-4b45-988e-9a94c5004a2e.19
This returns entries from almost every bucket, across multiple 
tenants. Several of the entries are from buckets that no longer exist 
on the system.


$ ceph df |egrep 'OBJECTS|.rgw.log'
    POOL                  ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
    us-prd-1.rgw.log      51    758 MiB    228        758 MiB    0        102 TiB


Thanks,

-Brett

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Casey Bodley

the /admin/metadata apis require caps of type "metadata"

source: 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rest_metadata.h#L37
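For example, the missing cap can be granted with something like this (the uid 
is a placeholder):

    radosgw-admin caps add --uid=someuser --caps="metadata=read"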


On 7/23/19 12:53 PM, Benjeman Meekhof wrote:

Ceph Nautilus, 14.2.2, RGW civetweb.
Trying to read from the RGW admin api /metadata/user with request URL like:
GET /admin/metadata/user?key=someuser&format=json

But am getting a 403 denied error from RGW.  Shouldn't the caps below
be sufficient, or am I missing something?

  "caps": [
 {
 "type": "metadata",
 "perm": "read"
 },
 {
 "type": "user",
 "perm": "read"
 },
 {
 "type": "users",
 "perm": "read"
 }
 ],

The application making the call is a python module:
https://github.com/UMIACS/rgwadmin

I have another application using the API and it is able to make
requests to fetch a user but does so by calling 'GET
/admin/user?format=xml&uid=someuser' and that user has just the
'users=read' cap.

thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Multisite RGW - endpoints configuration

2019-07-17 Thread Casey Bodley


On 7/17/19 8:04 AM, P. O. wrote:

Hi,
Is there any mechanism inside the rgw that can detect faulty endpoints 
for a configuration with multiple endpoints?


No, replication requests that fail just get retried using round robin 
until they succeed. If an endpoint isn't available, we assume it will 
come back eventually and keep trying.



Is there any advantage related with the number of replication 
endpoints? Can I expect improved replication performance (the more 
synchronization rgws = the faster replication)?


These endpoints act as the server side of replication, and handle GET 
requests from other zones to read replication logs and fetch objects. As 
long as the number of gateways on the client side of replication (ie. 
gateways on other zones that have rgw_run_sync_thread enabled, which is 
on by default) scale along with these replication endpoints, you can 
expect a modest improvement in replication, though it's limited by the 
available bandwidth between sites. Spreading replication endpoints over 
several gateways also helps to limit the impact of replication on the 
local client workloads.






On Wednesday, 17 July 2019, P. O. <pos...@gmail.com> wrote:


Hi,

Is there any mechanism inside the rgw that can detect faulty
endpoints for a configuration with multiple endpoints? Is there
any advantage related with the number of replication endpoints?
Can I expect improved replication performance (the more synchronization 
rgws = the faster replication)?


On Tuesday, 16 July 2019, Casey Bodley <cbod...@redhat.com> wrote:

We used to have issues when a load balancer was in front of
the sync endpoints, because our http client didn't time out
stalled connections. Those are resolved in luminous, but we
still recommend using the radosgw addresses directly to avoid
shoveling data through an extra proxy. Internally, sync is
already doing a round robin over that list of endpoints. On
the other hand, load balancers give you some extra
flexibility, like adding/removing gateways without having to
update the global multisite configuration.

On 7/16/19 2:52 PM, P. O. wrote:

Hi all,

I have multisite RGW setup with one zonegroup and two
zones. Each zone has one endpoint configured like below:

"zonegroups": [
{
 ...
 "is_master": "true",
 "endpoints": ["http://192.168.100.1:80;],
 "zones": [
   {
     "name": "primary_1",
     "endpoints": ["http://192.168.100.1:80;],
   },
   {
     "name": "secondary_1",
     "endpoints": ["http://192.168.200.1:80;],
   }
 ],

My question is what is the best practice with configuring
synchronization endpoints?

1) Should endpoints be behind load balancer? For example
two synchronization endpoints per zone, and only load
balancers address in "endpoints" section?
2) Should endpoints be behind Round-robin DNS?
3) Can I set RGWs addresses directly in endpoints section?
For example:

 "zones": [
   {
     "name": "primary_1",
     "endpoints": ["http://192.168.100.1:80;,
http://192.168.100.2:80],
   },
   {
     "name": "secondary_1",
     "endpoints": ["http://192.168.200.1:80;,
http://192.168.200.2:80],
   }

Is there any advantages of third option? I mean speed up
of synchronization, for example.

What recommendations do you have with the configuration of
the endpoints in prod environments?

Best regards,
Dun F.

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Multisite RGW - endpoints configuration

2019-07-16 Thread Casey Bodley
We used to have issues when a load balancer was in front of the sync 
endpoints, because our http client didn't time out stalled connections. 
Those are resolved in luminous, but we still recommend using the radosgw 
addresses directly to avoid shoveling data through an extra proxy. 
Internally, sync is already doing a round robin over that list of 
endpoints. On the other hand, load balancers give you some extra 
flexibility, like adding/removing gateways without having to update the 
global multisite configuration.


On 7/16/19 2:52 PM, P. O. wrote:

Hi all,

I have multisite RGW setup with one zonegroup and two zones. Each zone 
has one endpoint configured like below:


"zonegroups": [
{
 ...
 "is_master": "true",
 "endpoints": ["http://192.168.100.1:80;],
 "zones": [
   {
     "name": "primary_1",
     "endpoints": ["http://192.168.100.1:80;],
   },
   {
     "name": "secondary_1",
     "endpoints": ["http://192.168.200.1:80;],
   }
 ],

My question is what is the best practice with configuring 
synchronization endpoints?


1) Should endpoints be behind load balancer? For example two 
synchronization endpoints per zone, and only load balancers address in 
"endpoints" section?

2) Should endpoints be behind Round-robin DNS?
3) Can I set RGWs addresses directly in endpoints section? For example:

 "zones": [
   {
     "name": "primary_1",
     "endpoints": ["http://192.168.100.1:80;, http://192.168.100.2:80],
   },
   {
     "name": "secondary_1",
     "endpoints": ["http://192.168.200.1:80;, http://192.168.200.2:80],
   }

Is there any advantages of third option? I mean speed up of 
synchronization, for example.


What recommendations do you have with the configuration of the 
endpoints in prod environments?


Best regards,
Dun F.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW Beast crash 14.2.1

2019-07-11 Thread Casey Bodley



On 7/11/19 3:28 AM, EDH - Manuel Rios Fernandez wrote:


Hi Folks,

Last night RGW crashed for no apparent reason using beast as the frontend.

We worked around it by switching back to civetweb.

Should this be reported to the tracker?

Please do. It looks like this crashed during startup. Can you please 
include the rgw_frontends configuration?



Regards

Manuel

Centos 7.6

Linux ceph-rgw03 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 
UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


fsid e1ee8086-7cce-43fd-a252-3d677af22428

last_changed 2019-06-17 22:35:18.946810

created 2018-04-17 01:37:27.768960

min_mon_release 14 (nautilus)

0: [v2:172.16.2.5:3300/0,v1:172.16.2.5:6789/0] mon.CEPH-MON01

1: [v2:172.16.2.11:3300/0,v1:172.16.2.11:6789/0] mon.CEPH002

2: [v2:172.16.2.12:3300/0,v1:172.16.2.12:6789/0] mon.CEPH003

3: [v2:172.16.2.10:3300/0,v1:172.16.2.10:6789/0] mon.CEPH001

   -18> 2019-07-11 09:05:01.995 7f8441aff700  4 set_mon_vals no 
callback set


   -17> 2019-07-11 09:05:01.995 7f845f6e47c0 10 monclient: _renew_subs

   -16> 2019-07-11 09:05:01.995 7f845f6e47c0 10 monclient: 
_send_mon_message to mon.CEPH003 at v2:172.16.2.12:3300/0


  -15> 2019-07-11 09:05:01.995 7f845f6e47c0  1 librados: init done

   -14> 2019-07-11 09:05:01.995 7f845f6e47c0  5 asok(0x55cd18bac000) 
register_command cr dump hook 0x55cd198247a8


   -13> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc handle_mgr_map 
Got map version 774


   -12> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc handle_mgr_map 
Active mgr is now [v2:172.16.2.10:6858/256331,v1:172.16.2.10:6859/256331]


   -11> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc reconnect 
Starting new session with 
[v2:172.16.2.10:6858/256331,v1:172.16.2.10:6859/256331]


   -10> 2019-07-11 09:05:01.996 7f844c59d700 10 monclient: 
get_auth_request con 0x55cd19a62000 auth_method 0


    -9> 2019-07-11 09:05:01.997 7f844cd9e700 10 monclient: 
get_auth_request con 0x55cd19a62400 auth_method 0


    -8> 2019-07-11 09:05:01.997 7f844c59d700 10 monclient: 
get_auth_request con 0x55cd19a62800 auth_method 0


    -7> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000) 
register_command sync trace show hook 0x55cd19846c40


    -6> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000) 
register_command sync trace history hook 0x55cd19846c40


    -5> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000) 
register_command sync trace active hook 0x55cd19846c40


    -4> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000) 
register_command sync trace active_short hook 0x55cd19846c40


    -3> 2019-07-11 09:05:01.999 7f844d59f700 10 monclient: 
get_auth_request con 0x55cd19a62c00 auth_method 0


    -2> 2019-07-11 09:05:01.999 7f844cd9e700 10 monclient: 
get_auth_request con 0x55cd19a63000 auth_method 0


    -1> 2019-07-11 09:05:01.999 7f845f6e47c0  0 starting handler: beast

 0> 2019-07-11 09:05:02.001 7f845f6e47c0 -1 *** Caught signal 
(Aborted) **


in thread 7f845f6e47c0 thread_name:radosgw

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) 
nautilus (stable)


1: (()+0xf5d0) [0x7f845293c5d0]

2: (gsignal()+0x37) [0x7f8451d77207]

3: (abort()+0x148) [0x7f8451d788f8]

4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f84526867d5]

5: (()+0x5e746) [0x7f8452684746]

6: (()+0x5e773) [0x7f8452684773]

7: (()+0x5e993) [0x7f8452684993]

8: (void 
boost::throw_exception<boost::system::system_error>(boost::system::system_error 
const&)+0x173) [0x55cd16d9f863]


9: (boost::asio::detail::do_throw_error(boost::system::error_code 
const&, char const*)+0x5b) [0x55cd16d9f91b]


10: (()+0x2837fc) [0x55cd16d8b7fc]

11: (main()+0x2873) [0x55cd16d2a8b3]

12: (__libc_start_main()+0xf5) [0x7f8451d633d5]

13: (()+0x24a877) [0x55cd16d52877]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is 
needed to interpret this.


--- logging levels ---

   0/ 5 none

   0/ 1 lockdep

   0/ 1 context

   1/ 1 crush

   1/ 5 mds

   1/ 5 mds_balancer

   1/ 5 mds_locker

   1/ 5 mds_log

   1/ 5 mds_log_expire

   1/ 5 mds_migrator

   0/ 1 buffer

   0/ 1 timer

   0/ 1 filer

   0/ 1 striper

   0/ 1 objecter

   0/ 5 rados

   0/ 5 rbd

   0/ 5 rbd_mirror

   0/ 5 rbd_replay

   0/ 5 journaler

   0/ 5 objectcacher

   0/ 5 client

   0/ 0 osd

   0/ 5 optracker

   0/ 5 objclass

   1/ 3 filestore

   0/ 0 journal

   0/ 0 ms

   1/ 5 mon

   0/10 monc

   1/ 5 paxos

   0/ 5 tp

   1/ 5 auth

   1/ 5 crypto

   1/ 1 finisher

   1/ 1 reserver

   1/ 5 heartbeatmap

   1/ 5 perfcounter

   1/ 1 rgw

   1/ 5 rgw_sync

   1/10 civetweb

   1/ 5 javaclient

   1/ 5 asok

   1/ 1 throttle

   0/ 0 refs

   1/ 5 xio

   1/ 5 compressor

   1/ 5 bluestore

   1/ 5 bluefs

   1/ 3 bdev

   1/ 5 kstore

   4/ 5 rocksdb

   4/ 5 leveldb

   4/ 5 memdb

   1/ 5 kinetic

   1/ 5 fuse

   1/ 5 mgr

   1/ 5 mgrc

   1/ 5 dpdk

   1/ 5 eventtrace

  -2/-2 (syslog threshold)

  -1/-1 (stderr threshold)

  max_recent 1

  max_new 1000

  log_file /var/log/ceph/ceph-client.rgw.ceph-rgw03.log

--- end dump of recent 

Re: [ceph-users] Stop metadata sync in multi-site RGW

2019-06-19 Thread Casey Bodley
Right, the sync_from fields in the zone configuration only relate to 
data sync within the zonegroup. Can you clarify what your goal is? Are 
you just trying to pause the replication for a while, or disable it 
permanently?


To pause replication, you can configure rgw_run_sync_thread=0 on all 
gateways in that zone. Just note that replication logs will continue to 
grow, and because this 'paused' zone isn't consuming them, it will 
prevent the logs from being trimmed on all zones until sync is reenabled 
and replication catches up.
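A minimal sketch of the 'pause' option, applied to every gateway section of 
that zone in ceph.conf before restarting them (the section name is a 
placeholder):

    [client.rgw.zone2-gw1]
        rgw run sync thread = false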


To disable replication entirely, you'd want to move that zone out of the 
multisite configuration. This would involve removing the zone from its 
current zonegroup, creating a new realm and zonegroup, moving the zone 
into that, and setting its log_data/log_meta fields to false. I can 
follow up with radosgw-admin commands if that's what you're trying to do.


On 6/19/19 10:14 AM, Marcelo Mariano Miziara wrote:

Hello all!

I'm trying to stop the sync from two zones, but using the parameter 
"--sync_from_all=false" seems to stop only the data sync, but not the 
metadata (i.e. users and buckets are synced).



# radosgw-admin sync status
  realm  (xx)
      zonegroup  (xx)
  zone  (xx)
  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
  data sync source: (xx)
    not syncing from zone

Thanks,
Marcelo M.
Serviço Federal de Processamento de Dados - SERPRO
marcelo.mizi...@serpro.gov.br

-


"Esta mensagem do SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO), 
empresa pública federal regida pelo disposto na Lei Federal nº 5.615, 
é enviada exclusivamente a seu destinatário e pode conter informações 
confidenciais, protegidas por sigilo profissional. Sua utilização 
desautorizada é ilegal e sujeita o infrator às penas da lei. Se você a 
recebeu indevidamente, queira, por gentileza, reenviá-la ao emitente, 
esclarecendo o equívoco."


"This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) 
-- a government company established under Brazilian law (5.615/70) -- 
is directed exclusively to its addressee and may contain confidential 
data, protected under professional secrecy rules. Its unauthorized use 
is illegal and may subject the transgressor to the law's penalties. If 
you're not the addressee, please send it back, elucidating the failure."


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Even more objects in a single bucket?

2019-06-17 Thread Casey Bodley

Hi Harry,

When dynamic resharding was introduced for luminous, this limit on the 
number of bucket index shards was increased from 7877 to 65521. However, 
you're likely to have problems with bucket listing performance before 
you get to 7877 shards, because every listing request has to read from 
every shard of the bucket in order to produce sorted results.


If you can avoid listings entirely, indexless buckets are recommended. 
Otherwise, you can use our 'allow-unordered' extension to the s3 GET 
Bucket api [1] which is able to list one shard at a time for better 
scaling with shard count. Note that there was a bug [2] affecting this 
extension that was resolved for v12.2.13, v13.2.6, and v14.2.2.


[1] http://docs.ceph.com/docs/luminous/radosgw/s3/bucketops/#get-bucket

[2] http://tracker.ceph.com/issues/39393
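For illustration, the extension is just an extra query parameter on the GET 
Bucket request, e.g. (bucket and host are placeholders):

    GET /mybucket/?allow-unordered=true HTTP/1.1
    Host: rgw.example.com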

On 6/17/19 11:00 AM, Harald Staub wrote:
There are customers asking for 500 million objects in a single object 
storage bucket (i.e. 5000 shards), and sometimes even more. But we found some 
places that say that there is a limit on the number of shards per 
bucket, e.g.


https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli 



It says that the maximum number of shards is 7877. But I could not 
find this magic number (or any other limit) on http://docs.ceph.com.


Maybe this hard limit no longer applies to Nautilus? Maybe there is a 
recommended soft limit?


Background about the application: Veeam (veeam.com) is a backup 
solution for VMWare that can embed a cloud storage tier with object 
storage (only with a single bucket). Just thinking loud: Maybe this 
could work with an indexless bucket. Not sure how manageable this 
would be, e.g. to monitor how much space is used. Maybe separate pools 
would be needed.


 Harry
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW Multisite Q's

2019-06-14 Thread Casey Bodley

On 6/12/19 11:49 AM, Peter Eisch wrote:

Hi,

Could someone be able to point me to a blog or documentation page which helps 
me resolve the issues noted below?

All nodes are Luminous, 12.2.12; one realm, one zonegroup (clustered haproxies 
fronting), two zones (three rgw in each); all endpoint references to each zone 
go through an haproxy.

In hoping to replace a swift config with RGW, it has been interesting.  Crafting 
a functional configuration from blog posts and documentation takes time.  It 
was crucial to find and use 
http://docs.ceph.com/docs/luminous/radosgw/multisite/ instead of 
http://docs.ceph.com/docs/master/radosgw/config-ref/, though parts of it suggest 
incorrect configurations.  I've submitted corrections to the former in #28517, 
for what it's worth.

Through this I'm now finding fewer resources to help explain the abundance of 
404's in the gateway logs:

   "GET /admin/log/?type=data=8=true= 
HTTP/1.1" 404 0 - -
   "GET /admin/log/?type=data=8=true= 
HTTP/1.1" 404 0 - -
   "GET /admin/log/?type=data=8=true= 
HTTP/1.1" 404 0 - -
   "GET /admin/log/?type=data=8=true= 
HTTP/1.1" 404 0 - -

These number in the hundreds of thousands.  The site seems to work with just 
minimal testing so far.  The 404's also seem to be limited to the data queries, 
while the metadata queries are mostly more successful with 200's.

   "GET 
/admin/log?type=metadata=55=58b43d07-03e2-48e4-b2dc-74d64ef7f0c9=100&=
 HTTP/1.1" 200 0 - -
"GET 
/admin/log?type=metadata=45=58b43d07-03e2-48e4-b2dc-74d64ef7f0c9=100&==
 HTTP/1.1" 200 0 - -
   "GET 
/admin/log?type=metadata=4=58b43d07-03e2-48e4-b2dc-74d64ef7f0c9=100&==
 HTTP/1.1" 200 0 - -
"GET 
/admin/log?type=metadata=35=58b43d07-03e2-48e4-b2dc-74d64ef7f0c9=100&==
 HTTP/1.1" 200 0 - -

Q: How do I address the 404 events to help them succeed?


Hi Peter,

These 404s are not really failures. The replication logs for data and 
metadata are spread over several objects (identified by 'id=' in the 
requests). If no changes have occurred on a given shard id, that object 
never gets created, so other zones will get a 404 when trying to read them.




Other log events which I cannot resolve are the tens of thousands (even while 
no reads or writes are requested) of:

   ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
   ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
   ... data sync: ERROR: failed to read remote data log info: ret=-2
   ... data sync: ERROR: failed to read remote data log info: ret=-2
   ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
   ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
   ... data sync: ERROR: failed to read remote data log info: ret=-2
   ... data sync: ERROR: failed to read remote data log info: ret=-2
   ... data sync: ERROR: failed to read remote data log info: ret=-2
   ... meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
   ... etc.
These seem to fire off every 30 seconds but don't seem to be managed by the "rgw usage log tick 
interval" or "rgw init timeout" values.  Meanwhile the usage between the two zones 
matches for each bucket.

Q:  What are these log events indicating?


These are the same non-fatal errors from above, where 404 errors from 
other zones are converted to -ENOENT. The RGWBackoffControlCR will 
continue to poll these objects for changes. These ERROR messages are 
unnecessarily spammy though, so I'd be in favor of removing them.



Thanks,

peter



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW 405 Method Not Allowed on CreateBucket

2019-06-14 Thread Casey Bodley

Hi Drew,

Judging by the "PUT /" in the request line, this request is using the 
virtual hosted bucket format [1]. This means the bucket name is part of 
the dns name and Host header, rather than in the path of the http 
request. Making this work in radosgw takes a little extra configuration 
[2]. If you prefer not to mess with dns, you can tell the SDK to 
'use_path_style_endpoint' instead [3].
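As a sketch of the dns route (the hostname is an example only; a wildcard 
record like *.s3.example.com must also resolve to the gateways):

    [client.rgw.gateway1]
        rgw dns name = s3.example.com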


Hope that helps,

Casey

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html

[2] 
http://docs.ceph.com/docs/nautilus/radosgw/s3/commons/#bucket-and-host-name


[3] 
https://docs.aws.amazon.com/aws-sdk-php/v3/api/class-Aws.S3.S3Client.html


On 6/14/19 11:49 AM, Drew Weaver wrote:


Hello,

I am using the latest AWS PHP SDK to create a bucket.

Every time I attempt to do this in the log I see:

2019-06-14 11:42:53.092 7fdff5459700  1 civetweb: 0x55c5450249d8: 
redacted - - [14/Jun/2019:11:42:53 -0400] "PUT / HTTP/1.1" 405 405 - 
aws-sdk-php/3.100.3 GuzzleHttp/6.3.3 curl/7.29.0 PHP/7.2.18


Do I need to somehow give this user or her key the capability to 
create/delete their own buckets?


Thanks,

-Drew


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Casey Bodley

Hi Harald,

If the bucket reshard didn't complete, it's most likely one of the new 
bucket index shards that got corrupted here and the original index shard 
should still be intact. Does $BAD_BUCKET_ID correspond to the 
new/resharded instance id? If so, once the rocksdb/osd issues are 
resolved, you should still be able to access and write to the bucket. 
The 'radosgw-admin reshard stale-instances list/rm' commands should be 
able to detect and clean up after the failed reshard. Without knowing 
more about the rocksdb problem, it's hard to tell whether it's safe to 
re-reshard.


Casey


On 6/12/19 10:31 AM, Harald Staub wrote:

Also opened an issue about the rocksdb problem:
https://tracker.ceph.com/issues/40300

On 12.06.19 16:06, Harald Staub wrote:
We ended in a bad situation with our RadosGW (Cluster is Nautilus 
14.2.1, 350 OSDs with BlueStore):


1. There is a bucket with about 60 million objects, without shards.

2. radosgw-admin bucket reshard --bucket $BIG_BUCKET --num-shards 1024

3. Resharding looked fine at first; it counted up to the number of 
objects, but then it hung.

4. 3 OSDs crashed with a segfault: "rocksdb: Corruption: file is too 
short"


5. Trying to start the OSDs manually led to the same segfaults.

6. ceph-bluestore-tool repair ...

7. The repairs all aborted, with the same rocksdb error as above.

8. Now 1 PG is stale. It belongs to the radosgw bucket index pool, 
and it contained the index of this big bucket.


Is there any hope in getting these rocksdbs up again?

Otherwise: how would we fix the bucket index pool? Our ideas:

1. ceph pg $BAD_PG mark_unfound_lost delete
2. rados -p .rgw.buckets ls, search $BAD_BUCKET_ID and remove these 
objects. The hope of this step would be to make the following step 
faster, and avoid another similar problem.

3. radosgw-admin bucket check --check-objects

Will this really rebuild the bucket index? Is it ok to leave the 
existing bucket indexes in place? Is it ok to run for all buckets at 
once, or has it to be run bucket by bucket? Is there a risk that the 
indexes that are not affected by the BAD_PG will be broken afterwards?


Some more details that may be of interest.

ceph-bluestore-repair says:

2019-06-12 11:15:38.345 7f56269670c0 -1 rocksdb: Corruption: file is 
too short (6139497190 bytes) to be an sstabledb/079728.sst
2019-06-12 11:15:38.345 7f56269670c0 -1 
bluestore(/var/lib/ceph/osd/ceph-49) _open_db erroring opening db:

error from fsck: (5) Input/output error

The repairs also showed several warnings like:

tcmalloc: large alloc 17162051584 bytes == 0x56167918a000 @ 
0x7f5626521887 0x56126a287229 0x56126a2873a3 0x56126a5dc1ec 
0x56126a584ce2 0x56126a586a05 0x56126a587dd0 0x56126a589344 
0x56126a38c3cf 0x56126a2eae94 0x56126a30654e 0x56126a337ae1 
0x56126a1a73a1 0x7f561b228b97 0x56126a28077a


The processes showed up with like 45 GB of RAM used. Fortunately, 
there was no Out-Of-Memory.


  Harry
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] [Ceph-large] Large Omap Warning on Log pool

2019-06-12 Thread Casey Bodley

Hi Aaron,

The data_log objects are storing logs for multisite replication. Judging 
by the pool name '.us-phx2.log', this cluster was created before jewel. 
Are you (or were you) using multisite or radosgw-agent?


If not, you'll want to turn off the logging (log_meta and log_data -> 
false) in your zonegroup configuration using 'radosgw-admin zonegroup 
get/set', restart gateways, then delete the data_log and meta_log objects.


If it is multisite, then the logs should all be trimmed in the 
background as long as all peer zones are up-to-date. There was a bug 
prior to 12.2.12 that prevented datalog trimming 
(http://tracker.ceph.com/issues/38412).
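A quick way to check which case applies, with nothing cluster-specific 
assumed:

    radosgw-admin realm list
    radosgw-admin zonegroup get | grep -E '"log_(meta|data)"'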


Casey


On 6/11/19 5:41 PM, Aaron Bassett wrote:

Hey all,
I've just recently upgraded some of my larger rgw clusters to latest luminous 
and now I'm getting a lot of warnings about large omap objects. Most of them 
were on the indices and I've taken care of them by sharding where appropriate. 
However on two of my clusters I have a large object in the rgw log pool.

ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
 1 large objects found in pool '.us-phx2.log'
 Search the cluster log for 'Large omap object found' for more details.


2019-06-11 10:50:04.583354 7f8d2b737700  0 log_channel(cluster) log [WRN] : 
Large omap object found. Object: 51:b9a904f6:::data_log.27:head Key count: 
15903755 Size (bytes): 2305116273


I'm not sure what to make of this. I don't see much chatter on the mailing 
lists about the log pool, other than a thread about swift lifecycles, which I 
don't use.  The log pool is pretty large, making it difficult to poke around in 
there:

.us-phx2.log      51    118GiB    0.03    384TiB    12782413

That said, I did a little poking around and it looks like a mix of these 
data_log objects and some delete hints, but mostly a lot of objects starting 
with dates that point to different s3 pools. The object referenced in the osd 
log has 15912300 omap keys, and spot checking it, it looks like it's mostly 
referencing a pool we use with our dns resolver. We have a dns service that 
checks rgw endpoint health by uploading and deleting an object every few 
minutes, and adds/removes endpoints from the A record as 
indicated.

So I guess I've got a few questions:

1) what is the nature of the data in the data_log.* objects in the log pool? Is 
it safe to remove or is it more like a binlog that needs to be intact from the 
beginning of time?

2) with the log pool in general, beyond the individual objects omap sizes, is 
there any concern about size? If so, is there a way to force it to truncate? I 
see some log commands in radosgw-admin, but documentation is light.


Thanks,
Aaron


___
Ceph-large mailing list
ceph-la...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-06-11 Thread Casey Bodley
The server side encryption features all require special x-amz headers on 
write, so they only apply to our S3 apis. But objects encrypted with 
SSE-KMS (or a default encryption key) can be read without any x-amz 
headers, so swift should be able to decrypt them too. I agree that this 
is a bug and opened http://tracker.ceph.com/issues/40257.


On 6/7/19 7:03 AM, Scheurer François wrote:

Hello Casey


We found something weird during our testing of the 
rgw_crypt_default_encryption_key="xxx" parameter.

s3cmd behaves as expected:
s3cmd is then always writing encrypted objects
s3cmd can read encrypted and unencrypted objects

but swift does not support encryption:
swift can read only unencrypted objects (encrypted objects return error md5sum 
!= etag)
swift is not using encryption during writes (to demonstrate we can remove the 
rgw_crypt_default_encryption_key param and verify that the object is still 
readable).


Is that a bug?

Thank you .


Cheers
Francois



From: Scheurer François
Sent: Wednesday, May 29, 2019 9:28 AM
To: Casey Bodley; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hello Casey


Thank you for your reply.
To close this subject, one last question.

Do you know if it is possible to rotate the key defined by 
"rgw_crypt_default_encryption_key=" ?


Best Regards
Francois Scheurer



________
From: Casey Bodley 
Sent: Tuesday, May 28, 2019 5:37 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

On 5/28/19 11:17 AM, Scheurer François wrote:

Hi Casey


I greatly appreciate your quick and helpful answer :-)



It's unlikely that we'll do that, but if we do it would be announced with a 
long deprecation period and migration strategy.

Fine, just the answer we wanted to hear ;-)



However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.

sse-kms is working great, no issue or gaps with it.
We already use it in our openstack (rocky) with barbican and ceph/radosgw 
(luminous).

But we have customers that want encryption by default, something like SSE-S3 
(cf. below).
Do you know if there are plans to implement something similar?

I would love to see support for sse-s3. We've talked about building
something around vault (which I think is what minio does?), but so far
nobody has taken it up as a project.

Using dm-crypt would cost too much time for the conversion (72x 8TB SATA 
disks...) .
And dm-crypt is also storing its key on the monitors (cf. 
https://www.spinics.net/lists/ceph-users/msg52402.html).


Best Regards
Francois Scheurer


Amazon SSE-3 description:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption 
Keys (SSE-S3)
Server-side encryption protects data at rest. Amazon S3 encrypts each object 
with a unique key. As an additional safeguard, it encrypts the key itself with 
a master key that it rotates regularly. Amazon S3 server-side encryption uses 
one of the strongest block ciphers available, 256-bit Advanced Encryption 
Standard (AES-256), to encrypt your data.


https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTencryption.html
The following is an example of the request body for setting SSE-S3.
<ServerSideEncryptionConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ApplyServerSideEncryptionByDefault>
      <SSEAlgorithm>AES256</SSEAlgorithm>
    </ApplyServerSideEncryptionByDefault>
  </Rule>
</ServerSideEncryptionConfiguration>









________
From: Casey Bodley 
Sent: Tuesday, May 28, 2019 3:55 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with
those keys would no longer be accessible. It's unlikely that we'll do
that, but if we do it would be announced with a long deprecation period
and migration strategy.


However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.


Casey


[1]
https://ceph.com/community/new-mimic-centralized-configuration-management/

[2]
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database


On 5/28/19 6:39 AM, Scheurer François wrote:

Dear Casey, Dear Ceph Users The following is written in the radosgw
documentation
(http://docs.ceph.com/docs/luminous/radosgw/encryption/): rgw 

Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-05-28 Thread Casey Bodley



On 5/28/19 11:17 AM, Scheurer François wrote:

Hi Casey


I greatly appreciate your quick and helpful answer :-)



It's unlikely that we'll do that, but if we do it would be announced with a 
long deprecation period and migration strategy.

Fine, just the answer we wanted to hear ;-)



However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.

sse-kms is working great, no issue or gaps with it.
We already use it in our openstack (rocky) with barbican and ceph/radosgw 
(luminous).

But we have customers that want encryption by default, something like SSE-S3 
(cf. below).
Do you know if there are plans to implement something similar?
I would love to see support for sse-s3. We've talked about building 
something around vault (which I think is what minio does?), but so far 
nobody has taken it up as a project.


Using dm-crypt would cost too much time for the conversion (72x 8TB SATA 
disks...) .
And dm-crypt is also storing its key on the monitors (cf. 
https://www.spinics.net/lists/ceph-users/msg52402.html).


Best Regards
Francois Scheurer
  


Amazon SSE-3 description:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption 
Keys (SSE-S3)
Server-side encryption protects data at rest. Amazon S3 encrypts each object 
with a unique key. As an additional safeguard, it encrypts the key itself with 
a master key that it rotates regularly. Amazon S3 server-side encryption uses 
one of the strongest block ciphers available, 256-bit Advanced Encryption 
Standard (AES-256), to encrypt your data.

  
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTencryption.html

The following is an example of the request body for setting SSE-S3.
<ServerSideEncryptionConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ApplyServerSideEncryptionByDefault>
      <SSEAlgorithm>AES256</SSEAlgorithm>
    </ApplyServerSideEncryptionByDefault>
  </Rule>
</ServerSideEncryptionConfiguration>










From: Casey Bodley 
Sent: Tuesday, May 28, 2019 3:55 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with
those keys would no longer be accessible. It's unlikely that we'll do
that, but if we do it would be announced with a long deprecation period
and migration strategy.


However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.


Casey


[1]
https://ceph.com/community/new-mimic-centralized-configuration-management/

[2]
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database


On 5/28/19 6:39 AM, Scheurer François wrote:

Dear Casey, Dear Ceph Users

The following is written in the radosgw documentation
(http://docs.ceph.com/docs/luminous/radosgw/encryption/):

rgw crypt default encryption key = 4YSmvJtBv0aZ7geVgAsdpRnLBEwWSWlMIGnRS8a9TSA=

   Important: This mode is for diagnostic purposes only! The ceph
configuration file is not a secure method for storing encryption keys.

 Keys that are accidentally exposed in this way should be
considered compromised.




Is the warning only about the key exposure risk or does it mean also
that the feature could be removed in future?


There is also another similar parameter "rgw crypt s3 kms encryption
keys" (cf. usage example in
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030679.html).


Both parameters are still interesting (provided the ceph.conf is
encrypted) but we want to be sure that they will not be dropped in future.




Best Regards

Francois


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-05-28 Thread Casey Bodley

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or 
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with 
those keys would no longer be accessible. It's unlikely that we'll do 
that, but if we do it would be announced with a long deprecation period 
and migration strategy.



However, I would still caution against using either as a strategy for 
key management, especially when (as of mimic) the ceph configuration is 
centralized in the ceph-mon database [1][2]. If there are gaps in our 
sse-kms integration that makes it difficult to use in practice, I'd 
really like to address those.



Casey


[1] 
https://ceph.com/community/new-mimic-centralized-configuration-management/


[2] 
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database



On 5/28/19 6:39 AM, Scheurer François wrote:
Dear Casey, Dear Ceph Users The following is written in the radosgw 
documentation 
(http://docs.ceph.com/docs/luminous/radosgw/encryption/): rgw crypt 
default encryption key = 4YSmvJtBv0aZ7geVgAsdpRnLBEwWSWlMIGnRS8a9TSA=


  Important: This mode is for diagnostic purposes only! The ceph 
configuration file is not a secure method for storing encryption keys.


    Keys that are accidentally exposed in this way should be 
considered compromised.





Is the warning only about the key exposure risk or does it mean also 
that the feature could be removed in future?



There is also another similar parameter "rgw crypt s3 kms encryption 
keys" (cf. usage example in 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030679.html). 




Both parameters are still interesting (provided the ceph.conf is 
encrypted) but we want to be sure that they will not be dropped in future.





Best Regards

Francois


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] large omap object in usage_log_pool

2019-05-24 Thread Casey Bodley



On 5/24/19 1:15 PM, shubjero wrote:

Thanks for chiming in Konstantin!

Wouldn't setting this value to 0 disable the sharding?

Reference: http://docs.ceph.com/docs/mimic/radosgw/config-ref/

rgw override bucket index max shards
Description:Represents the number of shards for the bucket index
object, a value of zero indicates there is no sharding. It is not
recommended to set a value too large (e.g. thousand) as it increases
the cost for bucket listing. This variable should be set in the client
or global sections so that it is automatically applied to
radosgw-admin commands.
Type:Integer
Default:0

rgw dynamic resharding is enabled:
ceph daemon mon.controller1 config show | grep rgw_dynamic_resharding
 "rgw_dynamic_resharding": "true",

I'd like to know more about the purpose of our .usage pool and the
'usage_log_pool' in general as I can't find much about this component
of ceph.


You can find docs for the usage log at 
http://docs.ceph.com/docs/master/radosgw/admin/#usage


Unless trimmed, the usage log will continue to grow. If you aren't using 
it, I'd recommend turning it off and trimming it all.
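A sketch of both steps, assuming the usage log really isn't needed (the dates 
and section name are placeholders):

    # ceph.conf on each gateway, then restart radosgw:
    [client.rgw.gateway1]
        rgw enable usage log = false

    # trim everything recorded so far:
    radosgw-admin usage trim --start-date=2015-01-01 --end-date=2019-05-24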




On Thu, May 23, 2019 at 11:24 PM Konstantin Shalygin  wrote:

in the config.
```"rgw_override_bucket_index_max_shards": "8",```. Should this be
increased?

Should be decreased to default `0`, I think.

Modern Ceph releases resolve large omaps automatically via bucket dynamic 
resharding:

```

{
 "option": {
 "name": "rgw_dynamic_resharding",
 "type": "bool",
 "level": "basic",
 "desc": "Enable dynamic resharding",
 "long_desc": "If true, RGW will dynamicall increase the number of shards in 
buckets that have a high number of objects per shard.",
 "default": true,
 "daemon_default": "",
 "tags": [],
 "services": [
 "rgw"
 ],
 "see_also": [
 "rgw_max_objs_per_shard"
 ],
 "min": "",
 "max": ""
 }
}
```

```

{
 "option": {
 "name": "rgw_max_objs_per_shard",
 "type": "int64_t",
 "level": "basic",
 "desc": "Max objects per shard for dynamic resharding",
 "long_desc": "This is the max number of objects per bucket index shard that 
RGW will allow with dynamic resharding. RGW will trigger an automatic reshard operation on the 
bucket if it exceeds this number.",
 "default": 10,
 "daemon_default": "",
 "tags": [],
 "services": [
 "rgw"
 ],
 "see_also": [
 "rgw_dynamic_resharding"
 ],
 "min": "",
 "max": ""
 }
}
```


So when your bucket reaches another 100k objects per shard, rgw will reshard this 
bucket automatically.

Some old buckets may not be sharded, like your ancient ones from Giant. You can 
check fill status like this: `radosgw-admin bucket limit check | jq '.[]'`. If 
some buckets are not resharded you can shard them by hand via `radosgw-admin 
reshard add ...`. Also, there may be some stale reshard instances (fixed ~ in 
12.2.11); you can check them via `radosgw-admin reshard stale-instances list` and 
then remove them via `reshard stale-instances rm`.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Radosgw object size limit?

2019-05-10 Thread Casey Bodley



On 5/10/19 10:20 AM, Jan Kasprzak wrote:

Hello Casey (and the ceph-users list),

I am returning to my older problem to which you replied:

Casey Bodley wrote:
: There is a rgw_max_put_size which defaults to 5G, which limits the
: size of a single PUT request. But in that case, the http response
: would be 400 EntityTooLarge. For multipart uploads, there's also a
: rgw_multipart_part_upload_limit that defaults to 10000 parts, which
: would cause a 416 InvalidRange error. By default though, s3cmd does
: multipart uploads with 15MB parts, so your 11G object should only
: have ~750 parts.
:
: Are you able to upload smaller objects successfully? These
: InvalidRange errors can also result from failures to create any
: rados pools that didn't exist already. If that's what you're
: hitting, you'd get the same InvalidRange errors for smaller object
: uploads, and you'd also see messages like this in your radosgw log:
:
: > rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34)
: Numerical result out of range (this can be due to a pool or
: placement group misconfiguration, e.g. pg_num < pgp_num or
: mon_max_pg_per_osd exceeded)

You are right. Now how do I know which pool it is and what is the
reason?

Anyway, If I try to upload a CentOS 7 ISO image using
Perl module Net::Amazon::S3, it works. I do something like this there:

  my $bucket = $s3->add_bucket({
      bucket    => 'testbucket',
      acl_short => 'private',
  });
  $bucket->add_key_filename("testdir/$dst", $file, {
      content_type => 'application/octet-stream'
  }) or die $s3->err . ': ' . $s3->errstr;

and I see the following in /var/log/ceph/ceph-client.rgwlog:

2019-05-10 15:55:28.394 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:50 +0200] "PUT 
/testbucket/testdir/CentOS-7-x86_64-Everything-1810.iso HTTP/1.1" 200 234 - 
libwww-perl/6.38

I can see the uploaded object using "s3cmd ls", and I can download it back
using "s3cmd get", with matching sha1sum. When I do the same using
"s3cmd put" instead of Perl module, I indeed get the pool create failure:

2019-05-10 15:53:14.914 7f4b859b8700  1 == starting new request 
req=0x7f4b859af850 =
2019-05-10 15:53:15.492 7f4b859b8700  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this can 
be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or 
mon_max_pg_per_osd exceeded)
2019-05-10 15:53:15.492 7f4b859b8700  1 == req done req=0x7f4b859af850 op 
status=-34 http_status=416 ==
2019-05-10 15:53:15.492 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:14 +0200] "POST /testbucket/testdir/c7.iso?uploads HTTP/1.0" 
416 469 - -

So maybe the Perl module is configured differently? But which pool or
other parameter is the problem? I have the following pools:

# ceph osd pool ls
one
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data


It looks like the default.rgw.buckets.non-ec pool is missing, which is 
where we track in-progress multipart uploads. So I'm guessing that your 
Perl client is not doing a multipart upload, whereas s3cmd does by default.


I'd recommend debugging this by trying to create the pool manually - the 
only requirement for this pool is that it not be erasure coded. See the 
docs for your ceph release for more information:


http://docs.ceph.com/docs/luminous/rados/operations/pools/#create-a-pool

http://docs.ceph.com/docs/luminous/rados/operations/placement-groups/
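
For example, a hedged sketch of creating it by hand (the pg count of 8 is only a placeholder; pick one that fits your cluster, and keep the pool replicated rather than erasure coded). The pool name should match the data_extra_pool configured for the zone's placement target:

ceph osd pool create default.rgw.buckets.non-ec 8
ceph osd pool application enable default.rgw.buckets.non-ec rgw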


(the "one" pool is unrelated to RadosGW, it contains OpenNebula RBD images).

Thanks,

-Yenya

: On 3/7/19 12:21 PM, Jan Kasprzak wrote:
: >  Hello, Ceph users,
: >
: >does radosgw have an upper limit of object size? I tried to upload
: >a 11GB file using s3cmd, but it failed with InvalidRange error:
: >
: >$ s3cmd put --verbose 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso s3://mybucket/
: >INFO: No cache file found, creating it.
: >INFO: Compiling list of local files...
: >INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
: >INFO: Summary: 1 local files to upload
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. 
Storing UID=108 instead.
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
: >ERROR: S3 error: 416 (InvalidRange)
: >
: >$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >
: >Thanks for any hint how to increase the limit.
: >
: >-Yenya
: >
: ___
: ceph-users mailing list

Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread Casey Bodley


On 5/7/19 11:24 AM, EDH - Manuel Rios Fernandez wrote:

Hi Casey

ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)

Is resharding something that prevents our customer from listing the index?


Reshard does not prevent buckets from being listed, it just spreads the 
index over more rados objects (so more osds). Bucket sharding does have 
an impact on listing performance though, because each request to list 
the bucket has to read from every shard of the bucket index in order to 
sort the entries. If any of those osds have performance issues or slow 
requests, that would slow down all bucket listings.
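
As a rough sketch of how to inspect the current shard layout and the index OSDs (the bucket name is a placeholder, and command availability varies a little by release):

radosgw-admin bucket stats --bucket=mybucket        # bucket id, marker and usage
radosgw-admin bucket limit check                    # shard count and objects per shard
radosgw-admin metadata get bucket.instance:mybucket:<bucket_id>   # num_shards in the bucket instance
ceph health detail                                  # look for slow requests on the index OSDs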



Regards


-----Original Message-----
From: ceph-users  On behalf of Casey Bodley
Sent: Tuesday, May 7, 2019 17:07
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker
diferent.

When the bucket id is different than the bucket marker, that indicates the
bucket has been resharded. Bucket stats shows 128 shards, which is
reasonable for that object count. The rgw.none category in bucket stats is
nothing to worry about.

What ceph version is this? This reminds me of a fix in
https://github.com/ceph/ceph/pull/23940, which I now see never got its
backports to mimic or luminous. :(

On 5/7/19 10:20 AM, EDH - Manuel Rios Fernandez wrote:

Hi Ceph’s

We have an issue whose cause we are still looking for, after more than 60
hours of searching for a misconfiguration.

After checking a lot of documentation and questions, we found that the
bucket id and bucket marker are not the same. We compared all our
other buckets and they all have matching id and marker.

We also found that some buckets have an rgw.none section and others do not.

This bucket cannot be listed in a reasonable time. The customer
reduced usage from 120TB to 93TB, from 7 million objects to 5.8M.

We isolated a single request on one RGW server and checked some metrics;
just trying to list this bucket generates 2-3Gbps of traffic from RGW to
the OSDs/MONs.

I asked on IRC whether there is any problem with the index pool being in a
different root of the crushmap at the same site, and we think there shouldn't be.

Any idea or suggestion, however crazy, will be tried.

Our relevant configuration that may help :

CEPH DF:

ceph df

GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    684 TiB  139 TiB  545 TiB   79.70

POOLS:
    NAME                        ID  USED     %USED  MAX AVAIL  OBJECTS
    volumes                     21  3.3 TiB  63.90  1.9 TiB    831300
    backups                     22  0 B      0      1.9 TiB    0
    images                      23  1.8 TiB  49.33  1.9 TiB    237066
    vms                         24  3.4 TiB  64.85  1.9 TiB    811534
    openstack-volumes-archive   25  30 TiB   47.92  32 TiB     7748864
    .rgw.root                   26  1.6 KiB  0      1.9 TiB    4
    default.rgw.control         27  0 B      0      1.9 TiB    100
    default.rgw.data.root       28  56 KiB   0      1.9 TiB    186
    default.rgw.gc              29  0 B      0      1.9 TiB    32
    default.rgw.log             30  0 B      0      1.9 TiB    175
    default.rgw.users.uid       31  4.9 KiB  0      1.9 TiB    26
    default.rgw.users.email     36  12 B     0      1.9 TiB    1
    default.rgw.users.keys      37  243 B    0      1.9 TiB    14
    default.rgw.buckets.index   38  0 B      0      1.9 TiB    1056
    default.rgw.buckets.data    39  245 TiB  93.84  16 TiB     102131428
    default.rgw.buckets.non-ec  40  0 B      0      1.9 TiB    23046
    default.rgw.usage           43  0 B      0      1.9 TiB    6

CEPH OSD Distribution:

ceph osd tree

ID  CLASS   WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF

-41 654.84045 root archive

-37 130.96848 host CEPH-ARCH-R03-07

100 archive 10.91399 osd.100   up  1.0 1.0

101 archive 10.91399 osd.101   up  1.0 1.0

102 archive 10.91399 osd.102   up  1.0 1.0

103 archive 10.91399 osd.103   up  1.0 1.0

104 archive 10.91399 osd.104   up  1.0 1.0

105 archive 10.91399 osd.105   up  1.0 1.0

106 archive 10.91409 osd.106   up  1.0 1.0

107 archive 10.91409 osd.107   up  1.0 1.0

108 archive 10.91409 osd.108   up  1.0 1.0

109 archive 10.91409 osd.109   up  1.0 1.0

110 archive 10.91409 osd.110   up  1.0 1.0

111 archive 10.91409 osd.111   up  1.0 1.0

-23 130.96800 host CEPH005

   4 archive 10.91399 osd.4 up  1.0 1.0

41 archive 10.91399 osd.41    up  1.0 1.0

74 archive 10.91399 osd.74    up  1.0 1.0

75 archive 10.91399 osd.75    up  1.0 1.0

81 archive 10.91399

Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread Casey Bodley
When the bucket id is different than the bucket marker, that indicates 
the bucket has been resharded. Bucket stats shows 128 shards, which is 
reasonable for that object count. The rgw.none category in bucket stats 
is nothing to worry about.


What ceph version is this? This reminds me of a fix in 
https://github.com/ceph/ceph/pull/23940, which I now see never got its 
backports to mimic or luminous. :(


On 5/7/19 10:20 AM, EDH - Manuel Rios Fernandez wrote:


Hi Ceph’s

We have an issue whose cause we are still looking for, after more than 60
hours of searching for a misconfiguration.

After checking a lot of documentation and questions, we found that the
bucket id and bucket marker are not the same. We compared all our
other buckets and they all have matching id and marker.

We also found that some buckets have an rgw.none section and others do not.

This bucket cannot be listed in a reasonable time. The customer
reduced usage from 120TB to 93TB, from 7 million objects to 5.8M.

We isolated a single request on one RGW server and checked some metrics;
just trying to list this bucket generates 2-3Gbps of traffic from RGW to
the OSDs/MONs.

I asked on IRC whether there is any problem with the index pool being in a
different root of the crushmap at the same site, and we think there shouldn't be.

Any idea or suggestion, however crazy, will be tried.

Our relevant configuration that may help :

CEPH DF:

ceph df

GLOBAL:
    SIZE     AVAIL    RAW USED  %RAW USED
    684 TiB  139 TiB  545 TiB   79.70

POOLS:
    NAME                        ID  USED     %USED  MAX AVAIL  OBJECTS
    volumes                     21  3.3 TiB  63.90  1.9 TiB    831300
    backups                     22  0 B      0      1.9 TiB    0
    images                      23  1.8 TiB  49.33  1.9 TiB    237066
    vms                         24  3.4 TiB  64.85  1.9 TiB    811534
    openstack-volumes-archive   25  30 TiB   47.92  32 TiB     7748864
    .rgw.root                   26  1.6 KiB  0      1.9 TiB    4
    default.rgw.control         27  0 B      0      1.9 TiB    100
    default.rgw.data.root       28  56 KiB   0      1.9 TiB    186
    default.rgw.gc              29  0 B      0      1.9 TiB    32
    default.rgw.log             30  0 B      0      1.9 TiB    175
    default.rgw.users.uid       31  4.9 KiB  0      1.9 TiB    26
    default.rgw.users.email     36  12 B     0      1.9 TiB    1
    default.rgw.users.keys      37  243 B    0      1.9 TiB    14
    default.rgw.buckets.index   38  0 B      0      1.9 TiB    1056
    default.rgw.buckets.data    39  245 TiB  93.84  16 TiB     102131428
    default.rgw.buckets.non-ec  40  0 B      0      1.9 TiB    23046
    default.rgw.usage           43  0 B      0      1.9 TiB    6


CEPH OSD Distribution:

ceph osd tree

ID  CLASS   WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF

-41 654.84045 root archive

-37 130.96848 host CEPH-ARCH-R03-07

100 archive 10.91399 osd.100   up  1.0 1.0

101 archive 10.91399 osd.101   up  1.0 1.0

102 archive 10.91399 osd.102   up  1.0 1.0

103 archive 10.91399 osd.103   up  1.0 1.0

104 archive 10.91399 osd.104   up  1.0 1.0

105 archive 10.91399 osd.105   up  1.0 1.0

106 archive 10.91409 osd.106   up  1.0 1.0

107 archive 10.91409 osd.107   up  1.0 1.0

108 archive 10.91409 osd.108   up  1.0 1.0

109 archive 10.91409 osd.109   up  1.0 1.0

110 archive 10.91409 osd.110   up  1.0 1.0

111 archive 10.91409 osd.111   up  1.0 1.0

-23 130.96800 host CEPH005

  4 archive 10.91399 osd.4 up  1.0 1.0

41 archive 10.91399 osd.41    up  1.0 1.0

74 archive 10.91399 osd.74    up  1.0 1.0

75 archive 10.91399 osd.75    up  1.0 1.0

81 archive 10.91399 osd.81    up  1.0 1.0

82 archive 10.91399 osd.82    up  1.0 1.0

83 archive 10.91399 osd.83    up  1.0 1.0

84 archive 10.91399 osd.84    up  1.0 1.0

85 archive 10.91399 osd.85    up  1.0 1.0

86 archive 10.91399 osd.86    up  1.0 1.0

87 archive 10.91399 osd.87    up  1.0 1.0

88 archive 10.91399 osd.88    up  1.0 1.0

-17 130.96800 host CEPH006

  7 archive 10.91399 osd.7 up  1.0 1.0

  8 archive 10.91399 

Re: [ceph-users] Object Gateway - Server Side Encryption

2019-04-25 Thread Casey Bodley


On 4/25/19 11:33 AM, Francois Scheurer wrote:

Hello Amardeep
We are trying the same as you on luminous.
s3cmd --access_key xxx --secret_key xxx --host-bucket '%(bucket)s.s3.xxx.ch' \
  --host s3.xxx.ch --signature-v2 --no-preserve --server-side-encryption \
  --server-side-encryption-kms-id https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5 \
  put hello.txt3 s3://test/hello.txt3

upload: 'hello.txt3' -> 's3://test/hello.txt3'  [1 of 1]
  13 of 13   100% in    0s    14.25 B/s  done
ERROR: S3 error: 400 (InvalidArgument): Failed to retrieve the actual key, 
kms-keyid:https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5
openstack --os-cloud fsc-ac secret get https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5
+---------------+----------------------------------------------------------------------------------+
| Field         | Value                                                                            |
+---------------+----------------------------------------------------------------------------------+
| Secret href   | https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5 |
| Name          | fsc-key3                                                                         |
| Created       | 2019-04-25T14:31:52+00:00                                                        |
| Status        | ACTIVE                                                                           |
| Content types | {u'default': u'application/octet-stream'}                                        |
| Algorithm     | aes                                                                              |
| Bit length    | 256                                                                              |
| Secret type   | opaque                                                                           |
| Mode          | cbc                                                                              |
| Expiration    | 2020-01-01T00:00:00+00:00                                                        |
+---------------+----------------------------------------------------------------------------------+
We also tried using --server-side-encryption-kms-id 
ffa60094-f88b-41a4-b63f-c07a017ad2b5
or --server-side-encryption-kms-id fsc-key3 with the same error.


vim /etc/ceph/ceph.conf
 rgw barbican url =https://barbican.service.xxx.ch
 rgw keystone barbican user = rgwcrypt
 rgw keystone barbican password = xxx
 rgw keystone barbican project = service
 rgw keystone barbican domain = default
 rgw crypt require ssl = false
Thank you in advance for your help.



Best Regards
Francois Scheurer

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


I think rgw is expecting these keyids to look like 
"ffa60094-f88b-41a4-b63f-c07a017ad2b5", so it doesn't url-encode them 
when sending the request to barbican. In this case, the keyid is itself 
a url, so rgw is sending a request to 
"https://barbican.service.xxx.ch/v1/secrets/https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5;. 
It's hard to tell without logs from barbican, but I suspect that it's 
trying to interpret the slashes as part of the request path, rather than 
part of the keyid.


So I would recommend using keyids of the form 
"ffa60094-f88b-41a4-b63f-c07a017ad2b5", but would also consider the lack 
of url-encoding to be a bug. I opened a ticket for this at 
http://tracker.ceph.com/issues/39488 - feel free to add more information 
there. Barbican log output showing the request/response would be helpful!
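
As a sketch, the same upload with a bare keyid, reusing the options already shown above:

s3cmd --host s3.xxx.ch --host-bucket '%(bucket)s.s3.xxx.ch' --signature-v2 \
  --server-side-encryption \
  --server-side-encryption-kms-id ffa60094-f88b-41a4-b63f-c07a017ad2b5 \
  put hello.txt s3://test/hello.txt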


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-site replication speed

2019-04-16 Thread Casey Bodley

Hi Brian,

On 4/16/19 1:57 AM, Brian Topping wrote:
On Apr 15, 2019, at 5:18 PM, Brian Topping wrote:


If I am correct, how do I trigger the full sync?


Apologies for the noise on this thread. I came to discover the 
`radosgw-admin [meta]data sync init` command. That’s gotten me with 
something that looked like this for several hours:



[root@master ~]# radosgw-admin  sync status
          realm 54bb8477-f221-429a-bbf0-76678c767b5f (example)
      zonegroup 8e33f5e9-02c8-4ab8-a0ab-c6a37c2bcf07 (us)
           zone b6e32bc8-f07e-4971-b825-299b5181a5f0 (secondary)
  metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 64 shards
                behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]

      data sync source: 35835cb0-4639-43f4-81fd-624d40c7dd6f (master)
                        preparing for full sync
                        full sync: 1/128 shards
                        full sync: 0 buckets to sync
                        incremental sync: 127/128 shards
                        data is behind on 1 shards
                        behind shards: [0]


I also had the data sync showing a list of “behind shards”, but both 
of them sat in “preparing for full sync” for several hours, so I tried 
`radosgw-admin [meta]data sync run`. My sense is that was a bad idea, 
but neither of the commands seem to be documented and the thread I 
found them on indicated they wouldn’t damage the source data.


QUESTIONS at this point:

1) What is the best sequence of commands to properly start the sync? 
Does init just set things up and do nothing until a run is started?
The sync is always running. Each shard starts with full sync (where it 
lists everything on the remote, and replicates each), then switches to 
incremental sync (where it polls the replication logs for changes). The 
'metadata sync init' command clears the sync status, but this isn't 
synchronized with the metadata sync process running in radosgw(s) - so 
the gateways need to restart before they'll see the new status and 
restart the full sync. The same goes for 'data sync init'.
2) Are there commands I should run before that to clear out any 
previous bad runs?

Just restart gateways, and you should see progress via 'sync status'.
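
A rough outline of that sequence, assuming systemd-managed gateways (the unit name varies by deployment):

# on the zone whose metadata should be re-synced from the master
radosgw-admin metadata sync init
systemctl restart ceph-radosgw.target
# then watch the shards move from full sync to incremental sync
radosgw-admin sync status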


*Thanks very kindly for any assistance. *As I didn’t really see any 
documentation outside of setting up the realms/zones/groups, it seems 
like this would be useful information for others that follow.


best, Brian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site

2019-04-05 Thread Casey Bodley
Hi Iain,

Resharding is not supported in multisite. The issue is that the master zone
needs to be authoritative for all metadata. If bucket reshard commands run
on the secondary zone, they create new bucket instance metadata that the
master zone never sees, so replication can't reconcile those changes.

The 'stale-instances rm' command is not safe to run in multisite because it
can misidentify as 'stale' some bucket instances that were deleted on the
master zone, where data sync on the secondary zone hasn't yet finished
deleting all of the objects it contained. Deleting these bucket instances
and their associated bucket index objects would leave any remaining objects
behind as orphans and leak storage capacity.

On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw  wrote:

> On Wed, 3 Apr 2019 at 09:41, Iain Buclaw  wrote:
> >
> > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw  wrote:
> > >
> > >
> > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > 7511
> > >
> > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > 3509
> > >
> > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > 3801
> > >
> >
> > Documentation is a horrid mess around the subject on multi-site
> resharding
> >
> >
> http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> >
> >
> https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> > (Manual Resharding)
> >
> >
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> >
> > All disagree with each other over the correct process to reshard
> > indexes in multi-site.  Worse, none of them seem to work correctly
> > anyway.
> >
> > Changelog of 13.2.5 looked promising up until the sentence: "These
> > commands should not be used on a multisite setup as the stale
> > instances may be unlikely to be from a reshard and can have
> > consequences".
> >
> > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> >
>
> The stale-instances feature only correctly identifies one stale shard.
>
> # radosgw-admin reshard stale-instances list
> [
> "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> ]
>
> I can confirm this is one of the orphaned index objects.
>
> # rados -p .rgw.buckets.index ls | grep
> 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
>
> I would assume then that unlike what documentation says, it's safe to
> run 'reshard stale-instances rm' on a multi-site setup.
>
> However it is quite telling if the author of this feature doesn't
> trust what they have written to work correctly.
>
> There are still thousands of stale index objects that 'stale-instances
> list' didn't pick up though.  But it appears that radosgw-admin only
> looks at 'metadata list bucket' data, and not what is physically
> inside the pool.
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Looking up buckets in multi-site radosgw configuration

2019-03-19 Thread Casey Bodley



On 3/19/19 12:05 AM, David Coles wrote:

I'm looking at setting up a multi-site radosgw configuration where
data is sharded over multiple clusters in a single physical location;
and would like to understand how Ceph handles requests in this
configuration.

Looking through the radosgw source[1] it looks like radowgw will
return 301 redirect if I request a bucket that is not in the current
zonegroup. This redirect appears to be to the endpoint for the
zonegroup (I assume as configured by `radosgw-admin zonegroup create
--endpoints`). This seems like it would work well for multiple
geographic regions (e.g. us-east and us-west) for ensuring that a
request is redirected to the region (zonegroup) that hosts the bucket.
We could possibly improve this by virtual hosted buckets and having
DNS point to the correct region for that bucket.

I notice that it's also possible to configure zones in a zonegroup
that don't peform replication[2] (e.g. us-east-1 and us-east-2). In
this case I assume that if I direct a request to the wrong zone, then
Ceph will just report that the object as not-found because, despite
the bucket metadata being replicated from the zonegroup master, the
objects will never be replicated from one zone to the other. Another
layer (like a consistent hash across the bucket name or database)
would be required for routing to the correct zone.

Is this mostly correct? Are there other ways of controlling which
cluster data is placed (i.e. placement groups)?


Yeah, correct on both points. The zonegroup redirects would be the only 
way to guide clients between clusters.
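
For reference, a hedged sketch of how the redirect targets are configured (the zonegroup name and endpoint URL here are hypothetical):

radosgw-admin zonegroup modify --rgw-zonegroup=us-east \
  --endpoints=http://rgw.us-east.example.com:8080
radosgw-admin period update --commit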




Thanks!

1. 
https://github.com/ceph/ceph/blob/affb7d396f76273e885cfdbcd363c1882496726c/src/rgw/rgw_op.cc#L653-L669
2. 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_red_hat_enterprise_linux/multi_site#configuring_multiple_zones_without_replication

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway using S3 Api does not store file correctly

2019-03-18 Thread Casey Bodley

Hi Dan,

We just got a similar report about SSE-C in 
http://tracker.ceph.com/issues/38700 that seems to be related to 
multipart uploads. Could you please add some details there about your s3 
client, its multipart chunk size, and your ceph version?
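
For anyone who wants to reproduce this, a rough sketch with the aws cli (the endpoint, bucket and key file are placeholders, and this is not necessarily the client Dan used):

openssl rand -out sse-c.key 32                      # 256-bit customer key
aws configure set default.s3.multipart_threshold 16MB
aws configure set default.s3.multipart_chunksize 16MB

aws --endpoint-url http://rgw.example.com:8080 s3 cp ./delete-me s3://testbucket/delete-me \
    --sse-c AES256 --sse-c-key fileb://sse-c.key
aws --endpoint-url http://rgw.example.com:8080 s3 cp s3://testbucket/delete-me ./delete-me.out \
    --sse-c AES256 --sse-c-key fileb://sse-c.key

sha256sum ./delete-me ./delete-me.out               # the hashes should match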


On 3/18/19 2:38 PM, Dan Smith wrote:

Hello,

I have stored more than 167 million files in ceph using the S3 api. 
Out of those 167 million+ files, one file is not storing correctly.


The file is 92MB in size. I have stored files much larger and much 
smaller. If I store the file WITHOUT using the Customer Provided 
256-bit AES key using Server Side encryption, the file stores and 
retrieves just fine (SHA256 hashes match).


If I store the file USING the 256-bit AES key using Server Side 
encryption, the file stores without error, however, when I retrieve 
the file and compare the hash of the file I retrieve from ceph against 
the hash of the original file, the hashes differ.


If I store the file using Amazon S3, using the same AES key and their 
server-side encryption, the file stores and retrieves without issue 
(hashes match).


I can reproduce this issue in two different ceph environments. 
Thankfully, the file I am storing is not confidential, so I can share 
it out to anyone interested in this 
issue. (https://s3.amazonaws.com/aws-website-afewgoodmenrankedfantasyfootballcom-j5gvt/delete-me)


I have opened a ticket with our vendor for support, but I am hoping 
someone might be able to give me some ideas on what might be going on 
as well.


Cheers,
Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Need clarification about RGW S3 Bucket Tagging

2019-03-14 Thread Casey Bodley
The bucket policy documentation just lists which actions the policy 
engine understands. Bucket tagging isn't supported, so those requests 
were misinterpreted as normal PUT requests to create a bucket. I opened 
https://github.com/ceph/ceph/pull/26952 to return 405 Method Not Allowed 
there instead and update the doc to clarify that it's not supported.


If anyone's interested in working on this feature, the rgw team would 
happy to assist!


Thanks,
Casey

On 3/14/19 4:05 AM, Konstantin Shalygin wrote:


Hi.

I CC'ed Casey Bodley as new RGW tech lead.

Luminous doc [1] tells that s3:GetBucketTagging & s3:PutBucketTagging 
methods is supported.But actually PutBucketTagging fails on Luminous 
12.2.11 RGW with "provided input did not specify location constraint 
correctly", I think is issue [2], but why issue type for this ticket 
is changed to feature? Is that this mode is unsupported in Luminous 
and this is a doc bug or this really bug and should be fixed?



Thanks,

k

[1] http://docs.ceph.com/docs/luminous/radosgw/bucketpolicy/

[2] https://tracker.ceph.com/issues/24443



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-08 Thread Casey Bodley

(cc ceph-users)

Can you tell whether these sync errors are coming from metadata sync or 
data sync? Are they blocking sync from making progress according to your 
'sync status'?
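
A quick way to check both, as a sketch:

radosgw-admin sync status        # metadata sync and per-source data sync state
radosgw-admin sync error list    # recent sync failures and the buckets/objects involved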


On 3/8/19 10:23 AM, Trey Palmer wrote:

Casey,

Having done the 'reshard stale-instances delete' earlier on the advice 
of another list member, we have tons of sync errors on deleted 
buckets, as you mention.


After 'data sync init' we're still seeing all of these errors on 
deleted buckets.


Since buckets are metadata, it occurred to me this morning that a 
'sync init' wouldn't refresh that info. But a 'metadata sync init' 
might get rid of the stale bucket sync info and stop the sync errors. 
Would that be the way to go?


Thanks,

Trey



On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley wrote:


Hi Trey,

I think it's more likely that these stale metadata entries are from
deleted buckets, rather than accidental bucket reshards. When a
bucket
is deleted in a multisite configuration, we don't delete its bucket
instance because other zones may still need to sync the object
deletes -
and they can't make progress on sync if the bucket metadata
disappears.
These leftover bucket instances look the same to the 'reshard
stale-instances' commands, but I'd be cautious about using that to
remove them in multisite, as it may cause more sync errors and
potentially leak storage if they still contain objects.

Regarding 'datalog trim', that alone isn't safe because it could trim
entries that hadn't been applied on other zones yet, causing them to
miss some updates. What you can do is run 'data sync init' on each
zone,
and restart gateways. This will restart with a data full sync (which
will scan all buckets for changes), and skip past any datalog entries
from before the full sync. I was concerned that the bug in error
handling (ie "ERROR: init sync on...") would also affect full
sync, but
that doesn't appear to be the case - so I do think that's worth
trying.

On 3/5/19 6:24 PM, Trey Palmer wrote:
> Casey,
>
> Thanks very much for the reply!
>
> We definitely have lots of errors on sync-disabled buckets and the
> workaround for that is obvious (most of them are empty anyway).
>
> Our second form of error is stale buckets.  We had dynamic
resharding
> enabled but have now disabled it (having discovered it was on by
> default, and not supported in multisite).
>
> We removed several hundred stale buckets via 'radosgw-admin
sharding
> stale-instances rm', but they are still giving us sync errors.
>
> I have found that these buckets do have entries in 'radosgw-admin
> datalog list', and my guess is this could be fixed by doing a
> 'radosgw-admin datalog trim' for each entry on the master zone.
>
> Does that sound right?  :-)
>
> Thanks again for the detailed explanation,
>
> Trey Palmer
>
> On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:
>
>     Hi Christian,
>
>     I think you've correctly intuited that the issues are related to
>     the use
>     of 'bucket sync disable'. There was a bug fix for that
feature in
> http://tracker.ceph.com/issues/26895, and I recently found that a
>     block
>     of code was missing from its luminous backport. That missing
code is
>     what handled those "ERROR: init sync on 
failed,
>     retcode=-2" errors.
>
>     I included a fix for that in a later backport
>     (https://github.com/ceph/ceph/pull/26549), which I'm still
working to
>     get through qa. I'm afraid I can't really recommend a workaround
>     for the
>     issue in the meantime.
>
>     Looking forward though, we do plan to support something like
s3's
>     cross
>     region replication so you can enable replication on a
specific bucket
>     without having to enable it globally.
>
>     Casey
>
>
>     On 3/5/19 2:32 PM, Christian Rice wrote:
>     >
>     > Much appreciated.  We’ll continue to poke around and
certainly will
>     > disable the dynamic resharding.
>     >
>     > We started with 12.2.8 in production.  We definitely did not
>     have it
>     > enabled in ceph.conf
>     >
> *From: *Matthew H

Re: [ceph-users] Radosgw object size limit?

2019-03-07 Thread Casey Bodley
There is a rgw_max_put_size which defaults to 5G, which limits the size 
of a single PUT request. But in that case, the http response would be 
400 EntityTooLarge. For multipart uploads, there's also a 
rgw_multipart_part_upload_limit that defaults to 10000 parts, which 
would cause a 416 InvalidRange error. By default though, s3cmd does 
multipart uploads with 15MB parts, so your 11G object should only have 
~750 parts.
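
To confirm what a running gateway is using, and to keep the part count down for very large uploads, something along these lines should work (the admin socket name depends on how the gateway was started):

ceph daemon client.rgw.$(hostname -s) config get rgw_max_put_size
ceph daemon client.rgw.$(hostname -s) config get rgw_multipart_part_upload_limit

# larger parts mean fewer parts for an 11G upload
s3cmd put --multipart-chunk-size-mb=100 CentOS-7-x86_64-Everything-1810.iso s3://mybucket/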


Are you able to upload smaller objects successfully? These InvalidRange 
errors can also result from failures to create any rados pools that 
didn't exist already. If that's what you're hitting, you'd get the same 
InvalidRange errors for smaller object uploads, and you'd also see 
messages like this in your radosgw log:


> rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) 
Numerical result out of range (this can be due to a pool or placement 
group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd 
exceeded)


On 3/7/19 12:21 PM, Jan Kasprzak wrote:

Hello, Ceph users,

does radosgw have an upper limit of object size? I tried to upload
a 11GB file using s3cmd, but it failed with InvalidRange error:

$ s3cmd put --verbose centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso 
s3://mybucket/
INFO: No cache file found, creating it.
INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
INFO: Summary: 1 local files to upload
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. Storing 
UID=108 instead.
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
ERROR: S3 error: 416 (InvalidRange)

$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso

Thanks for any hint how to increase the limit.

-Yenya


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Casey Bodley

Hi Trey,

I think it's more likely that these stale metadata entries are from 
deleted buckets, rather than accidental bucket reshards. When a bucket 
is deleted in a multisite configuration, we don't delete its bucket 
instance because other zones may still need to sync the object deletes - 
and they can't make progress on sync if the bucket metadata disappears. 
These leftover bucket instances look the same to the 'reshard 
stale-instances' commands, but I'd be cautious about using that to 
remove them in multisite, as it may cause more sync errors and 
potentially leak storage if they still contain objects.


Regarding 'datalog trim', that alone isn't safe because it could trim 
entries that hadn't been applied on other zones yet, causing them to 
miss some updates. What you can do is run 'data sync init' on each zone, 
and restart gateways. This will restart with a data full sync (which 
will scan all buckets for changes), and skip past any datalog entries 
from before the full sync. I was concerned that the bug in error 
handling (ie "ERROR: init sync on...") would also affect full sync, but 
that doesn't appear to be the case - so I do think that's worth trying.
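
Roughly, on each zone that should restart data sync (the peer zone name and the systemd unit are placeholders):

radosgw-admin data sync init --source-zone=<peer-zone>
systemctl restart ceph-radosgw.target
radosgw-admin sync status        # shards should report full sync, then progress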


On 3/5/19 6:24 PM, Trey Palmer wrote:

Casey,

Thanks very much for the reply!

We definitely have lots of errors on sync-disabled buckets and the 
workaround for that is obvious (most of them are empty anyway).


Our second form of error is stale buckets.  We had dynamic resharding 
enabled but have now disabled it (having discovered it was on by 
default, and not supported in multisite).


We removed several hundred stale buckets via 'radosgw-admin sharding 
stale-instances rm', but they are still giving us sync errors.


I have found that these buckets do have entries in 'radosgw-admin 
datalog list', and my guess is this could be fixed by doing a 
'radosgw-admin datalog trim' for each entry on the master zone.


Does that sound right?  :-)

Thanks again for the detailed explanation,

Trey Palmer

On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:


Hi Christian,

I think you've correctly intuited that the issues are related to
the use
of 'bucket sync disable'. There was a bug fix for that feature in
http://tracker.ceph.com/issues/26895, and I recently found that a
block
of code was missing from its luminous backport. That missing code is
what handled those "ERROR: init sync on  failed,
retcode=-2" errors.

I included a fix for that in a later backport
(https://github.com/ceph/ceph/pull/26549), which I'm still working to
get through qa. I'm afraid I can't really recommend a workaround
for the
issue in the meantime.

Looking forward though, we do plan to support something like s3's
cross
region replication so you can enable replication on a specific bucket
without having to enable it globally.

Casey


On 3/5/19 2:32 PM, Christian Rice wrote:
>
> Much appreciated.  We’ll continue to poke around and certainly will
> disable the dynamic resharding.
>
> We started with 12.2.8 in production.  We definitely did not
have it
> enabled in ceph.conf
>
> *From: *Matthew H
> *Date: *Tuesday, March 5, 2019 at 11:22 AM
> *To: *Christian Rice, ceph-users
> *Cc: *Trey Palmer
> *Subject: *Re: radosgw sync falling behind regularly
>
> Hi Christian,
>
> To be on the safe side and future proof yourself will want to go
ahead
> and set the following in your ceph.conf file, and then issue a
restart
> to your RGW instances.
>
> rgw_dynamic_resharding = false
>
> There are a number of issues with dynamic resharding, multisite rgw
> problems being just one of them. However I thought it was disabled
> automatically when multisite rgw is used (but I will have to double
> check the code on that). What version of Ceph did you initially
> install the cluster with? Prior to v12.2.2 this feature was
enabled by
> default for all rgw use cases.
>
> Thanks,
>
>

>
> *From: *Christian Rice
> *Sent:* Tuesday, March 5, 2019 2:07 PM
> *To:* Matthew H; ceph-users
> *Subject:* Re: radosgw sync falling behind regularly
>
> Matthew, first of all, let me say we very much appreciate your help!
>
> So I don’t think we turned dynamic resharding on, nor did we
manually
> reshard buckets. Seems like it defaults to on for luminous but the
> mimic docs say it’s not suppo

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Casey Bodley

Hi Christian,

I think you've correctly intuited that the issues are related to the use 
of 'bucket sync disable'. There was a bug fix for that feature in 
http://tracker.ceph.com/issues/26895, and I recently found that a block 
of code was missing from its luminous backport. That missing code is 
what handled those "ERROR: init sync on  failed, 
retcode=-2" errors.


I included a fix for that in a later backport 
(https://github.com/ceph/ceph/pull/26549), which I'm still working to 
get through qa. I'm afraid I can't really recommend a workaround for the 
issue in the meantime.


Looking forward though, we do plan to support something like s3's cross 
region replication so you can enable replication on a specific bucket 
without having to enable it globally.


Casey


On 3/5/19 2:32 PM, Christian Rice wrote:


Much appreciated.  We’ll continue to poke around and certainly will 
disable the dynamic resharding.


We started with 12.2.8 in production.  We definitely did not have it 
enabled in ceph.conf


*From: *Matthew H 
*Date: *Tuesday, March 5, 2019 at 11:22 AM
*To: *Christian Rice , ceph-users 


*Cc: *Trey Palmer 
*Subject: *Re: radosgw sync falling behind regularly

Hi Christian,

To be on the safe side and future proof yourself will want to go ahead 
and set the following in your ceph.conf file, and then issue a restart 
to your RGW instances.


rgw_dynamic_resharding = false

There are a number of issues with dynamic resharding, multisite rgw 
problems being just one of them. However I thought it was disabled 
automatically when multisite rgw is used (but I will have to double 
check the code on that). What version of Ceph did you initially 
install the cluster with? Prior to v12.2.2 this feature was enabled by 
default for all rgw use cases.


Thanks,



*From:*Christian Rice 
*Sent:* Tuesday, March 5, 2019 2:07 PM
*To:* Matthew H; ceph-users
*Subject:* Re: radosgw sync falling behind regularly

Matthew, first of all, let me say we very much appreciate your help!

So I don’t think we turned dynamic resharding on, nor did we manually 
reshard buckets. Seems like it defaults to on for luminous but the 
mimic docs say it’s not supported in multisite.  So do we need to 
disable it manually via tell and ceph.conf?


Also, after running the command you suggested, all the stale instances 
are gone…these from my examples were in output:


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",


Though we still get lots of log messages like so in rgw:

2019-03-05 11:01:09.526120 7f64120ae700  0 ERROR: failed to get bucket 
instance info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.528664 7f63e5016700  1 civetweb: 0x55976f1c2000: 
172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e 
HTTP/1.1" 404 0 - -


2019-03-05 11:01:09.529648 7f64130b0700  0 meta sync: ERROR: can't 
remove key: 
bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299 
ret=-2


2019-03-05 11:01:09.530324 7f64138b1700  0 ERROR: failed to get bucket 
instance info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.530345 7f6405094700  0 data sync: ERROR: failed to 
retrieve bucket info for 
bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.531774 7f6405094700  0 data sync: WARNING: 
skipping data log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.571680 7f6405094700  0 data sync: ERROR: init sync 
on 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 
failed, retcode=-2


2019-03-05 11:01:09.573179 7f6405094700  0 data sync: WARNING: 
skipping data log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302


2019-03-05 11:01:13.504308 7f63f903e700  1 civetweb: 0x55976f0f2000: 
10.105.18.20 - - [05/Mar/2019:11:00:57 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e 
HTTP/1.1" 404 0 - -


*From: *Matthew H 
*Date: *Tuesday, March 5, 2019 at 10:03 AM
*To: *Christian Rice , ceph-users 


*Subject: *Re: radosgw sync falling behind regularly

Hi Christian,

You have stale bucket instances that need to be clean up, which 

Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-29 Thread Casey Bodley
On Tue, Jan 29, 2019 at 12:24 PM Krishna Verma  wrote:
>
> Hi Ceph Users,
>
>
>
> I need your help to fix a sync issue in a multisite setup.
>
>
>
> I have 2 clusters in different datacenters that we want to use for 
> bidirectional data replication. Following the documentation at 
> http://docs.ceph.com/docs/master/radosgw/multisite/ I have set up the gateway 
> on each site, but when I check the sync status it fails as 
> below:
>
>
>
> Admin node at master :
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin data sync status
>
> ERROR: source zone not specified
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin realm list
>
> {
>
> "default_info": "1102c891-d81c-480e-9487-c9f874287d13",
>
> "realms": [
>
> "georep",
>
> "geodata"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin zonegroup list
>
> read_default_id : 0
>
> {
>
> "default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
>
> "zonegroups": [
>
> "noida"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin zone list
>
> {
>
> "default_info": "71931e0e-1be6-449f-af34-edb4166c4e4a",
>
> "zones": [
>
> "noida1"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$
>
>
>
> [cephuser@vlno-ceph01 cluster]$ cat ceph.conf
>
> [global]
>
> fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
>
> mon_initial_members = vlno-ceph01
>
> mon_host = 172.23.16.67
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.23.16.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [cephuser@vlno-ceph01 cluster]$
>
>
>
> On Master Gateway :
>
>
>
> [cephuser@zabbix-server ~]$ cat /etc/ceph/ceph.conf
>
> [global]
>
> fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
>
> mon_initial_members = vlno-ceph01
>
> mon_host = 172.23.16.67
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.23.16.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [client.rgw.zabbix-server]
>
> host = zabbix-server
>
> rgw frontends = "civetweb port=7480"
>
> rgw_zone=noida1
>
> [cephuser@zabbix-server ~]$
>
>
>
>
>
> On Secondary site admin node.
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin realm list
>
> {
>
> "default_info": "1102c891-d81c-480e-9487-c9f874287d13",
>
> "realms": [
>
> "georep"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin zonegroup list
>
> read_default_id : 0
>
> {
>
> "default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
>
> "zonegroups": [
>
> "noida",
>
> "default"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin zone list
>
> {
>
> "default_info": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
>
> "zones": [
>
> "san-jose"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ cat ceph.conf
>
> [global]
>
> fsid = c626be3a-4536-48b9-8db8-470437052313
>
> mon_initial_members = vlsj-kverma1
>
> mon_host = 172.18.84.131
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.18.84.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
>
>
>
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin data sync status
>
> 2019-01-28 10:33:12.163298 7f11c24c79c0  1 Cannot find zone 
> id=45c690a8-f39c-4b1d-9faf-e0e991ceaaac (name=san-jose), switching to local 
> zonegroup configuration
>
> ERROR: source zone not specified
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
> On Secondary site Gateway host:
>
>
>
> [cephuser@zabbix-client ceph]$ cat /etc/ceph/ceph.conf
>
> [global]
>
> fsid = c626be3a-4536-48b9-8db8-470437052313
>
> mon_initial_members = vlsj-kverma1
>
> mon_host = 172.18.84.131
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.18.84.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [client.rgw.zabbix-client]
>
> host = zabbix-client
>
> rgw frontends = "civetweb port=7480"
>
> rgw_zone=san-jose
>
>
>
> [cephuser@zabbix-client ceph]$
>
>
>
>
>
>
>
> Appreciate any help in the setup.
>
>
>
> /Krishna
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

The 'radosgw-admin data sync status' command requires a --source-zone
argument, which is generally the zone name on the opposite cluster.
But you're probably just looking for the 
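A sketch of both, using the zone names from this thread (run on the secondary cluster, whose peer zone is noida1):

radosgw-admin sync status
radosgw-admin data sync status --source-zone=noida1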

Re: [ceph-users] Bucket logging howto

2019-01-28 Thread Casey Bodley
On Sat, Jan 26, 2019 at 6:57 PM Marc Roos  wrote:
>
>
>
>
> From the owner account of the bucket I am trying to enable logging, but
> I don't get how this should work. I see the s3:PutBucketLogging is
> supported, so I guess this should work. How do you enable it? And how do
> you access the log?
>
>
> [@ ~]$ s3cmd -c .s3cfg accesslog s3://archive Access logging for:
> s3://archive/
>Logging Enabled: False
>
> [@ ~]$ s3cmd -c .s3cfg.archive accesslog s3://archive
> --access-logging-target-prefix=s3://archive/xx
> ERROR: S3 error: 405 (MethodNotAllowed)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi Marc,

The s3:PutBucketLogging action is recognized by bucket policy, but the
feature is otherwise not supported.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw/s3: performance of range requests

2019-01-07 Thread Casey Bodley


On 1/7/19 3:15 PM, Giovani Rinaldi wrote:

Hello!

I've been wondering if range requests are more efficient than doing 
"whole" requests for relatively large objects (100MB-1GB).
More precisely, my doubt is regarding the use of OSD/RGW resources; 
that is, is the entire object retrieved from the OSD only to be 
sliced afterwards, or is only the requested portion read/sent from the 
OSD to the RGW?


The reason is that, in my scenario, the entire object may be requested 
to ceph eventually, either via multiple range requests or a single 
request.
But, from my application point of view, it would be more efficient to 
retrieve such object partially as needed, although only if such range 
requests do not end up using more resources than necessary from my 
ceph cluster (such as retrieving the whole object for each range request).


I've searched the online documentation, as well as the mailing list, 
but failed to find any indicative of how range requests are processed 
by ceph.


Thanks in advance.
Giovani.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Giovani,

RGW will only fetch the minimum amount of data from rados needed to 
satisfy the range request.
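
So ranged GETs from clients are a reasonable way to read large objects piecemeal; for example, with the aws cli (the endpoint, bucket and key are placeholders):

aws --endpoint-url http://rgw.example.com:8080 s3api get-object \
    --bucket mybucket --key bigobject --range bytes=0-1048575 part0.bin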


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] any way to see enabled/disabled status of bucket sync?

2019-01-02 Thread Casey Bodley

Hi Christian,

The easiest way to do that is probably the 'radosgw-admin bucket sync 
status' command, which will print "Sync is disabled for bucket ..." if 
disabled. Otherwise, you could use 'radosgw-admin metadata get' to 
inspect that flag in the bucket instance metadata.
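
A short sketch of both approaches (the bucket name is a placeholder):

radosgw-admin bucket sync status --bucket=mybucket

# or inspect the bucket instance metadata directly
radosgw-admin metadata get bucket:mybucket                        # note the bucket_id
radosgw-admin metadata get bucket.instance:mybucket:<bucket_id>   # the sync-disabled flag is stored here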



On 12/31/18 2:20 PM, Christian Rice wrote:


Is there a command that will show me the current status of bucket sync 
(enabled vs disabled)?


Referring to 
https://github.com/ceph/ceph/blob/b5f33ae3722118ec07112a4fe1bb0bdedb803a60/src/rgw/rgw_admin.cc#L1626



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] radosgw-admin unable to store user information

2019-01-02 Thread Casey Bodley


On 12/26/18 4:58 PM, Dilip Renkila wrote:

Hi all,

Some useful information

>> What do the following return?
>> $ radosgw-admin zone get

root@ctrl1:~# radosgw-admin zone get
{
    "id": "8bfdf8a3-c165-44e9-9ed6-deff8a5d852f",
    "name": "default",
    "domain_root": "default.rgw.meta:root",
    "control_pool": "default.rgw.control",
    "gc_pool": "default.rgw.log:gc",
    "lc_pool": "default.rgw.log:lc",
    "log_pool": "default.rgw.log",
    "intent_log_pool": "default.rgw.log:intent",
    "usage_log_pool": "default.rgw.log:usage",
    "reshard_pool": "default.rgw.log:reshard",
    "user_keys_pool": "default.rgw.meta:users.keys",
    "user_email_pool": "default.rgw.meta:users.email",
    "user_swift_pool": "default.rgw.meta:users.swift",
    "user_uid_pool": "default.rgw.meta:users.uid",
    "otp_pool": "default.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": "",
    "realm_id": ""
}

$ radosgw-admin user info --uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f" --debug-ms=1 --debug-rgw=20 --debug-objecter=20 --log-to-stderr

https://etherpad.openstack.org/p/loPctEQWFU

>> $ rados lspools

root@ctrl1:~# rados lspools
cinder-volumes-sas
ephemeral-volumes
.rgw.root
rgw1
defaults.rgw.buckets.data
default.rgw.control
default.rgw.meta
defaults.rgw.buckets.index
default.rgw.log
cinder-volumes-nvme
default.rgw.buckets.index
images
default.rgw.buckets.data
Best Regards / Kind Regards

Dilip Renkila


On Wed, 26 Dec 2018 at 22:29, Dilip Renkila wrote:


Hi all,

I have a ceph radosgw deployment as openstack swift backend with
multitenancy enabled in rgw.

I can create containers and store data through swift api.

I am trying to retrieve user data from radosgw-admin cli tool for
an user. I am able to get only admin user info but no one else.
$  radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
could not fetch user info: no user info saved

$  radosgw-admin user list
[
"0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f",
"32a7cd9b37bb40168200bae69015311a$32a7cd9b37bb40168200bae69015311a",
"2eea218eea984dd68f1378ea21c64b83$2eea218eea984dd68f1378ea21c64b83",
    "admin",
"032f07e376404586b53bb8c3bfd6d1d7$032f07e376404586b53bb8c3bfd6d1d7",
"afcf7fc3fd5844ea920c2028ebfa5832$afcf7fc3fd5844ea920c2028ebfa5832",
"5793054cd0fe4a018e959eb9081442a8$5793054cd0fe4a018e959eb9081442a8",
"d4f6c1bd190d40feb8379625bcf2bc39$d4f6c1bd190d40feb8379625bcf2bc39",
"8f411343b44143d2b116563c177ed93d$8f411343b44143d2b116563c177ed93d",
"0a49f61d66644fb2a10d664d5b79b1af$0a49f61d66644fb2a10d664d5b79b1af",
"a1dd449c9ce64345af2a7fb05c4aa21f$a1dd449c9ce64345af2a7fb05c4aa21f",
"a5442064c50a4b9bbf854d15748f99d4$a5442064c50a4b9bbf854d15748f99d4"
]



The general format of these object names is 'tenant$uid', so you may 
need to specify them separately, i.e. radosgw-admin user info --tenant= --uid=
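
For the first entry in the list above, that would be something like:

radosgw-admin user info \
    --tenant=0611e8fdb62b4b2892b62c7e7bf3767f \
    --uid=0611e8fdb62b4b2892b62c7e7bf3767f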





Debug output
$ radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
--debug_rgw=20 --log-to-stderr
2018-12-26 22:25:10.722 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bfe20 obj=.rgw.root:default.realm
state=0x5571718d9000 s->prefetch_data=0
2018-12-26 22:25:10.722 7fbc24ff9700  2
RGWDataChangesLog::ChangesRenewThread: start
2018-12-26 22:25:10.726 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bf3d0 obj=.rgw.root:converted state=0x5571718d9000
s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bee50 obj=.rgw.root:default.realm
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bef40 obj=.rgw.root:zonegroups_names.default
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
s->obj_tag was set empty
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read r=0 bl.length=46
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zonegroup_info.b7493bbe-a638-4950-a4d5-716919e5d150
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zonegroup_info.23e74943-f594-44cb-a3bb-3a2150804dd3
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zone_info.9be46480-91cb-437b-87e1-eb6eff862767
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zone_info.8bfdf8a3-c165-44e9-9ed6-deff8a5d852f
2018-12-26 22:25:10.742 

Re: [ceph-users] civitweb segfaults

2018-12-11 Thread Casey Bodley

Hi Leon,

Are you running with a non-default value of rgw_gc_max_objs? I was able 
to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I 
can't think of any other way to get a 'Floating point exception' here.
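
The default is 32; to check what a running gateway actually has (the admin socket name varies by deployment), something like:

ceph daemon client.rgw.$(hostname -s) config get rgw_gc_max_objs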


On 12/11/18 10:31 AM, Leon Robinson wrote:

Hello, I have found a surefire way to bring down our swift gateways.

First, upload a bunch of large files and split it in to segments, e.g.

for i in {1..100}; do swift upload test_container -S 10485760 
CentOS-7-x86_64-GenericCloud.qcow2 --object-name 
CentOS-7-x86_64-GenericCloud.qcow2-$i; done


This creates 100 objects in test_container and 1000 or so objects in 
test_container_segments


Then, Delete them. Preferably in a ludicrous manner.

for i in $(swift list test_container); do swift delete test_container 
$i; done


What results is:

 -13> 2018-12-11 15:17:57.627655 7fc128b49700  1 -- 
172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 1 
 osd_op_reply(11 .dir.default.1083413551.2.7 [call,call] 
v1423252'7548804 uv7548804 ondisk = 0) v8  213+0+0 (3895049453 0 
0) 0x55c98f45e9c0 con 0x55c98f4d7800
   -12> 2018-12-11 15:17:57.627827 7fc0e3ffe700  1 -- 
172.28.196.121:0/464072497 --> 172.26.221.7:6816/2366816 -- 
osd_op(unknown.0.0:12 14.110b 
14:d08c26b8:::default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10%2f1532606905.440697%2f938016768%2f10485760%2f0037:head 
[cmpxattr user.rgw.idtag (25) op 1 mode 1,call rgw.obj_remove] snapc 
0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f4603c0 con 0
   -11> 2018-12-11 15:17:57.628582 7fc128348700  5 -- 
172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 
conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=540 cs=1 l=1). rx osd.87 seq 2 0x55c98f4603c0 osd_op_reply(340 
obj_delete_at_hint.55 [call] v1423252'9217746 uv9217746 ondisk 
= 0) v8
   -10> 2018-12-11 15:17:57.628604 7fc128348700  1 -- 
172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 2  
osd_op_reply(340 obj_delete_at_hint.55 [call] v1423252'9217746 
uv9217746 ondisk = 0) v8  173+0+0 (3971813511 0 0) 0x55c98f4603c0 
con 0x55c98f0eb000
-9> 2018-12-11 15:17:57.628760 7fc1017f9700  1 -- 
172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- 
osd_op(unknown.0.0:341 13.4f 
13:f3db1134:::obj_delete_at_hint.55:head [call timeindex.list] 
snapc 0=[] ondisk+read+known_if_redirected e1423252) v8 -- 
0x55c98f45fa00 con 0
-8> 2018-12-11 15:17:57.629306 7fc128348700  5 -- 
172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 
conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=540 cs=1 l=1). rx osd.87 seq 3 0x55c98f45fa00 osd_op_reply(341 
obj_delete_at_hint.55 [call] v0'0 uv9217746 ondisk = 0) v8
-7> 2018-12-11 15:17:57.629326 7fc128348700  1 -- 
172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 3  
osd_op_reply(341 obj_delete_at_hint.55 [call] v0'0 uv9217746 
ondisk = 0) v8  173+0+15 (3272189389 0 2149983739) 0x55c98f45fa00 
con 0x55c98f0eb000
-6> 2018-12-11 15:17:57.629398 7fc128348700  5 -- 
172.28.196.121:0/464072497 >> 172.26.221.7:6816/2366816 
conn(0x55c98f4d6000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=181 cs=1 l=1). rx osd.58 seq 2 0x55c98f45fa00 osd_op_reply(12 
default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/0037 
[cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) v8
-5> 2018-12-11 15:17:57.629418 7fc128348700  1 -- 
172.28.196.121:0/464072497 <== osd.58 172.26.221.7:6816/2366816 2  
osd_op_reply(12 
default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/0037 
[cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) 
v8  290+0+0 (3763879162 0 0) 0x55c98f45fa00 con 0x55c98f4d6000
-4> 2018-12-11 15:17:57.629458 7fc1017f9700  1 -- 
172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- 
osd_op(unknown.0.0:342 13.4f 
13:f3db1134:::obj_delete_at_hint.55:head [call lock.unlock] 
snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 
0x55c98f45fd40 con 0
-3> 2018-12-11 15:17:57.629603 7fc0e3ffe700  1 -- 
172.28.196.121:0/464072497 --> 172.26.212.6:6802/2058882 -- 
osd_op(unknown.0.0:13 15.1e0 
15:079bdcbb:::.dir.default.1083413551.2.7:head [call 
rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] 
ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f460700 con 0
-2> 2018-12-11 15:17:57.631312 7fc128b49700  5 -- 
172.28.196.121:0/464072497 >> 172.26.212.6:6802/2058882 
conn(0x55c98f4d7800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH 
pgs=202 cs=1 l=1). rx osd.480 seq 2 0x55c98f460700 osd_op_reply(13 
.dir.default.1083413551.2.7 [call,call] v1423252'7548805 uv7548805 
ondisk = 0) v8
-1> 2018-12-11 15:17:57.631329 7fc128b49700  1 -- 
172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 2 
 osd_op_reply(13 

Re: [ceph-users] rwg/civetweb log verbosity level

2018-11-28 Thread Casey Bodley
This stuff is logged under the 'civetweb' subsystem, so it can be turned 
off with 'debug_civetweb = 0'. You can configure 'debug_rgw' separately.
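For example, in the gateway's section of ceph.conf (the section name depends on how your radosgw instances are named):

[client.rgw.node-1]
    debug civetweb = 0
    debug rgw = 5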


On 11/28/18 1:03 AM, zyn赵亚楠 wrote:


Hi there,

I have a question about rgw/civetweb log settings.

Currently, rgw/civetweb prints 3 lines of logs with loglevel 1 (high 
priority) for each HTTP request, like following:


$ tail /var/log/ceph/ceph-client.rgw.node-1.log

2018-11-28 11:52:45.339229 7fbf2d693700  1 == starting new request 
req=0x7fbf2d68d190 =


2018-11-28 11:52:45.341961 7fbf2d693700  1 == req done 
req=0x7fbf2d68d190 op status=0 http_status=200 ==


2018-11-28 11:52:45.341993 7fbf2d693700  1 civetweb: 0x558f0433: 
127.0.0.1 - - [28/Nov/2018:11:48:10 +0800] "HEAD 
/swift/v1/images.xxx.com/8801234/BFAB307D-F5FE-4BC6-9449-E854944A460F_160_180.jpg 
HTTP/1.1" 1 0 - goswift/1.0


The above 3 lines occupy roughly 0.5KB of space on average, varying a 
little with the lengths of bucket names and object names.


Now the problem is, when requests are intensive, this consumes a huge 
amount of space. For example, 4 million requests (on a single RGW node) 
result in 2GB of logs, which takes only ~6 hours to accumulate on one of 
our cluster nodes during a busy period (a large part may be HEAD requests).


When trouble shooting, I usually need to turn the loglevel to 5, 10 or 
even bigger to check the detailed logs, but most of the log space is 
occupied by the above access logs (level 1), which doesn’t provide 
much information.


My question is: is there a way to configure Ceph to skip those logs? E.g. 
only print logs with verbosity in a specified range (not supported, 
according to my investigation).


Or, are there any suggested ways for turning on more logs for debugging?

Best Regards

Arthur Chiao


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disabling RGW Encryption support in Luminous

2018-10-16 Thread Casey Bodley
That's not currently possible, no. And I don't think it's a good idea to 
add such a feature; if the client requests that something be encrypted, 
the server should either encrypt it or reject the request.


There is a config called rgw_crypt_s3_kms_encryption_keys that we use 
for testing, though, which allows you to specify a mapping of kms keyids 
to actual keys. If your client is using a limited number of kms keyids, 
you can provide keys for them and get limited sse-kms support without 
setting up an actual kms.


For example, this is our test configuration for use with s3tests:

rgw crypt s3 kms encryption keys = 
testkey-1=YmluCmJvb3N0CmJvb3N0LWJ1aWxkCmNlcGguY29uZgo= 
testkey-2=aWIKTWFrZWZpbGUKbWFuCm91dApzcmMKVGVzdGluZwo=


Where s3tests is sending requests with the header 
x-amz-server-side-encryption-aws-kms-key-id: testkey-1 or testkey-2.
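From the client side, a request that exercises one of those test keys would look roughly like this (a boto3 sketch; the endpoint, bucket and credentials are placeholders, not part of the s3tests setup):

import boto3

s3 = boto3.client('s3',
                  endpoint_url='http://rgw.example.com:7480',   # placeholder endpoint
                  aws_access_key_id='ACCESS_KEY',
                  aws_secret_access_key='SECRET_KEY')

# SSE-KMS: radosgw looks 'testkey-1' up in rgw_crypt_s3_kms_encryption_keys
s3.put_object(Bucket='testbucket', Key='encrypted.txt', Body=b'secret data',
              ServerSideEncryption='aws:kms', SSEKMSKeyId='testkey-1')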


I hope that helps!
Casey

On 10/16/18 8:43 AM, Arvydas Opulskis wrote:

Hi,

got no success on IRC, maybe someone will help me here.

After RGW upgrade from Jewel to Luminous, one S3 user started to 
receive errors from his postgre wal-e solution. Error is like this: 
"Server Side Encryption with KMS managed key requires HTTP header 
x-amz-server-side-encryption : aws:kms".
After some reading, seems, like this client is forcing Server side 
encryption (SSE) on RGW and it is not configured. Because user can't 
disable encryption in his solution for now (it will be possible in 
future release), can I somehow disable Encryption support on Luminous 
RGW?


Thank you for your insights.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can I define buckets in a multi-zone config that are exempted from replication?

2018-10-08 Thread Casey Bodley



On 10/08/2018 03:45 PM, Christian Rice wrote:


Just getting started here, but I am setting up a three-zone realm, 
each with a pair of S3 object gateways, Luminous on Debian.  I’m 
wondering if there’s a straightforward way to exempt some buckets from 
replicating to other zones?  The idea being there might be data that 
pertains to a specific zone…perhaps due to licensing or other more 
trivial technical reasons shouldn’t be transported off site.


Documentation at 
http://docs.ceph.com/docs/luminous/radosgw/s3/bucketops/ 
 suggests “A 
bucket can be constrained to a region by providing 
LocationConstraintduring a PUT request.”  Is this applicable to my 
multi-zone realm?


TIA,

Christian



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Christian,

A 'region' in radosgw corresponds to the zonegroup, so 
LocationConstraint isn't quite what you want. You can disable sync on a 
single bucket by running this command on the master zone:


$ radosgw-admin bucket sync disable --bucket=bucketname
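To turn replication back on later, or to see where a given bucket stands, there are matching commands (names as I remember them; check 'radosgw-admin help' on your version):

$ radosgw-admin bucket sync enable --bucket=bucketname
$ radosgw-admin bucket sync status --bucket=bucketname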
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOS async client memory usage explodes when reading several objects in sequence

2018-09-12 Thread Casey Bodley



On 09/12/2018 05:29 AM, Daniel Goldbach wrote:

Hi all,

We're reading from a Ceph Luminous pool using the librados asynchronous 
I/O API. We're seeing some concerning memory usage patterns when we 
read many objects in sequence.


The expected behaviour is that our memory usage stabilises at a small 
amount, since we're just fetching objects and ignoring their data. 
What we instead find is that the memory usage of our program grows 
linearly with the amount of data read for an interval of time, and 
then continues to grow at a much slower but still consistent pace. 
This memory is not freed until program termination. My guess is that 
this is an issue with Ceph's memory allocator.


To demonstrate, we create 20,000 objects of size 10KB, 20,000 of size 
100KB, and 20,000 of size 1MB:


    #include <rados/librados.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main() {
        rados_t cluster;
        rados_create(&cluster, "test");
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        rados_connect(cluster);

        rados_ioctx_t io;
        rados_ioctx_create(cluster, "test", &io);

        /* 1MB buffer of 'a' bytes, reused for every write */
        char data[1000000];
        memset(data, 'a', 1000000);

        char smallobj_name[16], mediumobj_name[16], largeobj_name[16];
        int i;
        for (i = 0; i < 20000; i++) {
            sprintf(smallobj_name, "10kobj_%d", i);
            rados_write(io, smallobj_name, data, 10000, 0);

            sprintf(mediumobj_name, "100kobj_%d", i);
            rados_write(io, mediumobj_name, data, 100000, 0);

            sprintf(largeobj_name, "1mobj_%d", i);
            rados_write(io, largeobj_name, data, 1000000, 0);

            printf("wrote %s of size 10000, %s of size 100000, %s of size 1000000\n",
                   smallobj_name, mediumobj_name, largeobj_name);
        }

        return 0;
    }

    $ gcc create.c -lrados -o create
    $ ./create
    wrote 10kobj_0 of size 10000, 100kobj_0 of size 100000, 1mobj_0 of size 1000000
    wrote 10kobj_1 of size 10000, 100kobj_1 of size 100000, 1mobj_1 of size 1000000

    [...]
    wrote 10kobj_19998 of size 10000, 100kobj_19998 of size 100000, 1mobj_19998 of size 1000000
    wrote 10kobj_19999 of size 10000, 100kobj_19999 of size 100000, 1mobj_19999 of size 1000000


Now we read each of these objects with the async API, into the same 
buffer. First we read just the 10KB objects:


    #include <rados/librados.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>

    void readobj(rados_ioctx_t* io, char objname[]);

    int main() {
        rados_t cluster;
        rados_create(&cluster, "test");
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
        rados_connect(cluster);

        rados_ioctx_t io;
        rados_ioctx_create(cluster, "test", &io);

        char smallobj_name[16];
        int i, total_bytes_read = 0;

        for (i = 0; i < 20000; i++) {
            sprintf(smallobj_name, "10kobj_%d", i);
            readobj(&io, smallobj_name);

            total_bytes_read += 10000;
            printf("Read %s for total %d\n", smallobj_name, total_bytes_read);
        }

        getchar();
        return 0;
    }

    /* Read the first 10000 bytes of objname into a stack buffer and discard them. */
    void readobj(rados_ioctx_t* io, char objname[]) {
        char data[1000000];
        unsigned long bytes_read;
        rados_completion_t completion;
        int retval;

        rados_read_op_t read_op = rados_create_read_op();
        rados_read_op_read(read_op, 0, 10000, data, &bytes_read, &retval);
        retval = rados_aio_create_completion(NULL, NULL, NULL, &completion);

        assert(retval == 0);

        retval = rados_aio_read_op_operate(read_op, *io, completion, objname, 0);

        assert(retval == 0);

        rados_aio_wait_for_complete(completion);
        rados_aio_get_return_value(completion);
    }

    $ gcc read.c -lrados -o read_small -Wall -g && ./read_small
    Read 10kobj_0 for total 10000
    Read 10kobj_1 for total 20000
    [...]
    Read 10kobj_19998 for total 199990000
    Read 10kobj_19999 for total 200000000

We read 200MB. A graph of the resident set size of the program is 
attached as mem-graph-10k.png, with seconds on x axis and KB on the y 
axis. You can see that the memory usage increases throughout, which 
itself is unexpected since that memory should be freed over time and 
we should only hold 10KB of object data in memory at a time. The rate 
of growth decreases and eventually stabilises, and by the end we've 
used 60MB of RAM.


We repeat this experiment for the 100KB and 1MB objects and find that 
after all reads they use 140MB and 500MB of RAM, and memory usage 
presumably would continue to grow if there were more objects. This is 
orders of magnitude more memory than what I would expect these 
programs to use.


  * We do not get this behaviour with the synchronous API, and the
memory usage remains stable at just a few MB.
  * We've found that for some reason, this doesn't happen (or doesn't
happen as severely) if we intersperse large reads with much
smaller reads. In this case, the memory usage seems to stabilise
at a reasonable number.
  * Valgrind only reports a trivial amount of unreachable memory.
  * Memory usage doesn't increase in this manner if we repeatedly read
the same object over and over again. It hovers around 20MB.
  * In other experiments we've done, with different object data and
distributions of 

Re: [ceph-users] data_extra_pool for RGW Luminous still needed?

2018-09-04 Thread Casey Bodley




On 09/03/2018 10:07 PM, Nhat Ngo wrote:


Hi all,


I am new to Ceph and we are setting up a new RadosGW and Ceph storage 
cluster on Luminous. We are using only EC for our `buckets.data` pool 
at the moment.



However, I just read the Red Hat Ceph object Gateway for Production 
article and it mentions an extra  duplicated `buckets.non-ec` pool is 
needed for multi-part uploads because each multi-upload parts must be 
stored without EC. EC will only apply to the whole objects, not 
partial uploads. Is this still hold true for Luminous?



The data layout document on Ceph does not make any mention of non-ec pool:

http://docs.ceph.com/docs/luminous/radosgw/layout/


Thanks,

*Nhat Ngo* | DevOps Engineer

Cloud Research Team, University of Melbourne, 3010, VIC
*Email: *nhat.n...@unimelb.edu.au



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Nhat,

The data extra pool is still necessary for multipart uploads, yes. This 
extra non-ec pool is only used for the 'multipart metadata' object that 
tracks which parts have been written, though - the object data for each 
part is still written to the normal data pool, so it can take advantage 
of erasure coding.
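If you ever need to point that extra pool at a specific (replicated) pool alongside your EC data pool, it's part of the zone's placement target; roughly like this (the pool names are only examples, and the period commit applies only if you run with a realm):

$ radosgw-admin zone placement modify --rgw-zone=default \
      --placement-id=default-placement \
      --data-extra-pool=default.rgw.buckets.non-ec
$ radosgw-admin period update --commit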


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Delay replicate for ceph radosgw multi-site v2

2018-08-28 Thread Casey Bodley



On 08/28/2018 09:24 AM, Jason Dillaman wrote:

On Mon, Aug 27, 2018 at 11:19 PM đức phạm xuân  wrote:

Hello Jason Dillaman,

I'm working with Ceph Object Storage Multi-Site v2, ceph's version is mimic. 
Now I want to delay replicate data from a master site to a slave site. I don't 
know whether dose ceph has support the mechanism?

To be honest, I've never worked with RGW multisite so I am afraid I
can't immediately answer your question. I've CCed the ceph-users list
so that perhaps someone else that is more knowledgeable can answer.


--
Phạm Xuân Đức
Sinh viên Học Viện Kỹ thuật Mật Mã - khóa AT11
Mobile: +84165 417 1434
Skype: pxduc96
Email: ducp...@gmail.com


There isn't really a mechanism for this, no. Could you provide some more 
details about what exactly you're trying to accomplish?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-24 Thread Casey Bodley



On 08/24/2018 06:44 AM, Konstantin Shalygin wrote:


Answer to myself.

radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default 
--placement-id="default-placement"

radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"


Restart rgw instances and now is possible to create indexless buckets:

s3cmd mb s3://blindbucket --region=:indexless-placement


The documentation for the Object Storage Gateway is worse than that for 
rbd or cephfs, and still contains outdated strings (removed a year ago).


http://tracker.ceph.com/issues/18082

http://tracker.ceph.com/issues/24508

http://tracker.ceph.com/issues/8073

Hope this post will help somebody in future.



k



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Thank you very much! If anyone would like to help update these docs, I 
would be happy to help with guidance/review.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW pools don't show up in luminous

2018-08-24 Thread Casey Bodley



On 08/23/2018 01:22 PM, Robert Stanford wrote:


 I installed a new Ceph cluster with Luminous, after a long time 
working with Jewel.  I created my RGW pools the same as always (pool 
create default.rgw.buckets.data etc.), but they don't show up in ceph 
df with Luminous.  Has the command changed?


 Thanks
 R



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Robert,

Do you have a ceph-mgr running? I believe the accounting for 'ceph df' 
is performed by ceph-mgr in Luminous and beyond, rather than ceph-mon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reset Object ACLs in RGW

2018-08-02 Thread Casey Bodley



On 08/02/2018 07:35 AM, Thomas White wrote:

Hi all,

At present I have a cluster with a user on the RGW who has lost access to many 
of his files. The bucket has the correct ACL to be accessed by the account and 
so with their access and secret key many items can be listed, but are unable to 
be downloaded.

Is there a way of using the radosgw-admin tool to reset (or set) ACLs on 
individual files or recursively across bucket objects to restore access for 
them?

Kind Regards,

Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Tom,

I don't think radosgw-admin can do this. But you can create a system 
user (radosgw-admin user create --system ...) which overrides permission 
checks, and use it to issue s3 operations to manipulate the acls.
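Roughly (the uid and display name are placeholders):

$ radosgw-admin user create --uid=acl-fixer --display-name="ACL fixer" --system

Then point any S3 client at that user's access/secret key and re-apply the ACLs, e.g. something like 's3cmd setacl --acl-private --recursive s3://bucketname', or individual PutObjectAcl requests.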


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why LZ4 isn't built with ceph?

2018-07-25 Thread Casey Bodley



On 07/25/2018 08:39 AM, Elias Abacioglu wrote:

Hi

I'm wondering why LZ4 isn't built by default for newer Linux distros 
like Ubuntu Xenial?
I understand that it wasn't built for Trusty because of too old lz4 
libraries. But why isn't built for the newer distros?


Thanks,
Elias


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Elias,

We only turned it on by default once it was available on all target 
platforms, which wasn't the case until the mimic release. This happened 
in https://github.com/ceph/ceph/pull/21332, with some prior discussion 
in https://github.com/ceph/ceph/pull/17038.


I don't know how to add build dependencies that are conditional on 
ubuntu version, but if you're keen to see this in luminous and have some 
debian packaging experience, you can target a PR against the luminous 
branch. I'm happy to help with review.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multisite and link speed

2018-07-18 Thread Casey Bodley
On Tue, Jul 17, 2018 at 10:16 AM, Robert Stanford
 wrote:
>
>  I have ceph clusters in a zone configured as active/passive, or
> primary/backup.  If the network link between the two clusters is slower than
> the speed of data coming in to the active cluster, what will eventually
> happen?  Will data pool on the active cluster until memory runs out?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

The primary zone does not queue up changes in memory to push to other
zones. Instead, the sync process on the backup zone reads updates from
the primary zone. Sync will make as much progress as the link allows,
but if the primary cluster is constantly ingesting data at a higher
rate, the backup cluster will fall behind.
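You can see how far behind the backup zone is by running these against the backup zone (the second command name is as I remember it; check radosgw-admin help on your version):

$ radosgw-admin sync status
$ radosgw-admin data sync status --source-zone=<primary-zone>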
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw non-ec pool and multipart uploads

2018-06-26 Thread Casey Bodley
Not quite. Only 'multipart meta' objects are stored in this non-ec pool 
- these objects just track a list of parts that have been written for a 
given multipart upload. This list is stored in the omap database, which 
isn't supported for ec pools. The actual object data for these parts are 
written to the normal data pool, and aren't moved/copied when the 
multipart upload completes.



On 06/26/2018 10:10 AM, Robert Stanford wrote:


 After I started using multipart uploads to RGW, Ceph automatically 
created a non-ec pool.  It looks like it stores object pieces there 
until all the pieces of a multipart upload arrive, then moves the 
completed piece to the normal rgw data pool.  Is this correct?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw bucket listing (s3 ls s3://$bucketname) slow with ~2 billion objects

2018-05-01 Thread Casey Bodley
The main problem with efficiently listing many-sharded buckets is the 
requirement to provide entries in sorted order. This means that each 
http request has to fetch ~1000 entries from every shard, combine them 
into a sorted order, and throw out the leftovers. The next request to 
continue the listing will advance its position slightly, but still end 
up fetching many of the same entries from each shard. As the number of 
shards increases, the more these shard listings will overlap, and the 
performance falls off.


Eric Ivancich recently added s3 and swift extensions for unordered 
bucket listing in https://github.com/ceph/ceph/pull/21026 (for mimic). 
That allows radosgw to list each shard separately, and avoid the step 
that throws away extra entries. If your application can tolerate 
unsorted listings, that could be a big help without having to resort to 
indexless buckets.
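For reference, on the wire the S3 extension is just an extra query parameter on the bucket listing request (parameter name taken from that PR; double-check the docs for your release):

GET /bucketname?allow-unordered=true HTTP/1.1
Host: rgw.example.com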



On 05/01/2018 11:09 AM, Robert Stanford wrote:


 I second the indexless bucket suggestion.  The downside being that 
you can't use bucket policies like object expiration in that case.


On Tue, May 1, 2018 at 10:02 AM, David Turner > wrote:


Any time using shared storage like S3 or cephfs/nfs/gluster/etc
the absolute rule that I refuse to break is to never rely on a
directory listing to know where objects/files are.  You should be
maintaining a database of some sort or a deterministic naming
scheme. The only time a full listing of a directory should be
required is if you feel like your tooling is orphaning files and
you want to clean them up.  If I had someone with a bucket with 2B
objects, I would force them to use an index-less bucket.

That's me, though.  I'm sure there are ways to manage a bucket in
other ways, but it sounds awful.

On Tue, May 1, 2018 at 10:10 AM Robert Stanford
> wrote:


 Listing will always take forever when using a high shard
number, AFAIK.  That's the tradeoff for sharding.  Are those
2B objects in one bucket? How's your read and write
performance compared to a bucket with a lower number
(thousands) of objects, with that shard number?

On Tue, May 1, 2018 at 7:59 AM, Katie Holly <8ld3j...@meo.ws
> wrote:

One of our radosgw buckets has grown a lot in size, `rgw
bucket stats --bucket $bucketname` reports a total of
2,110,269,538 objects with the bucket index sharded across
32768 shards, listing the root context of the bucket with
`s3 ls s3://$bucketname` takes more than an hour which is
the hard limit to first-byte on our nginx reverse proxy
and the aws-cli times out long before that timeout limit
is hit.

The software we use supports sharding the data across
multiple s3 buckets but before I go ahead and enable this,
has anyone ever had that many objects in a single RGW
bucket and can let me know how you solved the problem of
RGW taking a long time to read the full index?

-- 
Best regards


Katie Holly
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW bucket lifecycle policy vs versioning

2018-04-26 Thread Casey Bodley


On 04/26/2018 07:22 AM, Sean Purdy wrote:

Hi,

Both versioned buckets and lifecycle policies are implemented in ceph, and look 
useful.

But are lifecycle policies implemented for versioned buckets?  i.e. can I set a policy 
that will properly expunge all "deleted" objects after a certain time?  i.e. 
objects where the delete marker is the latest version.  This is available in AWS for 
example.


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


As of luminous, radosgw does support the lifecycle policy rules 
NoncurrentVersionExpiration and ExpiredObjectDeleteMarker that control 
the expiration of versioned objects.
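A minimal lifecycle document using those two rules might look like the following (30 days is just an example; apply it with whatever put-bucket-lifecycle client you normally use):

<LifecycleConfiguration>
  <Rule>
    <ID>expire-old-versions</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
    <Expiration>
      <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
    </Expiration>
  </Rule>
</LifecycleConfiguration>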

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixing bad radosgw index

2018-04-16 Thread Casey Bodley



On 04/14/2018 12:54 PM, Robert Stanford wrote:


 I deleted my default.rgw.buckets.data and default.rgw.buckets.index 
pools in an attempt to clean them out.  I brought this up on the list 
and received replies telling me essentially, "You shouldn't do that." 
There was however no helpful advice on recovering.


 When I run 'radosgw-admin bucket list' I get a list of all my old 
buckets (I thought they'd be cleaned out when I deleted and recreated 
default.rgw.buckets.index, but I was wrong.)  Deleting them with s3cmd 
and radosgw-admin does nothing; they still appear (though s3cmd will 
give a '404' error.)  Running radosgw-admin with 'bucket check' and 
'--fix' does nothing as well.  So, how do I get myself out of this mess.


 On another, semi-related note, I've been deleting (existing) buckets 
and their contents with s3cmd (and --recursive); the space is never 
freed from ceph and the bucket still appears in s3cmd ls.  Looks like 
my radosgw has several issues, maybe all related to deleting and 
recreating the pools.


 Thanks


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


The 'bucket list' command takes a user and prints the list of buckets 
they own - this list is read from the user object itself. You can remove 
these entries with the 'bucket unlink' command.
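i.e. something along the lines of:

$ radosgw-admin bucket unlink --uid=<user> --bucket=<bucketname>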
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW multisite sync issues

2018-04-06 Thread Casey Bodley


On 04/06/2018 10:57 AM, Josef Zelenka wrote:

Hi everyone,

i'm currently setting up RGW multisite(one cluster is jewel(primary), 
the other is luminous - this is only for testing, on prod we will have 
the same version - jewel on both), but i can't get bucket 
synchronization to work. Data gets synchronized fine when i upload it, 
but when i delete it from the primary cluster, it only deletes the 
metadata of the file on the secondary one, the files are still 
there(can see it in rados df - pool states the same). Also, none of 
the older buckets start synchronizing to the secondary cluster. It's 
been quite a headache so far. Anyone who knows what might be wrong? I 
can supply any needed info. THanks


Josef Zelenka

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Your issue may be related to http://tracker.ceph.com/issues/22062 (fixed 
in luminous for 12.2.3)? If not, it's probably something similar. In 
general, I wouldn't recommend mixing releases in a multisite 
configuration, as it's not something we do any testing for.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object lifecycle and indexless buckets

2018-03-20 Thread Casey Bodley



On 03/20/2018 01:33 PM, Robert Stanford wrote:


 Hello,

 Does object expiration work on indexless (blind) buckets?

 Thank you


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


No. Lifecycle processing needs to list the buckets, so objects in 
indexless buckets would not expire.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object Gateway - Server Side Encryption

2018-03-13 Thread Casey Bodley


On 03/10/2018 12:58 AM, Amardeep Singh wrote:

On Saturday 10 March 2018 02:01 AM, Casey Bodley wrote:


On 03/08/2018 07:16 AM, Amardeep Singh wrote:

Hi,

I am trying to configure server side encryption using Key Management 
Service as per documentation 
http://docs.ceph.com/docs/master/radosgw/encryption/


Configured Keystone/Barbican integration and its working, tested 
using curl commands. After I configure RadosGW and use 
boto.s3.connection from python or s3cmd client an error is thrown.

boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Failed to 
retrieve the actual key, kms-keyid: 
616b2ce2-053a-41e3-b51e-0ff53e33cf81</Message><BucketName>newbucket</BucketName><RequestId>tx77750-005aa1274b-ac51-uk-west</RequestId><HostId>ac51-uk-west-uk</HostId></Error>

In server side logs its getting the token and barbican is 
authenticating the request then providing secret url, but unable to 
serve key.

/
22:10:03.940091 7f056f7eb700 15 ceph_armor ret=16
 22:10:03.940111 7f056f7eb700 15 
supplied_md5=eb1a3227cdc3fedbaec2fe38bf6c044a
 22:10:03.940129 7f056f7eb700 20 reading from 
uk-west.rgw.meta:root:.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1
 22:10:03.940138 7f056f7eb700 20 get_system_obj_state: 
rctx=0x7f056f7e39f0 
obj=uk-west.rgw.meta:root:.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
state=0x56540487a5a0 s->prefetch_data=0
 22:10:03.940145 7f056f7eb700 10 cache get: 
name=uk-west.rgw.meta+root+.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
: hit (requested=0x16, cached=0x17)
 22:10:03.940152 7f056f7eb700 20 get_system_obj_state: s->obj_tag 
was set empty
 22:10:03.940155 7f056f7eb700 10 cache get: 
name=uk-west.rgw.meta+root+.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
: hit (requested=0x11, cached=0x17)
 22:10:03.944015 7f056f7eb700 20 bucket quota: max_objects=1638400 
max_size=-1
 22:10:03.944030 7f056f7eb700 20 bucket quota OK: 
stats.num_objects=7 stats.size=50
 22:10:03.944176 7f056f7eb700 20 Getting KMS encryption key for 
key=616b2ce2-053a-41e3-b51e-0ff53e33cf81
 22:10:03.944225 7f056f7eb700 20 Requesting secret from barbican 
url=http://keyserver.rados:5000/v3/auth/tokens
 22:10:03.944281 7f056f7eb700 20 sending request to 
http://keyserver.rados:5000/v3/auth/tokens
* 22:10:04.405974 7f056f7eb700 20 sending request to 
http://keyserver.rados:9311/v1/secrets/616b2ce2-053a-41e3-b51e-0ff53e33cf81*
* 22:10:05.519874 7f056f7eb700 5 Failed to retrieve secret from 
barbican:616b2ce2-053a-41e3-b51e-0ff53e33cf81**

*/


It looks like this request is being rejected by barbican. Do you have 
any logs on the barbican side that might show why?

Only get 2 lines in barbican logs, one shows warning.

22:10:08.255 807 WARNING barbican.api.controllers.secrets 
[req-091413d2--46e2-be5f-a3e68a480ac9 
716dad1b8044459c99fea284dbfc47cc - - default default] Decrypted secret 
616b2ce2-053a-41e3-b51e-0ff53e33cf81 requested using deprecated API call.
22:10:08.261 807 INFO barbican.api.middleware.context 
[req-091413d2--46e2-be5f-a3e68a480ac9 
716dad1b8044459c99fea284dbfc47cc - - default default] Processed 
request: 200 OK - GET 
http://keyserver.rados:9311/v1/secrets/616b2ce2-053a-41e3-b51e-0ff53e33cf81




Okay, so barbican is returning 200 OK but radosgw is still converting 
that to EACCES. I'm guessing that's happening in 
request_key_from_barbican() here: 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_crypt.cc#L779 - is 
it possible the key in barbican is something other than AES256?
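If that turns out to be the cause, recreating the secret as a 256-bit AES key should fix it. With the python-barbicanclient CLI that is roughly the following (flag names from memory, so check 'openstack secret order create --help' before relying on them):

$ openstack secret order create key --name rgw-encryption-key \
      --algorithm aes --bit-length 256 --mode ctr \
      --payload-content-type application/octet-stream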






/*** 22:10:05.519901 7f056f7eb700 5 ERROR: failed to retrieve actual 
key from key_id: 616b2ce2-053a-41e3-b51e-0ff53e33cf81*
 22:10:05.519980 7f056f7eb700 2 req 387:1.581432:s3:PUT 
/encrypted.txt:put_obj:completing
 22:10:05.520187 7f056f7eb700 2 req 387:1.581640:s3:PUT 
/encrypted.txt:put_obj:op status=-13
 22:10:05.520193 7f056f7eb700 2 req 387:1.581645:s3:PUT 
/encrypted.txt:put_obj:http status=403
 22:10:05.520206 7f056f7eb700 1 == req done req=0x7f056f7e5190 
op status=-13 http_status=403 ==

 22:10:05.520225 7f056f7eb700 20 process_request() returned -13
 22:10:05.520280 7f056f7eb700 1 civetweb: 0x5654042a1000: 
192.168.100.200 - - [02/Mar/2018:22:10:03 +0530] "PUT /encrypted.txt 
HTTP/1.1" 1 0 - Boto/2.38.0 Python/2.7.12 Linux/4.12.1-041201-generic

 22:10:06.116527 7f056e7e9700 20 HTTP_ACCEPT=*/*/

The error thrown in from this line 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_crypt.cc#L1063


I am unable to understand why its throwing the error.

In ceph.conf following settings are done.

[global]
rgw barbican url = http://keyserver.rados:9311
rgw keystone barbican user = rgwcrypt
rgw keystone barbican password = rgwpass
rgw keystone barbican project = service
rgw keystone barbican domain = default
rgw keystone url = http://keyserver.rados:5000
rgw keystone api version = 3
rgw crypt require ssl = false

Can someone help in figuring out w

Re: [ceph-users] Object Gateway - Server Side Encryption

2018-03-09 Thread Casey Bodley


On 03/08/2018 07:16 AM, Amardeep Singh wrote:

Hi,

I am trying to configure server side encryption using Key Management 
Service as per documentation 
http://docs.ceph.com/docs/master/radosgw/encryption/


Configured Keystone/Barbican integration and its working, tested using 
curl commands. After I configure RadosGW and use boto.s3.connection 
from python or s3cmd client an error is thrown.

boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Failed to 
retrieve the actual key, kms-keyid: 
616b2ce2-053a-41e3-b51e-0ff53e33cf81</Message><BucketName>newbucket</BucketName><RequestId>tx77750-005aa1274b-ac51-uk-west</RequestId><HostId>ac51-uk-west-uk</HostId></Error>

In server side logs its getting the token and barbican is 
authenticating the request then providing secret url, but unable to 
serve key.

/
22:10:03.940091 7f056f7eb700 15 ceph_armor ret=16
 22:10:03.940111 7f056f7eb700 15 
supplied_md5=eb1a3227cdc3fedbaec2fe38bf6c044a
 22:10:03.940129 7f056f7eb700 20 reading from 
uk-west.rgw.meta:root:.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1
 22:10:03.940138 7f056f7eb700 20 get_system_obj_state: 
rctx=0x7f056f7e39f0 
obj=uk-west.rgw.meta:root:.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
state=0x56540487a5a0 s->prefetch_data=0
 22:10:03.940145 7f056f7eb700 10 cache get: 
name=uk-west.rgw.meta+root+.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
: hit (requested=0x16, cached=0x17)
 22:10:03.940152 7f056f7eb700 20 get_system_obj_state: s->obj_tag was 
set empty
 22:10:03.940155 7f056f7eb700 10 cache get: 
name=uk-west.rgw.meta+root+.bucket.meta.newbucket:ee560b67-c330-4fd0-af50-aefff93735d2.4163.1 
: hit (requested=0x11, cached=0x17)
 22:10:03.944015 7f056f7eb700 20 bucket quota: max_objects=1638400 
max_size=-1
 22:10:03.944030 7f056f7eb700 20 bucket quota OK: stats.num_objects=7 
stats.size=50
 22:10:03.944176 7f056f7eb700 20 Getting KMS encryption key for 
key=616b2ce2-053a-41e3-b51e-0ff53e33cf81
 22:10:03.944225 7f056f7eb700 20 Requesting secret from barbican 
url=http://keyserver.rados:5000/v3/auth/tokens
 22:10:03.944281 7f056f7eb700 20 sending request to 
http://keyserver.rados:5000/v3/auth/tokens
* 22:10:04.405974 7f056f7eb700 20 sending request to 
http://keyserver.rados:9311/v1/secrets/616b2ce2-053a-41e3-b51e-0ff53e33cf81*
* 22:10:05.519874 7f056f7eb700 5 Failed to retrieve secret from 
barbican:616b2ce2-053a-41e3-b51e-0ff53e33cf81**

*/


It looks like this request is being rejected by barbican. Do you have 
any logs on the barbican side that might show why?


/*** 22:10:05.519901 7f056f7eb700 5 ERROR: failed to retrieve actual 
key from key_id: 616b2ce2-053a-41e3-b51e-0ff53e33cf81*
 22:10:05.519980 7f056f7eb700 2 req 387:1.581432:s3:PUT 
/encrypted.txt:put_obj:completing
 22:10:05.520187 7f056f7eb700 2 req 387:1.581640:s3:PUT 
/encrypted.txt:put_obj:op status=-13
 22:10:05.520193 7f056f7eb700 2 req 387:1.581645:s3:PUT 
/encrypted.txt:put_obj:http status=403
 22:10:05.520206 7f056f7eb700 1 == req done req=0x7f056f7e5190 op 
status=-13 http_status=403 ==

 22:10:05.520225 7f056f7eb700 20 process_request() returned -13
 22:10:05.520280 7f056f7eb700 1 civetweb: 0x5654042a1000: 
192.168.100.200 - - [02/Mar/2018:22:10:03 +0530] "PUT /encrypted.txt 
HTTP/1.1" 1 0 - Boto/2.38.0 Python/2.7.12 Linux/4.12.1-041201-generic

 22:10:06.116527 7f056e7e9700 20 HTTP_ACCEPT=*/*/

The error thrown in from this line 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_crypt.cc#L1063


I am unable to understand why its throwing the error.

In ceph.conf following settings are done.

[global]
rgw barbican url = http://keyserver.rados:9311
rgw keystone barbican user = rgwcrypt
rgw keystone barbican password = rgwpass
rgw keystone barbican project = service
rgw keystone barbican domain = default
rgw keystone url = http://keyserver.rados:5000
rgw keystone api version = 3
rgw crypt require ssl = false

Can someone help in figuring out what is missing.

Thanks,
Amar


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-27 Thread Casey Bodley
s3cmd does have special handling for 'US' and 'us-east-1' that skips the 
LocationConstraint on bucket creation:


https://github.com/s3tools/s3cmd/blob/master/S3/S3.py#L380


On 02/26/2018 05:16 PM, David Turner wrote:
I just realized the difference between the internal realm, local 
realm, and local-atl realm.  local-atl is a Luminous cluster while the 
other 2 are Jewel.  It looks like that option was completely ignored 
in Jewel and now Luminous is taking it into account (which is better 
imo).  I think you're right that 'us' is probably some sort of default 
in s3cmd that doesn't actually send the variable to the gateway.


Unfortunately we only allow https for rgw in the environments I have 
set up, but I think we found the cause of the initial randomness of 
things.  Thanks Yehuda.


On Mon, Feb 26, 2018 at 4:26 PM Yehuda Sadeh-Weinraub 
> wrote:


I don't know why 'us' works for you, but it could be that s3cmd is
just not sending any location constraint when 'us' is set. You can try
looking at the capture for this. You can try using wireshark for the
capture (assuming http endpoint and not https).

Yehuda

On Mon, Feb 26, 2018 at 1:21 PM, David Turner
> wrote:
> I set it to that for randomness.  I don't have a zonegroup named
'us'
> either, but that works fine.  I don't see why 'cn' should be any
different.
> The bucket_location that triggered me noticing this was 'gd1'. 
I don't know
> where that one came from, but I don't see why we should force
people setting
> it to 'us' when that has nothing to do with the realm. If it
needed to be
> set to 'local-atl' that would make sense, but 'us' works just
fine.  Perhaps
> 'us' working is what shouldn't work as opposed to allowing
whatever else to
> be able to work.
>
> I tested setting bucket_location to 'local-atl' and it did
successfully
> create the bucket.  So the question becomes, why do my other
realms not care
> what that value is set to and why does this realm allow 'us' to
be used when
> it isn't correct?
>
> On Mon, Feb 26, 2018 at 4:12 PM Yehuda Sadeh-Weinraub
>
> wrote:
>>
>> If that's what you set in the config file, I assume that's what
passed
>> in. Why did you set that in your config file? You don't have a
>> zonegroup named 'cn', right?
>>
>> On Mon, Feb 26, 2018 at 1:10 PM, David Turner
>
>> wrote:
>> > I'm also not certain how to do the tcpdump for this.  Do you
have any
>> > pointers to how to capture that for you?
>> >
>> > On Mon, Feb 26, 2018 at 4:09 PM David Turner
>
>> > wrote:
>> >>
>> >> That's what I set it to in the config file. I probably
should have
>> >> mentioned that.
>> >>
>> >> On Mon, Feb 26, 2018 at 4:07 PM Yehuda Sadeh-Weinraub
>> >> >
>> >> wrote:
>> >>>
>> >>> According to the log here, it says that the location
constraint it got
>> >>> is "cn", can you take a look at a tcpdump, see if that's
actually
>> >>> what's passed in?
>> >>>
>> >>> On Mon, Feb 26, 2018 at 12:02 PM, David Turner
>
>> >>> wrote:
>> >>> > I run with `debug rgw = 10` and was able to find these
lines at the
>> >>> > end
>> >>> > of a
>> >>> > request to create the bucket.
>> >>> >
>> >>> > Successfully creating a bucket with `bucket_location =
US` looks
>> >>> > like
>> >>> > [1]this.  Failing to create a bucket has "ERROR: S3
error: 400
>> >>> > (InvalidLocationConstraint): The specified
location-constraint is
>> >>> > not
>> >>> > valid"
>> >>> > on the CLI and [2]this (excerpt from the end of the
request) in the
>> >>> > rgw
>> >>> > log
>> >>> > (debug level 10).  "create bucket location constraint"
was not found
>> >>> > in
>> >>> > the
>> >>> > log for successfully creating the bucket.
>> >>> >
>> >>> >
>> >>> > [1]
>> >>> > 2018-02-26 19:52:36.419251 7f4bc9bc8700 10 cache put:
>> >>> >
>> >>> >
>> >>> >

name=local-atl.rgw.data.root++.bucket.meta.testerton:bef43c26-daf3-47ef-a3a5-e1167e3f88ac.39099765.1
>> >>> > info.flags=0x17
>> >>> > 2018-02-26 19:52:36.419262 7f4bc9bc8700 10 adding
>> >>> >
>> >>> >
>> >>> >

local-atl.rgw.data.root++.bucket.meta.testerton:bef43c26-daf3-47ef-a3a5-e1167e3f88ac.39099765.1
>> >>> > to cache LRU end
>> >>> > 2018-02-26 19:52:36.419266 7f4bc9bc8700 10 updating xattr:
>> >>> > name=user.rgw.acl
>> >>> > bl.length()=141
>> >>> > 

Re: [ceph-users] Is the minimum length of a part in a RGW multipart upload configurable?

2018-02-16 Thread Casey Bodley


On 02/16/2018 12:39 AM, F21 wrote:

I am uploading parts to RGW using the S3 multipart upload functionality.

I tried uploading a part sized at 500 KB and received an EntityTooSmall 
error from the server. I am assuming that it expects each part to have 
a minimum size of 5MB, like S3.


I found `rgw multipart min part size` being mentioned on the issue 
tracker, but this option does not seem to be in the docs. This PR also 
shows that it was removed: https://github.com/ceph/ceph/pull/9285


Is this still a configurable option?

Thanks,

Francis

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


That is the right config option, it just hasn't been documented. I've 
opened a doc bug for that at http://tracker.ceph.com/issues/23027 - 
anyone interested in helping out can follow up there.
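For reference, it's set in the gateway's ceph.conf section, e.g. to allow 512KB parts (the section name is an example, and note that many S3 clients still assume the AWS 5MB minimum):

[client.rgw.gateway-1]
    rgw multipart min part size = 524288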


Thanks,
Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error message in the logs: "meta sync: ERROR: failed to read mdlog info with (2) No such file or directory"

2018-01-17 Thread Casey Bodley


On 01/15/2018 09:57 AM, Victor Flávio wrote:

Hello,

We have a radosgw cluster (version 12.2.2) in multisite mode. Our 
cluster is formed by one master realm, with one master zonegroup and 
two zones (one of which is the master zone).


We've followed the instructions of Ceph documentation to install and 
configure our cluster.


The cluster works as expected: objects and users are being replicated 
between the zones, but we are always getting this error message in our 
logs:



2018-01-15 12:25:00.119301 7f68868e5700  1 meta sync: ERROR: failed to 
read mdlog info with (2) No such file or directory



Some details about the errors message(s):
 - They are only printed in the non-master zone log;
 - They are only printed when this "slave" zone try to sync the 
metadata info;
 - In each synchronization cycle of the metadata info, the number of 
this errors messages equals to the number of shards of metadata logs;
 - When we run the command "radosgw-admin mdlog list", we get an 
empty array as output in both zones;
 - The output of "radosgw-admin sync status" says everything is ok and 
synced, which is true, despite the mdlog error messages in the log.


Anyone got this same problem? And how to fix it. I've tried and failed 
to many times to fix it.



--
Victor Flávio de Oliveira Santos
Fullstack Developer/DevOps
http://victorflavio.me
Twitter: @victorflavio
Skype: victorflavio.oliveira
Github: https://github.com/victorflavio
Telefone/Phone: +55 62 81616477



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Victor,

We use a hashing strategy to spread metadata over these mdlog shards. 
It's likely that some shards are empty, especially if there are 
relatively few buckets/users in the system. These 'No such file or 
directory' errors are just trying to read from shard objects that 
haven't ever been written to. Logging them as noisy ERROR messages is 
certainly misleading, but it's probably nothing to worry about.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to "reset" rgw?

2018-01-10 Thread Casey Bodley


On 01/10/2018 04:34 AM, Martin Emrich wrote:

Hi!

As I cannot find any solution for my broken rgw pools, the only way 
out is to give up and "reset".


How do I throw away all rgw data from a ceph cluster? Just delete all 
rgw pools? Or are some parts stored elsewhere (monitor, ...)?


Thanks,

Martin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Deleting all of rgw's pools should be sufficient.
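For example (pool names vary per zone, and pool deletion has to be allowed on the mons first):

$ ceph osd pool ls | grep rgw
$ ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
$ ceph osd pool delete default.rgw.buckets.data default.rgw.buckets.data --yes-i-really-really-mean-it
  (repeat for each remaining .rgw.* / default.rgw.* pool)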
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous rgw hangs after sighup

2017-12-11 Thread Casey Bodley
There have been other issues related to hangs during realm 
reconfiguration, ex http://tracker.ceph.com/issues/20937. We decided to 
revert the use of SIGHUP to trigger realm reconfiguration in 
https://github.com/ceph/ceph/pull/16807. I just started a backport of 
that for luminous.



On 12/11/2017 11:07 AM, Graham Allan wrote:

That's the issue I remember (#20763)!

The hang happened to me once, on this cluster, after upgrade from 
jewel to 12.2.2; then on Friday I disabled automatic bucket resharding 
due to some other problems - didn't get any logrotate-related hangs 
through the weekend. I wonder if these could be related?


Graham

On 12/11/2017 02:01 AM, Martin Emrich wrote:

Hi!

This sounds like http://tracker.ceph.com/issues/20763 (or indeed 
http://tracker.ceph.com/issues/20866).


It is still present in 12.2.2 (just tried it). My workaround is to 
exclude radosgw from logrotate (remove "radosgw" from 
/etc/logrotate.d/ceph) from being SIGHUPed, and to rotate the logs 
manually from time to time and completely restarting the radosgw 
processes one after the other on my radosgw cluster.


Regards,

Martin

On 08.12.17, 18:58, "ceph-users on behalf of Graham Allan" wrote:


 I noticed this morning that all four of our rados gateways 
(luminous
 12.2.2) hung at logrotate time overnight. The last message 
logged was:
  > 2017-12-08 03:21:01.897363 7fac46176700  0 ERROR: failed 
to clone shard, completion_mgr.get_next() returned ret=-125

  one of the 3 nodes recorded more detail:
 > 2017-12-08 06:51:04.452108 7f80fbfdf700  1 rgw realm reloader: 
Pausing frontends for realm update...
 > 2017-12-08 06:51:04.452126 7f80fbfdf700  1 rgw realm reloader: 
Frontends paused
 > 2017-12-08 06:51:04.452891 7f8202436700  0 ERROR: failed to 
clone shard, completion_mgr.get_next() returned ret=-125

 I remember seeing this happen on our test cluster a while back with
 Kraken. I can't find the tracker issue I originally found 
related to
 this, but it also sounds like it could be a reversion of bug 
#20339 or

 #20686?
  I recorded some strace output from one of the radosgw 
instances before

 restarting, if it's useful to open an issue.
  --
 Graham Allan
 Minnesota Supercomputing Institute - g...@umn.edu
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 10.2.10: "default" zonegroup in custom root pool not found

2017-11-15 Thread Casey Bodley



On 11/15/2017 12:11 AM, Richard Chan wrote:

After creating a non-default root pool
rgw_realm_root_pool = gold.rgw.root
rgw_zonegroup_root_pool = gold.rgw.root
rgw_period_root_pool = gold.rgw.root
rgw_zone_root_pool = gold.rgw.root
rgw_region = gold.rgw.root


You probably meant to set rgw_region_root_pool for that last line. As it 
is, this is triggering some compatibility code that sets 'rgw_zonegroup 
= rgw_region' when a region is given but zonegroup is not.
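In other words, the non-default root pool settings should read:

rgw_realm_root_pool = gold.rgw.root
rgw_zonegroup_root_pool = gold.rgw.root
rgw_period_root_pool = gold.rgw.root
rgw_zone_root_pool = gold.rgw.root
rgw_region_root_pool = gold.rgw.root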




radosgw-admin realm create --rgw-realm gold --default
radosgw-admin zonegroup create --rgw-zonegroup=us  --default --master 
--endpoints http://rgw:7480


The "default" is not respected anymore:


radosgw-admin period update --commit
2017-11-15 04:50:42.400404 7f694dd4e9c0  0 failed reading zonegroup 
info: ret -2 (2) No such file or directory

couldn't init storage provider


I require --rgw-zonegroup=us on command line or /etc/ceph/ceph.conf

This seems to be regression.




--
Richard Chan



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw multi site different period

2017-11-15 Thread Casey Bodley
Your period configuration is indeed consistent between zones. This 
"master is on a different period" error is specific to the metadata sync 
status. It's saying that zone b is unable to finish syncing the metadata 
changes from zone a that occurred during the previous period. Even 
though zone b was the master during that period, it needs to re-sync 
from zone a to make sure everyone ends up with a consistent view (even 
if this results in the loss of metadata changes).


It sounds like zone a was re-promoted to master before it had a chance 
to catch up completely. The docs offer some guidance [1] to avoid this 
situation, but you can recover on zone b by running `radosgw-admin 
metadata sync init` and restarting its gateways to restart a full sync.


[1] 
http://docs.ceph.com/docs/luminous/radosgw/multisite/#changing-the-metadata-master-zone
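i.e. on zone b, something along these lines (the systemd unit name depends on how your gateways were deployed):

$ radosgw-admin metadata sync init
$ systemctl restart ceph-radosgw@rgw.$(hostname -s)
$ radosgw-admin sync status   # repeat until metadata reports "caught up with master"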


On 11/15/2017 02:56 AM, Kim-Norman Sahm wrote:

both cluster are in the same epoch and period:
  
root@ceph-a-1:~# radosgw-admin period get-current

{
 "current_period": "b7392c41-9cbe-4d92-ad03-db607dd7d569"
}

root@ceph-b-1:~# radosgw-admin period get-current
{
 "current_period": "b7392c41-9cbe-4d92-ad03-db607dd7d569"
}

but the sync state is still "master is on a different period":

root@ceph-b-1:~# radosgw-admin sync status
   realm 833e65be-268f-42c2-8f3c-9bab83ebbff2 (myrealm)
   zonegroup 15550dc6-a761-473f-81e8-0dc6cc5106bd (ceph)
zone 082cd970-bd25-4cbc-a5fd-20f3b3f9dbd2 (b)
   metadata sync syncing
 full sync: 0/64 shards
 master is on a different period:
master_period=b7392c41-9cbe-4d92-ad03-db607dd7d569
local_period=d306a847-77a6-4306-87c9-0bb4fa16cdc4
 incremental sync: 64/64 shards
 metadata is caught up with master
   data sync source: 51019cee-86fb-4b39-b6ba-282171c459c6 (a)
 syncing
 full sync: 0/128 shards
 incremental sync: 128/128 shards
 data is caught up with source


Am Dienstag, den 14.11.2017, 18:21 +0100 schrieb Kim-Norman Sahm:

both cluster are in the same epoch and period:

root@ceph-a-1:~# radosgw-admin period get-current
{
 "current_period": "b7392c41-9cbe-4d92-ad03-db607dd7d569"
}

root@ceph-b-1:~# radosgw-admin period get-current
{
 "current_period": "b7392c41-9cbe-4d92-ad03-db607dd7d569"
}

Am Dienstag, den 14.11.2017, 17:05 + schrieb David Turner:

I'm assuming you've looked at the period in both places `radosgw-
admin period get` and confirmed that the second site is behind the
master site (based on epochs).  I'm also assuming (since you linked
the instructions) that you've done `radosgw-admin period pull` on
the
second site to get any period updates that have been done to the
master site.

If my assumptions are wrong.  Then you should do those things.  If
my
assumptions are correct, then running `radosgw-admin period update
--
commit` on the the master site and `radosgw-admin period pull` on
the
second site might fix this.  If you've already done that as well
(as
they're steps in the article you linked), then you need someone
smarter than I am to chime in.

On Tue, Nov 14, 2017 at 11:35 AM Kim-Norman Sahm 
wrote:

hi,

i've installed a ceph multi site setup with two ceph clusters and
each
one radosgw.
the multi site setup was in sync, so i tried a failover.
cluster A is going down and i've changed the zone (b) on cluster
b
to
the new master zone.
it's working fine.

now i start the cluster A and try to switch back the master zone
to
A.
cluster A believes that he is the master, cluster b is secondary.
but on the secondary is a different period and the bucket delta
is
not
synced to the new master zone:

root@ceph-a-1:~# radosgw-admin sync status
   realm 833e65be-268f-42c2-8f3c-9bab83ebbff2 (myrealm)
   zonegroup 15550dc6-a761-473f-81e8-0dc6cc5106bd (ceph)
zone 51019cee-86fb-4b39-b6ba-282171c459c6 (a)
   metadata sync no sync (zone is master)
   data sync source: 082cd970-bd25-4cbc-a5fd-20f3b3f9dbd2 (b)
 syncing
 full sync: 0/128 shards
 incremental sync: 128/128 shards
 data is caught up with source

root@ceph-b-1:~# radosgw-admin sync status
   realm 833e65be-268f-42c2-8f3c-9bab83ebbff2 (myrealm)
   zonegroup 15550dc6-a761-473f-81e8-0dc6cc5106bd (ceph)
zone 082cd970-bd25-4cbc-a5fd-20f3b3f9dbd2 (b)
   metadata sync syncing
 full sync: 0/64 shards
 master is on a different period:
master_period=b7392c41-9cbe-4d92-ad03-db607dd7d569
local_period=d306a847-77a6-4306-87c9-0bb4fa16cdc4
 incremental sync: 64/64 shards
 metadata is caught up with master
   data sync source: 51019cee-86fb-4b39-b6ba-282171c459c6 (a)
 syncing
 full sync: 0/128 shards
  

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesnt replicate multipart uploads

2017-10-12 Thread Casey Bodley
Thanks Enrico. I wrote a test case that reproduces the issue, and opened 
http://tracker.ceph.com/issues/21772 to track the bug. It sounds like 
this is a regression in luminous.



On 10/11/2017 06:41 PM, Enrico Kern wrote:

or this:

   {
"shard_id": 22,
"entries": [
{
"id": "1_1507761448.758184_10459.1",
"section": "data",
"name": 
"testbucket:6a9448d2-bdba-4bec-aad6-aba72cd8eac6.21344646.3/Wireshark-win64-2.2.7.exe",

"timestamp": "2017-10-11 22:37:28.758184Z",
"info": {
"source_zone": "6a9448d2-bdba-4bec-aad6-aba72cd8eac6",
"error_code": 5,
"message": "failed to sync object"
}
}
]
},


 
 




On Thu, Oct 12, 2017 at 12:39 AM, Enrico Kern 
> wrote:


its 45MB, but it happens with all multipart uploads.

sync error list shows

   {
"shard_id": 31,
"entries": [
{
"id": "1_1507761459.607008_8197.1",
"section": "data",
"name":
"testbucket:6a9448d2-bdba-4bec-aad6-aba72cd8eac6.21344646.3",
"timestamp": "2017-10-11 22:37:39.607008Z",
"info": {
"source_zone":
"6a9448d2-bdba-4bec-aad6-aba72cd8eac6",
"error_code": 5,
"message": "failed to sync bucket instance:
(5) Input/output error"
}
}
]
}

for multiple shards not just this one



On Thu, Oct 12, 2017 at 12:31 AM, Yehuda Sadeh-Weinraub
> wrote:

What is the size of the object? Is it only this one?

Try this command: 'radosgw-admin sync error list'. Does it
show anything related to that object?

Thanks,
Yehuda


On Wed, Oct 11, 2017 at 3:26 PM, Enrico Kern
> wrote:

if i change permissions the sync status shows that it is
syncing 1 shard, but no files ends up in the pool (testing
with empty data pool). after a while it shows that data is
back in sync but there is no file

On Wed, Oct 11, 2017 at 11:26 PM, Yehuda Sadeh-Weinraub
> wrote:

Thanks for your report. We're looking into it. You can
try to see if touching the object (e.g., modifying its
permissions) triggers the sync.

Yehuda

On Wed, Oct 11, 2017 at 1:36 PM, Enrico Kern wrote:

Hi David,

yeah seems you are right, they are stored as
different filenames in the data bucket when using
multisite upload. But anyway it still doesn't get
replicated. As an example i have files like

6a9448d2-bdba-4bec-aad6-aba72cd8eac6.21344646.1__multipart_Wireshark-win64-2.2.7.exe.2~0LAfq93OMdk7hrijvyzW_EBRkVQLX37.6

in the data pool on one zone. But it's not
replicated to the other zone. naming is not
relevant, the other data bucket doesn't have any
file, multipart or not.

i'm really missing the file on the other zone.



On Wed, Oct 11, 2017 at 10:25 PM, David Turner wrote:

Multipart is a client side setting when
uploading. Multisite in and of itself is a
client and it doesn't use multipart (at least
not by default). I have a Jewel RGW Multisite
cluster and one site has the object as
multi-part while the second site just has it
as a single object.  I had to change from
looking at the objects in the pool for
monitoring to looking at an ls of the buckets
 

Re: [ceph-users] RGW flush_read_list error

2017-10-11 Thread Casey Bodley

Hi Travis,

This is reporting an error when sending data back to the client. 
Generally it means that the client timed out and closed the connection. 
Are you also seeing failures on the client side?


Casey


On 10/10/2017 06:45 PM, Travis Nielsen wrote:

In Luminous 12.2.1, when running a GET on a large (1GB file) repeatedly
for an hour from RGW, the following error was hit intermittently a number
of times. The first error was hit after 45 minutes and then the error
happened frequently for the remainder of the test.

ERROR: flush_read_list(): d->client_cb->handle_data() returned -5

Here is some more context from the rgw log around one of the failures.

2017-10-10 18:20:32.321681 I | rgw: 2017-10-10 18:20:32.321643
7f8929f41700 1 civetweb: 0x55bd25899000: 10.32.0.1 - -
[10/Oct/2017:18:19:07 +] "GET /bucket100/testfile.tst HTTP/1.1" 1 0 -
aws-sdk-java/1.9.0 Linux/4.4.0-93-generic
OpenJDK_64-Bit_Server_VM/25.131-b11/1.8.0_131
2017-10-10 18:20:32.383855 I | rgw: 2017-10-10 18:20:32.383786
7f8924736700 1 == starting new request req=0x7f892472f140 =
2017-10-10 18:20:46.605668 I | rgw: 2017-10-10 18:20:46.605576
7f894af83700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5
2017-10-10 18:20:46.605934 I | rgw: 2017-10-10 18:20:46.605914
7f894af83700 1 == req done req=0x7f894af7c140 op status=-5
http_status=200 ==
2017-10-10 18:20:46.606249 I | rgw: 2017-10-10 18:20:46.606225
7f8924736700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5

I don't see anything else standing out in the log. The object store was
configured with an erasure-coded data pool with k=2 and m=1.

There are a number of threads around this, but I don't see a resolution.
Is there a tracking issue for this?
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007756.html
https://www.spinics.net/lists/ceph-users/msg16117.html
https://www.spinics.net/lists/ceph-devel/msg37657.html


Here's our tracking Rook issue.
https://github.com/rook/rook/issues/1067


Thanks,
Travis



On 10/10/17, 3:05 PM, "ceph-users on behalf of Jack"
 wrote:


Hi,

I would like some information about the following

Let say I have a running cluster, with 4 OSDs: 2 SSDs, and 2 HDDs
My single pool has size=3, min_size=2

For a write-only pattern, I thought I would get SSD-level performance,
because the write would be acked as soon as min_size OSDs acked.

But am I right?

(the same setup could involve some high latency OSDs, in the case of
country-level cluster)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Multisite metadata sync init

2017-08-25 Thread Casey Bodley

Hi David,

The 'data sync init' command won't touch any actual object data, no. 
Resetting the data sync status will just cause a zone to restart a full 
sync of the --source-zone's data changes log. This log only lists which 
buckets/shards have changes in them, which causes radosgw to consider 
them for bucket sync. So while the command may silence the warnings 
about data shards being behind, it's unlikely to resolve the issue with 
missing objects in those buckets.


When data sync is behind for an extended period of time, it's usually 
because it's stuck retrying previous bucket sync failures. The 'sync 
error list' may help narrow down where those failures are.


There is also a 'bucket sync init' command to clear the bucket sync 
status. Following that with a 'bucket sync run' should restart a full 
sync on the bucket, pulling in any new objects that are present on the 
source-zone. I'm afraid that those commands haven't seen a lot of polish 
or testing, however.
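
For anyone trying this, a minimal sketch of those commands (bucket and zone names are placeholders, and the exact flags can vary between releases):

# narrow down which buckets are stuck
$ radosgw-admin sync error list

# clear the bucket's sync status, then force a full re-sync from the source zone
$ radosgw-admin bucket sync init --bucket=<bucket> --source-zone=<source-zone>
$ radosgw-admin bucket sync run --bucket=<bucket> --source-zone=<source-zone>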


Casey


On 08/24/2017 04:15 PM, David Turner wrote:
Apparently the data shards that are behind go in both directions, but 
only one zone is aware of the problem. Each cluster has objects in 
their data pool that the other doesn't have.  I'm thinking about 
initiating a `data sync init` on both sides (one at a time) to get 
them back on the same page.  Does anyone know if that command will 
overwrite any local data that the zone has that the other doesn't if 
you run `data sync init` on it?


On Thu, Aug 24, 2017 at 1:51 PM David Turner wrote:


After restarting the 2 RGW daemons on the second site again,
everything caught up on the metadata sync.  Is there something
about having 2 RGW daemons on each side of the multisite that
might be causing an issue with the sync getting stale?  I have
another realm set up the same way that is having a hard time with
its data shards being behind.  I haven't told them to resync, but
yesterday I noticed 90 shards were behind.  It's caught back up to
only 17 shards behind, but the oldest change not applied is 2
months old and no order of restarting RGW daemons is helping to
resolve this.

On Thu, Aug 24, 2017 at 10:59 AM David Turner wrote:

I have a RGW Multisite 10.2.7 set up for bi-directional
syncing.  This has been operational for 5 months and working
fine.  I recently created a new user on the master zone, used
that user to create a bucket, and put in a public-acl object
in there.  The Bucket created on the second site, but the user
did not and the object errors out complaining about the
access_key not existing.

That led me to think that the metadata isn't syncing, while
bucket and data both are.  I've also confirmed that data is
syncing for other buckets as well in both directions. The sync
status from the second site was this.

metadata sync syncing
          full sync: 0/64 shards
          incremental sync: 64/64 shards
          metadata is caught up with master
data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
          syncing
          full sync: 0/128 shards
          incremental sync: 128/128 shards
          data is caught up with source


Sync status leads me to think that the second site believes it
is up to date, even though it is missing a freshly created
user.  I restarted all of the rgw daemons for the zonegroup,
but it didn't trigger anything to fix the missing user in the
second site. I did some googling and found the sync init
commands mentioned in a few ML posts and used metadata sync
init and now have this as the sync status.

metadata sync preparing for full sync
          full sync: 64/64 shards
          full sync: 0 entries to sync
          incremental sync: 0/64 shards
          metadata is behind on 70 shards
          oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
          syncing
          full sync: 0/128 shards
          incremental sync: 128/128 shards
          data is caught up with source


It definitely triggered a fresh sync and told it to forget
about what it's previously applied as the date of the oldest
change not applied is the day we initially set up multisite
for this zone.  The problem is that was over 12 hours ago and
the sync stat hasn't caught up on any shards yet.

  

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-25 Thread Casey Bodley

Hi David,

The 'radosgw-admin sync error list' command may be useful in debugging 
sync failures for specific entries. For users, we've seen some sync 
failures caused by conflicting user metadata that was only present on 
the secondary site. For example, a user that had the same access key or 
email address, which we require to be unique.
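
A short sketch of that kind of debugging (the uid is a placeholder; run the metadata command on both sites and compare):

# list recent sync failures on the secondary
$ radosgw-admin sync error list

# dump a suspect user's metadata to look for a duplicate access key or email
$ radosgw-admin metadata get user:<uid>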


Running multiple gateways on the same zone is fully supported, and 
unlikely to cause these kinds of issues.



On 08/24/2017 01:51 PM, David Turner wrote:
After restarting the 2 RGW daemons on the second site again, 
everything caught up on the metadata sync.  Is there something about 
having 2 RGW daemons on each side of the multisite that might be 
causing an issue with the sync getting stale?  I have another realm 
set up the same way that is having a hard time with its data shards 
being behind.  I haven't told them to resync, but yesterday I noticed 
90 shards were behind. It's caught back up to only 17 shards behind, 
but the oldest change not applied is 2 months old and no order of 
restarting RGW daemons is helping to resolve this.


On Thu, Aug 24, 2017 at 10:59 AM David Turner wrote:


I have a RGW Multisite 10.2.7 set up for bi-directional syncing. 
This has been operational for 5 months and working fine.  I
recently created a new user on the master zone, used that user to
create a bucket, and put in a public-acl object in there.  The
Bucket created on the second site, but the user did not and the
object errors out complaining about the access_key not existing.

That led me to think that the metadata isn't syncing, while bucket
and data both are.  I've also confirmed that data is syncing for
other buckets as well in both directions. The sync status from the
second site was this.

metadata sync syncing
          full sync: 0/64 shards
          incremental sync: 64/64 shards
          metadata is caught up with master
data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
          syncing
          full sync: 0/128 shards
          incremental sync: 128/128 shards
          data is caught up with source


Sync status leads me to think that the second site believes it is
up to date, even though it is missing a freshly created user.  I
restarted all of the rgw daemons for the zonegroup, but it didn't
trigger anything to fix the missing user in the second site.  I
did some googling and found the sync init commands mentioned in a
few ML posts and used metadata sync init and now have this as the
sync status.

metadata sync preparing for full sync
          full sync: 64/64 shards
          full sync: 0 entries to sync
          incremental sync: 0/64 shards
          metadata is behind on 70 shards
          oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
          syncing
          full sync: 0/128 shards
          incremental sync: 128/128 shards
          data is caught up with source


It definitely triggered a fresh sync and told it to forget about
what it's previously applied as the date of the oldest change not
applied is the day we initially set up multisite for this zone. 
The problem is that was over 12 hours ago and the sync stat hasn't
caught up on any shards yet.

Does anyone have any suggestions other than blast the second site
and set it back up with a fresh start (the only option I can think
of at this point)?

Thank you,
David Turner



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Multisite Sync Memory Usage

2017-07-26 Thread Casey Bodley

Hi Ryan,

Sorry to hear about the crashes. Based on the fact that it's happening 
on the source zone, I'm guessing that you're hitting this infinite loop 
that leads to OOM: http://tracker.ceph.com/issues/20386. The jewel 
backport for this one is still pending, so I raised its priority to 
Urgent. I'm afraid there isn't a workaround here - the infinite loop 
reproduces once the 'data changes log' grows above 1000 entries.


Casey


On 07/26/2017 11:05 AM, Ryan Leimenstoll wrote:

Hi all,

We are currently trying to migrate our RGW Object Storage service from one zone 
to another (in the same zonegroup) in part to make use of erasure coded data 
pools. That being said, the rgw daemon is reliably getting OOM killed on the 
rgw origin host serving the original zone (and thus the current production 
data) as a result of high rgw memory usage. We are willing to consider more 
memory for the rgw daemon’s hosts to solve this problem, but was wondering what 
would be expected memory wise (at least as a rule of thumb). I noticed there 
were a few memory related rgw sync fixes in 10.2.9, but so far upgrading hasn’t 
seemed to prevent crashing.


Some details about our cluster:
Ceph Version: 10.2.9
OS: RHEL 7.3

584 OSDs
Serving RBD, CephFS, and RGW

RGW Origin Hosts:
Virtualized via KVM/QEMU, RHEL 7.3
Memory: 32GB
CPU: 12 virtual cores (Hypervisor processors: Intel E5-2630)

First zone data and index pools:
pool name KB  objects   clones degraded  
unfound   rdrd KB   wrwr KB
.rgw.buckets112190858231 3423974600
0   2713542251 265848150719475841837 153970795085
.rgw.buckets.index0 497200  
  0   3721485483   5926323574 360300980


Thanks,
Ryan Leimenstoll
University of Maryland Institute for Advanced Computer Studies

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Casey Bodley


On 06/22/2017 10:40 AM, Dan van der Ster wrote:

On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley <cbod...@redhat.com> wrote:

On 06/22/2017 04:00 AM, Dan van der Ster wrote:

I'm now running the three relevant OSDs with that patch. (Recompiled,
replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
then restarted the osds).

It's working quite well, trimming 10 entries at a time instead of
1000, and no more timeouts.

Do you think it would be worth decreasing this hardcoded value in ceph
proper?

-- Dan


I do, yeah. At least, the trim operation should be able to pass in its own
value for that. I opened a ticket for that at
http://tracker.ceph.com/issues/20382.

I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE
operation to trim a range of keys in a single osd op, instead of generating
a different op for each key. I have a PR that does this at
https://github.com/ceph/ceph/pull/15183. But it's still hard to guarantee
that leveldb can process the entire range inside of the suicide timeout.

I wonder if that would help. Here's what I've learned today:

   * two of the 3 relevant OSDs have something screwy with their
leveldb. The primary and 3rd replica are ~quick at trimming for only a
few hundred keys, whilst the 2nd OSD is very very fast always.
   * After manually compacting the two slow OSDs, they are fast again
for just a few hundred trims. So I'm compacting, trimming, ..., in a
loop now.
   * I moved the omaps to SSDs -- doesn't help. (iostat confirms this
is not IO bound).
   * CPU util on the slow OSDs gets quite high during the slow trimming.
   * perf top is below [1]. leveldb::Block::Iter::Prev and
leveldb::InternalKeyComparator::Compare are notable.
   * The always fast OSD shows no leveldb functions in perf top while trimming.

I've tried bigger leveldb cache and block sizes, compression on and
off, and played with the bloom size up to 14 bits -- none of these
changes make any difference.

At this point I'm not confident this trimming will ever complete --
there are ~20 million records to remove at maybe 1Hz.

How about I just delete the meta.log object? Would this use a
different, perhaps quicker, code path to remove those omap keys?

Thanks!

Dan

[1]

4.92%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.]
0x00023e8d
4.47%  libc-2.17.so [.] __memcmp_sse4_1
4.13%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.]
0x000273bb
3.81%  libleveldb.so.1.0.7  [.]
leveldb::Block::Iter::Prev
3.07%  libc-2.17.so [.]
__memcpy_ssse3_back
2.84%  [kernel] [k] port_inb
2.77%  libstdc++.so.6.0.19  [.]
std::string::_M_mutate
2.75%  libstdc++.so.6.0.19  [.]
std::string::append
2.53%  libleveldb.so.1.0.7  [.]
leveldb::InternalKeyComparator::Compare
1.32%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.]
0x00023e77
0.85%  [kernel] [k] _raw_spin_lock
0.80%  libleveldb.so.1.0.7  [.]
leveldb::Block::Iter::Next
0.77%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.]
0x00023a05
0.67%  libleveldb.so.1.0.7  [.]
leveldb::MemTable::KeyComparator::operator()
0.61%  libtcmalloc.so.4.2.6;5873e42b (deleted)  [.]
0x00023a09
0.58%  libleveldb.so.1.0.7  [.]
leveldb::MemTableIterator::Prev
0.51%  [kernel] [k] __schedule
0.48%  libruby.so.2.1.0 [.] ruby_yyparse


Hi Dan,

Removing an object will try to delete all of its keys at once, which 
should be much faster. It's also very likely to hit your suicide 
timeout, so you'll have to keep retrying until it stops killing your osd.
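
A sketch of that removal, assuming the mdlog objects live in a pool named '.log' here; check the zone's log_pool first, and re-run the rm if it times out:

$ radosgw-admin zone get | grep log_pool
$ rados -p .log rm meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54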

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-22 Thread Casey Bodley


On 06/22/2017 04:00 AM, Dan van der Ster wrote:

I'm now running the three relevant OSDs with that patch. (Recompiled,
replaced /usr/lib64/rados-classes/libcls_log.so with the new version,
then restarted the osds).

It's working quite well, trimming 10 entries at a time instead of
1000, and no more timeouts.

Do you think it would be worth decreasing this hardcoded value in ceph proper?

-- Dan


I do, yeah. At least, the trim operation should be able to pass in its 
own value for that. I opened a ticket for that at 
http://tracker.ceph.com/issues/20382.


I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE 
operation to trim a range of keys in a single osd op, instead of 
generating a different op for each key. I have a PR that does this at 
https://github.com/ceph/ceph/pull/15183. But it's still hard to 
guarantee that leveldb can process the entire range inside of the 
suicide timeout.


Casey




On Wed, Jun 21, 2017 at 3:51 PM, Casey Bodley <cbod...@redhat.com> wrote:

That patch looks reasonable. You could also try raising the values of
osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on
that osd in order to trim more at a time.


On 06/21/2017 09:27 AM, Dan van der Ster wrote:

Hi Casey,

I managed to trim up all shards except for that big #54. The others
all trimmed within a few seconds.

But 54 is proving difficult. It's still going after several days, and
now I see that the 1000-key trim is indeed causing osd timeouts. I've
manually compacted the relevant osd leveldbs, but haven't found any
way to speed up the trimming. It's now going at ~1-2Hz, so 1000 trims
per op locks things up for quite awhile.

I'm thinking of running those ceph-osd's with this patch:

# git diff
diff --git a/src/cls/log/cls_log.cc b/src/cls/log/cls_log.cc
index 89745bb..7dcd933 100644
--- a/src/cls/log/cls_log.cc
+++ b/src/cls/log/cls_log.cc
@@ -254,7 +254,7 @@ static int cls_log_trim(cls_method_context_t hctx,
bufferlist *in, bufferlist *o
   to_index = op.to_marker;
 }

-#define MAX_TRIM_ENTRIES 1000
+#define MAX_TRIM_ENTRIES 10
 size_t max_entries = MAX_TRIM_ENTRIES;

 int rc = cls_cxx_map_get_vals(hctx, from_index, log_index_prefix,
max_entries, );


What do you think?

-- Dan




On Mon, Jun 19, 2017 at 5:32 PM, Casey Bodley <cbod...@redhat.com> wrote:

Hi Dan,

That's good news that it can remove 1000 keys at a time without hitting
timeouts. The output of 'du' will depend on when the leveldb compaction
runs. If you do find that compaction leads to suicide timeouts on this
osd
(you would see a lot of 'leveldb:' output in the log), consider running
offline compaction by adding 'leveldb compact on mount = true' to the osd
config and restarting.

Casey


On 06/19/2017 11:01 AM, Dan van der Ster wrote:

On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley <cbod...@redhat.com>
wrote:

On 06/14/2017 05:59 AM, Dan van der Ster wrote:

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we setup to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

 1. AFAIU, it is used by the rgw multisite feature. Is it safe to
turn
it off when not using multisite?


It's a good idea to turn that off, yes.

First, make sure that you have configured a default
realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm   (you can
determine
realm name from 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default


Thanks. This had already been done, as confirmed with radosgw-admin
realm get-default.


Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take
care
with future radosgw-admin commands on the zone/zonegroup, as they may
revert
log_meta back to true [1].


Great, this worked. FYI (and for others trying this in future), the
period update --commit b

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Casey Bodley
That patch looks reasonable. You could also try raising the values of 
osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on 
that osd in order to trim more at a time.
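
As a sketch, for the osd in question (osd.155 from earlier in the thread; 600 seconds is just an example value):

# at runtime:
$ ceph tell osd.155 injectargs '--osd-op-thread-suicide-timeout 600 --filestore-op-thread-suicide-timeout 600'

# or persistently in ceph.conf, followed by a restart of that osd:
[osd.155]
    osd op thread suicide timeout = 600
    filestore op thread suicide timeout = 600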


On 06/21/2017 09:27 AM, Dan van der Ster wrote:

Hi Casey,

I managed to trim up all shards except for that big #54. The others
all trimmed within a few seconds.

But 54 is proving difficult. It's still going after several days, and
now I see that the 1000-key trim is indeed causing osd timeouts. I've
manually compacted the relevant osd leveldbs, but haven't found any
way to speed up the trimming. It's now going at ~1-2Hz, so 1000 trims
per op locks things up for quite awhile.

I'm thinking of running those ceph-osd's with this patch:

# git diff
diff --git a/src/cls/log/cls_log.cc b/src/cls/log/cls_log.cc
index 89745bb..7dcd933 100644
--- a/src/cls/log/cls_log.cc
+++ b/src/cls/log/cls_log.cc
@@ -254,7 +254,7 @@ static int cls_log_trim(cls_method_context_t hctx,
bufferlist *in, bufferlist *o
  to_index = op.to_marker;
}

-#define MAX_TRIM_ENTRIES 1000
+#define MAX_TRIM_ENTRIES 10
size_t max_entries = MAX_TRIM_ENTRIES;

int rc = cls_cxx_map_get_vals(hctx, from_index, log_index_prefix,
max_entries, );


What do you think?

-- Dan




On Mon, Jun 19, 2017 at 5:32 PM, Casey Bodley <cbod...@redhat.com> wrote:

Hi Dan,

That's good news that it can remove 1000 keys at a time without hitting
timeouts. The output of 'du' will depend on when the leveldb compaction
runs. If you do find that compaction leads to suicide timeouts on this osd
(you would see a lot of 'leveldb:' output in the log), consider running
offline compaction by adding 'leveldb compact on mount = true' to the osd
config and restarting.

Casey


On 06/19/2017 11:01 AM, Dan van der Ster wrote:

On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley <cbod...@redhat.com> wrote:

On 06/14/2017 05:59 AM, Dan van der Ster wrote:

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we setup to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?


It's a good idea to turn that off, yes.

First, make sure that you have configured a default realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm   (you can
determine
realm name from 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default


Thanks. This had already been done, as confirmed with radosgw-admin
realm get-default.


Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take care
with future radosgw-admin commands on the zone/zonegroup, as they may
revert
log_meta back to true [1].


Great, this worked. FYI (and for others trying this in future), the
period update --commit blocks all rgws for ~30s while they reload the
realm.


2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?


There is automated mdlog trimming logic in master, but not jewel/kraken.
And
this logic won't be triggered if there is only one zone [2].


3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?


You can inspect the osd's omap directory: du -sh
/var/lib/ceph/osd/osd0/current/omap


Cool. osd.155 (holding shard 54) has 3.3GB of omap, compared with
~100-300MB on other OSDs.


4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/

5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?


There is a 'radosgw-a

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-19 Thread Casey Bodley

Hi Dan,

That's good news that it can remove 1000 keys at a time without hitting 
timeouts. The output of 'du' will depend on when the leveldb compaction 
runs. If you do find that compaction leads to suicide timeouts on this 
osd (you would see a lot of 'leveldb:' output in the log), consider 
running offline compaction by adding 'leveldb compact on mount = true' 
to the osd config and restarting.
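
A minimal sketch, assuming osd.155 is the one holding the large omap and the host runs systemd:

# ceph.conf on that osd's host
[osd.155]
    leveldb compact on mount = true

$ systemctl restart ceph-osd@155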


Casey

On 06/19/2017 11:01 AM, Dan van der Ster wrote:

On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley <cbod...@redhat.com> wrote:

On 06/14/2017 05:59 AM, Dan van der Ster wrote:

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we setup to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

   1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?


It's a good idea to turn that off, yes.

First, make sure that you have configured a default realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm   (you can determine
realm name from 'radosgw-admin realm list')
$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default


Thanks. This had already been done, as confirmed with radosgw-admin
realm get-default.


Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take care
with future radosgw-admin commands on the zone/zonegroup, as they may revert
log_meta back to true [1].


Great, this worked. FYI (and for others trying this in future), the
period update --commit blocks all rgws for ~30s while they reload the
realm.


   2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?


There is automated mdlog trimming logic in master, but not jewel/kraken. And
this logic won't be triggered if there is only one zone [2].


   3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?


You can inspect the osd's omap directory: du -sh
/var/lib/ceph/osd/osd0/current/omap


Cool. osd.155 (holding shard 54) has 3.3GB of omap, compared with
~100-300MB on other OSDs.


   4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/

   5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?


There is a 'radosgw-admin mdlog trim' command that can be used to trim them
one --shard-id (from 0 to 63) at a time. An entire log shard can be trimmed
with:

$ radosgw-admin mdlog trim --shard-id 0 --period
8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1

*However*, there is a risk that bulk operations on large omaps will affect
cluster health by taking down OSDs. Not only can this bulk deletion take
long enough to trigger the osd/filestore suicide timeouts, the resulting
leveldb compaction after deletion is likely to block other omap operations
and hit the timeouts as well. This seems likely in your case, based on the
fact that you're already having issues with scrub.

We did this directly on shard 54, and indeed the command is taking a
looong time (but with no slow requests or osds being marked down).
After 45 minutes, du is still 3.3GB, so I can't tell if it's
progressing. I see ~1000 _omap_rmkeys messages every ~2 seconds:

2017-06-19 16:57:34.347222 7fc602640700 15 filestore(/var/lib/ceph/osd/ceph-155) _omap_rmkeys 24.1d_head/#24:ba0cd17d:::meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:head#
2017-06-19 16:57:34.347319 7fc602640700 10 filestore oid: #24:ba0cd17d:::meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54:head# not skipping op, *spos 67765185.0.0
2017-06-19 16:57:34.347326 7fc602640700 10 filestore  > he

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-15 Thread Casey Bodley


On 06/14/2017 05:59 AM, Dan van der Ster wrote:

Dear ceph users,

Today we had O(100) slow requests which were caused by deep-scrubbing
of the metadata log:

2017-06-14 11:07:55.373184 osd.155
[2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d
deep-scrub starts
...
2017-06-14 11:22:04.143903 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow
request 480.140904 seconds old, received at 2017-06-14
11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d
meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc
0=[] ondisk+write+known_if_redirected e7752) currently waiting for
scrub
...
2017-06-14 11:22:06.729306 osd.155
[2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d
deep-scrub ok

We have log_meta: true, log_data: false on this (our only) region [1],
which IIRC we setup to enable indexless buckets.

I'm obviously unfamiliar with rgw meta and data logging, and have a
few questions:

  1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn
it off when not using multisite?


It's a good idea to turn that off, yes.

First, make sure that you have configured a default realm/zonegroup/zone:

$ radosgw-admin realm default --rgw-realm   (you can 
determine realm name from 'radosgw-admin realm list')

$ radosgw-admin zonegroup default --rgw-zonegroup default
$ radosgw-admin zone default --rgw-zone default

Then you can modify the zonegroup (aka region):

$ radosgw-admin zonegroup get > zonegroup.json
$ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json
$ radosgw-admin zonegroup set < zonegroup.json

Then commit the updated period configuration:

$ radosgw-admin period update --commit

Verify that the resulting period contains "log_meta": "false". Take care 
with future radosgw-admin commands on the zone/zonegroup, as they may 
revert log_meta back to true [1].




  2. I started dumping the output of radosgw-admin mdlog list, and
cancelled it after a few minutes. It had already dumped 3GB of json
and I don't know how much more it would have written. Is something
supposed to be trimming the mdlog automatically?


There is automated mdlog trimming logic in master, but not jewel/kraken. 
And this logic won't be triggered if there is only one zone [2].




  3. ceph df doesn't show the space occupied by omap objects -- is
there an indirect way to see how much space these are using?


You can inspect the osd's omap directory: du -sh 
/var/lib/ceph/osd/osd0/current/omap




  4. mdlog status has markers going back to 2016-10, see [2]. I suppose
we're not using this feature correctly? :-/

  5. Suppose I were to set log_meta: false -- how would I delete these
log entries now that they are not needed?


There is a 'radosgw-admin mdlog trim' command that can be used to trim 
them one --shard-id (from 0 to 63) at a time. An entire log shard can be 
trimmed with:


$ radosgw-admin mdlog trim --shard-id 0 --period 
8d4fcb63-c314-4f9a-b3b3-0e61719ec258 --end-time 2020-1-1


*However*, there is a risk that bulk operations on large omaps will 
affect cluster health by taking down OSDs. Not only can this bulk 
deletion take long enough to trigger the osd/filestore suicide timeouts, 
the resulting leveldb compaction after deletion is likely to block other 
omap operations and hit the timeouts as well. This seems likely in your 
case, based on the fact that you're already having issues with scrub.




Apologies if there are already good docs about this, which eluded my googling.

Best Regards,
Dan


[1] region get:

{
 "id": "61c0ff1a-4330-405a-9eb1-bb494d4daf82",
 "name": "default",
 "api_name": "default",
 "is_master": "true",
 "endpoints": [],
 "hostnames": [],
 "hostnames_s3website": [],
 "master_zone": "61c59385-085d-4caa-9070-63a3868dccb6",
 "zones": [
 {
 "id": "61c59385-085d-4caa-9070-63a3868dccb6",
 "name": "default",
 "endpoints": [],
 "log_meta": "true",
 "log_data": "false",
 "bucket_index_max_shards": 32,
 "read_only": "false"
 }
 ],
 "placement_targets": [
 {
 "name": "default-placement",
 "tags": []
 },
 {
 "name": "indexless",
 "tags": []
 }
 ],
 "default_placement": "default-placement",
 "realm_id": "552868ad-8898-4afb-a775-911297961cee"
}

[2] mdlog status:

No --period given, using current period=8d4fcb63-c314-4f9a-b3b3-0e61719ec258
[
...
 {
 "marker": "1_1475568296.712634_3.1",
 "last_update": "2016-10-04 08:04:56.712634Z"
 },
...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[1] http://tracker.ceph.com/issues/20320
[2] http://tracker.ceph.com/issues/20319
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] radosgw global quotas - how to set in jewel?

2017-04-05 Thread Casey Bodley
A new set of 'radosgw-admin global quota' commands were added for this, 
which we'll backport to kraken and jewel. You can view the updated 
documentation here: 
http://docs.ceph.com/docs/master/radosgw/admin/#reading-writing-global-quotas
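
As a quick sketch (values are only examples; in a multisite setup the change also needs to be committed to the period):

$ radosgw-admin global quota set --quota-scope=bucket --max-objects=1024 --max-size=1073741824
$ radosgw-admin global quota enable --quota-scope=bucket
$ radosgw-admin period update --commit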


Thanks again for pointing this out,
Casey


On 04/03/2017 03:23 PM, Graham Allan wrote:
Ah, thanks, I thought I was going crazy for a bit there! The global 
quota would be useful for us (now wanting to retroactively impose 
quotas on pre-existing users), but we can script a workaround instead.


Thanks,
Graham

On 03/29/2017 10:17 AM, Casey Bodley wrote:

Hi Graham, you're absolutely right. In jewel, these settings were moved
into the period, but radosgw-admin doesn't have any commands to modify
them. I opened a tracker issue for this at
http://tracker.ceph.com/issues/19409. For now, it looks like you're
stuck with the 'default quota' settings in ceph.conf.

Thanks,
Casey

On 03/27/2017 03:13 PM, Graham Allan wrote:

I'm following up to myself here, but I'd love to hear if anyone knows
how the global quotas can be set in jewel's radosgw. I haven't found
anything which has an effect - the documentation says to use:

radosgw-admin region-map get > regionmap.json
...edit the json file
radosgw-admin region-map set < regionmap.json

but this has no effect on jewel. There doesn't seem to be any
analogous function in the "period"-related commands which I think
would be the right place to look for jewel.

Am I missing something, or should I open a bug?

Graham




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw global quotas - how to set in jewel?

2017-03-29 Thread Casey Bodley
Hi Graham, you're absolutely right. In jewel, these settings were moved 
into the period, but radosgw-admin doesn't have any commands to modify 
them. I opened a tracker issue for this at 
http://tracker.ceph.com/issues/19409. For now, it looks like you're 
stuck with the 'default quota' settings in ceph.conf.
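
For completeness, a sketch of those ceph.conf defaults (the section name and values are only examples; -1 means unlimited):

[client.radosgw.gateway]
    rgw bucket default quota max objects = 100000
    rgw bucket default quota max size = -1
    rgw user default quota max objects = -1
    rgw user default quota max size = -1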


Thanks,
Casey

On 03/27/2017 03:13 PM, Graham Allan wrote:
I'm following up to myself here, but I'd love to hear if anyone knows 
how the global quotas can be set in jewel's radosgw. I haven't found 
anything which has an effect - the documentation says to use:


radosgw-admin region-map get > regionmap.json
...edit the json file
radosgw-admin region-map set < regionmap.json

but this has no effect on jewel. There doesn't seem to be any 
analogous function in the "period"-related commands which I think 
would be the right place to look for jewel.


Am I missing something, or should I open a bug?

Graham

On 03/21/2017 03:18 PM, Graham Allan wrote:

On 03/17/2017 11:47 AM, Casey Bodley wrote:


On 03/16/2017 03:47 PM, Graham Allan wrote:

This might be a dumb question, but I'm not at all sure what the
"global quotas" in the radosgw region map actually do.

It is like a default quota which is applied to all users or buckets,
without having to set them individually, or is it a blanket/aggregate
quota applied across all users and buckets in the region/zonegroup?

Graham


They're defaults that are applied in the absence of quota settings on
specific users/buckets, not aggregate quotas. I agree that the
documentation in http://docs.ceph.com/docs/master/radosgw/admin/ is not
clear about the relationship between 'default quotas' and 'global
quotas' - they're basically the same thing, except for their scope.


Thanks, that's great to know, and exactly what I hoped it would do. It
seemed most likely but not 100% obvious!

My next question is how to set/enable the master quota, since I'm not
sure that the documented procedure still works for jewel. Although
radosgw-admin doesn't acknowledge the "region-map" command in its help
output any more, it does accept it, however the "region-map set" appears
to have no effect.

I think I should be using the radosgw-admin period commands, but it's
not clear to me how I can update the quotas within the period_config

G.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw global quotas

2017-03-17 Thread Casey Bodley


On 03/16/2017 03:47 PM, Graham Allan wrote:
This might be a dumb question, but I'm not at all sure what the 
"global quotas" in the radosgw region map actually do.


It is like a default quota which is applied to all users or buckets, 
without having to set them individually, or is it a blanket/aggregate 
quota applied across all users and buckets in the region/zonegroup?


Graham


They're defaults that are applied in the absence of quota settings on 
specific users/buckets, not aggregate quotas. I agree that the 
documentation in http://docs.ceph.com/docs/master/radosgw/admin/ is not 
clear about the relationship between 'default quotas' and 'global 
quotas' - they're basically the same thing, except for their scope.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw. Strange behavior in 2 zone configuration

2017-03-06 Thread Casey Bodley



On 03/03/2017 07:40 AM, K K wrote:


Hello, all!

I have successfully created a 2-zone cluster (se and se2). But my radosgw
machines are sending many GET /admin/log requests to each other after
putting 10k items into the cluster via radosgw. It looks like:


2017-03-03 17:31:17.897872 7f21b9083700 1 civetweb: 0x7f222001f660: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.944212 7f21ca0a5700 1 civetweb: 0x7f2200015510: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.945363 7f21b9083700 1 civetweb: 0x7f222001f660: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.988330 7f21ca0a5700 1 civetweb: 0x7f2200015510: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -
2017-03-03 17:31:18.005993 7f21b9083700 1 civetweb: 0x7f222001f660: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -
2017-03-03 17:31:18.006234 7f21c689e700 1 civetweb: 0x7f221c011260: 
10.30.18.24 - - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ 
HTTP/1.1" 200 0 - -


up to 2k rps!!! Does anybody know what it is???



These are the radosgw instances polling each other for changes to their 
metadata and data logs. Each log has many shards (64 or 128), and we 
poll on each shard separately, which generates a lot of small requests. 
I would expect these requests to be more frequent during heavy load, and 
less frequent when idle. We don't currently do any throttling on these 
connections, though we really should.


Casey


Tcpdump show the request is:

GET 
/admin/log/?type=data=100=bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017 
HTTP/1.1

Host: se2.local
Accept: */*
Transfer-Encoding: chunked
AUTHORIZATION: AWS hEY2W7nW3tdodGrsnrdv:v6+m2FGGhqCSDQteGJ4w039X1uw=
DATE: Fri Mar 3 12:32:20 2017
Expect: 100-continue

and answer:

...2...m{"marker":"1_1488542463.536646_1448.1","last_update":"2017-03-03 
12:01:03.536646Z"}




All system install on:
OS: Ubuntu 16.04
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

rga sync status
2017-03-03 17:36:20.146017 7f7a72b5ea00 0 error in read_id for id : 
(2) No such file or directory
2017-03-03 17:36:20.147015 7f7a72b5ea00 0 error in read_id for id : 
(2) No such file or directory

realm d9ed5678-5734-4609-bf7a-fe3d5f700b23 (s)
zonegroup bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017 (se)
zone 9b212551-a7cf-4aaa-9ef6-b18a31a6e032 (se-k8)
metadata sync no sync (zone is master)
data sync source: 029e0f49-f4dc-4f29-8855-bcc23a8bbcd9 (se2-k12)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source



My config files are:

[client.radosgw.se2-k12-2]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.se2-k12-2
rgw zonegroup = se
rgw zone = se2-k12
#rgw zonegroup root pool = se.root
#rgw zone root pool = se.root
keyring = /etc/ceph/bak.client.radosgw.se2-k12-2.keyring
rgw host = cbrgw04
rgw dns name = se2.local
log file = /var/log/radosgw/client.radosgw.se2-k12-2.log
rgw_frontends = "civetweb num_threads=50 port=80"
rgw cache lru size = 10
rgw cache enabled = false
#debug rgw = 20
rgw enable ops log = false
#log to stderr = false
rgw enable usage log = false
rgw swift versioning enabled = true
rgw swift url = http://se2.local/
rgw override bucket index max shards = 20
rgw print continue = false


[client.radosgw.se-k8-2]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.se-k8-2
rgw zonegroup = se
rgw zone = se-k8
#rgw zonegroup root pool = .se.root
#rgw zone root pool = .se.root
keyring = /etc/ceph/ceph.client.radosgw.se-k8-2.keyring
rgw host = cnrgw02
rgw dns name = se.local
log file = /var/log/radosgw/client.radosgw.se-k8-2.log
rgw_frontends = "civetweb num_threads=100 port=80"
rgw cache enabled = false
rgw cache lru size = 10
#debug rgw = 20
rgw enable ops log = false
#log to stderr = false
rgw enable usage log = false
rgw swift versioning enabled = true
rgw swift url = http://se.local
rgw override bucket index max shards = 20
rgw print continue = false

rga zonegroup get
{
"id": "bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017",
"name": "se",
"api_name": "se",
"is_master": "true",
"endpoints": [
"http:\/\/se.local:80"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "9b212551-a7cf-4aaa-9ef6-b18a31a6e032",
"zones": [
{
"id": "029e0f49-f4dc-4f29-8855-bcc23a8bbcd9",
"name": "se2-k12",
"endpoints": [
"http:\/\/se2.local:80"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false"
},
{
"id": "9b212551-a7cf-4aaa-9ef6-b18a31a6e032",
"name": "se-k8",
"endpoints": [
"http:\/\/se.local:80"
],
"log_meta": "true",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "d9ed5678-5734-4609-bf7a-fe3d5f700b23"
}



--
K K



___
ceph-users mailing list

Re: [ceph-users] radosgw-admin bucket link: empty bucket instance id

2017-02-21 Thread Casey Bodley
When it complains about a missing bucket instance id, that's what it's 
expecting to get from the --bucket-id argument. That's the "id" field 
shown in bucket stats. Try this?


$ radosgw-admin bucket link --bucket=XXX --bucket-id=YYY --uid=ZZZ

Casey


On 02/21/2017 08:30 AM, Valery Tschopp wrote:

Hi,

I've the same problem about 'radosgw-admin bucket link --bucket XXX 
--uid YYY', but with a Jewel radosgw


The admin rest API [1] do not work either :(

Any idea?

[1]: http://docs.ceph.com/docs/master/radosgw/adminops/#link-bucket


On 28/01/16 17:03 , Wido den Hollander wrote:

Hi,

I'm trying to link a bucket to a new user and this is failing for me.

The Ceph version is 0.94.5 (Hammer).

The bucket is called 'packer' and I can verify that it exists:

$ radosgw-admin bucket stats --bucket packer

{
"bucket": "packer",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets",
"id": "ams02.5862567.3564",
"marker": "ams02.5862567.3564",
"owner": "X_beta",
"ver": "0#21975",
"master_ver": "0#0",
"mtime": "2015-08-04 12:31:06.00",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size_kb": 10737764,
"size_kb_actual": 10737836,
"num_objects": 27
},
"rgw.multimeta": {
"size_kb": 0,
"size_kb_actual": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

Now when I try to link this bucket it fails:

$ radosgw-admin bucket link --bucket packer --uid 

"failure: (22) Invalid argument: empty bucket instance id"

It seems like this is a bug in the radosgw-admin tool where it doesn't
parse the --bucket argument properly.

Any ideas?

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOSGW S3 api ACLs

2017-02-16 Thread Casey Bodley



On 02/16/2017 07:17 AM, Josef Zelenka wrote:

Hello everyone,
i've been struggling for the past few days with setting up ACLs for 
buckets on my radosgw. I want to use the buckets with the s3 API and i 
want them to have the ACL set up like this:
every file that gets pushed into the bucket is automatically readable 
by everyone and writeable only by a specific user. Currently i was 
able to set the ACLs i want on existing files, but i want them to be 
set up in a way that will automatically do this, i.e the entire 
bucket. Can anyone shed some light on ACLs in S3 API and RGW?

Thanks
Josef Zelenka
Cloudevelops
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Josef,

You seem to have a good grasp on the limitations of bucket acls - they 
apply to operations that list/create/delete objects, but don't help you 
control access to the objects themselves. Object acls do this, but they 
have to be applied to individual objects. There's no way to set a custom 
object acl that's automatically applied to all new objects in a bucket.
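
So in practice the client has to attach the ACL per object. A sketch with s3cmd (bucket and key names are placeholders):

# upload with a public-read canned ACL in one step
$ s3cmd put --acl-public ./file.bin s3://mybucket/file.bin

# or mark already-uploaded objects public after the fact
$ s3cmd setacl --acl-public --recursive s3://mybucket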


In S3, this kind of access control is accomplished with user or bucket 
policy. Amazon has some 'Guidelines for Using the Available Access 
Policy Options' at 
http://docs.aws.amazon.com/AmazonS3/latest/dev/access-policy-alternatives-guidelines.html 
that covers the differences between ACLs and policy. RGW does not 
currently have support for these policies, but there is work in progress.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph rados gw, select objects by metadata

2017-01-30 Thread Casey Bodley


On 01/30/2017 06:11 AM, Johann Schwarzmeier wrote:

Hello Wido,
That is not good news, but it's what i expected. Thanks for your qick 
answer.

Jonny

Am 2017-01-30 11:57, schrieb Wido den Hollander:
Op 30 januari 2017 om 10:29 schreef Johann Schwarzmeier 
:



Hello,
I’m quite new to ceph and radosgw. With the python API, I found calls
for writing objects via the boto API. It's also possible to add metadata
to our objects. But now I have a question: is it possible to select or
search objects via metadata?  A little more detail: I want to store
objects with metadata like color = blue, color = red and so on. And then I
would select all objects with color = blue. Sorry for a stupid question
but I’m not able to find an answer in the documentation.


The RADOS Gateway implements the S3 API from Amazon and doesn't allow 
for this.


The whole design for Ceph is also that it's object-name based and you
can't query for xattr values nor names.

So what you are trying to achieve will not work.

Wido


Br Jonny

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


You might be interested in the work Yehuda has done to integrate with 
elasticsearch, which does allow you to search for user-specified 
metadata. You can learn more about it here: 
http://tracker.ceph.com/projects/ceph/wiki/Rgw_metadata_search

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU load with radosgw instances

2016-09-16 Thread Casey Bodley
In the meantime, we've made changes to radosgw so that it can detect and 
work around this libcurl bug. You can track the progress of this 
workaround (currently in master and pending backport to jewel) at 
http://tracker.ceph.com/issues/16695.


Casey


On 09/16/2016 01:38 PM, Ken Dreyer wrote:

Hi Lewis,

This sounds a lot like https://bugzilla.redhat.com/1347904 , currently
slated for the upcoming RHEL 7.3 (and CentOS 7.3).

There's an SRPM in that BZ that you can rebuild and test out. This
method won't require you to keep chasing upstream curl versions
forever (curl has a lot of CVEs).

Mind testing that out and reporting back?

- Ken


On Fri, Sep 16, 2016 at 11:06 AM, lewis.geo...@innoscale.net
 wrote:

Hi Yehuda,
Well, again, thank you!

I was able to get a package built from the latest curl release, and after
upgrading on my radosgw hosts, the load is no longer running high. The load
is just sitting at almost nothing and I only see the radosgw process using
CPU when it is actually doing something now.

So, I am still curious if this would be considered a bug or not, since the
curl version from the base CentOS repo seems to have an issue.

Have a good day,

Lewis George




From: "lewis.geo...@innoscale.net" 
Sent: Friday, September 16, 2016 7:28 AM
To: "Yehuda Sadeh-Weinraub" 

Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] High CPU load with radosgw instances

Hi Yehuda,
Thank you for the idea. I will try to test that and see if it helps.

If that is the case, would that be considered a bug with radosgw? I ask
because, that version of curl seems to be what is currently standard on
RHEL/CentOS 7 (fully updated). I will have to either compile it or search
3rd-party repos for newer version, which is not usually something that is
great.

Have a good day,

Lewis George



From: "Yehuda Sadeh-Weinraub" 
Sent: Thursday, September 15, 2016 10:42 PM
To: lewis.geo...@innoscale.net
Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] High CPU load with radosgw instances

On Thu, Sep 15, 2016 at 4:53 PM, lewis.geo...@innoscale.net
 wrote:

Hi,
So, maybe someone has an idea of where to go on this.

I have just setup 2 rgw instances in a multisite setup. They are working
nicely. I have add a couple of test buckets and some files to make sure it
works is all. The status shows both are caught up. Nobody else is
accessing
or using them.

However, the CPU load on both hosts is sitting at like 3.00, with the
radosgw process taking up 99% CPU constantly. I do not see anything in the
logs happening at all.

Any thoughts or direction?


We've seen that happening when running on a system with older version
of libcurl (e.g., 7.29). If that's the case upgrading to a newer
version should fix it for you.

Yehuda



Have a good day,

Lewis George


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw meta pool

2016-09-09 Thread Casey Bodley

Hi,

My (limited) understanding of this metadata heap pool is that it's an 
archive of metadata entries and their versions. According to Yehuda, 
this was intended to support recovery operations by reverting specific 
metadata objects to a previous version. But nothing has been implemented 
so far, and I'm not aware of any plans to do so. So these objects are 
being created, but never read or deleted.


This was discussed in the rgw standup this morning, and we agreed that 
this archival should be made optional (and default to off), most likely 
by assigning an empty pool name to the zone's 'metadata_heap' field. 
I've created a ticket at http://tracker.ceph.com/issues/17256 to track 
this issue.


Casey
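
(To make the intent of that proposal concrete, here is a minimal sketch of the "empty pool name disables archival" behavior. The names below, ZoneParams and put_entry, are illustrative stand-ins rather than radosgw's actual types, and the pool name is just the one mentioned elsewhere in this thread.)

    // Sketch: archive a metadata entry to the heap pool only when a heap
    // pool is configured; an empty name means archival is turned off.
    #include <iostream>
    #include <string>

    struct ZoneParams {
      std::string metadata_heap;  // empty string => no archival
    };

    void put_entry(const ZoneParams& zone, const std::string& key,
                   const std::string& payload) {
      if (!zone.metadata_heap.empty()) {
        // Write an immutable, versioned copy of the entry into the heap pool.
        std::cout << "archiving " << key << " into heap pool "
                  << zone.metadata_heap << "\n";
      }
      // Then write the live entry into the regular metadata pool.
      std::cout << "storing live entry " << key << " ("
                << payload.size() << " bytes)\n";
    }

    int main() {
      ZoneParams archive_on{"default.rgw.meta"};  // pool name seen in this thread
      ZoneParams archive_off{};                   // proposed default: archival off

      put_entry(archive_on, "user:tom", "{...}");
      put_entry(archive_off, "user:tom", "{...}");
    }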


On 09/09/2016 11:01 AM, Warren Wang - ISD wrote:

A little extra context here. The metadata pool currently looks like it is
on track to exceed the number of objects in the data pool over time. In a
brand new cluster, we're already up to almost 2 million objects in each pool.

 NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
 default.rgw.buckets.data  17  3092G  0.86   345T       2013585
 default.rgw.meta          25  743M   0      172T       1975937

We're concerned this will be unmanageable over time.

Warren Wang


On 9/9/16, 10:54 AM, "ceph-users on behalf of Pavan Rallabhandi"
 wrote:


Any help on this is much appreciated. I am considering fixing this, given
it's confirmed to be an issue, unless I am missing something obvious.

Thanks,
-Pavan.

On 9/8/16, 5:04 PM, "ceph-users on behalf of Pavan Rallabhandi"
 wrote:

Trying it one more time on the users list.

In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool
running into a large number of objects, potentially in the same range as the
objects contained in the data pool.

I understand that the immutable metadata entries are now stored in
this heap pool, but I couldn't work out why the metadata objects are
left in this pool even after the actual bucket/object/user deletions.

put_entry() promptly seems to store the same entries in the heap pool
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880),
but I never see them being reaped. Are they left there for some
reason?

Thanks,

-Pavan.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Excluding buckets in RGW Multi-Site Sync

2016-09-08 Thread Casey Bodley


On 09/08/2016 08:35 AM, Wido den Hollander wrote:

Hi,

I've been setting up an RGW Multi-Site [0] configuration across 6 VMs: 3 VMs per 
cluster and one RGW per cluster.

It works just fine: I can create a user in the master zone, create buckets, and 
upload data using s3cmd (S3).

What I see is that ALL data is synced between the two zones. While I understand 
that's indeed the purpose of it, is there a way to disable the sync for 
specific buckets/users?

Wido

[0]: http://docs.ceph.com/docs/master/radosgw/multisite/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Wido,

This came up recently on the ceph-devel list (see [rgw multisite] 
disable specified bucket data sync), and there's an initial PR to do 
this at https://github.com/ceph/ceph/pull/10995.


Casey
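
(Purely as a conceptual illustration of what per-bucket control could look like, the sketch below has a data-sync loop consult a set of buckets whose sync is disabled before applying changes. This is not the design used in the PR above; every name in it, SyncFilter, DataLogEntry, apply_change, is hypothetical.)

    // Conceptual sketch: skip data-log entries for buckets with sync disabled.
    #include <iostream>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // A change entry from the source zone's data log (hypothetical shape).
    struct DataLogEntry {
      std::string bucket;
      std::string object;
    };

    // Hypothetical filter: buckets listed here are skipped during data sync.
    class SyncFilter {
     public:
      void disable_bucket(const std::string& bucket) { disabled_.insert(bucket); }
      bool should_sync(const std::string& bucket) const {
        return disabled_.count(bucket) == 0;
      }
     private:
      std::unordered_set<std::string> disabled_;
    };

    void apply_change(const DataLogEntry& e) {
      // A real gateway would fetch the object from the source zone and write
      // it locally; here we just log the action.
      std::cout << "syncing " << e.bucket << "/" << e.object << "\n";
    }

    int main() {
      SyncFilter filter;
      filter.disable_bucket("logs-bucket");  // hypothetical bucket to exclude

      std::vector<DataLogEntry> log = {
          {"photos", "cat.jpg"}, {"logs-bucket", "2016-09-08.log"}, {"photos", "dog.jpg"}};

      for (const auto& e : log) {
        if (filter.should_sync(e.bucket))
          apply_change(e);
        else
          std::cout << "skipping " << e.bucket << "/" << e.object << " (sync disabled)\n";
      }
    }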
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com