Re: [ceph-users] RGW put file question

2015-02-12 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: baijia...@126.com
> To: "ceph-users" 
> Sent: Wednesday, February 4, 2015 5:47:03 PM
> Subject: [ceph-users] RGW put file question
> 
> When a PUT fails and the function
> "RGWRados::cls_obj_complete_cancel" runs,
> why do we use CLS_RGW_OP_ADD rather than CLS_RGW_OP_CANCEL?
> Why do we set the pool ID to -1 and the epoch to 0?
> 

I'm not sure, could very well be a bug. It should definitely be OP_CANCEL, but 
going back through the history it seems like it has been OP_ADD since at least 
argonaut. How did you notice it? It might explain a couple of issues that we've 
been seeing.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having problem to start Radosgw

2015-02-14 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "B L" 
> To: ceph-users@lists.ceph.com
> Sent: Friday, February 13, 2015 11:55:22 PM
> Subject: [ceph-users] Having problem to start Radosgw
> 
> Hi all,
> 
> I’m having a problem starting radosgw; it gives me an error that I can’t diagnose:
> 
> $ radosgw -c ceph.conf -d
> 
> 2015-02-14 07:46:58.435802 7f9d739557c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27609
> 2015-02-14 07:46:58.437284 7f9d739557c0 -1 asok(0x7f9d74da80a0)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
> bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17)
> File exists
> 2015-02-14 07:46:58.499004 7f9d739557c0 0 framework: fastcgi
> 2015-02-14 07:46:58.499016 7f9d739557c0 0 starting handler: fastcgi
> 2015-02-14 07:46:58.501160 7f9d477fe700 0 ERROR: FCGX_Accept_r returned -9
> 2015-02-14 07:46:58.594271 7f9d648ab700 -1 failed to list objects
> pool_iterate returned r=-2
> 2015-02-14 07:46:58.594276 7f9d648ab700 0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:46:58.594278 7f9d648ab700 0 ERROR: sync_all_users() returned
> ret=-2
> ^C2015-02-14 07:47:29.119185 7f9d47fff700 1 handle_sigterm
> 2015-02-14 07:47:29.119214 7f9d47fff700 1 handle_sigterm set alarm for 120
> 2015-02-14 07:47:29.119222 7f9d739557c0 -1 shutting down
> 2015-02-14 07:47:29.142726 7f9d739557c0 1 final shutdown
> 
> 
> since it complains that this file exists:
> /var/run/ceph/ceph-client.admin.asok, I removed it, but now, I get this
> error:
> 
> $ radosgw -c ceph.conf -d
> 
> 2015-02-14 07:47:55.140276 7f31cc0637c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27741
> 2015-02-14 07:47:55.201561 7f31cc0637c0 0 framework: fastcgi
> 2015-02-14 07:47:55.201567 7f31cc0637c0 0 starting handler: fastcgi
> 2015-02-14 07:47:55.203443 7f319effd700 0 ERROR: FCGX_Accept_r returned -9

Error 9 is EBADF (bad file number). Looks like there's an issue with the socket 
created for the fastcgi communication. How did you configure it?
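For reference, a typical fastcgi setup points radosgw and the web server at the 
same unix socket. A minimal sketch, assuming an apache front end (the section 
name and paths are just examples, not taken from your setup):

  # ceph.conf, on the gateway host
  [client.radosgw.gateway]
    rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

  # apache vhost, pointing at the same socket
  FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

If the two sides disagree on the socket path, or radosgw fails to create the 
socket, the accept loop can fail like this.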

Yehuda

> 2015-02-14 07:47:55.304048 7f319700 -1 failed to list objects
> pool_iterate returned r=-2
> 2015-02-14 07:47:55.304054 7f319700 0 ERROR: lists_keys_next(): ret=-2
> 2015-02-14 07:47:55.304060 7f319700 0 ERROR: sync_all_users() returned
> ret=-2
> 
> 
> Can somebody help me figure out where to start fixing this?
> 
> Thanks!
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having problem to start Radosgw

2015-02-14 Thread Yehuda Sadeh-Weinraub
Not sure what's wrong there; it might be an issue with the specific path, since 
the error happens down in the libfcgi code. You can try running radosgw under 
strace, e.g.,
strace -F -T -tt -o/tmp/strace.out radosgw -f

The output might give a clearer picture as to what exactly happened.
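If the output is large, the socket-related calls are usually the interesting 
part, e.g.:

  egrep 'socket|bind|listen|unlink|EBADF' /tmp/strace.out

should show what happened to the fastcgi socket right before the accept failed.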

Yehuda

- Original Message -
> From: "B L" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, February 14, 2015 8:32:52 AM
> Subject: Re: [ceph-users] Having problem to start Radosgw
> 
> Hello Yehuda,
> 
> Thanks for your response!
> 
> This is my RGW configuration:
> https://gist.github.com/anonymous/c0f62783feac88e069c7
> <https://gist.github.com/anonymous/c0f62783feac88e069c7>
> and
> This is Tengine configuration:
> https://gist.github.com/anonymous/90b77c168ed0606db03d
> <https://gist.github.com/anonymous/90b77c168ed0606db03d>
> 
> Please let me know if you need anything else.
> 
> Best!
> 
> > On Feb 14, 2015, at 6:22 PM, Yehuda Sadeh-Weinraub 
> > wrote:
> > 
> > 
> > 
> > - Original Message -
> >> From: "B L" 
> >> To: ceph-users@lists.ceph.com
> >> Sent: Friday, February 13, 2015 11:55:22 PM
> >> Subject: [ceph-users] Having problem to start Radosgw
> >> 
> >> Hi all,
> >> 
> >> I’m having a problem starting radosgw; it gives me an error that I can’t
> >> diagnose:
> >> 
> >> $ radosgw -c ceph.conf -d
> >> 
> >> 2015-02-14 07:46:58.435802 7f9d739557c0 0 ceph version 0.80.7
> >> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27609
> >> 2015-02-14 07:46:58.437284 7f9d739557c0 -1 asok(0x7f9d74da80a0)
> >> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed
> >> to
> >> bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok':
> >> (17)
> >> File exists
> >> 2015-02-14 07:46:58.499004 7f9d739557c0 0 framework: fastcgi
> >> 2015-02-14 07:46:58.499016 7f9d739557c0 0 starting handler: fastcgi
> >> 2015-02-14 07:46:58.501160 7f9d477fe700 0 ERROR: FCGX_Accept_r returned -9
> >> 2015-02-14 07:46:58.594271 7f9d648ab700 -1 failed to list objects
> >> pool_iterate returned r=-2
> >> 2015-02-14 07:46:58.594276 7f9d648ab700 0 ERROR: lists_keys_next(): ret=-2
> >> 2015-02-14 07:46:58.594278 7f9d648ab700 0 ERROR: sync_all_users() returned
> >> ret=-2
> >> ^C2015-02-14 07:47:29.119185 7f9d47fff700 1 handle_sigterm
> >> 2015-02-14 07:47:29.119214 7f9d47fff700 1 handle_sigterm set alarm for 120
> >> 2015-02-14 07:47:29.119222 7f9d739557c0 -1 shutting down
> >> 2015-02-14 07:47:29.142726 7f9d739557c0 1 final shutdown
> >> 
> >> 
> >> since it complains that this file exists:
> >> /var/run/ceph/ceph-client.admin.asok, I removed it, but now, I get this
> >> error:
> >> 
> >> $ radosgw -c ceph.conf -d
> >> 
> >> 2015-02-14 07:47:55.140276 7f31cc0637c0 0 ceph version 0.80.7
> >> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27741
> >> 2015-02-14 07:47:55.201561 7f31cc0637c0 0 framework: fastcgi
> >> 2015-02-14 07:47:55.201567 7f31cc0637c0 0 starting handler: fastcgi
> >> 2015-02-14 07:47:55.203443 7f319effd700 0 ERROR: FCGX_Accept_r returned -9
> > 
> > Error 9 is EBADF (bad file number). Looks like there's an issue with the
> > socket created for the fastcgi communication. How did you configure it?
> > 
> > Yehuda
> > 
> >> 2015-02-14 07:47:55.304048 7f319700 -1 failed to list objects
> >> pool_iterate returned r=-2
> >> 2015-02-14 07:47:55.304054 7f319700 0 ERROR: lists_keys_next(): ret=-2
> >> 2015-02-14 07:47:55.304060 7f319700 0 ERROR: sync_all_users() returned
> >> ret=-2
> >> 
> >> 
> >> Can somebody help me figure out where to start fixing this?
> >> 
> >> Thanks!
> >> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having problem to start Radosgw

2015-02-14 Thread Yehuda Sadeh-Weinraub
No, something like this: 

sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f 

- Original Message -

> From: "B L" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, February 14, 2015 9:01:09 AM
> Subject: Re: [ceph-users] Having problem to start Radosgw

> Shall I run it like this:

> sudo radosgw -c ceph.conf -d strace -F -T -tt -o/tmp/strace.out radosgw -f

> > On Feb 14, 2015, at 6:55 PM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> > wrote:
> 

> > strace -F -T -tt -o/tmp/strace.out radosgw -f
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having problem to start Radosgw

2015-02-14 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "B L" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, February 14, 2015 11:03:42 AM
> Subject: Re: [ceph-users] Having problem to start Radosgw

> Hello Yehuda,

> The strace command you referred me to shows this:
> https://gist.github.com/anonymous/8e9f1ced485996a263bb

> Additionally, I traced this log file:
> /var/log/radosgw/ceph-client.radosgw.gateway

> it has the following:

> 2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using
> default settings.
> 2015-02-12 18:23:32.247745 7fecca5257c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
> 2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using
> default settings.
> 2015-02-12 18:23:58.494092 7faab31377c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
> 2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using
> default settings.
> 2015-02-14 17:13:03.478778 7f86f09567c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
> 2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using
> default settings.
> 2015-02-14 17:13:29.477595 7ff18226a7c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
> 2015-02-14 17:13:29.481173 7ff18226a7c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using
> default settings.
> 2015-02-14 17:21:00.950916 7ffee3a3b7c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086
> 2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider
> (RADOS)

> It turns out that the last line of the logs is emitted by this piece of
> code in rgw_main.cc:

> …
> …

> FCGX_Init();

> RGWStoreManager store_manager;

> if (!store_manager.init("rados", g_ceph_context)) {
>   derr << "Couldn't init storage provider (RADOS)" << dendl;
>   return EIO;
> }

> RGWProcess process(g_ceph_context, 20);
> process.run();

> return 0;

> N.B. you can find it in
> http://workbench.dachary.org/ceph/ceph/raw/8d63e140777bbdd061baa6845d57e6c3cc771f76/src/rgw/rgw_main.cc
> (10th line from the bottom).

> Is that by any means related to the problem?

Not related. This actually means that it couldn't connect to the rados backend, 
so there's a different issue now. The strace log doesn't provide much with 
regard to the original issue, as it didn't get to that part this time. You can 
try bumping up the debug levels (debug rgw = 20, debug ms = 1). I assume that 
the issue you're seeing is that the wrong rados user and/or the wrong cephx 
keys are being used. Run it again the way you usually do, note the params that 
are normally passed when starting radosgw, and use those when running the 
strace command. 
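For example (a sketch; the section name depends on your setup), either in 
ceph.conf:

  [client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1

or as one-off overrides on the command line:

  radosgw -c ceph.conf -d --debug-rgw=20 --debug-ms=1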

Yehuda 

> > On Feb 14, 2015, at 7:24 PM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> > wrote:
> 

> > sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Having problem to start Radosgw

2015-02-14 Thread Yehuda Sadeh-Weinraub

Add the '-n client.radosgw.gateway' param when you're running the gateway; all 
your settings are under that user.
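I.e., something like (paths are examples):

  sudo radosgw -c ceph.conf -n client.radosgw.gateway -d

and pass the same '-n' argument in the strace invocation and to radosgw-admin.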

Yehuda

- Original Message -
> From: "B L" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, February 14, 2015 2:56:54 PM
> Subject: Re: [ceph-users] Having problem to start Radosgw
> 
> Yehuda ..
> 
> In case you will need to know more about my system
> 
> Here is my full cluster configuration:
> https://gist.github.com/anonymous/fb4c314320d7df75569a
> 
> And, that’s my ceph cluster status:
> 
> $ ceph -s
> 
> cluster 17bea68b-1634-4cd1-8b2a-00a60ef4761d
> health HEALTH_WARN 203 pgs degraded; 203 pgs stuck unclean; recovery 6/151
> objects degraded (3.974%)
> monmap e1: 1 mons at {ceph-node1=172.31.0.84:6789/0}, election epoch 2,
> quorum 0 ceph-node1
> osdmap e93: 6 osds: 6 up, 6 in
> pgmap v3676: 1920 pgs, 16 pools, 10241 kB data, 51 objects
> 279 MB used, 18086 MB / 18365 MB avail
> 6/151 objects degraded (3.974%)
> 203 active+degraded
> 1717 active+clean
> 
> It was fully healthy before adding the radosgw pools .. yet I can still put
> objects into the cluster (without using RGW)
> 
> Best!
> 
> 
> 
> 
> 
> On Feb 15, 2015, at 12:39 AM, B L < super.itera...@gmail.com > wrote:
> 
> That’s what I usually do to check if rgw is running with no problems: sudo
> radosgw -c ceph.conf -d
> 
> I already pumped up the log level, but I can’t see any change or verbosity
> level increase of the logs, I still get the same:
> 
> 2015-02-14 22:27:57.513151 7f26c79d27c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 7924
> 2015-02-14 22:27:57.573564 7f26c79d27c0 0 framework: fastcgi
> 2015-02-14 22:27:57.573569 7f26c79d27c0 0 starting handler: fastcgi
> 2015-02-14 22:27:57.575349 7f269affd700 0 ERROR: FCGX_Accept_r returned -9
> 2015-02-14 22:27:57.670610 7f269bfff700 0 ERROR: can't read user header:
> ret=-2
> 2015-02-14 22:27:57.670613 7f269bfff700 0 ERROR: sync_user() failed,
> user=cephtest ret=-2
> 2015-02-14 22:27:57.671382 7f269bfff700 0 ERROR: can't read user header:
> ret=-2
> 2015-02-14 22:27:57.671384 7f269bfff700 0 ERROR: sync_user() failed,
> user=cephtestss ret=-2
> ^C2015-02-14 22:28:30.693140 7f269b7fe700 1 handle_sigterm
> 2015-02-14 22:28:30.693170 7f269b7fe700 1 handle_sigterm set alarm for 120
> 2015-02-14 22:28:30.693179 7f26c79d27c0 -1 shutting down
> 2015-02-14 22:28:30.717340 7f26c79d27c0 1 final shutdown
> 
> Please let me know if I can do something more ..
> 
> Now I have 2 questions:
> 1- Which RADOS user do you refer to?
> 2- How would I know that I'm using the wrong cephx keys unless I see an
> authentication error or a relevant warning?
> 
> Thanks!
> Beanos
> 
> 
> 
> 
> On Feb 14, 2015, at 11:29 PM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:
> 
> 
> 
> 
> 
> 
> From: "B L" < super.itera...@gmail.com >
> To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> Cc: ceph-users@lists.ceph.com
> Sent: Saturday, February 14, 2015 11:03:42 AM
> Subject: Re: [ceph-users] Having problem to start Radosgw
> 
> Hello Yehuda,
> 
> The strace command you referred me to shows this:
> https://gist.github.com/anonymous/8e9f1ced485996a263bb
> 
> Additionally, I traced this log file:
> /var/log/radosgw/ceph-client.radosgw.gateway
> 
> it has the following:
> 
> 2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using
> default settings.
> 2015-02-12 18:23:32.247745 7fecca5257c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
> 2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using
> default settings.
> 2015-02-12 18:23:58.494092 7faab31377c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
> 2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using
> default settings.
> 2015-02-14 17:13:03.478778 7f86f09567c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
> 2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider
> (RADOS)
> 2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using
> default settings.
> 2015-02-14 17:13:29.477595 7ff18226a7c0 0 ceph version 0.80.7
> (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
> 2015-02-14 17:13:29.48

Re: [ceph-users] RadosGW - multiple dns names

2015-02-23 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "Shinji Nakamoto" 
> To: ceph-us...@ceph.com
> Sent: Friday, February 20, 2015 3:58:39 PM
> Subject: [ceph-users] RadosGW - multiple dns names

> We have multiple interfaces on our Rados gateway node, each of which is
> assigned to one of our many VLANs with a unique IP address.

> Is it possible to set multiple DNS names for a single Rados GW, so it can
> handle the request to each of the VLAN specific IP address DNS names?

Not yet, however, the upcoming hammer release will support that (hostnames will 
be configured as part of the region). 
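As a rough sketch of how that is expected to look in hammer (the exact syntax 
may still change):

  $ radosgw-admin region get > region.json
  # add the names to the region's new "hostnames" list, e.g.
  #   "hostnames": ["prd-apiceph001", "prd-backendceph001"],
  $ radosgw-admin region set < region.json
  $ radosgw-admin regionmap update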

Yehuda 

> eg.
> rgw dns name = prd-apiceph001
> rgw dns name = prd-backendceph001
> etc.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mixed ceph versions

2015-02-25 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Gregory Farnum" 
> To: "Tom Deneau" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, February 25, 2015 3:20:07 PM
> Subject: Re: [ceph-users] mixed ceph versions
> 
> On Wed, Feb 25, 2015 at 3:11 PM, Deneau, Tom  wrote:
> > I need to set up a cluster where the rados client (for running rados
> > bench) may be on a different architecture and hence running a different
> > ceph version from the osd/mon nodes.  Is there a list of which ceph
> > versions work together for a situation like this?
> 
> The RADOS protocol is architecture-independent, and while we don't
> test across a huge version divergence (mostly between LTS releases)
> the client should also be compatible with pretty much anything you
> have server-side.

Client stuff like rgw usually requires that the backend runs a version at least 
as new (for objclass functionality).

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Steffen W Sørensen" 
> To: ceph-users@lists.ceph.com
> Sent: Friday, February 27, 2015 6:40:01 AM
> Subject: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
> 
> Hi,
> 
> Newbie to RadosGW+Ceph, but learning...
> Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to
> verify the RadosGW S3 API, but I seem to have an issue with RadosGW access.
> 
> I get this error (I haven't found anything about it searching so far...):
> 
> S3ResponseError: 405 Method Not Allowed
> 
> when trying to access the rgw.
> 
> Apache vhost access log file says:
> 
> 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] "GET / HTTP/1.1" 405 27 "-"
> "Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64"
> 
> and Apache's general error_log file says:
> 
> [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/
> auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc=
> 
> 
> RadosGW seems to launch and run fine, though /var/log/messages at launch
> says:
> 
> Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip
> 003fb36cb1dc sp 7fffde221410 error 4 in
> librados.so.2.0.0[3fb320+6d]
> 
> # ps -fuapache
> UIDPID  PPID  C STIME TTY  TIME CMD
> apache   15113 15111  0 14:07 ?00:00:00 /usr/sbin/fcgi-
> apache   15114 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15115 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15116 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15117 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15118 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15119 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15120 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15121 15111  0 14:07 ?00:00:00 /usr/sbin/httpd
> apache   15224 1  1 14:12 ?00:00:25 /usr/bin/radosgw -n
> client.radosgw.owmblob
> 
> RadosGW creates my FastCGI socket and a default .asok (not sure what the
> default socket is meant for), as well as the configured log file, though it
> never logs anything...
> 
> # tail -18 /etc/ceph/ceph.conf:
> 
> [client.radosgw.owmblob]
>  keyring = /etc/ceph/ceph.client.radosgw.keyring
>  host = rgw
>  rgw data = /var/lib/ceph/radosgw/ceph-rgw
>  log file = /var/log/radosgw/client.radosgw.owmblob.log
>  debug rgw = 20
>  rgw enable log rados = true
>  rgw enable ops log = true
>  rgw enable apis = s3
>  rgw cache enabled = true
>  rgw cache lru size = 1
>  rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock
>  ;#rgw host = localhost
>  ;#rgw port = 8004
>  rgw dns name = {fqdn}
>  rgw print continue = true
>  rgw thread pool size = 20
> 
> It turned out /etc/init.d/ceph-radosgw didn't chown the log file to $USER
> even when log_file didn't exist; radosgw creates this log file when opening
> it, only it creates it as root, not $USER, hence no output. Manually chowning
> it and restarting the GW gives output like:
> 
> 2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40
> 2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ:
> 2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40
> 2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050
> 2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40
> 2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty
> 2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0
> 2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html
> 2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER
> 2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1
> 2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity
> 2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS
> WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg=
> 2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015
> 14:25:14 GMT
> 2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005
> 2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0
> Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64
> 2015-02-27 15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin
> 2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING=
> 2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29
> 2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386
> 2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET
> 2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/
> 2015-02-27 15:25:14.469677 7fef431e4700 20
> SCRIPT_FILENAME=/var/www/html/s3gw.fcgi
> 2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/
> 2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/
> 2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/
> 2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29
> 2

Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

2015-02-27 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "Steffen W Sørensen" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Friday, February 27, 2015 9:39:46 AM
> Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed

> On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub < yeh...@redhat.com > wrote:

> > I'd look at two things first. One is the '{fqdn}' string, which I'm not
> > sure
> > whether that's the actual string that you have, or whether you just
> > replaced
> > it for the sake of anonymity. The second is the port number, which should
> > be
> > fine, but maybe the fact that it appears as part of the script uri triggers
> > some issue.
> 

> When launching radosgw it logs this:

> ...
> 2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados->read obj-ofs=0 read_ofs=0
> read_len=524288
> 2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados->read r=0 bl.length=678
> 2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put:
> name=.rgw.root+zone_info.default
> 2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default
> to cache LRU end
> 2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master
> 2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0
> obj=.rgw.root:region_map state=0x2a86498 s->prefetch_data=0
> 2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get:
> name=.rgw.root+region_map : miss
> 2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put:
> name=.rgw.root+region_map
> 2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to
> cache LRU end
> 2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start
> 2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start
> 2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi
> 2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb
> 2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480
> 2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb
> 2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start
> 2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi
> 2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20
> 2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my
> user name}
> 2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop
> 2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my
> user name}
> 2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header:
> ret=-2
> 2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my
> user name} ret=-2

> Why does it seem to treat my radosgw-defined user name as a pool, and what
> might cause it to fail to read the user header?

That's just a red herring. It tries to sync the user stats, but it can't 
because quota is not enabled (iirc). We should probably get rid of these 
messages as they're pretty confusing. 

Yehuda 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer sharded radosgw bucket indexes question

2015-03-04 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Ben Hines" 
> To: "ceph-users" 
> Sent: Wednesday, March 4, 2015 1:03:16 PM
> Subject: [ceph-users] Hammer sharded radosgw bucket indexes question
> 
> Hi,
> 
> These questions were asked previously but perhaps lost:
> 
> We have some large buckets.
> 
> - When upgrading to Hammer (0.93 or later), is it necessary to
> recreate the buckets to get a sharded index?
> 
> - What parameters does the system use for deciding when to shard the index?
> 

The system does not re-shard existing bucket indexes; sharding will only affect 
new buckets. There is a per-zone configurable that specifies the number of 
shards for buckets created in that zone (by default it's disabled). There's 
also a ceph.conf configurable that can be set to override that value.
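For reference, a sketch of both knobs (option names as of the current 
development series): the per-zone value lives in the zone config,

  $ radosgw-admin zone get > zone.json
  # set "bucket_index_max_shards": 8, then:
  $ radosgw-admin zone set < zone.json

and the ceph.conf override is:

  [client.radosgw.gateway]
    rgw override bucket index max shards = 8

Either way, only buckets created after the change get sharded indexes.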

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Understand RadosGW logs

2015-03-05 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Daniel Schneller" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, March 3, 2015 2:54:13 AM
> Subject: [ceph-users] Understand RadosGW logs
> 
> Hi!
> 
> After realizing the problem with log rotation (see
> http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
> and fixing it, I now for the first time have some
> meaningful (and recent) logs to look at.
> 
> While from an application perspective there seem
> to be no issues, I would like to understand some
> messages I find with relatively high frequency in
> the logs:
> 
> Exhibit 1
> -
> 2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1

It means that an error occurred while returning data to the client; it usually 
means that the client disconnected before completion.
> 
> I cannot find anything relevant by Googling for
> that, apart from the actual line of code that
> produces this line.
> What does that mean? Is it an indication of data
> corruption or are there more benign reasons for
> this line?
> 
> 
> Exhibit 2
> --
> Several of these blocks
> 
> 2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request
> req=0x7fcf5800f3b0 =
> 2015-03-03 07:06:17.836671 7fcf36fed700  0
> RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
> part_ofs=0 rule->part_size=0
> 2015-03-03 07:06:17.836758 7fcf36fed700  0
> RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
> part_ofs=0 rule->part_size=0
> 2015-03-03 07:06:17.836918 7fcf36fed700  0
> RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243
> part_ofs=0 rule->part_size=0
> 2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done
> req=0x7fcf5800f3b0 http_status=200 ==
> ...
> 2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request
> req=0x7fcf580102a0 =
> 2015-03-03 09:27:29.866718 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.866778 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.866852 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.866917 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.875466 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.884434 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.906155 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.914364 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:29.940653 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:30.272816 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:31.125773 7fcf28fd1700  0
> RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list():
> d->client_c->handle_data() returned -1
> 2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done
> req=0x7fcf580102a0 http_status=200 ==
> ...
> 2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request
> req=0x7fcf580102a0 =
> 2015-03-03 09:28:43.016414 7fcf2a7d4700  0
> RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579
> part_ofs=0 rule->part_size=0
> 2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done
> req=0x7fcf580102a0 http_status=200 ==
> 
> First, what is the req= line? Is that a thread-id?
> I am asking, because the same id is used over a

Re: [ceph-users] rgw admin api - users

2015-03-05 Thread Yehuda Sadeh-Weinraub
The metadata api can do it:

GET /admin/metadata/user
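radosgw-admin wraps the same metadata API, so for example:

  $ radosgw-admin metadata list user
  [
      "gatewayuser",
      "johndoe"
  ]

(user names here are just examples). Over REST it's a GET on 
/admin/metadata/user, signed like any other admin request.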


Yehuda

- Original Message -
> From: "Joshua Weaver" 
> To: ceph-us...@ceph.com
> Sent: Thursday, March 5, 2015 1:43:33 PM
> Subject: [ceph-users] rgw admin api - users
> 
> According to the docs at
> http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info
> I should be able to invoke /admin/user without a uid specified, and get a
> list of users.
> No matter what I try, I get a 403.
> After looking at the source at github (ceph/ceph), it appears that there
> isn’t any code path that would result in a collection of users to be
> generated from that resource.
> 
> Am I missing something?
> 
> TIA,
> _josh
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Steffen Winther" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, March 9, 2015 12:43:58 AM
> Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP
> 
> Steffen W Sørensen  writes:
> 
> > Response:
> > HTTP/1.1 200 OK
> > Date: Fri, 06 Mar 2015 10:41:14 GMT
> > Server: Apache/2.2.22 (Fedora)
> > Connection: close
> > Transfer-Encoding: chunked
> > Content-Type: application/xml
> > 
> > This response makes the App say:
> > 
> > S3.createBucket, class , code ,
> > message  > response is not a valid xml message>
> > 
> > Are our S3 GW not responding properly?
> Why doesn't the radosGW return a "Content-Length: 0" header
> when the body is empty?

If you're using apache, then it filters out zero Content-Length. Nothing much 
radosgw can do about it.

> 
> http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
> 
> Maybe this is confusing my App into expecting some XML in the body

You can try using the radosgw civetweb frontend, see if it changes anything.
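E.g., a minimal sketch (civetweb ships with firefly and later; section name per 
your setup):

  [client.radosgw.owmblob]
    rgw frontends = civetweb port=7480

Then point the client at port 7480 directly, taking apache out of the picture.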

Yehuda

>  
> > 2. at every create bucket OP the GW creates what look like new containers
> > for ACLs in the .rgw pool; is this normal,
> > or how do I avoid such multiple objects cluttering the GW pools?
> Is there something wrong, since I get multiple ACL objects for this bucket
> every time my App tries to recreate the same bucket, or
> is this a "feature/bug" in radosGW?
> 
>  
> > # rados -p .rgw ls
> > .bucket.meta.mssCl:default.6309817.1
> > .bucket.meta.mssCl:default.6187712.3
> > .bucket.meta.mssCl:default.6299841.7
> > .bucket.meta.mssCl:default.6309817.5
> > .bucket.meta.mssCl:default.6187712.2
> > .bucket.meta.mssCl:default.6187712.19
> > .bucket.meta.mssCl:default.6187712.12
> > mssCl
> > ...
> > 
> > # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> > ceph.objclass.version
> > user.rgw.acl
> 
> /Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Steffen Winther" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, March 9, 2015 1:25:43 PM
> Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP
> 
> Yehuda Sadeh-Weinraub  writes:
> 
> 
> > If you're using apache, then it filters out zero Content-Length.
> > Nothing much radosgw can do about it.
> > You can try using the radosgw civetweb frontend, see if it changes
> > anything.
> Thanks, but no difference...
> 
> Req:
> PUT /mssCl/ HTTP/1.1
> Host: rgw.gsp.sprawl.dk:7480
> Authorization: AWS 
> Date: Mon, 09 Mar 2015 20:18:16 GMT
> Content-Length: 0
> 
> Response:
> HTTP/1.1 200 OK
> Content-type: application/xml
> Content-Length: 0
> 
> App still says:
> 
> S3.createBucket, class , code ,
> message  message>
> 
> :/
> 

According to the api specified here 
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, there's no 
response expected. I can only assume that the application tries to decode the 
xml if xml content type is returned. What kind of application is that?

> 
> Yehuda any comments on below 2. issue?
> 
> 2. at every create bucket OP the GW creates what look like new containers
> for ACLs in the .rgw pool; is this normal,
> or how do I avoid such multiple objects cluttering the GW pools?
> Is there something wrong, since I get multiple ACL objects for this bucket
> every time my App tries to recreate the same bucket, or
> is this a "feature/bug" in radosGW?

That's a bug.

Yehuda

> 
> # rados -p .rgw ls
> .bucket.meta.mssCl:default.6309817.1
> .bucket.meta.mssCl:default.6187712.3
> .bucket.meta.mssCl:default.6299841.7
> .bucket.meta.mssCl:default.6309817.5
> .bucket.meta.mssCl:default.6187712.2
> .bucket.meta.mssCl:default.6187712.19
> .bucket.meta.mssCl:default.6187712.12
> mssCl
> ...
> 
> # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
> ceph.objclass.version
> user.rgw.acl
> 
> /Steffen
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-10 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Steffen Winther" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, March 10, 2015 12:06:38 AM
> Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP
> 
> Yehuda Sadeh-Weinraub  writes:
> 
> > According to the api specified here
> > http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html,
> > there's no response expected. I can only assume that the application
> > tries to decode the xml if xml content type is returned.
> That's also what I hinted to the App vendor.
> 
> > What kind of application is that?
> Commercial Email platform from Openwave.com

Maybe it could be worked around using an apache rewrite rule. In any case, I 
opened issue #11091.

> 
> > > 2. at every create bucket OP the GW create what looks like new containers
> > > for ACLs in .rgw pool, is this normal
> > > or howto avoid such multiple objects clottering the GW pools?
> > > Is there something wrong since I get multiple ACL object for this bucket
> > > everytime my App tries to recreate same bucket or
> > > is this a "feature/bug" in radosGW?
> > 
> > That's a bug.
> Ok, any resolution/work-around to this?
> 

Not at the moment. There's already issue #6961, I bumped its priority higher, 
and we'll take a look at it.

Thanks,
Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow files

2015-03-12 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Ben" 
> To: ceph-us...@ceph.com
> Sent: Wednesday, March 11, 2015 8:46:25 PM
> Subject: Re: [ceph-users] Shadow files
> 
> Anyone got any info on this?
> 
> Is it safe to delete shadow files?

It depends. Shadow files are badly named objects that represent part of the 
objects' data. They are only safe to remove if you know that the corresponding 
objects no longer exist.
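For example (names here are illustrative; the exact format varies by version), 
data objects in .rgw.buckets carry the bucket marker as a prefix:

  $ rados -p .rgw.buckets ls | grep __shadow_
  default.7573587.55__shadow_myimage.jpg.2~Abc123_1

Here 'default.7573587.55' would be the bucket marker; if no bucket with that 
marker exists anymore, the shadow object is orphaned.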

Yehuda

> 
> On 2015-03-11 10:03, Ben wrote:
> > We have a large number of shadow files in our cluster that aren't
> > being deleted automatically as data is deleted.
> > 
> > Is it safe to delete these files?
> > Is there something we need to be aware of when deleting them?
> > Is there a script that we can run that will delete these safely?
> > 
> > Is there something wrong with our cluster that it isn't deleting these
> > files when it should be?
> > 
> > We are using civetweb with radosgw, with tengine ssl proxy infront of
> > it
> > 
> > Any advice please
> > Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] not existing key from s3 list

2015-03-13 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Dominik Mostowiec" 
> To: ceph-users@lists.ceph.com
> Sent: Friday, March 13, 2015 4:50:18 PM
> Subject: [ceph-users] not existing key from s3 list
> 
> Hi,
> I found a strange problem with a non-existent file in s3.
> The object exists in the list:
> # s3 -u list bucketimages | grep 'files/fotoobject_83884@2/55673'
> files/fotoobject_83884@2/55673.JPG   2014-03-26T22:25:59Z   349K
> but:
> # s3 -u head 'bucketimages/files/fotoobject_83884@2/55673.JPG'
> 
> ERROR: HttpErrorNotFound
> 
> After a little digging:
> # radosgw-admin --bucket=bucketimages bucket stats | grep marker
>   "marker": "default.7573587.55",
> 
> # rados listomapkeys .dir.default.7573587.55 -p .rgw.buckets.index |
> grep 'files/fotoobject'
> files/fotoobject_83884@2/55673.JPG
> 
> # rados -p .rgw.buckets.index getomapval .dir.default.7573587.55
> 'files/fotoobject_83884@2/55673.JPG'
> No such key:
> .rgw.buckets.index/.dir.default.7573587.55/files/fotoobject_83884@2/55673.JPG
> 
> What is wrong?

It is likely that this object failed to upload; we returned an error for that, 
but there was a bug (fixed recently) where we didn't clear the bucket index 
entry correctly.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Direct Upload Limitation

2015-03-16 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Craig Lewis" 
> To: "Gregory Farnum" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, March 16, 2015 11:48:15 AM
> Subject: Re: [ceph-users] RadosGW Direct Upload Limitation
> 
> 
> 
> 
> Maybe, but I'm not sure if Yehuda would want to take it upstream or
> not. This limit is present because it's part of the S3 spec. For
> larger objects you should use multi-part upload, which can get much
> bigger.
> -Greg
> 
> 
> Note that the multi-part upload has a lower limit of 4MiB per part, and the
> direct upload has an upper limit of 5GiB.

The limit is 10MB, but it does not apply to the last part, so basically you 
could upload any object size with it. I would still recommend using the plain 
upload for smaller object sizes, it is faster, and the resulting object might 
be more efficient (for really small sizes).
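For example, with s3cmd (assuming a version with multipart support) you can 
pick the part size explicitly, while small files still go through a single 
plain PUT:

  $ s3cmd put --multipart-chunk-size-mb=15 big.iso s3://mybucket/
  $ s3cmd put small.txt s3://mybucket/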

Yehuda

> 
> So you have to use both methods - direct upload for small files, and
> multi-part upload for big files.
> 
> Your best bet is to use the Amazon S3 libraries. They have functions that
> take care of it for you.
> 
> 
> I'd like to see this mentioned in the Ceph documentation someplace. When I
> first encountered the issue, I couldn't find a limit in the RadosGW
> documentation anywhere. I only found the 5GiB limit in the Amazon API
> documentation, which led me to test on RadosGW. Now that I know it was done
> to preserve Amazon compatibility, I don't want to override the value
> anymore.
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow files

2015-03-17 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Ben" 
> To: "Craig Lewis" 
> Cc: "Yehuda Sadeh-Weinraub" , "ceph-users" 
> 
> Sent: Monday, March 16, 2015 3:38:42 PM
> Subject: Re: [ceph-users] Shadow files
> 
> That's the thing. The peaks and troughs are in USERS' BUCKETS only.
> The actual cluster usage does not go up and down, it just goes up up up.
> 
> I would expect to see peaks and troughs much the same as the user
> buckets peaks and troughs on the overall cluster disk usage.
> But this is not the case.
> 
> We upgraded the cluster and radosgws to GIANT (0.87.1) yesterday, and
> now we are seeing a large number of misplaced(??) objects being moved
> around.
> Does this mean it has found all the shadow files that shouldn't exist
> anymore, and is deleting them? If so I would expect to start seeing
> overall cluster usage drop, but this hasn't happened yet.

No, I don't think so. Sounds like your cluster is recovering, and it happens in 
a completely different layer.
> 
> Any ideas?

try running:
$ radosgw-admin gc list --include-all

This should show all the shadow objects that are pending deletion. Note 
that if you have a non-default radosgw configuration, make sure you run 
radosgw-admin with the same user and config that radosgw itself runs with 
(e.g., add -n client. appropriately), otherwise it might not look at the 
correct zone data.
You could create an object, identify the shadow objects for that object, remove 
it, check to see that the gc list command shows these shadow objects. Then, 
wait the configured time (2 hours?), and see if it was removed.
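A concrete way to run that check (a sketch; bucket, object, and pool names are 
examples):

  $ s3cmd put big.bin s3://testbucket/big.bin
  $ rados -p .rgw.buckets ls | grep __shadow_   # note the shadow objects
  $ s3cmd del s3://testbucket/big.bin
  $ radosgw-admin gc list --include-all         # they should be listed here
  # after the gc window passes (or force a run with 'radosgw-admin gc process'):
  $ rados -p .rgw.buckets ls | grep __shadow_   # should be gone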

Yehuda


> 
> On 2015-03-17 06:12, Craig Lewis wrote:
> > Out of curiousity, what's the frequency of the peaks and troughs?
> > 
> > RadosGW has configs on how long it should wait after deleting before
> > garbage collecting, how long between GC runs, and how many objects it
> > can GC in per run.
> > 
> > The defaults are 2 hours, 1 hour, and 32 respectively.  Search
> > http://docs.ceph.com/docs/master/radosgw/config-ref/ [2] for "rgw gc".
> > 
> > If your peaks and troughs have a frequency less than 1 hour, then GC
> > is going to delay and alias the disk usage w.r.t. the object count.
> > 
> > If you have millions of objects, you probably need to tweak those
> > values.  If RGW is only GCing 32 objects an hour, it's never going to
> > catch up.
> > 
> > Now that I think about it, I bet I'm having issues here too.  I delete
> > more than (32*24) objects per day...
> > 
> > On Sun, Mar 15, 2015 at 4:41 PM, Ben  wrote:
> > 
> >> It is either a problem with CEPH, Civetweb or something else in our
> >> configuration.
> >> But deletes in user buckets are still leaving a high number of old
> >> shadow files. Since we have millions and millions of objects, it is
> >> hard to reconcile what should and shouldn't exist.
> >> 
> >> Looking at our cluster usage, there are no troughs, it is just a
> >> rising peak.
> >> But when looking at users data usage, we can see peaks and troughs
> >> as you would expect as data is deleted and added.
> >> 
> >> Our ceph version 0.80.9
> >> 
> >> Please ideas?
> >> 
> >> On 2015-03-13 02:25, Yehuda Sadeh-Weinraub wrote:
> >> 
> >> - Original Message -
> >> From: "Ben" 
> >> To: ceph-us...@ceph.com
> >> Sent: Wednesday, March 11, 2015 8:46:25 PM
> >> Subject: Re: [ceph-users] Shadow files
> >> 
> >> Anyone got any info on this?
> >> 
> >> Is it safe to delete shadow files?
> >> 
> >> It depends. Shadow files are badly named objects that represent
> >> part
> >> of the objects data. They are only safe to remove if you know that
> >> the
> >> corresponding objects no longer exist.
> >> 
> >> Yehuda
> >> 
> >> On 2015-03-11 10:03, Ben wrote:
> >>> We have a large number of shadow files in our cluster that aren't
> >>> being deleted automatically as data is deleted.
> >>> 
> >>> Is it safe to delete these files?
> >>> Is there something we need to be aware of when deleting them?
> >>> Is there a script that we can run that will delete these safely?
> >>> 
> >>> Is there something wrong with our cluster that it isn't deleting
> >> these
> >>> files when it should be?
> >>> 
> >>> We are using civetweb with radosgw, with

Re: [ceph-users] Shadow files

2015-03-18 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Ben" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Craig Lewis" , "ceph-users" 
> 
> Sent: Tuesday, March 17, 2015 7:28:28 PM
> Subject: Re: [ceph-users] Shadow files
> 
> None of this helps with trying to remove defunct shadow files which
> number in the 10s of millions.

Did it at least reflect that the garbage collection system works?

> 
> Is there a quick way to see which shadow files are safe to delete?

There's no easy process. If you know that a lot of the removed data is on 
buckets that shouldn't exist anymore then you could start by trying to identify 
that. You could do that by:

$ radosgw-admin metadata list bucket

then, for each bucket:

$ radosgw-admin metadata get bucket:

This will give you the bucket markers of all existing buckets. Each data object 
(head and shadow objects) is prefixed by its bucket marker. Objects that don't 
have a valid bucket marker can be removed. Note that I would first list all 
objects, then get the list of valid bucket markers, as the operation is racy 
and new buckets can be created in the meantime.
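A rough sketch of that sequence (verify everything by hand before removing 
anything):

  $ rados -p .rgw.buckets ls | sort > /tmp/all-objects   # data objects first
  $ radosgw-admin metadata list bucket | tr -d '[]", ' | grep -v '^$' > /tmp/buckets
  $ while read b; do
        radosgw-admin metadata get bucket:$b | grep '"marker"'
    done < /tmp/buckets > /tmp/valid-markers
  # objects in /tmp/all-objects whose prefix is not a marker in
  # /tmp/valid-markers are candidates for removal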

We did discuss a new garbage cleanup tool that will address your specific 
issue, and we have a design for it, but it's not there yet.

Yehuda



> Remembering that there are MILLIONS of objects.
> 
> We have a 320TB cluster which is 272TB full. Of this, we should only
> actually be seeing 190TB. There is 80TB of shadow files that should no
> longer exist.
> 
> On 2015-03-18 02:00, Yehuda Sadeh-Weinraub wrote:
> > - Original Message -
> >> From: "Ben" 
> >> To: "Craig Lewis" 
> >> Cc: "Yehuda Sadeh-Weinraub" , "ceph-users"
> >> 
> >> Sent: Monday, March 16, 2015 3:38:42 PM
> >> Subject: Re: [ceph-users] Shadow files
> >> 
> >> That's the thing. The peaks and troughs are in USERS' BUCKETS only.
> >> The actual cluster usage does not go up and down, it just goes up up
> >> up.
> >> 
> >> I would expect to see peaks and troughs much the same as the user
> >> buckets peaks and troughs on the overall cluster disk usage.
> >> But this is not the case.
> >> 
> >> We upgraded the cluster and radosgws to GIANT (0.87.1) yesterday, and
> >> now we are seeing a large number of misplaced(??) objects being moved
> >> around.
> >> Does this mean it has found all the shadow files that shouldn't exist
> >> anymore, and is deleting them? If so I would expect to start seeing
> >> overall cluster usage drop, but this hasn't happened yet.
> > 
> > No, I don't think so. Sounds like your cluster is recovering, and it
> > happens in a completely different layer.
> >> 
> >> Any ideas?
> > 
> > try running:
> > $ radosgw-admin gc list --include-all
> > 
> > This should be showing all the shadow objects that are pending for
> > delete. Note that if you have a non-default radosgw configuration,
> > make sure you run radosgw-admin using the same user and config that
> > radosgw is running (e.g., add -n client. appropriately),
> > otherwise it might not look at the correct zone data.
> > You could create an object, identify the shadow objects for that
> > object, remove it, check to see that the gc list command shows these
> > shadow objects. Then, wait the configured time (2 hours?), and see if
> > it was removed.
> > 
> > Yehuda
> > 
> > 
> >> 
> >> On 2015-03-17 06:12, Craig Lewis wrote:
> >> > Out of curiousity, what's the frequency of the peaks and troughs?
> >> >
> >> > RadosGW has configs on how long it should wait after deleting before
> >> > garbage collecting, how long between GC runs, and how many objects it
> >> > can GC in per run.
> >> >
> >> > The defaults are 2 hours, 1 hour, and 32 respectively.  Search
> >> > http://docs.ceph.com/docs/master/radosgw/config-ref/ [2] for "rgw gc".
> >> >
> >> > If your peaks and troughs have a frequency less than 1 hour, then GC
> >> > is going to delay and alias the disk usage w.r.t. the object count.
> >> >
> >> > If you have millions of objects, you probably need to tweak those
> >> > values.  If RGW is only GCing 32 objects an hour, it's never going to
> >> > catch up.
> >> >
> >> > Now that I think about it, I bet I'm having issues here too.  I delete
> >> > more than (32*24) objects per day...
> >> >
> >> > On Su

Re: [ceph-users] Shadow files

2015-03-18 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Abhishek L" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Ben" , "ceph-users" 
> Sent: Wednesday, March 18, 2015 10:54:37 AM
> Subject: Re: [ceph-users] Shadow files
> 
> 
> Yehuda Sadeh-Weinraub writes:
> 
> >> Is there a quick way to see which shadow files are safe to delete
> >> easily?
> >
> > There's no easy process. If you know that a lot of the removed data is on
> > buckets that shouldn't exist anymore then you could start by trying to
> > identify that. You could do that by:
> >
> > $ radosgw-admin metadata list bucket
> >
> > then, for each bucket:
> >
> > $ radosgw-admin metadata get bucket:
> >
> > This will give you the bucket markers of all existing buckets. Each data
> > object (head and shadow objects) is prefixed by bucket markers. Objects
> > that don't have valid bucket markers can be removed. Note that I would
> > first list all objects, then get the list of valid bucket markers, as the
> > operation is racy and new buckets can be created in the mean time.
> >
> > We did discuss a new garbage cleanup tool that will address your specific
> > issue, and we have a design for it, but it's not there yet.
> >
> 
> Could you share the design/ideas for the cleanup tool? After an
> initial search I could only find two issues:
> [1] http://tracker.ceph.com/issues/10342

It is sketched in there (issue 10342); it probably needs to be better formatted 
and documented.

Yehuda

> [2] http://tracker.ceph.com/issues/9604
> 
> though not much details are there to get started.
> --
> Abhishek
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FastCGI and RadosGW issue?

2015-03-19 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Potato Farmer" 
> To: ceph-users@lists.ceph.com
> Sent: Thursday, March 19, 2015 12:26:41 PM
> Subject: [ceph-users] FastCGI and RadosGW issue?
> 
> Hi,
> 
> I am running into an issue uploading to a bucket over an s3 connection to
> ceph. I can create buckets just fine. I just can’t create a key and copy
> data to it.
> 
> Command that causes the error:
> 
> >>> key.set_contents_from_string("testing from string")
> 
> I encounter the following error:
> 
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1424, in set_contents_from_string
>     encrypt_key=encrypt_key)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1291, in set_contents_from_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 748, in send_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 949, in _send_file_internal
>     query_args=query_args
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 664, in make_request
>     retry_handler=retry_handler
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1068, in make_request
>     retry_handler=retry_handler)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1025, in _mexe
>     raise BotoServerError(response.status, response.reason, body)
> boto.exception.BotoServerError: BotoServerError: 500 Internal Server Error
> None
> 
> In the Apache logs I see the following:
> 
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: comm with server
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:13 2015] [error] [] FastCGI: incomplete headers (0 bytes)
> received from server "/var/www/s3gw.fcgi"
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: comm with server
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Thu Mar 19 12:03:32 2015] [error] [] FastCGI: incomplete headers (0 bytes)
> received from server "/var/www/s3gw.fcgi"
> 
> The radosgw log shows nothing; it is empty. I have turned off FastCGIWrapper
> and set rgw print continue to false in ceph.conf. I am using the version of
> FastCGI provided by the ceph repo.

In this case you don't need to have 'rgw print continue' set to false; either 
remove that line, or set it to true.
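I.e., the default is already:

  [client.radosgw.gateway]
    rgw print continue = true

Setting it to false is only meant for fastcgi modules that can't handle interim 
'100 Continue' responses; the mod_fastcgi from the ceph repo can.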

Yehuda
> 
> Has anyone run into this before? Any suggestions?
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Auth URL not found when using object gateway

2015-03-24 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Greg Meier" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, March 24, 2015 4:24:16 PM
> Subject: [ceph-users] Auth URL not found when using object gateway
> 
> Hi,
> 
> I'm having trouble setting up an object gateway on an existing cluster. The
> cluster I'm trying to add the gateway to is running on a Precise 12.04
> virtual machine.
> 
> The cluster is up and running, with a monitor, two OSDs, and a metadata
> server. It returns HEALTH_OK and active+clean, so I am somewhat assured that
> it is running correctly.
> 
> I've:
> - set up an apache2 webserver with the fastcgi mod installed
> - created an rgw.conf file
> - added an s3gw.fcgi script
> - enabled the rgw.conf site and disabled the default
> - created a keyring and gateway user with appropriate cap's
> - restarted ceph, apache2, and the radosgw daemon
> - created a user and subuser
> - tested both s3 and swift calls
> 
> Unfortunately, both s3 and swift fail to authorize. An attempt to create a
> new bucket with s3 using a python script returns:
> 
> Traceback (most recent call last):
> File "s3test.py", line 13, in 
> bucket = conn.create_bucket('my-new-bucket')
> File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 422, in
> create_bucket
> response.status, response.reason, body)
> boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
> None
> 
> And an attempt to post a container using the python-swiftclient from the
> command line with command:
> 
> swift --debug --info -A http://localhost/auth/1.0 -U gatewayuser:swift -K
>  post new_container
> 
> returns:
> 
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
> DEBUG:urllib3.connectionpool:"GET /auth/1.0 HTTP/1.1" 404 180
> INFO:swiftclient:REQ: curl -i http://localhost/auth/1.0 -X GET
> INFO:swiftclient:RESP STATUS: 404 Not Found
> INFO:swiftclient:RESP HEADERS: [('content-length', '180'),
> ('content-encoding', 'gzip'), ('date', 'Tue, 24 Mar 2015 23:19:50 GMT'),
> ('content-type', 'text/html; charset=iso-8859-1'), ('vary',
> 'Accept-Encoding'), ('server', 'Apache/2.2.22 (Ubuntu)')]
> INFO:swiftclient:RESP BODY: [gzip-compressed binary body, not decoded]
> ERROR:swiftclient:Auth GET failed: http://localhost/auth/1.0 404 Not Found
> Traceback (most recent call last):
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181, in
> _retry
> self.url, self.token = self.get_auth()
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155, in
> get_auth
> insecure=self.insecure)
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318, in
> get_auth
> insecure=insecure)
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241, in
> get_auth_1_0
> http_reason=resp.reason)
> ClientException: Auth GET failed: http://localhost/auth/1.0 404 Not Found
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
> DEBUG:urllib3.connectionpool:"GET /auth/1.0 HTTP/1.1" 404 180
> INFO:swiftclient:REQ: curl -i http://localhost/auth/1.0 -X GET
> INFO:swiftclient:RESP STATUS: 404 Not Found
> INFO:swiftclient:RESP HEADERS: [('content-length', '180'),
> ('content-encoding', 'gzip'), ('date', 'Tue, 24 Mar 2015 23:19:50 GMT'),
> ('content-type', 'text/html; charset=iso-8859-1'), ('vary',
> 'Accept-Encoding'), ('server', 'Apache/2.2.22 (Ubuntu)')]
> INFO:swiftclient:RESP BODY: [gzip-compressed binary body, not decoded]
> ERROR:swiftclient:Auth GET failed: http://localhost/auth/1.0 404 Not Found
> Traceback (most recent call last):
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1181, in
> _retry
> self.url, self.token = self.get_auth()
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 1155, in
> get_auth
> insecure=self.insecure)
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 318, in
> get_auth
> insecure=insecure)
> File "/usr/lib/python2.7/dist-packages/swiftclient/client.py", line 241, in
> get_auth_1_0
> http_reason=resp.reason)
> ClientException: Auth GET failed: http://localhost/auth/1.0 404 Not Found
> Auth GET failed: http://localhost/auth/1.0 404 Not Found
> 
> I'm not at all sure why it doesn't work when I've followed the documentation
> for setting it up. Please find attached the config files for rgw.conf,
> ceph.conf, and apache2.conf.
> 

What does the rgw log show? (please add 'debug rgw = 20' and 'debug ms = 1')
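
That is, something like the following in the gateway's section of ceph.conf
(section and log file names are illustrative), followed by a radosgw restart:

[client.radosgw.gateway]
    debug rgw = 20
    debug ms = 1
    log file = /var/log/ceph/client.radosgw.gateway.log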

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw authorization failed

2015-03-25 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Neville" 
> To: ceph-users@lists.ceph.com
> Sent: Wednesday, March 25, 2015 8:16:39 AM
> Subject: [ceph-users] Radosgw authorization failed
> 
> Hi all,
> 
> I'm testing backup product which supports Amazon S3 as target for Archive
> storage and I'm trying to setup a Ceph cluster configured with the S3 API to
> use as an internal target for backup archives instead of AWS.
> 
> I've followed the online guide for setting up Radosgw and created a default
> region and zone based on the AWS naming convention US-East-1. I'm not sure
> if this is relevant but since I was having issues I thought it might need to
> be the same.
> 
> I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can
> create a bucket, create a folder, list buckets etc. The problem is when the
> backup software tries to create an object I get an authorization failure.
> It's using the same user/access/secret as I'm using from boto.s3 and I'm
> sure the creds are right as it lets me create the initial connection, it
> just fails when trying to create an object (backup folder).
> 
> Here's the extract from the radosgw log:
> 
> -
> 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
> /:list_bucket:init op
> 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
> /:list_bucket:verifying op mask
> 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
> 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
> /:list_bucket:verifying op permissions
> 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test
> mask=49
> 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
> 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1
> mask=49
> 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
> 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2
> mask=49
> 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
> 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
> owner=test perm=1
> 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1,
> policy perm=1, user_perm_mask=1, acl perm=1
> 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
> /:list_bucket:verifying op params
> 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
> /:list_bucket:executing
> 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
> test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
> start num 1001
> 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
> /:list_bucket:http status=200
> 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0
> http_status=200 ==
> 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
> req=0x7f107000f0e0
> 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
> 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
> 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
> req=0x7f107000f6b0
> 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
> req=0x7f107000f0e0
> 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
> 2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
> 2015-03-25 15:07:26.517084 7f1058dd7700 20
> CONTENT_TYPE=application/octet-stream
> 2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
> 2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
> 2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
> 2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
> 2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
> 2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS
> F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
> 2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
> 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015
> 15:07:26 GMT
> 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
> 2015-03-25 15:07:26.517093 7f1058dd7700 20
> HTTP_HOST=test1.devops-os-cog01.devops.local
> 2015-03-25 15:07:26.517094 7f1058dd7700 20
> HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1
> Java_HotSpot(TM)_Client_VM/24.55-b03
> 2015-03-25 15:07:26.517096 7f1058dd7700 20
> HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26
> 2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88
> 2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
> 2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on
> 2015-03-25 15:07:26.517100 7f1058dd7700 20
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> 2015-03-25 15:07:26.517100 7f1058dd7700 20 QUERY_STRING=
> 2015-03-25 15:07:26.517101 7f1058dd7700 20 REMOTE_ADDR=10.40.41.106
> 2015-03-25 15:07:26.517102 7f1058dd7

Re: [ceph-users] Radosgw authorization failed

2015-03-30 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Neville" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Monday, March 30, 2015 6:49:29 AM
> Subject: Re: [ceph-users] Radosgw authorization failed
> 
> 
> > Date: Wed, 25 Mar 2015 11:43:44 -0400
> > From: yeh...@redhat.com
> > To: neville.tay...@hotmail.co.uk
> > CC: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Radosgw authorization failed
> > 
> > 
> > 
> > - Original Message -
> > > From: "Neville" 
> > > To: ceph-users@lists.ceph.com
> > > Sent: Wednesday, March 25, 2015 8:16:39 AM
> > > Subject: [ceph-users] Radosgw authorization failed
> > > 
> > > Hi all,
> > > 
> > > I'm testing backup product which supports Amazon S3 as target for Archive
> > > storage and I'm trying to setup a Ceph cluster configured with the S3 API
> > > to
> > > use as an internal target for backup archives instead of AWS.
> > > 
> > > I've followed the online guide for setting up Radosgw and created a
> > > default
> > > region and zone based on the AWS naming convention US-East-1. I'm not
> > > sure
> > > if this is relevant but since I was having issues I thought it might need
> > > to
> > > be the same.
> > > 
> > > I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can
> > > create a bucket, create a folder, list buckets etc. The problem is when
> > > the
> > > backup software tries to create an object I get an authorization failure.
> > > It's using the same user/access/secret as I'm using from boto.s3 and I'm
> > > sure the creds are right as it lets me create the initial connection, it
> > > just fails when trying to create an object (backup folder).
> > > 
> > > Here's the extract from the radosgw log:
> > > 
> > > -
> > > 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
> > > /:list_bucket:init op
> > > 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
> > > /:list_bucket:verifying op mask
> > > 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1
> > > user.op_mask=7
> > > 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
> > > /:list_bucket:verifying op permissions
> > > 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for
> > > uid=test
> > > mask=49
> > > 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
> > > 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for
> > > group=1
> > > mask=49
> > > 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
> > > 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for
> > > group=2
> > > mask=49
> > > 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
> > > 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
> > > owner=test perm=1
> > > 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm
> > > (type)=1,
> > > policy perm=1, user_perm_mask=1, acl perm=1
> > > 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
> > > /:list_bucket:verifying op params
> > > 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
> > > /:list_bucket:executing
> > > 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
> > > test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
> > > start num 1001
> > > 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
> > > /:list_bucket:http status=200
> > > 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done
> > > req=0x7f107000e2e0
> > > http_status=200 ==
> > > 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
> > > req=0x7f107000f0e0
> > > 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
> > > 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
> > > 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
> > > req=0x7f107000f6b0
> > > 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
> > > req=0x7f107000f0e0
> > > 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
> > > 2015-03-25 15:07:26.517081 7f1058dd

Re: [ceph-users] Radosgw authorization failed

2015-04-01 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Neville" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, April 1, 2015 11:45:09 AM
> Subject: Re: [ceph-users] Radosgw authorization failed
> 
> 
> 
> > On 31 Mar 2015, at 11:38, Neville  wrote:
> > 
> > 
> >  
> > > Date: Mon, 30 Mar 2015 12:17:48 -0400
> > > From: yeh...@redhat.com
> > > To: neville.tay...@hotmail.co.uk
> > > CC: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Radosgw authorization failed
> > > 
> > > 
> > > 
> > > - Original Message -
> > > > From: "Neville" 
> > > > To: "Yehuda Sadeh-Weinraub" 
> > > > Cc: ceph-users@lists.ceph.com
> > > > Sent: Monday, March 30, 2015 6:49:29 AM
> > > > Subject: Re: [ceph-users] Radosgw authorization failed
> > > > 
> > > > 
> > > > > Date: Wed, 25 Mar 2015 11:43:44 -0400
> > > > > From: yeh...@redhat.com
> > > > > To: neville.tay...@hotmail.co.uk
> > > > > CC: ceph-users@lists.ceph.com
> > > > > Subject: Re: [ceph-users] Radosgw authorization failed
> > > > > 
> > > > > 
> > > > > 
> > > > > - Original Message -
> > > > > > From: "Neville" 
> > > > > > To: ceph-users@lists.ceph.com
> > > > > > Sent: Wednesday, March 25, 2015 8:16:39 AM
> > > > > > Subject: [ceph-users] Radosgw authorization failed
> > > > > > 
> > > > > > Hi all,
> > > > > > 
> > > > > > I'm testing backup product which supports Amazon S3 as target for
> > > > > > Archive
> > > > > > storage and I'm trying to setup a Ceph cluster configured with the
> > > > > > S3 API
> > > > > > to
> > > > > > use as an internal target for backup archives instead of AWS.
> > > > > > 
> > > > > > I've followed the online guide for setting up Radosgw and created a
> > > > > > default
> > > > > > region and zone based on the AWS naming convention US-East-1. I'm
> > > > > > not
> > > > > > sure
> > > > > > if this is relevant but since I was having issues I thought it
> > > > > > might need
> > > > > > to
> > > > > > be the same.
> > > > > > 
> > > > > > I've tested the radosgw using boto.s3 and it seems to work ok i.e.
> > > > > > I can
> > > > > > create a bucket, create a folder, list buckets etc. The problem is
> > > > > > when
> > > > > > the
> > > > > > backup software tries to create an object I get an authorization
> > > > > > failure.
> > > > > > It's using the same user/access/secret as I'm using from boto.s3
> > > > > > and I'm
> > > > > > sure the creds are right as it lets me create the initial
> > > > > > connection, it
> > > > > > just fails when trying to create an object (backup folder).
> > > > > > 
> > > > > > Here's the extract from the radosgw log:
> > > > > > 
> > > > > > -
> > > > > > 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
> > > > > > /:list_bucket:init op
> > > > > > 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
> > > > > > /:list_bucket:verifying op mask
> > > > > > 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1
> > > > > > user.op_mask=7
> > > > > > 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
> > > > > > /:list_bucket:verifying op permissions
> > > > > > 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for
> > > > > > uid=test
> > > > > > mask=49
> > > > > > 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
> > > > > > 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for
> > > > > > group=1
> > > > > > mask=49
> >

Re: [ceph-users] RADOS Gateway quota management

2015-04-02 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "Sergey Arkhipov" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, March 30, 2015 2:55:33 AM
> Subject: [ceph-users] RADOS Gateway quota management

> Hi,

> Currently I am trying to figure out how to work with RADOS Gateway (ceph
> 0.87) limits and I've managed to produce such strange behavior:

> { "bucket": "test1-8",
> "pool": ".rgw.buckets",
> "index_pool": ".rgw.buckets.index",
> "id": "default.17497.14",
> "marker": "default.17497.14",
> "owner": "cb254310-8b24-4622-93fb-640ca4a45998",
> "ver": 21,
> "master_ver": 0,
> "mtime": 1427705802,
> "max_marker": "",
> "usage": { "rgw.main": { "size_kb": 16000,
> "size_kb_actual": 16020,
> "num_objects": 9}},
> "bucket_quota": { "enabled": true,
> "max_size_kb": -1,
> "max_objects": 3}}

> Steps to reproduce: create bucket, set quota like that (max_objects = 3 and
> enable) and successfully upload 9 files. User quota is also defined:

> "bucket_quota": { "enabled": true,
> "max_size_kb": -1,
> "max_objects": 3},
> "user_quota": { "enabled": true,
> "max_size_kb": 1048576,
> "max_objects": 5},

> Could someone please help me to understand how to limit users?

> --

The question is whether the user is able to continue writing objects at this
point. The quota system works asynchronously, so it's possible to get into
edge cases where users exceed it a bit (it looks a whole lot better with
larger numbers). The real question is whether enforcement is working for you at all.
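
For reference, both scopes are set and enabled through radosgw-admin; a
minimal example (the uid is illustrative):

radosgw-admin quota set --quota-scope=bucket --uid=testuser --max-objects=3
radosgw-admin quota enable --quota-scope=bucket --uid=testuser
radosgw-admin quota set --quota-scope=user --uid=testuser --max-objects=5 --max-size-kb=1048576
radosgw-admin quota enable --quota-scope=user --uid=testuser
radosgw-admin user stats --uid=testuser --sync-stats   # force a stats refresh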

Yehuda 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOS Gateway quota management

2015-04-03 Thread Yehuda Sadeh-Weinraub
Great, I opened issue #11323.

Thanks, 
Yehuda 

- Original Message -

> From: "Sergey Arkhipov" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Friday, April 3, 2015 1:00:02 AM
> Subject: Re: [ceph-users] RADOS Gateway quota management

> Hi,

> Thank you for your answer! Meanwhile I did some investigations and found the
> reason: quota works on PUTs perfectly, but there are no checks on POSTs.
> I've made a pull-request: https://github.com/ceph/ceph/pull/4240

> 2015-04-02 18:40 GMT+03:00 Yehuda Sadeh-Weinraub < yeh...@redhat.com >:
>
> > > From: "Sergey Arkhipov" < sarkhi...@asdco.ru >
> > > To: ceph-users@lists.ceph.com
> > > Sent: Monday, March 30, 2015 2:55:33 AM
> > > Subject: [ceph-users] RADOS Gateway quota management
> > >
> > > Hi,
> > >
> > > Currently I am trying to figure out how to work with RADOS Gateway (ceph
> > > 0.87) limits and I've managed to produce such strange behavior:
> > >
> > > { "bucket": "test1-8",
> > > "pool": ".rgw.buckets",
> > > "index_pool": ".rgw.buckets.index",
> > > "id": "default.17497.14",
> > > "marker": "default.17497.14",
> > > "owner": "cb254310-8b24-4622-93fb-640ca4a45998",
> > > "ver": 21,
> > > "master_ver": 0,
> > > "mtime": 1427705802,
> > > "max_marker": "",
> > > "usage": { "rgw.main": { "size_kb": 16000,
> > > "size_kb_actual": 16020,
> > > "num_objects": 9}},
> > > "bucket_quota": { "enabled": true,
> > > "max_size_kb": -1,
> > > "max_objects": 3}}
> > >
> > > Steps to reproduce: create bucket, set quota like that (max_objects = 3
> > > and enable) and successfully upload 9 files. User quota is also defined:
> > >
> > > "bucket_quota": { "enabled": true,
> > > "max_size_kb": -1,
> > > "max_objects": 3},
> > > "user_quota": { "enabled": true,
> > > "max_size_kb": 1048576,
> > > "max_objects": 5},
> > >
> > > Could someone please help me to understand how to limit users?
> > >
> > > --
> >
> > The question is whether the user is able to continue writing objects at
> > this point. The quota system works asynchronously, so it's possible to
> > get into edge cases where users exceed it a bit (it looks a whole lot
> > better with larger numbers). The real question is whether enforcement is
> > working for you at all.
> >
> > Yehuda
>
> --
> Sergey Arkhipov
> Software Engineer, ASD Technologies
> Phone: +7 920 018 9404
> Skype: serge.arkhipov
> sarkhi...@asdco.ru
> asdtech.co

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Purpose of the s3gw.fcgi script?

2015-04-12 Thread Yehuda Sadeh-Weinraub
You're not missing anything. The script was only needed when we used the
process manager of the fastcgi module, but it has been a very long time since
we stopped using it.

Yehuda

- Original Message -
> From: "Greg Meier" 
> To: ceph-users@lists.ceph.com
> Sent: Saturday, April 11, 2015 10:54:27 PM
> Subject: [ceph-users] Purpose of the s3gw.fcgi script?
> 
> From my observation, the s3gw.fcgi script seems to be completely superfluous
> in the operation of Ceph. With or without the script, swift requests execute
> correctly, as long as a radosgw daemon is running.
> 
> Is there something I'm missing here?
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-13 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Sunday, April 12, 2015 8:47:40 PM
> Subject: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to 
> create bucket
> 
> Hi,
> 
> On a testing cluster, I have a radosgw on Firefly and the other
> nodes, OSDs and monitors, are on Hammer. The nodes are installed
> with puppet in personal VM, so I can reproduce the problem.
> Generally, I use s3cmd to check the radosgw. While radosgw is on
> Firefly, I can create bucket, no problem. Then, I upgrade the
> radosgw (it's a Ubuntu Trusty):
> 
> sed -i 's/firefly/hammer/g' /etc/apt/sources.list.d/ceph.list
> apt-get update && apt-get dist-upgrade -y
> service stop apache2
> stop radosgw-all
> start radosgw-all
> service apache2 start
> 
> After that, impossible to create a bucket with s3cmd:
> 
> --
> ~# s3cmd -d mb s3://bucket-2
> DEBUG: ConfigParser: Reading file '/root/.s3cfg'
> DEBUG: ConfigParser: bucket_location->US
> DEBUG: ConfigParser: cloudfront_host->cloudfront.amazonaws.com
> DEBUG: ConfigParser: default_mime_type->binary/octet-stream
> DEBUG: ConfigParser: delete_removed->False
> DEBUG: ConfigParser: dry_run->False
> DEBUG: ConfigParser: enable_multipart->True
> DEBUG: ConfigParser: encoding->UTF-8
> DEBUG: ConfigParser: encrypt->False
> DEBUG: ConfigParser: follow_symlinks->False
> DEBUG: ConfigParser: force->False
> DEBUG: ConfigParser: get_continue->False
> DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
> DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s
> %(input_file)s
> DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent
> --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s
> %(input_file)s
> DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
> DEBUG: ConfigParser: guess_mime_type->True
> DEBUG: ConfigParser: host_base->ostore.athome.priv
> DEBUG: ConfigParser: access_key->5R...17_chars...Y
> DEBUG: ConfigParser: secret_key->Ij...37_chars...I
> DEBUG: ConfigParser: host_bucket->%(bucket)s.ostore.athome.priv
> DEBUG: ConfigParser: human_readable_sizes->False
> DEBUG: ConfigParser: invalidate_on_cf->False
> DEBUG: ConfigParser: list_md5->False
> DEBUG: ConfigParser: log_target_prefix->
> DEBUG: ConfigParser: mime_type->
> DEBUG: ConfigParser: multipart_chunk_size_mb->15
> DEBUG: ConfigParser: preserve_attrs->True
> DEBUG: ConfigParser: progress_meter->True
> DEBUG: ConfigParser: proxy_host->
> DEBUG: ConfigParser: proxy_port->0
> DEBUG: ConfigParser: recursive->False
> DEBUG: ConfigParser: recv_chunk->4096
> DEBUG: ConfigParser: reduced_redundancy->False
> DEBUG: ConfigParser: send_chunk->4096
> DEBUG: ConfigParser: simpledb_host->sdb.amazonaws.com
> DEBUG: ConfigParser: skip_existing->False
> DEBUG: ConfigParser: socket_timeout->300
> DEBUG: ConfigParser: urlencoding_mode->normal
> DEBUG: ConfigParser: use_https->False
> DEBUG: ConfigParser: verbosity->WARNING
> DEBUG: ConfigParser:
> website_endpoint->http://%(bucket)s.s3-website-%(location)s.amazonaws.com/
> DEBUG: ConfigParser: website_error->
> DEBUG: ConfigParser: website_index->index.html
> DEBUG: Updating Config.Config encoding -> UTF-8
> DEBUG: Updating Config.Config follow_symlinks -> False
> DEBUG: Updating Config.Config verbosity -> 10
> DEBUG: Unicodising 'mb' using UTF-8
> DEBUG: Unicodising 's3://bucket-2' using UTF-8
> DEBUG: Command: mb
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23
> +0000\n/bucket-2/'
> DEBUG: CreateRequest: resource[uri]=/
> DEBUG: SignHeaders: 'PUT\n\n\n\nx-amz-date:Mon, 13 Apr 2015 03:32:23
> +0000\n/bucket-2/'
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(bucket-2): bucket-2.ostore.athome.priv
> DEBUG: format_uri(): /
> DEBUG: Sending request method_string='PUT', uri='/',
> headers={'content-length': '0', 'Authorization': 'AWS
> 5RUS0Z3SBG6IK263PLFY:3V1MdXoCGFrJKrO2LSJaBpNMcK4=', 'x-amz-date': 'Mon, 13
> Apr 2015 03:32:23 +0000'}, body=(0 bytes)
> DEBUG: Response: {'status': 405, 'headers': {'date': 'Mon, 13 Apr 2015
> 03:32:23 GMT', 'accept-ranges': 'bytes', 'content-type': 'application/xml',
> 'content-length': '82', 'server': 'Apache/2.4.7 (Ubuntu)'}, 'reason':
> 'Method Not Allowed', 'data': '<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code></Error>'}
> DEBUG: S3Error: 405 (Method Not Allowed)
> DEBUG: HttpHeader: date: Mon, 13 Apr 2015 03:32:23 GMT
> DEBUG: HttpHeader: accept-ranges: bytes
> DEBUG: HttpHeader: content-type: application/xml
> DEBUG: HttpHeader: content-length: 82
> DEBUG: HttpHeader: server: Apache/2.4.7 (Ubuntu)
> DEBUG: ErrorXML: Code: 'MethodNotAllowed'
> ERROR: S3 error: 405 (MethodNotAllowed):
> --
> 
> But before the upgrade, the same command worked fine.
> I see nothing in the log. Here is my ceph.conf:
> 
> --

Re: [ceph-users] Purpose of the s3gw.fcgi script?

2015-04-13 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, April 13, 2015 5:17:47 PM
> Subject: Re: [ceph-users] Purpose of the s3gw.fcgi script?
> 
> Hi,
> 
> Yehuda Sadeh-Weinraub wrote:
> 
> > You're not missing anything. The script was only needed when we used
> > the process manager of the fastcgi module, but it has been very long
> > since we stopped using it.
> 
> Just to be sure, so if I understand well, these parts of the documentation:
> 
> 1.
> 
> http://docs.ceph.com/docs/master/radosgw/config/#create-a-cgi-wrapper-script
> 2.
> 
> http://docs.ceph.com/docs/master/radosgw/config/#adjust-cgi-wrapper-script-permission
> 
> can be completely skipped. Is it correct?
> 

Yes.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-14 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, April 13, 2015 7:11:49 PM
> Subject: Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to 
> create bucket
> 
> Hi,
> 
> Yehuda Sadeh-Weinraub wrote:
> 
> > The 405 in this case usually means that rgw failed to translate the http
> > hostname header into
> > a bucket name. Do you have 'rgw dns name' set correctly?
> 
> Ah, I have found it, and indeed it concerned "rgw dns name", as Karan
> also thought. ;)
> But it's a little curious. Explanations:
> 
> My s3cmd client use these hostnames (which are well resolved with the IP
> address
> of the radosgw host):
> 
> .ostore.athome.priv
> 
> And in the configuration of my radosgw, I had:
> 
> ---
> [client.radosgw.gw1]
>   host= ceph-radosgw1
>   rgw dns name= ostore
>   ...
> ---
> 
> i.e. just the *short* name of the radosgw's fqdn (its fqdn is
> ostore.athome.priv).
> And with Firefly, it worked well; I never had a problem with this
> configuration!
> But with Hammer, it doesn't work anymore (I don't know why). Now, with
> Hammer,
> I just notice that I have to put the fqdn in "rgw dns name" not the short
> name:
> 
> ---
> [client.radosgw.gw1]
>   host= ceph-radosgw1
>   rgw dns name= ostore.athome.priv
>   ...
> ---
> 
> And with this configuration, it works.
> 
> Is it normal? In fact, maybe my configuration with the short name (instead
> of the fqdn) was not valid and I was just lucky it worked well so far. Is
> that the right conclusion of the story?
> 
> In fact, I think I never have well understood the meaning of the "rgw dns
> name"
> parameter. Can you confirm to me (or not) this:
> 
> This parameter is *only* used when an S3 client accesses a bucket with
> the virtual-hosted method, http://<bucket>.<rgw dns name>/. If we don't set
> this parameter, such access will not work and an S3 client could access a
> bucket only with the path-style method, http://<rgw dns name>/<bucket>/
> 
> Is it correct?

Yes.

Not sure why it *was* working in firefly. We did do some work around this in 
hammer, might have changed the behavior inadvertently.
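
For completeness, the client side chooses between the two styles via its
calling format; a minimal boto sketch (the keys are placeholders, the host is
the one from this thread):

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS',        # placeholder
    aws_secret_access_key='SECRET',    # placeholder
    host='ostore.athome.priv',
    is_secure=False,
    # OrdinaryCallingFormat uses path-style URLs and works without
    # 'rgw dns name'; SubdomainCallingFormat needs it plus wildcard DNS.
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)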

Yehuda

> 
> Thx Yehuda and thx to Karan (who has pointed the real problem in fact ;)).
> 
> --
> François Lafont
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Swift and Ceph

2015-04-23 Thread Yehuda Sadeh-Weinraub
Sounds like you're hitting a known issue that was fixed a while back (although 
might not be fixed on the specific version you're running). Can you try 
creating a second subuser for the same user, see if that one works?
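
For reference, a second subuser is created the same way as the first, e.g.:

radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift2 --access=full
radosgw-admin key create --subuser=testuser1:swift2 --key-type=swift --gen-secret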

Yehuda

- Original Message -
> From: "alistair whittle" 
> To: ceph-users@lists.ceph.com
> Sent: Thursday, April 23, 2015 8:38:44 AM
> Subject: [ceph-users] Swift and Ceph
> 
> 
> 
> All,
> 
> 
> 
> I was hoping for some advice. I have recently built a Ceph cluster on RHEL
> 6.5 and have configured RGW. I want to test Swift API access, and as a
> result have created a user, swift subuser and swift keys as per the output
> below:
> 
> 
> 
> 1. Create user
> 
> 
> 
> radosgw-admin user create --uid="testuser1" --display-name="Test User1"
> 
> { "user_id": "testuser1",
> 
> "display_name": "Test User1",
> 
> "email": "",
> 
> "suspended": 0,
> 
> "max_buckets": 1000,
> 
> "auid": 0,
> 
> "subusers": [],
> 
> "keys": [
> 
> { "user": "testuser1",
> 
> "access_key": "MJBEZLJ7BYG8XODXT71V",
> 
> "secret_key": "tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p"}],
> 
> "swift_keys": [],
> 
> "caps": [],
> 
> "op_mask": "read, write, delete",
> 
> "default_placement": "",
> 
> "placement_tags": [],
> 
> "bucket_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "user_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "temp_url_keys": []}
> 
> 
> 
> 2. Create subuser.
> 
> 
> 
> radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift
> --access=full
> 
> { "user_id": "testuser1",
> 
> "display_name": "Test User1",
> 
> "email": "",
> 
> "suspended": 0,
> 
> "max_buckets": 1000,
> 
> "auid": 0,
> 
> "subusers": [
> 
> { "id": "testuser1:swift",
> 
> "permissions": "full-control"}],
> 
> "keys": [
> 
> { "user": "testuser1:swift",
> 
> "access_key": "HX9Q30EJWCZG825AT7B0",
> 
> "secret_key": ""},
> 
> { "user": "testuser1",
> 
> "access_key": "MJBEZLJ7BYG8XODXT71V",
> 
> "secret_key": "tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p"}],
> 
> "swift_keys": [],
> 
> "caps": [],
> 
> "op_mask": "read, write, delete",
> 
> "default_placement": "",
> 
> "placement_tags": [],
> 
> "bucket_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "user_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "temp_url_keys": []}
> 
> 
> 
> 3. Create key
> 
> 
> 
> radosgw-admin key create --subuser=testuser1:swift --key-type=swift
> --gen-secret
> 
> { "user_id": "testuser1",
> 
> "display_name": "Test User1",
> 
> "email": "",
> 
> "suspended": 0,
> 
> "max_buckets": 1000,
> 
> "auid": 0,
> 
> "subusers": [
> 
> { "id": "testuser1:swift",
> 
> "permissions": "full-control"}],
> 
> "keys": [
> 
> { "user": "testuser1:swift",
> 
> "access_key": "HX9Q30EJWCZG825AT7B0",
> 
> "secret_key": ""},
> 
> { "user": "testuser1",
> 
> "access_key": "MJBEZLJ7BYG8XODXT71V",
> 
> "secret_key": "tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p"}],
> 
> "swift_keys": [
> 
> { "user": "testuser1:swift",
> 
> "secret_key": "KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf"}],
> 
> "caps": [],
> 
> "op_mask": "read, write, delete",
> 
> "default_placement": "",
> 
> "placement_tags": [],
> 
> "bucket_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "user_quota": { "enabled": false,
> 
> "max_size_kb": -1,
> 
> "max_objects": -1},
> 
> "temp_url_keys": []}
> 
> 
> 
> When I try and do anything using the credentials above, I get “Account not
> found” errors as per the example below:
> 
> 
> 
> swift -A https:///auth/1.0 -U testuser1:swift -K
> "KpQCfPLstJhSMsR9qUzY9WfA1ebO4x7VRXkr1KSf" list
> 
> 
> 
> That’s the first thing.
> 
> 
> 
> Secondly, when I follow the process above to create a second user
> “testuser2”, the user and subuser is created, however, when I try and
> generate a swift key for it, I get the following error:
> 
> 
> 
> radosgw-admin key create --subuser=testuser2:swift --key-type=swift
> --gen-secret
> 
> could not create key: unable to add access key, unable to store user info
> 
> 2015-04-23 15:42:38.897090 7f38e157d820 0 WARNING: can't store user info,
> swift id () already mapped to another user (testuser2)
> 
> 
> 
> This suggests there is something wrong with the users or the configuration of
> the gateway somewhere. Can someone provide some advice on what might be
> wrong, or where I can look to find out. I have gone through whatever log
> files I can and don’t see anything of any use at the moment.
> 
> 
> 
> Any help appreciated.
> 
> 
> 
> Thanks
> 
> 
> 
> Alistair
> 
> 

Re: [ceph-users] Swift and Ceph

2015-04-23 Thread Yehuda Sadeh-Weinraub
I think you're hitting issue #8587 (http://tracker.ceph.com/issues/8587). This 
issue has been fixed at 0.80.8, so you might want to upgrade to that version 
(available with ICE 1.2.3).

Yehuda

- Original Message -
> From: "alistair whittle" 
> To: yeh...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Sent: Thursday, April 23, 2015 10:47:28 AM
> Subject: Re: [ceph-users] Swift and Ceph
> 
> Can you explain this a bit more?   You mean try and create a second subuser
> for testuser1 or testuser2?
> 
> As an aside, I am running Ceph 0.80.7 as is packaged with ICE 1.2.2.  I
> believe that is the Firefly release.
> 
> 
> -Original Message-
> From: Yehuda Sadeh-Weinraub [mailto:yeh...@redhat.com]
> Sent: Thursday, April 23, 2015 6:18 PM
> To: Whittle, Alistair: Investment Bank (LDN)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Swift and Ceph
> 
> Sounds like you're hitting a known issue that was fixed a while back
> (although might not be fixed on the specific version you're running). Can
> you try creating a second subuser for the same user, see if that one works?
> 
> Yehuda
> 
> - Original Message -
> > From: "alistair whittle" 
> > To: ceph-users@lists.ceph.com
> > Sent: Thursday, April 23, 2015 8:38:44 AM
> > Subject: [ceph-users] Swift and Ceph
> > 
> > 
> > 
> > All,
> > 
> > 
> > 
> > I was hoping for some advice. I have recently built a Ceph cluster on
> > RHEL
> > 6.5 and have configured RGW. I want to test Swift API access, and as a
> > result have created a user, swift subuser and swift keys as per the
> > output
> > below:
> > 
> > 
> > 
> > 1. Create user
> > 
> > 
> > 
> > radosgw-admin user create --uid="testuser1" --display-name="Test User1"
> > 
> > { "user_id": "testuser1",
> > 
> > "display_name": "Test User1",
> > 
> > "email": "",
> > 
> > "suspended": 0,
> > 
> > "max_buckets": 1000,
> > 
> > "auid": 0,
> > 
> > "subusers": [],
> > 
> > "keys": [
> > 
> > { "user": "testuser1",
> > 
> > "access_key": "MJBEZLJ7BYG8XODXT71V",
> > 
> > "secret_key": "tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p"}],
> > 
> > "swift_keys": [],
> > 
> > "caps": [],
> > 
> > "op_mask": "read, write, delete",
> > 
> > "default_placement": "",
> > 
> > "placement_tags": [],
> > 
> > "bucket_quota": { "enabled": false,
> > 
> > "max_size_kb": -1,
> > 
> > "max_objects": -1},
> > 
> > "user_quota": { "enabled": false,
> > 
> > "max_size_kb": -1,
> > 
> > "max_objects": -1},
> > 
> > "temp_url_keys": []}
> > 
> > 
> > 
> > 2. Create subuser.
> > 
> > 
> > 
> > radosgw-admin subuser create --uid=testuser1 --subuser=testuser1:swift
> > --access=full
> > 
> > { "user_id": "testuser1",
> > 
> > "display_name": "Test User1",
> > 
> > "email": "",
> > 
> > "suspended": 0,
> > 
> > "max_buckets": 1000,
> > 
> > "auid": 0,
> > 
> > "subusers": [
> > 
> > { "id": "testuser1:swift",
> > 
> > "permissions": "full-control"}],
> > 
> > "keys": [
> > 
> > { "user": "testuser1:swift",
> > 
> > "access_key": "HX9Q30EJWCZG825AT7B0",
> > 
> > "secret_key": ""},
> > 
> > { "user": "testuser1",
> > 
> > "access_key": "MJBEZLJ7BYG8XODXT71V",
> > 
> > "secret_key": "tGnsm8JeEgPGAy1MGCKSVVoSIEs8iWNUOgiJ981p"}],
> > 
> > "swift_keys": [],
> > 
> > "caps": [],
> > 
> > "op_mask": "read, write, delete",
> > 
> > "default_placement": "",
> > 
> > "placement_tags": [],
> > 
> > "bucket_quota": { "enabled": false,
> > 
> > "max_size_kb": -1,

Re: [ceph-users] Shadow Files

2015-04-24 Thread Yehuda Sadeh-Weinraub
What version are you running? There are two different issues that we were 
fixing this week, and we should have that upstream pretty soon.

Yehuda

- Original Message -
> From: "Ben" 
> To: "ceph-users" 
> Cc: "Yehuda Sadeh-Weinraub" 
> Sent: Thursday, April 23, 2015 7:42:06 PM
> Subject: [ceph-users] Shadow Files
> 
> We are still experiencing a problem with our gateway not properly
> clearing out shadow files.
> 
> I have done numerous tests where I have:
> -Uploaded a file of 1.5GB in size using s3browser application
> -Done an object stat on the file to get its prefix
> -Done rados ls -p .rgw.buckets | grep  to count the number of
> shadow files associated (in this case it is around 290 shadow files)
> -Deleted said file with s3browser
> -Performed a gc list, which shows the ~290 files listed
> -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep  to
> recount the shadow files only to be left with 290 files still there
> 
>  From log output /var/log/ceph/radosgw.log, I can see the following when
> clicking DELETE (this appears 290 times)
> 2015-04-24 10:43:29.996523 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996557 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996564 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996570 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996576 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996581 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996586 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
> 2015-04-24 10:43:29.996592 7f0b0afb5700  0 RGWObjManifest::operator++():
> result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
> 
> In this same log, I also see the gc process saying it is removing said
> file (these records appear 290 times too)
> 2015-04-23 14:16:27.926952 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.928572 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.929636 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.930448 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.931226 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.932103 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 2015-04-23 14:16:27.933470 7f15be0ee700  0 gc::process: removing
> .rgw.buckets:
> 
> So even though it appears that the GC is processing its removal, the
> shadow files remain!
> 
> Please help!
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-04-24 Thread Yehuda Sadeh-Weinraub
These ones:

http://tracker.ceph.com/issues/10295
http://tracker.ceph.com/issues/11447
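
(For anyone wanting to script the per-object check described below, the
sequence is the one the report itself uses; bucket/object names here are
illustrative, and the prefix is the one 'radosgw-admin object stat' reports
in the manifest:)

radosgw-admin object stat --bucket=mybucket --object=bigfile.bin
rados -p .rgw.buckets ls | grep -c '<prefix>'   # chunks carrying that prefix
radosgw-admin gc list                           # what gc still intends to remove
radosgw-admin gc process                        # trigger a gc pass right away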

- Original Message -
> From: "Ben Jackson" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "ceph-users" 
> Sent: Friday, April 24, 2015 3:06:02 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> We were firefly, then we upgraded to giant, now we are on hammer.
> 
> What issues?
> 
> On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub  wrote:
> >
> > What version are you running? There are two different issues that we were
> > fixing this week, and we should have that upstream pretty soon.
> >
> > Yehuda
> >
> > - Original Message -
> > > From: "Ben" 
> > > To: "ceph-users" 
> > > Cc: "Yehuda Sadeh-Weinraub" 
> > > Sent: Thursday, April 23, 2015 7:42:06 PM
> > > Subject: [ceph-users] Shadow Files
> > > 
> > > We are still experiencing a problem with our gateway not properly
> > > clearing out shadow files.
> > > 
> > > I have done numerous tests where I have:
> > > -Uploaded a file of 1.5GB in size using s3browser application
> > > -Done an object stat on the file to get its prefix
> > > -Done rados ls -p .rgw.buckets | grep  to count the number of
> > > shadow files associated (in this case it is around 290 shadow files)
> > > -Deleted said file with s3browser
> > > -Performed a gc list, which shows the ~290 files listed
> > > -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep  to
> > > recount the shadow files only to be left with 290 files still there
> > > 
> > >  From log output /var/log/ceph/radosgw.log, I can see the following when
> > > clicking DELETE (this appears 290 times)
> > > 2015-04-24 10:43:29.996523 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996557 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996564 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996570 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996576 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996581 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996586 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
> > > 2015-04-24 10:43:29.996592 7f0b0afb5700  0 RGWObjManifest::operator++():
> > > result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
> > > 
> > > In this same log, I also see the gc process saying it is removing said
> > > file (these records appear 290 times too)
> > > 2015-04-23 14:16:27.926952 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.928572 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.929636 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.930448 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.931226 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.932103 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 2015-04-23 14:16:27.933470 7f15be0ee700  0 gc::process: removing
> > > .rgw.buckets:
> > > 
> > > So even though it appears that the GC is processing its removal, the
> > > shadow files remain!
> > > 
> > > Please help!
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-04-25 Thread Yehuda Sadeh-Weinraub

Yeah, that's definitely something that we'd address soon.

Yehuda

- Original Message -
> From: "Ben" 
> To: "Ben Hines" , "Yehuda Sadeh-Weinraub" 
> 
> Cc: "ceph-users" 
> Sent: Friday, April 24, 2015 5:14:11 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Definitely need something to help clear out these old shadow files.
> 
> I'm sure our cluster has around 100TB of these shadow files.
> 
> I've written a script to go through known objects to get prefixes of objects
> that should exist to compare to ones that shouldn't, but the time it takes
> to do this over millions and millions of objects is just too long.
> 
> On 25/04/15 09:53, Ben Hines wrote:
> 
> 
> 
> When these are fixed it would be great to get good steps for listing /
> cleaning up any orphaned objects. I have suspicions this is affecting us.
> 
> thanks-
> 
> -Ben
> 
> On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:
> 
> 
> These ones:
> 
> http://tracker.ceph.com/issues/10295
> http://tracker.ceph.com/issues/11447
> 
> - Original Message -
> > From: "Ben Jackson" 
> > To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> > Cc: "ceph-users" < ceph-us...@ceph.com >
> > Sent: Friday, April 24, 2015 3:06:02 PM
> > Subject: Re: [ceph-users] Shadow Files
> > 
> > We were firefly, then we upgraded to giant, now we are on hammer.
> > 
> > What issues?
> > 
> > On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub < yeh...@redhat.com > wrote:
> > > 
> > > What version are you running? There are two different issues that we were
> > > fixing this week, and we should have that upstream pretty soon.
> > > 
> > > Yehuda
> > > 
> > > - Original Message -
> > > > From: "Ben" 
> > > > To: "ceph-users" < ceph-us...@ceph.com >
> > > > Cc: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> > > > Sent: Thursday, April 23, 2015 7:42:06 PM
> > > > Subject: [ceph-users] Shadow Files
> > > > 
> > > > We are still experiencing a problem with our gateway not properly
> > > > clearing out shadow files.
> > > > 
> > > > I have done numerous tests where I have:
> > > > -Uploaded a file of 1.5GB in size using s3browser application
> > > > -Done an object stat on the file to get its prefix
> > > > -Done rados ls -p .rgw.buckets | grep  to count the number of
> > > > shadow files associated (in this case it is around 290 shadow files)
> > > > -Deleted said file with s3browser
> > > > -Performed a gc list, which shows the ~290 files listed
> > > > -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep 
> > > > to
> > > > recount the shadow files only to be left with 290 files still there
> > > > 
> > > > From log output /var/log/ceph/radosgw.log, I can see the following when
> > > > clicking DELETE (this appears 290 times)
> > > > 2015-04-24 10:43:29.996523 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996557 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996564 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996570 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996576 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996581 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996586 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule->part_size=0
> > > > 2015-04-24 10:43:29.996592 7f0b0afb5700 0 RGWObjManifest::operator++():
> > > > result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule->part_size=0
> > > > 
> > > > In this same log, I 

Re: [ceph-users] Shadow Files

2015-04-27 Thread Yehuda Sadeh-Weinraub
It will get to the ceph mainline eventually. We're still reviewing and testing 
the fix, and there's more work to be done on the cleanup tool.

Yehuda

- Original Message -
> From: "Ben" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "ceph-users" 
> Sent: Sunday, April 26, 2015 11:02:23 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Are these fixes going to make it into the repository versions of ceph,
> or will we be required to compile and install manually?
> 
> On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote:
> > Yeah, that's definitely something that we'd address soon.
> > 
> > Yehuda
> > 
> > - Original Message -
> >> From: "Ben" 
> >> To: "Ben Hines" , "Yehuda Sadeh-Weinraub"
> >> 
> >> Cc: "ceph-users" 
> >> Sent: Friday, April 24, 2015 5:14:11 PM
> >> Subject: Re: [ceph-users] Shadow Files
> >> 
> >> Definitely need something to help clear out these old shadow files.
> >> 
> >> I'm sure our cluster has around 100TB of these shadow files.
> >> 
> >> I've written a script to go through known objects to get prefixes of
> >> objects
> >> that should exist to compare to ones that shouldn't, but the time it
> >> takes
> >> to do this over millions and millions of objects is just too long.
> >> 
> >> On 25/04/15 09:53, Ben Hines wrote:
> >> 
> >> 
> >> 
> >> When these are fixed it would be great to get good steps for listing /
> >> cleaning up any orphaned objects. I have suspicions this is affecting
> >> us.
> >> 
> >> thanks-
> >> 
> >> -Ben
> >> 
> >> On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub <
> >> yeh...@redhat.com >
> >> wrote:
> >> 
> >> 
> >> These ones:
> >> 
> >> http://tracker.ceph.com/issues/10295
> >> http://tracker.ceph.com/issues/11447
> >> 
> >> - Original Message -
> >> > From: "Ben Jackson" 
> >> > To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> >> > Cc: "ceph-users" < ceph-us...@ceph.com >
> >> > Sent: Friday, April 24, 2015 3:06:02 PM
> >> > Subject: Re: [ceph-users] Shadow Files
> >> >
> >> > We were firefly, then we upgraded to giant, now we are on hammer.
> >> >
> >> > What issues?
> >> >
> >> > On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> >> > wrote:
> >> > >
> >> > > What version are you running? There are two different issues that we
> >> > > were
> >> > > fixing this week, and we should have that upstream pretty soon.
> >> > >
> >> > > Yehuda
> >> > >
> >> > > - Original Message -
> >> > > > From: "Ben" 
> >> > > > To: "ceph-users" < ceph-us...@ceph.com >
> >> > > > Cc: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> >> > > > Sent: Thursday, April 23, 2015 7:42:06 PM
> >> > > > Subject: [ceph-users] Shadow Files
> >> > > >
> >> > > > We are still experiencing a problem with our gateway not properly
> >> > > > clearing out shadow files.
> >> > > >
> >> > > > I have done numerous tests where I have:
> >> > > > -Uploaded a file of 1.5GB in size using s3browser application
> >> > > > -Done an object stat on the file to get its prefix
> >> > > > -Done rados ls -p .rgw.buckets | grep  to count the number
> >> > > > of
> >> > > > shadow files associated (in this case it is around 290 shadow files)
> >> > > > -Deleted said file with s3browser
> >> > > > -Performed a gc list, which shows the ~290 files listed
> >> > > > -Waited 24 hours to redo the rados ls -p .rgw.buckets | grep
> >> > > > 
> >> > > > to
> >> > > > recount the shadow files only to be left with 290 files still there
> >> > > >
> >> > > > From log output /var/log/ceph/radosgw.log, I can see the following
> >> > > > when
> >> > > > clicking DELETE (this appears 290 times)
> >> > > > 2015-04-24 10:43:29.996523 7f0b0afb57

Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

2015-04-28 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Sean" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, April 28, 2015 2:52:35 PM
> Subject: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb
> logs stop after rotation
> 
> Hey yall!
> 
> I have a weird issue and I am not sure where to look so any help would
> be appreciated. I have a large ceph giant cluster that has been stable
> and healthy almost entirely since its inception. We have stored over
> 1.5PB into the cluster currently through RGW and everything seems to be
> functioning great. We have downloaded smaller objects without issue but
> last night we did a test on our largest file (almost 1 terabyte) and it
> continuously times out at almost the exact same place. Investigating
> further it looks like Civetweb/RGW is returning that the uploads
> completed even though the objects are truncated. At least when we
> download the objects they seem to be truncated.
> 
> I have tried searching through the mailing list archives to see what may
> be going on but it looks like the mailing list DB may be going through
> some mainenance:
> 
> 
> Unable to read word database file
> '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
> 
> 
> After checking through the gzipped logs I see that civetweb just stops
> logging after a rotation for some reason as well and my last log is from
> the 28th of march. I tried manually running /etc/init.d/radosgw reload
> but this didn't seem to work. As running the download again could take
> all day to error out we instead use the range request to try and pull
> the missing bytes.
> 
> https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- there is the
> code we are using to download via S3 / boto as well as the returned size
> report and overview of our issue.
> http://pastebin.com/cVLdQBMF -- Here is some of the log from the civetweb
> server they are hitting.
> 
> Here is our current config ::
> http://pastebin.com/2SGfSDYG
> 
> Current output of ceph health::
> http://pastebin.com/3f6iJEbu
> 
> I am thinking that this must be a civetweb/radosgw bug of some kind. My
> questions are: 1) is there a way to download the object via rados
> directly? I am guessing I will need to find the prefix and then just cat
> all of the pieces together and hope I get it right. 2) Why would ceph say
> the upload went fine but then return a smaller object?
> 
> 


Note that the returned HTTP response is 206 (partial content):
/var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700  2 
req 0:1.067030:s3:GET 
/tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http
 status=206

It'll only return that if partial content is requested (through the HTTP Range
header). It's really hard to tell from these logs whether there's any actual
problem. I suggest bumping up the log level (debug ms = 1, debug rgw = 20) and
taking a look at an entire request (one that includes all the request HTTP
headers).
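
For reference, a ranged GET is easy to reproduce from boto; a minimal sketch
(endpoint, credentials, bucket and object names are all placeholders):

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='KEY', aws_secret_access_key='SECRET',
    host='rgw.example.com',
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

key = conn.get_bucket('mybucket').get_key('myobject')
whole = key.get_contents_as_string()                                  # -> 200
part = key.get_contents_as_string(headers={'Range': 'bytes=0-1023'})  # -> 206
print('full: %d bytes, ranged: %d bytes' % (len(whole), len(part)))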

Yehuda



> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

2015-05-02 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Sean" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Friday, May 1, 2015 6:47:09 PM
> Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects; 
> civetweb logs stop after rotation
> 
> Hey there,
> 
> Sorry for the delay. I have been moving apartments UGH. Our dev team
> found out how to quickly identify these files that are downloading a
> smaller size::
> 
> iterate through all of the objects in a bucket, take key.size for each
> item, and compare it to conn.get_bucket().get_key().size for the same
> key. Where the sizes differ, the keys correspond exactly to the objects
> that seem to have missing pieces in ceph.
> 
> The differences always seem to be multiples of 512k as well, which is
> really odd.
> 
> ==
> http://pastebin.com/R34wF7PB
> ==
> 
> My main question is why are these sizes different at all? Shouldn't they
> be exactly the same? Why are they off by multiples of 512k as well?
> Finally I need a way to rule out that this is a ceph issue and the only
> way I can think of is grabbing a list of all of the data files and
> concatenating them together in order in hopes that the manifest is wrong
> and I get the whole file.
> 
> For example::
> 
> implicit size 7745820218 explicit size 7744771642. Absolute
> 1048576; name =
> 86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam
> 
> I explicitly called one of the gateways and then piped the output to a
> text file while downloading this bam:
> 
> https://drive.google.com/file/d/0B16pfLB7yY6GcTZXalBQM3RHT0U/view?usp=sharing
> (25 Mb of text)
> 
> As we can see above, Ceph is saying that the size is 7745820218 bytes
> somewhere but when we download it we get 7744771642 bytes. If I download

There are two different things: the bucket index, and the object manifest. The 
bucket index has the former, and the object manifest specifies the latter.
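
For reference, the comparison described above can be scripted with boto; a
minimal sketch (endpoint, credentials and bucket name are placeholders):

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='KEY', aws_secret_access_key='SECRET',
    host='rgw.example.com',
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

bucket = conn.get_bucket('mybucket')
for listed in bucket.list():              # size here comes from the bucket index
    headed = bucket.get_key(listed.name)  # size here comes from a HEAD request
    if listed.size != headed.size:
        print('%s: index=%d head=%d' % (listed.name, listed.size, headed.size))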

> the object I get a 7744771642 byte file. Finally if I do a range request
> of all of the bytes from 7744771642 to the end I get a cannot complete
> request::
> 
> 
> http://pastebin.com/CVvmex4m -- traceback of the python range request.
> http://pastebin.com/4sd1Jc0G -- the radoslog of the range request
> 
> If I request the file with a shorter range (say 7744771642 -2 bytes
> (7744771640)) I am left with just a 2 byte file::
> 
> http://pastebin.com/Sn7Y0t9G -- range request of file - 2 bytes to end
> of file.
> lacadmin@kh10-9:~$ ls -lhab 7gtest-range.bam
> -rw-r--r-- 1 lacadmin lacadmin 2 Feb 24 01:00 7gtest-range.bam
> 
> 
> I think that rados-gw may not be keeping track of multipart chunk
> errors, possibly? How did rados get the original and correct file size,
> and why is it short when it returns the actual chunks? Finally, why are
> the corrupt / missing chunks always a multiple of 512K? I do not see
> anything obvious that is set to 512K on the configuration/user side.
> 
> 
> Sorry for the questions and babbling, but I am at a loss as to how to
> address this.

So, the question is which is correct, the index, or the object itself. Do you 
have any way to know which one is the correct one? Also, does it only happen to 
you with very large objects? Does it happen with every such object (e.g., > 
4GBs)?

Here's some extra information you could gather:

 - Get the object manifest:

$ radosgw-admin object stat --bucket=<bucket> --object=<object>

 - Get status for each rados object to the corresponding logical rgw object:

First, identify the object names that correspond to this specific rgw object. 
From the manifest you'd get a 'prefix', which is a random hash that all tail 
objects should contain. Then you should do something like:

$ rados -p <pool> ls | grep $prefix

And then, for each object:

$ rados -p <pool> stat $object

There's also the head object that you'd want to inspect (named after the actual 
rgw object name, grep it too).
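
If you want to script those steps, here's a minimal sketch (pool, bucket and
object names are placeholders; the exact nesting of the stat JSON may differ
by version, so adjust the 'prefix' lookup as needed):

import json
import subprocess

bucket, obj, pool = 'mybucket', 'myobject', '.rgw.buckets'

stat = json.loads(subprocess.check_output(
    'radosgw-admin object stat --bucket=%s --object=%s' % (bucket, obj),
    shell=True))
prefix = stat['manifest']['prefix']   # adjust if your version nests it elsewhere

for name in subprocess.check_output('rados -p %s ls' % pool,
                                    shell=True).splitlines():
    if prefix in name:
        print(subprocess.check_output('rados -p %s stat %s' % (pool, name),
                                      shell=True).strip())
# the head object is named after the rgw object itself; check it separately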

HTH,
Yehuda

> 
> 
> 
> 
> 
> On 04/28/2015 05:03 PM, Yehuda Sadeh-Weinraub wrote:
> >
> > - Original Message -
> >> From: "Sean" 
> >> To: ceph-users@lists.ceph.com
> >> Sent: Tuesday, April 28, 2015 2:52:35 PM
> >> Subject: [ceph-users] Civet RadosGW S3 not storing complete objects;
> >> civetweb logs stop after rotation
> >>
> >> Hey yall!
> >>
> >> I have a weird issue and I am not sure where to look so any help would
> >> be appreciated. I have a large ceph giant cluster that has been stable
> >> and healthy almost entirely since its inception. We have stored over
> >> 1.5PB into the clu

Re: [ceph-users] Shadow Files

2015-05-04 Thread Yehuda Sadeh-Weinraub

I've been working on a new tool that would detect leaked rados objects. It will 
take some time for it to be merged into an official release, or even into the 
master branch, but if anyone likes to play with it, it is in the 
wip-rgw-orphans branch.

At the moment I recommend to not remove any object that the tool reports, but 
rather move it to a different pool for backup (using the rados tool cp command).

The tool works in a few stages:
(1) list all the rados objects in the specified pool, store in repository
(2) list all bucket instances in the system, store in repository
(3) iterate through bucket instances in repository, list (logical) objects, for 
each object store the expected rados objects that build it
(4) compare data from (1) and (3); for each object that is in (1) but not in
(3), stat it, and if it is older than $start_time - $stale_period, report it

There are lots of things that can go wrong with this, so we really need to
be careful here.
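
Conceptually, stage (4) boils down to a set difference plus an age check; a
rough sketch of the idea (not the tool's actual code, all names made up):

import time

def report_stale_orphans(listed, expected, stat_mtime, stale_secs,
                         start_time=None):
    # listed: rados object names gathered in stage (1)
    # expected: rados object names computed in stage (3)
    # stat_mtime: callable returning an object's mtime (e.g. via 'rados stat')
    start = start_time if start_time is not None else time.time()
    for name in sorted(set(listed) - set(expected)):
        if stat_mtime(name) < start - stale_secs:
            yield name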

The tool can be run by the following command:

$ radosgw-admin orphans find --pool=<data pool> --job-id=<job name>
[--num-shards=<num shards>] [--orphan-stale-secs=<seconds>]

The tool can be stopped and restarted, and it will continue from the stage
where it stopped. Note that some of the stages will restart from their own
beginning, due to system limitations (specifically stages 1 and 2).

In order to clean up a job's data:

$ radosgw-admin orphans finish --job-id=<job name>

Note that the jobs run in the radosgw-admin process context; they do not
schedule a job on the radosgw process.

Please let me know of any issue you find.

Thanks,
Yehuda

- Original Message -
> From: "Ben Hines" 
> To: "Ben" 
> Cc: "Yehuda Sadeh-Weinraub" , "ceph-users" 
> 
> Sent: Thursday, April 30, 2015 3:00:16 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Going to hold off on our 94.1 update for this issue
> 
> Hopefully this can make it into a 94.2 or a v95 git release.
> 
> -Ben
> 
> On Mon, Apr 27, 2015 at 2:32 PM, Ben < b@benjackson.email > wrote:
> 
> 
> How long are you thinking here?
> 
> We added more storage to our cluster to overcome these issues, and we can't
> keep throwing storage at it until the issues are fixed.
> 
> 
> On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote:
> 
> 
> It will get to the ceph mainline eventually. We're still reviewing and
> testing the fix, and there's more work to be done on the cleanup tool.
> 
> Yehuda
> 
> - Original Message -
> 
> 
> From: "Ben" 
> To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> Cc: "ceph-users" < ceph-us...@ceph.com >
> Sent: Sunday, April 26, 2015 11:02:23 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Are these fixes going to make it into the repository versions of ceph,
> or will we be required to compile and install manually?
> 
> On 2015-04-26 02:29, Yehuda Sadeh-Weinraub wrote:
> 
> 
> Yeah, that's definitely something that we'd address soon.
> 
> Yehuda
> 
> - Original Message -
> 
> 
> From: "Ben" 
> To: "Ben Hines" < bhi...@gmail.com >, "Yehuda Sadeh-Weinraub"
> < yeh...@redhat.com >
> Cc: "ceph-users" < ceph-us...@ceph.com >
> Sent: Friday, April 24, 2015 5:14:11 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Definitely need something to help clear out these old shadow files.
> 
> I'm sure our cluster has around 100TB of these shadow files.
> 
> I've written a script to go through known objects to get prefixes of
> objects
> that should exist to compare to ones that shouldn't, but the time it
> takes
> to do this over millions and millions of objects is just too long.
> 
> On 25/04/15 09:53, Ben Hines wrote:
> 
> 
> 
> When these are fixed it would be great to get good steps for listing /
> cleaning up any orphaned objects. I have suspicions this is affecting
> us.
> 
> thanks-
> 
> -Ben
> 
> On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub <
> yeh...@redhat.com >
> wrote:
> 
> 
> These ones:
> 
> http://tracker.ceph.com/issues/10295
> http://tracker.ceph.com/issues/11447
> 
> - Original Message -
> 
> 
> From: "Ben Jackson" 
> To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> Cc: "ceph-users" < ceph-us...@ceph.com >
> Sent: Friday, April 24, 2015 3:06:02 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> We were firefly, then we upgraded to giant, now we are on hammer.
> 
> What issues?
> 
> On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:
> 
> 
> What version are you running? There are two different issues that

Re: [ceph-users] Shadow Files

2015-05-05 Thread Yehuda Sadeh-Weinraub
Can you try creating the .log pool?
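
For example, a minimal python-rados sketch (the ceph.conf path and admin
credentials are assumptions):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    if not cluster.pool_exists('.log'):
        cluster.create_pool('.log')
finally:
    cluster.shutdown()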

Yehuda

- Original Message -
> From: "Anthony Alba" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Ben" , "ceph-users" 
> Sent: Tuesday, May 5, 2015 3:37:15 AM
> Subject: Re: [ceph-users] Shadow Files
> 
> ...sorry clicked send too quickly
> 
> /opt/ceph/bin/radosgw-admin orphans find --pool=.rgw.buckets --job-id=abcd
> ERROR: failed to open log pool ret=-2
> job not found
> 
> On Tue, May 5, 2015 at 6:36 PM, Anthony Alba  wrote:
> > Hi Yehuda,
> >
> > First run:
> >
> > /opt/ceph/bin/radosgw-admin  --pool=.rgw.buckets --job-id=testing
> > ERROR: failed to open log pool ret=-2
> > job not found
> >
> > Do I have to precreate some pool?
> >
> >
> > On Tue, May 5, 2015 at 8:17 AM, Yehuda Sadeh-Weinraub 
> > wrote:
> >>
> >> I've been working on a new tool that would detect leaked rados objects. It
> >> will take some time for it to be merged into an official release, or even
> >> into the master branch, but if anyone likes to play with it, it is in the
> >> wip-rgw-orphans branch.
> >>
> >> At the moment I recommend to not remove any object that the tool reports,
> >> but rather move it to a different pool for backup (using the rados tool
> >> cp command).
> >>
> >> The tool works in a few stages:
> >> (1) list all the rados objects in the specified pool, store in repository
> >> (2) list all bucket instances in the system, store in repository
> >> (3) iterate through bucket instances in repository, list (logical)
> >> objects, for each object store the expected rados objects that build it
> >> (4) compare data from (1) and (3), each object that is in (1), but not in
> >> (3), stat, if older than $start_time - $stale_period, report it
> >>
> >> There can be lot's of things that can go wrong with this, so we really
> >> need to be careful here.
> >>
> >> The tool can be run by the following command:
> >>
> >> $ radosgw-admin orphans find --pool=<data pool> --job-id=<job name>
> >> [--num-shards=<num shards>] [--orphan-stale-secs=<seconds>]
> >>
> >> The tool can be stopped, and restarted, and it will continue from the
> >> stage where it stopped. Note that some of the stages will restart from
> >> the beginning (of the stages), due to system limitation (specifically 1,
> >> 2).
> >>
> >> In order to clean up a job's data:
> >>
> >> $ radosgw-admin orphans finish --job-id=<job name>
> >>
> >> Note that the jobs run in the radosgw-admin process context, it does not
> >> schedule a job on the radosgw process.
> >>
> >> Please let me know of any issue you find.
> >>
> >> Thanks,
> >> Yehuda
> >>
> >> - Original Message -
> >>> From: "Ben Hines" 
> >>> To: "Ben" 
> >>> Cc: "Yehuda Sadeh-Weinraub" , "ceph-users"
> >>> 
> >>> Sent: Thursday, April 30, 2015 3:00:16 PM
> >>> Subject: Re: [ceph-users] Shadow Files
> >>>
> >>> Going to hold off on our 94.1 update for this issue
> >>>
> >>> Hopefully this can make it into a 94.2 or a v95 git release.
> >>>
> >>> -Ben
> >>>
> >>> On Mon, Apr 27, 2015 at 2:32 PM, Ben < b@benjackson.email > wrote:
> >>>
> >>>
> >>> How long are you thinking here?
> >>>
> >>> We added more storage to our cluster to overcome these issues, and we
> >>> can't
> >>> keep throwing storage at it until the issues are fixed.
> >>>
> >>>
> >>> On 28/04/15 01:49, Yehuda Sadeh-Weinraub wrote:
> >>>
> >>>
> >>> It will get to the ceph mainline eventually. We're still reviewing and
> >>> testing the fix, and there's more work to be done on the cleanup tool.
> >>>
> >>> Yehuda
> >>>
> >>> - Original Message -
> >>>
> >>>
> >>> From: "Ben" 
> >>> To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> >>> Cc: "ceph-users" < ceph-us...@ceph.com >
> >>> Sent: Sunday, April 26, 2015 11:02:23 PM
> >>> Subject: Re: [ceph-users] Shadow Files
> >>>
> >>> Are these fixes going to make it into the repository versions of ceph,
> >>> or will we be required to compil

Re: [ceph-users] Shadow Files

2015-05-05 Thread Yehuda Sadeh-Weinraub
Yes, so it seems. The librados::nobjects_begin() call expects at least a Hammer 
(0.94) backend. Probably need to add a try/catch there to catch this issue, and 
maybe see if using a different API would be more compatible with older
backends.

Yehuda

- Original Message -
> From: "Anthony Alba" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Ben" , "ceph-users" 
> Sent: Tuesday, May 5, 2015 10:14:38 AM
> Subject: Re: [ceph-users] Shadow Files
> 
> Unfortunately it immediately aborted (running against a 0.80.9 Ceph).
> Does Ceph also have to be a 0.94 level?
> 
> last error was
>-3> 2015-05-06 01:11:11.710947 7f311dd15880  0 run(): building
> index of all objects in pool
> -2> 2015-05-06 01:11:11.710995 7f311dd15880  1 --
> 10.200.3.92:0/1001510 --> 10.200.3.32:6800/1870 --
> osd_op(client.4065115.0:27 ^A/ [pgnls start_epoch 0] 11.0 ack+read
> +known_if_redirected e952) v5 -- ?+0 0x39a4e80 con 0x39a4aa0
> -1> 2015-05-06 01:11:11.712125 7f31026f4700  1 --
> 10.200.3.92:0/1001510 <== osd.1 10.200.3.32:6800/1870 1 
> osd_op_reply(27  [pgnls start_epoch 0] v934'6252 uv6252
> ondisk = -22 ((22) Invalid argument)) v6  167+0+0 (3260127617 0 0)
> 0x7f30c4000a90 con 0x39a4aa0
>  0> 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal
> (Aborted) **
>  in thread 7f311dd15880
> 
> 
> 
> 
> 
> 2015-05-06 01:11:11.710947 7f311dd15880  0 run(): building index of
> all objects in pool
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  rados returned (22) Invalid argument
> *** Caught signal (Aborted) **
>  in thread 7f311dd15880
>  ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6)
>  1: radosgw-admin() [0x61e604]
>  2: (()+0xf130) [0x7f311a59f130]
>  3: (gsignal()+0x37) [0x7f31195d85d7]
>  4: (abort()+0x148) [0x7f31195d9cc8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f3119edc9b5]
>  6: (()+0x5e926) [0x7f3119eda926]
>  7: (()+0x5e953) [0x7f3119eda953]
>  8: (()+0x5eb73) [0x7f3119edab73]
>  9: (()+0x4d116) [0x7f311b606116]
>  10: (librados::IoCtx::nobjects_begin()+0x2e) [0x7f311b60c60e]
>  11: (RGWOrphanSearch::build_all_oids_index()+0x62) [0x516a02]
>  12: (RGWOrphanSearch::run()+0x1e3) [0x51ad23]
>  13: (main()+0xa430) [0x4fbc30]
>  14: (__libc_start_main()+0xf5) [0x7f31195c4af5]
>  15: radosgw-admin() [0x5028d9]
> 2015-05-06 01:11:11.712652 7f311dd15880 -1 *** Caught signal (Aborted) **
>  in thread 7f311dd15880
> 
>  ceph version 0.94-1339-gc905d51 (c905d517c2c778a88b006302996591b60d167cb6)
>  1: radosgw-admin() [0x61e604]
>  2: (()+0xf130) [0x7f311a59f130]
> 
> 
> 
> 
> 
> 
> On Tue, May 5, 2015 at 10:41 PM, Yehuda Sadeh-Weinraub
>  wrote:
> > Can you try creating the .log pool?
> >
> > Yehda
> >
> > - Original Message -
> >> From: "Anthony Alba" 
> >> To: "Yehuda Sadeh-Weinraub" 
> >> Cc: "Ben" , "ceph-users" 
> >> Sent: Tuesday, May 5, 2015 3:37:15 AM
> >> Subject: Re: [ceph-users] Shadow Files
> >>
> >> ...sorry clicked send to quickly
> >>
> >> /opt/ceph/bin/radosgw-admin orphans find --pool=.rgw.buckets --job-id=abcd
> >> ERROR: failed to open log pool ret=-2
> >> job not found
> >>
> >> On Tue, May 5, 2015 at 6:36 PM, Anthony Alba 
> >> wrote:
> >> > Hi Yehuda,
> >> >
> >> > First run:
> >> >
> >> > /opt/ceph/bin/radosgw-admin  --pool=.rgw.buckets --job-id=testing
> >> > ERROR: failed to open log pool ret=-2
> >> > job not found
> >> >
> >> > Do I have to precreate some pool?
> >> >
> >> >
> >> > On Tue, May 5, 2015 at 8:17 AM, Yehuda Sadeh-Weinraub
> >> > 
> >> > wrote:
> >> >>
> >> >> I've been working on a new tool that would detect leaked rados objects.
> >> >> It
> >> >> will take some time for it to be merged into an official release, or
> >> >> even
> >> >> into the master branch, but if anyone likes to play with it, it is in
> >> >> the
> >> >> wip-rgw-orphans branch.
> >> >>
> >> >> At the moment I recommend to not remove any object that the tool
> >> >> reports,
> >> >> but rather move it to a different pool for backup (using the rados tool
> >> >> cp command).
> >> >>
> >> >> The tool works in a few stages:
> >> >> (1) list 

Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

2015-05-06 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Sean" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Tuesday, May 5, 2015 12:14:19 PM
> Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects; 
> civetweb logs stop after rotation
> 
> 
> 
> Hello Yehuda and the rest of the mailing list.
> 
> 
> My main question currently is why are the bucket index and the object
> manifest ever different? Based on how we are uploading data I do not think
> that the rados gateway should ever know the full file size without having
> all of the objects within ceph at one point in time. So after the multipart
> is marked as completed Rados gateway should cat through all of the objects
> and make a complete part, correct?

That's what *should* happen, but obviously there's some bug there.

> 
> 
> 
> Secondly,
> 
> I think I am not understanding the process to grab all of the parts
> correctly. To continue to use my example file
> "86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam"
> in bucket tcga_cghub_protected. I would be using the following to grab the
> prefix:
> 
> 
> prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected
> --object=86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam
> | grep -iE '"prefix"' | awk -F"\"" '{print $4}')
> 
> 
> Which should take everything between quotes for the prefix key and give me
> the value.
> 
> 
> In this case::
> 
> "prefix":
> "86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S",
> 
> 
> So
> 
> lacadmin@kh10-9:~$ echo ${prefix}
> 
> 86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S
> 
> 
> From here I list all of the objects in the .rgw.buckets pool and grep for
> said prefix, which yields 1335 objects. From here, if I cat all of these
> objects together I only end up with a 5468160 byte file which is 2G short of
> what the object manifest says it should be. If I grab the file and tail the
> Rados gateway log I end up with 1849 objects and when I sum them all up I

How are these objects named?

> end up with 7744771642 which is the same size that the manifest reports. I
> understand that this does nothing other than verify the manifests accuracy
> but I still find it interesting. The missing chunks may still exist in ceph
> outside of the object manifest and tagged with the same prefix, correct? Or
> am I misunderstanding something?

Either it's missing a chunk, or one of the objects is truncated. Can you stat 
all the parts? I expect most of the objects to have two different sizes (e.g., 
4MB, 1MB), but it is likely that the last part is smaller, and maybe there is
another object that is missing 512k.
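
To make an odd-sized part stand out quickly, something like this sketch can
tally the sizes (pool and prefix are placeholders):

import re
import subprocess
from collections import Counter

pool, prefix = '.rgw.buckets', 'PREFIX-FROM-THE-MANIFEST'

sizes = Counter()
for name in subprocess.check_output(['rados', '-p', pool, 'ls']).splitlines():
    if prefix in name:
        out = subprocess.check_output(['rados', '-p', pool, 'stat', name])
        sizes[int(re.search(r'size (\d+)', out).group(1))] += 1

for size, count in sizes.most_common():
    print('%d object(s) of %d bytes' % (count, size))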

> 
> 
> We have over 40384 files in the tcga_cghub_protected bucket and only 66 of
> these files are suffering from this truncation issue. What I need to know
> is: is this happening on the gateway side or on the client side? Next I need
> to know what possible actions can occur where the bucket index and the
> object manifest would be mismatched like this as 40318 out of 40384 are
> working without issue.
> 
> 
> The truncated files are of all different sizes (5 megabytes - 980 gigabytes)
> and the truncation seems to be all over. By "all over" I mean some files are
> missing the first few bytes that should read "bam" and some are missing
> parts in the middle.

Can you give an example of an object manifest for a broken object, and all the 
rados objects that build it (e.g., the output of 'rados stat' on these 
objects). A smaller object might be easier.

> 
> 
> So our upload code is using mmap to stream chunks of the file to the Rados
> gateway via a multipart upload but no where on the client side do we have a
> direct reference to the files we are using nor do we specify the size in
> anyway. So where is the gateway getting the correct complete filesize from
> and how is the bucket index showing the intended file size?
> 
> 
> This implies that, at some point in time, ceph was able to see all of the
> parts of the file and calculate the correct total size. This to me seems
> like a rados gateway bug regardless of how the file is being uploaded. I
> think that the RGW should be able to be fuzzed and still store the data
> correctly.
> 
> 
> Why is the bucket list not matching the bucket index and how can I verify
> that the data is not being corrupted by the RGW or worse, after it is
> committed to ceph ?

That's what we're trying to find out.

Thanks,
Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW - Can't download complete object

2015-05-07 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "Sean" 
> To: ceph-users@lists.ceph.com
> Sent: Thursday, May 7, 2015 3:35:14 PM
> Subject: [ceph-users] RGW - Can't download complete object
> 
> I have another thread goign on about truncation of objects and I believe
> this is a separate but equally bad issue in civetweb/radosgw. My cluster
> is completely healthy
> 
> I have one (possibly more) objects stored in ceph rados gateway that
> will return a different size every time I try to download it::
> 
> http://pastebin.com/hK1iqXZH --- ceph -s
> http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object

The two interesting things that I see here are:
 - the multipart upload size for each part is on the big side (is it 1GB for 
each part?)
 - it seems that there are a lot of parts that suffered from retries, which
could be a source of the 512k issue

> http://pastebin.com/5TnvgMrX --- python download code
> 
> The weird part is every time I download the file it is of a different
> size. I am grabbing the individual objects of the 14g file and will
> update this email once I have them all statted out. Currently I am
> getting, on average, 1.5G to 2Gb files when the total object should be
> 14G in size.
> 
> lacadmin@kh10-9:~$ python corruptpull.py
> the download failed. The filesize = 2125988202. The actual size is
> 14577056082. Attempts = 1
> the download failed. The filesize = 2071462250. The actual size is
> 14577056082. Attempts = 2
> the download failed. The filesize = 2016936298. The actual size is
> 14577056082. Attempts = 3
> the download failed. The filesize = 1643643242. The actual size is
> 14577056082. Attempts = 4
> the download failed. The filesize = 1597505898. The actual size is
> 14577056082. Attempts = 5
> the download failed. The filesize = 2075656554. The actual size is
> 14577056082. Attempts = 6
> the download failed. The filesize = 650117482. The actual size is
> 14577056082. Attempts = 7
> the download failed. The filesize = 1987576170. The actual size is
> 14577056082. Attempts = 8
> the download failed. The filesize = 2109210986. The actual size is
> 14577056082. Attempts = 9
> the download failed. The filesize = 2142765418. The actual size is
> 14577056082. Attempts = 10
> the download failed. The filesize = 2134376810. The actual size is
> 14577056082. Attempts = 11
> the download failed. The filesize = 2146959722. The actual size is
> 14577056082. Attempts = 12
> the download failed. The filesize = 2142765418. The actual size is
> 14577056082. Attempts = 13
> the download failed. The filesize = 1467482474. The actual size is
> 14577056082. Attempts = 14
> the download failed. The filesize = 2046296426. The actual size is
> 14577056082. Attempts = 15
> the download failed. The filesize = 2021130602. The actual size is
> 14577056082. Attempts = 16
> the download failed. The filesize = 177366. The actual size is
> 14577056082. Attempts = 17
> the download failed. The filesize = 2146959722. The actual size is
> 14577056082. Attempts = 18
> the download failed. The filesize = 2016936298. The actual size is
> 14577056082. Attempts = 19
> the download failed. The filesize = 1983381866. The actual size is
> 14577056082. Attempts = 20
> the download failed. The filesize = 2134376810. The actual size is
> 14577056082. Attempts = 21
> 
> Notice it is always different. Once the rados -p .rgw.buckets ls | grep
> finishes I will return the listing of objects as well but this is quite
> odd and I think this is a separate issue.
> 
> Has anyone seen this before? Why wouldn't radosgw return an error and
> why am I getting different file sizes?

Usually that means that there was some error in the middle of the download,
maybe a client-to-radosgw communication issue. What does the radosgw log show
when this happens?

> 
> I would post the log from radosgw but I don't see any "err|wrn|fatal"
> mentions in the log and the client completes without issue every time.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] civetweb lockups

2015-05-11 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "Daniel Hoffman" 
> To: "ceph-users" 
> Sent: Sunday, May 10, 2015 10:54:21 PM
> Subject: [ceph-users] civetweb lockups

> Hi All.

> We have a weird issue where civetweb just locks up: it fails to respond
> to HTTP and a restart resolves the problem. This happens anywhere from every
> 60 seconds to every 4 hours with no reason behind it.

> We have run the gateway in full debug mode and there is nothing there that
> seems to be an issue.

> We run 2 gateways on 6core machines, there is no load, cpu or memory wise,
> the machines seem fine. They are load balanced behind HA proxy. We run 12
> data nodes at the moment with ~170 disks.

> We see around the 40-60MB/s into the array. Is this just too much for
> civetweb to handle? Should we look at virtual machines on the hardware/mode
> nodes?

> [client.radosgw.ceph-obj02]
> host = ceph-obj02
> keyring = /etc/ceph/keyring.radosgw.ceph-obj02
> rgw socket path = /tmp/radosgw.sock
> log file = /var/log/ceph/radosgw.log
> rgw data = /var/lib/ceph/radosgw/ceph-obj02
> rgw thread pool size = 1024
> rgw print continue = False
> debug rgw = 0
> debug ms = 0
> rgw enable ops log = False
> log to stderr = False
> rgw enable usage log = False

> Advice appreciated.

Not sure what the issue would be. I'd look at the number of threads, maybe try
reducing it and see if it makes any difference. Also, try to see how many open
fds there are when it hangs.
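
For example, a quick sketch to watch the gateway's fd count (the pid is a
placeholder):

import os

def open_fds(pid):
    # each entry in /proc/<pid>/fd is one open file descriptor
    return len(os.listdir('/proc/%d/fd' % pid))

print('radosgw open fds: %d' % open_fds(12345))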

Yehuda 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-05-11 Thread Yehuda Sadeh-Weinraub
- Original Message -

> From: "Daniel Hoffman" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Ben" , "ceph-users" 
> Sent: Sunday, May 10, 2015 5:03:22 PM
> Subject: Re: [ceph-users] Shadow Files

> Any updates on when this is going to be released?

> Daniel

> On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:

> > Yes, so it seems. The librados::nobjects_begin() call expects at least a
> > Hammer (0.94) backend. Probably need to add a try/catch there to catch this
> > issue, and maybe see if using a different api would be better compatible
> > with older backends.
> 

> > Yehuda
> 

I cleaned up the commits a bit, but it needs to be reviewed, and it'll be nice
to get some more testing on it before it goes into an official release. There's
still the issue of running it against a firefly backend. I looked at
backporting it to firefly, but it's not going to be trivial work, so I think
a better use of time would be to get the hammer one to work against a firefly
backend. There are some librados API quirks that we need to flush out first.

Yehuda 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-05-11 Thread Yehuda Sadeh-Weinraub
It's the wip-rgw-orphans branch.

- Original Message -
> From: "Daniel Hoffman" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Ben" , "David Zafman" , 
> "ceph-users" 
> Sent: Monday, May 11, 2015 4:30:11 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Thanks.
> 
> Can you please let me know the suitable/best git version/tree to be pulling
> to compile and use this feature/patch?
> 
> Thanks
> 
> On Tue, May 12, 2015 at 4:38 AM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> From: "Daniel Hoffman" < daniel.hoff...@13andrew.com >
> To: "Yehuda Sadeh-Weinraub" < yeh...@redhat.com >
> Cc: "Ben" , "ceph-users" < ceph-us...@ceph.com >
> Sent: Sunday, May 10, 2015 5:03:22 PM
> Subject: Re: [ceph-users] Shadow Files
> 
> Any updates on when this is going to be released?
> 
> Daniel
> 
> On Wed, May 6, 2015 at 3:51 AM, Yehuda Sadeh-Weinraub < yeh...@redhat.com >
> wrote:
> 
> 
> Yes, so it seems. The librados::nobjects_begin() call expects at least a
> Hammer (0.94) backend. Probably need to add a try/catch there to catch this
> issue, and maybe see if using a different api would be better compatible
> with older backends.
> 
> Yehuda
> I cleaned up the commits a bit, but it needs to be reviewed, and it'll be
> nice to get some more testing to it before it goes on an official release.
> There's still the issue of running it against a firefly backend. I looked at
> backporting it to firefly, but it's not going to be a trivial work, so I
> think the better time usage would be to get the hammer one to work against a
> firefly backend. There are some librados api quirks that we need to flush
> out first.
> 
> Yehuda
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

2015-05-12 Thread Yehuda Sadeh-Weinraub
Hi,

Thank you for a very thorough investigation. See my comments below:

- Original Message -
> From: "Mark Murphy" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: "Sean Sullivan" , ceph-users@lists.ceph.com
> Sent: Tuesday, May 12, 2015 10:50:49 AM
> Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects; 
> civetweb logs stop after rotation
> 
> Hey Yehuda,
> 
> I work with Sean on the dev side. We thought we should put together a short
> report on what we’ve been seeing in the hopes that the behavior might make
> some sense to you.
> 
> We had originally noticed these issues a while ago with our first iteration
> of this particular Ceph deployment. The issues we had seen were
> characterized by two different behaviors:
> 
>   • Some objects would appear truncated, returning different sizes for 
> each
>   request. Repeated attempts would eventually result in a successful
>   retrieval if the second behavior doesn’t apply.

This really sounds like some kind of networking issue; maybe a load balancer
along the way is clobbering things?

>   • Some objects would always appear truncated, missing an integer 
> multiple of
>   512KB.
> 
> This is where the report that we are encountering ‘truncation’ came from,
> which is slightly misleading. We recently verified that we are indeed
> encountering the first behavior, for which I believe Sean has supplied or
> will be supplying Ceph logs showcasing the server-side errors, and is true
> truncation. However, the second behavior is not really truncation, but
> missing 512KB chunks, as Sean has brought up.
> 
> We’ve had some luck with identifying some of the patterns that are seemingly
> related to this issue. Without going into too great of detail, we’ve found
> the following appear to hold true for all objects affected by the second
> behavior:
> 
>   • The amount of data missing is always in integer multiples of 512KB.
>   • The expected file size is always found via the bucket index.
>   • Ceph objects do not appear to be missing chunks or have holes in them.
>   • The missing 512KB chunks are always at the beginning of multipart 
> segments
>   (1GB in our case).

This matches some of my original suspicions. Here's some basic background that 
might help clarify things:

This looks like some kind of rgw bug. A radosgw object is usually composed of 
two different parts: the object head, and the object tail. The head is usually 
composed of the first 512k of data of the object (and never more than that), 
and the tail has the rest of the object's data. However, the head data part is 
optional, and it can be zero. For example, in the case of multipart upload, 
after combining the parts, the head will not have any data, and the tail will 
be compiled out of the different parts' data.
However, when dealing with multipart parts, the parts do not really have a head
(due to their immutability), so the part objects are expected to be 4MB. So it
seems that for some reason these specific parts were treated as if they had a
head, although they shouldn't have. Now, that brings me to the issue: I noticed
that some of the parts were retried. When this happens, the part name is
different from the default part name, so there's a note in the manifest and
special handling that starts at specific offsets. It might be that this is
related, and that the code handling the retries generates bad object parts.
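
To illustrate the layout described above, a rough model only (it assumes the
512KB head cap and 4MB stripes mentioned here; the real parameters are
configurable and version-dependent):

HEAD_MAX = 512 * 1024          # head holds at most the first 512KB of data
STRIPE = 4 * 1024 * 1024       # tail data is striped into 4MB rados objects

def expected_pieces(total_size, head_has_data=True):
    # multipart uploads end up with a zero-data head; plain uploads don't
    pieces, remaining = [], total_size
    head = min(HEAD_MAX, remaining) if head_has_data else 0
    pieces.append(('head', head))
    remaining -= head
    while remaining > 0:
        chunk = min(STRIPE, remaining)
        pieces.append(('shadow', chunk))
        remaining -= chunk
    return pieces

print(expected_pieces(10 * 1024 * 1024))         # head + three tail pieces
print(expected_pieces(10 * 1024 * 1024, False))  # zero-data head, tail only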



>   • For large files missing multiple chunks, the segments affected appear 
> to
>   be clustered and contiguous.
> 

That would point at a cluster of retries, maybe due to networking issues around 
the time these were created.

> The first pattern was identified when we noticed that the bucket index and
> the object manifest differed in reported size. This is useful as a quick
> method of identifying affected objects. We’ve used this to avoid having to
> pull down and check each object individually. In total, we have 108 affected
> objects, which translates to approximately 0.25% of our S3 objects.
> 
> We noticed that the bucket index always reports the object size that would be
> expected had the upload gone correctly. Since we only ever report the
> segment sizes to the gateway, this would suggest that the segment sizes were
> reported accurately and aggregated correctly server side.
> 
> Sean identified the Ceph objects that compose one of our affected S3 objects.
> We thought we might see the first Ceph object missing some data, but found
> it to be a full 4MB. Retrieving the first Ceph object and comparing it to
> the bytes in the corresponding file, it appears that the Ceph object matches
> the 4MB of the file after the first 512KB. We t

Re: [ceph-users] Civet RadosGW S3 not storing complete obects; civetweb logs stop after rotation

2015-05-12 Thread Yehuda Sadeh-Weinraub
I opened issue #11604, and have a fix for the issue. I updated our test suite 
to cover the specific issue that you were hitting. We'll backport the fix to 
both hammer and firefly soon.

Thanks!
Yehuda

- Original Message -
> From: "Yehuda Sadeh-Weinraub" 
> To: "Mark Murphy" 
> Cc: ceph-users@lists.ceph.com, "Sean Sullivan" 
> Sent: Tuesday, May 12, 2015 12:59:48 PM
> Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects; 
> civetweb logs stop after rotation
> 
> Hi,
> 
> Thank you for a very thorough investigation. See my comments below:
> 
> - Original Message -
> > From: "Mark Murphy" 
> > To: "Yehuda Sadeh-Weinraub" 
> > Cc: "Sean Sullivan" , ceph-users@lists.ceph.com
> > Sent: Tuesday, May 12, 2015 10:50:49 AM
> > Subject: Re: [ceph-users] Civet RadosGW S3 not storing complete objects;
> > civetweb logs stop after rotation
> > 
> > Hey Yehuda,
> > 
> > I work with Sean on the dev side. We thought we should put together a short
> > report on what we’ve been seeing in the hopes that the behavior might make
> > some sense to you.
> > 
> > We had originally noticed these issues a while ago with our first iteration
> > of this particular Ceph deployment. The issues we had seen were
> > characterized by two different behaviors:
> > 
> > • Some objects would appear truncated, returning different sizes for 
> > each
> > request. Repeated attempts would eventually result in a successful
> > retrieval if the second behavior doesn’t apply.
> 
> This really sound like some kind of networking issue, maybe a load balancer
> that is on the way that clobbers things?
> 
> > • Some objects would always appear truncated, missing an integer 
> > multiple
> > of
> > 512KB.
> > 
> > This is where the report that we are encountering ‘truncation’ came from,
> > which is slightly misleading. We recently verified that we are indeed
> > encountering the first behavior, for which I believe Sean has supplied or
> > will be supplying Ceph logs showcasing the server-side errors, and is true
> > truncation. However, the second behavior is not really truncation, but
> > missing 512KB chunks, as Sean has brought up.
> > 
> > We’ve had some luck with identifying some of the patterns that are
> > seemingly
> > related to this issue. Without going into too great of detail, we’ve found
> > the following appear to hold true for all objects affected by the second
> > behavior:
> > 
> > • The amount of data missing is always in integer multiples of 512KB.
> > • The expected file size is always found via the bucket index.
> > • Ceph objects do not appear to be missing chunks or have holes in them.
> > • The missing 512KB chunks are always at the beginning of multipart
> > segments
> > (1GB in our case).
> 
> This matches some of my original suspicions. Here's some basic background
> that might help clarify things:
> 
> This looks like some kind of rgw bug. A radosgw object is usually composed of
> two different parts: the object head, and the object tail. The head is
> usually composed of the first 512k of data of the object (and never more
> than that), and the tail has the rest of the object's data. However, the
> head data part is optional, and it can be zero. For example, in the case of
> multipart upload, after combining the parts, the head will not have any
> data, and the tail will be compiled out of the different parts data.
> However, when dealing with multipart parts, the parts do not really have a
> head (due to their immutability), so it is expected that the part object
> sizes to be 4MB. So it seems that for some reason these specific parts were
> treated as if they had a head, although they shouldn't have. Now, that
> brings me to the issue, where I noticed that some of the parts were retried.
> When this happens, the part name is different than the default part name, so
> there's a note in the manifest, and a special handling that start at
> specific offsets. It might be that this is related, and the code that
> handles the retries generate bad object parts.
> 
> 
> 
> > • For large files missing multiple chunks, the segments affected appear 
> > to
> > be clustered and contiguous.
> > 
> 
> That would point at a cluster of retries, maybe due to networking issues
> around the time these were created.
> 
> > The first pattern was identified when we noticed that the bucket index and
> > the object manifest differed in reporte

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
That's another interesting issue. Note that for part 12_80 the manifest 
specifies (I assume, by the messenger log) this part:

default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')

whereas it seems that you do have the original part:
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
(note the '2/...')

The part that the manifest specifies does not exist, which makes me think that 
there is some weird upload sequence, something like:

 - client uploads part, upload finishes but client does not get ack for it
 - client retries (second upload)
 - client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest; I'll need to take a look at
the code. Could such a sequence happen with the client that you're using to 
upload?

Yehuda

- Original Message -
> From: "Sean Sullivan" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 2:07:22 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
> 
> Sorry for the delay. It took me a while to figure out how to do a range
> request and append the data to a single file. The good news is that the end
> file seems to be 14G in size, which matches the file's manifest size. The bad
> news is that the file is completely corrupt and the radosgw log has errors.
> I am using the following code to perform the download::
> 
> https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
> 
> Here is a clip of the log file::
> --
> 2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> osd.11 10.64.64.101:6809/942707 5  osd_op_reply(74566287
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12
> [read 0~858004] v0'0 uv41308 ondisk = 0) v6  304+0+858004 (1180387808 0
> 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io
> completion ofs=12934184960 len=858004
> 2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> osd.45 10.64.64.101:6845/944590 2  osd_op_reply(74566142
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 
> 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
> 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io
> completion ofs=12145655808 len=4194304
> 
> 2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error when
> trying to read object: -2
> 
> 2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> osd.21 10.64.64.102:6856/1133473 16  osd_op_reply(74566144
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12
> [read 0~3671316] v0'0 uv41395 ondisk = 0) v6  304+0+3671316 (1695485150
> 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
> 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io
> completion ofs=10786701312 len=3671316
> 2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> osd.82 10.64.64.103:6857/88524 2  osd_op_reply(74566283
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8
> [read 0~4194304] v0'0 uv41566 ondisk = 0) v6  303+0+4194304 (1474509283
> 0 3209869954) 0x7f53d005b1a0 con 0x7f56f81b1420
> 2015-05-11 15:28:52.504118 7f57067fc700 20 get_obj_aio_completion_cb: io
> completion ofs=12917407744 len=4194304
> 
> I couldn't really find any good documentation on how fragments/files are
> laid out on the object file system, so I am not sure where the file will
> be. How could the 4MB object have issues but the cluster be completely
> healthy? I did do the rados stat of each object inside ceph and they all
> appear to be there::
> 
> http://paste.ubuntu.com/8561/
> 
> The sum of all of the objects :: 14584887282
> The stat of the object inside ceph:: 14577056082
> 
> So for some reason I have more data in objects than the key manifest. We
> easily identified this object via the same method as in the other thread I
> have::
> 
> for key in keys:
>: if ( key.name ==
>'b235040a-46b6-42b3-b134-962b1f8813d5/28357709e4

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
Ok, I dug a bit more, and it seems to me that the problem is with the manifest 
that was created. I was able to reproduce a similar issue (opened ceph bug 
#11622), for which I also have a fix.

I created new tests to cover this issue, and we'll get those recent fixes in as
soon as we can, after we test for any regressions.

Thanks,
Yehuda

- Original Message -
> From: "Yehuda Sadeh-Weinraub" 
> To: "Sean Sullivan" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 2:33:07 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
> 
> That's another interesting issue. Note that for part 12_80 the manifest
> specifies (I assume, by the messenger log) this part:
> 
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')
> 
> whereas it seems that you do have the original part:
> default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
> (note the '2/...')
> 
> The part that the manifest specifies does not exist, which makes me think
> that there is some weird upload sequence, something like:
> 
>  - client uploads part, upload finishes but client does not get ack for it
>  - client retries (second upload)
>  - client gets ack for the first upload and gives up on the second one
> 
> But I'm not sure if it would explain the manifest, I'll need to take a look
> at the code. Could such a sequence happen with the client that you're using
> to upload?
> 
> Yehuda
> 
> - Original Message -
> > From: "Sean Sullivan" 
> > To: "Yehuda Sadeh-Weinraub" 
> > Cc: ceph-users@lists.ceph.com
> > Sent: Wednesday, May 13, 2015 2:07:22 PM
> > Subject: Re: [ceph-users] RGW - Can't download complete object
> > 
> > Sorry for the delay. It took me a while to figure out how to do a range
> > request and append the data to a single file. The good news is that the end
> > file seems to be 14G in size which matches the files manifest size. The bad
> > news is that the file is completely corrupt and the radosgw log has errors.
> > I am using the following code to perform the download::
> > 
> > https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
> > 
> > Here is a clip of the log file::
> > --
> > 2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> > osd.11 10.64.64.101:6809/942707 5  osd_op_reply(74566287
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12
> > [read 0~858004] v0'0 uv41308 ondisk = 0) v6  304+0+858004 (1180387808 0
> > 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> > 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io
> > completion ofs=12934184960 len=858004
> > 2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> > osd.45 10.64.64.101:6845/944590 2  osd_op_reply(74566142
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> > [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 
> > 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
> > 2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io
> > completion ofs=12145655808 len=4194304
> > 
> > 2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error when
> > trying to read object: -2
> > 
> > 2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> > osd.21 10.64.64.102:6856/1133473 16  osd_op_reply(74566144
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12
> > [read 0~3671316] v0'0 uv41395 ondisk = 0) v6  304+0+3671316 (1695485150
> > 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
> > 2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io
> > completion ofs=10786701312 len=3671316
> > 2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/108 <==
> > osd.82 10.64.64.103:6857/88524 2  osd_op_reply(74566283
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_8
> > [read 0~4194304] v0'0 uv41566 ondisk = 0) v6  303+0+4194304 (1474509283
> > 0 3209869954) 0x7f5

Re: [ceph-users] RGW - Can't download complete object

2015-05-13 Thread Yehuda Sadeh-Weinraub
The code is in wip-11620, and it's currently on top of the next branch. We'll 
get it through the tests, then get it into hammer and firefly. I wouldn't 
recommend installing it in production without proper testing first.

Yehuda

- Original Message -
> From: "Sean Sullivan" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 7:22:10 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
> 
> Thank you so much Yehuda! I look forward to testing these. Is there a way
> for me to pull this code in? Is it in master?
> 
> 
> On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub  wrote:
> 
> > Ok, I dug a bit more, and it seems to me that the problem is with the
> > manifest that was created. I was able to reproduce a similar issue (opened
> > ceph bug #11622), for which I also have a fix.
> >
> > I created new tests to cover this issue, and we'll get those recent fixes
> > as soon as we can, after we test for any regressions.
> >
> > Thanks,
> > Yehuda
> >
> > - Original Message -
> > > From: "Yehuda Sadeh-Weinraub" 
> > > To: "Sean Sullivan" 
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Wednesday, May 13, 2015 2:33:07 PM
> > > Subject: Re: [ceph-users] RGW - Can't download complete object
> > >
> > > That's another interesting issue. Note that for part 12_80 the manifest
> > > specifies (I assume, by the messenger log) this part:
> > >
> > > 
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
> > > (note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')
> > >
> > > whereas it seems that you do have the original part:
> > > 
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80
> > > (note the '2/...')
> > >
> > > The part that the manifest specifies does not exist, which makes me think
> > > that there is some weird upload sequence, something like:
> > >
> > >  - client uploads part, upload finishes but client does not get ack for
> > >  it
> > >  - client retries (second upload)
> > >  - client gets ack for the first upload and gives up on the second one
> > >
> > > But I'm not sure if it would explain the manifest, I'll need to take a
> > > look
> > > at the code. Could such a sequence happen with the client that you're
> > > using
> > > to upload?
> > >
> > > Yehuda
> > >
> > > - Original Message -
> > > > From: "Sean Sullivan" 
> > > > To: "Yehuda Sadeh-Weinraub" 
> > > > Cc: ceph-users@lists.ceph.com
> > > > Sent: Wednesday, May 13, 2015 2:07:22 PM
> > > > Subject: Re: [ceph-users] RGW - Can't download complete object
> > > >
> > > > Sorry for the delay. It took me a while to figure out how to do a range
> > > > request and append the data to a single file. The good news is that the
> > > > end
> > > > file seems to be 14G in size which matches the files manifest size. The
> > > > bad
> > > > news is that the file is completely corrupt and the radosgw log has
> > > > errors.
> > > > I am using the following code to perform the download::
> > > >
> > > > 
> > https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
> > > >
> > > > Here is a clip of the log file::
> > > > --
> > > > 2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/108
> > > > <==
> > > > osd.11 10.64.64.101:6809/942707 5  osd_op_reply(74566287
> > > > 
> > default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12
> > > > [read 0~858004] v0'0 uv41308 ondisk = 0) v6  304+0+858004
> > > > (1180387808 0
> > > > 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
> > > > 2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb:
> > > > io
> > > > completion ofs=12934184960 len=858004
> > > > 2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/108
> > > > <==
> > > > osd.45 10.64.64.101:6845/944590 2  osd_op_reply(74566

Re: [ceph-users] radosgw crash within libfcgi

2015-06-24 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "GuangYang" 
> To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com
> Sent: Wednesday, June 24, 2015 10:09:58 AM
> Subject: radosgw crash within libfcgi
> 
> Hello Cephers,
> Recently we have had several radosgw daemon crashes, with the same
> kernel log each time:
> 
> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
> 7ffa069996f2 sp 7ff55c432710 error 6 in
> libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000]
> 
> Looking at the assembly, it seems to be crashing at this point -
> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
> confused me. I tried to see if there is any other reference holding the
> FCGX_Request that releases the handle, without any luck.
> 
> There are also other observations:
>  1> Several radosgw daemons across different hosts crashed around the same
>  time.
>  2> Apache's error log has some fcgi errors complaining about ##idle timeout##
>  during that time.
> 
> Does anyone experience similar issue?
> 

In the past we've had issues with libfcgi that were related to the number of 
open fds on the process (> 1024). The issue was a buggy libfcgi that was using 
select() instead of poll(), so this might be the issue you're noticing.
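
A quick way to check whether a running radosgw is anywhere near select()'s
FD_SETSIZE limit of 1024 is to count its open descriptors (a sketch; it
assumes Linux /proc and a single radosgw process on the host):

    # count the gateway's open file descriptors
    ls /proc/$(pidof radosgw)/fd | wc -l
    # compare against the per-process limit
    grep 'open files' /proc/$(pidof radosgw)/limits

Once the count passes FD_SETSIZE, the FD_SET() macro in a select()-based
libfcgi writes past the end of the fd_set, so the resulting crashes can look
arbitrary.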

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw crash within libfcgi

2015-06-24 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "GuangYang" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> Sent: Wednesday, June 24, 2015 1:53:20 PM
> Subject: RE: radosgw crash within libfcgi
> 
> Thanks Yehuda for the response.
> 
> We already patched libfcgi to use poll instead of select to overcome the
> limitation.
> 
> Thanks,
> Guang
> 
> 
> 
> > Date: Wed, 24 Jun 2015 14:40:25 -0400
> > From: yeh...@redhat.com
> > To: yguan...@outlook.com
> > CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> > Subject: Re: radosgw crash within libfcgi
> >
> >
> >
> > - Original Message -
> >> From: "GuangYang" 
> >> To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com,
> >> yeh...@redhat.com
> >> Sent: Wednesday, June 24, 2015 10:09:58 AM
> >> Subject: radosgw crash within libfcgi
> >>
> >> Hello Cephers,
> >> Recently we have had several radosgw daemon crashes, all with the following
> >> kernel log:
> >>
> >> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
> >> 7ffa069996f2 sp 7ff55c432710 error 6 in

error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault. Is 
the pointer actually invalid?

Yehuda


> >> libfcgi.so.0.0.0[7ffa06995000+a000] in libfcgi.so.0.0.0[7ffa06995000+a000]
> >>
> >> Looking at the assembly, it seems to be crashing at this point -
> >> http://github.com/sknown/fcgi/blob/master/libfcgi/fcgiapp.c#L2035, which
> >> confused me. I tried to see if there is any other reference holding the
> >> FCGX_Request that releases the handle, without any luck.
> >>
> >> There are also other observations:
> >> 1> Several radosgw daemons across different hosts crashed around the same
> >> time.
> >> 2> Apache's error log has some fcgi error complaining ##idle timeout##
> >> during the time.
> >>
> >> Does anyone experience similar issue?
> >>
> >
> > In the past we've had issues with libfcgi that were related to the number
> > of open fds on the process (> 1024). The issue was a buggy libfcgi that
> > was using select() instead of poll(), so this might be the issue you're
> > noticing.
> >
> > Yehuda
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw crash within libfcgi

2015-06-24 Thread Yehuda Sadeh-Weinraub


- Original Message -
> From: "GuangYang" 
> To: "Yehuda Sadeh-Weinraub" 
> Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> Sent: Wednesday, June 24, 2015 2:12:23 PM
> Subject: RE: radosgw crash within libfcgi
> 
> 
> > Date: Wed, 24 Jun 2015 17:04:05 -0400
> > From: yeh...@redhat.com
> > To: yguan...@outlook.com
> > CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> > Subject: Re: radosgw crash within libfcgi
> >
> >
> >
> > - Original Message -
> >> From: "GuangYang" 
> >> To: "Yehuda Sadeh-Weinraub" 
> >> Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> >> Sent: Wednesday, June 24, 2015 1:53:20 PM
> >> Subject: RE: radosgw crash within libfcgi
> >>
> >> Thanks Yehuda for the response.
> >>
> >> We already patched libfcgi to use poll instead of select to overcome the
> >> limitation.
> >>
> >> Thanks,
> >> Guang
> >>
> >>
> >> 
> >>> Date: Wed, 24 Jun 2015 14:40:25 -0400
> >>> From: yeh...@redhat.com
> >>> To: yguan...@outlook.com
> >>> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> >>> Subject: Re: radosgw crash within libfcgi
> >>>
> >>>
> >>>
> >>> - Original Message -
> >>>> From: "GuangYang" 
> >>>> To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com,
> >>>> yeh...@redhat.com
> >>>> Sent: Wednesday, June 24, 2015 10:09:58 AM
> >>>> Subject: radosgw crash within libfcgi
> >>>>
> >>>> Hello Cephers,
> >>>> Recently we have had several radosgw daemon crashes, all with the following
> >>>> kernel log:
> >>>>
> >>>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
> >>>> 7ffa069996f2 sp 7ff55c432710 error 6 in
> >
> > error 6 is sigabrt, right? With invalid pointer I'd expect to get segfault.
> > Is the pointer actually invalid?
> With (ip - {address_load_the_shared_library}) to get the instruction which
> caused this crash, the objdump shows the crash happened at instruction 46f2
> (see below), which assigns -1 to FCGX_Request::ipcFd, but I
> don't quite understand how/why it could crash there.
> 
> 4690 :
>     4690:       48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
>     4695:       48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
>     469a:       48 83 ec 18             sub    $0x18,%rsp
>     469e:       48 85 ff                test   %rdi,%rdi
>     46a1:       48 89 fb                mov    %rdi,%rbx
>     46a4:       89 f5                   mov    %esi,%ebp
>     46a6:       74 28                   je     46d0 
>     46a8:       48 8d 7f 08             lea    0x8(%rdi),%rdi
>     46ac:       e8 67 e3 ff ff          callq  2a18 
>     46b1:       48 8d 7b 10             lea    0x10(%rbx),%rdi
>     46b5:       e8 5e e3 ff ff          callq  2a18 
>     46ba:       48 8d 7b 18             lea    0x18(%rbx),%rdi
>     46be:       e8 55 e3 ff ff          callq  2a18 
>     46c3:       48 8d 7b 28             lea    0x28(%rbx),%rdi
>     46c7:       e8 d4 f4 ff ff          callq  3ba0 
>     46cc:       85 ed                   test   %ebp,%ebp
>     46ce:       75 10                   jne    46e0 
>     46d0:       48 8b 5c 24 08          mov    0x8(%rsp),%rbx
>     46d5:       48 8b 6c 24 10          mov    0x10(%rsp),%rbp
>     46da:       48 83 c4 18             add    $0x18,%rsp
>     46de:       c3                      retq
>     46df:       90                      nop
>     46e0:       31 f6                   xor    %esi,%esi
>     46e2:       83 7b 4c 00             cmpl   $0x0,0x4c(%rbx)
>     46e6:       8b 7b 30                mov    0x30(%rbx),%edi
>     46e9:       40 0f 94 c6             sete   %sil
>     46ed:       e8 86 e6 ff ff          callq  2d78 
>     46f2:       c7 43 30 ff ff ff ff    movl   $0xffffffff,0x30(%rbx)

info registers?

Not too familiar with the specific message, but it could be that OS_IpcClose() 
aborts (not highly unlikely) and it only dumps the return address of the 
current function (shouldn't be referenced as ip though).

What's rbx? Is the memory at %rbx + 0x30 valid?
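
A sketch of how those questions could be answered from a core dump (the
binary and core paths below are placeholders):

    gdb /usr/bin/radosgw /var/crash/core.radosgw.68180
    (gdb) info registers           # what is %rbx?
    (gdb) x/4gx $rbx + 0x30        # is the memory at %rbx+0x30 mapped?
    (gdb) info proc mappings       # confirm where libfcgi.so.0.0.0 was loaded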

Also, did you by any chance upgrade the binaries while the code was running? is 
the code running over nfs?

Yehuda

> >
> > Yehuda
> >
> >
> >>>> libfcgi.so.0

Re: [ceph-users] radosgw crash within libfcgi

2015-06-24 Thread Yehuda Sadeh-Weinraub
Also, looking at the code, I see an extra call to FCGX_Finish_r():

diff --git a/src/rgw/rgw_main.cc b/src/rgw/rgw_main.cc
index 9a8aa5f..0aa7ded 100644
--- a/src/rgw/rgw_main.cc
+++ b/src/rgw/rgw_main.cc
@@ -669,8 +669,6 @@ void RGWFCGXProcess::handle_request(RGWRequest *r)
 dout(20) << "process_request() returned " << ret << dendl;
   }
 
-  FCGX_Finish_r(fcgx);
-
   delete req;
 }
 

Maybe this is a problem on the specific libfcgi version that you're using?

- Original Message -
> From: "Yehuda Sadeh-Weinraub" 
> To: "GuangYang" 
> Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> Sent: Wednesday, June 24, 2015 2:21:04 PM
> Subject: Re: radosgw crash within libfcgi
> 
> 
> 
> - Original Message -
> > From: "GuangYang" 
> > To: "Yehuda Sadeh-Weinraub" 
> > Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> > Sent: Wednesday, June 24, 2015 2:12:23 PM
> > Subject: RE: radosgw crash within libfcgi
> > 
> > 
> > > Date: Wed, 24 Jun 2015 17:04:05 -0400
> > > From: yeh...@redhat.com
> > > To: yguan...@outlook.com
> > > CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> > > Subject: Re: radosgw crash within libfcgi
> > >
> > >
> > >
> > > - Original Message -
> > >> From: "GuangYang" 
> > >> To: "Yehuda Sadeh-Weinraub" 
> > >> Cc: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com
> > >> Sent: Wednesday, June 24, 2015 1:53:20 PM
> > >> Subject: RE: radosgw crash within libfcgi
> > >>
> > >> Thanks Yehuda for the response.
> > >>
> > >> We already patched libfcgi to use poll instead of select to overcome the
> > >> limitation.
> > >>
> > >> Thanks,
> > >> Guang
> > >>
> > >>
> > >> 
> > >>> Date: Wed, 24 Jun 2015 14:40:25 -0400
> > >>> From: yeh...@redhat.com
> > >>> To: yguan...@outlook.com
> > >>> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> > >>> Subject: Re: radosgw crash within libfcgi
> > >>>
> > >>>
> > >>>
> > >>> - Original Message -
> > >>>> From: "GuangYang" 
> > >>>> To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com,
> > >>>> yeh...@redhat.com
> > >>>> Sent: Wednesday, June 24, 2015 10:09:58 AM
> > >>>> Subject: radosgw crash within libfcgi
> > >>>>
> > >>>> Hello Cephers,
> > >>>> Recently we have had several radosgw daemon crashes, all with the following
> > >>>> kernel log:
> > >>>>
> > >>>> Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip
> > >>>> 7ffa069996f2 sp 7ff55c432710 error 6 in
> > >
> > > error 6 is sigabrt, right? With invalid pointer I'd expect to get
> > > segfault.
> > > Is the pointer actually invalid?
> > With (ip - {address_load_the_shared_library}) to get the instruction which
> > caused this crash, the objdump shows the crash happened at instruction 46f2
> > (see below), which assigns -1 to FCGX_Request::ipcFd, but I
> > don't quite understand how/why it could crash there.
> > 
> > 4690 :
> >     4690:       48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
> >     4695:       48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
> >     469a:       48 83 ec 18             sub    $0x18,%rsp
> >     469e:       48 85 ff                test   %rdi,%rdi
> >     46a1:       48 89 fb                mov    %rdi,%rbx
> >     46a4:       89 f5                   mov    %esi,%ebp
> >     46a6:       74 28                   je     46d0 
> >     46a8:       48 8d 7f 08             lea    0x8(%rdi),%rdi
> >     46ac:       e8 67 e3 ff ff          callq  2a18 
> >     46b1:       48 8d 7b 10             lea    0x10(%rbx),%rdi
> >     46b5:       e8 5e e3 ff ff          callq  2a18 
> >     46ba:       48 8d 7b 18             lea    0x18(%rbx),%rdi
> >     46be:       e8 55 e3 ff ff          callq  2a18 
> >     46c3:       48 8d 7b 28             lea    0x28(%rbx),%rdi
> >     46c7:       e8 d4 f4 ff ff          callq  3ba0 
> >     46cc:       85 ed                   test   %ebp,%ebp

Re: [ceph-users] S3:Permissions of access-key

2015-08-28 Thread Yehuda Sadeh-Weinraub
On Fri, Aug 28, 2015 at 2:17 AM, Zhengqiankun  wrote:
> hi,Yehuda:
>
>   I have a question and hope that you can help me answer it. Different
> subuser of swift
>
>   can set specific permissions, but why not set specific permission for
> access-key of s3?
>

Probably because no one ever asked for it. It shouldn't be hard to do;
this sounds like an easy starter project if anyone wants to get their
hands dirty in the rgw code. Note that the canonical way to do it in
S3 is through user policies that we don't (yet?) support.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Yehuda Sadeh-Weinraub
As long as you're 100% sure that the prefix is only being used for the
specific bucket that was previously removed, then it is safe to remove
these objects. But please do double check and make sure that there's
no other bucket that matches this prefix somehow.

Yehuda

On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines  wrote:
> No input, eh? (or maybe TL;DR for everyone)
>
> Short version: Presuming the bucket index shows blank/empty, which it
> does and is fine, would me manually deleting the rados objects with
> the prefix matching the former bucket's ID cause any problems?
>
> thanks,
>
> -Ben
>
> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines  wrote:
>> Ceph 0.93->94.2->94.3
>>
>> I noticed my pool used data amount is about twice the bucket used data count.
>>
>> This bucket was emptied long ago. It has zero objects:
>> "globalcache01",
>> {
>> "bucket": "globalcache01",
>> "pool": ".rgw.buckets",
>> "index_pool": ".rgw.buckets.index",
>> "id": "default.8873277.32",
>> "marker": "default.8873277.32",
>> "owner": "...",
>> "ver": "0#12348839",
>> "master_ver": "0#0",
>> "mtime": "2015-03-08 11:44:11.00",
>> "max_marker": "0#",
>> "usage": {
>> "rgw.none": {
>> "size_kb": 0,
>> "size_kb_actual": 0,
>> "num_objects": 0
>> },
>> "rgw.main": {
>> "size_kb": 0,
>> "size_kb_actual": 0,
>> "num_objects": 0
>> }
>> },
>> "bucket_quota": {
>> "enabled": false,
>> "max_size_kb": -1,
>> "max_objects": -1
>> }
>> },
>>
>>
>>
>> bucket check shows nothing:
>>
>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>> --bucket=globalcache01 --fix
>> []
>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>> --check-head-obj-locator --bucket=globalcache01 --fix
>> {
>> "bucket": "globalcache01",
>> "check_objects": [
>> ]
>> }
>>
>>
>> However, i see a lot of data for it on an OSD (all shadow files with
>> escaped underscores)
>>
>> [root@sm-cld-mtl-008 current]# find . -name default.8873277.32* -print
>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c
>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c
>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c
>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c
>> 
>>
>> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+'
>> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47
>> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6
>> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3
>> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1
>> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16
>> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2
>> 
>>
>> Is there still a bug in the fix obj locator command perhaps? I suppose
>> can just do something like:
>>
>>rados -p .rgw.buckets cleanup --prefix default.8873277.32
>>
>> Since i want to destroy the bucket anyway, but if this affects other
>> buckets, i may want to clean those a better way.
>>
>> -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Yehuda Sadeh-Weinraub
Make sure you use the underscore also, e.g., "default.8873277.32_".
Otherwise you could potentially erase objects you didn't intend to,
like ones that start with "default.8873277.320" and such.
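
One way to do the deletion safely, using the prefix from this thread (a
sketch: list first, sanity-check the result, then delete; victims.txt is
just a scratch file):

    # note the trailing underscore in the pattern, per the warning above
    rados -p .rgw.buckets ls | grep '^default\.8873277\.32_' > victims.txt
    wc -l victims.txt              # eyeball the count before deleting
    while read -r obj; do
        rados -p .rgw.buckets rm "$obj"
    done < victims.txt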

On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines  wrote:
> Ok. I'm not too familiar with the inner workings of RGW, but i would
> assume that for a bucket with these parameters:
>
>"id": "default.8873277.32",
>"marker": "default.8873277.32",
>
> That it would be the only bucket using the files that start with
> "default.8873277.32"
>
> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15
> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2
>
>
>
> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub
>  wrote:
>> As long as you're 100% sure that the prefix is only being used for the
>> specific bucket that was previously removed, then it is safe to remove
>> these objects. But please do double check and make sure that there's
>> no other bucket that matches this prefix somehow.
>>
>> Yehuda
>>
>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines  wrote:
> >>> No input, eh? (or maybe TL;DR for everyone)
>>>
>>> Short version: Presuming the bucket index shows blank/empty, which it
>>> does and is fine, would me manually deleting the rados objects with
>>> the prefix matching the former bucket's ID cause any problems?
>>>
>>> thanks,
>>>
>>> -Ben
>>>
>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines  wrote:
>>>> Ceph 0.93->94.2->94.3
>>>>
>>>> I noticed my pool used data amount is about twice the bucket used data 
>>>> count.
>>>>
>>>> This bucket was emptied long ago. It has zero objects:
>>>> "globalcache01",
>>>> {
>>>> "bucket": "globalcache01",
>>>> "pool": ".rgw.buckets",
>>>> "index_pool": ".rgw.buckets.index",
>>>> "id": "default.8873277.32",
>>>> "marker": "default.8873277.32",
>>>> "owner": "...",
>>>> "ver": "0#12348839",
>>>> "master_ver": "0#0",
>>>> "mtime": "2015-03-08 11:44:11.00",
>>>> "max_marker": "0#",
>>>> "usage": {
>>>> "rgw.none": {
>>>> "size_kb": 0,
>>>> "size_kb_actual": 0,
>>>> "num_objects": 0
>>>> },
>>>> "rgw.main": {
>>>> "size_kb": 0,
>>>> "size_kb_actual": 0,
>>>> "num_objects": 0
>>>> }
>>>> },
>>>> "bucket_quota": {
>>>> "enabled": false,
>>>> "max_size_kb": -1,
>>>> "max_objects": -1
>>>> }
>>>> },
>>>>
>>>>
>>>>
>>>> bucket check shows nothing:
>>>>
>>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>>> --bucket=globalcache01 --fix
>>>> []
>>>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>>> --check-head-obj-locator --bucket=globalcache01 --fix
>>>> {
>>>> "bucket": "globalcache01",
>>>> "check_objects": [
>>>> ]
>>>> }
>>>>
>>>>
>>>> However, i see a lot of data for it on an OSD (all shadow files with
>>>> escaped underscores)
>>>>
>>>> [root@sm-cld-mtl-008 current]# find . -name default.8873277.32* -print
>>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c
>>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c
>>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c
>>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c
>>>> 
>>>>
>>>> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+'
>>>> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47
>>>> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6
>>>> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3
>>>> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1
>>>> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16
>>>> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2
>>>> 
>>>>
>>>> Is there still a bug in the fix obj locator command perhaps? I suppose
>>>> can just do something like:
>>>>
>>>>rados -p .rgw.buckets cleanup --prefix default.8873277.32
>>>>
>>>> Since i want to destroy the bucket anyway, but if this affects other
>>>> buckets, i may want to clean those a better way.
>>>>
>>>> -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Yehuda Sadeh-Weinraub
The bucket index objects are most likely in the .rgw.buckets.index pool.

Yehuda

On Mon, Aug 31, 2015 at 3:27 PM, Ben Hines  wrote:
> Good call, thanks!
>
> Is there any risk of also deleting parts of the bucket index? I'm not
> sure what the objects for the index itself look like, or if they are
> in the .rgw.buckets pool.
>
>
> On Mon, Aug 31, 2015 at 3:23 PM, Yehuda Sadeh-Weinraub
>  wrote:
>> Make sure you use the underscore also, e.g., "default.8873277.32_".
>> Otherwise you could potentially erase objects you didn't intend to,
>> like ones that start with "default.8873277.320" and such.
>>
>> On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines  wrote:
>>> Ok. I'm not too familiar with the inner workings of RGW, but i would
>>> assume that for a bucket with these parameters:
>>>
>>>"id": "default.8873277.32",
>>>"marker": "default.8873277.32",
>>>
>>> That it would be the only bucket using the files that start with
>>> "default.8873277.32"
>>>
>>> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15
>>> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2
>>>
>>>
>>>
>>> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub
>>>  wrote:
>>>> As long as you're 100% sure that the prefix is only being used for the
>>>> specific bucket that was previously removed, then it is safe to remove
>>>> these objects. But please do double check and make sure that there's
>>>> no other bucket that matches this prefix somehow.
>>>>
>>>> Yehuda
>>>>
>>>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines  wrote:
> >>>>> No input, eh? (or maybe TL;DR for everyone)
>>>>>
>>>>> Short version: Presuming the bucket index shows blank/empty, which it
>>>>> does and is fine, would me manually deleting the rados objects with
>>>>> the prefix matching the former bucket's ID cause any problems?
>>>>>
>>>>> thanks,
>>>>>
>>>>> -Ben
>>>>>
>>>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines  wrote:
>>>>>> Ceph 0.93->94.2->94.3
>>>>>>
>>>>>> I noticed my pool used data amount is about twice the bucket used data 
>>>>>> count.
>>>>>>
>>>>>> This bucket was emptied long ago. It has zero objects:
>>>>>> "globalcache01",
>>>>>> {
>>>>>> "bucket": "globalcache01",
>>>>>> "pool": ".rgw.buckets",
>>>>>> "index_pool": ".rgw.buckets.index",
>>>>>> "id": "default.8873277.32",
>>>>>> "marker": "default.8873277.32",
>>>>>> "owner": "...",
>>>>>> "ver": "0#12348839",
>>>>>> "master_ver": "0#0",
>>>>>> "mtime": "2015-03-08 11:44:11.00",
>>>>>> "max_marker": "0#",
>>>>>> "usage": {
>>>>>> "rgw.none": {
>>>>>> "size_kb": 0,
>>>>>> "size_kb_actual": 0,
>>>>>> "num_objects": 0
>>>>>> },
>>>>>> "rgw.main": {
>>>>>> "size_kb": 0,
>>>>>> "size_kb_actual": 0,
>>>>>> "num_objects": 0
>>>>>> }
>>>>>> },
>>>>>> "bucket_quota": {
>>>>>> "enabled": false,
>>>>>> "max_size_kb": -1,
>>>>>> "max_objects": -1
>>>>>> }
>>>>>> },
>>>>>>
>>>>>>
>>>>>>
>>>>>> bucket check shows nothing:
>>>>>>
>>>>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>>>>> --bucket=globalcache01 --fix
>>>>>

Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Yehuda Sadeh-Weinraub
Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the
operations (bucket listing and bucket check) go into some kind of
infinite loop?
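
For example (the section name and admin-socket path are assumptions; adjust
them to your deployment), either in ceph.conf followed by a radosgw restart:

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1

or injected into the running gateway through its admin socket:

    ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_rgw 20/20
    ceph daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config set debug_ms 1/1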

Yehuda

On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters  wrote:
> Hi, I've started the bucket --check --fix on friday evening and it's
> still running. 'ceph -s' shows the cluster health as OK, I don't know if
> there is anything else I could check? Is there a way of finding out if
> its actually doing something?
>
> We only have this issue on the one bucket with versioning enabled, I
> can't get rid of the feeling it has something to do with that. The
> "underscore bug" is also still present on that bucket
> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any
> way.
> Are there any alternatives, as for example copy all the objects into a
> new bucket without versioning? Simple way would be to list the objects
> and copy them to a new bucket, but bucket listing is not working so...
>
> -Sam
>
>
> On 31-08-15 10:47, Gregory Farnum wrote:
>> This generally shouldn't be a problem at your bucket sizes. Have you
>> checked that the cluster is actually in a healthy state? The sleeping
>> locks are normal but should be getting woken up; if they aren't it
>> means the object access isn't working for some reason. A down PG or
>> something would be the simplest explanation.
>> -Greg
>>
>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters  wrote:
> >>> Ok, maybe I'm too impatient. It would be great if there were some verbose
>>> or progress logging of the radosgw-admin tool.
>>> I will start a check and let it run over the weekend.
>>>
>>> tnx,
>>> Sam
>>>
>>> On 28-08-15 18:16, Sam Wouters wrote:
 Hi,

 this bucket only has 13389 objects, so the index size shouldn't be a
 problem. Also, on the same cluster we have an other bucket with 1200543
 objects (but no versioning configured), which has no issues.

 when we run a radosgw-admin bucket --check (--fix), nothing seems to be
 happening. Putting an strace on the process shows a lot of lines like 
 these:
 [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL
 
 [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL 
 [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 
 [pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource
 temporarily unavailable)
 [pid 99371] <... futex resumed> )   = 0

 but no errors in the ceph logs or health warnings.

 r,
 Sam

 On 28-08-15 17:49, Ben Hines wrote:
> How many objects in the bucket?
>
> RGW has problems with index size once the number of objects gets into the
> 90+ level. The buckets need to be recreated with 'sharded bucket
> indexes' on:
>
> rgw override bucket index max shards = 23
>
> You could also try repairing the index with:
>
>  radosgw-admin bucket check --fix --bucket=
>
> -Ben
>
> On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters  wrote:
>> Hi,
>>
>> we have a rgw bucket (with versioning) where PUT and GET operations for
>> specific objects succeed,  but retrieving an object list fails.
>> Using python-boto, after a timeout just gives us an 500 internal error;
>> radosgw-admin just hangs.
>> Also a radosgw-admin bucket check just seems to hang...
>>
>> ceph version is 0.94.3 but this also was happening with 0.94.2, we
>> quietly hoped upgrading would fix but it didn't...
>>
>> r,
>> Sam
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-09-01 Thread Yehuda Sadeh-Weinraub
I assume you filtered the log by thread? I don't see the response
messages. For the bucket check you can run radosgw-admin with
--log-to-stderr.

Can you also set 'debug objclass = 20' on the osds? You can do it by:

$ ceph tell osd.\* injectargs --debug-objclass 20

Also, it'd be interesting to get the following:

$ radosgw-admin bi list --bucket=
--object=abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5


Thanks,
Yehuda

On Tue, Sep 1, 2015 at 10:44 AM, Sam Wouters  wrote:
> not sure where I can find the logs for the bucket check, I can't really
> filter them out in the radosgw log.
>
> -Sam
>
> On 01-09-15 19:25, Sam Wouters wrote:
>> It looks like it, this is what shows in the logs after bumping the debug
>> and requesting a bucket list.
>>
>> 2015-09-01 17:14:53.008620 7fccb17ca700 10 cls_bucket_list
>> aws-cmis-prod(@{i=.be-east.rgw.buckets.index}.be-east.rgw.buckets[be-east.5436.1])
>> start
>> abc_econtract/data/6shflrwbwwcm6dsemrpjit2li3v913iad1EZQ3.S6Prb-NXLvfQRlaWC5nBYp5[]
>> num_entries 1
>> 2015-09-01 17:14:53.008629 7fccb17ca700 20 reading from
>> .be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
>> 2015-09-01 17:14:53.008636 7fccb17ca700 20 get_obj_state:
>> rctx=0x7fccb17c84d0
>> obj=.be-east.rgw:.bucket.meta.aws-cmis-prod:be-east.5436.1
>> state=0x7fcde01a4060 s->prefetch_data=0
>> 2015-09-01 17:14:53.008640 7fccb17ca700 10 cache get:
>> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
>> 2015-09-01 17:14:53.008645 7fccb17ca700 20 get_obj_state: s->obj_tag was
>> set empty
>> 2015-09-01 17:14:53.008647 7fccb17ca700 10 cache get:
>> name=.be-east.rgw+.bucket.meta.aws-cmis-prod:be-east.5436.1 : hit
>> 2015-09-01 17:14:53.008675 7fccb17ca700  1 -- 10.11.4.105:0/1109243 -->
>> 10.11.4.105:6801/39085 -- osd_op(client.55506.0:435874
>> ...
>> .dir.be-east.5436.1 [call rgw.bucket_list] 26.7d78fc84
>> ack+read+known_if_redirected e255) v5 -- ?+0 0x7fcde01a0540 con 0x3a2d870
>>
>> On 01-09-15 17:11, Yehuda Sadeh-Weinraub wrote:
>>> Can you bump up debug (debug rgw = 20, debug ms = 1), and see if the
>>> operations (bucket listing and bucket check) go into some kind of
>>> infinite loop?
>>>
>>> Yehuda
>>>
>>> On Tue, Sep 1, 2015 at 1:16 AM, Sam Wouters  wrote:
>>>> Hi, I've started the bucket --check --fix on friday evening and it's
>>>> still running. 'ceph -s' shows the cluster health as OK, I don't know if
>>>> there is anything else I could check? Is there a way of finding out if
>>>> its actually doing something?
>>>>
>>>> We only have this issue on the one bucket with versioning enabled, I
> >>>> can't get rid of the feeling it has something to do with that. The
>>>> "underscore bug" is also still present on that bucket
>>>> (http://tracker.ceph.com/issues/12819). Not sure if thats related in any
>>>> way.
>>>> Are there any alternatives, as for example copy all the objects into a
>>>> new bucket without versioning? Simple way would be to list the objects
>>>> and copy them to a new bucket, but bucket listing is not working so...
>>>>
>>>> -Sam
>>>>
>>>>
>>>> On 31-08-15 10:47, Gregory Farnum wrote:
>>>>> This generally shouldn't be a problem at your bucket sizes. Have you
>>>>> checked that the cluster is actually in a healthy state? The sleeping
>>>>> locks are normal but should be getting woken up; if they aren't it
>>>>> means the object access isn't working for some reason. A down PG or
>>>>> something would be the simplest explanation.
>>>>> -Greg
>>>>>
>>>>> On Fri, Aug 28, 2015 at 6:52 PM, Sam Wouters  wrote:
> >>>>>> Ok, maybe I'm too impatient. It would be great if there were some verbose
>>>>>> or progress logging of the radosgw-admin tool.
>>>>>> I will start a check and let it run over the weekend.
>>>>>>
>>>>>> tnx,
>>>>>> Sam
>>>>>>
>>>>>> On 28-08-15 18:16, Sam Wouters wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this bucket only has 13389 objects, so the index size shouldn't be a
>>>>>>> problem. Also, on the same cluster we have an other bucket with 1200543
>>>>>>> objects (but no versioning configured), which has no issues.

Re: [ceph-users] How to observed civetweb.

2015-09-08 Thread Yehuda Sadeh-Weinraub
You can increase the civetweb logs by adding 'debug civetweb = 10' in
your ceph.conf. The output will go into the rgw logs.
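
A minimal ceph.conf sketch (the section name, port, and log path are
assumptions; match them to your own instance), followed by a radosgw restart:

    [client.radosgw.gateway]
        rgw frontends = civetweb port=7480
        debug civetweb = 10
        log file = /var/log/radosgw/client.radosgw.gateway.log

The per-request civetweb lines then show up in that rgw log file, which is
the closest analogue to an apache access log.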

Yehuda

On Tue, Sep 8, 2015 at 2:24 AM, Vickie ch  wrote:
> Dear cephers,
>Just upgrade radosgw from apache to civetweb.
> It's really simple to install and use. But I can't find any parameters or
> logs to adjust (or observe) civetweb, like the apache log. I'm really confused.
> Any ideas?
>
>
> Best wishes,
> Mika
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados gateway / no socket server point defined

2015-09-24 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 24, 2015 at 8:59 AM, Mikaël Guichard  wrote:
> Hi,
>
> I encountered this error:
>
>> /usr/bin/radosgw -d --keyring /etc/ceph/ceph.client.radosgw.keyring -n
>> client.radosgw.myhost
> 2015-09-24 17:41:18.223206 7f427f074880  0 ceph version 0.94.3
> (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process radosgw, pid 4570
> 2015-09-24 17:41:18.349037 7f427f074880  0 framework: fastcgi
> 2015-09-24 17:41:18.349044 7f427f074880  0 framework: civetweb
> 2015-09-24 17:41:18.349048 7f427f074880  0 framework conf key: port, val:
> 7480
> 2015-09-24 17:41:18.349056 7f427f074880  0 starting handler: civetweb
> 2015-09-24 17:41:18.351852 7f427f074880  0 starting handler: fastcgi
> 2015-09-24 17:41:18.351921 7f41fc7a0700  0 ERROR: no socket server point
> defined, cannot start fcgi frontend
>
> I can force the socket file with the followed option and it works :
> --rgw-socket-path=/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> but why is the ceph.conf parameter ignored?
>
> I looked in the radosgw code; it should work:
>
>   conf->get_val("socket_path", "", &socket_path);
>   conf->get_val("socket_port", g_conf->rgw_port, &socket_port);
>   conf->get_val("socket_host", g_conf->rgw_host, &socket_host);
>
>   if (socket_path.empty() && socket_port.empty() && socket_host.empty()) {
> socket_path = g_conf->rgw_socket_path;
> if (socket_path.empty()) {
>   dout(0) << "ERROR: no socket server point defined, cannot start fcgi
> frontend" << dendl;
>   return;
> }
>   }
>
>
>
> My ceph.conf content :
>
> [client.radosgw.gateway]

You're using a different user for starting rgw
(client.radosgw.myhost), so this config section doesn't get used.
Either rename this section, or use the client.radosgw.gateway user.
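
In other words, the section header has to match the -n argument. A sketch
based on the quoted ceph.conf below:

    [client.radosgw.myhost]        # was [client.radosgw.gateway]
        host = myhost
        keyring = /etc/ceph/ceph.client.radosgw.keyring
        rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

Alternatively, keep the section name and start the daemon with
-n client.radosgw.gateway.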

Yehuda

> host = myhost
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> rgw print continue = false
> rgw enable usage log = true
> rgw enable ops log = true
> log file = /var/log/radosgw/client.radosgw.gateway.log
> rgw usage log tick interval = 30
> rgw usage log flush threshold = 1024
> rgw usage max shards = 32
> rgw usage max user shards = 1
>
> thanks for your response.
>
> regards
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw Storage policies

2015-09-28 Thread Yehuda Sadeh-Weinraub
On Mon, Sep 28, 2015 at 4:00 AM, Luis Periquito  wrote:
> Hi All,
>
> I was listening to the ceph talk about radosgw where Yehuda talks about
> storage policies. I started looking for it in the documentation, on how to
> implement/use it, and couldn't find much information:
> http://docs.ceph.com/docs/master/radosgw/s3/ says it doesn't currently
> support it, and http://docs.ceph.com/docs/master/radosgw/swift/ doesn't
> mention it.
>
> From the release notes it seems to be for the swift interface, not S3. Is
> this correct? Can we create them for S3 interface, or only Swift?
>
>

You can create buckets in both swift and s3 that utilize this feature.
You need to define different placement targets in the zone
configuration.
In S3 when you create a bucket, you need to pass a location
constraint that specifies this policy. The location constraint should
be specified as follows: [region][:policy]. So if you're creating a
bucket in the current region using your 'gold' policy that you
defined, you'll need to set it to ':gold'.
In swift, the api requires sending it through a special http header
(X-Storage-Policy).
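
For example (a sketch; endpoint configuration, bucket/container names, and
the 'gold' policy are placeholders): with s3cmd the policy travels in the
bucket-location option, and with the swift client it is the header above:

    # S3: location constraint "[region][:policy]"; current region, 'gold' policy
    s3cmd mb --bucket-location=':gold' s3://my-gold-bucket

    # Swift: storage policy header at container creation
    swift post -H 'X-Storage-Policy: gold' my-gold-container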

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw and keystone version 3 domains

2015-09-30 Thread Yehuda Sadeh-Weinraub
At the moment radosgw just doesn't support v3 (so it seems). I created
issue #13303. If anyone wants to pick this up (or provide some
information as to what it would require to support that) it would be
great.

Thanks,
Yehuda

On Wed, Sep 30, 2015 at 3:32 AM, Robert Duncan  wrote:
> Yes, but it always results in 401 from horizon and cli
>
> swift --debug --os-auth-url http://172.25.60.2:5000/v3 --os-username ldapuser 
> --os-user-domain-name ldapdomain --os-project-name someproject 
> --os-project-domain-name ldapdomain --os-password password123 -V 3 post 
> containerV3
> DEBUG:keystoneclient.auth.identity.v3:Making authentication request to 
> http://172.25.60.2:5000/v3/auth/tokens
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2
> DEBUG:urllib3.connectionpool:Setting read timeout to None
> DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366
> DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:46.053177Z into {'tz_sign': 
> None, 'second_fraction': u'053177', 'hour': u'11', 'daydash': u'30', 
> 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'46', 
> 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', 
> 'day': None, 'minute': u'20'} with default timezone  object at 0x1736f50>
> DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None
> DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None
> DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9
> DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None
> DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30
> DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None
> DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None
> DEBUG:iso8601.iso8601:Got u'46' for 'second' with default None
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2
> DEBUG:urllib3.connectionpool:Setting read timeout to  0x7f193dc590b0>
> DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None
> INFO:swiftclient:REQ: curl -i http://172.25.60.2:8080/swift/v1/containerV3 -X 
> POST -H "Content-Length: 0" -H "X-Auth-Token: 
> 30fd924774bf480d8814c61c7fdf128e"
> INFO:swiftclient:RESP STATUS: 401 Unauthorized
> INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), 
> ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', 
> 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 
> Sep 2015 10:20:46 GMT'), ('content-type', 'text/plain; charset=utf-8')]
> INFO:swiftclient:RESP BODY: AccessDenied
>
> DEBUG:keystoneclient.auth.identity.v3:Making authentication request to 
> http://172.25.60.2:5000/v3/auth/tokens
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2
> DEBUG:urllib3.connectionpool:Setting read timeout to None
> DEBUG:urllib3.connectionpool:"POST /v3/auth/tokens HTTP/1.1" 201 8366
> DEBUG:iso8601.iso8601:Parsed 2015-09-30T11:20:47.839422Z into {'tz_sign': 
> None, 'second_fraction': u'839422', 'hour': u'11', 'daydash': u'30', 
> 'tz_hour': None, 'month': None, 'timezone': u'Z', 'second': u'47', 
> 'tz_minute': None, 'year': u'2015', 'separator': u'T', 'monthdash': u'09', 
> 'day': None, 'minute': u'20'} with default timezone  object at 0x1736f50>
> DEBUG:iso8601.iso8601:Got u'2015' for 'year' with default None
> DEBUG:iso8601.iso8601:Got u'09' for 'monthdash' with default None
> DEBUG:iso8601.iso8601:Got 9 for 'month' with default 9
> DEBUG:iso8601.iso8601:Got u'30' for 'daydash' with default None
> DEBUG:iso8601.iso8601:Got 30 for 'day' with default 30
> DEBUG:iso8601.iso8601:Got u'11' for 'hour' with default None
> DEBUG:iso8601.iso8601:Got u'20' for 'minute' with default None
> DEBUG:iso8601.iso8601:Got u'47' for 'second' with default None
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 172.25.60.2
> DEBUG:urllib3.connectionpool:Setting read timeout to  0x7f193dc590b0>
> DEBUG:urllib3.connectionpool:"POST /swift/v1/containerV3 HTTP/1.1" 401 None
> INFO:swiftclient:REQ: curl -i http://172.25.60.2:8080/swift/v1/containerV3 -X 
> POST -H "Content-Length: 0" -H "X-Auth-Token: 
> fc7bb4a07baf41058546d8a85b2cd2b8"
> INFO:swiftclient:RESP STATUS: 401 Unauthorized
> INFO:swiftclient:RESP HEADERS: [('content-encoding', 'gzip'), 
> ('transfer-encoding', 'chunked'), ('accept-ranges', 'bytes'), ('vary', 
> 'Accept-Encoding'), ('server', 'Apache/2.2.15 (CentOS)'), ('date', 'Wed, 30 
> Sep 2015 10:20:47 GMT'), ('content-type', 'text/plain; charset=utf-8')]
> INFO:swiftclient:RESP BODY: AccessDenied
>
> ERROR:swiftclient:Container POST failed: 
> http://172.25.60.2:8080/swift/v1/containerV3 401 Unauthorized   AccessDenied
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 1243, 
> in _retry
> rv = func(self.url, self.token, *args, **kwargs)
>   File "/usr/lib/python2.6/site-packages/swiftclient/client.py", line 771, in 
> post_container
> http_response_content=body)
> ClientException: Conta

Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Yehuda Sadeh-Weinraub
On Thu, Oct 8, 2015 at 1:55 PM, Christian Sarrasin
 wrote:
> After discovering this excellent blog post [1], I thought that taking
> advantage of users' "default_placement" feature would be a preferable way to
> achieve my multi-tenancy requirements (see previous post).
>
> Alas I seem to be hitting a snag. Any attempt to create a bucket with a user
> set up with a non-empty default_placement results in a 400 error thrown back
> to the client and the following msg in the radosgw logs:
>
> "could not find placement rule placement-user2 within region"
>
> (The pools exist, I reloaded the radosgw service and ran 'radosgw-admin
> regionmap update' as suggested in the blog post before running the client
> test)
>
> Here's the setup.  What am I doing wrong?  Any insight is really
> appreciated!

Not sure. Did you run 'radosgw-admin regionmap update'?

>
> radosgw-admin region get
> { "name": "default",
>   "api_name": "",
>   "is_master": "true",
>   "endpoints": [],
>   "master_zone": "",
>   "zones": [
> { "name": "default",
>   "endpoints": [],
>   "log_meta": "false",
>   "log_data": "false"}],
>   "placement_targets": [
> { "name": "default-placement",
>   "tags": []},
> { "name": "placement-user2",
>   "tags": []}],
>   "default_placement": "default-placement"}
>
> radosgw-admin zone get default
> { "domain_root": ".rgw",
>   "control_pool": ".rgw.control",
>   "gc_pool": ".rgw.gc",
>   "log_pool": ".log",
>   "intent_log_pool": ".intent-log",
>   "usage_log_pool": ".usage",
>   "user_keys_pool": ".users",
>   "user_email_pool": ".users.email",
>   "user_swift_pool": ".users.swift",
>   "user_uid_pool": ".users.uid",
>   "system_key": { "access_key": "",
>   "secret_key": ""},
>   "placement_pools": [
> { "key": "default-placement",
>   "val": { "index_pool": ".rgw.buckets.index",
>   "data_pool": ".rgw.buckets",
>   "data_extra_pool": ".rgw.buckets.extra"}},
> { "key": "placement-user2",
>   "val": { "index_pool": ".rgw.index.user2",
>   "data_pool": ".rgw.buckets.user2",
>   "data_extra_pool": ".rgw.buckets.extra"}}]}
>
> radosgw-admin user info --uid=user2
> { "user_id": "user2",
>   "display_name": "User2",
>   "email": "",
>   "suspended": 0,
>   "max_buckets": 1000,
>   "auid": 0,
>   "subusers": [],
>   "keys": [
> { "user": "user2",
>   "access_key": "VYM2EEU1X5H6Y82D0K4F",
>   "secret_key": "vEeJ9+yadvtqZrb2xoCAEuM2AlVyZ7UTArbfIEek"}],
>   "swift_keys": [],
>   "caps": [],
>   "op_mask": "read, write, delete",
>   "default_placement": "placement-user2",
>   "placement_tags": [],
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "temp_url_keys": []}
>
> [1] http://cephnotes.ksperis.com/blog/2014/11/28/placement-pools-on-rados-gw
>
>
> On 03/10/15 19:48, Christian Sarrasin wrote:
>>
>> What are the best options to setup the Ceph radosgw so it supports
>> separate/independent "tenants"? What I'm after:
>>
>> 1. Ensure isolation between tenants, ie: no overlap/conflict in bucket
>> namespace; something separate radosgw "users" doesn't achieve
>> 2. Ability to backup/restore tenants' pools individually
>>
>> Referring to the docs [1], it seems this could possibly be achieved with
>> zones; one zone per tenant and leave out synchronization. Seems a little
>> heavy handed and presumably the overhead is non-negligible.
>>
>> Is this "supported"? Is there a better way?
>>
>> I'm running Firefly. I'm also rather new to Ceph so apologies if this is
>> already covered somewhere; kindly send pointers if so...
>>
>> Cheers,
>> Christian
>>
>> PS: cross-posted from [2]
>>
>> [1] http://docs.ceph.com/docs/v0.80/radosgw/federated-config/
>> [2]
>>
>> http://serverfault.com/questions/726491/how-to-setup-ceph-radosgw-to-support-multi-tenancy
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to setup Ceph radosgw to support multi-tenancy?

2015-10-08 Thread Yehuda Sadeh-Weinraub
When you start radosgw, do you explicitly state the name of the region
that the gateway belongs to?
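
For instance (the section name is an assumption; region and zone names here
are the defaults from this thread), in ceph.conf:

    [client.radosgw.gateway]
        rgw region = default
        rgw zone = default

or on the command line via --rgw-region=default --rgw-zone=default.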


On Thu, Oct 8, 2015 at 2:19 PM, Christian Sarrasin
 wrote:
> Hi Yehuda,
>
> Yes I did run "radosgw-admin regionmap update" and the regionmap appears to
> know about my custom placement_target.  Any other idea?
>
> Thanks a lot
> Christian
>
> radosgw-admin region-map get
> { "regions": [
> { "key": "default",
>   "val": { "name": "default",
>   "api_name": "",
>   "is_master": "true",
>   "endpoints": [],
>   "master_zone": "",
>   "zones": [
> { "name": "default",
>   "endpoints": [],
>   "log_meta": "false",
>   "log_data": "false"}],
>   "placement_targets": [
> { "name": "default-placement",
>   "tags": []},
> { "name": "placement-user2",
>   "tags": []}],
>   "default_placement": "default-placement"}}],
>   "master_region": "default",
>   "bucket_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1},
>   "user_quota": { "enabled": false,
>   "max_size_kb": -1,
>   "max_objects": -1}}
>
> On 08/10/15 23:02, Yehuda Sadeh-Weinraub wrote:
>
>>> Here's the setup.  What am I doing wrong?  Any insight is really
>>> appreciated!
>>
>>
>> Not sure. Did you run 'radosgw-admin regionmap update'?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw keystone accepted roles not matching

2015-10-15 Thread Yehuda Sadeh-Weinraub
On Thu, Oct 15, 2015 at 8:34 AM, Mike Lowe  wrote:
> I’m having some trouble with radosgw and keystone integration, I always get 
> the following error:
>
> user does not hold a matching role; required roles: Member,user,_member_,admin
>
> Despite my token clearly having one of the roles:
>
> "user": {
> "id": "401375297eb540bbb1c32432439827b0",
> "name": "jomlowe",
> "roles": [
> {
> "id": "8adcf7413cd3469abe4ae13cf259be6e",
> "name": "user"
> }
> ],
> "roles_links": [],
> "username": "jomlowe"
> }
>
> Does anybody have any hints?


Does the user have these roles assigned in keystone?

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw get quota

2015-10-29 Thread Yehuda Sadeh-Weinraub
On Thu, Oct 29, 2015 at 11:29 AM, Derek Yarnell  wrote:
> Sorry, the information is in the headers.  So I think the valid question
> to follow up is why is this information in the headers and not the body
> of the request.  I think this is a bug, but maybe I am not aware of a
> subtlety.  It would seem this JSON comes from this line[0].
>
> [0] -
> https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697
>
> For example the information is returned in what seems to be the
> Content-type header as follows.  Maybe the missing : in the json
> encoding would explain something?

It's definitely a bug. It looks like we fail to call end_header()
before it, so everything is dumped before we close the http header.
Can you open a ceph tracker issue with the info you provided here?

Thanks,
Yehuda

>
> INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS
> connection (1): ceph.umiacs.umd.edu
> DEBUG:requests.packages.urllib3.connectionpool:"GET
> /admin/user?quota&format=json&uid=foo1209&quota-type=user HTTP/1.1" 200 0
> INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'),
> ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type:
> application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6
> (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')]
>
> On 10/28/15 11:15 PM, Derek Yarnell wrote:
>> I have had this issue before, and I don't think I have resolved it.  I
>> have been using the RGW admin api to set quota based on the docs[0].
>> But I can't seem to be able to get it to cough up and show me the quota
>> now.  Any ideas I get a 200 back but no body, I have tested this on a
>> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster.  The latter is what
>> the logs are for.
>>
>> [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas
>>
>> DEBUG:rgwadmin.rgw:URL:
>> http://ceph.umiacs.umd.edu/admin/user?quota&uid=derek&quota-type=user
>> DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD
>> DEBUG:rgwadmin.rgw:Verify: True  CA Bundle: None
>> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP
>> connection (1): ceph.umiacs.umd.edu
>> DEBUG:requests.packages.urllib3.connectionpool:"GET
>> /admin/user?quota&uid=derek&quota-type=user HTTP/1.1" 200 0
>> INFO:rgwadmin.rgw:No JSON object could be decoded
>>
>>
>> 2015-10-28 23:02:46.445367 7f444cff1700  1 civetweb: 0x7f445c026d00:
>> 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1
>> 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64
>> 2015-10-28 23:03:02.063755 7f447ace2700  2
>> RGWDataChangesLog::ChangesRenewThread: start
>> 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST:
>> localhost:7480
>> 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set():
>> HTTP_ACCEPT_ENCODING: gzip, deflate
>> 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */*
>> 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set():
>> HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5
>> Linux/3.10.0-229.14.1.el7.x86_64
>> 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE:
>> Thu, 29 Oct 2015 03:03:17 GMT
>> 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set():
>> HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds=
>> 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set():
>> HTTP_X_FORWARDED_FOR: 128.8.132.4
>> 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set():
>> HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu
>> 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set():
>> HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu
>> 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set():
>> HTTP_CONNECTION: Keep-Alive
>> 2015-10-28 23:03:17.139392 7f443cfd1700 20 RGWEnv::set():
>> REQUEST_METHOD: GET
>> 2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI:
>> /admin/user
>> 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING:
>> quota&uid=derek&quota-type=user
>> 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER:
>> 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI:
>> /admin/user
>> 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480
>> 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/*
>> 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip,
>> deflate
>> 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS
>> RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds=
>> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive
>> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015
>> 03:03:17 GMT
>> 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480
>> 2015-10-28 23:03:17.139413 7f443cfd1700 20
>> HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5
>> Linux/3.10.0-229.14.1.el7.x86_64
>> 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4
>> 2015-10-2

Re: [ceph-users] Missing bucket

2015-11-13 Thread Yehuda Sadeh-Weinraub
On Fri, Nov 13, 2015 at 12:53 PM, Łukasz Jagiełło
 wrote:
> Hi all,
>
> Recently I've noticed a problem with one of our buckets:
>
> I cannot list or stats on a bucket:
> #v+
> root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket
> error getting bucket stats ret=-22

That's EINVAL, not ENOENT. It could mean lots of things, e.g.,
radosgw-admin version mismatch vs. version that osds are running. Try
to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
more info about the source of this error.

> ➜  ~  s3cmd -c /etc/s3cmd/prod.cfg ls
> s3://problematic_bucket/images/e/e0/file.png
> ERROR: S3 error: None
> #v-
>
> but a direct request for an object works perfectly fine:
> #v+
> ➜  ~  curl -svo /dev/null
> http://ceph-s1/problematic_bucket/images/e/e0/file.png
> […]
> < HTTP/1.1 200 OK
> < Content-Type: image/png
> < Content-Length: 379906
> […]
> #v-
>
> Any idea how to fix it? We're still running ceph 0.67.11
>

You're really behind.


Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing bucket

2015-11-13 Thread Yehuda Sadeh-Weinraub
On Fri, Nov 13, 2015 at 1:14 PM, Łukasz Jagiełło
 wrote:
> On Fri, Nov 13, 2015 at 1:07 PM, Yehuda Sadeh-Weinraub 
> wrote:
>>
>> > Recently I've noticed a problem with one of our buckets:
>> >
>> > I cannot list or stats on a bucket:
>> > #v+
>> > root@ceph-s1:~# radosgw-admin bucket stats --bucket=problematic_bucket
>> > error getting bucket stats ret=-22
>>
>> That's EINVAL, not ENOENT. It could mean lots of things, e.g.,
>> radosgw-admin version mismatch vs. version that osds are running. Try
>> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
>> more info about the source of this error.
>
>
> https://gist.github.com/ljagiello/06a4dd1f34a776e38f77
>
> Result of more verbose debug.
>
2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 -->
10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30
.dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 --
?+0 0x15f3740 con 0x15daa60
2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <==
osd.12 10.8.42.35:6800/26514 6  osd_op_reply(30
.dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 
118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60
error getting bucket stats ret=-22

You can try taking a look at osd.12 logs. Any chance osd.12 and
radosgw-admin aren't running the same major version? (more likely
radosgw-admin running a newer version).

>>
>> You're really behind.
>
>
> I know, we've got scheduled update for 2016 it's a big project to ensure
> everything is fine.
>
> --
> Łukasz Jagiełło
> lukaszjagielloorg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing bucket

2015-11-13 Thread Yehuda Sadeh-Weinraub
On Fri, Nov 13, 2015 at 1:37 PM, Łukasz Jagiełło
 wrote:
>> >> > Recently I've noticed a problem with one of our buckets:
>> >> >
>> >> > I cannot list or stats on a bucket:
>> >> > #v+
>> >> > root@ceph-s1:~# radosgw-admin bucket stats
>> >> > --bucket=problematic_bucket
>> >> > error getting bucket stats ret=-22
>> >>
>> >> That's EINVAL, not ENOENT. It could mean lots of things, e.g.,
>> >> radosgw-admin version mismatch vs. version that osds are running. Try
>> >> to add --debug-rgw=20 --debug-ms=1 --log-to-stderr to maybe get a bit
>> >> more info about the source of this error.
>> >
>> >
>> > https://gist.github.com/ljagiello/06a4dd1f34a776e38f77
>> >
>> > Result of more verbose debug.
>> >
>> 2015-11-13 21:10:19.160420 7fd9f91be7c0 1 -- 10.8.68.78:0/1007616 -->
>> 10.8.42.35:6800/26514 -- osd_op(client.44897323.0:30
>> .dir.default.5457.9 [call rgw.bucket_list] 16.2f979b1a e172956) v4 --
>> ?+0 0x15f3740 con 0x15daa60
>> 2015-11-13 21:10:19.161058 7fd9ef8a7700 1 -- 10.8.68.78:0/1007616 <==
>> osd.12 10.8.42.35:6800/26514 6  osd_op_reply(30
>> .dir.default.5457.9 [call] ondisk = -22 (Invalid argument)) v4 
>> 118+0+0 (3885840820 0 0) 0x7fd9c8000d50 con 0x15daa60
>> error getting bucket stats ret=-22
>>
>> You can try taking a look at osd.12 logs. Any chance osd.12 and
>> radosgw-admin aren't running the same major version? (more likely
>> radosgw-admin running a newer version).
>
>
> From the last 12h it's just deep-scrub info
> #v+
> 2015-11-13 08:23:00.690076 7fc4c62ee700  0 log [INF] : 15.621 deep-scrub ok
> #v-

This is unrelated.

>
> But yesterday there was a big rebalance and a host with that osd was
> rebuilding from scratch.
>
> We're running the same version (ceph, rados) across the entire cluster;
> just double-checked it.
>

what does 'radosgw-admin --version' return?
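
And to compare it against what that OSD is actually running (osd.12 from the
log above):

    radosgw-admin --version
    ceph tell osd.12 version
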
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW not start after upgrade to Jewel

2016-04-26 Thread Yehuda Sadeh-Weinraub
On Tue, Apr 26, 2016 at 6:50 AM, Abhishek Lekshmanan  wrote:
>
> Ansgar Jazdzewski writes:
>
>> Hi,
>>
>> After playing with the setup I got some output that looks wrong
>>
>> # radosgw-admin zone get
>>
>> "placement_pools": [
>> {
>> "key": "default-placement",
>> "val": {
>> "index_pool": ".eu-qa.rgw.buckets.inde",
>> "data_pool": ".eu-qa.rgw.buckets.dat",
>> "data_extra_pool": ".eu-qa.rgw.buckets.non-e",
>> "index_type": 0
>> }
>> }
>> ],
>>
>> I think it should be
>>
>> index_pool = .eu-qa.rgw.buckets.index.
>> data_pool = .eu-qa.rgw.buckets
>> data_extra_pool = .eu-qa.rgw.buckets.extra
>>
>> how can i fix it?
>
> Not sure how it reached this state, but given a zone get json, you can

There's an issue now when doing radosgw-admin zone set, and the pool
names start with a period (http://tracker.ceph.com/issues/15597). The
pool name is getting truncated by one character. We will have this
fixed for the next point release, but the workaround now would be to
add an extra character in each pool name before running the zone set
command.
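
For example, something along these lines (a sketch; the appended
character is sacrificial and is the one that will get truncated):

  radosgw-admin zone get --rgw-zone=eu-qa > zone.json
  # edit zone.json, appending one extra character to every pool name,
  # e.g. ".eu-qa.rgw.buckets.index" -> ".eu-qa.rgw.buckets.indexx"
  radosgw-admin zone set --rgw-zone=eu-qa < zone.json
  radosgw-admin zone get --rgw-zone=eu-qa   # verify the resulting names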

Yehuda

> edit this and set it back using zone set, e.g.
> # radosgw-admin zone get > zone.json # now edit this file
> # radosgw-admin zone set --rgw-zone="eu-qa" < zone.json
>>
>> Thanks
>> Ansgar
>>
>> 2016-04-26 13:07 GMT+02:00 Ansgar Jazdzewski :
>>> Hi all,
>>>
>>> i got an answer, that pointed me to:
>>> https://github.com/ceph/ceph/blob/master/doc/radosgw/multisite.rst
>>>
>>> 2016-04-25 16:02 GMT+02:00 Karol Mroz :
 On Mon, Apr 25, 2016 at 02:23:28PM +0200, Ansgar Jazdzewski wrote:
> Hi,
>
>> we test Jewel in our QA environment (from Infernalis to Hammer); the
> upgrade went fine but the Radosgw did not start.
>
> the error appears also with radosgw-admin
>
> # radosgw-admin user info --uid="images" --rgw-region=eu --rgw-zone=eu-qa
> 2016-04-25 12:13:33.425481 7fc757fad900  0 error in read_id for id  :
> (2) No such file or directory
> 2016-04-25 12:13:33.425494 7fc757fad900  0 failed reading zonegroup
> info: ret -2 (2) No such file or directory
> couldn't init storage provider
>
> do i have to change some settings, also for upgrade of the radosgw?

 Hi,

 Testing a recent master build (with only default region and zone),
 I'm able to successfully run the command you specified:

 % ./radosgw-admin user info --uid="testid" --rgw-region=default 
 --rgw-zone=default
 ...
 {
 "user_id": "testid",
 "display_name": "M. Tester",
 ...
 }

 Are you certain the region and zone you specified exist?

 What do the following report:

 radosgw-admin zone list
 radosgw-admin region list

 --
 Regards,
 Karol
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Abhishek Lekshmanan
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] removing 'rados cppool' command

2016-05-06 Thread Yehuda Sadeh-Weinraub
On Fri, May 6, 2016 at 12:41 PM, Sage Weil  wrote:
> This PR
>
> https://github.com/ceph/ceph/pull/8975
>
> removes the 'rados cppool' command.  The main problem is that the command
> does not make a faithful copy of all data because it doesn't preserve the
> snapshots (and snapshot related metadata).  That means if you copy an RBD
> pool it will render the images somewhat broken (snaps won't be present and
> won't work properly).  It also doesn't preserve the user_version field
> that some librados users may rely on.
>
> Since it's obscure and of limited use, this PR just removes it.
>
> Alternatively, we could add safeguards so that it refuses to make a copy
> if there are any selfmanaged_snaps, and/or generate some warnings.
>
> Any objections?

I prefer the alternative. I found this command pretty useful for
testing config upgrade scenarios with rgw. After generating config
scenarios in older versions, I used this command to store the config
on another pool, and then I could get the different config whenever
needed.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] removing 'rados cppool' command

2016-05-06 Thread Yehuda Sadeh-Weinraub
On Fri, May 6, 2016 at 2:27 PM, Sage Weil  wrote:
> On Fri, 6 May 2016, Yehuda Sadeh-Weinraub wrote:
>> On Fri, May 6, 2016 at 12:41 PM, Sage Weil  wrote:
>> > This PR
>> >
>> > https://github.com/ceph/ceph/pull/8975
>> >
>> > removes the 'rados cppool' command.  The main problem is that the command
>> > does not make a faithful copy of all data because it doesn't preserve the
>> > snapshots (and snapshot related metadata).  That means if you copy an RBD
>> > pool it will render the images somewhat broken (snaps won't be present and
>> > won't work properly).  It also doesn't preserve the user_version field
>> > that some librados users may rely on.
>> >
>> > Since it's obscure and of limited use, this PR just removes it.
>> >
>> > Alternatively, we could add safeguards so that it refuses to make a copy
>> > if there are any selfmanaged_snaps, and/or generate some warnings.
>> >
>> > Any objections?
>>
>> I prefer the alternative. I found this command pretty useful for
>> testing config upgrade scenarios with rgw. After generating config
>> scenarios in older versions, I used this command to store the config
>> on another pool, and then I could get the different config whenever
>> needed.
>
> Keep in mind that all of these calls in rgw
>
> rgw/rgw_rados.cc:  epoch = ref.ioctx.get_last_version();
>
> may be subtley broken by cppool because user_version is not preserved...
>

Right, but these calls are done in the bucket index pool, not at the
root or metadata pools that are relevant to the system config. While
it may not be the perfect tool for everything, I still find it useful
from time to time.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-11 Thread Yehuda Sadeh-Weinraub
While I'm usually not fond of blaming the client application, this is
really the swift command line tool's issue. It tries to be smart by
comparing the md5sum of the object's content with the object's etag,
and it breaks with multipart objects. The etag of a multipart object
is calculated differently (the md5sum of the concatenated per-part
md5 digests, with a '-<part count>' suffix). I think the swift tool
has special handling for swift large objects (which are not the same
as s3 multipart objects), so that's why it works in that specific
use case.
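
For illustration, a minimal sketch (the helper name is mine) of how an
etag like "1a209a31b4ac3eb923fac5e8d194d9d3-2" comes about, given the
hex md5 of each uploaded part in order:

  import binascii
  import hashlib

  def multipart_etag(part_md5s):
      # concatenate the *binary* md5 digests of all parts ...
      blob = b''.join(binascii.unhexlify(m) for m in part_md5s)
      # ... then md5 that blob and append "-<number of parts>"
      return '%s-%d' % (hashlib.md5(blob).hexdigest(), len(part_md5s))

A plain md5sum of the downloaded content can never match that, which
is why the client-side check trips on multipart objects.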

Yehuda

On Wed, May 11, 2016 at 7:15 AM, Saverio Proto  wrote:
> It does not work also the way around:
>
> If I upload a file with the swift client with the -S options to force
> swift to make multipart:
>
> swift upload -S 100 multipart 180.mp4
>
> Then I am not able to read the file with S3
>
> s3cmd get s3://multipart/180.mp4
> download: 's3://multipart/180.mp4' -> './180.mp4'  [1 of 1]
> download: 's3://multipart/180.mp4' -> './180.mp4'  [1 of 1]
>  38818503 of 38818503   100% in    1s    27.32 MB/s  done
> WARNING: MD5 signatures do not match:
> computed=961f154cc78c7bf1be3b4009c29e5a68,
> received=d41d8cd98f00b204e9800998ecf8427e
>
> Saverio
>
>
> 2016-05-11 16:07 GMT+02:00 Saverio Proto :
>> Thank you.
>>
>> It is exactly a problem with multipart.
>>
>> So I tried two clients (s3cmd and rclone). When you upload a file in
>> S3 using multipart, you are no longer able to read this object with
>> the SWIFT API because the md5 check fails.
>>
>> Saverio
>>
>>
>>
>> 2016-05-09 12:00 GMT+02:00 Xusangdi :
>>> Hi,
>>>
>>> I'm not running a cluster like yours, but I don't think the issue is caused
>>> by you using 2 APIs at the same time.
>>> IIRC the dash thing is appended by S3 multipart upload, with a following
>>> digit indicating the number of parts.
>>> You may want to check this reported in s3cmd community:
>>> https://sourceforge.net/p/s3tools/bugs/123/
>>>
>>> and some basic info from Amazon:
>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
>>>
>>> Hope this helps :D
>>>
>>> Regards,
>>> ---Sandy
>>>
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Saverio Proto
 Sent: Monday, May 09, 2016 4:42 PM
 To: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API 
 at the same time

 I try to simplify the question to get some feedback.

 Is anyone running the RadosGW in production with S3 and SWIFT API active 
 at the same time ?

 thank you !

 Saverio


 2016-05-06 11:39 GMT+02:00 Saverio Proto :
 > Hello,
 >
 > We have been running the Rados GW with the S3 API and we did not have
 > problems for more than a year.
 >
 > We recently enabled also the SWIFT API for our users.
 >
 > radosgw --version
 > ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 >
 > The idea is that each user of the system is free of choosing the S3
 > client or the SWIFT client to access the same container/buckets.
 >
 > Please tell us if this is possible by design or if we are doing 
 > something wrong.
 >
 > We now have a problem that some files written in the past with S3
 > cannot be read with the SWIFT API because the md5sum always fails.
 >
 > I am able to reproduce the bug in this way:
 >
 > We have this file googlebooks-fre-all-2gram-20120701-ts.gz and we know
 > the correct md5 is 1c8113d2bd21232688221ec74dccff3a You can download
 > the same file here:
 > https://www.dropbox.com/s/auq16vdv2maw4p7/googlebooks-fre-all-2gram-20
 > 120701-ts.gz?dl=0
 >
 > rclone mkdir lss3:bugreproduce
 > rclone copy googlebooks-fre-all-2gram-20120701-ts.gz lss3:bugreproduce
 >
 > The file is successfully uploaded.
 >
 > At this point I can succesfully download again the file:
 > rclone copy lss3:bugreproduce/googlebooks-fre-all-2gram-20120701-ts.gz
 > test.gz
 >
 > but not with swift:
 >
 > swift download googlebooks-ngrams-gz
 > fre/googlebooks-fre-all-2gram-20120701-ts.gz
 > Error downloading object
 > 'googlebooks-ngrams-gz/fre/googlebooks-fre-all-2gram-20120701-ts.gz':
 > u'Error downloading fre/googlebooks-fre-all-2gram-20120701-ts.gz:
 > md5sum != etag, 1c8113d2bd21232688221ec74dccff3a !=
 > 1a209a31b4ac3eb923fac5e8d194d9d3-2'
 >
 > Also, I found the dash character '-' at the end of the md5 that it
 > is trying to compare strange.
 >
 > Of course upload a file with the swift client and redownloading the
 > same file just works.
 >
 > Should I open a bug for the radosgw on http://tracker.ceph.com/ ?
 >
 > thank you
 >
 > Saverio
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> --

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-12 Thread Yehuda Sadeh-Weinraub
On Thu, May 12, 2016 at 12:29 AM, Saverio Proto  wrote:
>> While I'm usually not fond of blaming the client application, this is
>> really the swift command line tool's issue. It tries to be smart by
>> comparing the md5sum of the object's content with the object's etag,
>> and it breaks with multipart objects. The etag of a multipart object
>> is calculated differently (the md5sum of the concatenated per-part
>> md5 digests, with a '-<part count>' suffix). I think the swift tool
>> has special handling for swift large objects (which are not the same
>> as s3 multipart objects), so that's why it works in that specific
>> use case.
>
> Well but I tried also with rclone and I have the same issue.
>
> Clients I tried
> rclone (both SWIFT and S3)
> s3cmd (S3)
> python-swiftclient (SWIFT).
>
> I can reproduce the issue with different clients.
> Once a multipart object is uploaded via S3 (with rclone or s3cmd) I
> cannot read it anymore via SWIFT (either with rclone or
> python-swiftclient).
>
> Are you saying that all SWIFT client implementations are wrong?

Yes.

>
> Or should the radosgw be configured with only 1 API active?
>
> Saverio
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-20 Thread Yehuda Sadeh-Weinraub
On Fri, May 20, 2016 at 9:03 AM, Jonathan D. Proulx  wrote:
> Hi All,
>
> I saw the previous thread on this related to
> http://tracker.ceph.com/issues/15597
>
> and Yehuda's fix script
> https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone
>
> Running this seems to have landed me in a weird state.
>
> I can create and get new buckets and objects but I've "lost" all my
> old buckets.  I'm fairly confident the "lost" data is in the
> .rgw.buckets pool but my current zone is set to use .rgw.buckets_
>
>
>
> root@ceph-mon0:~# radosgw-admin zone get
> {
> "id": "default",
> "name": "default",
> "domain_root": ".rgw_",
> "control_pool": ".rgw.control_",
> "gc_pool": ".rgw.gc_",
> "log_pool": ".log_",
> "intent_log_pool": ".intent-log_",
> "usage_log_pool": ".usage_",
> "user_keys_pool": ".users_",
> "user_email_pool": ".users.email_",
> "user_swift_pool": ".users.swift_",
> "user_uid_pool": ".users.uid_",
> "system_key": {
> "access_key": "",
> "secret_key": ""
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": ".rgw.buckets.index_",
> "data_pool": ".rgw.buckets_",
> "data_extra_pool": ".rgw.buckets.extra_",
> "index_type": 0
> }
> }
> ],
> "metadata_heap": "default.rgw.meta",
> "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"
> }
>
>
> root@ceph-mon0:~# ceph osd pool ls |grep rgw|sort
> default.rgw.meta
> .rgw
> .rgw_
> .rgw.buckets
> .rgw.buckets_
> .rgw.buckets.index
> .rgw.buckets.index_
> .rgw.control
> .rgw.control_
> .rgw.gc
> .rgw.gc_
> .rgw.root
> .rgw.root.backup
>
> Should I just adjust the zone to use the pools without the trailing
> underscores?  I'm a bit lost.  The last output I could see from running the

Yes. The trailing underscores were needed when upgrading to 10.2.0, as
there was another bug, and I needed to add these to compensate for it.
I should update the script now to reflect that fix. You should just
update the json and set the zone appropriately.
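
Something along these lines (a sketch; on 10.2.1 the zone set
truncation bug should already be fixed, but eyeball the json before
setting it, since the sed assumes only the pool names end with an
underscore before the closing quote):

  radosgw-admin zone get --rgw-zone=default > zone.json
  sed -i 's/_"/"/g' zone.json   # ".rgw.buckets_" -> ".rgw.buckets", etc.
  radosgw-admin zone set --rgw-zone=default < zone.json
  radosgw-admin zone get --rgw-zone=default   # verify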

Yehuda

> script didn't seem to indicate any errors (though I lost the top of the
> scrollback buffer before I noticed the issue)
>
> Tail of output from running script:
> https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone
>
> + radosgw-admin zone set --rgw-zone=default
> zone id default{
> "id": "default",
> "name": "default",
> "domain_root": ".rgw_",
> "control_pool": ".rgw.control_",
> "gc_pool": ".rgw.gc_",
> "log_pool": ".log_",
> "intent_log_pool": ".intent-log_",
> "usage_log_pool": ".usage_",
> "user_keys_pool": ".users_",
> "user_email_pool": ".users.email_",
> "user_swift_pool": ".users.swift_",
> "user_uid_pool": ".users.uid_",
> "system_key": {
> "access_key": "",
> "secret_key": ""
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": ".rgw.buckets.index_",
> "data_pool": ".rgw.buckets_",
> "data_extra_pool": ".rgw.buckets.extra_",
> "index_type": 0
> }
> }
> ],
> "metadata_heap": "default.rgw.meta",
> "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"
> }
> + radosgw-admin zonegroup default --rgw-zonegroup=default
> + radosgw-admin zone default --rgw-zone=default
> root@ceph-mon0:~# radosgw-admin region get --rgw-zonegroup=default
> {
> "id": "default",
> "name": "default",
> "api_name": "",
> "is_master": "true",
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "default",
> "zones": [
> {
> "id": "default",
> "name": "default",
> "endpoints": [],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"}
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"}
>
> root@ceph-mon0:~# ceph -v
> ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>
> Thanks,
> -Jon
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw s3website issue

2016-05-29 Thread Yehuda Sadeh-Weinraub
On Sun, May 29, 2016 at 4:47 AM, Gaurav Bafna  wrote:
> Hi Cephers,
>
> I am unable to create bucket hosting a webstite in my vstart cluster.
>
> When I do this in boto :
>
> website_bucket.configure_website('index.html','error.html')
>
> I get :
>
> boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed
>
>
> Here is my ceph.conf for radosgw:
>
> rgw frontends = fastcgi, civetweb port=8010
>
> rgw enable static website = true
>
> rgw dns name = 10.140.13.22
>
> rgw dns s3website name = 10.140.13.22
>
>
> Here are the logs in rgw :
>
> 2016-05-29 00:00:47.191297 7ff404ff9700  1 ====== starting new request
> req=0x7ff404ff37d0 =====
>
> 2016-05-29 00:00:47.191325 7ff404ff9700  2 req 1:0.28::PUT
> /s3website/::initializing for trans_id =
> tx1-005749967f-101f-default
>
> 2016-05-29 00:00:47.191330 7ff404ff9700 10 host=10.140.13.22
>
> 2016-05-29 00:00:47.191338 7ff404ff9700 20 subdomain=
> domain=10.140.13.22 in_hosted_domain=1 in_hosted_domain_s3website=1
>

Could it be that the endpoint is configured to serve both S3 and
static websites?
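
In the config above, "rgw dns name" and "rgw dns s3website name" are
the same address, so the request is matched as a website request
(note in_hosted_domain_s3website=1 in the log) and the PUT never
reaches the regular S3 bucket handler. A sketch of the usual split
(hostnames are illustrative):

  rgw dns name = s3.example.com
  rgw dns s3website name = website.example.com

That way API calls and static-website serving arrive on distinct
hostnames.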

Yehuda

> 2016-05-29 00:00:47.191350 7ff404ff9700  5 the op is PUT
>
> 2016-05-29 00:00:47.191395 7ff404ff9700 20 get_handler
> handler=32RGWHandler_REST_Bucket_S3Website
>
> 2016-05-29 00:00:47.191399 7ff404ff9700 10
> handler=32RGWHandler_REST_Bucket_S3Website
>
> 2016-05-29 00:00:47.191401 7ff404ff9700  2 req 1:0.000104:s3:PUT
> /s3website/::getting op 1
>
> 2016-05-29 00:00:47.191410 7ff404ff9700 10
> RGWHandler_REST_S3Website::error_handler err_no=-2003 http_ret=405
>
> 2016-05-29 00:00:47.191412 7ff404ff9700 20 No special error handling today!
>
> 2016-05-29 00:00:47.191415 7ff404ff9700 20 handler->ERRORHANDLER:
> err_no=-2003 new_err_no=-2003
>
> 2016-05-29 00:00:47.191504 7ff404ff9700  2 req 1:0.000207:s3:PUT
> /s3website/::op status=0
>
> 2016-05-29 00:00:47.191510 7ff404ff9700  2 req 1:0.000213:s3:PUT
> /s3website/::http status=405
>
> 2016-05-29 00:00:47.191511 7ff404ff9700  1 ====== req done
> req=0x7ff404ff37d0 op status=0 http_status=405 ======
>
>
> Code-wise I see that put_op is not defined for the
> RGWHandler_REST_S3Website class but is defined for the
> RGWHandler_REST_Bucket_S3 class.
>
> Can somebody please help me out ?
>
>
>
>
> --
> Gaurav Bafna
> 9540631400
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW AWS4 issue.

2016-05-29 Thread Yehuda Sadeh-Weinraub
On Sun, May 29, 2016 at 11:13 AM, Khang Nguyễn Nhật
 wrote:
> Hi,
> I'm having problems with AWS4 in CEPH Jewel when interacting with
> buckets and objects.
> First I will talk briefly about my cluster. My cluster runs CEPH Jewel
> v10.2.1, including 3 OSDs, 2 monitors and 1 RGW.
> - Information in zonegroup:
> CLI: radosgw-admin zonegroup list (CLI is command line)
> read_default_id : 0
> {
> "default_info": "03cde122-441d-46c5-a02d-19d28f3fd882",
> "zonegroups": [
> "default"
> ]
> }
>
> CLI: radosgw-admin zonegroup get
> {
> "id": "03cde122-441d-46c5-a02d-19d28f3fd882",
> "name": "default",
> "api_name": "",

^^^ api name

> "is_master": "true",
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "cb991931-88b1-4415-9d7f-a22cdce55ce7",
> "zones": [
> {
> "id": "cb991931-88b1-4415-9d7f-a22cdce55ce7",
> "name": "default",
> "endpoints": [],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "a62bf866-f52b-4732-80b0-50a7287703f1"
> }
> - Zone:
> CLI: radosgw-admin zone list
> {
> "default_info": "cb991931-88b1-4415-9d7f-a22cdce55ce7",
> "zones": [
> "default"
> ]
> }
>
> CLI: radosgw-admin zone get
> {
> "id": "cb991931-88b1-4415-9d7f-a22cdce55ce7",
> "name": "default",
> "domain_root": "default.rgw.data.root",
> "control_pool": "default.rgw.control",
> "gc_pool": "default.rgw.gc",
> "log_pool": "default.rgw.log",
> "intent_log_pool": "default.rgw.intent-log",
> "usage_log_pool": "default.rgw.usage",
> "user_keys_pool": "default.rgw.users.keys",
> "user_email_pool": "default.rgw.users.email",
> "user_swift_pool": "default.rgw.users.swift",
> "user_uid_pool": "default.rgw.users.uid",
> "system_key": {
> "access_key": "",
> "secret_key": ""
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "default.rgw.buckets.index",
> "data_pool": "default.rgw.buckets.data",
> "data_extra_pool": "default.rgw.buckets.non-ec",
> "index_type": 0
> }
> }
> ],
> "metadata_heap": "default.rgw.meta",
> "realm_id": ""
> }
> - User infor:
> {
> "user_id": "1",
> "display_name": "User1",
> "email": "us...@ceph.com",
> "suspended": 0,
> "max_buckets": 1000,
> "auid": 0,
> "subusers": [],
> "keys": [
> {
> "user": "1",
> "access_key": "",
> "secret_key": ""
> }
> ],
> "swift_keys": [],
> "caps": [],
> "op_mask": "read, write, delete",
> "default_placement": "",
> "placement_tags": [],
> "bucket_quota": {
> "enabled": false,
> "max_size_kb": -1,
> "max_objects": -1
> },
> "user_quota": {
> "enabled": false,
> "max_size_kb": -1,
> "max_objects": -1
> },
> "temp_url_keys": []
> }
>
> -RGW config:
> [global]
> //
> rgw zonegroup root pool = .rgw.root
> [client.rgw.radosgw1]
> rgw_frontends = "civetweb port=
> error_log_file=/var/log/ceph/civetweb.error.log
> access_log_file=/var/log/ceph/civetweb.access.log debug-civetweb=10"
> rgw_zone  = default
> rgw region= default
> rgw enable ops log = true
> rgw log nonexistent bucket = true
> rgw enable usage log = true
> rgw log object name utc  = true
> rgw intent log object name = %Y-%m-%d-%i-%n
> rgw intent log object name utc = true
>
> User1 does not own any buckets or objects. I used python boto3 to interact
> with S3; here is my code:
> s3 = boto3.client(service_name='s3',
> region_name='default',
> aws_access_key_id='',aws_secret_access_key='',
> use_ssl=False, endpoint_url='http://192.168.1.1:',
> config=Config(signature_version='s3v4'))
> print s3.list_buckets()
> And this is the result:
> {u'Owner': {u'DisplayName': 'User1', u'ID': '1'}, u'Buckets': [],
> 'ResponseMetadata': {'HTTPStatusCode': 200, 'HostId': '', 'RequestId':
> 'tx1-00574b2e2f-6304-default'}}
> print s3.create_bucket(ACL='public-read-write', Bucket='image',
> CreateBucketConfiguration={'LocationConstraint':
> 'default'},

the location constraint needs to match the zonegroup's api name. Your
zonegroup does not have an api name, so this should be empty.
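
For example, either drop the CreateBucketConfiguration block entirely
(a sketch):

  s3.create_bucket(ACL='public-read-write', Bucket='image')

or set an api_name on the zonegroup (radosgw-admin zonegroup set) and
pass exactly that string as the LocationConstraint.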

Yehuda

> GrantFullControl='image')
> And I receive:
> HTTP/1.1 400 Bad Request.
> botocore.exceptions.ClientError: An error occurred (InvalidRequest) when
> calling the CreateBucket operation: Unknown
>
> Did I do something wrong?

Re: [ceph-users] rgw pool names

2016-06-10 Thread Yehuda Sadeh-Weinraub
On Fri, Jun 10, 2016 at 11:44 AM, Deneau, Tom  wrote:
> When I start radosgw, I create the pool .rgw.buckets manually to control
> whether it is replicated or erasure coded and I let the other pools be
> created automatically.
>
> However, I have noticed that sometimes the pools get created with the 
> "default"
> prefix, thus
> rados lspools
>   .rgw.root
>   default.rgw.control
>   default.rgw.data.root
>   default.rgw.gc
>   default.rgw.log
>   .rgw.buckets  # the one I created
>   default.rgw.users.uid
>   default.rgw.users.keys
>   default.rgw.meta
>   default.rgw.buckets.index
>   default.rgw.buckets.data  # the one actually being used
>
> What controls whether these pools have the "default" prefix or not?
>

The prefix is the name of the zone ('default' by default). This was
added for the jewel release, as well as dropping the requirement of
having the pool names start with a dot.
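
For example (a sketch; output trimmed, pool names as created by a
stock jewel zone named 'default'):

  radosgw-admin zone get --rgw-zone=default | grep pool
  #    "index_pool": "default.rgw.buckets.index",
  #    "data_pool": "default.rgw.buckets.data",

A zone with a different name would produce a different prefix.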

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can not change access for containers

2016-06-28 Thread Yehuda Sadeh-Weinraub
On Tue, Jun 28, 2016 at 4:12 AM, John Mathew 
wrote:

> I am using radosgw as object storage in openstack liberty. I am using ceph
> jewel. Currently I can create public and private containers. But cannot
> change the access of containers, i.e. cannot change a public container to
> private and vice versa. There is a pop-up: "Success: Successfully updated
> container access to public." But access is not changing. Couldn't find any
> errors in logs. I tried with ceph-infernalis, but couldn't recreate this
> with infernalis. Everything worked with infernalis. Could this be a bug
> with ceph jewel? Also, does jewel support a multitenant namespace for
> containers.
>

Jewel does have support for separate container namespaces (tenants).


>
> Thanks in advance
>
>
> COMMAND
>
> curl -X POST -i -H  "X-Auth-Token:x" -H "X-Container-Read: *" -L  "
> http://xxx:7480/swift/v1/pub5";
>

Can you try this instead?

curl -X POST -i -H  "X-Auth-Token:x" -H "X-Container-Read: .r:*"
-L  "http://xxx:7480/swift/v1/pub5";

Yehuda


>
> 2016-06-23 03:17:11.822539 7f0ae2ffd700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2016-06-23 03:17:33.822711 7f0ae2ffd700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2016-06-23 03:17:48.028376 7f09077fe700 20 RGWEnv::set(): HTTP_USER_AGENT:
> curl/7.35.0
> 2016-06-23 03:17:48.028397 7f09077fe700 20 RGWEnv::set(): HTTP_HOST:
> 10.10.20.9:7480
> 2016-06-23 03:17:48.028400 7f09077fe700 20 RGWEnv::set(): HTTP_ACCEPT: */*
> 2016-06-23 03:17:48.028403 7f09077fe700 20 RGWEnv::set():
> HTTP_X_AUTH_TOKEN: 5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028410 7f09077fe700 20 RGWEnv::set():
> HTTP_X_CONTAINER_READ: *
> 2016-06-23 03:17:48.028412 7f09077fe700 20 RGWEnv::set(): REQUEST_METHOD:
> POST
> 2016-06-23 03:17:48.028414 7f09077fe700 20 RGWEnv::set(): REQUEST_URI:
> /swift/v1/pub5
> 2016-06-23 03:17:48.028416 7f09077fe700 20 RGWEnv::set(): QUERY_STRING:
> 2016-06-23 03:17:48.028422 7f09077fe700 20 RGWEnv::set(): REMOTE_USER:
> 2016-06-23 03:17:48.028424 7f09077fe700 20 RGWEnv::set(): SCRIPT_URI:
> /swift/v1/pub5
> 2016-06-23 03:17:48.028427 7f09077fe700 20 RGWEnv::set(): SERVER_PORT: 7480
> 2016-06-23 03:17:48.028429 7f09077fe700 20 HTTP_ACCEPT=*/*
> 2016-06-23 03:17:48.028430 7f09077fe700 20 HTTP_HOST=10.10.20.9:7480
> 2016-06-23 03:17:48.028431 7f09077fe700 20 HTTP_USER_AGENT=curl/7.35.0
> 2016-06-23 03:17:48.028432 7f09077fe700 20
> HTTP_X_AUTH_TOKEN=5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028434 7f09077fe700 20 HTTP_X_CONTAINER_READ=*
> 2016-06-23 03:17:48.028435 7f09077fe700 20 QUERY_STRING=
> 2016-06-23 03:17:48.028436 7f09077fe700 20 REMOTE_USER=
> 2016-06-23 03:17:48.028437 7f09077fe700 20 REQUEST_METHOD=POST
> 2016-06-23 03:17:48.028438 7f09077fe700 20 REQUEST_URI=/swift/v1/pub5
> 2016-06-23 03:17:48.028439 7f09077fe700 20 SCRIPT_URI=/swift/v1/pub5
> 2016-06-23 03:17:48.028439 7f09077fe700 20 SERVER_PORT=7480
> 2016-06-23 03:17:48.028442 7f09077fe700  1 ====== starting new request
> req=0x7f09077f87d0 =====
> 2016-06-23 03:17:48.028470 7f09077fe700  2 req 63:0.29::POST
> /swift/v1/pub5::initializing for trans_id =
> tx0003f-00576b8d1c-16d30b-default
> 2016-06-23 03:17:48.028478 7f09077fe700 10 host=10.10.20.9
> 2016-06-23 03:17:48.028482 7f09077fe700 20 subdomain= domain=
> in_hosted_domain=0 in_hosted_domain_s3website=0
> 2016-06-23 03:17:48.028494 7f09077fe700 10 meta>> HTTP_X_CONTAINER_READ
> 2016-06-23 03:17:48.028501 7f09077fe700 10 x>> x-amz-read:*
> 2016-06-23 03:17:48.028520 7f09077fe700 10 ver=v1 first=pub5 req=
> 2016-06-23 03:17:48.028527 7f09077fe700 10
> handler=28RGWHandler_REST_Bucket_SWIFT
> 2016-06-23 03:17:48.028530 7f09077fe700  2 req 63:0.89:swift:POST
> /swift/v1/pub5::getting op 4
> 2016-06-23 03:17:48.028535 7f09077fe700 10
> op=35RGWPutMetadataBucket_ObjStore_SWIFT
> 2016-06-23 03:17:48.028537 7f09077fe700  2 req 63:0.95:swift:POST
> /swift/v1/pub5:put_bucket_metadata:authorizing
> 2016-06-23 03:17:48.028544 7f09077fe700 20
> token_id=5b83a5faf86e4df3baa087049e8a0b9a
> 2016-06-23 03:17:48.028553 7f09077fe700 20 cached token.project.id
> =1c1ae7b02eaa4610bd46d04ddc0f3c00
> 2016-06-23 03:17:48.028559 7f09077fe700 20 updating
> user=1c1ae7b02eaa4610bd46d04ddc0f3c00
> 2016-06-23 03:17:48.028577 7f09077fe700 20 get_system_obj_state:
> rctx=0x7f09077f71d0
> obj=default.rgw.users.uid:1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> state=0x7f08f800c318 s->prefetch_data=0
> 2016-06-23 03:17:48.028589 7f09077fe700 10 cache get:
> name=default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> : type miss (requested=6, cached=0)
> 2016-06-23 03:17:48.029626 7f09077fe700 10 cache put:
> name=default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> info.flags=0
> 2016-06-23 03:17:48.029638 7f09077fe700 10 moving
> default.rgw.users.uid+1c1ae7b02eaa4610bd46d04ddc0f3c00$1c1ae7b02eaa4610bd46d04ddc0f3c00
> t

Re: [ceph-users] blind buckets

2016-07-28 Thread Yehuda Sadeh-Weinraub
In order to use indexless (blind) buckets, you need to create a new
placement target, and then set the placement target's index_type param
to 1.
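
Roughly like this on jewel (a sketch; the placement name is
illustrative and the json edits are done by hand):

  # 1) add the target to the zonegroup
  radosgw-admin zonegroup get > zonegroup.json
  #    append {"name": "indexless-placement", "tags": []}
  #    to the "placement_targets" array
  radosgw-admin zonegroup set < zonegroup.json

  # 2) add a matching entry to the zone, with index_type set to 1
  radosgw-admin zone get > zone.json
  #    append to "placement_pools":
  #    { "key": "indexless-placement",
  #      "val": { "index_pool": "default.rgw.buckets.index",
  #               "data_pool": "default.rgw.buckets.data",
  #               "data_extra_pool": "default.rgw.buckets.non-ec",
  #               "index_type": 1 } }
  radosgw-admin zone set < zone.json

  # 3) restart the radosgw daemons; buckets created under the new
  #    placement target will then be indexless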

Yehuda

On Tue, Jul 26, 2016 at 10:30 AM, Tyler Bischel
 wrote:
> Hi there,
>   We are looking at using Ceph (Jewel) for a use case that is very write
> heavy strictly as an object store.  We've been working with Rados Gateway
> because we can easily integrate with existing S3 libraries... but we will
> never be doing any of the bucket listing operations.  I am concerned about
> the potential bottleneck of the RGW index files.
>   I've read here that Jewel now supports "Blind Buckets"... with some
> reference to setting the RGWBucketIndexType to RGWBIType_Indexless... and
> I'm guessing its set as "index_type" here.  In the docs, the only
> "index_type" reference I see is here, under the placement pools.  However,
> the Pools documentation doesn't really give a clue how to set this value, or
> even if this is the proper index_type field that I'm guessing.
>   So the two things I'm interested in figuring out is:
> 1) Are "Blind Buckets" actually production ready
> 2) How do I configure Rados Gateway to omit the index files?
> --Tyler
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] blind buckets

2016-07-28 Thread Yehuda Sadeh-Weinraub
On Thu, Jul 28, 2016 at 12:11 PM, Tyler Bischel
 wrote:
> Can I not update an existing placement target's index_type?  I had tried to
> update the default pool's index type:
>
> radosgw-admin zone get --rgw-zone=default > default-zone.json
>
> #replace index_type:0 to index_type:1 in the default zone file, under the
> default-placement entry of the placement_pools
>
> radosgw-admin zone set --rgw-zone=default --infile default-zone.json
>
> However, it seems like I can still access bucket lists of objects after
> additional objects added, which makes me think this setting isn't being
> respected in the way I thought it would.
>

It only affects newly created buckets.

Yehuda

> On Thu, Jul 28, 2016 at 9:59 AM, Yehuda Sadeh-Weinraub 
> wrote:
>>
>> In order to use indexless (blind) buckets, you need to create a new
>> placement target, and then set the placement target's index_type param
>> to 1.
>>
>> Yehuda
>>
>> On Tue, Jul 26, 2016 at 10:30 AM, Tyler Bischel
>>  wrote:
>> > Hi there,
>> >   We are looking at using Ceph (Jewel) for a use case that is very write
>> > heavy strictly as an object store.  We've been working with Rados
>> > Gateway
>> > because we can easily integrate with existing S3 libraries... but we
>> > will
>> > never be doing any of the bucket listing operations.  I am concerned
>> > about
>> > the potential bottleneck of the RGW index files.
>> >   I've read here that Jewel now supports "Blind Buckets"... with some
>> > reference to setting the RGWBucketIndexType to RGWBIType_Indexless...
>> > and
>> > I'm guessing its set as "index_type" here.  In the docs, the only
>> > "index_type" reference I see is here, under the placement pools.
>> > However,
>> > the Pools documentation doesn't really give a clue how to set this
>> > value, or
>> > even if this is the proper index_type field that I'm guessing.
>> >   So the two things I'm interested in figuring out is:
>> > 1) Are "Blind Buckets" actually production ready
>> > 2) How do I configure Rados Gateway to omit the index files?
>> > --Tyler
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [jewel][rgw]why the usage log record date is 16 hours later than the real operate time

2016-07-28 Thread Yehuda Sadeh-Weinraub
On Thu, Jul 28, 2016 at 5:53 PM, Leo Yu  wrote:
> hi all,
>   I want to get the usage of a user, so I use the command radosgw-admin usage
> show, but I cannot get the usage when I use --start-date unless I subtract 16
> hours.
>
> I have rgw on both ceph01 and ceph03 (civetweb, port 7480), and the ceph
> version is jewel 10.2.2
>
> the time zone of ceph01 and ceph03
> [root@ceph03 ~]# ls -l /etc/localtime
> lrwxrwxrwx 1 root root 35 Jul 25 07:40 /etc/localtime ->
> ../usr/share/zoneinfo/Asia/Shanghai
> [root@ceph03 ~]# ssh ceph01 ls -l /etc/localtime
> lrwxrwxrwx 1 root root 35 Jul 25 14:14 /etc/localtime ->
> ../usr/share/zoneinfo/Asia/Shanghai
>
>
> the timedateclt of ceph03
>
> [root@ceph03 ~]# timedatectl
> Warning: Ignoring the TZ variable. Reading the system's time zone setting
> only.
>
>   Local time: Fri 2016-07-29 08:28:44 CST
>   Universal time: Fri 2016-07-29 00:28:44 UTC
> RTC time: Fri 2016-07-29 00:28:44
>Time zone: Asia/Shanghai (CST, +0800)
>  NTP enabled: yes
> NTP synchronized: yes
>  RTC in local TZ: no
>   DST active: n/a
>
> the timedate of ceph01
> [root@ceph03 ~]# ssh ceph01 timedatectl
>   Local time: Fri 2016-07-29 08:32:43 CST
>   Universal time: Fri 2016-07-29 00:32:43 UTC
> RTC time: Fri 2016-07-29 08:32:43
>Time zone: Asia/Shanghai (CST, +0800)
>  NTP enabled: yes
> NTP synchronized: yes
>  RTC in local TZ: no
>   DST active: n/a
>
>
> I create a bucket using the python script test2.py
>
> [root@ceph01 ~]# cat  test2.py
> import requests
> import logging
> from datetime import *
> from requests_toolbelt.utils import dump
> from awsauth import S3Auth
> # host = 'yuliyangdebugwebjewel.tunnel.qydev.com'
> #host = 'yuliyangdebugweb68.tunnel.qydev.com'
> #host = '10.254.9.20:7480'
> host = '10.254.3.68:7480' #ceph03
> #host = '127.0.0.1:7480'  #ceph01
> logging.basicConfig(level=logging.DEBUG)
> access_key = 'date2'
> secret_key = 'date2'
> cmd = '/%s' % '{:%m_%d.%H_%M_%S}'.format(datetime.now())
> #cmd = '/%s' % '{:%m_%d.%H_%M_%S}'.format(datetime.now() -
> timedelta(hours=16))
> url = 'http://%s%s' % (host,cmd)
> response = requests.put(url,auth=S3Auth(access_key,
> secret_key,service_url=host))
>
> data = dump.dump_all(response)
> print(data.decode('utf-8'))
>
>
> and its output:
>
> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection
> (1): 10.254.3.68
> DEBUG:requests.packages.urllib3.connectionpool:"PUT /07_29.08_34_42
> HTTP/1.1" 200 0
> < PUT /07_29.08_34_42 HTTP/1.1
> < Host: 10.254.3.68:7480
> < Content-Length: 0
> < Accept-Encoding: gzip, deflate
> < Accept: */*
> < User-Agent: python-requests/2.6.0 CPython/2.7.5
> Linux/3.10.0-327.el7.x86_64
> < Connection: keep-alive
> < date: Fri, 29 Jul 2016 00:34:42 GMT
> < Authorization: AWS date2:F+KLuenLNP42e25P/My/VWoUkeA=
> <
>
>> HTTP/1.1 200 OK
>> date: Fri, 29 Jul 2016 00:34:42 GMT
>> content-length: 0
>> x-amz-request-id: tx16112-00579aa4a2-c4c4f-default
>> connection: Keep-Alive
>>
>
>
>
> but the usage log shows the bucket created 16 hours before
>
> [root@ceph01 ~]# radosgw-admin usage show --uid=date2 |grep 07_29.08_34_42
> -A30
> "bucket": "07_29.08_34_42",
> "time": "2016-07-28 16:00:00.00Z",
> "epoch": 1469721600,
> "owner": "date2",
> "categories": [
> {
> "category": "create_bucket",
> "bytes_sent": 19,
> "bytes_received": 0,
> "ops": 1,
> "successful_ops": 1
> }
> ]
> },
>
>
> the time ("time": "2016-07-28 16:00:00.00Z") is 16 hours later than
> Local time: Fri 2016-07-29 08:28:44 CST
>
> I can get the usage shown by radosgw-admin usage show --uid=date2
> --start-date="2016-07-28 16:00:00"
> [root@ceph03 ~]# radosgw-admin usage show --uid=date2
> --start-date="2016-07-28 16:00:00"
> {
> "entries": [
> {
> "user": "date2",
> "buckets": [
> {
> "bucket": "07_29.08_34_42",
> "time": "2016-07-28 16:00:00.00Z",
> "epoch": 1469721600,
> "owner": "date2",
> "categories": [
> {
> "category": "create_bucket",
> "bytes_sent": 19,
> "bytes_received": 0,
> "ops": 1,
> "successful_ops": 1
> }
> ]
> }
> ]
> }
> ],
> "summary": [
> {
> "user": "date2",
> "categories": [
> {
> "category": "create_bucket",
> "bytes_sent": 19,
>   

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Yehuda Sadeh-Weinraub
On Wed, Aug 3, 2016 at 10:10 AM, Brian Felton  wrote:

> This may just be me having a conversation with myself, but maybe this will
> be helpful to someone else.
>
> Having dug and dug and dug through the code, I've come to the following
> realizations:
>
>1. When a multipart upload is completed, the function
>list_multipart_parts in rgw_op.cc is called.  This seems to be the start of
>the problems, as it will only return those parts in the 'multipart'
>namespace that include the upload id in the name, irrespective of how many
>copies of parts exist on the system with non-upload id prefixes
>2. In the course of writing to the OSDs, a list (remove_objs) is
>processed in cls_rgw.cc:unaccount_entry(), causing bucket stats to be
>decremented
>3. These decremented stats are written to the bucket's index
>entry/entries in .rgw.buckets.index via the CEPH_OSD_OP_OMAPSETHEADER case
>in ReplicatedPG::do_osd_ops
>
> So this explains why manually removing the multipart entries from
> .rgw.buckets and cleaning the shadow entries in .rgw.buckets.index does not
> cause the bucket's stats to be updated.  What I don't know how to do is
> force an update of the bucket's stats from the CLI.  I can retrieve the
> omap header from each of the bucket's shards in .rgw.buckets.index, but I
> don't have the first clue how to read the data or rebuild it into something
> valid.  I've searched the docs and mailing list archives, but I didn't find
> any solution to this problem.  For what it's worth, I've tried 'bucket
> check' with all combinations of '--check-objects' and '--fix' after
> cleaning up .rgw.buckets and .rgw.buckets.index.
>
> From a long-term perspective, it seems there are two possible fixes here:
>
>1. Update the logic in list_multipart_parts to return all the parts
>for a multipart object, so that *all* parts in the 'multipart' namespace
>can be properly removed
>2. Update the logic in RGWPutObj::execute() to not restart a write if
>the put_data_and_throttle() call returns -EEXIST but instead put the data
>in the original file(s)
>
> While I think 2 would involve the least amount of yak shaving with the
> multipart logic since the MP logic already assumes a happy path where all
> objects have a prefix of the multipart upload id, I'm all but certain this
> is going to horribly break many other parts of the system that I don't
> fully understand.
>

#2 is dangerous. That was the original behavior, and it is racy and *will*
lead to data corruption.  OTOH, I don't think #1 is an easy option. We only
keep a single entry per part, so we don't really have a good way to see all
the uploaded pieces. We could extend the meta object to keep record of all
the uploaded parts, and at the end, when assembling everything remove the
parts that aren't part of the final assembly.

> The good news is that the assembly of the multipart object is being done
> correctly; what I can't figure out is how it knows about the non-upload id
> prefixes when creating the metadata on the multipart object in
> .rgw.buckets.  My best guess is that it's copying the metadata from the
> 'meta' object in .rgw.buckets.extra (which is correctly updated with the
> new part prefixes after each successful upload), but I haven't absolutely
> confirmed that.
>

Yeah, something along these lines.


> If one of the developer folk that are more familiar with this could weigh
> in, I would be greatly appreciative.
>

btw, did you try to run the radosgw-admin orphan find tool?
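
For reference, its usage is roughly (a sketch; the pool and job id
are illustrative):

  radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphan-scan-1
  radosgw-admin orphans finish --job-id=orphan-scan-1

It scans the data pool for rados objects that no bucket index refers
to and reports them; it doesn't delete anything on its own.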

Yehuda

> Brian
>
> On Tue, Aug 2, 2016 at 8:59 AM, Brian Felton  wrote:
>
>> I am actively working through the code and debugging everything.  I
>> figure the issue is with how RGW is listing the parts of a multipart upload
>> when it completes or aborts the upload (read: it's not getting *all* the
>> parts, just those that are either most recent or tagged with the upload
>> id).  As soon as I can figure out a patch, or, more importantly, how to
>> manually address the problem, I will respond with instructions.
>>
>> The reported bug contains detailed instructions on reproducing the
>> problem, so it's trivial to reproduce and test on a small and/or new
>> cluster.
>>
>> Brian
>>
>>
>> On Tue, Aug 2, 2016 at 8:53 AM, Tyler Bishop <
>> tyler.bis...@beyondhosting.net> wrote:
>>
>>> We're having the same issues.  I have a 1200TB pool at 90% utilization;
>>> however, disk utilization is only 40%.
>>>
>>>
>>>
>>>
>>>
>>> *Tyler Bishop *Chief Technical Officer
>>> 513-299-7108 x10
>>>
>>> tyler.bis...@beyondhosting.net
>>>
>>>
>>>
>>>
>>> --
>>> *From: *"Brian Felton" 
>>> *To: *"ceph-users" 
>>> *Sent: *Wednesday, July 27, 201

Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-03 Thread Yehuda Sadeh-Weinraub
On Wed, Aug 3, 2016 at 10:57 AM, Brian Felton  wrote:

> I should clarify:
>
> There doesn't seem to be a problem with list_multipart_parts -- upon
> further review, it seems to be doing the right thing.  What tipped me off
> is that when one aborts a multipart upload where parts have been uploaded
> more than once, the last copy of each part uploaded is successfully removed
> (not just removed from the bucket's stats, as with complete multipart, but
> submitted for garbage collection).  The difference seems to be in the
> following:
>
> In RGWCompleteMultipart::execute, the removal doesn't occur on the entries
> returned from list_multipart_parts; instead, we initialize a 'src_obj'
> rgw_obj structure and grab its index key
> (src_obj.get_index_key(&remove_key)), which is then pushed onto remove_objs.
>

iirc, we don't really remove the objects there. Only remove the entries
from the index.


>
> In RGWAbortMultipart::execute, we operate directly on the
> RGWUploadPartInfo value in the obj_parts map, submitting it for deletion
> (gc) if its manifest is empty.
>
> If this is correct, there is no "fix" for list_multipart_parts; instead,
> it would seem that the only fix is to not allow an upload part to generate
> a new prefix in RGWPutObj::execute().
>

The problem is that operations can happen concurrently, so the decision
whether to remove or not to remove an entry is not very easy. We have seen
before that an application initiated multiple uploads of the same part, but
the one that actually completed last was not the last one uploaded (e.g.,
due to networking timeouts and retries that happen in different layers).


> Since I don't really have any context on why a new prefix would be
> generated if the object already exists, I'm not the least bit confident
> that changing it will not have all sorts of unforeseen consequences.  That
> said, since all knowledge of an uploaded part seems to vanish from
> existence once it has been replaced, I don't see how the accounting of
> multipart data will ever be correct.
>

Having a mutable part is problematic, since different uploads might step on
each other (as with the example I provided above), and you end up with
corrupted data.


>
> And yes, I've tried the orphan find, but I'm not really sure what to do
> with the results.  The post I could find in the mailing list (mostly from
> you), seemed to conclude that no action should be taken on the things that
> it finds are orphaned.  Also, I have removed a significant number of
> multipart and shadow files that are not valid, but none of that actually
>

The tool is not removing data, only reporting about possible leaked rados
objects.


> updates the bucket stats to the correct values.  If I had some mechanism
> for forcing that, this would be much less of a big deal.
>

Right, this is a separate issue. Did you try running 'radosgw-admin bucket
check --fix'?

Yehuda


>
>
> Brian
>
> On Wed, Aug 3, 2016 at 12:46 PM, Yehuda Sadeh-Weinraub 
> wrote:
>
>>
>>
>> On Wed, Aug 3, 2016 at 10:10 AM, Brian Felton  wrote:
>>
>>> This may just be me having a conversation with myself, but maybe this
>>> will be helpful to someone else.
>>>
>>> Having dug and dug and dug through the code, I've come to the following
>>> realizations:
>>>
>>>1. When a multipart upload is completed, the function
>>>list_multipart_parts in rgw_op.cc is called.  This seems to be the start 
>>> of
>>>the problems, as it will only return those parts in the 'multipart'
>>>namespace that include the upload id in the name, irrespective of how 
>>> many
>>>copies of parts exist on the system with non-upload id prefixes
>>>2. In the course of writing to the OSDs, a list (remove_objs) is
>>>processed in cls_rgw.cc:unaccount_entry(), causing bucket stats to be
>>>decremented
>>>3. These decremented stats are written to the bucket's index
>>>entry/entries in .rgw.buckets.index via the CEPH_OSD_OP_OMAPSETHEADER 
>>> case
>>>in ReplicatedPG::do_osd_ops
>>>
>>> So this explains why manually removing the multipart entries from
>>> .rgw.buckets and cleaning the shadow entries in .rgw.buckets.index does not
>>> cause the bucket's stats to be updated.  What I don't know how to do is
>>> force an update of the bucket's stats from the CLI.  I can retrieve the
>>> omap header from each of the bucket's shards in .rgw.buckets.index, but I
>>> don't have the first clue how to read the data or rebuild it 

Re: [ceph-users] Radosgw versioning S3 compatible?

2017-06-28 Thread Yehuda Sadeh-Weinraub
On Wed, Jun 28, 2017 at 8:13 AM, Martin Emrich
 wrote:
> Correction: It’s about the Version expiration, not the versioning itself.
>
> We send this rule:
>
>
>
>   Rules: [
>
> {
>
>   Status: 'Enabled',
>
>   Prefix: '',
>
>   NoncurrentVersionExpiration: {
>
> NoncurrentDays: 60
>
>   },
>
>   Expiration: {
>
> ExpiredObjectDeleteMarker: true
>
>   },
>
>   ID: 'expire-60days'
>
> }
>
>   ]
>
>
>
> Should that be supported?
>

Currently rgw bucket lifecycle rules do not support versioning.

Yehuda

>
>
> Thanks
>
>
>
> Martin
>
>
>
> Von: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Im Auftrag von
> Martin Emrich
> Gesendet: Mittwoch, 28. Juni 2017 16:13
> An: ceph-users@lists.ceph.com
> Betreff: [ceph-users] Radosgw versioning S3 compatible?
>
>
>
> Hi!
>
>
>
> Is the Object Gateway S3 API supposed to be compatible with Amazon S3
> regarding versioning?
>
>
>
> Object Versioning is listed as supported in Ceph 12.1, but using the
> standard Node.js aws-sdk module (s3.putBucketVersioning()) results in
> “NotImplemented”.
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to replicate metadata only on RGW multisite?

2017-06-30 Thread Yehuda Sadeh-Weinraub
On Fri, Jun 30, 2017 at 4:49 AM, Henrik Korkuc  wrote:
> Hello,
>
> I have RGW multisite setup on Jewel and I would like to turn off data
> replication there so that only metadata (users, created buckets, etc) would
> be synced but not the data.
>
>

FWIW, not in jewel, but in kraken the zone info has two params:
 - sync_from_all
 - sync_from

The first specifies whether zone should sync from all its peers
(within the same zonegroup), and the second one specifies what zones
to sync from (in case sync_from_all is false).
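
On kraken that looks roughly like this (a sketch; zone names are
illustrative):

  radosgw-admin zone modify --rgw-zone=us-west --sync-from-all=false
  radosgw-admin zone modify --rgw-zone=us-west --sync-from-rm=us-east
  radosgw-admin period update --commit

Data sync stops for the zones that are no longer listed, while
metadata still syncs from the master zone.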

However, as I said, this doesn't exist in jewel. In jewel you can
still just put the zone in a separate zonegroup as data is synced only
within the same zonegroup. Metadata syncs from the master zone to any
other zones.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4

2017-08-30 Thread Yehuda Sadeh-Weinraub
On Wed, Aug 30, 2017 at 5:44 PM, Bryan Banister
 wrote:
> Not sure what’s happening but we started to put a decent load on the RGWs we
> have set up and we were seeing failures with the following kind of
> fingerprint:
>
>
>
> 2017-08-29 17:06:22.072361 7ffdc501a700  1 rgw realm reloader: Frontends
> paused
>

Are you modifying the configuration? Could be that something is sending a
HUP signal to the radosgw process. We disabled this behavior (process
dynamic reconfig after HUP) in 12.2.0.

Yehuda

> 2017-08-29 17:06:22.072359 7fffacbe9700  1 civetweb: 0x56add000:
> 7.128.12.19 - - [29/Aug/2017:16:47:36 -0500] "PUT
> /blah?partNumber=8&uploadId=2~L9MEmUUmZKb2y8JCotxo62yzdMbHmye HTTP/1.1" 1 0
> - Minio (linux; amd64) minio-go/3.0.0
>
> 2017-08-29 17:06:22.072438 7fffcb426700  0 ERROR: failed to clone shard,
> completion_mgr.get_next() returned ret=-125
>
> 2017-08-29 17:06:23.689610 7ffdc501a700  1 rgw realm reloader: Store closed
>
> 2017-08-29 17:06:24.117630 7ffdc501a700  1 failed to decode the mdlog
> history: buffer::end_of_buffer
>
> 2017-08-29 17:06:24.117635 7ffdc501a700  1 failed to read mdlog history: (5)
> Input/output error
>
> 2017-08-29 17:06:24.118711 7ffdc501a700  1 rgw realm reloader: Creating new
> store
>
> 2017-08-29 17:06:24.118901 7ffdc501a700  1 mgrc service_daemon_register
> rgw.carf-ceph-osd01 metadata {arch=x86_64,ceph_version=ceph version 12.1.4
> (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc),cpu=Intel(R)
> Xeon(R) CPU E5-2680 v4 @ 2.40GHz,distro=rhel,distro_description=Red Hat
> Enterprise Linux Server 7.3
> (Maipo),distro_version=7.3,frontend_config#0=civetweb port=80
> num_threads=1024,frontend_type#0=civetweb,hos
>
> tname=carf-ceph-osd01,kernel_description=#1 SMP Tue Apr 4 04:49:42 CDT
> 2017,kernel_version=3.10.0-514.6.1.el7.jump3.x86_64,mem_swap_kb=0,mem_total_kb=263842036,num_handles=1,os=Linux,pid=14723,zone_id=b0634f34-67e2-4b44-ab00-5282f1e2cd83,zone_name=carf01,zonegroup_id=8207fcf5-7bd3-43df-ab5a-ea17e5949eec,zonegroup_name=us}
>
> 2017-08-29 17:06:24.118925 7ffdc501a700  1 rgw realm reloader: Finishing
> initialization of new store
>
> 2017-08-29 17:06:24.118927 7ffdc501a700  1 rgw realm reloader:  - REST
> subsystem init
>
> 2017-08-29 17:06:24.118943 7ffdc501a700  1 rgw realm reloader:  - user
> subsystem init
>
> 2017-08-29 17:06:24.118947 7ffdc501a700  1 rgw realm reloader:  - user
> subsystem init
>
> 2017-08-29 17:06:24.118950 7ffdc501a700  1 rgw realm reloader:  - usage
> subsystem init
>
> 2017-08-29 17:06:24.118985 7ffdc501a700  1 rgw realm reloader: Resuming
> frontends with new realm configuration.
>
> 2017-08-29 17:06:24.119018 7fffad3ea700  1 ====== starting new request
> req=0x7fffad3e4190 =====
>
> 2017-08-29 17:06:24.119039 7fffacbe9700  1 ====== starting new request
> req=0x7fffacbe3190 =====
>
> 2017-08-29 17:06:24.120163 7fffacbe9700  1 ====== req done
> req=0x7fffacbe3190 op status=0 http_status=403 ======
>
> 2017-08-29 17:06:24.120200 7fffad3ea700  1 ====== req done
> req=0x7fffad3e4190 op status=0 http_status=403 ======
>
>
>
> Any help understanding how to fix this would be greatly appreciated!
>
> -Bryan
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 7, 2017 at 7:44 PM, David Turner  wrote:
> Ok, I've been testing, investigating, researching, etc for the last week and
> I don't have any problems with data syncing.  The clients on one side are
> creating multipart objects while the multisite sync is creating them as
> whole objects and one of the datacenters is slower at cleaning up the shadow
> files.  That's the big discrepancy between object counts in the pools
> between datacenters.  I created a tool that goes through for each bucket in
> a realm and does a recursive listing of all objects in it for both
> datacenters and compares the 2 lists for any differences.  The data is
> definitely in sync between the 2 datacenters down to the modified time and
> byte of each file in s3.
>
> The metadata is still not syncing for the other realm, though.  If I run
> `metadata sync init` then the second datacenter will catch up with all of
> the new users, but until I do that newly created users on the primary side
> don't exist on the secondary side.  `metadata sync status`, `sync status`,
> `metadata sync run` (only left running for 30 minutes before I ctrl+c it),
> etc don't show any problems... but the new users just don't exist on the
> secondary side until I run `metadata sync init`.  I created a new bucket
> with the new user and the bucket shows up in the second datacenter, but no
> objects because the objects don't have a valid owner.
>
> Thank you all for the help with the data sync issue.  You pushed me into
> good directions.  Does anyone have any insight as to what is preventing the
> metadata from syncing in the other realm?  I have 2 realms being sync using
> multi-site and it's only 1 of them that isn't getting the metadata across.
> As far as I can tell it is configured identically.

What do you mean you have two realms? Zones and zonegroups need to
exist in the same realm in order for meta and data sync to happen
correctly. Maybe I'm misunderstanding.

Yehuda

>
> On Thu, Aug 31, 2017 at 12:46 PM David Turner  wrote:
>>
>> All of the messages from sync error list are listed below.  The number on
>> the left is how many times the error message is found.
>>
>>1811 "message": "failed to sync bucket instance:
>> (16) Device or resource busy"
>>   7 "message": "failed to sync bucket instance:
>> (5) Input\/output error"
>>  65 "message": "failed to sync object"
>>
>> On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman 
>> wrote:
>>>
>>>
>>> Hi David,
>>>
>>> On Mon, Aug 28, 2017 at 8:33 PM, David Turner 
>>> wrote:

 The vast majority of the sync error list is "failed to sync bucket
 instance: (16) Device or resource busy".  I can't find anything on Google
 about this error message in relation to Ceph.  Does anyone have any idea
 what this means? and/or how to fix it?
>>>
>>>
>>> Those are intermediate errors resulting from several radosgw trying to
>>> acquire the same sync log shard lease. It doesn't affect the sync progress.
>>> Are there any other errors?
>>>
>>> Orit


 On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley  wrote:
>
> Hi David,
>
> The 'data sync init' command won't touch any actual object data, no.
> Resetting the data sync status will just cause a zone to restart a full 
> sync
> of the --source-zone's data changes log. This log only lists which
> buckets/shards have changes in them, which causes radosgw to consider them
> for bucket sync. So while the command may silence the warnings about data
> shards being behind, it's unlikely to resolve the issue with missing 
> objects
> in those buckets.
>
> When data sync is behind for an extended period of time, it's usually
> because it's stuck retrying previous bucket sync failures. The 'sync error
> list' may help narrow down where those failures are.
>
> There is also a 'bucket sync init' command to clear the bucket sync
> status. Following that with a 'bucket sync run' should restart a full sync
> on the bucket, pulling in any new objects that are present on the
> source-zone. I'm afraid that those commands haven't seen a lot of polish 
> or
> testing, however.
>
> Casey
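
A per-bucket resync along those lines would look roughly like this (bucket
and zone names are hypothetical; as Casey notes, these commands are lightly
tested):

  # Clear the bucket's sync status, then run a full sync of it
  radosgw-admin bucket sync init --bucket=mybucket --source-zone=public-dc1
  radosgw-admin bucket sync run --bucket=mybucket --source-zone=public-dc1

  # Check where the bucket sync stands afterwards
  radosgw-admin bucket sync status --bucket=mybucket --source-zone=public-dc1
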
>
>
> On 08/24/2017 04:15 PM, David Turner wrote:
>
> Apparently the data shards that are behind go in both directions, but
> only one zone is aware of the problem.  Each cluster has objects in their
> data pool that the other doesn't have.  I'm thinking about initiating a
> `data sync init` on both sides (one at a time) to get them back on the 
> same
> page.  Does anyone know if that command will overwrite any local data that
> the zone has that the other doesn't if you run `data sync init` on it?
>
> On Thu, Aug 24, 2017 at 1:51 PM David Turner 
> wrote:
>>
>> After restarting the 2 RGW daemons on the second site again,
>> everything caught 

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 7, 2017 at 10:04 PM, David Turner  wrote:
> One realm is called public with a zonegroup called public-zg with a zone for
> each datacenter.  The second realm is called internal with a zonegroup
> called internal-zg with a zone for each datacenter.  They each have their
> own rgw's and load balancers.  The needs of our public-facing rgw's and load
> balancers vs. internal-use ones were different enough that we split them up
> completely.  We also have a local realm that does not use multisite and a
> 4th realm called QA that mimics the public realm as much as possible for
> staging configuration changes for the rgw daemons.  All 4 realms have their
> own buckets, users, etc. and that is all working fine.  For all of the
> radosgw-admin commands I am using the proper identifiers to make sure that
> each datacenter and realm are running commands on exactly what I expect them
> to (--rgw-realm=public --rgw-zonegroup=public-zg --rgw-zone=public-dc1
> --source-zone=public-dc2).
>
> The data sync issue was in the internal realm but running a data sync init
> and kickstarting the rgw daemons in each datacenter fixed the data
> discrepancies (I'm thinking it had something to do with a power failure a
> few months back that I just noticed recently).  The metadata sync issue is
> in the public realm.  I have no idea what is causing this to not sync
> properly since running a `metadata sync init` catches it back up to the
> primary zone, but then it doesn't receive any new users created after that.
>

Sounds like an issue with the metadata log in the primary master zone.
Not sure what could go wrong there, but maybe the master zone doesn't
know that it is a master zone, or it's set to not log metadata. Or
maybe there's a problem when the secondary is trying to fetch the
metadata log. Maybe some kind of # of shards mismatch (though not
likely).
Try to see if the master logs any changes: you can use the
'radosgw-admin mdlog list' command.

Yehuda
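
A rough sketch of that check, run against the master zone (names as in
David's description; the log_meta field below is from Jewel/Luminous-era
zonegroup output, so verify it against your version):

  # List recent metadata log entries on the master zone
  radosgw-admin mdlog list --rgw-realm=public --rgw-zone=public-dc1

  # Confirm the zonegroup marks the master zone as logging metadata
  radosgw-admin zonegroup get --rgw-zonegroup=public-zg | grep log_meta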

> On Thu, Sep 7, 2017 at 2:52 PM Yehuda Sadeh-Weinraub 
> wrote:
>>
>> On Thu, Sep 7, 2017 at 7:44 PM, David Turner 
>> wrote:
>> > Ok, I've been testing, investigating, researching, etc for the last week
>> > and
>> > I don't have any problems with data syncing.  The clients on one side
>> > are
>> > creating multipart objects while the multisite sync is creating them as
>> > whole objects and one of the datacenters is slower at cleaning up the
>> > shadow
>> > files.  That's the big discrepancy between object counts in the pools
>> > between datacenters.  I created a tool that goes through for each bucket
>> > in
>> > a realm and does a recursive listing of all objects in it for both
>> > datacenters and compares the 2 lists for any differences.  The data is
>> > definitely in sync between the 2 datacenters down to the modified time
>> > and
>> > byte of each file in s3.
>> >
>> > The metadata is still not syncing for the other realm, though.  If I run
>> > `metadata sync init` then the second datacenter will catch up with all
>> > of
>> > the new users, but until I do that newly created users on the primary
>> > side
>> > don't exist on the secondary side.  `metadata sync status`, `sync
>> > status`,
>> > `metadata sync run` (only left running for 30 minutes before I ctrl+c
>> > it),
>> > etc don't show any problems... but the new users just don't exist on the
>> > secondary side until I run `metadata sync init`.  I created a new bucket
>> > with the new user and the bucket shows up in the second datacenter, but
>> > no
>> > objects because the objects don't have a valid owner.
>> >
>> > Thank you all for the help with the data sync issue.  You pushed me into
>> > good directions.  Does anyone have any insight as to what is preventing
>> > the
>> > metadata from syncing in the other realm?  I have 2 realms being sync
>> > using
>> > multi-site and it's only 1 of them that isn't getting the metadata
>> > across.
>> > As far as I can tell it is configured identically.
>>
>> What do you mean you have two realms? Zones and zonegroups need to
>> exist in the same realm in order for meta and data sync to happen
>> correctly. Maybe I'm misunderstanding.
>>
>> Yehuda
>>
>> >
>> > On Thu, Aug 31, 2017 at 12:46 PM David Turner 
>> > wrote:
>> >>
>> >> All of the messages from sync error list are listed below.  The number
>> >> on
>> >

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 7, 2017 at 11:02 PM, David Turner  wrote:
> I created a test user named 'ice' and then used it to create a bucket named
> ice.  The bucket ice can be found in the second dc, but not the user.
> `mdlog list` showed ice for the bucket, but not for the user.  I performed
> the same test in the internal realm and it showed the user and bucket both
> in `mdlog list`.
>

Maybe your radosgw-admin command is running with a ceph user that
doesn't have permissions to write to the log pool? (Probably not,
because you are able to run the sync init commands.)
Another very slim explanation would be if you had, for some reason, an
overlapping zone configuration that shared some of the config but not
all of it, with radosgw running against the correct one and
radosgw-admin against the bad one. I don't think it's the second
option, though.

Yehuda
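
One way to test the first theory is to compare the caps of the key that
radosgw-admin picks up by default with those of the key the running gateway
uses (client names here are hypothetical):

  # Key used by radosgw-admin by default
  ceph auth get client.admin

  # Key used by the running gateway daemon
  ceph auth get client.rgw.public-dc1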

>
>
> On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub 
> wrote:
>>
>> On Thu, Sep 7, 2017 at 10:04 PM, David Turner 
>> wrote:
>> > One realm is called public with a zonegroup called public-zg with a zone
>> > for
>> > each datacenter.  The second realm is called internal with a zonegroup
>> > called internal-zg with a zone for each datacenter.  they each have
>> > their
>> > own rgw's and load balancers.  The needs of our public facing rgw's and
>> > load
>> > balancers vs internal use ones was different enough that we split them
>> > up
>> > completely.  We also have a local realm that does not use multisite and
>> > a
>> > 4th realm called QA that mimics the public realm as much as possible for
>> > staging configuration stages for the rgw daemons.  All 4 realms have
>> > their
>> > own buckets, users, etc and that is all working fine.  For all of the
>> > radosgw-admin commands I am using the proper identifiers to make sure
>> > that
>> > each datacenter and realm are running commands on exactly what I expect
>> > them
>> > to (--rgw-realm=public --rgw-zonegroup=public-zg --rgw-zone=public-dc1
>> > --source-zone=public-dc2).
>> >
>> > The data sync issue was in the internal realm but running a data sync
>> > init
>> > and kickstarting the rgw daemons in each datacenter fixed the data
>> > discrepancies (I'm thinking it had something to do with a power failure
>> > a
>> > few months back that I just noticed recently).  The metadata sync issue
>> > is
>> > in the public realm.  I have no idea what is causing this to not sync
>> > properly since running a `metadata sync init` catches it back up to the
>> > primary zone, but then it doesn't receive any new users created after
>> > that.
>> >
>>
>> Sounds like an issue with the metadata log in the primary master zone.
>> Not sure what could go wrong there, but maybe the master zone doesn't
>> know that it is a master zone, or it's set to not log metadata. Or
>> maybe there's a problem when the secondary is trying to fetch the
>> metadata log. Maybe some kind of # of shards mismatch (though not
>> likely).
>> Try to see if the master logs any changes: should use the
>> 'radosgw-admin mdlog list' command.
>>
>> Yehuda
>>
>> > On Thu, Sep 7, 2017 at 2:52 PM Yehuda Sadeh-Weinraub 
>> > wrote:
>> >>
>> >> On Thu, Sep 7, 2017 at 7:44 PM, David Turner 
>> >> wrote:
>> >> > Ok, I've been testing, investigating, researching, etc for the last
>> >> > week
>> >> > and
>> >> > I don't have any problems with data syncing.  The clients on one side
>> >> > are
>> >> > creating multipart objects while the multisite sync is creating them
>> >> > as
>> >> > whole objects and one of the datacenters is slower at cleaning up the
>> >> > shadow
>> >> > files.  That's the big discrepancy between object counts in the pools
>> >> > between datacenters.  I created a tool that goes through for each
>> >> > bucket
>> >> > in
>> >> > a realm and does a recursive listing of all objects in it for both
>> >> > datacenters and compares the 2 lists for any differences.  The data
>> >> > is
>> >> > definitely in sync between the 2 datacenters down to the modified
>> >> > time
>> >> > and
>> >> > byte of each file in s3.
>> >> >
>> >> > The metadata is still not syncing for t

Re: [ceph-users] RGW Multisite metadata sync init

2017-09-07 Thread Yehuda Sadeh-Weinraub
On Thu, Sep 7, 2017 at 11:37 PM, David Turner  wrote:
> I'm pretty sure I'm using the cluster admin user/keyring.  Is there any
> output that would be helpful?  Period, zonegroup get, etc?

 - radosgw-admin period get
 - radosgw-admin zone list
 - radosgw-admin zonegroup list

For each zone and zonegroup in the result:
 - radosgw-admin zone get --rgw-zone=<zone>
 - radosgw-admin zonegroup get --rgw-zonegroup=<zonegroup>

 - rados lspools

Also, create a user with --debug-rgw=20 --debug-ms=1; we need to look at the log.

Yehuda
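
That last step would look roughly like this (uid and display name are made up
for illustration):

  radosgw-admin user create --uid=synctest --display-name="Sync Test" \
      --debug-rgw=20 --debug-ms=1 2>&1 | tee user-create.log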


>
> On Thu, Sep 7, 2017 at 4:27 PM Yehuda Sadeh-Weinraub 
> wrote:
>>
>> On Thu, Sep 7, 2017 at 11:02 PM, David Turner 
>> wrote:
>> > I created a test user named 'ice' and then used it to create a bucket
>> > named
>> > ice.  The bucket ice can be found in the second dc, but not the user.
>> > `mdlog list` showed ice for the bucket, but not for the user.  I
>> > performed
>> > the same test in the internal realm and it showed the user and bucket
>> > both
>> > in `mdlog list`.
>> >
>>
>> Maybe your radosgw-admin command is running with a ceph user that
>> doesn't have permissions to write to the log pool? (probably not,
>> because you are able to run the sync init commands).
>> Another very slim explanation would be if you had for some reason
>> overlapping zones configuration that shared some of the config but not
>> all of it, having radosgw running against the correct one and
>> radosgw-admin against the bad one. I don't think it's the second
>> option.
>>
>> Yehuda
>>
>> >
>> >
>> > On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub 
>> > wrote:
>> >>
>> >> On Thu, Sep 7, 2017 at 10:04 PM, David Turner 
>> >> wrote:
>> >> > One realm is called public with a zonegroup called public-zg with a
>> >> > zone
>> >> > for
>> >> > each datacenter.  The second realm is called internal with a
>> >> > zonegroup
>> >> > called internal-zg with a zone for each datacenter.  they each have
>> >> > their
>> >> > own rgw's and load balancers.  The needs of our public facing rgw's
>> >> > and
>> >> > load
>> >> > balancers vs internal use ones was different enough that we split
>> >> > them
>> >> > up
>> >> > completely.  We also have a local realm that does not use multisite
>> >> > and
>> >> > a
>> >> > 4th realm called QA that mimics the public realm as much as possible
>> >> > for
>> >> > staging configuration stages for the rgw daemons.  All 4 realms have
>> >> > their
>> >> > own buckets, users, etc and that is all working fine.  For all of the
>> >> > radosgw-admin commands I am using the proper identifiers to make sure
>> >> > that
>> >> > each datacenter and realm are running commands on exactly what I
>> >> > expect
>> >> > them
>> >> > to (--rgw-realm=public --rgw-zonegroup=public-zg
>> >> > --rgw-zone=public-dc1
>> >> > --source-zone=public-dc2).
>> >> >
>> >> > The data sync issue was in the internal realm but running a data sync
>> >> > init
>> >> > and kickstarting the rgw daemons in each datacenter fixed the data
>> >> > discrepancies (I'm thinking it had something to do with a power
>> >> > failure
>> >> > a
>> >> > few months back that I just noticed recently).  The metadata sync
>> >> > issue
>> >> > is
>> >> > in the public realm.  I have no idea what is causing this to not sync
>> >> > properly since running a `metadata sync init` catches it back up to
>> >> > the
>> >> > primary zone, but then it doesn't receive any new users created after
>> >> > that.
>> >> >
>> >>
>> >> Sounds like an issue with the metadata log in the primary master zone.
>> >> Not sure what could go wrong there, but maybe the master zone doesn't
>> >> know that it is a master zone, or it's set to not log metadata. Or
>> >> maybe there's a problem when the secondary is trying to fetch the
>> >> metadata log. Maybe some kind of # of shards mismatch (though not
>> >> likely).
>> >> Try to see if the master logs any changes: should use the

Re: [ceph-users] [Ceph-maintainers] Ceph release cadence

2017-09-10 Thread Yehuda Sadeh-Weinraub
I'm not a huge fan of train releases, as they tend to never quite make
it on time and the timeline always feels a bit artificial anyway. OTOH,
I do see and understand the need for a predictable schedule with a
roadmap attached to it. There are many who need to have at least a
vague idea of what we're going to ship and when, so that they can plan
ahead. We need it ourselves, as sometimes the schedule can dictate the
engineering decisions that we're going to make.
At the same time, I think that working towards a release that comes
out after 9 or 12 months is a bit too long. This is a recipe for more
delays, as the penalty for missing a feature is painful. Maybe we can
consider returning to shorter iterations for *dev* releases. These are
checkpoints that need to happen after a short period (2-3 weeks),
where we end up with a minimally tested release that passes some smoke
test. Features are incrementally added to the dev release. The idea
behind a short term dev release is that it minimizes the window where
master is completely unusable, thus reducing the time to stabilization.
Then it's easier to enforce a train schedule if we want to. It might
be easier to let go of a feature that doesn't make it, as it will be
there soon, and maybe if really needed we (or the downstream
maintainer) can make the decision to backport it. This makes me think
that we could revisit the our backport policy/procedure/tooling, so
that we can do it in a sane and safe way when needed and possible.

Yehuda

On Fri, Sep 8, 2017 at 7:59 PM, Gregory Farnum  wrote:
> I think I'm the resident train release advocate so I'm sure my
> advocating that model will surprise nobody. I'm not sure I'd go all
> the way to Lars' multi-release maintenance model (although it's
> definitely something I'm interested in), but there are two big reasons
> I wish we were on a train with more frequent real releases:
>
> 1) It reduces the cost of features missing a release. Right now if
> something misses an LTS release, that's it for a year. And nobody
> likes releasing an LTS without a bunch of big new features, so each
> LTS is later than the one before as we scramble to get features merged
> in.
>
> ...and then we deal with the fact that we scrambled to get a bunch of
> features merged in and they weren't quite baked. (Luminous so far
> seems to have gone much better in this regard! Hurray! But I think
> that has a lot to do with our feature-release-scramble this year being
> mostly peripheral stuff around user interfaces that got tacked on
> about the time we'd initially planned the release to occur.)
>
> 2) Train releases increase predictability for downstreams, partners,
> and users around when releases will happen. Right now, the release
> process and schedule is entirely opaque to anybody who's not involved
> in every single upstream meeting we have; and it's unpredictable even
> to those who are. That makes things difficult, as Xiaoxi said.
>
> There are other peripheral but serious benefits I'd expect to see from
> fully-validated train releases as well. It would be *awesome* to have
> more frequent known-stable points to do new development against. If
> you're an external developer and you want a new feature, you have to
> either keep it rebased against a fast-changing master branch, or you
> need to settle for writing it against a long-out-of-date LTS and then
> forward-porting it for merge. If you're an FS developer writing a very
> small new OSD feature and you try to validate it against RADOS, you've
> no idea if bugs that pop up and look random are because you really did
> something wrong or if there's currently an intermittent issue in RADOS
> master. I would have *loved* to be able to maintain CephFS integration
> branches for features that didn't touch RADOS and were built on top of
> the latest release instead of master, but it was utterly infeasible
> because there were too many missing features with the long delays.
>
> On Fri, Sep 8, 2017 at 9:16 AM, Sage Weil  wrote:
>> I'm going to pick on Lars a bit here...
>>
>> On Thu, 7 Sep 2017, Lars Marowsky-Bree wrote:
>>> On 2017-09-06T15:23:34, Sage Weil  wrote:
>>> > Other options we should consider?  Other thoughts?
>>>
>>> With about 20-odd years in software development, I've become a big
>>> believer in schedule-driven releases. If it's feature-based, you never
>>> know when they'll get done.
>>>
>>> If the schedule intervals are too long though, the urge to press too
>>> much in (so as not to miss the next merge window) is just too high,
>>> meaning the train gets derailed. (Which cascades into the future,
>>> because the next time the pressure will be even higher based on the
>>> previous experience.) This requires strictness.
>>>
>>> We've had a few Linux kernel releases that were effectively feature
>>> driven and never quite made it. 1.3.x? 1.5.x? My memory is bad, but they
>>> were a disaster than eventually led Linus to evolve to the current
>>> model.
>>>
>>> That serves them really 

Re: [ceph-users] radosgw notify on creation/deletion of file in bucket

2017-10-03 Thread Yehuda Sadeh-Weinraub
On Tue, Oct 3, 2017 at 8:59 AM, Sean Purdy  wrote:
> Hi,
>
>
> Is there any way that radosgw can ping something when a file is removed or 
> added to a bucket?
>

That depends on what exactly you're looking for. You can't get that
info as a user, but there is a mechanism for remote zones to detect
changes that happen on the zone.

> Or use its sync facility to sync files to AWS/Google buckets?
>

Not at the moment; it's in the works. Unless you want to write your own sync plugin.

Yehuda
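
On the backup/rclone question below: a minimal rclone setup against an RGW
endpoint looks roughly like this (endpoint, credentials, and paths are
hypothetical; check the rclone S3 docs for your version):

  # ~/.config/rclone/rclone.conf
  [rgw]
  type = s3
  provider = Ceph
  access_key_id = YOUR_ACCESS_KEY
  secret_access_key = YOUR_SECRET_KEY
  endpoint = http://rgw.example.com:7480

  # One-way copy of a bucket to local disk
  # (note: sync also deletes at the destination; use copy to only add)
  rclone sync rgw:mybucket /backups/mybucket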

> Just thinking about backups.  What do people use for backups?  Been looking 
> at rclone.
>
>
> Thanks,
>
> Sean
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw resharding operation seemingly won't end

2017-10-09 Thread Yehuda Sadeh-Weinraub
On Mon, Oct 9, 2017 at 1:59 PM, Ryan Leimenstoll
 wrote:
> Hi all,
>
> We recently upgraded to Ceph 12.2.1 (Luminous) from 12.2.0; however, we are now
> seeing issues running radosgw. Specifically, it appears an automatically
> triggered resharding operation won't end, despite the jobs being cancelled
> (radosgw-admin reshard cancel). I have also disabled dynamic resharding for the
> time being in ceph.conf.
>
>
> [root@objproxy02 ~]# radosgw-admin reshard list
> []
>
> The two buckets were also reported by `radosgw-admin reshard list` before
> our RGW frontends paused recently (and only came back after a service
> restart). These two buckets cannot be written to at this point
> either.
>
> 2017-10-06 22:41:19.547260 7f90506e9700 0 block_while_resharding ERROR: 
> bucket is still resharding, please retry
> 2017-10-06 22:41:19.547411 7f90506e9700 0 WARNING: set_req_state_err 
> err_no=2300 resorting to 500
> 2017-10-06 22:41:19.547729 7f90506e9700 0 ERROR: 
> RESTFUL_IO(s)->complete_header() returned err=Input/output error
> 2017-10-06 22:41:19.548570 7f90506e9700 1 == req done req=0x7f90506e3180 
> op status=-2300 http_status=500 ==
> 2017-10-06 22:41:19.548790 7f90506e9700 1 civetweb: 0x55766d111000: 
> $MY_IP_HERE$ - - [06/Oct/2017:22:33:47 -0400] "PUT /
> $REDACTED_BUCKET_NAME$/$REDACTED_KEY_NAME$ HTTP/1.1" 1 0 - Boto3/1.4.7 
> Python/2.7.12 Linux/4.9.43-17.3
> 9.amzn1.x86_64 exec-env/AWS_Lambda_python2.7 Botocore/1.7.2 Resource
> [.. slightly later in the logs..]
> 2017-10-06 22:41:53.516272 7f90406c9700 1 rgw realm reloader: Frontends paused
> 2017-10-06 22:41:53.528703 7f907893f700 0 ERROR: failed to clone shard, 
> completion_mgr.get_next() returned ret=-125
> 2017-10-06 22:44:32.049564 7f9074136700 0 ERROR: keystone revocation 
> processing returned error r=-22
> 2017-10-06 22:59:32.059222 7f9074136700 0 ERROR: keystone revocation 
> processing returned error r=-22
>
> Can anyone advise on the best path forward to stop the current sharding 
> states and avoid this moving forward?
>

What does 'radosgw-admin reshard status --bucket=<bucket>' return?
I think just manually resharding the buckets should clear this flag,
is that not an option?
Manual reshard: radosgw-admin bucket reshard --bucket=<bucket>
--num-shards=<num>

Also, 'radosgw-admin bucket check --fix' might clear that flag.

For some reason it seems that the reshard cancellation code is not
clearing that flag on the bucket index header (pretty sure it used to
do it at one point). I'll open a tracker ticket.

Thanks,
Yehuda
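
Putting those suggestions together, a recovery sequence would look roughly
like this (bucket name and shard count are hypothetical):

  # Inspect the resharding state recorded in the bucket index
  radosgw-admin reshard status --bucket=mybucket

  # Option 1: reshard manually, which should clear the flag
  radosgw-admin bucket reshard --bucket=mybucket --num-shards=128

  # Option 2: check and repair the bucket index header
  radosgw-admin bucket check --fix --bucket=mybucket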

>
> Some other details:
>  - 3 rgw instances
>  - Ceph Luminous 12.2.1
>  - 584 active OSDs, rgw bucket index is on Intel NVMe OSDs
>
>
> Thanks,
> Ryan Leimenstoll
> rleim...@umiacs.umd.edu
> University of Maryland Institute for Advanced Computer Studies
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

