Re: [ceph-users] object lifecycle and updating from jewel

2018-01-04 Thread Ben Hines
Yes, it works fine with pre-existing buckets.

On Thu, Jan 4, 2018 at 8:52 AM, Graham Allan  wrote:

> I've only done light testing with lifecycle so far, but I'm pretty sure
> you can apply it to pre-existing buckets.
>
> Graham
>
>
> On 01/02/2018 10:42 PM, Robert Stanford wrote:
>
>>
>>   I would like to use the new object lifecycle feature of kraken /
>> luminous.  I have jewel, with buckets that have lots and lots of objects.
>> It won't be practical to move them, then move them back after upgrading.
>>
>>   In order to use the object lifecycle feature of radosgw in
>> kraken/luminous, do I need to have buckets configured for this, before
>> installing data?  In the scenario above, am I out of luck?  Or is object
>> lifecycle functionality available as soon as radosgw is upgraded?
>>
>>   Thank you
>>
>>
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu


Re: [ceph-users] installing specific version of ceph-common

2017-10-09 Thread Ben Hines
Just encountered this same problem with 11.2.0.

" yum install ceph-common-11.2.0 libradosstriper1-11.2.0 librgw2-11.2.0"
did the trick. Thanks!

It would be nice if it were easier to install older, non-current versions of
Ceph. Perhaps there is a way to fix the dependencies so that yum can figure
it out properly?
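
One workaround until the packaging dependencies are sorted out: after
installing the pinned set, lock it with the yum versionlock plugin so a later
plain "yum update" can't silently pull in a newer point release. Rough,
untested sketch; the package list is just an example:

yum install yum-plugin-versionlock
yum versionlock add ceph-common librados2 librbd1 libradosstriper1 librgw2
yum versionlock list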

-Ben

On Tue, Jul 18, 2017 at 1:39 AM, Buyens Niels  wrote:

> I've been looking into this again and have been able to install it now
> (10.2.9 is newest now instead of 10.2.8 when I first asked the question):
>
> Looking at the dependency resolving, we can see it's going to install
> libradosstriper1 version 10.2.9 and because of that also librados 10.2.9
> ...
> ---> Package libradosstriper1.x86_64 1:10.2.9-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.9-0.el7 for package:
> 1:libradosstriper1-10.2.9-0.el7.x86_64
> ...
>
> Same for librgw2:
> ...
> ---> Package librgw2.x86_64 1:10.2.9-0.el7 will be installed
> --> Processing Dependency: libfcgi.so.0()(64bit) for package:
> 1:librgw2-10.2.9-0.el7.x86_64
> ...
>
> So to install ceph-common with a specific version, you need to do:
> yum install ceph-common-10.2.7 libradosstriper1-10.2.7 librgw2-10.2.7
> This way it won't try to install v10.2.9 of librados2.
>
> I still feel it's weird that it's trying to install newer versions as
> dependencies for a 10.2.7 package. Looking at the dependencies being
> processed, there's no version specified for librbd, libbabeltrace,
> libbabeltrace-ctf, libradosstriper and librgw, so yum will install the newest
> version it can find and, because of that, upgrade librados2 to the newest
> version as well to satisfy those dependencies.
>
> Complete resolve:
> Resolving Dependencies
> --> Running transaction check
> ---> Package ceph-common.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: python-rados = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: librbd1 = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: python-rbd = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: python-cephfs = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: libcephfs1 = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: librbd.so.1()(64bit) for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: libbabeltrace.so.1()(64bit) for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: libbabeltrace-ctf.so.1()(64bit) for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: libradosstriper.so.1()(64bit) for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> --> Processing Dependency: librgw.so.2()(64bit) for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> ---> Package librados2.x86_64 1:10.2.7-0.el7 will be installed
> --> Running transaction check
> ---> Package libbabeltrace.x86_64 0:1.2.4-3.el7 will be installed
> ---> Package libcephfs1.x86_64 1:10.2.7-0.el7 will be installed
> ---> Package libradosstriper1.x86_64 1:10.2.9-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.9-0.el7 for package:
> 1:libradosstriper1-10.2.9-0.el7.x86_64
> ---> Package librbd1.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:librbd1-10.2.7-0.el7.x86_64
> ---> Package librgw2.x86_64 1:10.2.9-0.el7 will be installed
> --> Processing Dependency: libfcgi.so.0()(64bit) for package:
> 1:librgw2-10.2.9-0.el7.x86_64
> ---> Package python-cephfs.x86_64 1:10.2.7-0.el7 will be installed
> ---> Package python-rados.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:python-rados-10.2.7-0.el7.x86_64
> ---> Package python-rbd.x86_64 1:10.2.7-0.el7 will be installed
> --> Running transaction check
> ---> Package fcgi.x86_64 0:2.4.0-25.el7 will be installed
> ---> Package librados2.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:python-rados-10.2.7-0.el7.x86_64
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:librbd1-10.2.7-0.el7.x86_64
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:ceph-common-10.2.7-0.el7.x86_64
> ---> Package librados2.x86_64 1:10.2.9-0.el7 will be installed
> ---> Package librbd1.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:librbd1-10.2.7-0.el7.x86_64
> ---> Package python-rados.x86_64 1:10.2.7-0.el7 will be installed
> --> Processing Dependency: librados2 = 1:10.2.7-0.el7 for package:
> 1:python-rados-10.2.7-0.el7.x86_64
> --> Finished Dependency Resolution
>
> Fixed install:
> yum install ceph-common-10.2.7 libradosstriper1-10.2.7 librgw2-10.2.7
> Loaded plugins: fastestmirror
> Loading mirror 

[ceph-users] Kraken bucket index fix failing

2017-09-14 Thread Ben Hines
Hi,

A few weeks ago after running the command to fix my object index for a
particular bucket with a lot of data (~26TB) and about 50k multipart
objects (~1800 S3 objects), the index lost track of all previous objects
and started tracking only new ones.  The radosgw zone was set to
index_type: 1 when I ran the command. Could that have broken it? I then set
it back to index_type: 0, but that didn't change anything.

Fortunately, i have a separate DB containing the path to each object in
Ceph, so i can verify they are still there. Running HEAD on these objects
works, as does downloading them. But they are missing from the index.
When I rerun the fix command, nothing changes. It also returns in about 30
seconds, which is way too fast considering how much data I have in this bucket.
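
For reference, the spot checks are just HEAD requests against the paths from
that DB, e.g. with the AWS CLI (endpoint and key below are placeholders):

aws --endpoint-url http://rgw.example.com s3api head-object \
    --bucket int8-packages --key path/from/db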

The fix returns 15949 objects, but I estimate this bucket has more like 5
multipart objects in it.

Earlier:
radosgw-admin bucket check --bucket  --fix --check-objects
output (Earlier) and zonegroup info:
https://gist.github.com/benh57/a58ea7d9f23a853e71b1115aca9fad29

And now when i run it, i get nothing at all - i suspect it was wiped again
and only will pick up new things.

-bash-4.2$ radosgw-admin bucket check --bucket int8-packages --check-objects
2017-09-14 18:02:53.690107 7f65e33b9c80  0 System already converted
[]

Currently Kraken 11.2.0. Is it worth going up to Luminous (or Kraken
11.2.1) to fix?

Thanks,

-Ben


Re: [ceph-users] Ceph release cadence

2017-09-11 Thread Ben Hines
We have generally been running the latest non-LTS 'stable' release, since my
cluster is slightly less mission-critical than others, and there were
features important to us added in both Infernalis and Kraken. But I really
only care about RGW. If the RGW component could be split out of Ceph into a
plugin and independently updated, it'd be awesome for us.

A minor bugfix to radosgw shouldn't be blocked by issues with RBD, for
example, a component I don't care about at all.
We could have packages like:

ceph-core
ceph-radosgw
ceph-rbd ...
ceph-mgr..

Might increase the testing workload, but automation should take care of
that...

ceph-mgr is also similar. Minor (or even major) updates to the GUI
dashboard shouldn't be blocked rolling out to users because we're waiting
on a new RBD feature or critical RGW fix.

radosgw and mgr are really 'clients', after all.

-Ben

On Mon, Sep 11, 2017 at 3:30 PM, John Spray  wrote:

> On Wed, Sep 6, 2017 at 4:23 PM, Sage Weil  wrote:
> > Hi everyone,
> >
> > Traditionally, we have done a major named "stable" release twice a year,
> > and every other such release has been an "LTS" release, with fixes
> > backported for 1-2 years.
> >
> > With kraken and luminous we missed our schedule by a lot: instead of
> > releasing in October and April we released in January and August.
> >
> > A few observations:
> >
> > - Not a lot of people seem to run the "odd" releases (e.g., infernalis,
> > kraken).  This limits the value of actually making them.  It also means
> > that those who *do* run them are running riskier code (fewer users ->
> more
> > bugs).
> >
> > - The more recent requirement that upgrading clusters must make a stop at
> > each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel
> > -> luminous) has been hugely helpful on the development side by reducing
> > the amount of cross-version compatibility code to maintain and reducing
> > the number of upgrade combinations to test.
> >
> > - When we try to do a time-based "train" release cadence, there always
> > seems to be some "must-have" thing that delays the release a bit.  This
> > doesn't happen as much with the odd releases, but it definitely happens
> > with the LTS releases.  When the next LTS is a year away, it is hard to
> > suck it up and wait that long.
> >
> > A couple of options:
> >
> > * Keep even/odd pattern, and continue being flexible with release dates
> >
> >   + flexible
> >   - unpredictable
> >   - odd releases of dubious value
> >
> > * Keep even/odd pattern, but force a 'train' model with a more regular
> > cadence
> >
> >   + predictable schedule
> >   - some features will miss the target and be delayed a year
> >
> > * Drop the odd releases but change nothing else (i.e., 12-month release
> > cadence)
> >
> >   + eliminate the confusing odd releases with dubious value
> >
> > * Drop the odd releases, and aim for a ~9 month cadence. This splits the
> > difference between the current even/odd pattern we've been doing.
> >
> >   + eliminate the confusing odd releases with dubious value
> >   + waiting for the next release isn't quite as bad
> >   - required upgrades every 9 months instead of every 12 months
>
> This is my preferred option (second choice would be the next one up,
> i.e. same thing but annually).
>
> Our focus should be on delivering solid stuff, but not necessarily
> bending over backwards to enable people to run old stuff.  Our
> commitment to releases should be that there are either fixes for that
> release, or a newer (better) release to upgrade to.  Either way there
> is a solution on offer (and any user/vendor who wants to independently
> maintain other stable branches is free to do so).
>
> John
>
> > * Drop the odd releases, but relax the "must upgrade through every LTS"
> to
> > allow upgrades across 2 versions (e.g., luminous -> mimic or luminous ->
> > nautilus).  Shorten release cycle (~6-9 months).
> >
> >   + more flexibility for users
> >   + downstreams have greater choice in adopting an upstream release
> >   - more LTS branches to maintain
> >   - more upgrade paths to consider
> >
> > Other options we should consider?  Other thoughts?
> >
> > Thanks!
> > sage


Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4

2017-08-30 Thread Ben Hines
The daily log rotation.
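
For reference, the logrotate snippet shipped with the packages looks roughly
like the sketch below (exact contents vary by version, so treat it as an
assumption); the killall line is what sends the HUP you're seeing:

/var/log/ceph/*.log {
    daily
    rotate 7
    compress
    sharedscripts
    postrotate
        killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw || true
    endscript
    missingok
    notifempty
}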

-Ben

On Wed, Aug 30, 2017 at 3:09 PM, Bryan Banister 
wrote:

> Looking at the systemd service it does show that twice, at roughly the
> same time and one day apart, the service did receive a HUP signal:
>
> Aug 29 16:31:02 carf-ceph-osd02 radosgw[130050]: 2017-08-29
> 16:31:02.528559 7fffc641c700 -1 received  signal: Hangup from  PID: 73176
> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
> radosgw  UID: 0
>
> Aug 30 16:32:02 carf-ceph-osd02 radosgw[130050]: 2017-08-30
> 16:32:02.529825 7fffc641c700 -1 received  signal: Hangup from  PID: 48062
> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
> radosgw  UID: 0
>
>
>
> Any idea what would do this?
>
>
>
> I'll be updating the version to 12.2.0 shortly,
>
> -Bryan
>
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Bryan Banister
> Sent: Wednesday, August 30, 2017 3:42 PM
> To: Yehuda Sadeh-Weinraub 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4
>
>
>
> Note: External Email
>
> -
>
>
>
> We are not sending a HUP signal that we know about.  We were not modifying
> our configuration.  However all user accounts in the RGW were lost!
>
> -Bryan
>
>
>
> -Original Message-
>
> From: Yehuda Sadeh-Weinraub [mailto:yeh...@redhat.com ]
>
> Sent: Wednesday, August 30, 2017 3:30 PM
>
> To: Bryan Banister 
>
> Cc: ceph-users@lists.ceph.com
>
> Subject: Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4
>
>
>
> Note: External Email
>
> -
>
>
>
> On Wed, Aug 30, 2017 at 5:44 PM, Bryan Banister
>  wrote:
>
> > Not sure what’s happening, but we started to put a decent load on the
> > RGWs we have set up and we were seeing failures with the following kind
> > of fingerprint:
>
> >
>
> >
>
> >
>
> > 2017-08-29 17:06:22.072361 7ffdc501a700  1 rgw realm reloader: Frontends
>
> > paused
>
> >
>
>
>
> Are you modifying configuration? Could be that something is sending
>
> HUP signal to the radosgw process. We disabled this behavior (process
>
> dynamic reconfig after HUP) in 12.2.0.
>
>
>
> Yehuda
>
>
>
> > 2017-08-29 17:06:22.072359 7fffacbe9700  1 civetweb: 0x56add000:
>
> > 7.128.12.19 - - [29/Aug/2017:16:47:36 -0500] "PUT
>
> > /blah?partNumber=8=2~L9MEmUUmZKb2y8JCotxo62yzdMbHmye HTTP/1.1"
> 1 0
>
> > - Minio (linux; amd64) minio-go/3.0.0
>
> >
>
> > 2017-08-29 17:06:22.072438 7fffcb426700  0 ERROR: failed to clone shard,
>
> > completion_mgr.get_next() returned ret=-125
>
> >
>
> > 2017-08-29 17:06:23.689610 7ffdc501a700  1 rgw realm reloader: Store
> closed
>
> >
>
> > 2017-08-29 17:06:24.117630 7ffdc501a700  1 failed to decode the mdlog
>
> > history: buffer::end_of_buffer
>
> >
>
> > 2017-08-29 17:06:24.117635 7ffdc501a700  1 failed to read mdlog history:
> (5)
>
> > Input/output error
>
> >
>
> > 2017-08-29 17:06:24.118711 7ffdc501a700  1 rgw realm reloader: Creating
> new
>
> > store
>
> >
>
> > 2017-08-29 17:06:24.118901 7ffdc501a700  1 mgrc service_daemon_register
>
> > rgw.carf-ceph-osd01 metadata {arch=x86_64,ceph_version=ceph version
> 12.1.4
>
> > (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc),cpu=Intel(R)
>
> > Xeon(R) CPU E5-2680 v4 @ 2.40GHz,distro=rhel,distro_description=Red Hat
>
> > Enterprise Linux Server 7.3
>
> > (Maipo),distro_version=7.3,frontend_config#0=civetweb port=80
>
> > num_threads=1024,frontend_type#0=civetweb,hos
>
> >
>
> > tname=carf-ceph-osd01,kernel_description=#1 SMP Tue Apr 4 04:49:42 CDT
>
> > 2017,kernel_version=3.10.0-514.6.1.el7.jump3.x86_64,mem_
> swap_kb=0,mem_total_kb=263842036,num_handles=1,os=Linux,pid=14723,zone_id=
> b0634f34-67e2-4b44-ab00-5282f1e2cd83,zone_name=carf01,
> zonegroup_id=8207fcf5-7bd3-43df-ab5a-ea17e5949eec,zonegroup_name=us}
>
> >
>
> > 2017-08-29 17:06:24.118925 7ffdc501a700  1 rgw realm reloader: Finishing
>
> > initialization of new store
>
> >
>
> > 2017-08-29 17:06:24.118927 7ffdc501a700  1 rgw realm reloader:  - REST
>
> > subsystem init
>
> >
>
> > 2017-08-29 17:06:24.118943 7ffdc501a700  1 rgw realm reloader:  - user
>
> > subsystem init
>
> >
>
> > 2017-08-29 17:06:24.118947 7ffdc501a700  1 rgw realm reloader:  - user
>
> > subsystem init
>
> >
>
> > 2017-08-29 17:06:24.118950 7ffdc501a700  1 rgw realm reloader:  - usage
>
> > subsystem init
>
> >
>
> > 2017-08-29 17:06:24.118985 7ffdc501a700  1 rgw realm reloader: Resuming
>
> > frontends with new realm configuration.
>
> >
>
> > 2017-08-29 17:06:24.119018 7fffad3ea700  1 == starting new request
>
> > req=0x7fffad3e4190 =
>
> >
>
> > 2017-08-29 17:06:24.119039 7fffacbe9700  1 == starting new request
>
> > req=0x7fffacbe3190 =
>
> >
>
> > 2017-08-29 17:06:24.120163 7fffacbe9700  1 == req done
>
> > 

Re: [ceph-users] Linear space complexity or memory leak in `Radosgw-admin bucket check --fix`

2017-07-26 Thread Ben Hines
Which version of Ceph?

On Tue, Jul 25, 2017 at 4:19 AM, Hans van den Bogert 
wrote:

> Hi All,
>
> I don't seem to be able to fix a bucket, a bucket which has become
> inconsistent due to the use of the `inconsistent-index` flag 8).
>
> My ceph-admin VM has 4GB of RAM, but that doesn't seem to be enough to do
> a `radosgw-admin bucket check --fix` which holds 6M items, as the
> radosgw-admin process is killed eventually by the Out-Of-Memory-Manager. Is
> this high RAM usage to be expected, or should I file a bug?
>
> Regards,
>
> Hans
>
>
>


Re: [ceph-users] Kraken rgw lifecycle processing nightly crash

2017-07-25 Thread Ben Hines
Looks like Wei found and fixed this in
https://github.com/ceph/ceph/pull/16495

Thanks Wei!

This has been causing crashes for us since May. Guess it shows that not
many folks use Kraken with lifecycles yet, but more certainly will with
Luminous.

-Ben

On Fri, Jul 21, 2017 at 7:19 AM, Daniel Gryniewicz <d...@redhat.com> wrote:

> On 07/20/2017 04:48 PM, Ben Hines wrote:
>
>> Still having this RGWLC crash once a day or so. I do plan to update to
>> Luminous as soon as that is final, but it's possible this issue will still
>> occur, so i was hoping one of the devs could take a look at it.
>>
>> My original suspicion was that it happens when lifecycle processing at
>> the same time that the morning log rotation occurs, but i am not certain
>> about that, so perhaps the bug title should be updated to remove that
>> conclusion. (i can't edit it)
>>
>> http://tracker.ceph.com/issues/19956 - no activity for 2 months.
>>
>> Stack with symbols:
>>
>> #0  0x7f6a6cb1723b in raise () from /lib64/libpthread.so.0
>> #1  0x7f6a778b9e95 in reraise_fatal (signum=11) at
>> /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:72
>> #2  handle_fatal_signal (signum=11) at
>> /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:134
>> #3  <signal handler called>
>> #4  RGWGC::add_chain (this=this@entry=0x0, op=..., chain=...,
>> tag="default.68996150.61684839") at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:58
>> #5  0x7f6a77801e3f in RGWGC::send_chain (this=0x0, chain=...,
>> tag="default.68996150.61684839", sync=sync@entry=false)
>>
>
> Here, this (the RGWGC, or store->gc) is NULL, so that's the problem.  I
> have no idea how the store isn't initialized, though.
>
> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:64
>> #6  0x7f6a776c0a29 in RGWRados::Object::complete_atomic_modification
>> (this=0x7f69cc8578d0) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:7870
>> #7  0x7f6a777102a0 in RGWRados::Object::Delete::delete_obj
>> (this=this@entry=0x7f69cc857840) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8295
>> #8  0x7f6a77710ce8 in RGWRados::delete_obj (this=<optimized out>,
>> obj_ctx=..., bucket_info=..., obj=..., versioning_status=0,
>> bilog_flags=<optimized out>, expiration_time=...) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8330
>> #9  0x7f6a77607ced in rgw_remove_object (store=0x7f6a810fe000,
>> bucket_info=..., bucket=..., key=...) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_bucket.cc:519
>> #10 0x7f6a7780c971 in RGWLC::bucket_lc_process (this=this@entry=0x7f6a81959c00,
>> shard_id=":globalcache307:default.42048218.11")
>> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:283
>> #11 0x7f6a7780d928 in RGWLC::process (this=this@entry=0x7f6a81959c00,
>> index=<optimized out>, max_lock_secs=max_lock_secs@entry=60)
>> at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:482
>> #12 0x7f6a7780ddc1 in RGWLC::process (this=0x7f6a81959c00) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:412
>> #13 0x7f6a7780e033 in RGWLC::LCWorker::entry (this=0x7f6a81a820d0) at
>> /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:51
>> #14 0x7f6a6cb0fdc5 in start_thread () from /lib64/libpthread.so.0
>> #15 0x7f6a6b37073d in clone () from /lib64/libc.so.6
>>
>>
> Daniel


[ceph-users] Kraken rgw lifecycle processing nightly crash

2017-07-20 Thread Ben Hines
Still having this RGWLC crash once a day or so. I do plan to update to
Luminous as soon as that is final, but it's possible this issue will still
occur, so i was hoping one of the devs could take a look at it.

My original suspicion was that it happens when lifecycle processing runs at
the same time as the morning log rotation, but I am not certain about that,
so perhaps the bug title should be updated to remove that conclusion.
(I can't edit it.)

http://tracker.ceph.com/issues/19956 - no activity for 2 months.

Stack with symbols:

#0  0x7f6a6cb1723b in raise () from /lib64/libpthread.so.0
#1  0x7f6a778b9e95 in reraise_fatal (signum=11) at
    /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:72
#2  handle_fatal_signal (signum=11) at
    /usr/src/debug/ceph-11.2.0/src/global/signal_handler.cc:134
#3  <signal handler called>
#4  RGWGC::add_chain (this=this@entry=0x0, op=..., chain=...,
    tag="default.68996150.61684839") at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:58
#5  0x7f6a77801e3f in RGWGC::send_chain (this=0x0, chain=...,
    tag="default.68996150.61684839", sync=sync@entry=false)
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_gc.cc:64
#6  0x7f6a776c0a29 in RGWRados::Object::complete_atomic_modification
    (this=0x7f69cc8578d0) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:7870
#7  0x7f6a777102a0 in RGWRados::Object::Delete::delete_obj
    (this=this@entry=0x7f69cc857840) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8295
#8  0x7f6a77710ce8 in RGWRados::delete_obj (this=<optimized out>,
    obj_ctx=..., bucket_info=..., obj=..., versioning_status=0,
    bilog_flags=<optimized out>, expiration_time=...)
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_rados.cc:8330
#9  0x7f6a77607ced in rgw_remove_object (store=0x7f6a810fe000,
    bucket_info=..., bucket=..., key=...) at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_bucket.cc:519
#10 0x7f6a7780c971 in RGWLC::bucket_lc_process (this=this@entry=0x7f6a81959c00,
    shard_id=":globalcache307:default.42048218.11")
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:283
#11 0x7f6a7780d928 in RGWLC::process (this=this@entry=0x7f6a81959c00,
    index=<optimized out>, max_lock_secs=max_lock_secs@entry=60)
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:482
#12 0x7f6a7780ddc1 in RGWLC::process (this=0x7f6a81959c00)
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:412
#13 0x7f6a7780e033 in RGWLC::LCWorker::entry (this=0x7f6a81a820d0)
    at /usr/src/debug/ceph-11.2.0/src/rgw/rgw_lc.cc:51
#14 0x7f6a6cb0fdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x7f6a6b37073d in clone () from /lib64/libc.so.6

thanks,

-Ben


Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-06 Thread Ben Hines
If you have nothing listed in 'lc list', you probably need to add a
lifecycle configuration using the S3 API. It's not automatic and has to be
added per-bucket.


Here's some sample code for doing so: http://tracker.ceph.com/issues/19587
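
For a simple expiration rule, s3cmd should also be able to set one (hedged
example; bucket name and day count are placeholders), and 'lc list' will show
whether RGW picked it up:

s3cmd expire s3://mybucket --expiry-days=30
radosgw-admin lc list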

-Ben

On Tue, Jun 6, 2017 at 9:07 AM, Graham Allan <g...@umn.edu> wrote:

> I still haven't seen anything get expired from our kraken (11.2.0) system.
>
> When I run "radosgw-admin lc list" I get no output, besides debug output
> (I have "debug rgw = 10" at present):
>
> # radosgw-admin lc list
> 2017-06-06 10:57:49.319576 7f2b26ffd700  2 
> RGWDataChangesLog::ChangesRenewThread:
> start
> 2017-06-06 10:57:49.350646 7f2b49558c80 10 Cannot find current period zone
> using local zone
> 2017-06-06 10:57:49.379065 7f2b49558c80  2 all 8 watchers are set,
> enabling cache
> []
> 2017-06-06 10:57:49.399538 7f2b49558c80  2 removed watcher, disabling cache
>
> Unclear to me whether the debug message about "Cannot find current period
> zone using local zone" is related or indicates a problem.
>
> Currently all the lc config is more or less default, eg a few values:
>
> # ceph --show-config|grep rgw_|grep lifecycle
> rgw_lifecycle_enabled = true
> rgw_lifecycle_thread = 1
> rgw_lifecycle_work_time = 00:00-06:00
>
>
> Graham
>
> On 06/05/2017 01:07 PM, Ben Hines wrote:
>
>> FWIW lifecycle is working for us. I did have to research to find the
>> appropriate lc config file settings, the documentation for which is found
>> in a git pull request (waiting for another release?) rather than on the
>> Ceph docs site. https://github.com/ceph/ceph/pull/13990
>>
>>
>> Try these:
>> debug rgw = 20
>> rgw lifecycle work time = 00:01-23:59
>>
>>
>> and see if you have lifecycles listed when you run:
>>
>>
>> radosgw-admin lc list
>>
>>
>> 2017-06-05 10:58:00.473957 7f3429f77c80  0 System already converted
>> [
>>  {
>>  "bucket": ":bentest:default.653959.6",
>>  "status": "COMPLETE"
>>  },
>>  {
>>  "bucket": "::default.24713983.1",
>>  "status": "PROCESSING"
>>  },
>>  {
>>  "bucket": "::default.24713983.2",
>>  "status": "PROCESSING"
>>  },
>>
>> 
>>
>>
>> At 10 loglevel, the lifecycle processor logs 'DELETED' each time it
>> deletes something: https://github.com/ceph/ceph/b
>> lob/master/src/rgw/rgw_lc.cc#L388
>>
>>   grep --text DELETED client..log | wc -l
>> 121853
>>
>>
>> -Ben
>>
>> On Mon, Jun 5, 2017 at 6:16 AM, Daniel Gryniewicz <d...@redhat.com
>> <mailto:d...@redhat.com>> wrote:
>>
>> Kraken has lifecycle, Jewel does not.
>>
>> Daniel
>>
>>
>> On 06/04/2017 07:16 PM, ceph.nov...@habmalnefrage.de
>> <mailto:ceph.nov...@habmalnefrage.de> wrote:
>>
>>
>> grrr... sorry && and again as text :|
>>
>>
>> Gesendet: Montag, 05. Juni 2017 um 01:12 Uhr
>> Von: ceph.nov...@habmalnefrage.de
>> <mailto:ceph.nov...@habmalnefrage.de>
>> An: "Yehuda Sadeh-Weinraub" <yeh...@redhat.com
>> <mailto:yeh...@redhat.com>>
>> Cc: "ceph-users@lists.ceph.com
>> <mailto:ceph-users@lists.ceph.com>" <ceph-users@lists.ceph.com
>> <mailto:ceph-users@lists.ceph.com>>, ceph-de...@vger.kernel.org
>> <mailto:ceph-de...@vger.kernel.org>
>> Betreff: Re: [ceph-users] RGW lifecycle not expiring objects
>>
>>
>>
>> Hi (again) Yehuda.
>>
>> Looping in ceph-devel...
>>
>> Could it be that lifecycle is still not implemented neither in
>> Jewel nor in Kraken, even if release notes and other places say
>> so?
>>
>> https://www.spinics.net/lists/ceph-devel/msg34492.html
>> <https://www.spinics.net/lists/ceph-devel/msg34492.html>
>> https://github.com/ceph/ceph-ci/commit/7d48f62f5c86913d8f00b
>> 44d46a04a52d338907c
>> <https://github.com/ceph/ceph-ci/commit/7d48f62f5c86913d8f00
>> b44d46a04a52d338907c>
>> https://github.com/ceph/ceph-ci/commit/9162bd29594d34429a095
>> 62ed60a32a0703940ea
>> <https://github.com/ceph/ceph-ci/commit/9162bd29594d34429a09
>> 562ed60a32a0703940ea>
>>
>

Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-05 Thread Ben Hines
FWIW lifecycle is working for us. I did have to research to find the
appropriate lc config file settings, the documentation for which is found
in a git pull request (waiting for another release?) rather than on the
Ceph docs site. https://github.com/ceph/ceph/pull/13990


Try these:
debug rgw = 20
rgw lifecycle work time = 00:01-23:59


and see if you have lifecycles listed when you run:


radosgw-admin lc list


2017-06-05 10:58:00.473957 7f3429f77c80  0 System already converted
[
{
"bucket": ":bentest:default.653959.6",
"status": "COMPLETE"
},
{
"bucket": "::default.24713983.1",
"status": "PROCESSING"
},
{
"bucket": "::default.24713983.2",
"status": "PROCESSING"
},




At 10 loglevel, the lifecycle processor logs 'DELETED' each time it deletes
something: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_lc.cc#L388

 grep --text DELETED client..log | wc -l
121853


-Ben

On Mon, Jun 5, 2017 at 6:16 AM, Daniel Gryniewicz  wrote:

> Kraken has lifecycle, Jewel does not.
>
> Daniel
>
>
> On 06/04/2017 07:16 PM, ceph.nov...@habmalnefrage.de wrote:
>
>>
>> grrr... sorry && and again as text :|
>>
>>
>> Gesendet: Montag, 05. Juni 2017 um 01:12 Uhr
>> Von: ceph.nov...@habmalnefrage.de
>> An: "Yehuda Sadeh-Weinraub" 
>> Cc: "ceph-users@lists.ceph.com" ,
>> ceph-de...@vger.kernel.org
>> Betreff: Re: [ceph-users] RGW lifecycle not expiring objects
>>
>>
>>
>> Hi (again) Yehuda.
>>
>> Looping in ceph-devel...
>>
>> Could it be that lifecycle is still not implemented neither in Jewel nor
>> in Kraken, even if release notes and other places say so?
>>
>> https://www.spinics.net/lists/ceph-devel/msg34492.html
>> https://github.com/ceph/ceph-ci/commit/7d48f62f5c86913d8f00b
>> 44d46a04a52d338907c
>> https://github.com/ceph/ceph-ci/commit/9162bd29594d34429a095
>> 62ed60a32a0703940ea
>>
>> Thanks & regards
>>  Anton
>>
>>
>> Gesendet: Sonntag, 04. Juni 2017 um 21:34 Uhr
>> Von: ceph.nov...@habmalnefrage.de
>> An: "Yehuda Sadeh-Weinraub" 
>> Cc: "ceph-users@lists.ceph.com" 
>> Betreff: Re: [ceph-users] RGW lifecycle not expiring objects
>> Hi Yahuda.
>>
>> Well, here we go: http://tracker.ceph.com/issues/20177
>>
>> As it's my first one, hope it's ok as it is...
>>
>> Thanks & regards
>> Anton
>>
>>
>> Gesendet: Samstag, 03. Juni 2017 um 00:14 Uhr
>> Von: "Yehuda Sadeh-Weinraub" 
>> An: ceph.nov...@habmalnefrage.de
>> Cc: "Graham Allan" , "ceph-users@lists.ceph.com" <
>> ceph-users@lists.ceph.com>
>> Betreff: Re: [ceph-users] RGW lifecycle not expiring objects
>> Have you opened a ceph tracker issue, so that we don't lose track of
>> the problem?
>>
>> Thanks,
>> Yehuda
>>
>> On Fri, Jun 2, 2017 at 3:05 PM,  wrote:
>>
>>> Hi Graham.
>>>
>>> We are on Kraken and have the same problem with "lifecycle". Various
>>> (other) tools like s3cmd or CyberDuck do show the applied "expiration"
>>> settings, but objects seem never to be purged.
>>>
>>> If you should have new findings, hints,... PLEASE share/let me know.
>>>
>>> Thanks a lot!
>>> Anton
>>>
>>>
>>> Gesendet: Freitag, 19. Mai 2017 um 22:44 Uhr
>>> Von: "Graham Allan" 
>>> An: ceph-users@lists.ceph.com
>>> Betreff: [ceph-users] RGW lifecycle not expiring objects
>>> I've been having a hard time getting the s3 object lifecycle to do
>>> anything here. I was able to set a lifecycle on a test bucket. As others
>>> also seem to have found, I do get an EACCES error on setting the
>>> lifecycle, but it does however get stored:
>>>
>>> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api
>>> get-bucket-lifecycle-configuration --bucket=testgta
 {
 "Rules": [
 {
 "Status": "Enabled",
 "Prefix": "",
 "Expiration": {
 "Days": 3
 },
 "ID": "test"
 }
 ]
 }

>>>
>>> but many days later I have yet to see any object actually get expired.
>>> There are some hints in the rgw log that the expiry thread does run
>>> periodically:
>>>
>>> 2017-05-19 03:49:03.281347 7f74f1134700 2 
>>> RGWDataChangesLog::ChangesRenewThread:
 start
 2017-05-19 03:49:16.356022 7f74ef931700 2 object expiration: start
 2017-05-19 03:49:16.356036 7f74ef931700 20 proceeding shard =
 obj_delete_at_hint.00
 2017-05-19 03:49:16.359785 7f74ef931700 20 proceeding shard =
 obj_delete_at_hint.01
 2017-05-19 03:49:16.364667 7f74ef931700 20 proceeding shard =
 obj_delete_at_hint.02
 2017-05-19 03:49:16.369636 7f74ef931700 20 proceeding shard =
 obj_delete_at_hint.03

>>> ...
>>>
 2017-05-19 03:49:16.803270 7f74ef931700 20 proceeding shard =
 obj_delete_at_hint.000126
 2017-05-19 03:49:16.806423 

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-22 Thread Ben Hines
We used this workaround when upgrading to Kraken (which had a similar issue)

>modify the zonegroup and populate the 'hostnames' array with all backend
server hostnames as well as the hostname terminated by haproxy

Which I'm fine with. It's definitely a change that should be noted in a
more prominent release note. Without the hostname in there, Ceph
interpreted the hostname as a bucket name whenever the hostname the RGW was
being hit with differed from the hostname of the actual server. Pre-Kraken,
I didn't need that setting at all and it just worked.
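
For anyone hitting the same thing, the change was roughly the following
(hedged sketch; the hostnames are placeholders, and the period commit is only
needed on period/realm-based setups):

radosgw-admin zonegroup get > zonegroup.json
# edit zonegroup.json and list every name the gateways are reached by, e.g.
#   "hostnames": ["rgw01.example.com", "rgw02.example.com", "s3.example.com"],
radosgw-admin zonegroup set < zonegroup.json
radosgw-admin period update --commit
systemctl restart ceph-radosgw.target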

-Ben

On Mon, May 22, 2017 at 1:11 AM, Ingo Reimann  wrote:

> Hi Radek,
>
> are there any news about this issue? We are also stuck with 10.2.5 and
> can`t
> update to 10.2.7.
> We use a couple of radosgws that are loadbalanced behind a Keepalived/LVS.
> Removal of rgw_dns_name does only help, if I address the gateway directly,
> but not in general.
>
> Best regards,
>
> Ingo
>
> -Ursprüngliche Nachricht-
> Von: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] Im Auftrag von
> Radoslaw Zarzynski
> Gesendet: Mittwoch, 3. Mai 2017 11:59
> An: Łukasz Jagiełło
> Cc: ceph-users@lists.ceph.com
> Betreff: Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
>
> Hello Łukasz,
>
> Thanks for your testing and sorry for my mistake. It looks that two commits
> need to be reverted to get the previous behaviour:
>
> The already mentioned one:
>   https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca
> 16d7f4c6d0
> Its dependency:
>   https://github.com/ceph/ceph/commit/b72fc1b820ede3cd186d887d9d30f7
> f91fe3764b
>
> They have been merged in the same pull request:
>   https://github.com/ceph/ceph/pull/11760
> and form the difference visible between v10.2.5 and v10.2.6 in the matter
> of
> "in_hosted_domain" handling:
>   https://github.com/ceph/ceph/blame/v10.2.5/src/rgw/rgw_rest.cc#L1773
>   https://github.com/ceph/ceph/blame/v10.2.6/src/rgw/rgw_
> rest.cc#L1781-L1782
>
> I'm really not sure we want to revert them. Still, it can be that they just
> unhide a misconfiguration issue while fixing the problems we had with
> handling of virtual hosted buckets.
>
> Regards,
> Radek
>
> On Wed, May 3, 2017 at 3:12 AM, Łukasz Jagiełło  >
> wrote:
> > Hi,
> >
> > I tried today revert [1] from 10.2.7 but the problem is still there
> > even without the change. Revert to 10.2.5 fix the issue instantly.
> >
> > https://github.com/ceph/ceph/commit/c9445faf7fac2ccb8a05b53152c0ca16d7
> > f4c6d0
> >
> > On Thu, Apr 27, 2017 at 4:53 AM, Radoslaw Zarzynski
> >  wrote:
> >>
>
>
>
> Ingo Reimann
>
> Teamleiter Technik
> Dunkel GmbH 
> Dunkel GmbH
> Philipp-Reis-Straße 2
> 65795 Hattersheim
> Fon: +49 6190 889-100
> Fax: +49 6190 889-399
> eMail: supp...@dunkel.de
> http://www.Dunkel.de/   Amtsgericht Frankfurt/Main
> HRB: 37971
> Geschäftsführer: Axel Dunkel
> Ust-ID: DE 811622001


Re: [ceph-users] DNS records for ceph

2017-05-20 Thread Ben Hines
Ceph kraken or later can use SRV records to find the mon servers. It works
great and I've found it a bit easier to maintain than the static list in
ceph.conf.

That would presumably be on the private subnet.
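
To sanity-check the records from a client, something like this should work
(assumes the default mon_dns_srv_name of "ceph-mon" and an example.com zone,
both placeholders):

dig +short SRV _ceph-mon._tcp.example.com
# expect one answer per monitor, e.g.: 10 60 6789 mon01.example.com.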


On May 20, 2017 7:40 AM, "David Turner"  wrote:

> The private network is only used by OSD daemons. The mons, mds, rgw, and
> clients do not need access to this subnet. Ceph does not need DNS for
> anything.  DNS is very helpful when managing your cluster, so it is helpful
> to configure it for the public network.  As nothing talks to the private
> subnet, there is no benefit to configuring DNS for it.
>
> On Sat, May 20, 2017, 7:52 AM Anton Dmitriev  wrote:
>
>> What is the true way of configuring DNS records for ceph?
>>
>> public network = 10.17.12.0/24
>> cluster network = 10.17.27.0/24
>>
>> Storages have separate interfaces for public and cluster networks.
>>
>> In DNS servers I added records such as storage01 pointing to IPs in
>> public network.
>>
>> Do I need to add extra records which will point to cluster network?
>>
>> Do I need to add records to /etc/hosts file on storages to make storages
>> resolve each other to cluster network addresses?
>>
>> Or maybe I don`t need to care about cluster network and ceph will
>> determine cluster network addresses himself without asking DNS and
>> /etc/hosts?
>>
>> Do monitors interact with OSD using public network?
>>
>> Do I need to add cluster network to monitors?
>>
>>
>> --
>> Dmitriev Anton
>>


Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Ben Hines
Well, ceph journals are of course going away with the imminent bluestore.
Are small SSDs still useful for something with Bluestore?

For speccing out a cluster today that is many (6+) months away from being
required, which I am going to be doing, I was thinking all-SSD would be the
way to go. (Or is all-spinner performant with Bluestore?) Too early to make
that call?

-Ben

On Wed, May 17, 2017 at 5:30 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote:
>
> > Hi Nick,
> >
> > El 17/05/17 a las 11:12, Nick Fisk escribió:
> > > There seems to be a shift in enterprise SSD products to larger less
> write intensive products and generally costing more than what
> > > the existing P/S 3600/3700 ranges were. For example the new Intel NVME
> P4600 range seems to start at 2TB. Although I mention Intel
> > > products, this seems to be the general outlook across all
> manufacturers. This presents some problems for acquiring SSD's for Ceph
> > > journal/WAL use if your cluster is largely write only and wouldn't
> benefit from using the extra capacity brought by these SSD's to
> > > use as cache.
> > >
> > > Is anybody in the same situation and is struggling to find good P3700
> 400G replacements?
> > >
> > We usually build tiny ceph clusters, with 1 gbit network and S3610/S3710
> > 200GB SSDs for journals. We have been experiencing supply problems for
> > those disks lately, although it seems that 400GB disks are available, at
> > least for now.
> >
> This. Very much THIS.
>
> We're trying to get 200 or 400 or even 800GB DC S3710 or S3610s here
> recently with zero success.
> And I'm believing our vendor for a change that it's not their fault.
>
> What seems to be happening (no official confirmation, but it makes all the
> sense in the world to me) is this:
>
> Intel is trying to switch to 3DNAND (like they did with the 3520s), but
> while not having officially EOL'ed the 3(6/7)10s also allowed the supply
> to run dry.
>
> Which of course is not a smart move, because now people are massively
> forced to look for alternatives and if they work unlikely to come back.
>
> I'm looking at oversized Samsungs (base model equivalent to 3610s) and am
> following this thread for other alternatives.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/


Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-05-11 Thread Ben Hines
It actually seems like these values aren't being honored; I see many more
objects being processed by gc (as well as Kraken object lifecycle) than
expected, even though my values are at the default 32 objs.

19:52:44 root@<> /var/run/ceph $ ceph --admin-daemon
/var/run/ceph/ceph-client.<>.asok config show | grep 'gc\|lc'
"rgw_enable_gc_threads": "true",
"rgw_enable_lc_threads": "true",
"rgw_lc_lock_max_time": "60",
"rgw_lc_max_objs": "32",
"rgw_lc_debug_interval": "-1",
"rgw_gc_max_objs": "32",
"rgw_gc_obj_min_wait": "7200",
"rgw_gc_processor_max_time": "3600",
"rgw_gc_processor_period": "3600",
"rgw_objexp_gc_interval": "600",


gc: (this is all within one hour, so must be within one cycle)

19:49:17 root@<> /var/log/ceph $ grep 'gc::process: removing' client.<>.log
| wc -l
6908

lifecycle:

19:50:22 root@<> /var/log/ceph $ grep DELETED client.<>.log | wc -l
741
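
A quick way to watch whether the gc queue is actually draining between cycles
(hedged; --include-all also shows entries whose grace period hasn't expired
yet):

radosgw-admin gc list --include-all | grep -c '"oid"'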

Yehuda, do you know if these settings are still honored? (Personally I don't
want to limit it at all; I would rather it delete as many objects as it can
within its runtime.)

Also curious whether lifecycle-deleted objects go through the garbage
collector, or are they just immediately deleted?

-Ben

On Mon, Apr 10, 2017 at 2:46 PM, Deepak Naidu <dna...@nvidia.com> wrote:

> I still see the issue, where the space is not getting deleted. gc process
> works sometimes but sometimes it does nothing to clean the GC, as there are
> no items in the GC, but still the space is used on the pool.
>
>
>
> Any ideas what the ideal config for automatic deletion of these objects
> after the files are deleted.
>
> Currently set to
>
>
>
> "rgw_gc_max_objs": "97",
>
>
>
> --
>
> Deepak
>
>
>
> *From:* Deepak Naidu
> *Sent:* Wednesday, April 05, 2017 2:56 PM
> *To:* Ben Hines
> *Cc:* ceph-users
> *Subject:* RE: [ceph-users] ceph df space for rgw.buckets.data shows used
> even when files are deleted
>
>
>
> Thanks Ben.
>
>
>
> Is there are tuning param I need to use to fasten the process.
>
>
>
> "rgw_gc_max_objs": "32",
>
> "rgw_gc_obj_min_wait": "7200",
>
> "rgw_gc_processor_max_time": "3600",
>
> "rgw_gc_processor_period": "3600",
>
>
>
>
>
> --
>
> Deepak
>
>
>
>
>
>
>
> *From:* Ben Hines [mailto:bhi...@gmail.com <bhi...@gmail.com>]
> *Sent:* Wednesday, April 05, 2017 2:41 PM
> *To:* Deepak Naidu
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] ceph df space for rgw.buckets.data shows used
> even when files are deleted
>
>
>
> Ceph's RadosGW uses garbage collection by default.
>
>
>
> Try running 'radosgw-admin gc list' to list the objects to be garbage
> collected, or 'radosgw-admin gc process' to trigger them to be deleted.
>
>
>
> -Ben
>
>
>
> On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>
> Folks,
>
>
>
> Trying to test the S3 object GW. When I try to upload any files the space
> is shown used(that’s normal behavior), but when the object is deleted it
> shows as used(don’t understand this).  Below example.
>
>
>
> Currently there is no files in the entire S3 bucket, but it still shows
> space used. Any insight is appreciated.
>
>
>
> ceph version 10.2.6
>
>
>
> NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
>
> default.rgw.buckets.data  49  51200M  1.08   4598G      12800
>
>
>
>
>
> --
>
> Deepak


Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread Ben Hines
We write many millions of keys into RGW which will never be changed (until
they are deleted) -- it would be interesting if we could somehow indicate
this to RGW and enable reading those from the replicas as well.

-Ben

On Mon, May 8, 2017 at 10:18 AM, Jason Dillaman  wrote:

> librbd can optionally read from replicas for snapshots and parent
> images (i.e. known read-only data). This is controlled via the
> following configuration options:
>
> rbd_balance_snap_reads
> rbd_localize_snap_reads
> rbd_balance_parent_reads
> rbd_localize_parent_reads
>
> Direct users of the librados API can also utilize the
> LIBRADOS_OPERATION_BALANCE_READS and LIBRADOS_OPERATION_LOCALIZE_READS
> flags to control this behavior.
>
> On Mon, May 8, 2017 at 12:04 PM, Mehmet  wrote:
> > Hi,
> >
> > I thought that clients also read from Ceph replicas. Sometimes I read on
> > the web that this only happens from the primary PG, like how Ceph handles
> > writes... so what is true?
> >
> > Greetz
> > Mehmet
> --
> Jason


Re: [ceph-users] Ceph UPDATE (not upgrade)

2017-04-26 Thread Ben Hines
It's probably fine, depending on the ceph version. The upgrade notes on the
ceph website typically tell you the steps for each version.

As of Kraken, the notes say: "You may upgrade OSDs, Monitors, and MDSs in
any order. RGW daemons should be upgraded last"

Previously it was always recommended to upgrade mons first, then osds, then
rgws. There could be some issues *during* the upgrade with versions that
are out of sync. But everything typically recovers once all services are on
the same version. There are also some flags that must be set sometimes or
other steps that have to be taken pre upgrade. The release notes will tell
you if that is the case.
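
For an OSD/mon node, a cautious sequence is something like the sketch below
(assumes systemd units and updating one node at a time):

ceph osd set noout        # keep CRUSH from rebalancing while daemons restart
yum update -y
systemctl restart ceph-mon.target ceph-osd.target
ceph osd unset noout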


-Ben

On Wed, Apr 26, 2017 at 7:21 AM, Massimiliano Cuttini 
wrote:

> On a Ceph Monitor/OSD server can i run just:
>
> *yum update -y*
>
> in order to upgrade system and packages or did this mess up Ceph?
>


Re: [ceph-users] Creating journal on needed partition

2017-04-19 Thread Ben Hines
This is my experience.  For creating new OSDs, i just created Rundeck jobs
that run ceph-deploy. It's relatively rare that new OSDs are created, so it
is fine.

Originally I was automating them with configuration management tools but it
tended to encounter edge cases and problems that ceph-deploy already
handles nicely.

-Ben

On Tue, Apr 18, 2017 at 6:22 AM, Vincent Godin 
wrote:

> Hi,
>
> If you're using ceph-deploy, just run the command :
>
> ceph-deploy osd prepare --overwrite-conf {your_host}:/dev/sdaa:/dev/sdaf2


Re: [ceph-users] RGW lifecycle bucket stuck processing?

2017-04-14 Thread Ben Hines
Interesting - the state went back to 'UNINITIAL' eventually, possibly
because the first run never finished. Will see if it ever completes during
a nightly run.

-Ben

On Thu, Apr 13, 2017 at 11:10 AM, Ben Hines <bhi...@gmail.com> wrote:

> I initiated a manual lifecycle cleanup with:
>
> radosgw-admin lc process
>
> It took over a day working on my bucket called 'bucket1'  (w/2 million
> objects) and seems like it eventually got stuck with about 1.7 million objs
> left, with uninformative errors like:  (notice the timestamps)
>
>
> 2017-04-12 18:50:15.706952 7f90aa5dcc80  0 ERROR: rgw_remove_object
> 2017-04-12 18:50:16.841254 7f90aa5dcc80  0 ERROR: rgw_remove_object
> 2017-04-12 18:50:17.153323 7f90aa5dcc80  0 ERROR: rgw_remove_object
> 2017-04-12 18:50:20.752924 7f90aa5dcc80  0 ERROR: rgw_remove_object
> 2017-04-12 18:50:25.400460 7f90aa5dcc80  0 ERROR: rgw_remove_object
> 2017-04-13 03:19:30.027773 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
> 10.29.16.53:6801/20291 conn(0x7f9084002990 :-1 s=STATE_OPEN pgs=167140106
> cs=1 l=0).fault initiating reconnect
> 2017-04-13 03:36:30.721085 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
> 10.29.16.53:6801/20291 conn(0x7f90841d6ef0 :-1 s=STATE_OPEN pgs=167791627
> cs=1 l=0).fault initiating reconnect
> 2017-04-13 03:46:46.143055 7f90aa5dcc80  0 ERROR: rgw_remove_object
>
>
> This morning i aborted it with control-c. Now 'lc list' still shows the
> bucket as processing, and lc process returns quickly, as if the bucket is
> still locked:
>
>
>
> radosgw-admin lc list
>
> ...
> {
> "bucket": ":bucket1:default.42048218.4",
> "status": "PROCESSING"
> },
>
>
> -bash-4.2$ time radosgw-admin lc process
> 2017-04-13 11:07:48.482671 7f4fbeb87c80  0 System already converted
>
> real0m17.785s
>
>
>
> Is it possible it left behind a stale lock on the bucket due to the
> control-c?
>
>
> -Ben
>


Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread Ben Hines
Based on past LTS release dates, I would predict Luminous much sooner than
that, possibly even in May...  http://docs.ceph.com/docs/master/releases/

The docs also say "Spring" http://docs.ceph.com/docs/master/release-notes/

-Ben

On Thu, Apr 13, 2017 at 12:11 PM,  wrote:

> Thanks a lot, Trey.
>
> I'll try that stuff next week, once back from Easter holidays.
> And some "multi site" and "metasearch" is also still on my to-be-tested
> list. Need badly to free up some time for all the interesting "future of
> storage" things.
>
> BTW., we are on Kraken and I'd hope to see more of the new and shiny stuff
> here soon (something like 11.2.X) instead of waiting for Luminous late
> 2017. Not sure how the CEPH release policy is usually?!
>
> Anyhow, thanks and happy Easter everyone!
> Anton
>
>
> Gesendet: Donnerstag, 13. April 2017 um 20:15 Uhr
> Von: "Trey Palmer" 
> An: ceph.nov...@habmalnefrage.de
> Cc: "Trey Palmer" , ceph-us...@ceph.com
> Betreff: Re: [ceph-users] Question about RadosGW subusers
>
> Anton,
>
> It turns out that Adam Emerson is trying to get bucket policies and roles
> merged in time for Luminous:
>
> https://github.com/ceph/ceph/pull/14307
>
> Given this, I think we will only be using subusers temporarily as a method
> to track which human or service did what in which bucket.  This seems to us
> much easier than trying to deal with ACL's without any concept of groups,
> roles, or policies, in buckets that can often have millions of objects.
>
> Here is the general idea:
>
>
> 1.  Each bucket has a user ("master user"), but we don't use or issue that
> set of keys at all.
>
>
> radosgw-admin user create --uid=mybucket --display-name="My Bucket"
>
> You can of course have multiple buckets per user but so far for us it has
> been simple to have one user per bucket, with the username the same as the
> bucket name.   If a human needs access to more than one bucket, we will
> create multiple subusers for them.   That's not convenient, but it's
> temporary.
>
> So what we're doing is effectively making the user into the group, with
> the subusers being the users, and each user only capable of being in one
> group.   Very suboptimal, but better than the total chaos that would result
> from giving everyone the same set of keys for a given bucket.
>
>
> 2.  For each human user or service/machine user of that bucket, we create
> subusers.You can do this via:
>
> ## full-control ops user
> radosgw-admin subuser create --uid=mybucket --subuser=mybucket:alice
> --access=full --gen-access-key --gen-secret --key-type=s3
>
> ## write-only server user
> radosgw-admin subuser create --uid=mybucket --subuser=mybucket:daemon
> --access=write --gen-access-key --gen-secret-key --key-type=s3
>
> If you then do a "radosgw-admin metadata get user:mybucket", the JSON
> output contains the subusers and their keys.
>
>
> 3.  Raise the RGW log level in ceph.conf to make an "access key id" line
> available for each request, which you can then map to a subuser if/when you
> need to track who did what after the fact.  In ceph.conf:
>
> debug_rgw = 10/10
>
> This will cause the logs to be VERY verbose, an order of magnitude and
> some change more verbose than default.   We plan to discard most of the
> logs while feeding them into ElasticSearch.
>
> We might not need this much log verbosity once we have policies and are
> using unique users rather than subusers.
>
> Nevertheless, I hope we can eventually reduce the log level of the "access
> key id" line, as we have a pretty mainstream use case and I'm certain that
> tracking S3 request users will be required for many organizations for
> accounting and forensic purposes just as it is for us.
>
> -- Trey
>
> On Thu, Apr 13, 2017 at 1:29 PM, <ceph.nov...@habmalnefrage.de> wrote:
>
> Hey Trey.
>
> Sounds great, we were discussing the same kind of requirements and
> couldn't agree on/find something "useful"... so THANK YOU for sharing!!!
>
> It would be great if you could provide some more details or an example how
> you configure the "bucket user" and sub-users and all that stuff.
> Even more interesting for me, how do the "different ppl or services"
> access that buckets/objects afterwards?! I mean via which tools (s3cmd,
> boto, cyberduck, mix of some, ...) and are there any ACLs set/in use as
> well?!
>
> (sorry if this all sounds somehow dumb but I'm a just a novice ;) )
>
> best
>  Anton
>
>
> Gesendet: Dienstag, 11. April 2017 um 00:17 Uhr
> Von: "Trey Palmer" 
> An: ceph-us...@ceph.com[mailto:ceph-us...@ceph.com]
> Betreff: [ceph-users] Question about RadosGW subusers
>
> Probably a question for @yehuda :
>
>
> We have fairly strict user accountability requirements.  The best way we
> have found to meet them with S3 object storage on Ceph is by using RadosGW
> subusers.
>
> If we set up one user per bucket, then 

[ceph-users] RGW lifecycle bucket stuck processing?

2017-04-13 Thread Ben Hines
I initiated a manual lifecycle cleanup with:

radosgw-admin lc process

It took over a day working on my bucket called 'bucket1'  (w/2 million
objects) and seems like it eventually got stuck with about 1.7 million objs
left, with uninformative errors like:  (notice the timestamps)


2017-04-12 18:50:15.706952 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:16.841254 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:17.153323 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:20.752924 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:25.400460 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-13 03:19:30.027773 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
10.29.16.53:6801/20291 conn(0x7f9084002990 :-1 s=STATE_OPEN pgs=167140106
cs=1 l=0).fault initiating reconnect
2017-04-13 03:36:30.721085 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
10.29.16.53:6801/20291 conn(0x7f90841d6ef0 :-1 s=STATE_OPEN pgs=167791627
cs=1 l=0).fault initiating reconnect
2017-04-13 03:46:46.143055 7f90aa5dcc80  0 ERROR: rgw_remove_object


This morning i aborted it with control-c. Now 'lc list' still shows the
bucket as processing, and lc process returns quickly, as if the bucket is
still locked:



radosgw-admin lc list

...
{
"bucket": ":bucket1:default.42048218.4",
"status": "PROCESSING"
},


-bash-4.2$ time radosgw-admin lc process
2017-04-13 11:07:48.482671 7f4fbeb87c80  0 System already converted

real0m17.785s



Is it possible it left behind a stale lock on the bucket due to the
control-c?
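
If it did, my thought would be to look at the lifecycle shard objects
directly with the rados tool. The pool and lock names below are assumptions
from my reading of the source - double-check the lc pool name with
"radosgw-admin zone get" before trying any of this:

rados -p default.rgw.lc ls
rados -p default.rgw.lc lock list lc.0
rados -p default.rgw.lc lock info lc.0 <lock-name>
rados -p default.rgw.lc lock break lc.0 <lock-name> <locker-name>

where <lock-name> and <locker-name> come from the "lock info" output.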


-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-11 Thread Ben Hines
After much banging on this and reading through the Ceph RGW source, I
figured out that Ceph RadosGW returns -13 (EACCES - AccessDenied) if you don't
pass in a 'Prefix' in your S3 lifecycle configuration. It also
returns EACCES if the XML is invalid in any way, which is probably not the
most correct / user-friendly result.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html
specifies 'Prefix' as optional, so I'll file a bug for this.
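
For reference, a minimal boto3 sketch of the kind of request that should go
through once 'Prefix' is included. The endpoint and keys are placeholders,
and signature_version='s3' forces AWS v2 signing to sidestep the v4 issue
mentioned below:

import boto3
from botocore.client import Config

# placeholder endpoint and credentials - not a real gateway
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com',
    aws_access_key_id='ACCESS',
    aws_secret_access_key='SECRET',
    config=Config(signature_version='s3'),  # v2 signing
)

s3.put_bucket_lifecycle_configuration(
    Bucket='bentest',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-all',
            'Prefix': '',          # must be present, even if empty
            'Status': 'Enabled',
            'Expiration': {'Days': 10},
        }]
    },
)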

-Ben


On Mon, Apr 3, 2017 at 12:14 PM, Ben Hines <bhi...@gmail.com> wrote:

> Interesting.
> I'm wondering what the -13 return code for the op execution in my debug
> output is (can't find in the source..)
>
>
>
> I just tried out setting the lifecycle with cyberduck and got this error,
> which is probably the other bug with AWSv4 auth, http://tracker.ceph.com/
> issues/17076   Not sure if cyberduck can be forced to use V2.
>
> 2017-04-03 12:07:15.093235 7f5617024700 10 op=20RGWPutLC_ObjStore_S3
> 2017-04-03 12:07:15.093248 7f5617024700  2 req 14:0.000438:s3:PUT
> /bentest/:put_lifecycle:authorizing
> .
> 2017-04-03 12:07:15.093637 7f5617024700 10 delaying v4 auth
> 2017-04-03 12:07:15.093643 7f5617024700 10 ERROR: AWS4 completion for this
> operation NOT IMPLEMENTED
> 2017-04-03 12:07:15.093652 7f5617024700 10 failed to authorize request
> 2017-04-03 12:07:15.093658 7f5617024700 20 handler->ERRORHANDLER:
> err_no=-2201 new_err_no=-2201
> 2017-04-03 12:07:15.093844 7f5617024700  2 req 14:0.001034:s3:PUT
> /bentest/:put_lifecycle:op status=0
> 2017-04-03 12:07:15.093859 7f5617024700  2 req 14:0.001050:s3:PUT
> /bentest/:put_lifecycle:http status=501
> 2017-04-03 12:07:15.093884 7f5617024700  1 == req done
> req=0x7f561701e340 op status=0 http_status=501 ==
>
>
>
> -Ben
>
> On Mon, Apr 3, 2017 at 7:16 AM, <ceph.nov...@habmalnefrage.de> wrote:
>
>> ... hmm, "modify" gives no error and may be the option to use, but I
>> don't see anything related to an "expires" meta field
>>
>> [root s3cmd-master]# ./s3cmd --no-ssl --verbose modify s3://Test/INSTALL
>> --expiry-days=365
>> INFO: Summary: 1 remote files to modify
>> modify: 's3://Test/INSTALL'
>>
>> [root s3cmd-master]# ./s3cmd --no-ssl --verbose info s3://Test/INSTALL
>> s3://Test/INSTALL (object):
>>File size: 3123
>>Last mod:  Mon, 03 Apr 2017 12:35:28 GMT
>>MIME type: text/plain
>>Storage:   STANDARD
>>MD5 sum:   63834dbb20b32968505c4ebe768fc8c4
>>SSE:   none
>>   policy:    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>>     <Name>Test</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
>>     <Contents><Key>INSTALL</Key><LastModified>2017-04-03T12:35:28.533Z</LastModified>
>>     <ETag>63834dbb20b32968505c4ebe768fc8c4</ETag><Size>3123</Size><StorageClass>STANDARD</StorageClass>
>>     <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents>
>>     <Contents><Key>README.TXT</Key><LastModified>2017-03-31T22:36:38.380Z</LastModified>
>>     <ETag>708efc3b9184c8b112e36062804aca1e</ETag><Size>88</Size><StorageClass>STANDARD</StorageClass>
>>     <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents></ListBucketResult>
>>   cors:      none
>>   ACL:       First User: FULL_CONTROL
>>   x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
>>
>>
>> *Sent:* Monday, 3 April 2017, 14:13
>> *From:* ceph.nov...@habmalnefrage.de
>> *To:* ceph-users <ceph-users@lists.ceph.com>
>>
>> *Subject:* Re: [ceph-users] Kraken release and RGW --> "S3 bucket
>> lifecycle API has been added. Note that currently it only supports object
>> expiration."
>> ... additional strange but a bit different info related to the
>> "permission denied"
>>
>> [root s3cmd-master]# ./s3cmd --no-ssl put INSTALL s3://Test/
>> --expiry-days=5
>> upload: 'INSTALL' -> 's3://Test/INSTALL' [1 of 1]
>> 3123 of 3123 100% in 0s 225.09 kB/s done
>>
>> [root s3cmd-master]# ./s3cmd info s3://Test/INSTALL
>> s3://Test/INSTALL (object):
>> File size: 3123
>> Last mod: Mon, 03 Apr 2017 12:01:47 GMT
>> MIME type: text/plain
>> Storage: STANDARD
>> MD5 sum: 63834dbb20b32968505c4ebe768fc8c4
>> SSE: none
>> policy: <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>> <Name>Test</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
>> <Contents><Key>INSTALL</Key><LastModified>2017-04-03T12:01:47.745Z</LastModified>
>> <ETag>63834dbb20b32968505c4ebe768fc8c4</ETag><Size>3123</Size><StorageClass>STANDARD</StorageClass>
>> <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents>
>> <Contents><Key>README.TXT</Key><LastModified>2017-03-31T22:36:38.380Z</LastModified>
>> <ETag>708efc3b9184c8b112e36062804aca1e</ETag><Size>88</Size><StorageClass>STANDARD</StorageClass>
>> <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents></ListBucketResult>
>> cors: none
>> ACL: First User: FULL_CONTROL
>> x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Ben Hines
Personally before extreme measures like marking lost,  i would try bringing
up the osd, so it's up and out -- i believe the data will still be found
and re balanced away from it by Ceph.

-Ben

On Thu, Apr 6, 2017 at 11:20 AM, David Welch  wrote:

> Hi,
> We had a disk on the cluster that was not responding properly and causing
> 'slow requests'. The osd on the disk was stopped and the osd was marked
> down and then out. Rebalancing succeeded but (some?) pgs from that osd are
> now stuck in stale+active+clean state, which is not being resolved (see
> below for query results).
>
> My question: is it better to mark this osd as "lost" (i.e. 'ceph osd lost
> 14') or to remove the osd as detailed here:
> https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
>
> Thanks,
> David
>
>
> $ ceph health detail
> HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs stale; 17 pgs stuck stale
> pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
> pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean, last acting [14]
> pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean, last acting [14]
> pg 7.c5 is stuck stale for 6138.330363, current state stale+active+clean, last acting [14]
> pg 7.ac is stuck stale for 6138.330385, current state stale+active+clean, last acting [14]
> pg 7.5b is stuck stale for 6138.330678, current state stale+active+clean, last acting [14]
> pg 7.1b4 is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
> pg 7.182 is stuck stale for 6138.330445, current state stale+active+clean, last acting [14]
> pg 7.1f8 is stuck stale for 6138.330720, current state stale+active+clean, last acting [14]
> pg 7.53 is stuck stale for 6138.330697, current state stale+active+clean, last acting [14]
> pg 7.1d2 is stuck stale for 6138.330663, current state stale+active+clean, last acting [14]
> pg 7.70 is stuck stale for 6138.330742, current state stale+active+clean, last acting [14]
> pg 7.14f is stuck stale for 6138.330585, current state stale+active+clean, last acting [14]
> pg 7.23 is stuck stale for 6138.330610, current state stale+active+clean, last acting [14]
> pg 7.153 is stuck stale for 6138.330600, current state stale+active+clean, last acting [14]
> pg 7.cc is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
> pg 7.16b is stuck stale for 6138.330509, current state stale+active+clean, last acting [14]
>
> $ ceph pg dump_stuck stale
> ok
> pg_stat  state               up    up_primary  acting  acting_primary
> 7.f3     stale+active+clean  [14]  14          [14]    14
> 7.bd     stale+active+clean  [14]  14          [14]    14
> 7.b6     stale+active+clean  [14]  14          [14]    14
> 7.c5     stale+active+clean  [14]  14          [14]    14
> 7.ac     stale+active+clean  [14]  14          [14]    14
> 7.5b     stale+active+clean  [14]  14          [14]    14
> 7.1b4    stale+active+clean  [14]  14          [14]    14
> 7.182    stale+active+clean  [14]  14          [14]    14
> 7.1f8    stale+active+clean  [14]  14          [14]    14
> 7.53     stale+active+clean  [14]  14          [14]    14
> 7.1d2    stale+active+clean  [14]  14          [14]    14
> 7.70     stale+active+clean  [14]  14          [14]    14
> 7.14f    stale+active+clean  [14]  14          [14]    14
> 7.23     stale+active+clean  [14]  14          [14]    14
> 7.153    stale+active+clean  [14]  14          [14]    14
> 7.cc     stale+active+clean  [14]  14          [14]    14
> 7.16b    stale+active+clean  [14]  14          [14]    14
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-05 Thread Ben Hines
Ceph's RadosGW uses garbage collection by default.

Try running 'radosgw-admin gc list' to list the objects to be garbage
collected, or 'radosgw-admin gc process' to trigger them to be deleted.
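
If the space needs to come back sooner than the defaults allow, the GC
timing can also be tuned in ceph.conf. These are the options I know of
(values shown are what I believe the defaults to be; check your release):

rgw gc obj min wait = 7200         # seconds before a deleted object is GC-eligible
rgw gc processor period = 3600     # how often the GC thread runs
rgw gc processor max time = 3600   # max runtime of a single GC pass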

-Ben

On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu  wrote:

> Folks,
>
>
>
> Trying to test the S3 object GW. When I try to upload any files the space
> is shown as used (that's normal behavior), but when the object is deleted it
> still shows as used (I don't understand this). Below is an example.
>
>
>
> Currently there are no files in the entire S3 bucket, but it still shows
> space used. Any insight is appreciated.
>
>
>
> ceph version 10.2.6
>
>
>
> NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
> default.rgw.buckets.data  49  51200M  1.08   4598G      12800
>
>
>
>
>
> --
>
> Deepak
> --
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
> --
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-03 Thread Ben Hines
Interesting.
I'm wondering what the -13 return code for the op execution in my debug
output is (can't find in the source..)



I just tried out setting the lifecycle with cyberduck and got this error,
which is probably the other bug with AWSv4 auth, http://tracker.ceph.com/
issues/17076   Not sure if cyberduck can be forced to use V2.

2017-04-03 12:07:15.093235 7f5617024700 10 op=20RGWPutLC_ObjStore_S3
2017-04-03 12:07:15.093248 7f5617024700  2 req 14:0.000438:s3:PUT
/bentest/:put_lifecycle:authorizing
.
2017-04-03 12:07:15.093637 7f5617024700 10 delaying v4 auth
2017-04-03 12:07:15.093643 7f5617024700 10 ERROR: AWS4 completion for this
operation NOT IMPLEMENTED
2017-04-03 12:07:15.093652 7f5617024700 10 failed to authorize request
2017-04-03 12:07:15.093658 7f5617024700 20 handler->ERRORHANDLER:
err_no=-2201 new_err_no=-2201
2017-04-03 12:07:15.093844 7f5617024700  2 req 14:0.001034:s3:PUT
/bentest/:put_lifecycle:op status=0
2017-04-03 12:07:15.093859 7f5617024700  2 req 14:0.001050:s3:PUT
/bentest/:put_lifecycle:http status=501
2017-04-03 12:07:15.093884 7f5617024700  1 == req done
req=0x7f561701e340 op status=0 http_status=501 ==



-Ben

On Mon, Apr 3, 2017 at 7:16 AM, <ceph.nov...@habmalnefrage.de> wrote:

> ... hmm, "modify" gives no error and may be the option to use, but I don't
> see anything related to an "expires" meta field
>
> [root s3cmd-master]# ./s3cmd --no-ssl --verbose modify s3://Test/INSTALL
> --expiry-days=365
> INFO: Summary: 1 remote files to modify
> modify: 's3://Test/INSTALL'
>
> [root s3cmd-master]# ./s3cmd --no-ssl --verbose info s3://Test/INSTALL
> s3://Test/INSTALL (object):
>File size: 3123
>Last mod:  Mon, 03 Apr 2017 12:35:28 GMT
>MIME type: text/plain
>Storage:   STANDARD
>MD5 sum:   63834dbb20b32968505c4ebe768fc8c4
>SSE:   none
>    policy:    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>      <Name>Test</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
>      <Contents><Key>INSTALL</Key><LastModified>2017-04-03T12:35:28.533Z</LastModified>
>      <ETag>63834dbb20b32968505c4ebe768fc8c4</ETag><Size>3123</Size><StorageClass>STANDARD</StorageClass>
>      <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents>
>      <Contents><Key>README.TXT</Key><LastModified>2017-03-31T22:36:38.380Z</LastModified>
>      <ETag>708efc3b9184c8b112e36062804aca1e</ETag><Size>88</Size><StorageClass>STANDARD</StorageClass>
>      <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents></ListBucketResult>
>    cors:      none
>    ACL:       First User: FULL_CONTROL
>    x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
>
>
> *Sent:* Monday, 3 April 2017, 14:13
> *From:* ceph.nov...@habmalnefrage.de
> *To:* ceph-users <ceph-users@lists.ceph.com>
>
> *Subject:* Re: [ceph-users] Kraken release and RGW --> "S3 bucket
> lifecycle API has been added. Note that currently it only supports object
> expiration."
> ... additional strange but a bit different info related to the "permission
> denied"
>
> [root s3cmd-master]# ./s3cmd --no-ssl put INSTALL s3://Test/
> --expiry-days=5
> upload: 'INSTALL' -> 's3://Test/INSTALL' [1 of 1]
> 3123 of 3123 100% in 0s 225.09 kB/s done
>
> [root s3cmd-master]# ./s3cmd info s3://Test/INSTALL
> s3://Test/INSTALL (object):
> File size: 3123
> Last mod: Mon, 03 Apr 2017 12:01:47 GMT
> MIME type: text/plain
> Storage: STANDARD
> MD5 sum: 63834dbb20b32968505c4ebe768fc8c4
> SSE: none
> policy: <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
> <Name>Test</Name><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
> <Contents><Key>INSTALL</Key><LastModified>2017-04-03T12:01:47.745Z</LastModified>
> <ETag>63834dbb20b32968505c4ebe768fc8c4</ETag><Size>3123</Size><StorageClass>STANDARD</StorageClass>
> <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents>
> <Contents><Key>README.TXT</Key><LastModified>2017-03-31T22:36:38.380Z</LastModified>
> <ETag>708efc3b9184c8b112e36062804aca1e</ETag><Size>88</Size><StorageClass>STANDARD</StorageClass>
> <Owner><ID>666</ID><DisplayName>First User</DisplayName></Owner></Contents></ListBucketResult>
> cors: none
> ACL: First User: FULL_CONTROL
> x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root
>
> [root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/ --expiry-days=365
> ERROR: Access to bucket 'Test' was denied
> ERROR: S3 error: 403 (AccessDenied)
>
> [root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/INSTALL
> --expiry-days=365
> ERROR: Parameter problem: Expecting S3 URI with just the bucket name set
> instead of 's3://Test/INSTALL'
> [root@mucsds26 s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/
> --expiry-days=365
> ERROR: Access to bucket 'Test' was denied
> ERROR: S3 error: 403 (AccessDenied)
>
> [root s3cmd-master]# ./s3cmd --no-ssl la expire s3://Test
> 2017-04-03 12:01 3123 s3://Test/INSTALL
> 2017-03-31 22:36 88 s3://Test/README.TXT
>
>
> 
>
> Sent: Monday, 3 April 2017, 12:31
> From: ceph.nov...@habmalnefrage.de
> To: "Ben Hines" <bh

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-02 Thread Ben Hines
Hmm, Nope, not using tenants feature. The users/buckets were created on
prior ceph versions, perhaps i'll try with a newly created user + bucket.

radosgw-admin user info --uid=foo

{
"user_id": "foo",
"display_name": "foo",
"email": "snip",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "foo:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "foo:swift",
"access_key": "xxx",
"secret_key": ""
},
{
"user": "foo",
"access_key": "xxx",
"secret_key": ""
}
],
"swift_keys": [],
"caps": [
{
"type": "buckets",
"perm": "*"
},
{
"type": "metadata",
"perm": "*"
},
{
"type": "usage",
"perm": "*"
},
{
"type": "users",
"perm": "*"
},
{
"type": "zone",
"perm": "*"
}
],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
        "check_on_raw": false,
"max_size": -1024,
"max_size_kb": 0,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1024,
"max_size_kb": 0,
"max_objects": -1
},
"temp_url_keys": [],
"type": "none"
}




On Sun, Apr 2, 2017 at 5:54 AM, Orit Wasserman <owass...@redhat.com> wrote:

> I see : acct_user=foo, acct_name=foo,
> Are you using radosgw with tenants?
> If not it could be the problem
>
> Orit
>
> On Sat, Apr 1, 2017 at 7:43 AM, Ben Hines <bhi...@gmail.com> wrote:
>
>> I'm also trying to use lifecycles (via boto3) but i'm getting permission
>> denied trying to create the lifecycle. I'm bucket owner with full_control
>> and WRITE_ACP for good measure. Any ideas?
>>
>> This is debug ms=20 debug radosgw=20
>>
>>
>>
>> 2017-03-31 21:28:18.382217 7f50d0010700  2 req 8:0.000693:s3:PUT
>> /bentest:put_lifecycle:verifying op permissions
>> 2017-03-31 21:28:18.38 7f50d0010700  5 Searching permissions for
>> identity=RGWThirdPartyAccountAuthApplier() ->
>> RGWLocalAuthApplier(acct_user=foo, acct_name=foo, subuser=,
>> perm_mask=15, is_admin=) mask=56
>> 2017-03-31 21:28:18.382232 7f50d0010700  5 Searching permissions for
>> uid=foo
>> 2017-03-31 21:28:18.382235 7f50d0010700  5 Found permission: 15
>> 2017-03-31 21:28:18.382237 7f50d0010700  5 Searching permissions for
>> group=1 mask=56
>> 2017-03-31 21:28:18.382297 7f50d0010700  5 Found permission: 3
>> 2017-03-31 21:28:18.382307 7f50d0010700  5 Searching permissions for
>> group=2 mask=56
>> 2017-03-31 21:28:18.382313 7f50d0010700  5 Permissions for group not found
>> 2017-03-31 21:28:18.382318 7f50d0010700  5 Getting permissions
>> identity=RGWThirdPartyAccountAuthApplier() ->
>> RGWLocalAuthApplier(acct_user=foo, acct_name=foo, subuser=,
>> perm_mask=15, is_admin=) owner=foo perm=8
>> 2017-03-31 21:28:18.382325 7f50d0010700 10  
>> identity=RGWThirdPartyAccountAuthApplier()
>> -> RGWLocalAuthApplier(acct_user=foo, acct_name=foo, subuser=,
>> perm_mask=15, is_admin=) requested perm (type)=8, policy perm=8,
>> user_perm_mask=8, acl perm=8
>> 2017-03-31 21:28:18.382330 7f50d0010700  2 req 8:0.000808:s3:PUT
>> /bentest:put_lifecycle:verifying op params
>> 2017-03-31 21:28:18.382334 7f50d0010700  2 req 8:0.000813:s3:PUT
>> /bentest:put_lifecycle:pre-executing
>> 2017-03-31 21:28:18.382339 7f50d0010700  2 req 8:0.000817:s3:PUT
>> /bentest:put_lifecycle:executing
>> 2017-03-31 21:28:18.382361 7f50d0010700 15 read len=183
>> data=<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>> ...<Status>Enabled</Status>...<Days>10</Days>...</LifecycleConfiguration>
>> 2017-03-31 21:28:18.382439 7f50d0010700  2 req 8:0.000917:s3:PUT
>> /bentest:put_lifecycle:completing
>> 2017-03-31 21:28:18.382594 7f50d0010

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-03-31 Thread Ben Hines
I'm also trying to use lifecycles (via boto3) but i'm getting permission
denied trying to create the lifecycle. I'm bucket owner with full_control
and WRITE_ACP for good measure. Any ideas?

This is debug ms=20 debug radosgw=20



2017-03-31 21:28:18.382217 7f50d0010700  2 req 8:0.000693:s3:PUT
/bentest:put_lifecycle:verifying op permissions
2017-03-31 21:28:18.38 7f50d0010700  5 Searching permissions for
identity=RGWThirdPartyAccountAuthApplier() ->
RGWLocalAuthApplier(acct_user=foo,
acct_name=foo, subuser=, perm_mask=15, is_admin=) mask=56
2017-03-31 21:28:18.382232 7f50d0010700  5 Searching permissions for uid=foo
2017-03-31 21:28:18.382235 7f50d0010700  5 Found permission: 15
2017-03-31 21:28:18.382237 7f50d0010700  5 Searching permissions for
group=1 mask=56
2017-03-31 21:28:18.382297 7f50d0010700  5 Found permission: 3
2017-03-31 21:28:18.382307 7f50d0010700  5 Searching permissions for
group=2 mask=56
2017-03-31 21:28:18.382313 7f50d0010700  5 Permissions for group not found
2017-03-31 21:28:18.382318 7f50d0010700  5 Getting permissions identity=
RGWThirdPartyAccountAuthApplier() -> RGWLocalAuthApplier(acct_user=foo,
acct_name=foo, subuser=, perm_mask=15, is_admin=) owner=foo perm=8
2017-03-31 21:28:18.382325 7f50d0010700 10  identity=
RGWThirdPartyAccountAuthApplier() -> RGWLocalAuthApplier(acct_user=foo,
acct_name=foo, subuser=, perm_mask=15, is_admin=) requested perm (type)=8,
policy perm=8, user_perm_mask=8, acl perm=8
2017-03-31 21:28:18.382330 7f50d0010700  2 req 8:0.000808:s3:PUT
/bentest:put_lifecycle:verifying op params
2017-03-31 21:28:18.382334 7f50d0010700  2 req 8:0.000813:s3:PUT
/bentest:put_lifecycle:pre-executing
2017-03-31 21:28:18.382339 7f50d0010700  2 req 8:0.000817:s3:PUT
/bentest:put_lifecycle:executing
2017-03-31 21:28:18.382361 7f50d0010700 15 read len=183
data=<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
...<Status>Enabled</Status>...<Days>10</Days>...</LifecycleConfiguration>
2017-03-31 21:28:18.382439 7f50d0010700  2 req 8:0.000917:s3:PUT
/bentest:put_lifecycle:completing
2017-03-31 21:28:18.382594 7f50d0010700  2 req 8:0.001072:s3:PUT
/bentest:put_lifecycle:op status=-13
2017-03-31 21:28:18.382620 7f50d0010700  2 req 8:0.001098:s3:PUT
/bentest:put_lifecycle:http status=403
2017-03-31 21:28:18.382665 7f50d0010700  1 == req done
req=0x7f50d000a340 op status=-13 http_status=403 ==


-Ben

On Tue, Mar 28, 2017 at 6:42 AM, Daniel Gryniewicz  wrote:

> On 03/27/2017 04:28 PM, ceph.nov...@habmalnefrage.de wrote:
>
>> Hi Cephers.
>>
>> Couldn't find any special documentation about the "S3 object expiration"
>> so I assume it should work "AWS S3 like" (?!?) ...  BUT ...
>> we have a test cluster based on 11.2.0 - Kraken and I set some object
>> expiration dates via CyberDuck and DragonDisk, but the objects are still
>> there, days after the applied date/time. Do I miss something?
>>
>> Thanks & regards
>>
>>
> It is intended to work like AWS S3, yes.  Not every feature of AWS
> lifecycle is supported, (for example no moving between storage tiers), but
> deletion works, and is tested in teuthology runs.
>
> Did you somehow turn it off?  The config option rgw_enable_lc_threads
> controls it, but it defaults to "on".  Also make sure rgw_lc_debug_interval
> is not set, and that rgw_lifecycle_work_time isn't set to some interval too
> small to scan your objects...
>
> Daniel
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-31 Thread Ben Hines
Hey Yehuda,

Are there plans to port of this fix to Kraken?  (or is there even another
Kraken release planned? :)

thanks!

-Ben

On Wed, Mar 1, 2017 at 11:33 AM, Yehuda Sadeh-Weinraub 
wrote:

> This sounds like this bug:
> http://tracker.ceph.com/issues/17076
>
> Will be fixed in 10.2.6. It's triggered by aws4 auth, so a workaround
> would be to use aws2 instead.
>
> Yehuda
>
>
> On Wed, Mar 1, 2017 at 10:46 AM, John Nielsen  wrote:
> > Hi all-
> >
> > We use Amazon S3 quite a bit at $WORK but are evaluating Ceph+radosgw as
> an alternative for some things. We have an "S3 smoke test" written using
> the AWS Java SDK that we use to validate a number of operations. On my
> Kraken cluster, multi-part uploads work fine for s3cmd. Our smoke test also
> passes fine using version 1.9.27 of the AWS SDK. However in SDK 1.11.69 the
> multi-part upload fails. The initial POST (to reserve the object name and
> start the upload) succeeds, but the first PUT fails with a 403 error.
> >
> > So, does anyone know offhand what might be going on here? If not, how
> can I get more details about the 403 error and what is causing it?
> >
> > The cluster was installed with Jewel and recently updated to Kraken.
> Using the built-in civetweb server.
> >
> > Here is the log output for three multi-part uploads. The first two are
> s3cmd and the older SDK, respectively. The last is the failing one with the
> newer SDK.
> >
> > S3cmd, Succeeds.
> > 2017-03-01 17:33:16.845613 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> > 2017-03-01 17:33:16.856522 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> > 2017-03-01 17:33:16.856628 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST /
> testdomobucket10x3x104x64250438/multipartStreamTest?uploads HTTP/1.1" 1 0
> - -
> > 2017-03-01 17:33:16.953967 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> > 2017-03-01 17:33:24.094134 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> > 2017-03-01 17:33:24.094211 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT /
> testdomobucket10x3x104x64250438/multipartStreamTest?
> partNumber=1=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB HTTP/1.1" 1 0 - -
> > 2017-03-01 17:33:24.193747 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> > 2017-03-01 17:33:30.002050 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> > 2017-03-01 17:33:30.002124 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT /
> testdomobucket10x3x104x64250438/multipartStreamTest?
> partNumber=2=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB HTTP/1.1" 1 0 - -
> > 2017-03-01 17:33:30.085033 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> > 2017-03-01 17:33:30.104944 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> > 2017-03-01 17:33:30.105007 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST /
> testdomobucket10x3x104x64250438/multipartStreamTest?uploadId=2~
> IGYuZC4uDC27TGWfpFkKk-Makqvk_XB HTTP/1.1" 1 0 - -
> >
> > AWS SDK (1.9.27). Succeeds.
> > 2017-03-01 17:54:50.720093 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> > 2017-03-01 17:54:50.733109 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> > 2017-03-01 17:54:50.733188 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST /
> testdomobucket10x3x104x6443285/multipartStreamTest?uploads HTTP/1.1" 1 0
> - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 Java_HotSpot(TM)_64-Bit_
> Server_VM/24.71-b01/1.7.0_71
> > 2017-03-01 17:54:50.831618 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> > 2017-03-01 17:54:58.057011 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> > 2017-03-01 17:54:58.057082 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT /
> testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%
> 7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo=1 HTTP/1.1" 1 0 -
> aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 Java_HotSpot(TM)_64-Bit_
> Server_VM/24.71-b01/1.7.0_71
> > 2017-03-01 17:54:58.143235 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> > 2017-03-01 17:54:58.328351 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> > 2017-03-01 17:54:58.328437 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT /
> testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%
> 7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo=2 HTTP/1.1" 1 0 -
> aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 Java_HotSpot(TM)_64-Bit_
> Server_VM/24.71-b01/1.7.0_71

Re: [ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-09 Thread Ben Hines
AFAIK depending on how many you have, you are likely to end up with 'too
many pgs per OSD' warning for your main pool if you do this, because the
number of PGs in a pool cannot be reduced and there will be fewer OSDs to
put them on.
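
Back-of-the-envelope example: if the main pool has 512 PGs at size 2, that
is 512 * 2 / 6 = ~171 PG copies per OSD today, but 512 * 2 / 2 = 512 per OSD
after shrinking to two OSDs - well above the default warning threshold
(mon_pg_warn_max_per_osd, which I believe defaults to 300). You can check
your own numbers with:

ceph osd pool get <pool> pg_num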

-Ben

On Wed, Mar 8, 2017 at 5:53 AM, Henrik Korkuc  wrote:

> On 17-03-08 15:39, Kevin Olbrich wrote:
>
> Hi!
>
> Currently I have a cluster with 6 OSDs (5 hosts, 7TB RAID6 each).
> We want to shut down the cluster but it holds some semi-productive VMs we
> might or might not need in the future.
> To keep them, we would like to shrink our cluster from 6 to 2 OSDs (we use
> size 2 and min_size 1).
>
> Should I set the OSDs out one by one or with norefill, norecovery flags
> set but all at once?
> If last is the case, which flags should be set also?
>
> just set OSDs out and wait for them to rebalance, OSDs will be active and
> serve traffic while data will be moving off them. I had a case where some
> pgs wouldn't move out, so after everything settles, you may need to remove
> OSDs from crush one by one.
>
> Thanks!
>
> Kind regards,
> Kevin Olbrich.
>
>
> ___
> ceph-users mailing 
> listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Ben Hines
I'm curious how the num_threads option to civetweb relates to the 'rgw
thread pool size' setting. Should I make them equal?

ie:

rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125
error_log_file=/var/log/ceph/civetweb.error.log
access_log_file=/var/log/ceph/civetweb.access.log
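
For anyone collecting the suggestions from this thread in one place, the
combined knobs end up looking roughly like this - the section name and the
values are just illustrations, not recommendations:

[client.rgw.gateway]
rgw thread pool size = 512
rgw num rados handles = 8
rgw frontends = civetweb port=80 num_threads=512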


-Ben

On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander  wrote:

>
> > Op 9 februari 2017 om 19:34 schreef Mark Nelson :
> >
> >
> > I'm not really an RGW expert, but I'd suggest increasing the
> > "rgw_thread_pool_size" option to something much higher than the default
> > 100 threads if you haven't already.  RGW requires at least 1 thread per
> > client connection, so with many concurrent connections some of them
> > might end up timing out.  You can scale the number of threads and even
> > the number of RGW instances on a single server, but at some point you'll
> > run out of threads at the OS level.  Probably before that actually
> > happens though, you'll want to think about multiple RGW gateway nodes
> > behind a load balancer.  Afaik that's how the big sites do it.
> >
>
> In addition, have you tried to use more RADOS handles?
>
> rgw_num_rados_handles = 8
>
> That with more RGW threads as Mark mentioned.
>
> Wido
>
> > I believe some folks are considering trying to migrate rgw to a
> > threadpool/event processing model but it sounds like it would be quite a
> > bit of work.
> >
> > Mark
> >
> > On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > > Hi all,
> > >
> > > We're doing some stress testing with clients hitting our rados gw
> > > nodes with simultaneous connections.  When the number of client
> > > connections exceeds about 5400 we start seeing 403 forbidden errors
> > > and log messages like the following:
> > >
> > > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > > too big now=2017-02-09 08:53:16.00 req_time=2017-02-09
> > > 08:37:18.00
> > >
> > > This is version 10.2.5 using embedded civetweb.  There's just one
> > > instance per node, and they all start generating 403 errors and the
> > > above log messages when enough clients start hitting them.  The
> > > hardware is not being taxed at all, negligible load and network
> > > throughput.   OSD don't show any appreciable increase in CPU load or
> > > io wait on journal/data devices.  Unless I'm missing something it
> > > looks like the RGW is just not scaling to fill out the hardware it is
> > > on.
> > >
> > > Does anyone have advice on scaling RGW to fully utilize a host?
> > >
> > > thanks,
> > > Ben
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw static website docs 404

2017-01-19 Thread Ben Hines
Sure. However, as a general development process, many projects require
documentation to go in with a feature. The person who wrote it is the best
person to explain how to use it.

Even just adding a new setting to a list of valid settings is pretty basic,
quick and easy.  It's odd that major new features are added and effectively
kept secret.
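
For the archives, the RHCS doc I linked below boils down to roughly the
following in ceph.conf, plus a wildcard DNS record for the website hostname
pointing at the gateway. Option names are from my notes, so treat this as a
starting point rather than gospel:

rgw enable static website = true
rgw dns name = objects.example.com
rgw dns s3website name = objects-website.example.com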

-Ben

On Thu, Jan 19, 2017 at 1:56 AM, Wido den Hollander <w...@42on.com> wrote:

>
> > Op 19 januari 2017 om 2:57 schreef Ben Hines <bhi...@gmail.com>:
> >
> >
> > Aha! Found some docs here in the RHCS site:
> >
> > https://access.redhat.com/documentation/en/red-hat-ceph-
> storage/2/paged/object-gateway-guide-for-red-hat-
> enterprise-linux/chapter-2-configuration
> >
> > Really, ceph.com should have all this too...
> >
>
> I agree, but keep in mind that Ceph is a free, Open Source project. It's
> free to use and to consume. Writing documentation isn't always the
> coolest/fanciest/nicest thing to do.
>
> You are more then welcome to send a Pull Request on Github to update the
> documentation for the RGW. That would help others which might be in the
> same situation as you are.
>
> Open Source is by working together and collaborating on a project :)
>
> This can be writing code, documentation or helping others on mailinglists.
> That way we call benefit from the project.
>
> Wido
>
> > -Ben
> >
> > On Wed, Jan 18, 2017 at 5:15 PM, Ben Hines <bhi...@gmail.com> wrote:
> >
> > > Are there docs on the RGW static website feature?
> > >
> > > I found 'rgw enable static website' config setting only via the mailing
> > > list. A search for 'static' on ceph.com turns up release notes, but no
> > > other documentation. Anyone have pointers on how to set this up and
> what i
> > > can do with it? Does it require using dns based buckets, for example?
> I'd
> > > like to be able to hit a website with http:
> ,
> > > ideally. (without the browser forcing it to download)
> > >
> > > thanks,
> > >
> > > -Ben
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)

2016-12-22 Thread Ben Hines
FWIW, this is still required with Jewel 10.2.5. From the release notes it
sounded like it was finally fixed, but I had the same issue. Fortunately
Micha's steps are easy and fix it right up.

In my case i didn't think i had any mixed RGWs - was planning to stop them
all first -  but i had forgotten about my monitoring system which runs
'radosgw-admin' -- that part upgraded first, before i'd stopped any of my
Infernalis RGW's.

-Ben

On Thu, Jul 28, 2016 at 7:50 AM, Arvydas Opulskis <
arvydas.opuls...@adform.com> wrote:

> Hi,
>
> We solved it by running Micha's scripts, plus we needed to run period update
> and commit commands (for some reason we had to do it in separate commands):
>
> radosgw-admin period update
> radosgw-admin period commit
>
> Btw, we added endpoints to json file, but I am not sure these are needed.
>
> And I agree with Micha - this should be noticed in upgrade instructions on
> Ceph site. We run into this trap on our prod env (upgrading Infernalis ->
> Jewel). Maybe we should test it more next time..
>
> Br,
> Arvydas
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Micha Krause
> Sent: Wednesday, July 6, 2016 2:46 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Can't create bucket (ERROR: endpoints not
> configured for upstream zone)
>
> Hi,
>
> I think I found a Solution for my Problem, here are my findings:
>
>
> This Bug can be easily reproduced in a test environment:
>
> 1. Delete all rgw related pools.
> 2. Start infernalis radosgw to initialize them again.
> 3. Create user.
> 4. User creates bucket.
> 5. Upgrade radosgw to jewel
> 6. User creates bucket -> fail
>
> I found this scary script from Yehuda: https://raw.githubusercontent.
> com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone
> which needs to be modified according to http://www.spinics.net/lists/c
> eph-users/msg27957.html.
>
> After the modification, a lot of the script becomes obsolete (in my
> opinion), and can be rewritten to this (less scary):
>
>
> #!/bin/sh
>
> set -x
>
> RADOSGW_ADMIN=radosgw-admin
>
> echo "Exercise initialization code"
> $RADOSGW_ADMIN user info --uid=foo # exercise init code (???)
>
> echo "Get default zonegroup"
> $RADOSGW_ADMIN zonegroup get --rgw-zonegroup=default | sed
> 's/"id":.*/"id": "default",/g' | sed 's/"master_zone.*/"master_zone":
> "default",/g' > default-zg.json
>
> echo "Get default zone"
> $RADOSGW_ADMIN zone get --zone-id=default > default-zone.json
>
> echo "Creating realm"
> $RADOSGW_ADMIN realm create --rgw-realm=myrealm
>
> echo "Creating default zonegroup"
> $RADOSGW_ADMIN zonegroup set --rgw-zonegroup=default < default-zg.json
>
> echo "Creating default zone"
> $RADOSGW_ADMIN zone set --rgw-zone=default < default-zone.json
>
> echo "Setting default zonegroup to 'default'"
> $RADOSGW_ADMIN zonegroup default --rgw-zonegroup=default
>
> echo "Setting default zone to 'default'"
> $RADOSGW_ADMIN zone default --rgw-zone=default
>
>
> My plan to do this in production is now:
>
> 1. Stop all rados-gateways
> 2. Upgrade rados-gateways to jewel
> 3. Run less scary script
> 4. Start rados-gateways
>
> This whole thing is a serious problem, there should at least be a clear
> notice in the Jewel release notes about this. I was lucky to catch this in
> my test-cluster, I'm sure a lot of people will run into this in production.
>
>
> Micha Krause
>
>
> Am 05.07.2016 um 09:30 schrieb Micha Krause:
> > *bump*
> >
> > Am 01.07.2016 um 13:00 schrieb Micha Krause:
> >> Hi,
> >>
> >>  > In Infernalis there was this command:
> >>>
> >>> radosgw-admin regions list
> >>>
> >>> But this is missing in Jewel.
> >>
> >> Ok, I just found out that this was renamed to zonegroup list:
> >>
> >> root@rgw01:~ # radosgw-admin --id radosgw.rgw zonegroup list
> >> read_default_id : -2 {
> >>  "default_info": "",
> >>  "zonegroups": [
> >>  "default"
> >>  ]
> >> }
> >>
> >> This looks to me like there is indeed only one zonegroup or region
> configured.
> >>
> >> Micha Krause
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
It looks like the second release note in that section answers my question:
sortbitwise is only supported in Jewel, and it's required to already be set
for Kraken-upgraded OSDs to even start up, so one must go to Jewel first.
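
For reference, setting the flag on the Jewel cluster before the Kraken
upgrade should just be:

ceph osd set sortbitwise
ceph osd dump | grep flags   # confirm 'sortbitwise' is listed

(at least that's my reading of the release notes below).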

The section heading should probably say just "Upgrading to Kraken" rather
than "Upgrading from Jewel".

Also, there is a whole 'upgrading, release by release' section in the
documentation which hasn't been updated since Firefly. Someone, perhaps me,
should probably update those... It's a little easier to follow that path of
upgrade notes. (though reading release notes is also required)

http://docs.ceph.com/docs/master/install/upgrading-ceph/

-Ben

On Mon, Dec 12, 2016 at 6:35 PM, Ben Hines <bhi...@gmail.com> wrote:

> Hi! Can you clarify whether this release note applies to Jewel upgrades
> only? Ie, can we go Infernalis -> Kraken? It is in the 'upgrading from
> jewel' section which would imply that it doesn't apply to Infernalis ->
> Kraken. (or any other version to kraken), but it does say 'All clusters'.
>
> Upgrading from Jewel
> 
> * All clusters must first be upgraded to Jewel 10.2.z before upgrading
>   to Kraken 11.2.z (or, eventually, Luminous 12.2.z).
>
> thanks!
>
> -Ben
>
>
> On Mon, Dec 12, 2016 at 6:28 PM, Abhishek L <abhishek.lekshma...@gmail.com
> > wrote:
>
>> Hi everyone,
>>
>> This is the first release candidate for Kraken, the next stable
>> release series. There have been major changes from jewel with many
>> features being added. Please note the upgrade process from jewel,
>> before upgrading.
>>
>> Major Changes from Jewel
>> 
>>
>> - *RADOS*:
>>
>>   * The new *BlueStore* backend now has a stable disk format and is
>> passing our failure and stress testing. Although the backend is
>> still flagged as experimental, we encourage users to try it out
>> for non-production clusters and non-critical data sets.
>>   * RADOS now has experimental support for *overwrites on
>> erasure-coded* pools. Because the disk format and implementation
>> are not yet finalized, there is a special pool option that must be
>> enabled to test the new feature.  Enabling this option on a cluster
>> will permanently bar that cluster from being upgraded to future
>> versions.
>>   * We now default to the AsyncMessenger (``ms type = async``) instead
>> of the legacy SimpleMessenger.  The most noticeable difference is
>> that we now use a fixed sized thread pool for network connections
>> (instead of two threads per socket with SimpleMessenger).
>>   * Some OSD failures are now detected almost immediately, whereas
>> previously the heartbeat timeout (which defaults to 20 seconds)
>> had to expire.  This prevents IO from blocking for an extended
>> period for failures where the host remains up but the ceph-osd
>> process is no longer running.
>>   * There is a new ``ceph-mgr`` daemon.  It is currently collocated with
>> the monitors by default, and is not yet used for much, but the basic
>> infrastructure is now in place.
>>   * The size of encoded OSDMaps has been reduced.
>>   * The OSDs now quiesce scrubbing when recovery or rebalancing is in
>> progress.
>>
>> - *RGW*:
>>
>>   * RGW now supports a new zone type that can be used for metadata
>> indexing
>> via Elasticsearch.
>>   * RGW now supports the S3 multipart object copy-part API.
>>   * It is possible now to reshard an existing bucket. Note that bucket
>> resharding currently requires that all IO (especially writes) to
>> the specific bucket is quiesced.
>>   * RGW now supports data compression for objects.
>>   * Civetweb version has been upgraded to 1.8
>>   * The Swift static website API is now supported (S3 support has been
>> added
>> previously).
>>   * S3 bucket lifecycle API has been added. Note that currently it only
>> supports
>> object expiration.
>>   * Support for custom search filters has been added to the LDAP auth
>> implementation.
>>   * Support for NFS version 3 has been added to the RGW NFS gateway.
>>   * A Python binding has been created for librgw.
>>
>> - *RBD*:
>>
>>   * RBD now supports images stored in an *erasure-coded* RADOS pool
>> using the new (experimental) overwrite support. Images must be
>> created using the new rbd CLI "--data-pool " option to
>> specify the EC pool where the backing data objects are
>> stored. Attempting to create an i

Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
Hi! Can you clarify whether this release note applies to Jewel upgrades
only? Ie, can we go Infernalis -> Kraken? It is in the 'upgrading from
jewel' section which would imply that it doesn't apply to Infernalis ->
Kraken. (or any other version to kraken), but it does say 'All clusters'.

Upgrading from Jewel

* All clusters must first be upgraded to Jewel 10.2.z before upgrading
  to Kraken 11.2.z (or, eventually, Luminous 12.2.z).

thanks!

-Ben


On Mon, Dec 12, 2016 at 6:28 PM, Abhishek L 
wrote:

> Hi everyone,
>
> This is the first release candidate for Kraken, the next stable
> release series. There have been major changes from jewel with many
> features being added. Please note the upgrade process from jewel,
> before upgrading.
>
> Major Changes from Jewel
> 
>
> - *RADOS*:
>
>   * The new *BlueStore* backend now has a stable disk format and is
> passing our failure and stress testing. Although the backend is
> still flagged as experimental, we encourage users to try it out
> for non-production clusters and non-critical data sets.
>   * RADOS now has experimental support for *overwrites on
> erasure-coded* pools. Because the disk format and implementation
> are not yet finalized, there is a special pool option that must be
> enabled to test the new feature.  Enabling this option on a cluster
> will permanently bar that cluster from being upgraded to future
> versions.
>   * We now default to the AsyncMessenger (``ms type = async``) instead
> of the legacy SimpleMessenger.  The most noticeable difference is
> that we now use a fixed sized thread pool for network connections
> (instead of two threads per socket with SimpleMessenger).
>   * Some OSD failures are now detected almost immediately, whereas
> previously the heartbeat timeout (which defaults to 20 seconds)
> had to expire.  This prevents IO from blocking for an extended
> period for failures where the host remains up but the ceph-osd
> process is no longer running.
>   * There is a new ``ceph-mgr`` daemon.  It is currently collocated with
> the monitors by default, and is not yet used for much, but the basic
> infrastructure is now in place.
>   * The size of encoded OSDMaps has been reduced.
>   * The OSDs now quiesce scrubbing when recovery or rebalancing is in
> progress.
>
> - *RGW*:
>
>   * RGW now supports a new zone type that can be used for metadata indexing
> via Elasticsearch.
>   * RGW now supports the S3 multipart object copy-part API.
>   * It is possible now to reshard an existing bucket. Note that bucket
> resharding currently requires that all IO (especially writes) to
> the specific bucket is quiesced.
>   * RGW now supports data compression for objects.
>   * Civetweb version has been upgraded to 1.8
>   * The Swift static website API is now supported (S3 support has been
> added
> previously).
>   * S3 bucket lifecycle API has been added. Note that currently it only
> supports
> object expiration.
>   * Support for custom search filters has been added to the LDAP auth
> implementation.
>   * Support for NFS version 3 has been added to the RGW NFS gateway.
>   * A Python binding has been created for librgw.
>
> - *RBD*:
>
>   * RBD now supports images stored in an *erasure-coded* RADOS pool
> using the new (experimental) overwrite support. Images must be
> created using the new rbd CLI "--data-pool " option to
> specify the EC pool where the backing data objects are
> stored. Attempting to create an image directly on an EC pool will
> not be successful since the image's backing metadata is only
> supported on a replicated pool.
>   * The rbd-mirror daemon now supports replicating dynamic image
> feature updates and image metadata key/value pairs from the
> primary image to the non-primary image.
>   * The number of image snapshots can be optionally restricted to a
> configurable maximum.
>   * The rbd Python API now supports asynchronous IO operations.
>
> - *CephFS*:
>
>   * libcephfs function definitions have been changed to enable proper
> uid/gid control.  The library version has been increased to reflect the
> interface change.
>   * Standby replay MDS daemons now consume less memory on workloads
> doing deletions.
>   * Scrub now repairs backtrace, and populates `damage ls` with
> discovered errors.
>   * A new `pg_files` subcommand to `cephfs-data-scan` can identify
> files affected by a damaged or lost RADOS PG.
>   * The false-positive "failing to respond to cache pressure" warnings have
> been fixed.
>
>
> Upgrading from Jewel
> 
>
> * All clusters must first be upgraded to Jewel 10.2.z before upgrading
>   to Kraken 11.2.z (or, eventually, Luminous 12.2.z).
>
> * The ``sortbitwise`` flag must be set on the Jewel cluster before
> upgrading
>   to Kraken.  The latest 

Re: [ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
Not particularly, i just never did the Jewel upgrade. (normally like to
stay relatively current)

-Ben

On Fri, Dec 9, 2016 at 11:40 AM, Samuel Just <sj...@redhat.com> wrote:

> Is there a particular reason you are sticking to the versions with
> shorter support periods?
> -Sam
>
> On Fri, Dec 9, 2016 at 11:38 AM, Ben Hines <bhi...@gmail.com> wrote:
> > Anyone have any good / bad experiences with Kraken? I haven't seen much
> > discussion of it. Particularly from the RGW front.
> >
> > I'm still on Infernalis for our cluster, considering going up to K.
> >
> > thanks,
> >
> > -Ben
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
Anyone have any good / bad experiences with Kraken? I haven't seen much
discussion of it. Particularly from the RGW front.

I'm still on Infernalis for our cluster, considering going up to K.

thanks,

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Thanks. Will try it out once we get on Jewel.

Just curious, does bucket deletion with --purge-objects work via
radosgw-admin with the no index option?
If not, i imagine rados could be used to delete them manually by prefix.
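
Rough sketch of what I mean - the pool name and the assumption that every
RADOS object belonging to a bucket starts with the bucket's marker both need
to be verified on your own cluster before deleting anything:

# find the bucket marker
radosgw-admin bucket stats --bucket=mybucket | grep marker
# list and remove data objects whose names start with that marker
rados -p default.rgw.buckets.data ls | grep '^MARKER_' | \
    while read obj; do rados -p default.rgw.buckets.data rm "$obj"; done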

On Sep 21, 2016 6:02 PM, "Stas Starikevich" <stas.starikev...@gmail.com>
wrote:

> Hi Ben,
>
> Since the 'Jewel' RadosGW supports blind buckets.
> To enable blind buckets configuration I used:
>
> radosgw-admin zone get --rgw-zone=default > default-zone.json
> #change index_type from 0 to 1
> vi default-zone.json
> radosgw-admin zone set --rgw-zone=default --infile default-zone.json
>
> To apply changes you have to restart all the RGW daemons. Then all newly
> created buckets will not have index (bucket list will provide empty
> output), but GET\PUT works perfectly.
> In my tests there is no performance difference between SSD-backed indexes
> and 'blind bucket' configuration.
>
> Stas
>
> > On Sep 21, 2016, at 2:26 PM, Ben Hines <bhi...@gmail.com> wrote:
> >
> > Nice, thanks! Must have missed that one. It might work well for our use
> case since we don't really need the index.
> >
> > -Ben
> >
> > On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum <gfar...@redhat.com>
> wrote:
> > On Wednesday, September 21, 2016, Ben Hines <bhi...@gmail.com> wrote:
> > Yes, 200 million is way too big for a single ceph RGW bucket. We
> encountered this problem early on and sharded our buckets into 20 buckets,
> each which have the sharded bucket index with 20 shards.
> >
> > Unfortunately, enabling the sharded RGW index requires recreating the
> bucket and all objects.
> >
> > The fact that ceph uses ceph itself for the bucket indexes makes RGW
> less reliable in our experience. Instead of depending on one object you're
> depending on two, with the index and the object itself. If the cluster has
> any issues with the index the fact that it blocks access to the object
> itself is very frustrating. If we could retrieve / put objects into RGW
> without hitting the index at all we would - we don't need to list our
> buckets.
> >
> > I don't know the details or which release it went into, but indexless
> buckets are now a thing -- check the release notes or search the lists! :)
> > -Greg
> >
> >
> >
> > -Ben
> >
> > On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander <w...@42on.com>
> wrote:
> >
> > > Op 20 september 2016 om 10:55 schreef Василий Ангапов <
> anga...@gmail.com>:
> > >
> > >
> > > Hello,
> > >
> > > Is there any way to copy rgw bucket index to another Ceph node to
> > > lower the downtime of RGW? For now I have  a huge bucket with 200
> > > million files and its backfilling is blocking RGW completely for an
> > > hour and a half even with 10G network.
> > >
> >
> > No, not really. What you really want is the bucket sharding feature.
> >
> > So what you can do is enable the sharding, create a NEW bucket and copy
> over the objects.
> >
> > Afterwards you can remove the old bucket.
> >
> > Wido
> >
> > > Thanks!
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Nice, thanks! Must have missed that one. It might work well for our use
case since we don't really need the index.

-Ben

On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> On Wednesday, September 21, 2016, Ben Hines <bhi...@gmail.com> wrote:
>
>> Yes, 200 million is way too big for a single ceph RGW bucket. We
>> encountered this problem early on and sharded our buckets into 20 buckets,
>> each which have the sharded bucket index with 20 shards.
>>
>> Unfortunately, enabling the sharded RGW index requires recreating the
>> bucket and all objects.
>>
>> The fact that ceph uses ceph itself for the bucket indexes makes RGW less
>> reliable in our experience. Instead of depending on one object you're
>> depending on two, with the index and the object itself. If the cluster has
>> any issues with the index the fact that it blocks access to the object
>> itself is very frustrating. If we could retrieve / put objects into RGW
>> without hitting the index at all we would - we don't need to list our
>> buckets.
>>
>
> I don't know the details or which release it went into, but indexless
> buckets are now a thing -- check the release notes or search the lists! :)
> -Greg
>
>
>
>>
>> -Ben
>>
>> On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander <w...@42on.com>
>> wrote:
>>
>>>
>>> > Op 20 september 2016 om 10:55 schreef Василий Ангапов <
>>> anga...@gmail.com>:
>>> >
>>> >
>>> > Hello,
>>> >
>>> > Is there any way to copy rgw bucket index to another Ceph node to
>>> > lower the downtime of RGW? For now I have  a huge bucket with 200
>>> > million files and its backfilling is blocking RGW completely for an
>>> > hour and a half even with 10G network.
>>> >
>>>
>>> No, not really. What you really want is the bucket sharding feature.
>>>
>>> So what you can do is enable the sharding, create a NEW bucket and copy
>>> over the objects.
>>>
>>> Afterwards you can remove the old bucket.
>>>
>>> Wido
>>>
>>> > Thanks!
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Yes, 200 million is way too big for a single ceph RGW bucket. We
encountered this problem early on and sharded our buckets into 20 buckets,
each of which has a sharded bucket index with 20 shards.
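
(For reference, the way we set the shard count for newly created buckets is
the config below - option name from memory, it only affects buckets created
after radosgw is restarted with it, and the section name is whatever your
rgw client section is called:

[client.radosgw.gateway]
rgw override bucket index max shards = 20
)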

Unfortunately, enabling the sharded RGW index requires recreating the
bucket and all objects.

The fact that ceph uses ceph itself for the bucket indexes makes RGW less
reliable in our experience. Instead of depending on one object you're
depending on two, with the index and the object itself. If the cluster has
any issues with the index the fact that it blocks access to the object
itself is very frustrating. If we could retrieve / put objects into RGW
without hitting the index at all we would - we don't need to list our
buckets.

-Ben

On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander  wrote:

>
> > Op 20 september 2016 om 10:55 schreef Василий Ангапов  >:
> >
> >
> > Hello,
> >
> > Is there any way to copy rgw bucket index to another Ceph node to
> > lower the downtime of RGW? For now I have  a huge bucket with 200
> > million files and its backfilling is blocking RGW completely for an
> > hour and a half even with 10G network.
> >
>
> No, not really. What you really want is the bucket sharding feature.
>
> So what you can do is enable the sharding, create a NEW bucket and copy
> over the objects.
>
> Afterwards you can remove the old bucket.
>
> Wido
>
> > Thanks!
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unknown error (95->500) when creating buckets or putting files to RGW after upgrade from Infernalis to Jewel

2016-07-26 Thread Ben Hines
Fwiw this thread still has me terrified to upgrade my rgw cluster. Just
when I thought it was safe.

Anyone have any successful, problem-free rgw infernalis-jewel upgrade
reports?

On Jul 25, 2016 11:27 PM, "nick"  wrote:

> Hey Maciej,
> I compared the output of your commands with the output on our cluster and
> they
> are the same. So I do not see any problems on that site. After that I
> googled
> for the warning you get in the debug log:
> """
> WARNING: set_req_state_err err_no=95 resorting to 500
> """
>
> I found some reports about problems with EC-coded pools and radosgw. Do
> you use that?
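(A quick way to check is to look at how the rgw pools are defined:

ceph osd pool ls detail | grep -i rgw

replicated pools show up as "replicated size N", erasure-coded ones as
"erasure size N".)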
>
>
> Cheers
> Nick
>
> On Monday, July 25, 2016 04:50:56 PM Naruszewicz, Maciej wrote:
> > WARNING: set_req_state_err err_no=95 resorting to 500
>
> --
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-23 Thread Ben Hines
I for one am terrified of upgrading due to these messages (and indications
that the problem still may not be resolved even in 10.2.1) - holding off
until a clean upgrade is possible without running any hacky scripts.

-Ben

On Mon, May 23, 2016 at 2:23 AM, nick  wrote:

> Hi,
> we ran into the same rgw problem when updating from infernalis to jewel
> (version 10.2.1). Now I would like to run the script from Yehuda, but I am
> a bit scared by
>
>
> >I can create and get new buckets and objects but I've "lost" all my
> > old buckets.
>
> As I understand it so far, we do not need the trailing underscores from the
> script when upgrading directly to 10.2.1. Is this correct?
>
> I guess the trailing underscore gets created in that script line
> (filter_json function):
> """
> echo \"$key\": \"${val}_\", >> $out_file
> """
>
> Should I remove the 'underscore' there?
>
> Thanks for any help :-)
>
> Cheers
> Nick
>
> On Friday, May 20, 2016 01:28:07 PM Jonathan D. Proulx wrote:
> > On Fri, May 20, 2016 at 09:21:58AM -0700, Yehuda Sadeh-Weinraub wrote:
> > :On Fri, May 20, 2016 at 9:03 AM, Jonathan D. Proulx 
> wrote:
> > :> Hi All,
> > :>
> > :> I saw the previous thread on this related to
> > :> http://tracker.ceph.com/issues/15597
> > :>
> > :> and Yehuda's fix script
> > :>
> https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/
> > :> fix-zone
> > :>
> > :> Running this seems to have landed me in a weird state.
> > :>
> > :> I can create and get new buckets and objects but I've "lost" all my
> > :> old buckets.  I'm fairly confident the "lost" data is in the
> > :> .rgw.buckets pool but my current zone is set to use .rgw.buckets_
> >
> > 
> >
> > :> Should I just adjust the zone to use the pools without trailing
> > :> slashes?  I'm a bit lost.  the last I could see from running the
> > :
> > :Yes. The trailing slashes were needed when upgrading for 10.2.0, as
> > :there was another bug, and I needed to add these to compensate for it.
> > :I should update the script now to reflect that fix. You should just
> > :update the json and set the zone appropriately.
> > :
> > :Yehuda
> >
> > That did the trick (though obviously we both meant trailing
> > underscores '_')
> >
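(For anyone hitting this later, "update the json and set the zone
appropriately" boils down to roughly the following, with the zone name
assumed to be 'default':

radosgw-admin zone get --rgw-zone=default > zone.json
# edit zone.json and drop the trailing underscore from the pool names,
# e.g. ".rgw.buckets_" -> ".rgw.buckets"
radosgw-admin zone set --rgw-zone=default < zone.json
radosgw-admin period update --commit   # only on multisite-aware jewel builds

Keep a copy of the original json around in case something goes sideways.)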
> > Thanks,
> > -Jon
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] waiting for rw locks on rgw index file during recovery

2016-05-06 Thread Ben Hines
Infernalis 9.2.1, Centos 72. My cluster is in recovery and i've noticed a
lot of 'waiting for rw locks'. Some of these can last quite a long time.
Any idea what can cause this?

Because this is an RGW bucket index file, this causes requests to back up --
since the index can't be updated, S3 updates to other objects using that
index fail because the client can't get the index. This is a problem with
radosgw's implementation of the index metadata - when the cluster has
issues, it affects more than just the object itself. It's basically twice
the failure points for each object if you're using radosgw, since both the
index and the rados object have to be valid and on good OSDs.
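(For reference, the dumps below come from the admin socket of the OSD that
holds the index object, along the lines of:

ceph daemon osd.NN dump_ops_in_flight
ceph daemon osd.NN dump_historic_ops    # recently completed slow ops

which is usually the quickest way to see what a blocked index update is
waiting on.)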

2016-05-06 21:22:40.193804 7f9b6bc22700  0 log_channel(cluster) log [WRN] :
1 slow requests, 1 included below; oldest blocked for > 30.484007 secs
2016-05-06 21:22:40.193810 7f9b6bc22700  0 log_channel(cluster) log [WRN] :
slow request 30.484007 seconds old, received at 2016-05-06 21:22:09.709765:
osd_op(client.45236419.0:481981 .dir.default.42048218.27.16 [call
rgw.bucket_complete_op] 11.b9b84dfe ack+ondisk+write+known_if_redirected
e105350) currently waiting for rw locks


From dump_ops_in_flight:

"description": "osd_op(client.45297841.0:387094
.dir.default.42048218.27.16 [call rgw.bucket_prepare_op] 11.b9b84dfe
ondisk+write+known_if_redirected e105350)",
"initiated_at": "2016-05-06 21:25:51.911180",
"age": 5.121787,
"duration": 5.158513,
"type_data": [
"delayed",
{
"client": "client.45297841",
"tid": 387094
},
[
{
"time": "2016-05-06 21:25:51.911180",
"event": "initiated"
},
{
"time": "2016-05-06 21:25:51.911280",
"event": "queued_for_pg"
},
{
"time": "2016-05-06 21:25:51.940233",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:51.940265",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:52.059897",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:52.059925",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:52.262025",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:52.262056",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:52.358435",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:52.358958",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:52.806910",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:52.806930",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:52.947345",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:52.947357",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:53.131842",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:53.131860",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:53.323012",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:53.323031",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:53.800726",
"event": "reached_pg"
},
{
"time": "2016-05-06 21:25:53.800744",
"event": "waiting for rw locks"
},
{
"time": "2016-05-06 21:25:54.260684",
"event": "reached_pg"
},
{
 

Re: [ceph-users] Incorrect crush map

2016-05-05 Thread Ben Hines
Nevermind, they just came back. Looks like i had some other issues, such as
manually enabled ceph-osd@#.service files in systemd config for OSDs that
had been moved to different nodes.

The root problem is clearly that ceph-osd-prestart updates the crush map
before the OSD successfully starts at all. If there are duplicate IDs, for
example due to leftover files or somesuch, then a working OSD on another
node may be forcibly moved in the crush map to a node where it doesn't
exist. I would expect OSDs to update their own location in CRUSH, rather
than having this be a prestart step.
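(If you want to stop the prestart hook from moving OSDs around at all, the
usual knobs are in ceph.conf on the OSD hosts -- option names as of
infernalis/jewel, they may differ in later releases:

[osd]
osd crush update on start = false

# or pin a daemon's location explicitly, e.g.:
# [osd.26]
# osd crush location = "host=cld-mtl-004 root=default"

Either way a restart can no longer move a working OSD under the wrong host.)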

-Ben


On Wed, May 4, 2016 at 10:27 PM, Ben Hines <bhi...@gmail.com> wrote:

> Centos 7.2.
>
> .. and i think i just figured it out. One node had directories from former
> OSDs in /var/lib/ceph/osd. When restarting other OSDs on this host, ceph
> apparently added those to the crush map, too.
>
> [root@sm-cld-mtl-013 osd]# ls -la /var/lib/ceph/osd/
> total 128
> drwxr-x--- 8 ceph ceph  90 Feb 24 14:44 .
> drwxr-x--- 9 ceph ceph 106 Feb 24 14:44 ..
> drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-42
> drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-43
> drwxr-xr-x 1 root root 278 May  4 22:21 ceph-44
> drwxr-xr-x 1 root root 278 May  4 22:21 ceph-45
> drwxr-xr-x 1 root root 278 May  4 22:25 ceph-67
> drwxr-xr-x 1 root root 304 May  4 22:25 ceph-86
>
>
> (42 and 43 are on a different host.. yet when 'systemctl start
> ceph.target' is used, the osd preflight adds them to the crush map anyway:
>
>
> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.67 at :/0 osd_data
> /var/lib/ceph/osd/ceph-67 /var/lib/ceph/osd/ceph-67/journal
> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.45 at :/0 osd_data
> /var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: WARNING: will not setuid/gid:
> /var/lib/ceph/osd/ceph-42 owned by 0:0 and not requested 167:167
> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.529176
> 7f00cca7c900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
> /var/lib/ceph/osd/ceph-43: (2) No such file or directory#033[0m
> May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.534657
> 7fb55c17e900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
> /var/lib/ceph/osd/ceph-42: (2) No such file or directory#033[0m
> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service: main process
> exited, code=exited, status=1/FAILURE
> May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@43.service entered
> failed state.
> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed.
> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main process
> exited, code=exited, status=1/FAILURE
> May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@42.service entered
> failed state.
> May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service failed.
>
>
>
> -Ben
>
> On Tue, May 3, 2016 at 7:16 PM, Wade Holler <wade.hol...@gmail.com> wrote:
>
>> Hi Ben,
>>
>> What OS+Version ?
>>
>> Best Regards,
>> Wade
>>
>>
>> On Tue, May 3, 2016 at 2:44 PM Ben Hines <bhi...@gmail.com> wrote:
>>
>>> My crush map keeps putting some OSDs on the wrong node. Restarting them
>>> fixes it temporarily, but they eventually hop back to the other node that
>>> they aren't really on.
>>>
>>> Is there anything that can cause this to look for?
>>>
>>> Ceph 9.2.1
>>>
>>> -Ben
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW obj remove cls_xx_remove returned -2

2016-05-05 Thread Ben Hines
Ceph 9.2.1, Centos 7.2

I sometimes notice these errors when removing objects: the OSD returns 'No
such file or directory' when deleting things. Any ideas here? Is this
expected?
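(One way to narrow this down is to check whether the head object is really
gone from the data pool while the bucket index still lists it; pool, bucket
and object names below are placeholders:

rados -p .rgw.buckets stat 'default.42048218.15_<objectname>'
radosgw-admin bucket list --bucket=<bucket> | grep '<objectname>'

If rados stat returns ENOENT but the index entry is still there, that would
suggest the head object was already gone when the delete ran.)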

(i anonymized the full filename, but it's all the same file)

RGW log:

2016-05-04 23:14:32.216324 7f92b7741700  1 -- 10.29.16.57:0/2874775405 <==
osd.11 10.30.1.42:6808/7454 45  osd_op_reply(476 default.42048218.15_
... fb66a4923b2029a6588adb1245fa3fe9 [call] v0'0 uv551321 ondisk = -2 ((2)
No such file or directory)) v6  349+0+0 (2101432025 0 0) 0x7f93b403aca0
con 0x7f946001b3d0
2016-05-04 23:14:32.216587 7f931b7b6700  1 -- 10.29.16.57:0/2874775405 -->
10.30.1.42:6808/7454 -- osd_op(client.45297956.0:477
.dir.default.42048218.15.12 [call rgw.bucket_complete_op] 11.74c941dd
ack+ondisk+write+known_if_redirected e104420) v6 -- ?+0 0x7f95100fcb40 con
0x7f946001b3d0
2016-05-04 23:14:32.216807 7f931b7b6700  2 req 4238:22.224049:s3:DELETE
 fb66a4923b2029a6588adb1245fa3fe9:delete_obj:http status=204
2016-05-04 23:14:32.216826 7f931b7b6700  1 == req done
req=0x7f9510091e50 http_status=204 ==
2016-05-04 23:14:32.216920 7f931b7b6700  1 civetweb: 0x7f9518c0:
10.29.16.57 - - [04/May/2016:23:14:09 -0700] "DELETE
fb66a4923b2029a6588adb1245fa3fe9 HTTP/1.1" 204 0 - Boto/2.38.0 Python/2.7.5
Linux/3.10.0-327.10.1.el7.x86_64




Log on the OSD with debug ms=10:


2016-05-04 23:14:31.716246 7fccbec2a700  0  cls/rgw/cls_rgw.cc:1959:
ERROR: rgw_obj_remove(): cls_cxx_remove returned -2
2016-05-04 23:14:31.716379 7fccbec2a700  1 -- 10.30.1.42:6808/7454 -->
10.29.16.57:0/939886467 -- osd_op_reply(525 default.42048218.15_ ...
fb66a4923b2029a6588adb1245fa3fe9 [call rgw.obj_remove] v0'0 uv551321 ondisk
= -2 ((2) No such file or directory)) v6 -- ?+0 0x7fcd05f0d600 con
0x7fcd01865fa0
2016-05-04 23:14:31.716563 7fcc59cb0700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/939886467 pipe(0x7fcd0e29f000 sd=527 :6808 s=2 pgs=16 cs=1
l=1 c=0x7fcd01865fa0).writer: state = open policy.server=1
2016-05-04 23:14:31.716646 7fcc59cb0700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/939886467 pipe(0x7fcd0e29f000 sd=527 :6808 s=2 pgs=16 cs=1
l=1 c=0x7fcd01865fa0).writer: state = open policy.server=1
2016-05-04 23:14:31.716983 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader wants 456 bytes from policy throttler
19523/524288000
2016-05-04 23:14:31.717006 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader wants 456 from dispatch throttler 0/104857600
2016-05-04 23:14:31.717029 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).aborted = 0
2016-05-04 23:14:31.717056 7fcc76585700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).reader got message 111 0x7fcd120c42c0
osd_op(client.45297946.0:411 .dir.default.42048218.15.15 [call
rgw.bucket_prepare_op] 11.c01f555d ondisk+write+known_if_redirected
e104420) v6
2016-05-04 23:14:31.717077 7fcc76585700  1 -- 10.30.1.42:6808/7454 <==
client.45297946 10.29.16.57:0/3924513385 111 
osd_op(client.45297946.0:411 .dir.default.42048218.15.15 [call
rgw.bucket_prepare_op] 11.c01f555d ondisk+write+known_if_redirected
e104420) v6  213+0+243 (3423964475 0 1018669967) 0x7fcd120c42c0 con
0x7fcced99f860
2016-05-04 23:14:31.717081 7fcc74538700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).writer: state = open policy.server=1
2016-05-04 23:14:31.717100 7fcc74538700 10 -- 10.30.1.42:6808/7454 >>
10.29.16.57:0/3924513385 pipe(0x7fcd0c87a000 sd=542 :6808 s=2 pgs=10 cs=1
l=1 c=0x7fcced99f860).write_ack 111
--
2016-05-04 23:14:32.202608 7fccb49ff700 10 -- 10.30.1.42:6809/7454
dispatch_throttle_release 83 to dispatch throttler 83/104857600
2016-05-04 23:14:32.203922 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader got ack seq 1220 >= 1220 on
0x7fcd053a5e00 osd_repop(client.45297861.0:514 11.5d
11/c01f555d/.dir.default.42048218.15.15/head v 104420'1406810) v1
2016-05-04 23:14:32.204040 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader wants 83 from dispatch throttler
0/104857600
2016-05-04 23:14:32.204084 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).aborted = 0
2016-05-04 23:14:32.204103 7fccb10c6700 10 -- 10.30.1.42:6809/7454 >>
10.30.1.124:6813/4012396 pipe(0x7fcd0486 sd=199 :6809 s=2 pgs=46808
cs=1 l=0 c=0x7fcd047679c0).reader got message 1236 0x7fcd05d5f440

Re: [ceph-users] Incorrect crush map

2016-05-04 Thread Ben Hines
Centos 7.2.

.. and i think i just figured it out. One node had directories from former
OSDs in /var/lib/ceph/osd. When restarting other OSDs on this host, ceph
apparently added those to the crush map, too.

[root@sm-cld-mtl-013 osd]# ls -la /var/lib/ceph/osd/
total 128
drwxr-x--- 8 ceph ceph  90 Feb 24 14:44 .
drwxr-x--- 9 ceph ceph 106 Feb 24 14:44 ..
drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-42
drwxr-xr-x 2 root root   6 Jul  2  2015 ceph-43
drwxr-xr-x 1 root root 278 May  4 22:21 ceph-44
drwxr-xr-x 1 root root 278 May  4 22:21 ceph-45
drwxr-xr-x 1 root root 278 May  4 22:25 ceph-67
drwxr-xr-x 1 root root 304 May  4 22:25 ceph-86


(42 and 43 are on a different host.. yet when 'systemctl start ceph.target'
is used, the osd preflight adds them to the crush map anyway:


May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.67 at :/0 osd_data
/var/lib/ceph/osd/ceph-67 /var/lib/ceph/osd/ceph-67/journal
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: starting osd.45 at :/0 osd_data
/var/lib/ceph/osd/ceph-45 /var/lib/ceph/osd/ceph-45/journal
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: WARNING: will not setuid/gid:
/var/lib/ceph/osd/ceph-42 owned by 0:0 and not requested 167:167
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.529176
7f00cca7c900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-43: (2) No such file or directory#033[0m
May  4 22:13:26 sm-cld-mtl-013 ceph-osd: 2016-05-04 22:13:26.534657
7fb55c17e900 -1 #033[0;31m ** ERROR: unable to open OSD superblock on
/var/lib/ceph/osd/ceph-42: (2) No such file or directory#033[0m
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service: main process
exited, code=exited, status=1/FAILURE
May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@43.service entered
failed state.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main process
exited, code=exited, status=1/FAILURE
May  4 22:13:26 sm-cld-mtl-013 systemd: Unit ceph-osd@42.service entered
failed state.
May  4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service failed.



-Ben

On Tue, May 3, 2016 at 7:16 PM, Wade Holler <wade.hol...@gmail.com> wrote:

> Hi Ben,
>
> What OS+Version ?
>
> Best Regards,
> Wade
>
>
> On Tue, May 3, 2016 at 2:44 PM Ben Hines <bhi...@gmail.com> wrote:
>
>> My crush map keeps putting some OSDs on the wrong node. Restarting them
>> fixes it temporarily, but they eventually hop back to the other node that
>> they aren't really on.
>>
>> Is there anything that can cause this to look for?
>>
>> Ceph 9.2.1
>>
>> -Ben
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph degraded writes

2016-05-03 Thread Ben Hines
The Hammer .93 to .94 notes said:
If upgrading from v0.93, set 'osd enable degraded writes = false' on all
osds prior to upgrading. The degraded writes feature has been reverted due
to issue #11155.

Our cluster is now on Infernalis 9.2.1 and we still have this setting set.
Can we get rid of it? Was this release note just needed for the upgrade? I
think we may be encountering problems in our cluster during recovery
because we can't write to any object which has less than 3 copies even
though we have min_size at 1.
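(One way to check what a running OSD actually thinks -- assuming the config
key is just the underscored form of the name in the release note:

ceph daemon osd.0 config get osd_enable_degraded_writes

and, if it is still recognized, it can be flipped without a restart via
ceph tell osd.* injectargs '--osd_enable_degraded_writes=false'.)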

thanks,

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Incorrect crush map

2016-05-03 Thread Ben Hines
My crush map keeps putting some OSDs on the wrong node. Restarting them
fixes it temporarily, but they eventually hop back to the other node that
they aren't really on.

Is there anything that can cause this to look for?

Ceph 9.2.1

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
Aha, i see how to use the debuginfo - trying it by running through gdb.
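(For anyone following along, that amounts to something like:

ulimit -c unlimited              # allow a core dump next time it dies
gdb -p $(pidof radosgw)
(gdb) continue
# ...after the segfault:
(gdb) thread apply all bt

or starting the gateway in the foreground under gdb, e.g.
gdb --args /usr/bin/radosgw -d -n client.rgw.<name>)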


On Wed, Apr 27, 2016 at 10:09 PM, Ben Hines <bhi...@gmail.com> wrote:

> Got it again - however, the stack is exactly the same, no symbols -
> debuginfo didn't resolve. Do i need to do something to enable that?
>
> The server in 'debug ms=10' this time, so there is a bit more spew:
>
>-14> 2016-04-27 21:59:58.811919 7f9e817fa700  1 --
> 10.30.1.8:0/3291985349 --> 10.30.2.13:6805/27519 --
> osd_op(client.44936150.0:223 obj_delete_at_hint.55 [call
> timeindex.list] 10.2c88dbcf ack+read+known_if_redirected e100564) v6 -- ?+0
> 0x7f9f140dc5f0 con 0x7f9f1410ed10
>-13> 2016-04-27 21:59:58.812039 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
>-12> 2016-04-27 21:59:58.812096 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
>-11> 2016-04-27 21:59:58.814343 7f9e3f96a700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).reader wants 211
> from dispatch throttler 0/104857600
>-10> 2016-04-27 21:59:58.814375 7f9e3f96a700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).aborted = 0
> -9> 2016-04-27 21:59:58.814405 7f9e3f96a700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).reader got message
> 2 0x7f9ec0009250 osd_op_reply(223 obj_delete_at_hint.55 [call] v0'0
> uv1448004 ondisk = 0) v6
> -8> 2016-04-27 21:59:58.814428 7f9e3f96a700  1 --
> 10.30.1.8:0/3291985349 <== osd.6 10.30.2.13:6805/27519 2 
> osd_op_reply(223 obj_delete_at_hint.55 [call] v0'0 uv1448004 ondisk
> = 0) v6  196+0+15 (3849172018 0 2149983739) 0x7f9ec0009250 con
> 0x7f9f1410ed10
> -7> 2016-04-27 21:59:58.814472 7f9e3f96a700 10 --
> 10.30.1.8:0/3291985349 dispatch_throttle_release 211 to dispatch
> throttler 211/104857600
> -6> 2016-04-27 21:59:58.814470 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
> -5> 2016-04-27 21:59:58.814511 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).write_ack 2
> -4> 2016-04-27 21:59:58.814528 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
> -3> 2016-04-27 21:59:58.814607 7f9e817fa700  1 --
> 10.30.1.8:0/3291985349 --> 10.30.2.13:6805/27519 --
> osd_op(client.44936150.0:224 obj_delete_at_hint.55 [call
> lock.unlock] 10.2c88dbcf ondisk+write+known_if_redirected e100564) v6 --
> ?+0 0x7f9f140dc5f0 con 0x7f9f1410ed10
> -2> 2016-04-27 21:59:58.814718 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
> -1> 2016-04-27 21:59:58.814778 7f9e3fa6b700 10 --
> 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010
> sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state =
> open policy.server=0
>  0> 2016-04-27 21:59:58.826494 7f9e7e7f4700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f9e7e7f4700
>
>  ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>  1: (()+0x30b0a2) [0x7fa11c5030a2]
>  2: (()+0xf100) [0x7fa1183fe100]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> --- logging levels ---
> 
>
>
> On Wed, Apr 27, 2016 at 9:39 PM, Ben Hines <bhi...@gmail.com> wrote:
>
>> Yes, CentOS 7.2. Happened twice in a row, both times shortly after a
>> restart, so i expect i'll be able to reproduce it. However, i've now tried
>> a bunch of times and it's not happening again.
>>
>> In any case i have glibc + ceph-debuginfo installed so we can get more
>> info if it does happen.
>>
>> thanks!
>>
>> On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com>
>> wrote:
>>
>>> - Original Message -
>&

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
Got it again - however, the stack is exactly the same, no symbols -
debuginfo didn't resolve. Do i need to do something to enable that?

The server in 'debug ms=10' this time, so there is a bit more spew:

   -14> 2016-04-27 21:59:58.811919 7f9e817fa700  1 -- 10.30.1.8:0/3291985349
--> 10.30.2.13:6805/27519 -- osd_op(client.44936150.0:223
obj_delete_at_hint.55 [call timeindex.list] 10.2c88dbcf
ack+read+known_if_redirected e100564) v6 -- ?+0 0x7f9f140dc5f0 con
0x7f9f1410ed10
   -13> 2016-04-27 21:59:58.812039 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
   -12> 2016-04-27 21:59:58.812096 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
   -11> 2016-04-27 21:59:58.814343 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).reader wants 211 from dispatch throttler
0/104857600
   -10> 2016-04-27 21:59:58.814375 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).aborted = 0
-9> 2016-04-27 21:59:58.814405 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).reader got message 2 0x7f9ec0009250
osd_op_reply(223 obj_delete_at_hint.55 [call] v0'0 uv1448004 ondisk
= 0) v6
-8> 2016-04-27 21:59:58.814428 7f9e3f96a700  1 -- 10.30.1.8:0/3291985349
<== osd.6 10.30.2.13:6805/27519 2  osd_op_reply(223
obj_delete_at_hint.55 [call] v0'0 uv1448004 ondisk = 0) v6 
196+0+15 (3849172018 0 2149983739) 0x7f9ec0009250 con 0x7f9f1410ed10
-7> 2016-04-27 21:59:58.814472 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349
dispatch_throttle_release 211 to dispatch throttler 211/104857600
-6> 2016-04-27 21:59:58.814470 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
-5> 2016-04-27 21:59:58.814511 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).write_ack 2
-4> 2016-04-27 21:59:58.814528 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
-3> 2016-04-27 21:59:58.814607 7f9e817fa700  1 -- 10.30.1.8:0/3291985349
--> 10.30.2.13:6805/27519 -- osd_op(client.44936150.0:224
obj_delete_at_hint.55 [call lock.unlock] 10.2c88dbcf
ondisk+write+known_if_redirected e100564) v6 -- ?+0 0x7f9f140dc5f0 con
0x7f9f1410ed10
-2> 2016-04-27 21:59:58.814718 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
-1> 2016-04-27 21:59:58.814778 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349
>> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914
cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
 0> 2016-04-27 21:59:58.826494 7f9e7e7f4700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f9e7e7f4700

 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x30b0a2) [0x7fa11c5030a2]
 2: (()+0xf100) [0x7fa1183fe100]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---



On Wed, Apr 27, 2016 at 9:39 PM, Ben Hines <bhi...@gmail.com> wrote:

> Yes, CentOS 7.2. Happened twice in a row, both times shortly after a
> restart, so i expect i'll be able to reproduce it. However, i've now tried
> a bunch of times and it's not happening again.
>
> In any case i have glibc + ceph-debuginfo installed so we can get more
> info if it does happen.
>
> thanks!
>
> On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>
>> - Original Message -
>> > From: "Karol Mroz" <km...@suse.com>
>> > To: "Ben Hines" <bhi...@gmail.com>
>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> > Sent: Wednesday, 27 April, 2016 7:06:56 PM
>> > Subject: Re: [ceph-users] radosgw crash - Infernalis
>> >
>> > On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
>> > [...]
>> > > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
>> > > default.42048218. [getxattrs,stat,read 0~524288] 12.aa7304

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
Yes, CentOS 7.2. Happened twice in a row, both times shortly after a
restart, so i expect i'll be able to reproduce it. However, i've now tried
a bunch of times and it's not happening again.

In any case i have glibc + ceph-debuginfo installed so we can get more info
if it does happen.

thanks!

On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

> - Original Message -
> > From: "Karol Mroz" <km...@suse.com>
> > To: "Ben Hines" <bhi...@gmail.com>
> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Wednesday, 27 April, 2016 7:06:56 PM
> > Subject: Re: [ceph-users] radosgw crash - Infernalis
> >
> > On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
> > [...]
> > > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
> > > default.42048218. [getxattrs,stat,read 0~524288] 12.aa730416
> > > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
> > > 0x7f49c4145eb0
> > >  0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
> > > (Segmentation fault) **
> > >  in thread 7f49a07f0700
> > >
> > >  ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> > >  1: (()+0x30b0a2) [0x7f4c4907f0a2]
> > >  2: (()+0xf100) [0x7f4c44f7a100]
> > >  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed
> > > to interpret this.
> >
> > Hi Ben,
> >
> > I sense a pretty badly corrupted stack. From the radosgw-9.2.1 (obtained
> from
> > a downloaded rpm):
> >
> > 0030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> > ...
> >   30b09d:   e8 0e 40 e4 ff  callq  14f0b0 <backtrace@plt>
> >   30b0a2:   4c 89 efmov%r13,%rdi
> >   ---
> > ...
> >
> > So either we tripped backtrace() code from pidfile_write() _or_ we can't
> > trust the stack. From the log snippet, it looks that we're far past the
> > point at which we would write a pidfile to disk (ie. at process start
> > during global_init()).
> > Rather, we're actually handling a request and outputting some bit of debug
> > message via MOSDOp::print() and beyond...
>
> It would help to know what binary this is and what OS.
>
> We know the offset into the function is 0x30b0a2 but we don't know which
> function yet AFAICT. Karol, how did you arrive at pidfile_write? Purely
> from the offset? I'm not sure that would be reliable...
>
> This is a segfault so the address of the frame where we crashed should be
> the exact instruction where we crashed. I don't believe a mov from one
> register to another that does not involve a dereference ((%r13) as opposed
> to %r13) can cause a segfault, so I don't think we are on the right
> instruction but then, as you say, the stack may be corrupt.
>
> >
> > Is this something you're able to easily reproduce? More logs with higher
> > log levels would be helpful... a coredump with radosgw compiled with -g
> > would be excellent :)
>
> Agreed, although if this is an rpm based system it should be sufficient to
> run the following.
>
> # debuginfo-install ceph glibc
>
> That may give us the name of the function depending on where we are (if we
> are in a library it may require the debuginfo for that library to be
> loaded).
>
> Karol is right that a coredump would be a good idea in this case and will
> give us maximum information about the issue you are seeing.
>
> Cheers,
> Brad
>
> >
> > --
> > Regards,
> > Karol
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw crash - Infernalis

2016-04-26 Thread Ben Hines
Is this a known one? Ceph 9.2.1. Can provide more logs if needed.

2> 2016-04-26 22:07:59.662702 7f49aeffd700  1 == req done
req=0x7f49c4138be0 http_status=200 ==
   -11> 2016-04-26 22:07:59.662752 7f49aeffd700  1 civetweb:
0x7f49c4001280: 10.30.1.221 - - [26/Apr/2016:22:07:59 -0700] "HEAD
/HTTP/1.1" 200 0 - Boto/2.32.1 Python/2.7.8 Windows/2008ServerR2
   -10> 2016-04-26 22:07:59.672109 7f49aeffd700  1 == starting new
request req=0x7f49c4148d00 =
-9> 2016-04-26 22:07:59.672131 7f49aeffd700  2 req 212:0.22::GET
::initializing for trans_id =
tx000d4-005720492f-2ac65f1-default
-8> 2016-04-26 22:07:59.672137 7f49aeffd700 10 host=sm-cephrgw3.scea.com
-7> 2016-04-26 22:07:59.672159 7f49aeffd700 10 s->object=
s->bucket=
-6> 2016-04-26 22:07:59.672165 7f49aeffd700  2 req 212:0.56:s3:GET
::getting op
-5> 2016-04-26 22:07:59.672169 7f49aeffd700  2 req 212:0.61:s3:GET
:authorizing
-4> 2016-04-26 22:07:59.672199 7f49aeffd700 10 get_canon_resource():
dest=
-3> 2016-04-26 22:07:59.672203 7f49aeffd700 10 auth_hdr:
GET


Wed, 27 Apr 2016 05:07:59 GMT

-2> 2016-04-26 22:07:59.672240 7f49aeffd700  2 req 212:0.000131:s3:GET
:reading permissions
-1> 2016-04-26 22:07:59.672338 7f49aeffd700  1 -- 10.30.1.8:0/4080085251
--> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
default.42048218. [getxattrs,stat,read 0~524288] 12.aa730416
ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
0x7f49c4145eb0
 0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f49a07f0700

 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
 1: (()+0x30b0a2) [0x7f4c4907f0a2]
 2: (()+0xf100) [0x7f4c44f7a100]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
  10/10 rgw
  10/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)

  max_recent 1
  max_new 1000
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using s3 (radosgw + ceph) like a cache

2016-04-25 Thread Ben Hines
This is how we use ceph/ radosgw.  I'd say our cluster is not that
reliable, but it's probably mostly our fault (no SSD journals, etc).

However, note that deletes are very slow in ceph. We put millions of
objects in very quickly and they are verrry slow to delete again especially
from RGW because it has to update the index too.
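(The deleted objects actually sit in the radosgw garbage collector for a
while before the space comes back; you can watch it and kick it manually:

radosgw-admin gc list --include-all | head
radosgw-admin gc process

and the rgw_gc_* options in ceph.conf control how aggressively it runs.)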

-Ben


On Mon, Apr 25, 2016 at 2:15 AM, Dominik Mostowiec <
dominikmostow...@gmail.com> wrote:

> Hi,
> I thought that xfs fragmentation or leveldb(gc list growing, locking,
> ...) could be a problem.
> Do you have any experience with this ?
>
> ---
> Regards
> Dominik
>
> 2016-04-24 13:40 GMT+02:00  :
> > I do not see any issue with that
> >
> > On 24/04/2016 12:39, Dominik Mostowiec wrote:
> >> Hi,
> >> I'm curious if using s3 like a cache -  frequent put/delete in the
> >> long term   may cause some problems in radosgw or OSD(xfs)?
> >>
> >> -
> >> Regards
> >> Dominik
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Pozdrawiam
> Dominik
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket deletion woes

2016-03-19 Thread Ben Hines
We would be a big user of this. We delete large buckets often and it takes
forever.

Though didn't I read that 'object expiration' support is on the near-term
RGW roadmap? That may do what we want... we're creating thousands of objects
a day, and thousands of objects a day will be expiring, so RGW will need to
handle it.
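(For whole-bucket cleanup the relevant command is along the lines of

radosgw-admin bucket rm --bucket=<bucket> --purge-objects

which still removes objects serially, as discussed below, but at least takes
the bucket index with it.)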


-Ben

On Wed, Mar 16, 2016 at 9:40 AM, Yehuda Sadeh-Weinraub 
wrote:

> On Tue, Mar 15, 2016 at 11:36 PM, Pavan Rallabhandi
>  wrote:
> > Hi,
> >
> > I find this to be discussed here before, but couldn't find any solution
> > hence the mail. In RGW, for a bucket holding objects in the range of ~
> > millions, one can find it to take for ever to delete the bucket(via
> > radosgw-admin). I understand the gc(and its parameters) that would
> reclaim
> > the space eventually, but am looking more at the bucket deletion options
> > that can possibly speed up the operation.
> >
> > I realize, currently rgw_remove_bucket(), does it 1000 objects at a time,
> > serially. Wanted to know if there is a reason(that am possibly missing
> and
> > discussed) for this to be left that way, otherwise I was considering a
> > patch to make it happen better.
> >
>
> There is no real reason. You might want to have a version of that
> command that doesn't schedule the removal to gc, but rather removes
> all the object parts by itself. Otherwise, you're just going to flood
> the gc. You'll need to iterate through all the objects, and for each
> object you'll need to remove all of it's rados objects (starting with
> the tail, then the head). Removal of each rados object can be done
> asynchronously, but you'll need to throttle the operations, not send
> everything to the osds at once (which will be impossible, as the
> objecter will throttle the requests anyway, which will lead to a high
> memory consumption).
>
> Thanks,
> Yehuda
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw (civetweb) hangs once around 850 established connections

2016-03-19 Thread Ben Hines
What OS are you using?

I have a lot more open connections than that. (though i have some other
issues, where rgw sometimes returns 500 errors, it doesn't stop like yours)

You might try tuning civetweb's num_threads and 'rgw num rados handles':

rgw frontends = civetweb num_threads=125
error_log_file=/var/log/radosgw/civetweb.error.log
access_log_file=/var/log/radosgw/civetweb.access.log
rgw num rados handles = 32

You can also up civetweb loglevel:

debug civetweb = 20
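Given the ~1.1M open files in your paste, it may also be worth checking the
file descriptor limit radosgw runs with. Depending on how it is started that
is e.g.

max open files = 131072

in the [global] section of ceph.conf (honoured by the sysvinit/upstart
wrappers), or LimitNOFILE= in the radosgw systemd unit.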

-Ben

On Wed, Mar 16, 2016 at 5:03 PM, seapasu...@uchicago.edu <
seapasu...@uchicago.edu> wrote:

> I have a cluster of around 630 OSDs with 3 dedicated monitors and 2
> dedicated gateways. The entire cluster is running hammer (0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43)).
>
> (Both of my gateways have stopped responding to curl right now.
> root@host:~# timeout 5 curl localhost ; echo $?
> 124
>
> From here I checked and it looks like radosgw has over 1 million open
> files:
> root@host:~# grep -i rados whatisopen.files.list | wc -l
> 1151753
>
> And around 750 open connections:
> root@host:~# netstat -planet | grep radosgw | wc -l
> 752
> root@host:~# ss -tnlap | grep rados | wc -l
> 752
>
> I don't think that the backend storage is hanging based on the following
> dump:
>
> root@host:~# ceph daemon /var/run/ceph/ceph-client.rgw.kh11-9.asok
> objecter_requests | grep -i mtime
> "mtime": "0.00",
> "mtime": "0.00",
> "mtime": "0.00",
> "mtime": "0.00",
> "mtime": "0.00",
> "mtime": "0.00",
> [...]
> "mtime": "0.00",
>
> The radosgw log is still showing lots of activity and so does strace which
> makes me think this is a config issue or limit of some kind that is not
> triggering a log. Of what I am not sure as the log doesn't seem to show any
> open file limit being hit and I don't see any big errors showing up in the
> logs.
> (last 500 lines of /var/log/radosgw/client.radosgw.log)
> http://pastebin.com/jmM1GFSA
>
> Perf dump of radosgw
> http://pastebin.com/rjfqkxzE
>
> Radosgw objecter requests:
> http://pastebin.com/skDJiyHb
>
> After restarting the gateway with '/etc/init.d/radosgw restart' the old
> process remains, no error is sent, and then I get connection refused via
> curl or netcat::
> root@kh11-9:~# curl localhost
> curl: (7) Failed to connect to localhost port 80: Connection refused
>
> Once I kill the old radosgw via sigkill the new radosgw instance restarts
> automatically and starts responding::
> root@kh11-9:~# curl localhost
> http://s3.amazonaws.com/doc/2006-03-01/
> ">anonymous
> What is going on here?
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk from jewel has issues on redhat 7

2016-03-15 Thread Ben Hines
It seems like ceph-disk is often breaking on centos/redhat systems. Does it
have automated tests in the ceph release structure?

-Ben


On Tue, Mar 15, 2016 at 8:52 AM, Stephen Lord 
wrote:

>
> Hi,
>
> The ceph-disk (10.0.4 version) command seems to have problems operating on
> a Redhat 7 system, it uses the partprobe command unconditionally to update
> the partition table, I had to change this to partx -u to get past this.
>
> @@ -1321,13 +1321,13 @@
>  processed, i.e. the 95-ceph-osd.rules actions and mode changes,
>  group changes etc. are complete.
>  """
> -LOG.debug('Calling partprobe on %s device %s', description, dev)
> +LOG.debug('Calling partx on %s device %s', description, dev)
>  partprobe_ok = False
>  error = 'unknown error'
>  for i in (1, 2, 3, 4, 5):
>  command_check_call(['udevadm', 'settle', '--timeout=600'])
>  try:
> -_check_output(['partprobe', dev])
> +_check_output(['partx', '-u', dev])
>  partprobe_ok = True
>  break
>  except subprocess.CalledProcessError as e:
>
>
> It really needs to be doing that conditional on the operating system
> version.
>
> Steve
>
>
> --
> The information contained in this transmission may be confidential. Any
> disclosure, copying, or further distribution of confidential information is
> not permitted unless such privilege is explicitly granted in writing by
> Quantum. Quantum reserves the right to have electronic communications,
> including email and attachments, sent across its networks filtered through
> anti virus and spam software programs and retain such messages in order to
> comply with applicable data security and retention requirements. Quantum is
> not responsible for the proper and complete transmission of the substance
> of this communication or for any delay in its receipt.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-08 Thread Ben Hines
After making that setting, the pg appeared to start peering but then it
actually changed the primary OSD to osd.100 - then went incomplete again.
Perhaps it did that because another OSD had more data? I presume i need to
set that value on each osd where the pg hops to.
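(For the record, applying that setting per OSD without editing ceph.conf
looks roughly like:

ceph tell osd.100 injectargs '--osd_find_best_info_ignore_history_les=1'
ceph osd down 100      # force it to re-peer with the flag in effect
# once the pg goes active+clean:
ceph tell osd.100 injectargs '--osd_find_best_info_ignore_history_les=0'

If injectargs reports the option as unchangeable at runtime, set it under
[osd.100] in ceph.conf and restart that daemon instead.)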

-Ben

On Tue, Mar 8, 2016 at 10:39 AM, David Zafman <dzaf...@redhat.com> wrote:

>
> Ben,
>
> I haven't looked at everything in your message, but pg 12.7a1 has lost data
> because of writes that went only to osd.73.  The way to recover this is to
> force recovery to ignore this fact and go with whatever data you have on
> the remaining OSDs.
> I assume that having min_size 1, multiple nodes failing while clients
> continued to write, and then permanently losing osd.73 caused this.
>
> You should TEMPORARILY set osd_find_best_info_ignore_history_les config
> variable to 1 on osd.36 and then mark it down (ceph osd down), so it will
> rejoin, re-peer and mark the pg active+clean.  Don't forget to set
> osd_find_best_info_ignore_history_les
> back to 0.
>
>
> Later you should fix your crush map.  See
> http://docs.ceph.com/docs/master/rados/operations/crush-map/
>
> The wrong placements make you vulnerable to a single host failure taking
> out multiple copies of an object.
>
> David
>
>
> On 3/7/16 9:41 PM, Ben Hines wrote:
>
> Howdy,
>
> I was hoping someone could help me recover a couple pgs which are causing
> problems in my cluster. If we aren't able to resolve this soon, we may have
> to just destroy them and lose some data. Recovery has so far been
> unsuccessful. Data loss would probably cause some here to reconsider Ceph
> as something we'll stick with long term, so i'd love to recover it.
>
> Ceph 9.2.1. I have 4 (well, 3 now) pgs which are incomplete + stuck peering
> after a disk failure
>
> pg 12.7a1 query: https://gist.github.com/benh57/ba4f96103e1f6b3b7a4d
> pg 12.7b query: https://gist.github.com/benh57/8db0bfccc5992b9ca71a
> pg 10.4f query:  https://gist.github.com/benh57/44bdd2a19ea667d920ab
> ceph osd tree: https://gist.github.com/benh57/9fc46051a0f09b6948b7
>
> - The bad OSD (osd-73) was on mtl-024. There were no 'unfound' objects when
> it went down, the pg was 'down + peering'. It was marked lost.
> - After marking 73 lost, the new primary still wants to peer and flips
> between peering and incomplete.
> - Noticed '73' still shows in the pg query output for the bad pgs. (maybe i
> need to bring back an osd with the same name?)
> - Noticed that the new primary got set to an osd (osd-77) which was on the
> same node as (osd-76) which had all the data.  Figuring 77 couldn't peer
> with 36 because it was on the same node, i set 77 out, 36 became primary
> and 76 became one of the replicas. No change.
>
> startup logs of Primaries of bad pgs (12.7a1, 10.4f) with 'debug osd = 20,
> debug filestore = 30, debug ms = 1'  (large files)
>
> osd 36 (12.7a1) startup 
> log:https://raw.githubusercontent.com/benh57/cephdebugging/master/ceph-osd.36.log
> osd 6 (10.4f) startup 
> log:https://raw.githubusercontent.com/benh57/cephdebugging/master/ceph-osd.6.log
>
>
> Some other Notes:
>
> - Searching for OSDs which had data in 12.7a1_head, i found that osd-76 has
> 12G, but primary osd-36 has 728M. Another OSD which is out (100) also has a
> copy of the data.  Even after running a pg repair, it does not pick up the
> data from 76 and remains stuck peering.
>
> - One of the pgs was part of a pool which was no longer needed. (the unused
> radosgw .rgw.control pool, with one 0kb object in it) Per previous steps
> discussed here for a similar failure, i attempted these recovery steps on
> it, to see if they would work for the others:
>
> -- The failed osd disk only mounts 'read only' which causes
> ceph-objectstore-tool to fail to export, so i exported it from a seemingly
> good copy on another osd.
> -- stopped all osds
> -- exported the pg with objectstore-tool from an apparently good OSD
> -- removed the pg from all osds which had it using objectstore-tool
> -- imported the pg into an out osd, osd-100
>
>   Importing pgid 4.95
> Write 4/88aa5c95/notify.2/head
> Import successful
>
> -- Force recreated the pg on the cluster:
>ceph pg force_create_pg 4.95
> -- brought up all osds
> -- new pg 4.95 primary gets set to osd-99 + osd-64, 0 objects
>
> However, the object doesn't sync to the pg from osd-100; instead osd-64
> tells osd-100 to remove it:
>
> 2016-03-05 15:44:22.858147 7fc004168700 20 osd.100 68025 _dispatch
> 0x7fc020867660 osd pg remove(epoch 68025; pg4.95; ) v2
> 2016-03-05 15:44:22.858174 7fc004168700  7 osd.100 68025 handle_pg_remove
> from osd.64 on 1 pgs
> 2016-03-05 15:44:22.858176 7fc004168700 15 osd.100 6

[ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-07 Thread Ben Hines
Howdy,

I was hoping someone could help me recover a couple pgs which are causing
problems in my cluster. If we aren't able to resolve this soon, we may have
to just destroy them and lose some data. Recovery has so far been
unsuccessful. Data loss would probably cause some here to reconsider Ceph
as something we'll stick with long term, so i'd love to recover it.

Ceph 9.2.1. I have 4 (well, 3 now) pgs which are incomplete + stuck peering
after a disk failure

pg 12.7a1 query: https://gist.github.com/benh57/ba4f96103e1f6b3b7a4d
pg 12.7b query: https://gist.github.com/benh57/8db0bfccc5992b9ca71a
pg 10.4f query:  https://gist.github.com/benh57/44bdd2a19ea667d920ab
ceph osd tree: https://gist.github.com/benh57/9fc46051a0f09b6948b7

- The bad OSD (osd-73) was on mtl-024. There were no 'unfound' objects when
it went down, the pg was 'down + peering'. It was marked lost.
- After marking 73 lost, the new primary still wants to peer and flips
between peering and incomplete.
- Noticed '73' still shows in the pg query output for the bad pgs. (maybe i
need to bring back an osd with the same name?)
- Noticed that the new primary got set to an osd (osd-77) which was on the
same node as (osd-76) which had all the data.  Figuring 77 couldn't peer
with 36 because it was on the same node, i set 77 out, 36 became primary
and 76 became one of the replicas. No change.

startup logs of Primaries of bad pgs (12.7a1, 10.4f) with 'debug osd = 20,
debug filestore = 30, debug ms = 1'  (large files)

osd 36 (12.7a1) startup log:
https://raw.githubusercontent.com/benh57/cephdebugging/master/ceph-osd.36.log
osd 6 (10.4f) startup log:
https://raw.githubusercontent.com/benh57/cephdebugging/master/ceph-osd.6.log


Some other Notes:

- Searching for OSDs which had data in 12.7a1_head, i found that osd-76 has
12G, but primary osd-36 has 728M. Another OSD which is out (100) also has a
copy of the data.  Even after running a pg repair, it does not pick up the
data from 76 and remains stuck peering.

- One of the pgs was part of a pool which was no longer needed. (the unused
radosgw .rgw.control pool, with one 0kb object in it) Per previous steps
discussed here for a similar failure, i attempted these recovery steps on
it, to see if they would work for the others:

-- The failed osd disk only mounts 'read only' which causes
ceph-objectstore-tool to fail to export, so i exported it from a seemingly
good copy on another osd.
-- stopped all osds
-- exported the pg with objectstore-tool from an apparently good OSD
-- removed the pg from all osds which had it using objectstore-tool
-- imported the pg into an out osd, osd-100

  Importing pgid 4.95
Write 4/88aa5c95/notify.2/head
Import successful

-- Force recreated the pg on the cluster:
   ceph pg force_create_pg 4.95
-- brought up all osds
-- new pg 4.95 primary gets set to osd-99 + osd-64, 0 objects
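(For reference, the export / remove / import steps above were done with
ceph-objectstore-tool, roughly as follows -- paths are the defaults, NN is
whichever stopped OSD holds the copy being worked on:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    --journal-path /var/lib/ceph/osd/ceph-NN/journal \
    --op export --pgid 4.95 --file /root/pg4.95.export

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    --journal-path /var/lib/ceph/osd/ceph-NN/journal \
    --op remove --pgid 4.95

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
    --journal-path /var/lib/ceph/osd/ceph-100/journal \
    --op import --file /root/pg4.95.export

The OSDs have to be stopped while the tool runs against them.)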

However, the object doesn't sync to the pg from osd-100; instead osd-64
tells osd-100 to remove it:

2016-03-05 15:44:22.858147 7fc004168700 20 osd.100 68025 _dispatch
0x7fc020867660 osd pg remove(epoch 68025; pg4.95; ) v2
2016-03-05 15:44:22.858174 7fc004168700  7 osd.100 68025 handle_pg_remove
from osd.64 on 1 pgs
2016-03-05 15:44:22.858176 7fc004168700 15 osd.100 68025
require_same_or_newer_map 68025 (i am 68025) 0x7fc020867660
2016-03-05 15:44:22.858188 7fc004168700  5 osd.100 68025
queue_pg_for_deletion: 4.95
2016-03-05 15:44:22.858228 7fc004168700 15 osd.100 68025 project_pg_history
4.95 from 68025 to 68025, start ec=76 les/c/f 62655/62611/0
66982/67983/66982

Not wanting this to happen to my needed data from the other PGs, i didn't
try this procedure with those PGs. After this procedure  osd-100 does get
listed in 'pg query' as 'might_have_unfound', but ceph apparently decides
not to use it and the active osd sends a remove.

output of 'ceph pg 4.95 query' after these recovery steps:
https://gist.github.com/benh57/fc9a847cd83f4d5e4dcf


Quite Possibly Related:

I am occasionally noticing some incorrectness in 'ceph osd tree'. It seems
my crush map thinks some osds are on the wrong hosts. I wonder if this is
why peering is failing?
(example)
 -5   9.04999 host cld-mtl-006
 12   1.81000 osd.12   up  1.0  1.0
 13   1.81000 osd.13   up  1.0  1.0
 14   1.81000 osd.14   up  1.0  1.0
 94   1.81000 osd.94   up  1.0  1.0
 26   1.81000 osd.26   up  0.86775  1.0

^^ this host only has 4 osds on it! osd.26 is actually running over on
cld-mtl-004! Restarting 26 fixed the map.
osd.42 (out) was also in the wrong place in 'osd tree'. The tree says it's on
cld-mtl-013, but it's actually on cld-mtl-024.
- fixing these issues caused a large re-balance, so 'ceph health detail' is
a bit dirty right now, but you can see the stuck pgs:
ceph health detail:

-  I wonder if these incorrect crushmaps caused ceph to put some data on

Re: [ceph-users] abort slow requests ?

2016-03-04 Thread Ben Hines
Thanks, working on fixing the peering objects. Going to attempt a recovery
on the bad pgs tomorrow.

The corrupt OSD which they were on was marked 'lost' so i expected it
wouldn't try to peer with it anymore. Anyway I do have the data, at least.

-Ben

On Fri, Mar 4, 2016 at 1:04 AM, Luis Periquito <periqu...@gmail.com> wrote:

> you should really fix the peering objects.
>
> So far what I've seen in ceph is that it prefers data integrity over
> availability. So if it thinks that it can't keep all working properly
> it tends to stop (i.e. blocked requests), thus I don't believe there's
> a way to do this.
>
> On Fri, Mar 4, 2016 at 1:04 AM, Ben Hines <bhi...@gmail.com> wrote:
> > I have a few bad objects in ceph which are 'stuck on peering'.  The
> clients
> > hit them and they build up and eventually stop all traffic to the OSD.
>  I
> > can open up traffic by resetting the OSD (aborting those requests)
> > temporarily.
> >
> > Is there a way to tell ceph to cancel/abort these 'slow requests' once
> they
> > get to certain amount of time? Rather than building up and blocking
> > everything..
> >
> > -Ben
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] abort slow requests ?

2016-03-03 Thread Ben Hines
I have a few bad objects in ceph which are 'stuck on peering'.  The clients
hit them and they build up and eventually stop all traffic to the OSD.   I
can open up traffic by resetting the OSD (aborting those requests)
temporarily.

Is there a way to tell ceph to cancel/abort these 'slow requests' once they
get to certain amount of time? Rather than building up and blocking
everything..

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw refuses to initialize / waiting for peered 'notify' object

2016-03-02 Thread Ben Hines
Ceph 9.2.1. Shortly after updating 9.2.0 to 9.2.1 all radosgws are refusing
to start up, it's stuck on this 'notify' object:

[root@sm-cld-mtl-033 ceph]# ceph daemon /var/run/ceph/ceph-client.<>.asok
objecter_requests
{
"ops": [
{
"tid": 13,
"pg": "4.88aa5c95",
"osd": 64,
"object_id": "notify.2",
"object_locator": "@4",
"target_object_id": "notify.2",
"target_object_locator": "@4",
"paused": 0,
"used_replica": 0,
"precalc_pgid": 0,
"last_sent": "2016-03-02 17:25:34.946304",
"attempts": 1,
"snapid": "head",
"snap_context": "0=[]",
"mtime": "2016-03-02 17:25:34.946149",
"osd_ops": [
"create 0~0"
]
}


-bash-4.2$ ceph pg map 4.88aa5c95
osdmap e66042 pg 4.88aa5c95 (4.95) -> up [64,99] acting [64,99]


on 64...

2016-03-02 17:20:52.671251 7fc42c437700  0 log_channel(cluster) log [WRN] :
slow request 120.123788 seconds old, received at 2016-03-02
17:18:52.547397: osd_op(client.38825908.0:8291171 notify.2 [watch ping
cookie 74579472 gen 106] 4.88aa5c95 ondisk+write+known_if_redirected
e66040) currently waiting for peered

on 99 the object seems to exist... (though it's zero bytes)

-rw-r--r-- 1 root root 0 Jan 13 16:37 __head_0095__4
-rw-r--r-- 1 root root 0 Mar  1 22:10 notify.2__head_88AA5C95__4
[root@sm-cld-mtl-025 4.95_head]# pwd
/var/lib/ceph/osd/ceph-99/current/4.95_head

On 64, that dir is empty.

We had one osd which went bad and was removed; it was involved in this
pg.

Any next steps here? Are these 'notify' objects safe to nuke? I tried a
repair/scrub on it, but that didn't seem to have any effect or log anything anywhere.
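
If anyone wants to look at the same thing on their cluster, the stuck pg can
be examined directly (4.95 being the pg from the map above):

ceph pg 4.95 query | less
ceph health detail | grep 4.95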

Any assistance is appreciated...

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-02-24 Thread Ben Hines
Any idea what is going on here? I get these intermittently, especially with
very large files.

The client is doing RANGE requests on this >51 GB file, incrementally
fetching later chunks.
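
For reference, one of the failing ranged GETs is roughly equivalent to this
from the client side (host and bucket below are placeholders for our internal
ones; the offset is one of the part boundaries from the log):

curl -s -D - -o /dev/null -H 'Range: bytes=157286400-167772159' \
  http://rgw.example.com/BUCKET/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg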

2016-02-24 16:30:59.669561 7fd33b7fe700  1 == starting new request
req=0x7fd32c0879c0 =
2016-02-24 16:30:59.669675 7fd33b7fe700  2 req 3648804:0.000114::GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for
trans_id = tx00037ad24-0056ce4b43-259914b-default
2016-02-24 16:30:59.669687 7fd33b7fe700 10 host=
2016-02-24 16:30:59.669757 7fd33b7fe700 10
s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg
s->bucket=
2016-02-24 16:30:59.669767 7fd33b7fe700  2 req 3648804:0.000206:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op
2016-02-24 16:30:59.669776 7fd33b7fe700  2 req 3648804:0.000215:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing
2016-02-24 16:30:59.669785 7fd33b7fe700  2 req 3648804:0.000224:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading
permissions
2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size =
50346000384
2016-02-24 16:30:59.673841 7fd33b7fe700  2 req 3648804:0.004280:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op
2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673921 7fd33b7fe700  2 req 3648804:0.004360:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op mask
2016-02-24 16:30:59.673929 7fd33b7fe700  2 req 3648804:0.004369:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op permissions
2016-02-24 16:30:59.673941 7fd33b7fe700  5 Searching permissions for
uid=anonymous mask=49
2016-02-24 16:30:59.673944 7fd33b7fe700  5 Permissions for user not found
2016-02-24 16:30:59.673946 7fd33b7fe700  5 Searching permissions for
group=1 mask=49
2016-02-24 16:30:59.673949 7fd33b7fe700  5 Found permission: 1
2016-02-24 16:30:59.673951 7fd33b7fe700  5 Searching permissions for
group=2 mask=49
2016-02-24 16:30:59.673953 7fd33b7fe700  5 Permissions for group not found
2016-02-24 16:30:59.673955 7fd33b7fe700  5 Getting permissions id=anonymous
owner= perm=1
2016-02-24 16:30:59.673957 7fd33b7fe700 10  uid=anonymous requested perm
(type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2016-02-24 16:30:59.673961 7fd33b7fe700  2 req 3648804:0.004400:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op params
2016-02-24 16:30:59.673965 7fd33b7fe700  2 req 3648804:0.004404:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing
2016-02-24 16:30:59.674107 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674193 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674317 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674433 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.046110 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.150966 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.151118 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.161000 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.199553 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.278308 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.312306 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.751626 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.833570 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.871774 7fd33b7fe700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -5
2016-02-24 16:31:00.872480 7fd33b7fe700  0 WARNING: set_req_state_err
err_no=5 resorting to 500
2016-02-24 16:31:00.872561 

[ceph-users] incorrect numbers in ceph osd pool stats

2016-02-18 Thread Ben Hines
Ceph 9.2.0

Anyone seen this? Crazy numbers in osd stats command

ceph osd pool stats

pool .rgw.buckets id 12
  2/39 objects degraded (5.128%)
  -105/39 objects misplaced (-269.231%)
  recovery io 20183 kB/s, 36 objects/s
  client io 79346 kB/s rd, 703 kB/s wr, 476 op/s



ceph osd pool stats -f json


{"pool_name":".rgw.buckets","pool_id":12,"recovery":{"degraded_objects":4,"degraded_total":83,"degraded_ratio":0.048193,"misplaced_objects":18446744073709551373,"misplaced_total":83,"misplaced_ratio":-2.927711},"


-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading Ceph

2016-02-01 Thread Ben Hines
Upgrades have been easy for me, following the steps.

I would say be careful not to 'miss' one OSD, or forget to restart it
after updating: when I missed one once, having an OSD on a different version
than the rest of the cluster for too long during the upgrade started to
cause issues.
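
A quick sanity check after (or during) the upgrade, assuming you can run
admin commands from one node, is to confirm every OSD reports the expected
version (the mons can be checked with 'ceph daemon mon.<id> version' on each
monitor host):

ceph tell osd.* version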

-Ben

On Wed, Jan 27, 2016 at 6:00 AM, Vlad Blando  wrote:

> Hi,
>
> I have a production Ceph Cluster
> - 3 nodes
> - 3 mons on each nodes
> - 9 OSD @ 4TB per node
> - using ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>
> ​Now I want to upgrade it to Hammer, I saw the documentation on upgrading,
> it looks straight forward, but I want to know to those who have tried
> upgrading a production environment, any precautions, caveats, preparation
> that I need to do before doing it?
>
> - Vlad
> ᐧ
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW Civetweb + CentOS7 boto errors

2016-01-29 Thread Ben Hines
After updating our RGW servers to CentOS 7 + civetweb, when hit with a fair
amount of load (20 GETs/sec + a few PUTs/sec) I'm seeing 'BadStatusLine'
exceptions from boto relatively often.

It happens most when calling bucket.get_key() (about 10 times in 1000). These
appear to be possibly random TCP resets when viewed in Wireshark.
It happens with both Hammer and Infernalis.

These happen regardless of the civetweb num_threads or rgw num rados
handles setting. Has anyone seen something similar?

The servers don't appear to be running out of tcp sockets or similar but
perhaps there is some sysctl setting or other tuning that I should be
using.
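
For anyone wanting to look at the same thing, a crude way to watch for resets
and socket pressure while driving load (plain Linux tooling, nothing
Ceph-specific):

ss -s
netstat -s | egrep -i 'reset|overflow'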

I may try going back to apache + fastcgi as an experiment (if it still
works with Infernalis?)

thanks,

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrading 0.94.5 to 9.2.0 notes

2016-01-26 Thread Ben Hines
I see the same list of issues, particularly where ceph.target doesn't
function until I 'enable' the daemons individually.

It would be nice if the package enabled the daemons when it is installed,
so that ceph.target works. Perhaps this could be fixed for Jewel?
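
For anyone else working around this in the meantime, enabling the units once
is enough for ceph.target to manage them afterwards; something like this on
each OSD host (assumes the default cluster name and /var/lib/ceph layout):

for d in /var/lib/ceph/osd/ceph-*; do systemctl enable ceph-osd@"${d##*-}"; done
systemctl start ceph.target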

-Ben

On Sat, Nov 21, 2015 at 12:22 AM, Henrik Korkuc  wrote:

> On 15-11-20 17:14, Kenneth Waegeman wrote:
>
>> <...>
>> * systemctl start ceph.target does not start my osds.., I have to start
>> them all with systemctl start ceph-osd@...
>> * systemctl restart ceph.target restart the running osds, but not the
>> osds that are not yet running.
>> * systemctl stop ceph.target stops everything, as expected :)
>>
>> I didn't have a chance for complete testing (still preparing for
> upgrade), but I think that I saw in service files, that they install under
> ceph.target, so if you would do "systemctl enable ceph-osd@" for
> all OSDs on the server, they could be started/stopped by ceph.target
>
>
> I didn't tested everything thoroughly yet, but does someone has seen the
>> same issues?
>>
>> Thanks!
>>
>> Kenneth
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to observed civetweb.

2016-01-19 Thread Ben Hines
Hey Kobi,

You stated:

> >> You can add:
> >> *access_log_file=/var/log/civetweb/access.log
> >> error_log_file=/var/log/civetweb/error.log*
> >>
> >> to *rgw frontends* in ceph.conf though these logs are thin on info
> >> (Source IP, date, and request)



How exactly is this done in the config file? I've tried various ways, and
nothing works.


i.e., neither of these works:


[client.radosgw.]
host = 
rgw socket path = /tmp/radosgw.sock
keyring = /etc/ceph/ceph.client.radosgw.keyring
log file = /var/log/radosgw/client.radosgw..log   <--  this
log gets output only
access_log_file=/var/log/civetweb/access.log

rgw frontends error log file = /var/log/radosgw/civetweb.error.log  <-- nada
rgw frontends access log file = /var/log/radosgw/civetweb.access.log <-- nada

rgw print continue = True
rgw enable ops log = False


[rgw frontends]

access_log_file=/var/log/civetweb/access.log  <-- nada



-Ben


On Tue, Sep 8, 2015 at 11:21 AM, Kobi Laredo 
wrote:

> Vickie,
>
> You can add:
> *access_log_file=/var/log/civetweb/access.log
> error_log_file=/var/log/civetweb/error.log*
>
> to *rgw frontends* in ceph.conf though these logs are thin on info
> (Source IP, date, and request)
>
> Check out
> https://github.com/civetweb/civetweb/blob/master/docs/UserManual.md for
> more civetweb configs you can inject through  *rgw frontends* config
> attribute in ceph.conf
>
> We are currently testing tuning civetweb's num_threads
> and request_timeout_ms to improve radosgw performance
>
> *Kobi Laredo*
> *Cloud Systems Engineer* | (*408) 409-KOBI*
>
> On Tue, Sep 8, 2015 at 8:20 AM, Yehuda Sadeh-Weinraub 
> wrote:
>
>> You can increase the civetweb logs by adding 'debug civetweb = 10' in
>> your ceph.conf. The output will go into the rgw logs.
>>
>> Yehuda
>>
>> On Tue, Sep 8, 2015 at 2:24 AM, Vickie ch  wrote:
>> > Dear cephers,
>> >Just upgrade radosgw from apache to civetweb.
>> > It's really simple to installed and used. But I can't find any
>> parameters or
>> > logs to adjust(or observe) civetweb. (Like apache log).  I'm really
>> confuse.
>> > Any ideas?
>> >
>> >
>> > Best wishes,
>> > Mika
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to observed civetweb.

2016-01-19 Thread Ben Hines
Of course, I figured this out. You meant to just append it to the frontends
setting. Very confusing, as it's unlike every other ceph setting.

rgw frontends = civetweb num_threads=150
error_log_file=/var/log/radosgw/civetweb.error.log
access_log_file=/var/log/radosgw/civetweb.access.log

Any documentation at all on civetweb + radosgw on the Ceph site would be
awesome. Currently it all only references Apache + FastCGI.



On Tue, Jan 19, 2016 at 8:42 PM, Ben Hines <bhi...@gmail.com> wrote:

> Hey Kobi,
>
> You stated:
>
> > >> You can add:
> > >> *access_log_file=/var/log/civetweb/access.log
> > >> error_log_file=/var/log/civetweb/error.log*
> > >>
> > >> to *rgw frontends* in ceph.conf though these logs are thin on info
> > >> (Source IP, date, and request)
>
>
>
> How is this done exactly in the config file? I've tried various ways, nothing 
> works.
>
>
> ie, neither of these work:
>
>
> [client.radosgw.]
> host = 
> rgw socket path = /tmp/radosgw.sock
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> log file = /var/log/radosgw/client.radosgw..log   <--  this log gets 
> output only
> access_log_file=/var/log/civetweb/access.log
>
> rgw frontends error log file = /var/log/radosgw/civetweb.error.log  <-- nada
> rgw frontends access log file = /var/log/radosgw/civetweb.access.log <-- nada
>
> rgw print continue = True
> rgw enable ops log = False
>
>
> [rgw frontends]
>
> access_log_file=/var/log/civetweb/access.log  <-- nada
>
>
>
> -Ben
>
>
> On Tue, Sep 8, 2015 at 11:21 AM, Kobi Laredo <kobi.lar...@dreamhost.com>
> wrote:
>
>> Vickie,
>>
>> You can add:
>> *access_log_file=/var/log/civetweb/access.log
>> error_log_file=/var/log/civetweb/error.log*
>>
>> to *rgw frontends* in ceph.conf though these logs are thin on info
>> (Source IP, date, and request)
>>
>> Check out
>> https://github.com/civetweb/civetweb/blob/master/docs/UserManual.md for
>> more civetweb configs you can inject through  *rgw frontends* config
>> attribute in ceph.conf
>>
>> We are currently testing tuning civetweb's num_threads
>> and request_timeout_ms to improve radosgw performance
>>
>> *Kobi Laredo*
>> *Cloud Systems Engineer* | (*408) 409-KOBI*
>>
>> On Tue, Sep 8, 2015 at 8:20 AM, Yehuda Sadeh-Weinraub <yeh...@redhat.com>
>> wrote:
>>
>>> You can increase the civetweb logs by adding 'debug civetweb = 10' in
>>> your ceph.conf. The output will go into the rgw logs.
>>>
>>> Yehuda
>>>
>>> On Tue, Sep 8, 2015 at 2:24 AM, Vickie ch <mika.leaf...@gmail.com>
>>> wrote:
>>> > Dear cephers,
>>> >Just upgrade radosgw from apache to civetweb.
>>> > It's really simple to installed and used. But I can't find any
>>> parameters or
>>> > logs to adjust(or observe) civetweb. (Like apache log).  I'm really
>>> confuse.
>>> > Any ideas?
>>> >
>>> >
>>> > Best wishes,
>>> > Mika
>>> >
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Ben Hines
It works fine. The federated config reference is not related to running
multiple instances on the same zone.

Just set up 2 radosgws and give each instance the exact same configuration.
(I use different client names in ceph.conf, but I bet it would work even if
the client names were identical.)

Official documentation on this very common use case would be a good idea; I
also had to figure this out on my own.
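
For the record, a minimal sketch of what I mean (hostnames, port, and
key/log paths below are placeholders; apart from the section name and host,
the two sections are identical):

# ceph.conf on HOST_1
[client.radosgw.gw1]
host = HOST_1
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw frontends = civetweb port=7480
log file = /var/log/radosgw/client.radosgw.gw1.log

# ceph.conf on HOST_2
[client.radosgw.gw2]
host = HOST_2
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw frontends = civetweb port=7480
log file = /var/log/radosgw/client.radosgw.gw2.log

Then start each instance under its own name (e.g. 'radosgw -n
client.radosgw.gw1' on HOST_1) and put haproxy or any other load balancer in
front of the two.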

On Mon, Jan 4, 2016 at 6:21 PM, Yang Honggang 
wrote:

> Hello Srinivas,
>
> Yes, we can use Haproxy as a frontend. But the precondition is multi
> RadosGW instances sharing
> the *SAME CEPH POOLS* are running. I only want the master zone keep one
> copy of all data. I want
> to access the data through *ANY *radosgw instance.
> And it said in http://docs.ceph.com/docs/master/radosgw/federated-config/
> "zones may have more than one Ceph Object Gateway instance per zone.". So
> I need the *official way*
> to set up these radosgw instances.
>
> thx
>
> joseph
>
>
> On 01/04/2016 06:37 PM, Srinivasula Maram wrote:
>
> Hi Joseph,
>
>
>
> You can try haproxy as proxy for load balancing and failover.
>
>
>
> Thanks,
>
> Srinivas
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com
> ] *On Behalf Of *Joseph Yang
> *Sent:* Monday, January 04, 2016 2:09 PM
> *To:* ceph-us...@ceph.com; Joseph Yang
> *Subject:* [ceph-users] How to run multiple RadosGW instances under the
> same zone
>
>
>
>
>
> Hello,
>
>
>
> How to run multiple RadosGW instances under the same zone?
>
>
>
> Assume there are two hosts HOST_1 and HOST2. I want to run
>
> two RadosGW instances on these two hosts for my zone ZONE_MULI.
>
> So, when one of the radosgw instance is down, I can still access the zone.
>
>
>
> There are some questions:
>
> 1. How many ceph users should I create?
>
> 2. How many rados users should I create?
>
> 3. How to set ZONE_MULI's access_key/secret_key?
>
> 4. How to set the 'host' section in the ceph conf file for these two
>
>radosgw instances?
>
> 5. How to start the instances?
>
> # radosgw --cluster My_Cluster -n ?_which_rados_user_?
>
>
>
> I read http://docs.ceph.com/docs/master/radosgw/federated-config/, but
>
> there seems no explanation.
>
>
>
> Your answer is appreciated!
>
>
>
> thx
>
>
>
> Joseph
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dealing with radosgw and large OSD LevelDBs: compact, start over, something else?

2015-12-21 Thread Ben Hines
I'd be curious to compare benchmarks. What size objects are you putting? Is
it 10gig end to end from client to RGW server to OSDs? I wouldn't be
surprised if mine is pretty slow in comparison, though, since we still don't
have SSD journals, so I have not paid much attention to upload speed.

Our omap dirs are about 400MB on each OSD, and we have ~100 OSDs.  ~20
buckets with ~23 shards each and 500k-1M objects each, so the layout is
much different.
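
For anyone who wants to compare numbers, the per-OSD omap size is easy to
check on filestore, since the leveldb lives under current/omap in each OSD
data dir:

du -sh /var/lib/ceph/osd/ceph-*/current/omap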

-Ben

On Mon, Dec 21, 2015 at 1:16 AM, Florian Haas  wrote:

> On Thu, Dec 17, 2015 at 6:16 PM, Florian Haas  wrote:
> > Hey everyone,
> >
> > I recently got my hands on a cluster that has been underperforming in
> > terms of radosgw throughput, averaging about 60 PUTs/s with 70K
> > objects where a freshly-installed cluster with near-identical
> > configuration would do about 250 PUTs/s. (Neither of these values are
> > what I'd consider high throughput, but this is just to give you a feel
> > about the relative performance hit.)
> >
> > Some digging turned up that of the less than 200 buckets in the
> > cluster, about 40 held in excess of a million objects (1-4M), which
> > one bucket being an outlier with 45M objects. All buckets were created
> > post-Hammer, and use 64 index shards. The total number of objects in
> > radosgw is approx. 160M.
> >
> > Now this isn't a large cluster in terms of OSD distribution; there are
> > only 12 OSDs (after all, we're only talking double-digit terabytes
> > here). In almost all of these OSDs, the LevelDB omap directory has
> > grown to a size of 10-20 GB.
> >
> > So I have several questions on this:
> >
> > - Is it correct to assume that such a large LevelDB would be quite
> > detrimental to radosgw performance overall?
> >
> > - If so, would clearing that one large bucket and distributing the
> > data over several new buckets reduce the LevelDB size at all?
> >
> > - Is there even something akin to "ceph mon compact" for OSDs?
> >
> > - Are these large LevelDB databases a simple consequence of having a
> > combination of many radosgw objects and few OSDs, with the
> > distribution per-bucket being comparatively irrelevant?
> >
> > I do understand that the 45M object bucket itself would have been a
> > problem pre-Hammer, with no index sharding available. But with what
> > others have shared here, a rule of thumb of one index shard per
> > million objects should be a good one to follow, so 64 shards for 45M
> > objects doesn't strike me as totally off the mark. That's why I think
> > LevelDB I/O is actually the issue here. But I might be totally wrong;
> > all insights appreciated. :)
>
> Just giving this one a nudge and CC'ing a few other, presumably
> interested, parties. :) Ben, Wido, Wade, any thoughts on this one?
>
> Cheers,
> Florian
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw bucket index sharding tips?

2015-12-16 Thread Ben Hines
Great, glad to see that others are concerned about this.

One serious problem is that the number of index shards cannot be changed
once the bucket has been created. So if you have a bucket that you can't
easily recreate, you're screwed. Fortunately, for my use case I can delete
the contents of our buckets and recreate them if need be, though it takes
time.

Adding the ability to scale up bucket index shards (or just making it fully
dynamic -- the user shouldn't have to worry about this) would be great.
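
For new buckets, at least, the shard count is taken from the gateway config
at bucket-creation time; a sketch of the relevant knob (the section name is a
placeholder, and changing it only affects buckets created afterwards):

[client.radosgw.gateway]
rgw override bucket index max shards = 23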

-Ben



On Wed, Dec 16, 2015 at 11:25 AM, Wade Holler <wade.hol...@gmail.com> wrote:

> I'm interested in this too. Should start testing next week at 1B+ objects
> and I sure would like a recommendation of what config to start with.
>
> We learned the hard way that not sharding is very bad at scales like this.
> On Wed, Dec 16, 2015 at 2:06 PM Florian Haas <flor...@hastexo.com> wrote:
>
>> Hi Ben & everyone,
>>
>> just following up on this one from July, as I don't think there's been
>> a reply here then.
>>
>> On Wed, Jul 8, 2015 at 7:37 AM, Ben Hines <bhi...@gmail.com> wrote:
>> > Anyone have any data on optimal # of shards for a radosgw bucket index?
>> >
>> > We've had issues with bucket index contention with a few million+
>> > objects in a single bucket so i'm testing out the sharding.
>> >
>> > Perhaps at least one shard per OSD? Or, less? More?
>>
>> I'd like to make this more concrete: what about having several buckets
>> each holding 2-4M objects, created on hammer, with 64 index shards? Is
>> that type of fill expected to bring radosgw performance down by a
>> factor of 5, versus an unpopulated (empty) radosgw setup?
>>
>> Ben, you wrote elsewhere
>> (
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003955.html
>> )
>> that you found approx. 900k objects to be the threshold where index
>> sharding becomes necessary. Have you found that to be a reasonable
>> rule of thumb, as in "try 1-2 shards per million objects in your most
>> populous bucket"? Also, do you reckon that beyond that, more shards
>> make things worse?
>>
>> > I noticed some discussion here regarding slow bucket listing with
>> > ~200k obj --
>> http://cephnotes.ksperis.com/blog/2015/05/12/radosgw-big-index
>> > - bucket list seems significantly impacted.
>> >
>> > But i'm more concerned about general object put  (write) / object read
>> > speed since 'bucket listing' is not something that we need to do. Not
>> > sure if the index has to be completely read to write an object into
>> > it?
>>
>> This is a question where I'm looking for an answer, too.
>>
>> Cheers,
>> Florian
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw bucket index sharding tips?

2015-12-16 Thread Ben Hines
On Wed, Dec 16, 2015 at 11:05 AM, Florian Haas  wrote:

> Hi Ben & everyone,
>
>
> Ben, you wrote elsewhere
> (
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003955.html
> )
> that you found approx. 900k objects to be the threshold where index
> sharding becomes necessary. Have you found that to be a reasonable
> rule of thumb, as in "try 1-2 shards per million objects in your most
> populous bucket"? Also, do you reckon that beyond that, more shards
> make things worse?
>
>

Oh, and to answer this part: I didn't do that much experimentation,
unfortunately. I am actually using about 24 index shards per bucket
currently, and we delete each bucket once it hits about a million objects
(it's just a throwaway cache for us). Seems OK, so I stopped tweaking.

Also, I think I have a pretty slow cluster as far as write speed is
concerned, since we do not have SSD journals. With SSD journals I imagine
the index write speed is significantly improved, but I am not sure how
much. A faster cluster could probably handle bigger indexes.

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 9.2 fails to install in COS 7.1.1503: Report and Fix

2015-12-09 Thread Ben Hines
FYI - same issue when installing Hammer 0.94.5. I also fixed it by enabling
the CR repo.
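
For anyone else hitting it, enabling CR looked something like this on CentOS
7.1 (the repo id may differ on other releases):

yum install -y yum-utils
yum-config-manager --enable cr
yum install ceph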

-Ben

On Tue, Dec 8, 2015 at 5:13 PM, Goncalo Borges  wrote:

> Hi Cephers
>
> This is just to report an issue (and a workaround) regarding dependencies
> in Centos 7.1.1503
>
> Last week, I installed a couple of nodes and there were no issues with
> dependencies. This week, the installation of ceph rpm fails because it
> depends on gperftools-libs which, on its own, depends on libunwind.
>
> Searching a bit, I've checked that my last week installs downloaded
> libunwind from epel (libunwind-1.1-10.el7.x86_64). Today it is no longer
> there.
>
> Goggling about it, it seems libunwind will be available in CentOS 7.2.1511
> but for the current time, it should be available in Centos CR repos. For
> Centos 7.1.1503, it provides libunwind-1.1-5.el7.x86_64)
>
> http://mirror.centos.org/centos/7.1.1503/cr
>
> Cheers
> Goncalo
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osds revert to 'prepared' after reboot

2015-09-24 Thread Ben Hines
Any idea why OSDs might revert to 'prepared' after reboot and have to
be activated again?

These are older nodes which were manually deployed, not using ceph-deploy.

CentOS 6.7, Hammer 94.3

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds revert to 'prepared' after reboot

2015-09-24 Thread Ben Hines
Aha, it seems like '--mark-init auto' (supposedly the default arg to
ceph-disk activate?) must be failing. I'll try re-activating my OSDs
with an explicit init system passed in.
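
Something along these lines, in case anyone else is on sysvinit-era nodes
(/dev/sdb1 is a placeholder for the OSD data partition):

ceph-disk activate --mark-init sysvinit /dev/sdb1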

-Ben

On Thu, Sep 24, 2015 at 12:49 PM, Ben Hines <bhi...@gmail.com> wrote:
> Any idea why OSDs might revert to 'prepared' after reboot and have to
> be activated again?
>
> These are older nodes which were manually deployed, not using ceph-deploy.
>
> CentOS 6.7, Hammer 94.3
>
> -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw cache lru size

2015-09-23 Thread Ben Hines
We have a ton of memory on our RGW servers, 96GB.

Can someone explain how the rgw LRU cache functions? Is it worth
bumping the 'rgw cache lru size' to a huge number?

Our gateway seems to only be using about 1G of memory with the default setting.
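
For context, the knob I'm asking about just goes in the rgw client section of
ceph.conf; the 100000 below is an arbitrary bump I'm considering, not
something I've tested (the section name is whatever you already use):

[client.radosgw.gateway]
rgw cache lru size = 100000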

We're also currently still using apache/fastcgi due to the extra
configurability and logging of apache. Willing to switch to civetweb
if given a good reason.

thanks-

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Ben Hines
The Ceph docs in general could use a lot of improvement, IMO. There are
many, many settings listed, but one must dive into the mailing list to learn
which ones are worth tweaking (and often, even *what they do*!).

-Ben

On Wed, Sep 9, 2015 at 3:51 PM, Mark Kirkwood
 wrote:
> On 16/09/14 17:10, pragya jain wrote:
>> Hi all!
>>
>> As document says, ceph has some default pools for radosgw instance. These 
>> pools are:
>>   * .rgw.root
>>   * .rgw.control
>>   * .rgw.gc
>>   * .rgw.buckets
>>   * .rgw.buckets.index
>>   * .log
>>   * .intent-log
>>   * .usage
>>   * .users
>>   * .users.email
>>   * .users.swift
>>   * .users.uid
>> Can somebody explain me what are the purpose of these different pools in 
>> terms of storing the data, for example, according to my understanding,
>>   * .users pool contains the information of the users that have their 
>> account in the system
>>   * .users.swift contains the information of users that are using Swift 
>> APIs to authenticate to the system.
>> Please help me to clarify all these concepts.
>>
>> Regards
>> Pragya Jain
>>
>
> I'd like to add a +1 to this, just had some issues with puzzling
> contents of one of these pools and the scarcity of doco on them made it
> more puzzling still. Fortunately one can poke about in src/rgw/ for
> enlightenment but that is slower than simpley reading some nice docs!
>
> regards
>
> Mark
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-09-08 Thread Ben Hines
FYI, over the past week I have deleted over 50 TB of these orphaned objects
from my cluster. Almost all were from buckets that no longer exist, and the
fix tool did not find them. Fortunately I don't need the data from these old
buckets, so deleting all objects by prefix worked great.
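
Roughly like this, in case anyone else needs to do the same -- note the
trailing underscore on the prefix (per Yehuda's earlier warning) so you don't
match some other bucket id that merely starts with the same digits; the id
below is one of mine:

rados -p .rgw.buckets ls | grep '^default\.8873277\.32_' > /tmp/orphans.txt
wc -l /tmp/orphans.txt     # sanity-check the list before deleting anything
while read -r obj; do rados -p .rgw.buckets rm "$obj"; done < /tmp/orphans.txt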

Anyone managing a large RGW cluster should periodically make sure that the
pool usage matches the expected value (replication factor * sum of
size_kb_actual over all rgw buckets).
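
A rough way to script that check (assumes jq is installed; size_kb_actual is
in KB, so multiply the sum by the replication factor before comparing against
what the cluster reports for the pool):

radosgw-admin bucket stats | jq '[.[].usage["rgw.main"].size_kb_actual // 0] | add'
ceph osd pool get .rgw.buckets size
rados df | grep rgw.buckets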

-Ben

On Mon, Aug 31, 2015 at 3:53 PM, Yehuda Sadeh-Weinraub
<yeh...@redhat.com> wrote:
> The bucket index objects are most likely in the .rgw.buckets.index pool.
>
> Yehuda
>
> On Mon, Aug 31, 2015 at 3:27 PM, Ben Hines <bhi...@gmail.com> wrote:
>> Good call, thanks!
>>
>> Is there any risk of also deleting parts of the bucket index? I'm not
>> sure what the objects for the index itself look like, or if they are
>> in the .rgw.buckets pool.
>>
>>
>> On Mon, Aug 31, 2015 at 3:23 PM, Yehuda Sadeh-Weinraub
>> <yeh...@redhat.com> wrote:
>>> Make sure you use the underscore also, e.g., "default.8873277.32_".
>>> Otherwise you could potentially erase objects you did't intend to,
>>> like ones who start with "default.8873277.320" and such.
>>>
>>> On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote:
>>>> Ok. I'm not too familiar with the inner workings of RGW, but i would
>>>> assume that for a bucket with these parameters:
>>>>
>>>>"id": "default.8873277.32",
>>>>"marker": "default.8873277.32",
>>>>
>>>> Tha it would be the only bucket using the files that start with
>>>> "default.8873277.32"
>>>>
>>>> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15
>>>> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2
>>>>
>>>>
>>>>
>>>> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub
>>>> <yeh...@redhat.com> wrote:
>>>>> As long as you're 100% sure that the prefix is only being used for the
>>>>> specific bucket that was previously removed, then it is safe to remove
>>>>> these objects. But please do double check and make sure that there's
>>>>> no other bucket that matches this prefix somehow.
>>>>>
>>>>> Yehuda
>>>>>
>>>>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote:
>>>>>> No input, eh? (or maybe TL,DR for everyone)
>>>>>>
>>>>>> Short version: Presuming the bucket index shows blank/empty, which it
>>>>>> does and is fine, would me manually deleting the rados objects with
>>>>>> the prefix matching the former bucket's ID cause any problems?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> -Ben
>>>>>>
>>>>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote:
>>>>>>> Ceph 0.93->94.2->94.3
>>>>>>>
>>>>>>> I noticed my pool used data amount is about twice the bucket used data 
>>>>>>> count.
>>>>>>>
>>>>>>> This bucket was emptied long ago. It has zero objects:
>>>>>>> "globalcache01",
>>>>>>> {
>>>>>>> "bucket": "globalcache01",
>>>>>>> "pool": ".rgw.buckets",
>>>>>>> "index_pool": ".rgw.buckets.index",
>>>>>>> "id": "default.8873277.32",
>>>>>>> "marker": "default.8873277.32",
>>>>>>> "owner": "...",
>>>>>>> "ver": "0#12348839",
>>>>>>> "master_ver": "0#0",
>>>>>>> "mtime": "2015-03-08 11:44:11.00",
>>>>>>> "max_marker": "0#",
>>>>>>> "usage": {
>>>>>>> "rgw.none": {
>>>>>>> "size_kb": 0,
>>>>>>> "size_kb_actual": 0,
>>>>>>> "num_objects": 0
>>>>>>>

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Ben Hines
Yeah, I'm not seeing stuff being moved at all. Perhaps we should file
a ticket to request a way to tell an OSD to rebalance its directory
structure.
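
For anyone else checking, one way to see whether an OSD has actually re-split
or merged anything is just to look at a PG directory on disk (the path below
is an example; pick any PG _head dir on one of your OSDs):

cd /var/lib/ceph/osd/ceph-12/current/12.161_head
find . -maxdepth 1 -type f | wc -l   # objects still at the top level
find . -type d | wc -l               # number of split subdirectories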

On Fri, Sep 4, 2015 at 5:08 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> I've just made the same change ( 4 and 40 for now) on my cluster which is a 
> similar size to yours. I didn't see any merging happening, although most of 
> the directory's I looked at had more files in than the new merge threshold, 
> so I guess this is to be expected
>
> I'm currently splitting my PG's from 1024 to 2048 to see if that helps to 
> bring things back into order.
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Wang, Warren
>> Sent: 04 September 2015 01:21
>> To: Mark Nelson <mnel...@redhat.com>; Ben Hines <bhi...@gmail.com>
>> Cc: ceph-users <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Ceph performance, empty vs part full
>>
>> I'm about to change it on a big cluster too. It totals around 30 million, so 
>> I'm a
>> bit nervous on changing it. As far as I understood, it would indeed move
>> them around, if you can get underneath the threshold, but it may be hard to
>> do. Two more settings that I highly recommend changing on a big prod
>> cluster. I'm in favor of bumping these two up in the defaults.
>>
>> Warren
>>
>> -Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Mark Nelson
>> Sent: Thursday, September 03, 2015 6:04 PM
>> To: Ben Hines <bhi...@gmail.com>
>> Cc: ceph-users <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Ceph performance, empty vs part full
>>
>> Hrm, I think it will follow the merge/split rules if it's out of whack given 
>> the
>> new settings, but I don't know that I've ever tested it on an existing 
>> cluster to
>> see that it actually happens.  I guess let it sit for a while and then check 
>> the
>> OSD PG directories to see if the object counts make sense given the new
>> settings? :D
>>
>> Mark
>>
>> On 09/03/2015 04:31 PM, Ben Hines wrote:
>> > Hey Mark,
>> >
>> > I've just tweaked these filestore settings for my cluster -- after
>> > changing this, is there a way to make ceph move existing objects
>> > around to new filestore locations, or will this only apply to newly
>> > created objects? (i would assume the latter..)
>> >
>> > thanks,
>> >
>> > -Ben
>> >
>> > On Wed, Jul 8, 2015 at 6:39 AM, Mark Nelson <mnel...@redhat.com>
>> wrote:
>> >> Basically for each PG, there's a directory tree where only a certain
>> >> number of objects are allowed in a given directory before it splits
>> >> into new branches/leaves.  The problem is that this has a fair amount
>> >> of overhead and also there's extra associated dentry lookups to get at any
>> given object.
>> >>
>> >> You may want to try something like:
>> >>
>> >> "filestore merge threshold = 40"
>> >> "filestore split multiple = 8"
>> >>
>> >> This will dramatically increase the number of objects per directory
>> allowed.
>> >>
>> >> Another thing you may want to try is telling the kernel to greatly
>> >> favor retaining dentries and inodes in cache:
>> >>
>> >> echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
>> >>
>> >> Mark
>> >>
>> >>
>> >> On 07/08/2015 08:13 AM, MATHIAS, Bryn (Bryn) wrote:
>> >>>
>> >>> If I create a new pool it is generally fast for a short amount of time.
>> >>> Not as fast as if I had a blank cluster, but close to.
>> >>>
>> >>> Bryn
>> >>>>
>> >>>> On 8 Jul 2015, at 13:55, Gregory Farnum <g...@gregs42.com> wrote:
>> >>>>
>> >>>> I think you're probably running into the internal PG/collection
>> >>>> splitting here; try searching for those terms and seeing what your
>> >>>> OSD folder structures look like. You could test by creating a new
>> >>>> pool and seeing if it's faster or slower than the one you've already 
>> >>>> filled
>> up.
>> >>>> -Greg
>> >>>>
>> >>>> On Wed, Jul 8, 2015 at 1:25 PM, MATHIAS, Bryn (Bryn)
>> >>>> <bryn.math...@alcatel-lucent

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-03 Thread Ben Hines
Hey Mark,

I've just tweaked these filestore settings for my cluster -- after
changing them, is there a way to make ceph move existing objects
around to the new filestore locations, or will this only apply to newly
created objects? (I would assume the latter.)

thanks,

-Ben

On Wed, Jul 8, 2015 at 6:39 AM, Mark Nelson  wrote:
> Basically for each PG, there's a directory tree where only a certain number
> of objects are allowed in a given directory before it splits into new
> branches/leaves.  The problem is that this has a fair amount of overhead and
> also there's extra associated dentry lookups to get at any given object.
>
> You may want to try something like:
>
> "filestore merge threshold = 40"
> "filestore split multiple = 8"
>
> This will dramatically increase the number of objects per directory allowed.
>
> Another thing you may want to try is telling the kernel to greatly favor
> retaining dentries and inodes in cache:
>
> echo 1 | sudo tee /proc/sys/vm/vfs_cache_pressure
>
> Mark
>
>
> On 07/08/2015 08:13 AM, MATHIAS, Bryn (Bryn) wrote:
>>
>> If I create a new pool it is generally fast for a short amount of time.
>> Not as fast as if I had a blank cluster, but close to.
>>
>> Bryn
>>>
>>> On 8 Jul 2015, at 13:55, Gregory Farnum  wrote:
>>>
>>> I think you're probably running into the internal PG/collection
>>> splitting here; try searching for those terms and seeing what your OSD
>>> folder structures look like. You could test by creating a new pool and
>>> seeing if it's faster or slower than the one you've already filled up.
>>> -Greg
>>>
>>> On Wed, Jul 8, 2015 at 1:25 PM, MATHIAS, Bryn (Bryn)
>>>  wrote:

 Hi All,


 I’m perf testing a cluster again,
 This time I have re-built the cluster and am filling it for testing.

 on a 10 min run I get the following results from 5 load generators, each
 writing though 7 iocontexts, with a queue depth of 50 async writes.


 Gen1
 Percentile 100 = 0.729775905609
 Max latencies = 0.729775905609, Min = 0.0320818424225, mean =
 0.0750389684542
 Total objects writen = 113088 in time 604.259738207s gives
 187.151307376/s (748.605229503 MB/s)

 Gen2
 Percentile 100 = 0.735981941223
 Max latencies = 0.735981941223, Min = 0.0340068340302, mean =
 0.0745198070711
 Total objects writen = 113822 in time 604.437897921s gives
 188.310495407/s (753.241981627 MB/s)

 Gen3
 Percentile 100 = 0.828994989395
 Max latencies = 0.828994989395, Min = 0.0349340438843, mean =
 0.0745455575197
 Total objects writen = 113670 in time 604.352181911s gives
 188.085694736/s (752.342778944 MB/s)

 Gen4
 Percentile 100 = 1.06834602356
 Max latencies = 1.06834602356, Min = 0.0333499908447, mean =
 0.0752239764659
 Total objects writen = 112744 in time 604.408732891s gives
 186.536020849/s (746.144083397 MB/s)

 Gen5
 Percentile 100 = 0.609658002853
 Max latencies = 0.609658002853, Min = 0.032968044281, mean =
 0.0744482759499
 Total objects writen = 113918 in time 604.671534061s gives
 188.396498897/s (753.585995589 MB/s)

 example ceph -w output:
 2015-07-07 15:50:16.507084 mon.0 [INF] pgmap v1077: 2880 pgs: 2880
 active+clean; 1996 GB data, 2515 GB used, 346 TB / 348 TB avail; 2185 MB/s
 wr, 572 op/s


 However when the cluster gets over 20% full I see the following results,
 this gets worse as the cluster fills up:

 Gen1
 Percentile 100 = 6.71176099777
 Max latencies = 6.71176099777, Min = 0.0358741283417, mean =
 0.161760483485
 Total objects writen = 52196 in time 604.488474131s gives 86.347386648/s
 (345.389546592 MB/s)

 Gen2
 Max latencies = 4.09169006348, Min = 0.0357890129089, mean =
 0.163243938477
 Total objects writen = 51702 in time 604.036739111s gives
 85.5941313704/s (342.376525482 MB/s)

 Gen3
 Percentile 100 = 7.32526683807
 Max latencies = 7.32526683807, Min = 0.038701172, mean =
 0.163992217926
 Total objects writen = 51476 in time 604.684302092s gives
 85.1287189397/s (340.514875759 MB/s)

 Gen4
 Percentile 100 = 7.56094503403
 Max latencies = 7.56094503403, Min = 0.0355761051178, mean =
 0.162109421231
 Total objects writen = 52092 in time 604.769910812s gives
 86.1352376642/s (344.540950657 MB/s)


 Gen5
 Percentile 100 = 6.99595499039
 Max latencies = 6.99595499039, Min = 0.0364680290222, mean =
 0.163651215426
 Total objects writen = 51566 in time 604.061977148s gives
 85.3654127404/s (341.461650961 MB/s)






 Cluster details:
 5*HPDL380’s with 13*6Tb OSD’s
 128Gb Ram
 2*intel 2620v3
 10 Gbit Ceph public network
 10 Gbit Ceph private network

 Load generators connected via a 20Gbit 

Re: [ceph-users] Moving/Sharding RGW Bucket Index

2015-09-01 Thread Ben Hines
We also run RGW buckets with many millions of objects and had to shard
our existing buckets. We did have to delete the old ones first,
unfortunately.

I haven't tried moving the index pool to an SSD ruleset - I'd also
be interested in folks' experiences with this.

Thanks for the information on split multiple + merge threshold. I
assume that increasing those is relatively safe to do on a running
cluster? According to this Red Hat issue, it may impact
scrub/recovery performance:
https://bugzilla.redhat.com/show_bug.cgi?id=1219974

-Ben

On Tue, Sep 1, 2015 at 9:31 AM, Wang, Warren
 wrote:
> I added sharding to our busiest RGW sites, but it will not shard existing 
> bucket indexes, only applies to new buckets. Even with that change, I'm still 
> considering moving the index pool to SSD. The main factor being the rate of 
> writes. We are looking at a project that will have extremely high writes/sec 
> through the RGWs.
>
> The other thing worth noting is that at that scale, you also need to change 
> filestore merge threshold and filestore split multiple to something 
> considerably larger. Props to Michael Kidd @ RH for that tip. There's a 
> mathematical formula on the filestore config reference.
>
> Warren
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Daniel Maraio
> Sent: Tuesday, September 01, 2015 10:40 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Moving/Sharding RGW Bucket Index
>
> Hello,
>
>I have two large buckets in my RGW and I think the performance is being 
> impacted by the bucket index. One bucket contains 9 million objects and the 
> other one has 22 million. I'd like to shard the bucket index and also change 
> the ruleset of the .rgw.buckets.index pool to put it on our SSD root. I could 
> not find any documentation on this issue. It looks like the bucket indexes 
> can be rebuilt using the radosgw-admin bucket check command but I'm not sure 
> how to proceed. We can stop writes or take the cluster down completely if 
> necessary. My initial thought was to backup the existing index pool and 
> create a new one. I'm not sure if I can change the index_pool of an existing 
> bucket. If that is possible I assume I can change that to my new pool and 
> execute a radosgw-admin bucket check command to rebuild/shard the index.
>
>Does anyone have experience in getting sharding running with an existing 
> bucket, or even moving the index pool to a different ruleset?
> When I change the crush ruleset for the .rgw.buckets.index pool to my SSD 
> root we run into issues, buckets cannot be created or listed, writes cease to 
> work, reads seem to work fine though. Thanks for your time!
>
> - Daniel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
No input, eh? (Or maybe it was TL;DR for everyone.)

Short version: presuming the bucket index shows as blank/empty, which it
does and is fine, would manually deleting the rados objects with
the prefix matching the former bucket's ID cause any problems?

thanks,

-Ben

On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote:
> Ceph 0.93->94.2->94.3
>
> I noticed my pool used data amount is about twice the bucket used data count.
>
> This bucket was emptied long ago. It has zero objects:
> "globalcache01",
> {
> "bucket": "globalcache01",
> "pool": ".rgw.buckets",
> "index_pool": ".rgw.buckets.index",
> "id": "default.8873277.32",
> "marker": "default.8873277.32",
> "owner": "...",
> "ver": "0#12348839",
> "master_ver": "0#0",
> "mtime": "2015-03-08 11:44:11.00",
> "max_marker": "0#",
> "usage": {
> "rgw.none": {
> "size_kb": 0,
> "size_kb_actual": 0,
> "num_objects": 0
> },
> "rgw.main": {
> "size_kb": 0,
> "size_kb_actual": 0,
> "num_objects": 0
> }
> },
> "bucket_quota": {
> "enabled": false,
> "max_size_kb": -1,
> "max_objects": -1
> }
> },
>
>
>
> bucket check shows nothing:
>
> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
> --bucket=globalcache01 --fix
> []
> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
> --check-head-obj-locator --bucket=globalcache01 --fix
> {
> "bucket": "globalcache01",
> "check_objects": [
> ]
> }
>
>
> However, i see a lot of data for it on an OSD (all shadow files with
> escaped underscores)
>
> [root@sm-cld-mtl-008 current]# find . -name default.8873277.32* -print
> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c
> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c
> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c
> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c
> 
>
> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+'
> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47
> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6
> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3
> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1
> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16
> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2
> 
>
> Is there still a bug in the fix obj locator command perhaps? I suppose
> can just do something like:
>
>rados -p .rgw.buckets cleanup --prefix default.8873277.32
>
> Since i want to destroy the bucket anyway, but if this affects other
> buckets, i may want to clean those a better way.
>
> -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
OK, I'm not too familiar with the inner workings of RGW, but I would
assume that for a bucket with these parameters:

   "id": "default.8873277.32",
   "marker": "default.8873277.32",

that it would be the only bucket using the files that start with
"default.8873277.32"

default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15
default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2



On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub
<yeh...@redhat.com> wrote:
> As long as you're 100% sure that the prefix is only being used for the
> specific bucket that was previously removed, then it is safe to remove
> these objects. But please do double check and make sure that there's
> no other bucket that matches this prefix somehow.
>
> Yehuda
>
> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote:
>> No input, eh? (or maybe TL,DR for everyone)
>>
>> Short version: Presuming the bucket index shows blank/empty, which it
>> does and is fine, would me manually deleting the rados objects with
>> the prefix matching the former bucket's ID cause any problems?
>>
>> thanks,
>>
>> -Ben
>>
>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote:
>>> Ceph 0.93->94.2->94.3
>>>
>>> I noticed my pool used data amount is about twice the bucket used data 
>>> count.
>>>
>>> This bucket was emptied long ago. It has zero objects:
>>> "globalcache01",
>>> {
>>> "bucket": "globalcache01",
>>> "pool": ".rgw.buckets",
>>> "index_pool": ".rgw.buckets.index",
>>> "id": "default.8873277.32",
>>> "marker": "default.8873277.32",
>>> "owner": "...",
>>> "ver": "0#12348839",
>>> "master_ver": "0#0",
>>> "mtime": "2015-03-08 11:44:11.00",
>>> "max_marker": "0#",
>>> "usage": {
>>> "rgw.none": {
>>> "size_kb": 0,
>>> "size_kb_actual": 0,
>>> "num_objects": 0
>>> },
>>> "rgw.main": {
>>> "size_kb": 0,
>>> "size_kb_actual": 0,
>>> "num_objects": 0
>>> }
>>> },
>>> "bucket_quota": {
>>> "enabled": false,
>>> "max_size_kb": -1,
>>> "max_objects": -1
>>> }
>>> },
>>>
>>>
>>>
>>> bucket check shows nothing:
>>>
>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>> --bucket=globalcache01 --fix
>>> []
>>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>> --check-head-obj-locator --bucket=globalcache01 --fix
>>> {
>>> "bucket": "globalcache01",
>>> "check_objects": [
>>> ]
>>> }
>>>
>>>
>>> However, i see a lot of data for it on an OSD (all shadow files with
>>> escaped underscores)
>>>
>>> [root@sm-cld-mtl-008 current]# find . -name default.8873277.32* -print
>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c
>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c
>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c
>>> ./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c
>>> 
>>>
>>> -bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+'
>>> default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47
>>> default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6
>>> default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3
>>> default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1
>>> default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16
>>> default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2
>>> 
>>>
>>> Is there still a bug in the fix obj locator command perhaps? I suppose
>>> can just do something like:
>>>
>>>rados -p .rgw.buckets cleanup --prefix default.8873277.32
>>>
>>> Since i want to destroy the bucket anyway, but if this affects other
>>> buckets, i may want to clean those a better way.
>>>
>>> -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
Good call, thanks!

Is there any risk of also deleting parts of the bucket index? I'm not
sure what the objects for the index itself look like, or if they are
in the .rgw.buckets pool.


On Mon, Aug 31, 2015 at 3:23 PM, Yehuda Sadeh-Weinraub
<yeh...@redhat.com> wrote:
> Make sure you use the underscore also, e.g., "default.8873277.32_".
> Otherwise you could potentially erase objects you did't intend to,
> like ones who start with "default.8873277.320" and such.
>
> On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote:
>> Ok. I'm not too familiar with the inner workings of RGW, but i would
>> assume that for a bucket with these parameters:
>>
>>"id": "default.8873277.32",
>>"marker": "default.8873277.32",
>>
>> Tha it would be the only bucket using the files that start with
>> "default.8873277.32"
>>
>> default.8873277.32__shadow_.OkYjjANx6-qJOrjvdqdaHev-LHSvPhZ_15
>> default.8873277.32__shadow_.a2qU3qodRf_E5b9pFTsKHHuX2RUC12g_2
>>
>>
>>
>> On Mon, Aug 31, 2015 at 2:51 PM, Yehuda Sadeh-Weinraub
>> <yeh...@redhat.com> wrote:
>>> As long as you're 100% sure that the prefix is only being used for the
>>> specific bucket that was previously removed, then it is safe to remove
>>> these objects. But please do double check and make sure that there's
>>> no other bucket that matches this prefix somehow.
>>>
>>> Yehuda
>>>
>>> On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <bhi...@gmail.com> wrote:
>>>> No input, eh? (or maybe TL,DR for everyone)
>>>>
>>>> Short version: Presuming the bucket index shows blank/empty, which it
>>>> does and is fine, would me manually deleting the rados objects with
>>>> the prefix matching the former bucket's ID cause any problems?
>>>>
>>>> thanks,
>>>>
>>>> -Ben
>>>>
>>>> On Fri, Aug 28, 2015 at 4:22 PM, Ben Hines <bhi...@gmail.com> wrote:
>>>>> Ceph 0.93->94.2->94.3
>>>>>
>>>>> I noticed my pool used data amount is about twice the bucket used data 
>>>>> count.
>>>>>
>>>>> This bucket was emptied long ago. It has zero objects:
>>>>> "globalcache01",
>>>>> {
>>>>> "bucket": "globalcache01",
>>>>> "pool": ".rgw.buckets",
>>>>> "index_pool": ".rgw.buckets.index",
>>>>> "id": "default.8873277.32",
>>>>> "marker": "default.8873277.32",
>>>>> "owner": "...",
>>>>> "ver": "0#12348839",
>>>>> "master_ver": "0#0",
>>>>> "mtime": "2015-03-08 11:44:11.00",
>>>>> "max_marker": "0#",
>>>>> "usage": {
>>>>> "rgw.none": {
>>>>> "size_kb": 0,
>>>>> "size_kb_actual": 0,
>>>>> "num_objects": 0
>>>>> },
>>>>> "rgw.main": {
>>>>> "size_kb": 0,
>>>>> "size_kb_actual": 0,
>>>>> "num_objects": 0
>>>>> }
>>>>> },
>>>>> "bucket_quota": {
>>>>> "enabled": false,
>>>>> "max_size_kb": -1,
>>>>> "max_objects": -1
>>>>> }
>>>>> },
>>>>>
>>>>>
>>>>>
>>>>> bucket check shows nothing:
>>>>>
>>>>> 16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>>>> --bucket=globalcache01 --fix
>>>>> []
>>>>> 16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
>>>>> --check-head-obj-locator --bucket=globalcache01 --fix
>>>>> {
>>>>> "bucket": "globalcache01",
>>>>> "check_objects": [
>>>>> ]
>>>>> }
>>>>>

Re: [ceph-users] a couple of radosgw questions

2015-08-29 Thread Ben Hines
I'm not the OP, but in my particular case, gc is proceeding normally
(since 94.2, I think) -- I just have millions of older objects
(months old) which will not go away.

(see my other post --
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003967.html
 )
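
For anyone comparing notes, the gc state itself can be checked / kicked with:

radosgw-admin gc list --include-all | head -40
radosgw-admin gc process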

-Ben

On Fri, Aug 28, 2015 at 5:14 PM, Brad Hubbard bhubb...@redhat.com wrote:
 - Original Message -
 From: Ben Hines bhi...@gmail.com
 To: Brad Hubbard bhubb...@redhat.com
 Cc: Tom Deneau tom.den...@amd.com, ceph-users ceph-us...@ceph.com
 Sent: Saturday, 29 August, 2015 9:49:00 AM
 Subject: Re: [ceph-users] a couple of radosgw questions

 16:22:38 root@sm-cephrgw4 /etc/ceph $ radosgw-admin temp remove
 unrecognized arg remove
 usage: radosgw-admin cmd [options...]
 commands:
 
  temp remove            remove temporary objects that were created up to
  specified date (and optional time)

 Looking into this ambiguity, thanks.



 On Fri, Aug 28, 2015 at 4:24 PM, Brad Hubbard bhubb...@redhat.com wrote:
  When I remove an object, it is no longer visible from the S3 API, but the
 objects that comprised it are still there in .rgw.buckets pool. When do they
 get removed?
 
  Does the following command remove them?
 
  http://ceph.com/docs/master/radosgw/purge-temp/


 Does radosgw-admin gc list show anything?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Ben Hines
How many objects in the bucket?

RGW has problems with index size once number of objects gets into the
90+ level. The buckets need to be recreated with 'sharded bucket
indexes' on:

rgw override bucket index max shards = 23

You could also try repairing the index with:

 radosgw-admin bucket check --fix --bucket=bucketname
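For completeness, the sharding knob above goes into ceph.conf on the gateway
hosts, roughly like this (the section name depends on how your rgw instance is
named, and it only applies to buckets created after the change):

  [client.radosgw.gateway]
  rgw override bucket index max shards = 23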

-Ben

On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote:
 Hi,

 we have a rgw bucket (with versioning) where PUT and GET operations for
 specific objects succeed,  but retrieving an object list fails.
 Using python-boto, after a timeout just gives us an 500 internal error;
 radosgw-admin just hangs.
 Also a radosgw-admin bucket check just seems to hang...

 ceph version is 0.94.3 but this also was happening with 0.94.2, we
 quietly hoped upgrading would fix but it didn't...

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Ben Hines
Still, i'd strongly recommend sharding your big bucket before it gets
much bigger. Typically it's during OSD recovery that you will
encounter problems as it moves the index and locks all writes to it,
it will start returning 500s.

Problem is, you need to recreate the bucket before sharding it, it
would be nice if RGW could shard an existing bucket's index.

The check can take a long time.

-Ben

On Fri, Aug 28, 2015 at 9:16 AM, Sam Wouters s...@ericom.be wrote:
 Hi,

 this bucket only has 13389 objects, so the index size shouldn't be a
 problem. Also, on the same cluster we have an other bucket with 1200543
 objects (but no versioning configured), which has no issues.

 when we run a radosgw-admin bucket --check (--fix), nothing seems to be
 happening. Putting an strace on the process shows a lot of lines like these:
 [pid 99372] futex(0x2d730d4, FUTEX_WAIT_PRIVATE, 156619, NULL <unfinished ...>
 [pid 99385] futex(0x2da9410, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
 [pid 99371] futex(0x2da9410, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
 [pid 99385] <... futex resumed> )   = -1 EAGAIN (Resource temporarily unavailable)
 [pid 99371] <... futex resumed> )   = 0

 but no errors in the ceph logs or health warnings.

 r,
 Sam

 On 28-08-15 17:49, Ben Hines wrote:
 How many objects in the bucket?

 RGW has problems with index size once number of objects gets into the
 90+ level. The buckets need to be recreated with 'sharded bucket
 indexes' on:

 rgw override bucket index max shards = 23

 You could also try repairing the index with:

  radosgw-admin bucket check --fix --bucket=bucketname

 -Ben

 On Fri, Aug 28, 2015 at 8:38 AM, Sam Wouters s...@ericom.be wrote:
 Hi,

 we have a rgw bucket (with versioning) where PUT and GET operations for
 specific objects succeed,  but retrieving an object list fails.
 Using python-boto, after a timeout just gives us an 500 internal error;
 radosgw-admin just hangs.
 Also a radosgw-admin bucket check just seems to hang...

 ceph version is 0.94.3 but this also was happening with 0.94.2, we
 quietly hoped upgrading would fix but it didn't...

 r,
 Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-28 Thread Ben Hines
Ceph 0.93->94.2->94.3

I noticed my pool used data amount is about twice the bucket used data count.

This bucket was emptied long ago. It has zero objects:
"globalcache01",
{
"bucket": "globalcache01",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets.index",
"id": "default.8873277.32",
"marker": "default.8873277.32",
"owner": "...",
"ver": "0#12348839",
"master_ver": "0#0",
"mtime": "2015-03-08 11:44:11.00",
"max_marker": "0#",
"usage": {
"rgw.none": {
"size_kb": 0,
"size_kb_actual": 0,
"num_objects": 0
},
"rgw.main": {
"size_kb": 0,
"size_kb_actual": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
},



bucket check shows nothing:

16:07:09 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
--bucket=globalcache01 --fix
[]
16:07:27 root@sm-cephrgw4 ~ $ radosgw-admin bucket check
--check-head-obj-locator --bucket=globalcache01 --fix
{
"bucket": "globalcache01",
"check_objects": [
]
}


However, i see a lot of data for it on an OSD (all shadow files with
escaped underscores)

[root@sm-cld-mtl-008 current]# find . -name default.8873277.32* -print
./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/default.8873277.32\u\ushadow\u.Tos2Ms8w2BiEG7YJAZeE6zrrc\uwcHPN\u1__head_D886E961__c
./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_1/default.8873277.32\u\ushadow\u.Aa86mlEMvpMhRaTDQKHZmcxAReFEo2J\u1__head_4A71E961__c
./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_5/default.8873277.32\u\ushadow\u.KCiWEa4YPVaYw2FPjqvpd9dKTRBu8BR\u17__head_00B5E961__c
./12.161_head/DIR_1/DIR_6/DIR_9/DIR_E/DIR_8/default.8873277.32\u\ushadow\u.A2K\u2H1XKR8weiSwKGmbUlsCmEB9GDF\u32__head_42E8E961__c
snip

-bash-4.1$ rados -p .rgw.buckets ls | egrep '8873277\.32.+'
default.8873277.32__shadow_.pvaIjBfisb7pMABicR9J2Bgh8JUkEfH_47
default.8873277.32__shadow_.Wr_dGMxdSRHpoeu4gsQZXJ8t0I3JI7l_6
default.8873277.32__shadow_.WjijDxYhLFMUYdrMjeH7GvTL1LOwcqo_3
default.8873277.32__shadow_.3lRIhNePLmt1O8VVc2p5X9LtAVfdgUU_1
default.8873277.32__shadow_.VqF8n7PnmIm3T9UEhorD5OsacvuHOOy_16
default.8873277.32__shadow_.Jrh59XT01rIIyOdNPDjCwl5Pe1LDanp_2
snip

Is there still a bug in the fix obj locator command perhaps? I suppose I
can just do something like:

   rados -p .rgw.buckets cleanup --prefix default.8873277.32

Since i want to destroy the bucket anyway, but if this affects other
buckets, i may want to clean those a better way.

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] a couple of radosgw questions

2015-08-28 Thread Ben Hines
16:22:38 root@sm-cephrgw4 /etc/ceph $ radosgw-admin temp remove
unrecognized arg remove
usage: radosgw-admin cmd [options...]
commands:

  temp remove            remove temporary objects that were created up to
 specified date (and optional time)


On Fri, Aug 28, 2015 at 4:24 PM, Brad Hubbard bhubb...@redhat.com wrote:
 When I remove an object, it is no longer visible from the S3 API, but the
objects that comprised it are still there in .rgw.buckets pool. When do they
get removed?

 Does the following command remove them?

 http://ceph.com/docs/master/radosgw/purge-temp/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw only delivers whats cached if latency between keyrequest and actual download is above 90s

2015-08-21 Thread Ben Hines
I just tried this (with some smaller objects, maybe 4.5 MB, as well as
with a 16 GB file and it worked fine.

However, i am using apache + fastcgi interface to rgw, rather than civetweb.

-Ben

On Fri, Aug 21, 2015 at 12:19 PM, Sean seapasu...@uchicago.edu wrote:
 We heavily use radosgw here for most of our work and we have seen a weird
 truncation issue with radosgw/s3 requests.

 We have noticed that if the time between the initial ticket to grab the
 object key and grabbing the data is greater than 90 seconds the object
 returned is truncated to whatever RGW has grabbed/cached after the initial
 connection and this seems to be around 512k.

 Here is some PoC. This will work on most objects I have tested mostly 1G to
 5G keys in RGW::

 
 
 #!/usr/bin/env python

 import os
 import sys
 import json
 import time

 import boto
 import boto.s3.connection

 if __name__ == '__main__':
 import argparse

 parser = argparse.ArgumentParser(description='Delayed download.')

 parser.add_argument('credentials', type=argparse.FileType('r'),
 help='Credentials file.')

 parser.add_argument('endpoint')
 parser.add_argument('bucket')
 parser.add_argument('key')

 args = parser.parse_args()

 credentials= json.load(args.credentials)[args.endpoint]

 conn = boto.connect_s3(
 aws_access_key_id = credentials.get('access_key'),
 aws_secret_access_key = credentials.get('secret_key'),
 host  = credentials.get('host'),
 port  = credentials.get('port'),
 is_secure = credentials.get('is_secure',False),
 calling_format= boto.s3.connection.OrdinaryCallingFormat(),
 )

 key = conn.get_bucket(args.bucket).get_key(args.key)

 key.BufferSize = 1048576
 key.open_read(headers={})
 time.sleep(120)

 key.get_contents_to_file(sys.stdout)
 
 

 The format of the credentials file is just standard::

 =
 =
 {
     "cluster": {
         "access_key": "blahblahblah",
         "secret_key": "blahblahblah",
         "host": "blahblahblah",
         "port": 443,
         "is_secure": true
     }
 }

 =
 =


 From here your object will almost always be truncated to whatever the
 gateway has cached in the time after the initial key request.

 This can be a huge issue: if the radosgw or the cluster is heavily taxed, some
 requests can take minutes. You can end up grabbing the rest of the object by
 doing a range request against the gateway, so I know the data is intact, but I
 don't think the radosgw should act as if the download completed successfully.
 I think it should instead return an error of some kind if it can no longer
 service the request.
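 (For reference, the ranged read that recovers the remainder looks roughly like
 this with the same boto objects as in the PoC above - a sketch, and the offset
 is just an example of however much was already received:)

 key2 = conn.get_bucket(args.bucket).get_key(args.key)
 # ask only for the tail of the object, from byte 524288 onward
 tail = key2.get_contents_as_string(headers={'Range': 'bytes=524288-'})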

 We are using hammer (ceph version 0.94.2
 (5fb85614ca8f354284c713a2f9c610860720bbf3)) and using civetweb as our
 gateway.

 This is on a 3 node test cluster but I have tried on our larger cluster with
 the same behavior. If I can provide any other information please let me
 know.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] НА: CEPH cache layer. Very slow

2015-08-14 Thread Ben Hines
Nice to hear that you have no SSD failures yet in 10months.

How many OSDs are you running, and what is your primary ceph workload?
(RBD, rgw, etc?)

-Ben

On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
me...@yuterra.ru wrote:
 Hi!


 Of course, it isn't cheap at all, but we use Intel DC S3700 200Gb for ceph
 journals
 and DC S3700 400Gb in the SSD pool: same hosts, separate root in crushmap.

 The SSD pool is not yet in production; the journalling SSDs have been under
 production load for 10 months. They're in good condition - no faults, no
 degradation.

 We specifically took 200Gb SSDs for journals to reduce costs, and also run a
 higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the
 recommendation is 1 per 3 to 6.

 So, as a conclusion - I'd recommend getting a bigger budget and buying durable
 and fast SSDs for Ceph.

 Megov Igor
 CIO, Yuterra

 
 From: ceph-users ceph-users-boun...@lists.ceph.com on behalf of Voloshanenko
 Igor igor.voloshane...@gmail.com
 Sent: 13 August 2015 15:54
 To: Jan Schermer
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] CEPH cache layer. Very slow

 So, good, but the price for the 845 DC PRO 400 GB is about 2x higher than the
 intel S3500 240G (((

 Any other models? (((

 2015-08-13 15:45 GMT+03:00 Jan Schermer j...@schermer.cz:

 I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
 and not just PRO or DC EVO!).
 Those were very cheap but are out of stock at the moment (here).
 Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
 which IMO makes them superior without needing many tricks to do its job.

 Jan

 On 13 Aug 2015, at 14:40, Voloshanenko Igor igor.voloshane...@gmail.com
 wrote:

 Tnx, Irek! Will try!

 but another question to all: which SSDs are good enough for CEPH now?

 I'm looking into the S3500 240G (I have some S3500 120G which show great
 results, around 8x better than the Samsung).

 Could you give advice about other vendors/models at the same or lower price
 level as the S3500 240G?

 2015-08-13 12:11 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 Hi, Igor.
 Try applying the patch here:

 http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov

 P.S. I no longer track changes in this direction (kernel), because we
 already use the recommended SSDs.

 Best regards, Фасихов Ирек Нургаязович
 Mob.: +79229045757

 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor
 igor.voloshane...@gmail.com:

 So, after testing the SSD (I wiped 1 SSD and used it for tests):

 root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
 --group_reporting --name=journal-test
 journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
 iodepth=1
 fio-2.1.3
 Starting 1 process
 Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s]
 journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015
   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
     clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
      lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
     clat percentiles (usec):
      |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
      | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
      | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
      | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
      | 99.99th=[14912]
     bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
     lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
   cpu          : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued    : total=r=0/w=17243/d=0, short=r=0/w=0/d=0

 Run status group 0 (all jobs):
   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s, mint=60001msec, maxt=60001msec

 Disk stats (read/write):
   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%

 So, it's painful... the SSD does only 287 iops at 4K... 1.1 MB/s

 I tried to change the cache mode:
 echo "temporary write through" > /sys/class/scsi_disk/2:0:0:0/cache_type
 echo "temporary write through" > /sys/class/scsi_disk/3:0:0:0/cache_type

 No luck, still the same shit results. I also found this article:
 https://lkml.org/lkml/2013/11/20/264 which points to an old, very simple
 patch that disables CMD_FLUSH:
 https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba

 Does anybody have better ideas on how to improve this? (Or how to disable
 CMD_FLUSH without recompiling the kernel? I use Ubuntu and kernel 4.0.4 for
 now (the 4.x branch, because the SSD 850 Pro has an issue with NCQ TRIM and
 before 4.0.4 this exception was not included into libsata.c).)

 

[ceph-users] optimizing non-ssd journals

2015-08-07 Thread Ben Hines
Our cluster is primarily used for RGW, but would like to use for RBD
eventually...

We don't have SSDs on our journals (for a while yet) and we're still
updating our cluster to 10GBE.

I do see some pretty high commit and apply latencies in 'osd perf',
often 100-500 ms, which I figure is a result of the spinning journals.

The cluster consists of ~110 OSDs, 4 per node, on 2TB drives each, JBOD,
xfs, with the associated 5GB journal as a second partition on each of
them:

/dev/sdb :
 /dev/sdb1 ceph data, active, cluster ceph, osd.35, journal /dev/sdb2
 /dev/sdb2 ceph journal, for /dev/sdb1
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.36, journal /dev/sdc2
 /dev/sdc2 ceph journal, for /dev/sdc1
...

Also they are mounted with:
osd mount options xfs = rw,noatime,inode64

+ 8 experimental btrfs osds, mounted with
osd_mount_options_btrfs = rw,noatime,space_cache,user_subvol_rm_allowed


Considering that SSDs are unlikely in the near term, what can we do to
help commit/apply latency?

- Would increasing the size of the journal partition help?

- JBOD vs single-disk RAID0 - the drives are just JBODded now.
Research indicates i may see improvements with single-disk RAID0. Is
this information still current?

thanks-

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Check networking first?

2015-07-31 Thread Ben Hines
I encountered a similar problem. Incoming firewall ports were blocked
on one host. So the other OSDs kept marking that OSD as down. But, it
could talk out, so it kept saying 'hey, i'm up, mark me up' so then
the other OSDs started trying to send it data again, causing backed up
requests.. Which goes on, ad infinitum. I had to figure out the
connectivity problem myself by looking in the OSD logs.

After a while, the cluster should just say 'no, you're not reachable,
stop putting yourself back into the cluster'.
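These days the first thing I do when an OSD starts flapping like that is a
quick connectivity pass from another node - nothing fancy, and the hostname
and ports here are just examples (6789 for the mon, 6800+ for OSDs; nc is the
OpenBSD-style netcat):

  ping -c 3 -M do -s 8972 osd-host    # only if you run jumbo frames (9000 MTU minus headers)
  nc -zv osd-host 6789                # mon port reachable?
  nc -zv osd-host 6800                # first OSD port reachable?
  iperf -c osd-host                   # raw throughput; run 'iperf -s' on the far side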

-Ben

On Fri, Jul 31, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote:
 I remember reading that ScaleIO (I think?) does something like this by 
 regularly sending reports to a multicast group, thus any node with issues (or 
 just overload) is reweighted or avoided automatically on the client. OSD map 
 is the Ceph equivalent I guess. It makes sense to gather metrics and 
 prioritize better performing OSDs over those with e.g. worse latencies, but 
 it needs to update fast. But I believe that _network_ monitoring itself ought 
 to be part of… a network monitoring system you should already have :-) and 
 not a storage system that just happens to use network. I don’t remember 
 seeing anything but a simple ping/traceroute/dns test in any SAN interface. 
 If an OSD has issues it might be anything from a failing drive to a swapping 
 OS and a number like “commit latency” (= response time average from the 
 clients’ perspective) is maybe the ultimate metric of all for this purpose, 
 irrespective of the root cause.

 Nice option would be to read data from all replicas at once - this would of 
 course increase load and cause all sorts of issues if abused, but if you have 
 an app that absolutely-always-without-fail-must-get-data-ASAP then you could 
 enable this in the client (and I think that would be an easy option to add). 
 This is actually used in some systems. Harder part is to fail nicely when 
 writing (like waiting only for the remote network buffers on 2 nodes to get 
 the data instead of waiting for commit on all 3 replicas…)

 Jan

 On 31 Jul 2015, at 19:45, Robert LeBlanc rob...@leblancnet.us wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Even just a ping at max MTU set with nodefrag could tell a lot about
 connectivity issues and latency without a lot of traffic. Using Ceph
 messenger would be even better to check firewall ports. I like the
 idea of incorporating simple network checks into Ceph. The monitor can
 correlate failures and help determine if the problem is related to one
 host from the CRUSH map.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt  wrote:
 wouldn't it be nice that ceph does something like this in background (some
 sort of network-scrub). debugging network like this is not that easy (can't
 expect admins to install e.g. perfsonar on all nodes and/or clients)

 something like: every X min, each service X pick a service Y on another host
 (assuming X and Y will exchange some communication at some point; like osd
 with other osd), send 1MB of data, and make the timing data available so we
 can monitor it and detect underperforming links over time.

 ideally clients also do this, but not sure where they should report/store
 the data.

 interpreting the data can be a bit tricky, but extreme outliers will be
 spotted easily, and the main issue with this sort of debugging is collecting
 the data.

 simply reporting / keeping track of ongoing communications is already a big
 step forward, but then we need to have the size of the exchanged data to
 allow interpretation (and the timing should be about the network part, not
 e.g. flush data to disk in case of an osd). (and obviously sampling is
 enough, no need to have details of every bit send).



 stijn


 On 07/30/2015 08:04 PM, Mark Nelson wrote:

 Thanks for posting this!  We see issues like this more often than you'd
 think.  It's really important too because if you don't figure it out the
 natural inclination is to blame Ceph! :)

 Mark

 On 07/30/2015 12:50 PM, Quentin Hartman wrote:

 Just wanted to drop a note to the group that I had my cluster go
 sideways yesterday, and the root of the problem was networking again.
 Using iperf I discovered that one of my nodes was only moving data at
 1.7Mb / s. Moving that node to a different switch port with a different
 cable has resolved the problem. It took awhile to track down because
 none of the server-side error metrics for disk or network showed
 anything was amiss, and I didn't think to test network performance (as
 suggested in another thread) until well into the process.

 Check networking first!

 QH


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 

Re: [ceph-users] Transfering files from NFS to ceph + RGW

2015-07-08 Thread Ben Hines
It's really about 10 minutes of work to write a python client to post
files into RGW/S3. (we use boto) Or you could use an S3 GUI client
such as Cyberduck.
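For what it's worth, the 10-minute version looks roughly like this (a sketch
only - endpoint, credentials, bucket name and source path are made up, and at
PB scale you'd want to parallelize it and spread keys across buckets for the
index reasons below):

  #!/usr/bin/env python
  # Walk a directory tree and PUT every file into RGW via the S3 API with boto.
  import os
  import boto
  import boto.s3.connection

  conn = boto.connect_s3(
      aws_access_key_id='ACCESS_KEY',
      aws_secret_access_key='SECRET_KEY',
      host='rgw.example.com',
      is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )

  bucket = conn.create_bucket('nfs-migration')   # or conn.get_bucket(...) if it already exists

  src_root = '/mnt/nfs/share'
  for dirpath, dirnames, filenames in os.walk(src_root):
      for name in filenames:
          path = os.path.join(dirpath, name)
          key_name = os.path.relpath(path, src_root)  # object key = path relative to the share
          key = bucket.new_key(key_name)
          key.set_contents_from_filename(path)
          print('uploaded %s' % key_name)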

The problem i am having and which you should look out for is that many
millions of objects in a single RGW bucket causes problems with
contention on the bucket index object in ceph. The 'sharded bucket
index' feature is new, and is intended to resolve this, but may have
other issues such as slowness. Going forward it would be nice if rgw
handled its index better.

-Ben

On Wed, Jul 8, 2015 at 7:01 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi,

 We are planning to build a Ceph cluster with RGW/S3 as the interface for
 user access. We have PB level of data in NFS share which needs to be moved
 to the Ceph cluster and that’s why I need your valuable input on how to
 efficiently do that. I am sure this is a common problem that RGW users in
 Ceph community have faced and resolved J .

 I can think of the following approach.



 Since the data needs to be accessed later with RGW/S3 , we have to write an
 application that can PUT the existing files as objects  over RGW+S3
 interface to the cluster.



 Is there any alternative approach ?

 There are existing RADOS tools that can take files as input and store it in
 a cluster , but, unfortunately RGW probably will not be able to understand
 those.

 IMO, there should be a channel where we can use these rados utility to store
 the objects in .rgw.data pool and RGW should be able to read the objects.
 This will solve lot of data migration problem (?).

 Also, probably this blueprint
 (https://wiki.ceph.com/Planning/Blueprints/Infernalis/RGW%3A_NFS) of
 Yehuda’s trying to solve similar problem…



 Anyways, Please share your thoughts and let me know if anybody already has a
 workaround for this.



 Thanks  Regards

 Somnath


 

 PLEASE NOTE: The information contained in this electronic mail message is
 intended only for the use of the designated recipient(s) named above. If the
 reader of this message is not the intended recipient, you are hereby
 notified that you have received this message in error and that any review,
 dissemination, distribution, or copying of this message is strictly
 prohibited. If you have received this communication in error, please notify
 the sender by telephone or e-mail (as shown above) immediately and destroy
 any and all copies of this message in your possession (whether hard copies
 or electronically stored copies).


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer issues (rgw)

2015-07-08 Thread Ben Hines
Also recently updated to 94.2. I am also seeing a large difference
between my 'ceph df' and 'size_kb_actual' in the bucket stats. I would
assume the difference is objects awaiting gc, but 'gc list' prints
very little.

ceph df:

NAME           ID  USED    %USED  MAX AVAIL  OBJECTS
...
.rgw.buckets   12  37370G  21.18  42122G     25934075


  radosgw-admin bucket stats | grep size_kb_actual | sed 's# \|,##g' | \
    cut -d':' -f2 | awk '{s+=$1} END {print s}' | awk '{$1=$1/(1024^2); print $1,"GB";}'

5700.74 GB -- would expect ~37370G ?

radosgw-admin gc list --include-all has 10,000 objects, but I think
that is from a rebalance earlier today; the 37370G vs 5700GB
difference has been present for quite a while.

Any suggestions?

-Ben


On Mon, Jun 29, 2015 at 2:32 AM, Gleb Borisov borisov.g...@gmail.com wrote:
 Hi

 We've just upgraded our storage from 0.94 to 0.94.2 and realized that we
 have a lot of garbage and corrupted objects in our buckets.

 First of all, we found several corrupted objects (missing data in the middle
 of object) uploaded via S3 multipart upload with enabled retry policy. It
 seems that we faced with [1]. Is there a proper way to find such objects?
 How we can deal with them?
 Of course we can check all gzipped objects (download and try to uncompress
 them), but we also have a lot of plain text data objects which we can't
 check in this way.

 Also we were also impacted by [2]. Now garbage collection is running
 normally, but there are 70TiB of garbage in our storage. Do we have any way
 to find and remove all such objects? (we check bucket stats and it shows
 20TiB of data and `ceph df` reports 90TiB used).

 Thanks.

 [1] http://tracker.ceph.com/issues/11604
 [2] http://tracker.ceph.com/issues/10295

 --
 Best regards,
 Gleb M Borisov

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw bucket index sharding tips?

2015-07-07 Thread Ben Hines
Anyone have any data on optimal # of shards for a radosgw bucket index?

We've had issues with bucket index contention with a few million+
objects in a single bucket so i'm testing out the sharding.

Perhaps at least one shard per OSD? Or, less? More?

I noticed some discussion here regarding slow bucket listing with
~200k obj -- http://cephnotes.ksperis.com/blog/2015/05/12/radosgw-big-index
- bucket list seems significantly impacted.

But i'm more concerned about general object put  (write) / object read
speed since 'bucket listing' is not something that we need to do. Not
sure if the index has to be completely read to write an object into
it?

thanks-

-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Ben Hines
Did you do before/after Ceph performance benchmarks? I don't care if my
systems are using 80% cpu, if Ceph performance is better than when
it's using 20% cpu.

Can you share any scripts you have to automate these things? (NUMA
pinning, migratepages)
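For reference, the sort of thing I have in mind is roughly this (untested
sketch - the OSD id, core list and node numbers are made-up examples, check
'numactl --hardware' for the real topology; and as you say, cgroups/cpuset is
the cleaner long-term answer):

  OSD_PID=$(pgrep -f 'ceph-osd -i 12')   # crude match on an example osd.12
  taskset -acp 12-23 $OSD_PID            # pin all of its threads to node 1's cores (12-23 here)
  migratepages $OSD_PID 0 1              # move pages it already allocated from node 0 to node 1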

thanks,

-Ben

On Wed, Jun 24, 2015 at 10:25 AM, Jan Schermer j...@schermer.cz wrote:
 There were essentialy three things we had to do for such a drastic drop

 1) recompile CEPH --without-tcmalloc
 2) pin the OSDs to a set of a specific NUMA zone  - we had this for a long
 time and it really helped
 3) migrate the OSD memory to the correct CPU with migratepages
  - we will use cgroups in the future for this, should make life easier and
 is the only correct solution

 It is similar to the effect of just restarting the OSD, but much better -
 since we immediately see hundreds of connections on a freshly restarted OSD
 (and in the benchmark the tcmalloc issue manifested with just two clients in
 parallel) I’d say we never saw the raw performance with tcmalloc
 (undegraded), but it was never this good - consistently low latencies, much
 smaller spikes when something happens and much lower CPU usage (about 50%
 savings but we’re also backfilling a lot on the background). Workloads are
 faster as well - like reweighting OSDs on that same node was much (hundreds
 of percent) faster.

 So far the effect has been drastic. I wonder why tcmalloc was even used when
 people are having problems with it? The glibc malloc seems to work just fine
 for us.

 The only concerning thing is the virtual memory usage - we are over 400GB
 VSS with a few OSDs. That doesn’t hurt anything, though.

 Jan


 On 24 Jun 2015, at 18:46, Robert LeBlanc rob...@leblancnet.us wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Did you see what the effect of just restarting the OSDs was, while still
 using tcmalloc? I've noticed that there is usually a good drop for us just by
 restarting them. I don't think it is usually this drastic.

 - 
 Robert LeBlanc
 GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

 On Wed, Jun 24, 2015 at 2:08 AM, Jan Schermer  wrote:
 Can you guess when we did that?
 Still on dumpling, btw...

 http://www.zviratko.net/link/notcmalloc.png

 Jan

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 -BEGIN PGP SIGNATURE-
 Version: Mailvelope v0.13.1
 Comment: https://www.mailvelope.com

 wsFcBAEBCAAQBQJVit75CRDmVDuy+mK58QAAmjcP/jU+wyohdwKDP+FHDAgJ
 DcqdB5aPG2AM79iLcYUub5bQjdNJpcWN/hyZcNdF3aSzEV3aY6jIqu9OpOIB
 c2fIzfGOoczzW/FEf7qKRVGpxaQL21Sw1LpwMEscNe0ETz9HMHoaAnBO9IFn
 nUEOCdEpRBO5W1rWwNAx9EVnOUPklb7vVEpY23sgtHhQSprb9oeO8D99AMRz
 /RhdHKlRDgHBjun/stCiR6lFuvBUx0GBmyaMuO5rfsLGRIkySLv++3CLQI6X
 NCt/MjYwTTNNfO/y/MjkiV/j+Cm1G1lcjlgbDjilf7bgf8/7W2vJa1sMtaA4
 xJL+PpZxiKcGSdC96B+EBYxLhLcwsNpbfq7uxQOkIspa66mkIMAVzJgt4DFL
 Ca+UY3ODA26VtWF5U/hkdupgld+YSxXTyJakeShrBSFAX0a4cygV9Ll7SIhO
 IDS+0Mbur0IGzIWRgtCQhRXsc7wn3IoIovqe8Nfk4xupeoK2P5UHO1rW9pWy
 Jwj5PXieDqxgx8RKlulN1bCbSgTaEdveTiqqVxlnM9L0MhgesuB8vkpHbsqn
 mYJHNzU7ghU89xLnRuia9rBlpjw4OzagfowAJTH3UnaO67kxES+IWO8onQbN
 RhY0QR5cB5rVSjYkzzlsuLM17fQPcT8++yMarKdsrr6WIGppXUFFdATAqIaY
 DHD1
 =goL4
 -END PGP SIGNATURE-



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben Hines
When these are fixed it would be great to get good steps for listing /
cleaning up any orphaned objects. I have suspicions this is affecting us.

thanks-

-Ben

On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com
wrote:

 These ones:

 http://tracker.ceph.com/issues/10295
 http://tracker.ceph.com/issues/11447

 - Original Message -
  From: Ben Jackson b@benjackson.email
  To: Yehuda Sadeh-Weinraub yeh...@redhat.com
  Cc: ceph-users ceph-us...@ceph.com
  Sent: Friday, April 24, 2015 3:06:02 PM
  Subject: Re: [ceph-users] Shadow Files
 
  We were firefly, then we upgraded to giant, now we are on hammer.
 
  What issues?
 
  On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
  
   What version are you running? There are two different issues that we
 were
   fixing this week, and we should have that upstream pretty soon.
  
   Yehuda
  
   - Original Message -
From: Ben b@benjackson.email
To: ceph-users ceph-us...@ceph.com
Cc: Yehuda Sadeh-Weinraub yeh...@redhat.com
Sent: Thursday, April 23, 2015 7:42:06 PM
Subject: [ceph-users] Shadow Files
   
We are still experiencing a problem with out gateway not properly
clearing out shadow files.
   
I have done numerous tests where I have:
-Uploaded a file of 1.5GB in size using s3browser application
-Done an object stat on the file to get its prefix
-Done rados ls -p .rgw.buckets | grep prefix to count the number of
shadow files associated (in this case it is around 290 shadow files)
-Deleted said file with s3browser
-Performed a gc list, which shows the ~290 files listed
-Waited 24 hours to redo the rados ls -p .rgw.buckets | grep
 prefix to
recount the shadow files only to be left with 290 files still there
   
 From log output /var/log/ceph/radosgw.log, I can see the following
 when
clicking DELETE (this appears 290 times)
2015-04-24 10:43:29.996523 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=4718592 stripe_ofs=4718592 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996557 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=8912896 stripe_ofs=8912896 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996564 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=13107200 stripe_ofs=13107200 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996570 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=17301504 stripe_ofs=17301504 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996576 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=21495808 stripe_ofs=21495808 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996581 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=25690112 stripe_ofs=25690112 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996586 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=29884416 stripe_ofs=29884416 part_ofs=0 rule-part_size=0
2015-04-24 10:43:29.996592 7f0b0afb5700  0
 RGWObjManifest::operator++():
result: ofs=34078720 stripe_ofs=34078720 part_ofs=0 rule-part_size=0
   
In this same log, I also see the gc process saying it is removing
 said
file (these records appear 290 times too)
2015-04-23 14:16:27.926952 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.928572 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.929636 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.930448 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.931226 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.932103 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
2015-04-23 14:16:27.933470 7f15be0ee700  0 gc::process: removing
.rgw.buckets:objectname
   
So even though it appears that the GC is processing its removal, the
shadow files remain!
   
Please help!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
   
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
Ah, nevermind - I had to pass the --bucket=bucketname argument.

You'd think the command would print an error when a critical argument is missing.
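i.e. the form that actually worked here was:

  radosgw-admin bucket rm --bucket=mike-cache2 --purge-objects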

-Ben

On Wed, Mar 4, 2015 at 6:06 PM, Ben Hines bhi...@gmail.com wrote:
 One of the release notes says:
 rgw: fix bucket removal with data purge (Yehuda Sadeh)

 Just tried this and it didnt seem to work:


 bash-4.1$ time radosgw-admin bucket rm mike-cache2 --purge-objects

 real0m7.711s
 user0m0.109s
 sys 0m0.072s

 Yet the bucket was not deleted, nor purged:

 -bash-4.1$ radosgw-admin bucket stats
 [

"mike-cache2",
{
"bucket": "mike-cache2",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets.index",
"id": "default.2769570.4",
"marker": "default.2769570.4",
"owner": "smbuildmachine",
"ver": "0#329",
"master_ver": "0#0",
"mtime": "2014-11-11 16:10:31.00",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size_kb": 223355,
"size_kb_actual": 223768,
"num_objects": 164
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
},

 ]




 -Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
One of the release notes says:
rgw: fix bucket removal with data purge (Yehuda Sadeh)

Just tried this and it didnt seem to work:


bash-4.1$ time radosgw-admin bucket rm mike-cache2 --purge-objects

real0m7.711s
user0m0.109s
sys 0m0.072s

Yet the bucket was not deleted, nor purged:

-bash-4.1$ radosgw-admin bucket stats
[

"mike-cache2",
{
"bucket": "mike-cache2",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets.index",
"id": "default.2769570.4",
"marker": "default.2769570.4",
"owner": "smbuildmachine",
"ver": "0#329",
"master_ver": "0#0",
"mtime": "2014-11-11 16:10:31.00",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size_kb": 223355,
"size_kb_actual": 223768,
"num_objects": 164
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
},

]




-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
Blind-bucket would be perfect for us, as we don't need to list the objects.

We only need to list the bucket when doing a bucket deletion. If we
could clean out/delete all objects in a bucket (without
iterating/listing them) that would be ideal..
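(Today the closest thing seems to be letting radosgw-admin do the iteration for
us, e.g. 'radosgw-admin bucket rm --bucket=<name> --purge-objects', but as far
as I can tell that still walks the whole index, which is exactly the part we'd
like to skip.)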

On Mon, Mar 2, 2015 at 7:34 PM, GuangYang yguan...@outlook.com wrote:
 We have had good experience so far keeping each bucket to less than 0.5 million
 objects, by client side sharding. But I think it would be nice if you can test
 at your scale, with your hardware configuration, as well as your expectation
 over the tail latency.

 Generally the bucket sharding should help, both for write throughput and
 *stalls during recovering/scrubbing*, but it comes with a price - with X shards
 per bucket, the listing/trimming is X times as heavy from the OSD load's point
 of view. There was discussion to implement: 1) blind buckets (for use cases
 where bucket listing is not needed), 2) un-ordered listing, which could improve
 the problem I mentioned above. They are on the roadmap...

 Thanks,
 Guang


 
 From: bhi...@gmail.com
 Date: Mon, 2 Mar 2015 18:13:25 -0800
 To: erdem.agao...@gmail.com
 CC: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Some long running ops may lock osd

 We're seeing a lot of this as well. (as i mentioned to sage at
 SCALE..) Is there a rule of thumb at all for how big is safe to let a
 RGW bucket get?

 Also, is this theoretically resolved by the new bucket-sharding
 feature in the latest dev release?

 -Ben

 On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu erdem.agao...@gmail.com 
 wrote:
 Hi Gregory,

 We are not using listomapkeys that way or in any way to be precise. I used
 it here just to reproduce the behavior/issue.

 What i am really interested in is if scrubbing-deep actually mitigates the
 problem and/or is there something that can be further improved.

 Or i guess we should go upgrade now and hope for the best :)

 On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com wrote:

 On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
 Hi all, especially devs,

 We have recently pinpointed one of the causes of slow requests in our
 cluster. It seems deep-scrubs on pg's that contain the index file for a
  large radosgw bucket lock the osds. Increasing op threads and/or disk
 threads
 helps a little bit, but we need to increase them beyond reason in order
 to
 completely get rid of the problem. A somewhat similar (and more severe)
 version of the issue occurs when we call listomapkeys for the index
 file,
 and since the logs for deep-scrubbing was much harder read, this
 inspection
 was based on listomapkeys.

 In this example osd.121 is the primary of pg 10.c91 which contains file
 .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
 ~500k objects. Standard listomapkeys call take about 3 seconds.

 time rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null
 real 0m2.983s
 user 0m0.760s
 sys 0m0.148s

 In order to lock the osd we request 2 of them simultaneously with
 something
 like:

 rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null 
 sleep 1
 rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null 

 'debug_osd=30' logs show the flow like:

 At t0 some thread enqueue_op's my omap-get-keys request.
 Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
 keys.
 Op-Thread B responds to several other requests during that 1 second
 sleep.
 They're generally extremely fast subops on other pgs.
 At t1 (about a second later) my second omap-get-keys request gets
 enqueue_op'ed. But it does not start probably because of the lock held
 by
 Thread A.
 After that point other threads enqueue_op other requests on other pgs
 too
 but none of them starts processing, in which i consider the osd is
 locked.
 At t2 (about another second later) my first omap-get-keys request is
 finished.
 Op-Thread B locks pg 10.c91 and dequeue_op's my second request and
 starts
 reading ~500k keys again.
 Op-Thread A continues to process the requests enqueued in t1-t2.

 It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
 can
 process other requests for other pg's just fine.

 My guess is a somewhat larger scenario happens in deep-scrubbing, like
 on
  the pg containing index for the bucket of 20M objects. A disk/op thread
 starts reading through the omap which will take say 60 seconds. During
 the
 first seconds, other requests for other pgs pass just fine. But in 60
 seconds there are bound to be other requests for the same pg, especially
 since it holds the index file. Each of these requests lock another
 disk/op
 thread to the point where there are no free threads left to process any
 requests for any pg. Causing slow-requests.

 So first of all thanks if you can make it here, and sorry for the
 involved
 mail, i'm exploring the problem as i go.
 Now, is that deep-scrubbing situation i tried to theorize even 

Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
We're seeing a lot of this as well. (as i mentioned to sage at
SCALE..) Is there a rule of thumb at all for how big is safe to let a
RGW bucket get?

Also, is this theoretically resolved by the new bucket-sharding
feature in the latest dev release?

-Ben

On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu erdem.agao...@gmail.com wrote:
 Hi Gregory,

 We are not using listomapkeys that way or in any way to be precise. I used
 it here just to reproduce the behavior/issue.

 What i am really interested in is if scrubbing-deep actually mitigates the
 problem and/or is there something that can be further improved.

 Or i guess we should go upgrade now and hope for the best :)

 On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com wrote:

 On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu erdem.agao...@gmail.com
 wrote:
  Hi all, especially devs,
 
  We have recently pinpointed one of the causes of slow requests in our
  cluster. It seems deep-scrubs on pg's that contain the index file for a
   large radosgw bucket lock the osds. Increasing op threads and/or disk
  threads
  helps a little bit, but we need to increase them beyond reason in order
  to
  completely get rid of the problem. A somewhat similar (and more severe)
  version of the issue occurs when we call listomapkeys for the index
  file,
  and since the logs for deep-scrubbing was much harder read, this
  inspection
  was based on listomapkeys.
 
  In this example osd.121 is the primary of pg 10.c91 which contains file
  .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket contains
  ~500k objects. Standard listomapkeys call take about 3 seconds.
 
  time rados -p .rgw.buckets listomapkeys .dir.5926.3  /dev/null
  real 0m2.983s
  user 0m0.760s
  sys 0m0.148s
 
  In order to lock the osd we request 2 of them simultaneously with
  something
  like:
 
  rados -p .rgw.buckets listomapkeys .dir.5926.3  /dev/null 
  sleep 1
  rados -p .rgw.buckets listomapkeys .dir.5926.3  /dev/null 
 
  'debug_osd=30' logs show the flow like:
 
  At t0 some thread enqueue_op's my omap-get-keys request.
  Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading ~500k
  keys.
  Op-Thread B responds to several other requests during that 1 second
  sleep.
  They're generally extremely fast subops on other pgs.
  At t1 (about a second later) my second omap-get-keys request gets
  enqueue_op'ed. But it does not start probably because of the lock held
  by
  Thread A.
  After that point other threads enqueue_op other requests on other pgs
  too
  but none of them starts processing, in which i consider the osd is
  locked.
  At t2 (about another second later) my first omap-get-keys request is
  finished.
  Op-Thread B locks pg 10.c91 and dequeue_op's my second request and
  starts
  reading ~500k keys again.
  Op-Thread A continues to process the requests enqueued in t1-t2.
 
  It seems Op-Thread B is waiting on the lock held by Op-Thread A while it
  can
  process other requests for other pg's just fine.
 
  My guess is a somewhat larger scenario happens in deep-scrubbing, like
  on
  the pg containing index for the bucket of 20M objects. A disk/op thread
  starts reading through the omap which will take say 60 seconds. During
  the
  first seconds, other requests for other pgs pass just fine. But in 60
  seconds there are bound to be other requests for the same pg, especially
  since it holds the index file. Each of these requests lock another
  disk/op
  thread to the point where there are no free threads left to process any
  requests for any pg. Causing slow-requests.
 
  So first of all thanks if you can make it here, and sorry for the
  involved
  mail, i'm exploring the problem as i go.
  Now, is that deep-scrubbing situation i tried to theorize even possible?
  If
  not can you point us where to look further.
  We are currently running 0.72.2 and know about newer ioprio settings in
  Firefly and such. While we are planning to upgrade in a few weeks but i
  don't think those options will help us in any way. Am i correct?
  Are there any other improvements that we are not aware?

 This is all basically correct; it's one of the reasons you don't want
 to let individual buckets get too large.

 That said, I'm a little confused about why you're running listomapkeys
 that way. RGW throttles itself by getting only a certain number of
 entries at a time (1000?) and any system you're also building should
 do the same. That would reduce the frequency of any issues, and I
 *think* that scrubbing has some mitigating factors to help (although
 maybe not; it's been a while since I looked at any of that stuff).
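 (For illustration, that kind of bounded read looks roughly like this from
 python-rados - a sketch only, assuming a binding new enough to expose the
 read-op omap calls, and reusing the pool/object names from the example above:)

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('.rgw.buckets')

start_after = ''
total = 0
while True:
    with rados.ReadOpCtx() as op:
        # fetch at most 1000 omap keys per round trip instead of the whole index at once
        it, _ = ioctx.get_omap_vals(op, start_after, '', 1000)
        ioctx.operate_read_op(op, '.dir.5926.3')
        keys = [k for k, _ in it]
    if not keys:
        break
    total += len(keys)
    start_after = keys[-1]

print('%d keys' % total)
ioctx.close()
cluster.shutdown()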

 Although I just realized that my vague memory of deep scrubbing
 working better might be based on improvements that only got in for
 firefly...not sure.
 -Greg




 --
 erdem agaoglu

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  1   2   >