Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
Blind-bucket would be perfect for us, as we don't need to list the objects. We only need to list the bucket when doing a bucket deletion. If we could clean out/delete all objects in a bucket (without iterating/listing them), that would be ideal. On Mon, Mar 2, 2015 at 7:34 PM, GuangYang
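
For reference, a rough sketch of how a blind (indexless) bucket setup can be approximated by editing the zone so a placement target carries index_type 1; the file name is a placeholder and this should be verified against your release before use:

  radosgw-admin zone get > zone.json
  # edit zone.json: set "index_type": 1 on the desired placement target
  radosgw-admin zone set < zone.json
  # buckets created under that placement afterwards keep no bucket index,
  # so they cannot be listed - matching the use case described above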

Re: [ceph-users] Some long running ops may lock osd

2015-03-02 Thread Ben Hines
We're seeing a lot of this as well (as I mentioned to Sage at SCALE). Is there a rule of thumb at all for how big it is safe to let an RGW bucket get? Also, is this theoretically resolved by the new bucket-sharding feature in the latest dev release? -Ben On Mon, Mar 2, 2015 at 11:08 AM, Erdem

Re: [ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
Ah, never mind - I had to pass the --bucket=bucketname argument. You'd think the command would print an error if this critical argument were missing. -Ben On Wed, Mar 4, 2015 at 6:06 PM, Ben Hines bhi...@gmail.com wrote: One of the release notes says: rgw: fix bucket removal with data purge (Yehuda

[ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
One of the release notes says: rgw: fix bucket removal with data purge (Yehuda Sadeh) Just tried this and it didn't seem to work: bash-4.1$ time radosgw-admin bucket rm mike-cache2 --purge-objects real 0m7.711s user 0m0.109s sys 0m0.072s Yet the bucket was not deleted, nor purged:
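
The working invocation, per the reply above, passes the bucket name explicitly:

  radosgw-admin bucket rm --bucket=mike-cache2 --purge-objects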

Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben Hines
When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: These ones: http://tracker.ceph.com/issues/10295

Re: [ceph-users] Switching from tcmalloc

2015-06-24 Thread Ben Hines
Did you do before/after Ceph performance benchmarks? I don't care if my systems are using 80% CPU if Ceph performance is better than when they're using 20% CPU. Can you share any scripts you have to automate these things? (NUMA pinning, migratepages) thanks, -Ben On Wed, Jun 24, 2015 at 10:25 AM,
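
A minimal sketch of the sort of pinning being asked about, assuming a two-socket host and the numactl package; the OSD id, node numbers, and paths are placeholders, not a recommendation:

  # start an OSD bound to NUMA node 0 (CPU and memory)
  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 12 --cluster ceph
  # move the memory pages of an already-running OSD from node 1 to node 0
  migratepages <osd-pid> 1 0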

Re: [ceph-users] Check networking first?

2015-07-31 Thread Ben Hines
I encountered a similar problem. Incoming firewall ports were blocked on one host, so the other OSDs kept marking that OSD as down. But it could talk out, so it kept saying 'hey, I'm up, mark me up', and then the other OSDs started trying to send it data again, causing backed-up requests.. Which

Re: [ceph-users] radosgw only delivers whats cached if latency between keyrequest and actual download is above 90s

2015-08-21 Thread Ben Hines
I just tried this (with some smaller objects, maybe 4.5 MB, as well as with a 16 GB file) and it worked fine. However, I am using the apache + fastcgi interface to rgw, rather than civetweb. -Ben On Fri, Aug 21, 2015 at 12:19 PM, Sean seapasu...@uchicago.edu wrote: We heavily use radosgw here for

[ceph-users] optimizing non-ssd journals

2015-08-07 Thread Ben Hines
Our cluster is primarily used for RGW, but we would like to use it for RBD eventually... We don't have SSDs for our journals (for a while yet) and we're still updating our cluster to 10GbE. I do see some pretty high commit and apply latencies in 'osd perf', often 100-500 ms, which figure is a result of

Re: [ceph-users] НА: CEPH cache layer. Very slow

2015-08-14 Thread Ben Hines
Nice to hear that you have had no SSD failures yet in 10 months. How many OSDs are you running, and what is your primary ceph workload? (RBD, rgw, etc?) -Ben On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович me...@yuterra.ru wrote: Hi! Of course, it isn't cheap at all, but we use Intel

[ceph-users] radosgw bucket index sharding tips?

2015-07-07 Thread Ben Hines
Anyone have any data on the optimal number of shards for a radosgw bucket index? We've had issues with bucket index contention with a few million+ objects in a single bucket, so I'm testing out the sharding. Perhaps at least one shard per OSD? Or less? More? I noticed some discussion here regarding slow
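
For context, the option referenced elsewhere in this archive goes in ceph.conf on the radosgw hosts; the section name and value below are only examples, and it affects newly created buckets only:

  [client.radosgw.gw1]
  rgw override bucket index max shards = 16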

Re: [ceph-users] Transfering files from NFS to ceph + RGW

2015-07-08 Thread Ben Hines
It's really about 10 minutes of work to write a python client to post files into RGW/S3 (we use boto). Or you could use an S3 GUI client such as Cyberduck. The problem I am having, and which you should look out for, is that many millions of objects in a single RGW bucket cause problems with
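
A minimal boto (v2) sketch of the kind of uploader described; the endpoint, credentials, and names are placeholders:

  import boto
  import boto.s3.connection

  conn = boto.connect_s3(
      aws_access_key_id='ACCESS_KEY',
      aws_secret_access_key='SECRET_KEY',
      host='rgw.example.com', port=80, is_secure=False,
      calling_format=boto.s3.connection.OrdinaryCallingFormat(),
  )
  bucket = conn.create_bucket('my-bucket')          # or conn.get_bucket('my-bucket')
  key = bucket.new_key('path/to/object')            # upload one local file as an S3 key
  key.set_contents_from_filename('/tmp/localfile')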

Re: [ceph-users] Hammer issues (rgw)

2015-07-08 Thread Ben Hines
Also recently updated to 94.2. I am also seeing a large difference between my 'ceph df' and 'size_kb_actual' in the bucket stats. I would assume the difference is objects awaiting gc, but 'gc list' prints very little. ceph df: NAME ID USED %USED MAX AVAIL

Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Ben Hines
How many objects in the bucket? RGW has problems with index size once the number of objects gets into the 900k+ range. The buckets need to be recreated with 'sharded bucket indexes' on: rgw override bucket index max shards = 23 You could also try repairing the index with: radosgw-admin bucket
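
The repair command cut off above is presumably along these lines; the bucket name is a placeholder, and --check-objects can take a long time on large buckets:

  radosgw-admin bucket check --bucket=<bucketname> --fix --check-objects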

Re: [ceph-users] Troubleshooting rgw bucket list

2015-08-28 Thread Ben Hines
in the ceph logs or health warnings. r, Sam On 28-08-15 17:49, Ben Hines wrote: How many objects in the bucket? RGW has problems with index size once the number of objects gets into the 900k+ range. The buckets need to be recreated with 'sharded bucket indexes' on: rgw override bucket index

Re: [ceph-users] a couple of radosgw questions

2015-08-29 Thread Ben Hines
at 5:14 PM, Brad Hubbard bhubb...@redhat.com wrote: - Original Message - From: Ben Hines bhi...@gmail.com To: Brad Hubbard bhubb...@redhat.com Cc: Tom Deneau tom.den...@amd.com, ceph-users ceph-us...@ceph.com Sent: Saturday, 29 August, 2015 9:49:00 AM Subject: Re: [ceph-users

Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
No input, eh? (Or maybe TL;DR for everyone.) Short version: presuming the bucket index shows blank/empty, which it does and is fine, would my manually deleting the rados objects with the prefix matching the former bucket's ID cause any problems? thanks, -Ben On Fri, Aug 28, 2015 at 4:22 PM, Ben

Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
ly being used for the > specific bucket that was previously removed, then it is safe to remove > these objects. But please do double check and make sure that there's > no other bucket that matches this prefix somehow. > > Yehuda > > On Mon, Aug 31, 2015 at 2:42 PM, Ben Hines <

Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-31 Thread Ben Hines
e you use the underscore also, e.g., "default.8873277.32_". > Otherwise you could potentially erase objects you didn't intend to, > like ones that start with "default.8873277.320" and such. > > On Mon, Aug 31, 2015 at 3:20 PM, Ben Hines <bhi...@gmail.com> wrote: >> Ok

Re: [ceph-users] Moving/Sharding RGW Bucket Index

2015-09-01 Thread Ben Hines
We also run RGW buckets with many millions of objects and had to shard our existing buckets. We did have to delete the old ones first, unfortunately. I haven't tried moving the index pool to an SSD ruleset - would also be interested in folks' experiences with this. Thanks for the information on

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Ben Hines
to > bring things back into order. > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Wang, Warren >> Sent: 04 September 2015 01:21 >> To: Mark Nelson <mnel...@redhat.com>; Ben Hines <bhi...@gmail.com

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-03 Thread Ben Hines
Hey Mark, I've just tweaked these filestore settings for my cluster -- after changing this, is there a way to make ceph move existing objects around to new filestore locations, or will this only apply to newly created objects? (I would assume the latter.) thanks, -Ben On Wed, Jul 8, 2015 at
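
The settings are not named in this excerpt; assuming the thread's usual subject, they would be the filestore split/merge thresholds, e.g. (values are only examples):

  filestore merge threshold = 40
  filestore split multiple = 8

As far as I can tell these only take effect as PG directories next cross a split or merge boundary, which matches the 'latter' assumption above.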

Re: [ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-09-08 Thread Ben Hines
st likely in the .rgw.buckets.index pool. > > Yehuda > > On Mon, Aug 31, 2015 at 3:27 PM, Ben Hines <bhi...@gmail.com> wrote: >> Good call, thanks! >> >> Is there any risk of also deleting parts of the bucket index? I'm not >> s

Re: [ceph-users] purpose of different default pools created by radosgw instance

2015-09-09 Thread Ben Hines
The Ceph docs in general could use a lot of improvement, IMO. There are many, many settings listed, but one must dive into the mailing list to learn which ones are worth tweaking (And often, even *what they do*!) -Ben On Wed, Sep 9, 2015 at 3:51 PM, Mark Kirkwood

[ceph-users] osds revert to 'prepared' after reboot

2015-09-24 Thread Ben Hines
Any idea why OSDs might revert to 'prepared' after reboot and have to be activated again? These are older nodes which were manually deployed, not using ceph-deploy. CentOS 6.7, Hammer 94.3 -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com

[ceph-users] rgw cache lru size

2015-09-23 Thread Ben Hines
We have a ton of memory on our RGW servers, 96GB. Can someone explain how the rgw lru cache functions? Is it worth bumping the 'rgw cache lru size' to a huge number? Our gateway seems to only be using about 1G of memory with the default setting. Also currently still using apache/fastcgi due to
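
For reference, the option is an entry count rather than a byte size, which would explain the modest memory footprint; a hedged ceph.conf example (the value is arbitrary):

  [client.radosgw.gw1]
  rgw cache lru size = 100000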

Re: [ceph-users] osds revert to 'prepared' after reboot

2015-09-24 Thread Ben Hines
Aha, it seems like '--mark-init auto' (supposedly the default arg to ceph-disk activate?) must be failing. I'll try re-activating my OSDs with an explicit init system passed in. -Ben On Thu, Sep 24, 2015 at 12:49 PM, Ben Hines <bhi...@gmail.com> wrote: > Any idea why OSDs mig

[ceph-users] Still have orphaned rgw shadow files, ceph 0.94.3

2015-08-28 Thread Ben Hines
Ceph 0.93-94.2-94.3 I noticed my pool's used data amount is about twice the bucket's used data count. This bucket was emptied long ago. It has zero objects: globalcache01, { bucket: globalcache01, pool: .rgw.buckets, index_pool: .rgw.buckets.index, id:

Re: [ceph-users] a couple of radosgw questions

2015-08-28 Thread Ben Hines
16:22:38 root@sm-cephrgw4 /etc/ceph $ radosgw-admin temp remove unrecognized arg remove usage: radosgw-admin cmd [options...] commands: temp remove  remove temporary objects that were created up to specified date (and optional time) On Fri, Aug

Re: [ceph-users] Ceph 9.2 fails to install in COS 7.1.1503: Report and Fix

2015-12-09 Thread Ben Hines
FYI - same issue when installing Hammer, 94.5. I also fixed it by enabling the cr repo. -Ben On Tue, Dec 8, 2015 at 5:13 PM, Goncalo Borges wrote: > Hi Cephers > > This is just to report an issue (and a workaround) regarding dependencies > in Centos 7.1.1503 > >

Re: [ceph-users] radosgw bucket index sharding tips?

2015-12-16 Thread Ben Hines
texo.com> wrote: > >> Hi Ben & everyone, >> >> just following up on this one from July, as I don't think there's been >> a reply here then. >> >> On Wed, Jul 8, 2015 at 7:37 AM, Ben Hines <bhi...@gmail.com> wrote: >> > Anyone have a

Re: [ceph-users] radosgw bucket index sharding tips?

2015-12-16 Thread Ben Hines
On Wed, Dec 16, 2015 at 11:05 AM, Florian Haas wrote: > Hi Ben & everyone, > > > Ben, you wrote elsewhere > ( > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003955.html > ) > that you found approx. 900k objects to be the threshold where index > sharding

Re: [ceph-users] How to run multiple RadosGW instances under the same zone

2016-01-04 Thread Ben Hines
It works fine. The federated config reference is not related to running multiple instances in the same zone. Just set up 2 radosgws and give each instance the exact same configuration. (I use different client names in ceph.conf, but I bet it would work even if the client names were identical.)
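
A sketch of the ceph.conf layout implied here, with two differently named clients serving the same zone; hostnames, ports, and keyring paths are placeholders:

  [client.radosgw.gw1]
  host = rgw-host-1
  keyring = /etc/ceph/ceph.client.radosgw.gw1.keyring
  rgw frontends = civetweb port=80

  [client.radosgw.gw2]
  host = rgw-host-2
  keyring = /etc/ceph/ceph.client.radosgw.gw2.keyring
  rgw frontends = civetweb port=80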

Re: [ceph-users] Dealing with radosgw and large OSD LevelDBs: compact, start over, something else?

2015-12-21 Thread Ben Hines
I'd be curious to compare benchmarks. What size objects are you putting? 10gig end to end from client to RGW server to OSDs? I wouldn't be surprised if mine is pretty slow though in comparison, since we still don't have SSD journals. So I have not paid much attention to upload speed. Our omap

Re: [ceph-users] upgrading 0.94.5 to 9.2.0 notes

2016-01-26 Thread Ben Hines
I see the same list of issues, particularly where ceph.target doesn't function until i 'enable' the daemons individually. It would be nice if the package enabled the daemons when it is installed, so that ceph.target works. Perhaps this could be fixed for Jewel? -Ben On Sat, Nov 21, 2015 at

Re: [ceph-users] Upgrading Ceph

2016-02-01 Thread Ben Hines
Upgrades have been easy for me, following the steps. I would say to be careful not to 'miss' one OSD, or forget to restart it after updating, since having an OSD on a different version than the rest of the cluster for too long during an upgrade started to cause issues when I missed one once.

[ceph-users] RGW Civetweb + CentOS7 boto errors

2016-01-29 Thread Ben Hines
After updating our RGW servers to CentOS 7 + civetweb, when hit with a fair amount of load (20 gets/sec + a few puts/sec) I'm seeing 'BadStatusLine' exceptions from boto relatively often. They happen most when calling bucket.get_key() (about 10 times in 1000). These appear to be possibly random TCP

[ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-02-24 Thread Ben Hines
Any idea what is going on here? I get these intermittently, especially with very large files. The client is doing RANGE requests on this >51 GB file, incrementally fetching later chunks. 2016-02-24 16:30:59.669561 7fd33b7fe700 1 == starting new request req=0x7fd32c0879c0 = 2016-02-24

Re: [ceph-users] How to observed civetweb.

2016-01-19 Thread Ben Hines
Hey Kobi, You stated: > >> You can add: > >> access_log_file=/var/log/civetweb/access.log > >> error_log_file=/var/log/civetweb/error.log > >> > >> to 'rgw frontends' in ceph.conf though these logs are thin on info > >> (Source IP, date, and request) How is this done exactly in the config
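
One way the quoted advice is commonly written out, merging the log options into the frontends line (paths and port are placeholders):

  [client.radosgw.gw1]
  rgw frontends = civetweb port=80 access_log_file=/var/log/civetweb/access.log error_log_file=/var/log/civetweb/error.log

The log directory must exist and be writable by the radosgw process.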

Re: [ceph-users] How to observed civetweb.

2016-01-19 Thread Ben Hines
documentation, at all, on civetweb + radosgw on the Ceph site would be awesome.. Currently it all only references Apache+FastCGi. On Tue, Jan 19, 2016 at 8:42 PM, Ben Hines <bhi...@gmail.com> wrote: > Hey Kobi, > > You stated: > > > >> You can add: > > >> *ac

[ceph-users] incorrect numbers in ceph osd pool stats

2016-02-18 Thread Ben Hines
Ceph 9.2.0 Anyone seen this? Crazy numbers in osd stats command ceph osd stats pool .rgw.buckets id 12 2/39 objects degraded (5.128%) -105/39 objects misplaced (-269.231%) recovery io 20183 kB/s, 36 objects/s client io 79346 kB/s rd, 703 kB/s wr, 476 op/s ceph osd stats -f json

Re: [ceph-users] ceph-disk from jewel has issues on redhat 7

2016-03-15 Thread Ben Hines
It seems like ceph-disk is often breaking on centos/redhat systems. Does it have automated tests in the ceph release structure? -Ben On Tue, Mar 15, 2016 at 8:52 AM, Stephen Lord wrote: > > Hi, > > The ceph-disk (10.0.4 version) command seems to have problems operating

Re: [ceph-users] Radosgw (civetweb) hangs once around 850 established connections

2016-03-19 Thread Ben Hines
What OS are you using? I have a lot more open connections than that. (Though I have some other issues where rgw sometimes returns 500 errors, it doesn't stop like yours.) You might try tuning civetweb's num_threads and 'rgw num rados handles': rgw frontends = civetweb num_threads=125
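
The last line above is truncated; a fuller example of the two knobs being suggested (the values are illustrative, not recommendations):

  rgw frontends = civetweb port=80 num_threads=125
  rgw num rados handles = 8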

Re: [ceph-users] rgw bucket deletion woes

2016-03-19 Thread Ben Hines
We would be a big user of this. We delete large buckets often and it takes forever. Though didn't I read that 'object expiration' support is on the near-term RGW roadmap? That may do what we want.. we're creating thousands of objects a day, and thousands of objects a day will be expiring, so RGW

[ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-07 Thread Ben Hines
Howdy, I was hoping someone could help me recover a couple pgs which are causing problems in my cluster. If we aren't able to resolve this soon, we may have to just destroy them and lose some data. Recovery has so far been unsuccessful. Data loss would probably cause some here to reconsider Ceph

[ceph-users] abort slow requests ?

2016-03-03 Thread Ben Hines
I have a few bad objects in ceph which are 'stuck on peering'. The clients hit them and they build up and eventually stop all traffic to the OSD. I can open up traffic by resetting the OSD (aborting those requests) temporarily. Is there a way to tell ceph to cancel/abort these 'slow requests'

Re: [ceph-users] abort slow requests ?

2016-03-04 Thread Ben Hines
don't believe there's > a way to do this. > > On Fri, Mar 4, 2016 at 1:04 AM, Ben Hines <bhi...@gmail.com> wrote: > > I have a few bad objects in ceph which are 'stuck on peering'. The > clients > > hit them and they build up and eventually stop all traffic to the OSD

[ceph-users] radosgw refuses to initialize / waiting for peered 'notify' object

2016-03-02 Thread Ben Hines
Ceph 9.2.1. Shortly after updating 9.2.0 to 9.2.1 all radosgws are refusing to start up, it's stuck on this 'notify' object: [root@sm-cld-mtl-033 ceph]# ceph daemon /var/run/ceph/ceph-client.<>.asok objecter_requests { "ops": [ { "tid": 13, "pg": "4.88aa5c95",

Re: [ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-08 Thread Ben Hines
-map/ > > The wrong placements makes you vulnerable to a single host failure taking > out multiple copies of an object. > > David > > > On 3/7/16 9:41 PM, Ben Hines wrote: > > Howdy, > > I was hoping someone could help me recover a couple pgs which are causing &

Re: [ceph-users] Using s3 (radosgw + ceph) like a cache

2016-04-25 Thread Ben Hines
This is how we use ceph/radosgw. I'd say our cluster is not that reliable, but it's probably mostly our fault (no SSD journals, etc). However, note that deletes are very slow in ceph. We put millions of objects in very quickly and they are verrry slow to delete again, especially from RGW, because

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
! On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote: > - Original Message - > > From: "Karol Mroz" <km...@suse.com> > > To: "Ben Hines" <bhi...@gmail.com> > > Cc: "ceph-users" <ceph-users@lists.

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
Aha, I see how to use the debuginfo - trying it by running through gdb. On Wed, Apr 27, 2016 at 10:09 PM, Ben Hines <bhi...@gmail.com> wrote: > Got it again - however, the stack is exactly the same, no symbols - > debuginfo didn't resolve. Do i need to do somethi

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
fault) ** in thread 7f9e7e7f4700 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) 1: (()+0x30b0a2) [0x7fa11c5030a2] 2: (()+0xf100) [0x7fa1183fe100] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. --- logging levels --- On Wed, Apr 27, 2016 at 9:39 PM, Ben Hines

[ceph-users] radosgw crash - Infernalis

2016-04-26 Thread Ben Hines
Is this a known one? Ceph 9.2.1. Can provide more logs if needed. 2> 2016-04-26 22:07:59.662702 7f49aeffd700 1 == req done req=0x7f49c4138be0 http_status=200 == -11> 2016-04-26 22:07:59.662752 7f49aeffd700 1 civetweb: 0x7f49c4001280: 10.30.1.221 - - [26/Apr/2016:22:07:59 -0700] "HEAD

Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-23 Thread Ben Hines
I for one am terrified of upgrading due to these messages (and indications that the problem still may not be resolved even in 10.2.1) - holding off until a clean upgrade is possible without running any hacky scripts. -Ben On Mon, May 23, 2016 at 2:23 AM, nick wrote: > Hi, > we

[ceph-users] Incorrect crush map

2016-05-03 Thread Ben Hines
My crush map keeps putting some OSDs on the wrong node. Restarting them fixes it temporarily, but they eventually hop back to the other node that they aren't really on. Is there anything that can cause this to look for? Ceph 9.2.1 -Ben ___ ceph-users

[ceph-users] ceph degraded writes

2016-05-03 Thread Ben Hines
The Hammer .93 to .94 notes said: If upgrading from v0.93, set 'osd enable degraded writes = false' on all osds prior to upgrading. The degraded writes feature has been reverted due to 11155. Our cluster is now on Infernalis 9.2.1 and we still have this setting set. Can we get rid of it? Was this

Re: [ceph-users] Incorrect crush map

2016-05-05 Thread Ben Hines
On Wed, May 4, 2016 at 10:27 PM, Ben Hines <bhi...@gmail.com> wrote: > CentOS 7.2. > > .. and I think I just figured it out. One node had directories from former > OSDs in /var/lib/ceph/osd. When restarting other OSDs on this host, ceph > apparently added those to the crush m
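
Two hedged checks for this symptom, based on the cause identified above: clear out stale /var/lib/ceph/osd/ceph-NN directories left over from removed OSDs, and consider the option that controls whether a starting OSD rewrites its own CRUSH location:

  # ceph.conf, [osd] section: stop OSDs from repositioning themselves on startup
  osd crush update on start = false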

[ceph-users] waiting for rw locks on rgw index file during recovery

2016-05-06 Thread Ben Hines
Infernalis 9.2.1, CentOS 7.2. My cluster is in recovery and I've noticed a lot of 'waiting for rw locks'. Some of these can last quite a long time. Any idea what can cause this? Because this is an RGW bucket index file, this causes backup effects -- since the index can't be updated, S3 updates to

[ceph-users] RGW obj remove cls_xx_remove returned -2

2016-05-05 Thread Ben Hines
Ceph 9.2.1, CentOS 7.2. I noticed these errors sometimes when removing objects. It's getting a 'No such file or directory' error on the OSD when deleting things sometimes. Any ideas here? Is this expected? (I anonymized the full filename, but it's all the same file.) RGW log: 2016-05-04

Re: [ceph-users] Incorrect crush map

2016-05-04 Thread Ben Hines
: ceph-osd@42.service failed. -Ben On Tue, May 3, 2016 at 7:16 PM, Wade Holler <wade.hol...@gmail.com> wrote: > Hi Ben, > > What OS+Version ? > > Best Regards, > Wade > > > On Tue, May 3, 2016 at 2:44 PM Ben Hines <bhi...@gmail.com> wrote: > >> M

Re: [ceph-users] Unknown error (95->500) when creating buckets or putting files to RGW after upgrade from Infernalis to Jewel

2016-07-26 Thread Ben Hines
FWIW, this thread still has me terrified to upgrade my rgw cluster. Just when I thought it was safe. Anyone have any successful, problem-free rgw Infernalis-to-Jewel upgrade reports? On Jul 25, 2016 11:27 PM, "nick" wrote: > Hey Maciej, > I compared the output of your commands with

Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Ben Hines
I'm curious how the num_threads option to civetweb relates to 'rgw thread pool size'. Should I make them equal? ie: rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log

Re: [ceph-users] rgw static website docs 404

2017-01-19 Thread Ben Hines
features are added and effectively kept secret. -Ben On Thu, Jan 19, 2017 at 1:56 AM, Wido den Hollander <w...@42on.com> wrote: > > > Op 19 januari 2017 om 2:57 schreef Ben Hines <bhi...@gmail.com>: > > > > > > Aha! Found some docs here in the RHCS

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Yes, 200 million is way too big for a single ceph RGW bucket. We encountered this problem early on and sharded our buckets into 20 buckets, each of which has a sharded bucket index with 20 shards. Unfortunately, enabling the sharded RGW index requires recreating the bucket and all objects. The

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Nice, thanks! Must have missed that one. It might work well for our use case since we don't really need the index. -Ben On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum <gfar...@redhat.com> wrote: > On Wednesday, September 21, 2016, Ben Hines <bhi...@gmail.com> wrote: > &

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
nce difference between SSD-backed indexes > and 'blind bucket' configuration. > > Stas > > > On Sep 21, 2016, at 2:26 PM, Ben Hines <bhi...@gmail.com> wrote: > > > > Nice, thanks! Must have missed that one. It might work well for our use > case since we don'

Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
eading release notes is also required) http://docs.ceph.com/docs/master/install/upgrading-ceph/ -Ben On Mon, Dec 12, 2016 at 6:35 PM, Ben Hines <bhi...@gmail.com> wrote: > Hi! Can you clarify whether this release note applies to Jewel upgrades > only? Ie, can we go Infernalis -> K

Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
Hi! Can you clarify whether this release note applies to Jewel upgrades only? I.e., can we go Infernalis -> Kraken? It is in the 'upgrading from Jewel' section, which would imply that it doesn't apply to Infernalis -> Kraken (or any other version to Kraken), but it does say 'All clusters'.

Re: [ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
> On Fri, Dec 9, 2016 at 11:38 AM, Ben Hines <bhi...@gmail.com> wrote: > > Anyone have any good / bad experiences with Kraken? I haven't seen much > > discussion of it. Particularly from the RGW front. > > > > I'm still on Infernalis for our cluster, cons

[ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
Anyone have any good / bad experiences with Kraken? I haven't seen much discussion of it. Particularly from the RGW front. I'm still on Infernalis for our cluster, considering going up to K. thanks, -Ben ___ ceph-users mailing list

Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)

2016-12-22 Thread Ben Hines
FWIW, this is still required with Jewel 10.2.5. It sounded like it was finally fixed from the release notes, but I had the same issue. Fortunately Micha's steps are easy and fix it right up. In my case I didn't think I had any mixed RGWs - I was planning to stop them all first - but I had

Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-31 Thread Ben Hines
Hey Yehuda, Are there plans to port of this fix to Kraken? (or is there even another Kraken release planned? :) thanks! -Ben On Wed, Mar 1, 2017 at 11:33 AM, Yehuda Sadeh-Weinraub wrote: > This sounds like this bug: > http://tracker.ceph.com/issues/17076 > > Will be fixed

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-03-31 Thread Ben Hines
I'm also trying to use lifecycles (via boto3) but I'm getting permission denied trying to create the lifecycle. I'm the bucket owner with FULL_CONTROL and WRITE_ACP for good measure. Any ideas? This is debug ms=20 debug radosgw=20 2017-03-31 21:28:18.382217 7f50d0010700 2 req 8:0.000693:s3:PUT
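
For what it's worth, a minimal boto3 sketch of the call being attempted; the bucket, credentials, and rule are placeholders, and 'Prefix' is included since the related messages in this thread suggest RGW expected it at the time:

  import boto3

  s3 = boto3.client('s3', endpoint_url='http://rgw.example.com',
                    aws_access_key_id='ACCESS_KEY',
                    aws_secret_access_key='SECRET_KEY')
  s3.put_bucket_lifecycle_configuration(
      Bucket='my-bucket',
      LifecycleConfiguration={'Rules': [
          {'ID': 'expire-all', 'Prefix': '', 'Status': 'Enabled',
           'Expiration': {'Days': 30}},
      ]})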

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-02 Thread Ben Hines
, "check_on_raw": false, "max_size": -1024, "max_size_kb": 0, "max_objects": -1 }, "user_quota": { "enabled": false, "check_on_raw": false, "max_size": -1024

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-11 Thread Ben Hines
correct / user friendly result. http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html specifies 'Prefix' as Optional, so I'll put in a bug for this. -Ben On Mon, Apr 3, 2017 at 12:14 PM, Ben Hines <bhi...@gmail.com> wrote: > Interesting. > I'm wondering what th

Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread Ben Hines
Based on past LTS release dates would predict Luminous much sooner than that, possibly even in May... http://docs.ceph.com/docs/master/releases/ The docs also say "Spring" http://docs.ceph.com/docs/master/release-notes/ -Ben On Thu, Apr 13, 2017 at 12:11 PM,

[ceph-users] RGW lifecycle bucket stuck processing?

2017-04-13 Thread Ben Hines
I initiated a manual lifecycle cleanup with: radosgw-admin lc process It took over a day working on my bucket called 'bucket1' (with 2 million objects), and it seems like it eventually got stuck with about 1.7 million objects left, with uninformative errors like: (notice the timestamps) 2017-04-12
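
For reference, the manual run and the per-bucket lifecycle state can be checked with (output format varies by release):

  radosgw-admin lc list
  radosgw-admin lc process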

Re: [ceph-users] Creating journal on needed partition

2017-04-19 Thread Ben Hines
This is my experience. For creating new OSDs, i just created Rundeck jobs that run ceph-deploy. It's relatively rare that new OSDs are created, so it is fine. Originally I was automating them with configuration management tools but it tended to encounter edge cases and problems that ceph-deploy

Re: [ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-09 Thread Ben Hines
AFAIK, depending on how many you have, you are likely to end up with a 'too many pgs per OSD' warning for your main pool if you do this, because the number of PGs in a pool cannot be reduced and there will be fewer OSDs to put them on. -Ben On Wed, Mar 8, 2017 at 5:53 AM, Henrik Korkuc

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-03 Thread Ben Hines
ead of 's3://Test/INSTALL' > [root@mucsds26 s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/ > --expiry-days=365 > ERROR: Access to bucket 'Test' was denied > ERROR: S3 error: 403 (AccessDenied) > > [root s3cmd-master]# ./s3cmd --no-ssl la expire s3://Test > 2017-04-03 12:01 3123 s3://Test/INSTALL > 2017-03-31 22:36 88 s3://Test/READ

Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-05 Thread Ben Hines
Ceph's RadosGW uses garbage collection by default. Try running 'radosgw-admin gc list' to list the objects to be garbage collected, or 'radosgw-admin gc process' to trigger them to be deleted. -Ben On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu wrote: > Folks, > > > > Trying
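
As a concrete example; the --include-all flag, where available, also shows entries whose deferral time has not yet expired:

  radosgw-admin gc list --include-all
  radosgw-admin gc process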

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Ben Hines
Personally, before extreme measures like marking it lost, I would try bringing up the OSD so it's up and out -- I believe the data will still be found and rebalanced away from it by Ceph. -Ben On Thu, Apr 6, 2017 at 11:20 AM, David Welch wrote: > Hi, > We had a disk on the

Re: [ceph-users] RGW lifecycle bucket stuck processing?

2017-04-14 Thread Ben Hines
Interesting - the state went back to 'UNINITIAL' eventually, possibly because the first run never finished. Will see if it ever completes during a nightly run. -Ben On Thu, Apr 13, 2017 at 11:10 AM, Ben Hines <bhi...@gmail.com> wrote: > I initiated a manual lifecycle cleanup with: &g

Re: [ceph-users] Linear space complexity or memory leak in `Radosgw-admin bucket check --fix`

2017-07-26 Thread Ben Hines
Which version of Ceph? On Tue, Jul 25, 2017 at 4:19 AM, Hans van den Bogert wrote: > Hi All, > > I don't seem to be able to fix a bucket, a bucket which has become > inconsistent due to the use of the `inconsistent-index` flag 8). > > My ceph-admin VM has 4GB of RAM, but

Re: [ceph-users] Kraken rgw lifeycle processing nightly crash

2017-07-25 Thread Ben Hines
Gryniewicz <d...@redhat.com> wrote: > On 07/20/2017 04:48 PM, Ben Hines wrote: > >> Still having this RGWLC crash once a day or so. I do plan to update to >> Luminous as soon as that is final, but it's possible this issue will still >> occur, so i was hoping one

[ceph-users] Kraken rgw lifeycle processing nightly crash

2017-07-20 Thread Ben Hines
Still having this RGWLC crash once a day or so. I do plan to update to Luminous as soon as that is final, but it's possible this issue will still occur, so i was hoping one of the devs could take a look at it. My original suspicion was that it happens when lifecycle processing at the same time

Re: [ceph-users] Ceph UPDATE (not upgrade)

2017-04-26 Thread Ben Hines
It's probably fine, depending on the ceph version. The upgrade notes on the ceph website typically tell you the steps for each version. As of Kraken, the notes say: "You may upgrade OSDs, Monitors, and MDSs in any order. RGW daemons should be upgraded last" Previously it was always recommended

Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-05-11 Thread Ben Hines
...@nvidia.com> wrote: > I still see the issue, where the space is not getting deleted. gc process > works sometimes but sometimes it does nothing to clean the GC, as there are > no items in the GC, but still the space is used on the pool. > > > > Any ideas what the ideal conf

Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread Ben Hines
We write many millions of keys into RGW which will never be changed (until they are deleted) -- it would be interesting if we could somehow indicate this to RGW and enable reading those from the replicas as well. -Ben On Mon, May 8, 2017 at 10:18 AM, Jason Dillaman wrote:

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Ben Hines
Well, ceph journals are of course going away with the imminent bluestore. Are small SSDs still useful for something with Bluestore? For speccing out a cluster today that is a good 6+ months away from being required, which I am going to be doing, I was thinking all-SSD would be the way to go. (or

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-22 Thread Ben Hines
We used this workaround when upgrading to Kraken (which had a similar issue): > modify the zonegroup and populate the 'hostnames' array with all backend server hostnames as well as the hostname terminated by haproxy Which I'm fine with. It's definitely a change that should be noted in a more

Re: [ceph-users] DNS records for ceph

2017-05-20 Thread Ben Hines
Ceph kraken or later can use SRV records to find the mon servers. It works great and I've found it a bit easier to maintain than the static list in ceph.conf. That would presumably be on the private subnet. On May 20, 2017 7:40 AM, "David Turner" wrote: > The private
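
A hedged sketch of the DNS records this relies on, assuming the default service name and the client's search domain; names, TTLs, priorities, and weights are placeholders:

  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon3.example.com.

With records like these in place, the static mon list can be dropped from ceph.conf on clients.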

Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-05 Thread Ben Hines
FWIW lifecycle is working for us. I did have to research to find the appropriate lc config file settings, the documentation for which is found in a git pull request (waiting for another release?) rather than on the Ceph docs site. https://github.com/ceph/ceph/pull/13990 Try these: debug rgw = 20
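
The settings referenced from that pull request, roughly as they appear in later releases (treat the exact names as assumptions on older builds); the debug interval shortens the lifecycle "day" so expiration can be exercised quickly in testing:

  debug rgw = 20
  rgw lifecycle work time = 00:00-23:59
  rgw lc debug interval = 10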

Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-06 Thread Ben Hines
g a few values: > > # ceph --show-config|grep rgw_|grep lifecycle > rgw_lifecycle_enabled = true > rgw_lifecycle_thread = 1 > rgw_lifecycle_work_time = 00:00-06:00 > > > Graham > > On 06/05/2017 01:07 PM, Ben Hines wrote: > >> FWIW lifecycle is working for us. I did have

Re: [ceph-users] Ceph release cadence

2017-09-11 Thread Ben Hines
We have generally been running the latest non-LTS 'stable' release, since my cluster is slightly less mission critical than others, and there were important features to us added in both Infernalis and Kraken. But I really only care about RGW. If the rgw component could be split out of ceph into a

[ceph-users] Kraken bucket index fix failing

2017-09-14 Thread Ben Hines
Hi, A few weeks ago after running the command to fix my object index for a particular bucket with a lot of data (~26TB) and about 50k multipart objects (~1800 S3 objects), the index lost track of all previous objects and started tracking only new ones. The radosgw zone was set to index_type: 1

Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4

2017-08-30 Thread Ben Hines
The daily log rotation. -Ben On Wed, Aug 30, 2017 at 3:09 PM, Bryan Banister wrote: > Looking at the systemd service it does show that twice, at roughly the > same time and one day apart, the service did receive a HUP signal: > > Aug 29 16:31:02 carf-ceph-osd02

Re: [ceph-users] installing specific version of ceph-common

2017-10-09 Thread Ben Hines
Just encountered this same problem with 11.2.0. "yum install ceph-common-11.2.0 libradosstriper1-11.2.0 librgw2-11.2.0" did the trick. Thanks! It would be nice if it were easier to install older, non-current versions of Ceph; perhaps there is a way to fix the dependencies so that yum can figure it

Re: [ceph-users] object lifecycle and updating from jewel

2018-01-04 Thread Ben Hines
Yes, it works fine with pre existing buckets. On Thu, Jan 4, 2018 at 8:52 AM, Graham Allan wrote: > I've only done light testing with lifecycle so far, but I'm pretty sure > you can apply it to pre-existing buckets. > > Graham > > > On 01/02/2018 10:42 PM, Robert Stanford wrote: >
