[ceph-users] centos and 'print continue' support

2014-05-23 Thread Bryan Stillwell
Yesterday I went through manually configuring a ceph cluster with a rados gateway on CentOS 6.5, and I have a question about the documentation. On this page: https://ceph.com/docs/master/radosgw/config/ it mentions: "On CentOS/RHEL distributions, turn off print continue. If you have it set to
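The doc's recommendation is a ceph.conf setting; a minimal sketch of it, assuming the gateway section is named as in the docs' examples:
  [client.radosgw.gateway]
  rgw print continue = false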

Re: [ceph-users] Full OSD with 29% free

2013-10-14 Thread Bryan Stillwell
/dev/sdc1 actual 3481543, ideal 3447443, fragmentation factor 0.98% Bryan On Mon, Oct 14, 2013 at 4:35 PM, Michael Lowe j.michael.l...@gmail.com wrote: How fragmented is that file system? Sent from my iPad On Oct 14, 2013, at 5:44 PM, Bryan Stillwell bstillw...@photobucket.com wrote
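Those actual/ideal numbers look like xfs_db frag output; a hedged example of how such a figure is typically produced (device name taken from the message):
  xfs_db -r -c frag /dev/sdc1
  actual 3481543, ideal 3447443, fragmentation factor 0.98%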

Re: [ceph-users] Full OSD with 29% free

2013-10-30 Thread Bryan Stillwell
osd_mkfs_options_xfs = -f -b size=2048 The cluster is currently running the 0.71 release. Bryan On Mon, Oct 21, 2013 at 2:39 PM, Bryan Stillwell bstillw...@photobucket.com wrote: So I'm running into this issue again and after spending a bit of time reading the XFS mailing lists, I believe
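A sketch of where that mkfs option would live, assuming it goes in the [osd] section of ceph.conf:
  [osd]
  osd_mkfs_options_xfs = -f -b size=2048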

Re: [ceph-users] Full OSD with 29% free

2013-10-31 Thread Bryan Stillwell
] on behalf of Bryan Stillwell [bstillw...@photobucket.com] Sent: Wednesday, October 30, 2013 2:18 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Full OSD with 29% free I wanted to report back on this since I've made some progress on fixing this issue. After converting every OSD

Re: [ceph-users] Full OSD with 29% free

2013-10-31 Thread Bryan Stillwell
free extent size 44.7352 That gives me a little more confidence in using 2K block sizes now. :) Bryan On Thu, Oct 31, 2013 at 11:02 AM, Bryan Stillwell bstillw...@photobucket.com wrote: Shain, After getting the segfaults when running 'xfs_db -r -c freesp -s' on a couple partitions, I'm

Re: [ceph-users] CephFS First product release discussion

2013-03-05 Thread Bryan Stillwell
On Tue, Mar 5, 2013 at 12:44 PM, Kevin Decherf ke...@kdecherf.com wrote: On Tue, Mar 05, 2013 at 12:27:04PM -0600, Dino Yancey wrote: The only two features I'd deem necessary for our workload would be stable distributed metadata / MDS and a working fsck equivalent. Snapshots would be great

[ceph-users] Corruption by missing blocks

2013-04-23 Thread Bryan Stillwell
I've run into an issue where after copying a file to my cephfs cluster the md5sums no longer match. I believe I've tracked it down to some parts of the file which are missing: $ obj_name=$(cephfs title1.mkv show_location -l 0 | grep object_name | sed -e s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/) $

Re: [ceph-users] Corruption by missing blocks

2013-04-23 Thread Bryan Stillwell
wrote: On Tue, Apr 23, 2013 at 11:38 AM, Bryan Stillwell bstillw...@photobucket.com wrote: I've run into an issue where after copying a file to my cephfs cluster the md5sums no longer match. I believe I've tracked it down to some parts of the file which are missing: $ obj_name=$(cephfs

Re: [ceph-users] Corruption by missing blocks

2013-04-23 Thread Bryan Stillwell
On Tue, Apr 23, 2013 at 5:24 PM, Sage Weil s...@inktank.com wrote: On Tue, 23 Apr 2013, Bryan Stillwell wrote: I'm testing this now, but while going through the logs I saw something that might have something to do with this: Apr 23 16:35:28 a1 kernel: [692455.496594] libceph: corrupt inc

Re: [ceph-users] Corruption by missing blocks

2013-04-23 Thread Bryan Stillwell
On Tue, Apr 23, 2013 at 5:45 PM, Sage Weil s...@inktank.com wrote: On Tue, 23 Apr 2013, Bryan Stillwell wrote: On Tue, Apr 23, 2013 at 5:24 PM, Sage Weil s...@inktank.com wrote: On Tue, 23 Apr 2013, Bryan Stillwell wrote: I'm testing this now, but while going through the logs I saw

Re: [ceph-users] Corruption by missing blocks

2013-04-23 Thread Bryan Stillwell
On Tue, Apr 23, 2013 at 5:54 PM, Gregory Farnum g...@inktank.com wrote: On Tue, Apr 23, 2013 at 4:45 PM, Sage Weil s...@inktank.com wrote: On Tue, 23 Apr 2013, Bryan Stillwell wrote: On Tue, Apr 23, 2013 at 5:24 PM, Sage Weil s...@inktank.com wrote: On Tue, 23 Apr 2013, Bryan Stillwell

[ceph-users] ceph-deploy documentation fixes

2013-05-07 Thread Bryan Stillwell
With the release of cuttlefish, I decided to try out ceph-deploy and ran into some documentation errors along the way: http://ceph.com/docs/master/rados/deployment/preflight-checklist/ Under 'CREATE A USER' it has the following line: To provide full privileges to the user, add the following to
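The truncated line is the sudoers entry; a hedged sketch of its usual form (the username 'ceph' is illustrative):
  echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
  sudo chmod 0440 /etc/sudoers.d/ceph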

[ceph-users] mon problems after upgrading to cuttlefish

2013-05-22 Thread Bryan Stillwell
I attempted to upgrade my bobtail cluster to cuttlefish tonight and I believe I'm running into some mon-related issues. I did the original install manually instead of with mkcephfs or ceph-deploy, so I think that might have something to do with this error: root@a1:~# ceph-mon -d -c /etc/ceph/ceph.conf

[ceph-users] Moving an MDS

2013-06-11 Thread Bryan Stillwell
I have a cluster I originally built on argonaut and have since upgraded it to bobtail and then cuttlefish. I originally configured it with one node for both the mds and mon daemons, and 4 other nodes for hosting OSDs: a1: mon.a/mds.a b1: osd.0, osd.1, osd.2, osd.3, osd.4, osd.20 b2: osd.5,

Re: [ceph-users] Moving an MDS

2013-06-11 Thread Bryan Stillwell
On Tue, Jun 11, 2013 at 3:50 PM, Gregory Farnum g...@inktank.com wrote: You should not run more than one active MDS (less stable than a single-MDS configuration, bla bla bla), but you can run multiple daemons and let the extras serve as a backup in case of failure. The process for moving an

Re: [ceph-users] Performance issues with small files

2013-09-04 Thread Bryan Stillwell
...@inktank.com wrote: Bryan, Good explanation. How's performance now that you've spread the load over multiple buckets? Mark On 09/04/2013 12:39 PM, Bryan Stillwell wrote: Bill, I've run into a similar issue with objects averaging ~100KiB. The explanation I received on IRC

Re: [ceph-users] Performance issues with small files

2013-09-05 Thread Bryan Stillwell
. Is it always (temporarily) resolved writing to a new empty bucket? Mark On 09/04/2013 02:45 PM, Bill Omer wrote: We've actually done the same thing, creating 65k buckets and storing 20-50 objects in each. No change really, not noticeable anyway On Wed, Sep 4, 2013 at 2:43 PM, Bryan Stillwell

Re: [ceph-users] Performance issues with small files

2013-09-05 Thread Bryan Stillwell
that things have slowed down a bit. The average upload rate over those first 20 hours was ~48 objects/second, but now I'm only seeing ~20 objects/second. This is with 18,836 buckets. Bryan On Wed, Sep 4, 2013 at 12:43 PM, Bryan Stillwell bstillw...@photobucket.com wrote: So far I haven't seen much

Re: [ceph-users] Performance issues with small files

2013-09-05 Thread Bryan Stillwell
/use). Mark On 09/05/2013 11:59 AM, Bryan Stillwell wrote: Mark, Yesterday I blew away all the objects and restarted my test using multiple buckets, and things are definitely better! After ~20 hours I've already uploaded ~3.5 million objects, which is much better than the ~1.5 million I

Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-25 Thread Bryan Stillwell
. Thanks, Bryan From: Pavan Rallabhandi <prallabha...@walmartlabs.com> Date: Tuesday, July 25, 2017 at 3:00 AM To: Bryan Stillwell <bstillw...@godaddy.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com> Subject: Re: [ceph-users] Speeding up garbage col

Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-25 Thread Bryan Stillwell
Excellent, thank you! It does exist in 0.94.10! :) Bryan From: Pavan Rallabhandi <prallabha...@walmartlabs.com> Date: Tuesday, July 25, 2017 at 11:21 AM To: Bryan Stillwell <bstillw...@godaddy.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com> Subject:

[ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Bryan Stillwell
I'm in the process of cleaning up a test that an internal customer did on our production cluster that produced over a billion objects spread across 6000 buckets. So far I've been removing the buckets like this: printf %s\\n bucket{1..6000} | xargs -I{} -n 1 -P 32 radosgw-admin bucket rm
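The command is cut off above; a plausible full form, assuming the standard --bucket and --purge-objects flags, would be:
  printf '%s\n' bucket{1..6000} | \
    xargs -I{} -n 1 -P 32 radosgw-admin bucket rm --bucket={} --purge-objects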

Re: [ceph-users] Speeding up garbage collection in RGW

2017-07-24 Thread Bryan Stillwell
Wouldn't doing it that way cause problems since references to the objects wouldn't be getting removed from .rgw.buckets.index? Bryan From: Roger Brown <rogerpbr...@gmail.com> Date: Monday, July 24, 2017 at 2:43 PM To: Bryan Stillwell <bstillw...@godaddy.com>, "ceph-users@lists

Re: [ceph-users] expanding cluster with minimal impact

2017-08-07 Thread Bryan Stillwell
Dan, We recently went through an expansion of an RGW cluster and found that we needed 'norebalance' set whenever making CRUSH weight changes to avoid slow requests. We were also increasing the CRUSH weight by 1.0 each time which seemed to reduce the extra data movement we were seeing with
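A rough sketch of that procedure (OSD id and weight are illustrative, not the exact commands used):
  ceph osd set norebalance
  ceph osd crush reweight osd.123 5.0    # bump by 1.0 at a time
  # wait for peering to settle, then allow the data movement
  ceph osd unset norebalance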

[ceph-users] Living with huge bucket sizes

2017-06-08 Thread Bryan Stillwell
This has come up quite a few times before, but since I was only working with RBD before I didn't pay too close attention to the conversation. I'm looking for the best way to handle existing clusters that have buckets with a large number of objects (>20 million) in them. The cluster I'm doing

Re: [ceph-users] Directory size doesn't match contents

2017-06-15 Thread Bryan Stillwell
On 6/15/17, 9:20 AM, "John Spray" <jsp...@redhat.com> wrote: > > On Wed, Jun 14, 2017 at 4:31 PM, Bryan Stillwell <bstillw...@godaddy.com> > wrote: > > I have a cluster running 10.2.7 that is seeing some extremely large > > directory sizes in CephFS acc

[ceph-users] Directory size doesn't match contents

2017-06-14 Thread Bryan Stillwell
I have a cluster running 10.2.7 that is seeing some extremely large directory sizes in CephFS according to the recursive stats: $ ls -lhd Originals/ drwxrwxr-x 1 bryan bryan 16E Jun 13 13:27 Originals/ du reports a much smaller (and accurate) number: $ du -sh Originals/ 300G    Originals/
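The 16E figure comes from CephFS recursive stats (rstats); a hedged way to query them directly, assuming the client exposes the virtual xattrs:
  getfattr -n ceph.dir.rbytes Originals/
  getfattr -n ceph.dir.rfiles Originals/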

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Bryan Stillwell
Is this on an RGW cluster? If so, you might be running into the same problem I was seeing with large bucket sizes: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018504.html The solution is to shard your buckets so the bucket index doesn't get too big. Bryan From: ceph-users
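A sketch of the sharding knob for newly created buckets (shard count and section name are illustrative; existing buckets have to be resharded separately):
  [client.rgw]
  rgw_override_bucket_index_max_shards = 16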

Re: [ceph-users] Client features by IP?

2017-09-08 Thread Bryan Stillwell
On 09/07/2017 01:26 PM, Josh Durgin wrote: > On 09/07/2017 11:31 AM, Bryan Stillwell wrote: >> On 09/07/2017 10:47 AM, Josh Durgin wrote: >>> On 09/06/2017 04:36 PM, Bryan Stillwell wrote: >>>> I was reading this post by Josh Durgin today and was pretty happy to

[ceph-users] radosgw crashing after buffer overflows detected

2017-09-08 Thread Bryan Stillwell
For about a week we've been seeing a decent number of buffer overflows detected across all our RGW nodes in one of our clusters. This started happening a day after we started weighing in some new OSD nodes, so we're thinking it's probably related to that. Could someone help us determine the root

Re: [ceph-users] radosgw crashing after buffer overflows detected

2017-09-11 Thread Bryan Stillwell
oun...@lists.ceph.com> on behalf of Bryan Stillwell <bstillw...@godaddy.com> Date: Friday, September 8, 2017 at 9:26 AM To: ceph-users <ceph-users@lists.ceph.com> Subject: [ceph-users] radosgw crashing after buffer overflows detected

Re: [ceph-users] Client features by IP?

2017-09-07 Thread Bryan Stillwell
On 09/07/2017 10:47 AM, Josh Durgin wrote: > On 09/06/2017 04:36 PM, Bryan Stillwell wrote: > > I was reading this post by Josh Durgin today and was pretty happy to > > see we can get a summary of features that clients are using with the > > 'ceph features' command:

[ceph-users] Client features by IP?

2017-09-06 Thread Bryan Stillwell
I was reading this post by Josh Durgin today and was pretty happy to see we can get a summary of features that clients are using with the 'ceph features' command: http://ceph.com/community/new-luminous-upgrade-complete/ However, I haven't found an option to display the IP address of those
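One way to see client addresses together with their feature bits is the mon admin socket; a hedged example (mon id is illustrative):
  ceph daemon mon.a sessions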

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Bryan Stillwell
Yehuda Sadeh-Weinraub <yeh...@redhat.com> Date: Wednesday, October 25, 2017 at 11:32 AM To: Bryan Stillwell <bstillw...@godaddy.com> Cc: David Turner <drakonst...@gmail.com>, Ben Hines <bhi...@gmail.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.co

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Bryan Stillwell
few references to the rgw-gc settings in the config, but nothing that explained the times well enough for me to feel comfortable doing anything with them. On Tue, Jul 25, 2017 at 4:01 PM Bryan Stillwell <bstillw...@godaddy.com> wrote: Excellent, thank you!  It does exist in

[ceph-users] Problems removing buckets with --bypass-gc

2017-10-31 Thread Bryan Stillwell
As mentioned in another thread I'm trying to remove several thousand buckets on a hammer cluster (0.94.10), but I'm running into a problem using --bypass-gc. I usually see either this error: # radosgw-admin bucket rm --bucket=sg2pl598 --purge-objects --bypass-gc 2017-10-31 09:21:04.111599

[ceph-users] RGW (Swift) failures during upgrade from Jewel to Luminous

2018-05-08 Thread Bryan Stillwell
We recently began our upgrade testing for going from Jewel (10.2.10) to Luminous (12.2.5) on our clusters. The first part of the upgrade went pretty smoothly (upgrading the mon nodes, adding the mgr nodes, upgrading the OSD nodes), however, when we got to the RGWs we started seeing internal

Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread Bryan Stillwell
> We have a large 1PB ceph cluster. We recently added 6 nodes with 16 2TB disks each to the cluster. Five of the nodes rebalanced well without any issues, but the sixth/last node's OSDs started acting weird: as I increase the weight of one osd the utilization doesn't change but a different osd on the

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-27 Thread Bryan Stillwell
On Wed, Oct 25, 2017 at 4:02 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote: > > On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell <bstillw...@godaddy.com> > wrote: > > That helps a little bit, but overall the process would take years at this > > rate: >

[ceph-users] Switching failure domains

2018-01-31 Thread Bryan Stillwell
We're looking into switching the failure domains on several of our clusters from host-level to rack-level and I'm trying to figure out the least impactful way to accomplish this. First off, I've made this change before on a couple large (500+ OSDs) OpenStack clusters where the volumes, images,

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-13 Thread Bryan Stillwell
It may work fine, but I would suggest limiting the number of operations going on at the same time. Bryan From: Bryan Banister <bbanis...@jumptrading.com> Date: Tuesday, February 13, 2018 at 1:16 PM To: Bryan Stillwell <bstillw...@godaddy.com>, Janne Johansson <icepic...@gmai

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-13 Thread Bryan Stillwell
Bryan, Based off the information you've provided so far, I would say that your largest pool still doesn't have enough PGs. If you originally had only 512 PGs for your largest pool (I'm guessing .rgw.buckets has 99% of your data), then on a balanced cluster you would have just ~11.5 PGs per OSD

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-21 Thread Bryan Stillwell
the ceph command man page? Would be good to help others avoid this pitfall. Thanks again, -Bryan From: David Turner [mailto:drakonst...@gmail.com] Sent: Friday, February 16, 2018 3:21 PM To: Bryan Banister <bbanis...@jumptrading.com> Cc: Bryan St

Re: [ceph-users] v13.2.1 Mimic released

2018-07-27 Thread Bryan Stillwell
I decided to upgrade my home cluster from Luminous (v12.2.7) to Mimic (v13.2.1) today and ran into a couple issues: 1. When restarting the OSDs during the upgrade it seems to forget my upmap settings. I had to manually return them to the way they were with commands like: ceph osd
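The truncated command is presumably restoring upmap entries; a sketch of the form those commands take (pgid and OSD ids are illustrative):
  ceph osd pg-upmap-items 1.7f 121 187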

[ceph-users] ceph-mgr hangs on larger clusters in Luminous

2018-10-18 Thread Bryan Stillwell
After we upgraded from Jewel (10.2.10) to Luminous (12.2.5) we started seeing a problem where the new ceph-mgr would sometimes hang indefinitely when doing commands like 'ceph pg dump' on our largest cluster (~1,300 OSDs). The rest of our clusters (10+) aren't seeing the same issue, but they

Re: [ceph-users] ceph-mgr hangs on larger clusters in Luminous

2018-10-18 Thread Bryan Stillwell
ctd which is running 'ceph pg dump' every 16-17 seconds. I guess you could say we're stress testing that code path fairly well... :) Bryan On Thu, Oct 18, 2018 at 6:17 PM Bryan Stillwell mailto:bstillw...@godaddy.com>> wrote: After we upgraded from Jewel (10.2.10) to Luminous

Re: [ceph-users] ceph-mgr hangs on larger clusters in Luminous

2018-10-18 Thread Bryan Stillwell
I left some of the 'ceph pg dump' commands running and twice they returned results after 30 minutes, and three times it took 45 minutes. Is there something that runs every 15 minutes that would let these commands finish? Bryan From: Bryan Stillwell Date: Thursday, October 18, 2018 at 11:16

Re: [ceph-users] ceph-mgr hangs on larger clusters in Luminous

2018-10-18 Thread Bryan Stillwell
know the reasoning for that decision? Bryan From: Dan van der Ster Date: Thursday, October 18, 2018 at 2:03 PM To: Bryan Stillwell Cc: ceph-users Subject: Re: [ceph-users] ceph-mgr hangs on larger clusters in Luminous 15 minutes seems like the ms tcp read timeout would be related. Try

Re: [ceph-users] rocksdb mon stores growing until restart

2018-09-19 Thread Bryan Stillwell
> On 08/30/2018 11:00 AM, Joao Eduardo Luis wrote: > > On 08/30/2018 09:28 AM, Dan van der Ster wrote: > > Hi, > > Is anyone else seeing rocksdb mon stores slowly growing to >15GB, > > eventually triggering the 'mon is using a lot of disk space' warning? > > Since upgrading to luminous, we've seen

[ceph-users] Compacting omap data

2019-01-02 Thread Bryan Stillwell
Recently on one of our bigger clusters (~1,900 OSDs) running Luminous (12.2.8), we had a problem where OSDs would frequently get restarted while deep-scrubbing. After digging into it I found that a number of the OSDs had very large omap directories (50GiB+). I believe these were OSDs that had
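A hedged sketch of triggering an online compaction on one of the affected OSDs (id is illustrative, and this may not be the exact remedy used):
  ceph tell osd.42 compact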

[ceph-users] Fixing a broken bucket index in RGW

2019-01-16 Thread Bryan Stillwell
I'm looking for some help in fixing a bucket index on a Luminous (12.2.8) cluster running on FileStore. First some background on how I believe the bucket index became broken. Last month we had a PG in our .rgw.buckets.index pool become inconsistent: 2018-12-11 09:12:17.743983 osd.1879 osd.1879

Re: [ceph-users] pgs stuck in creating+peering state

2019-01-17 Thread Bryan Stillwell
Since you're using jumbo frames, make sure everything between the nodes properly supports them (NICs & switches). I've tested this in the past by using the size option in ping (you need to use a payload size of 8972 instead of 9000 to account for the 28 byte header): ping -s 8972
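A hedged version of that test, with -M do added so the packet cannot be fragmented along the way (hostname is illustrative):
  ping -M do -s 8972 storage-node2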

[ceph-users] Rebuilding RGW bucket indices from objects

2019-01-17 Thread Bryan Stillwell
This is sort of related to my email yesterday, but has anyone ever rebuilt a bucket index using the objects themselves? It seems to be that it would be possible since the bucket_id is contained within the rados object name: # rados -p .rgw.buckets.index listomapkeys
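A sketch of that approach, assuming the usual .dir.<bucket_id> naming for bucket index objects (bucket name and id are illustrative):
  radosgw-admin metadata get bucket:mybucket    # look up the bucket_id
  rados -p .rgw.buckets.index listomapkeys .dir.default.12345.67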

Re: [ceph-users] How to reduce min_size of an EC pool?

2019-01-17 Thread Bryan Stillwell
When you use 3+2 EC that means you have 3 data chunks and 2 erasure chunks for your data. So you can handle two failures, but not three. The min_size setting is preventing you from going below 3 because that's the number of data chunks you specified for the pool. I'm sorry to say this, but

Re: [ceph-users] Suggestions/experiences with mixed disk sizes and models from 4TB - 14TB

2019-01-17 Thread Bryan Stillwell
I've run my home cluster with drives ranging in size from 500GB to 8TB before and the biggest issue you run into is that the bigger drives will get a proportionally larger number of PGs, which will increase the memory requirements on them. Typically you want around 100 PGs/OSD, but if you mix 4TB
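As an illustrative calculation: if half the OSDs are 4TB and half are 8TB and the cluster averages 100 PGs/OSD, the mean weight is 6TB, so the 4TB drives land around 100 x 4/6 ≈ 67 PGs each while the 8TB drives get around 100 x 8/6 ≈ 133, with memory usage scaling accordingly.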

Re: [ceph-users] Removing orphaned radosgw bucket indexes from pool

2018-11-29 Thread Bryan Stillwell
Wido, I've been looking into this large omap objects problem on a couple of our clusters today and came across your script during my research. The script has been running for a few hours now and I'm already over 100,000 'orphaned' objects! It appears that ever since upgrading to Luminous

[ceph-users] osdmaps not being cleaned up in 12.2.8

2019-01-07 Thread Bryan Stillwell
I have a cluster with over 1900 OSDs running Luminous (12.2.8) that isn't cleaning up old osdmaps after doing an expansion. This is even after the cluster became 100% active+clean: # find /var/lib/ceph/osd/ceph-1754/current/meta -name 'osdmap*' | wc -l 46181 With the osdmaps being over 600KB
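Illustrative arithmetic on those figures: 46,181 maps at ~600KB each is roughly 26GiB per OSD, which across ~1,900 OSDs is on the order of 50TB cluster-wide.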

Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-07 Thread Bryan Stillwell
I believe the option you're looking for is mon_data_size_warn. The default is set to 16106127360. I've found that sometimes the mons need a little help getting started with trimming if you just completed a large expansion. Earlier today I had a cluster where the mon's data directory was over
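A hedged sketch of nudging a mon to compact its store (mon id is illustrative):
  ceph tell mon.a compact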

Re: [ceph-users] Omap issues - metadata creating too many

2019-01-03 Thread Bryan Stillwell
, January 3, 2019 at 3:49 AM To: "J. Eric Ivancich" Cc: "ceph-users@lists.ceph.com" , Bryan Stillwell Subject: Re: [ceph-users] Omap issues - metadata creating too many Hi, i had the default - so it was on (according to ceph kb). turned it off, but the issue persists. i noticed Br

Re: [ceph-users] osdmaps not being cleaned up in 12.2.8

2019-01-11 Thread Bryan Stillwell
I've created the following bug report to address this issue: http://tracker.ceph.com/issues/37875 Bryan From: ceph-users on behalf of Bryan Stillwell Date: Friday, January 11, 2019 at 8:59 AM To: Dan van der Ster Cc: ceph-users Subject: Re: [ceph-users] osdmaps not being cleaned up

Re: [ceph-users] osdmaps not being cleaned up in 12.2.8

2019-01-11 Thread Bryan Stillwell
to 49,272 osdmaps hanging around. The churn trick seems to be working again too. Bryan From: Dan van der Ster Date: Thursday, January 10, 2019 at 3:13 AM To: Bryan Stillwell Cc: ceph-users Subject: Re: [ceph-users] osdmaps not being cleaned up in 12.2.8 Hi Bryan, I think this is the old hammer

Re: [ceph-users] osdmaps not being cleaned up in 12.2.8

2019-01-08 Thread Bryan Stillwell
this was the solution Dan came across back in the hammer days. It works, but not ideal for sure. Across the cluster it freed up around 50TB of data! Bryan From: ceph-users on behalf of Bryan Stillwell Date: Monday, January 7, 2019 at 2:40 PM To: ceph-users Subject: [ceph-users] osdmaps not being cleaned

Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-09 Thread Bryan Stillwell
> On Apr 8, 2019, at 5:42 PM, Bryan Stillwell wrote: > > >> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote: >> >> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell >> wrote: >>> >>> There doesn't appear to be any correlation between

[ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell
We have two separate RGW clusters running Luminous (12.2.8) that have started seeing an increase in PGs going active+clean+inconsistent with the reason being caused by an omap_digest mismatch. Both clusters are using FileStore and the inconsistent PGs are happening on the .rgw.buckets.index

Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell
> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote: > > On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell wrote: >> >> There doesn't appear to be any correlation between the OSDs which would >> point to a hardware issue, and since it's happening on two different

[ceph-users] Is repairing an RGW bucket index broken?

2019-03-11 Thread Bryan Stillwell
I'm wondering if the 'radosgw-admin bucket check --fix' command is broken in Luminous (12.2.8)? I'm asking because I'm trying to reproduce a situation we have on one of our production clusters and it doesn't seem to do anything. Here's the steps of my test: 1. Create a bucket with 1 million
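For reference, the invocation being tested presumably looks something like this (bucket name is illustrative; --check-objects is an assumption about the exact flags):
  radosgw-admin bucket check --bucket=mybucket --check-objects --fix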

Re: [ceph-users] help! pg inactive and slow requests after filestore to bluestore migration, version 12.2.12

2019-12-12 Thread Bryan Stillwell
Jelle, Try putting just the WAL on the Optane NVMe. I'm guessing your DB is too big to fit within 5GB. We used a 5GB journal on our nodes as well, but when we switched to BlueStore (using ceph-volume lvm batch) it created 37GiB logical volumes (200GB SSD / 5 or 400GB SSD / 10) for our DBs.

Re: [ceph-users] Ceph OSD node trying to possibly start OSDs that were purged

2019-10-29 Thread Bryan Stillwell
On Oct 29, 2019, at 11:23 AM, Jean-Philippe Méthot wrote: > A few months back, we had one of our OSD node motherboards die. At the time, > we simply waited for recovery and purged the OSDs that were on the dead node. > We just replaced that node and added back the drives as new OSDs. At the