Re: [ceph-users] I have PGs that I can't deep-scrub

2014-07-10 Thread Chris Dunlop
Hi Craig, On Thu, Jul 10, 2014 at 03:09:51PM -0700, Craig Lewis wrote: I fixed this issue by reformatting all of the OSDs. I changed the mkfs options from [osd] osd mkfs type = xfs osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096 to [osd] osd mkfs type
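For reference, the OSD mkfs settings live in the [osd] section of ceph.conf and only take effect when an OSD filesystem is created. A minimal sketch of the layout, using the "from" values quoted above (Craig's replacement values are truncated in this snippet, so they are not reproduced here):

    [osd]
    osd mkfs type = xfs
    osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096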

Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-18 Thread Chris Dunlop
is to remove the file manually from the OSD’s filesystem and perform a repair of the PG that holds that object. This will copy the object back from one of the replicas. David On Nov 17, 2013, at 10:46 PM, Chris Dunlop ch...@onthe.net.au wrote: Hi David, On Fri, Nov 15, 2013 at 10:00
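A rough sketch of that procedure on a FileStore OSD; the osd id (5), pg id (2.3f) and object name are placeholders, and the exact path layout varies by release:

    service ceph stop osd.5                          # or systemctl stop ceph-osd@5 on systemd hosts
    cd /var/lib/ceph/osd/ceph-5/current/2.3f_head
    find . -name '*rb.0.1234*'                       # locate the suspect object file
    mv <file found above> /root/bad-object.bak       # set the bad copy aside rather than deleting it
    service ceph start osd.5
    ceph pg repair 2.3f                              # recopies the object from a clean replica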

Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-18 Thread Chris Dunlop
, we currently don’t have a way to determine which copy(s) are corrupt. This is where a manual intervention may be necessary if the administrator can determine which copy(s) are bad. David Zafman Senior Developer http://www.inktank.com On Nov 18, 2013, at 1:11 PM, Chris Dunlop
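One way to make that determination on a replicated FileStore pool, assuming you can reach each OSD host (paths, pg id and object name below are placeholders): checksum the candidate copies and cross-check against the disk that reported errors.

    # on each osd host holding the pg:
    md5sum /var/lib/ceph/osd/ceph-*/current/2.3f_head/*rb.0.1234*
    # the odd checksum out, or the copy on the disk logging read errors, is the bad one:
    dmesg | grep -i 'sector\|I/O error'
    smartctl -a /dev/sdX | grep -i 'reallocated\|pending'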

[ceph-users] Replace OSD with larger

2014-03-04 Thread Chris Dunlop
Hi, What is the recommended procedure for replacing an osd with a larger osd in a safe and efficient manner, i.e. whilst maintaining redundancy and causing the least data movement? Would this be a matter of adding the new osd into the crush map whilst reducing the weight of the old osd to zero,
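One possible sequence along those lines; the osd ids and the 3.64 crush weight (roughly a 4 TB disk) are illustrative only, not a recommendation:

    # bring up the new, larger osd (osd.12 here) as usual, then shift weight across
    ceph osd crush reweight osd.12 3.64      # weight of the new disk
    ceph osd crush reweight osd.3 0          # drain the old osd
    # wait for all pgs to return to active+clean, then retire the old osd
    ceph osd out 3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3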

Re: [ceph-users] journal on ramdisk for testing

2013-04-25 Thread Chris Dunlop
G'day James, On Thu, Apr 25, 2013 at 07:39:27AM +, James Harper wrote: I'm doing some testing and wanted to see the effect of increasing journal speed, and the fastest way to do this seemed to be to put it on a ramdisk where latency should drop to near zero and I can see what other
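For anyone wanting to try the same experiment, a sketch of one way to do it (test clusters only: a journal in RAM is lost on power failure, which corrupts the FileStore OSD). The tmpfs size, path and osd id are illustrative:

    mount -t tmpfs -o size=2G tmpfs /mnt/ram-journal
    # ceph.conf, per-osd section:
    #   [osd.0]
    #   osd journal = /mnt/ram-journal/osd.0.journal
    #   osd journal size = 1024
    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal     # drain the old journal before switching
    ceph-osd -i 0 --mkjournal         # create the journal at the new location
    service ceph start osd.0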

Re: [ceph-users] All pgs stuck peering

2015-12-14 Thread Chris Dunlop
On Mon, Dec 14, 2015 at 09:29:20PM +0800, Jaze Lee wrote: > Should we add big packet test in heartbeat? Right now the heartbeat > only test the little packet. If the MTU is mismatched, the heartbeat > can not find that. It would certainly have saved me a great deal of stress! I imagine you
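For anyone else hitting this: a quick manual check is to ping between the osd hosts with the don't-fragment bit set and a payload sized to the intended MTU (8972 = 9000 minus 20 bytes IP and 8 bytes ICMP header; the host name is a placeholder):

    ping -M do -s 1472 -c 3 osd-host-2      # fits a standard 1500 MTU
    ping -M do -s 8972 -c 3 osd-host-2      # only succeeds if jumbo frames work end to end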

[ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
Hi, ceph 0.94.5 After restarting one of our three osd hosts to increase the RAM and change from linux 3.18.21 to 4.1., the cluster is stuck with all pgs peering: # ceph -s cluster c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 health HEALTH_WARN 3072 pgs peering 3072 pgs
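For anyone following along, a quick way to enumerate pgs stuck in this state (output trimmed):

    ceph pg stat
    ceph pg dump_stuck inactive | head
    ceph health detail | grep peering | head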

Re: [ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
Hi Varada, On Mon, Dec 14, 2015 at 03:23:20AM +, Varada Kari wrote: > Can get the details of > > 1. ceph health detail > 2. ceph pg query > > of any one PG stuck peering > > > Varada The full health detail is over 9000 lines, but here's a summary: # ceph health detail | head
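For reference, querying a single stuck pg looks like the following; 2.3f is a placeholder pg id taken from the health detail output:

    ceph health detail | grep 'stuck peering' | head -5
    ceph pg 2.3f query > /tmp/pg-2.3f-query.json
    # the recovery_state section of the output usually shows what peering is waiting on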

Re: [ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
On Sun, Dec 13, 2015 at 09:10:34PM -0700, Robert LeBlanc wrote: > I've had something similar to this when there was an MTU mismatch, the > smaller I/O would get through, but the larger I/O would be blocked and > prevent peering. > > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4
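Worth checking in that situation: the configured MTU on the public and cluster network interfaces of every osd/mon host (interface name is a placeholder), plus any switch ports in between:

    ip link show dev eth0 | grep -o 'mtu [0-9]*'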

Re: [ceph-users] Cephfs: large files hang

2015-12-17 Thread Chris Dunlop
Hi Bryan, Have you checked your MTUs? I was recently bitten by large packets not getting through where small packets would. (This list, Dec 14, "All pgs stuck peering".) Small files working but big files not working smells like it could be a similar problem. Cheers, Chris On Thu, Dec 17, 2015
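A crude way to reproduce the symptom from a cephfs client, assuming a mount at /mnt/cephfs (paths and sizes are only illustrative):

    dd if=/dev/zero of=/mnt/cephfs/small.bin bs=4k count=1  conv=fsync   # small write
    dd if=/dev/zero of=/mnt/cephfs/large.bin bs=4M count=16 conv=fsync   # large write; hangs if big packets are dropped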

Re: [ceph-users] pg stuck in peering state

2015-12-18 Thread Chris Dunlop
Hi Reno, "Peering", as far as I understand it, is the osds trying to talk to each other. You have approximately 1 OSD worth of pgs stuck (i.e. 264 / 8), and osd.0 appears in each of the stuck pgs, alongside either osd.2 or osd.3. I'd start by checking the comms between osd.0 and osds 2 and 3
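To find the addresses to test, the osd map lists each osd's public and cluster addresses; a sketch, using the osd ids from Reno's report:

    ceph osd dump | grep -E '^osd\.(0|2|3) '
    # then, from the host carrying osd.0, ping the listed addresses of osd.2 and osd.3,
    # including large don't-fragment packets if jumbo frames are in use:
    ping -M do -s 8972 -c 3 <osd.2 cluster address>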

Re: [ceph-users] v0.94.6 Hammer released

2016-03-01 Thread Chris Dunlop
Hi, The "old list of supported platforms" includes debian wheezy. Will v0.94.6 be built for this? Chris On Mon, Feb 29, 2016 at 10:57:53AM -0500, Sage Weil wrote: > The intention was to continue building stable releases (0.94.x) on the old > list of supported platforms (which inclues 12.04 and

Re: [ceph-users] v0.94.6 Hammer released

2016-03-09 Thread Chris Dunlop
Hi Loic, On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote: > I think you misread what Sage wrote: "The intention was to > continue building stable releases (0.94.x) on the old list of > supported platforms (which includes 12.04 and el6)". In other > words, the old OS'es are still

Re: [ceph-users] v0.94.6 Hammer released

2016-03-19 Thread Chris Dunlop
Hi Stable Release Team for v0.94, On Thu, Mar 10, 2016 at 11:00:06AM +1100, Chris Dunlop wrote: > On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote: >> I think you misread what Sage wrote : "The intention was to >> continue building stable releases (0.9

Re: [ceph-users] v0.94.6 Hammer released

2016-03-19 Thread Chris Dunlop
Hi Chen, On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote: > It’s already there, in > http://download.ceph.com/debian-hammer/pool/main/c/ceph/. I can only see ceph*_0.94.6-1~bpo80+1_amd64.deb there. Debian wheezy would be bpo70. Cheers, Chris > On 3/17/16, 7:20 AM,
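For anyone else watching the repo, a quick way to check what has actually been published for a given suite (the pool directory is a plain listing, so grep works; no output means no wheezy/bpo70 packages):

    curl -s http://download.ceph.com/debian-hammer/pool/main/c/ceph/ | grep -o 'ceph_[^"<]*bpo70[^"<]*\.deb' | sort -u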

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Stable Release Team for v0.94, Let's try again... Any news on a release of v0.94.6 for debian wheezy (bpo70)? Cheers, Chris On Thu, Mar 17, 2016 at 12:43:15PM +1100, Chris Dunlop wrote: > Hi Chen, > > On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote:

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc, On Wed, Mar 23, 2016 at 12:14:27AM +0100, Loic Dachary wrote: > On 22/03/2016 23:49, Chris Dunlop wrote: >> Hi Stable Release Team for v0.94, >> >> Let's try again... Any news on a release of v0.94.6 for debian wheezy >> (bpo70)? > > I don't think pu

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc, On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote: > On 23/03/2016 00:39, Chris Dunlop wrote: >> "The old OS'es" that were being supported up to v0.94.5 includes debian >> wheezy. It would be quite surprising and unexpected to drop support f

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
On Wed, Mar 23, 2016 at 01:22:45AM +0100, Loic Dachary wrote: > On 23/03/2016 01:12, Chris Dunlop wrote: >> On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote: >>> On 23/03/2016 00:39, Chris Dunlop wrote: >>>> "The old OS'es" that were being

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Mon, May 16, 2016 at 10:40:47PM +0200, Wido den Hollander wrote: > > On 16 May 2016 at 7:56, Chris Dunlop <ch...@onthe.net.au> wrote: > > Why do we have both pg_num and pgp_num? Given the docs say "The pgp_num > > should be equal to the pg_num": under
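For completeness, both values are set per pool, pg_num first and then pgp_num; the pool name and target count below are placeholders:

    ceph osd pool set <pool> pg_num 2048
    ceph osd pool set <pool> pgp_num 2048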

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote: > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote: > > > > pg_num is the actual amount of PGs. This you can increase without any > > actual data moving. > > Yes and no. > > Increasing the pg_num will split PGs,
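The current values can be compared per pool; a gap between the two means PGs have been split locally but not yet redistributed across the crush map:

    ceph osd pool get <pool> pg_num
    ceph osd pool get <pool> pgp_num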

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
Hi Christian, On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote: > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote: > Most your questions would be easily answered if you did spend a few > minutes with even the crappiest test cluster and observing things (wi

[ceph-users] Increasing pg_num

2016-05-15 Thread Chris Dunlop
Hi, I'm trying to understand the potential impact on an active cluster of increasing pg_num/pgp_num. The conventional wisdom, as gleaned from the mailing lists and general google fu, seems to be to increase pg_num followed by pgp_num, both in small increments, to the target size, using "osd max
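A sketch of that conventional wisdom as a script, assuming a pool named rbd, a target already chosen, and backfills throttled down first; the increments, sleep intervals and health checks are judgment calls, not gospel:

    ceph tell osd.* injectargs '--osd-max-backfills 1'
    pool=rbd
    for n in 1088 1152 1216 1280; do
        ceph osd pool set $pool pg_num $n
        # wait for the new (split) pgs to finish creating/peering
        while ceph pg stat | grep -Eq 'creating|peering'; do sleep 30; done
        ceph osd pool set $pool pgp_num $n
        # wait for the resulting backfill/recovery to drain before the next step
        while ceph pg stat | grep -Eq 'backfill|recover|peering'; do sleep 60; done
    done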

Re: [ceph-users] v0.94.7 Hammer released

2016-05-16 Thread Chris Dunlop
On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote: > This Hammer point release fixes several minor bugs. It also includes a > backport of an improved ‘ceph osd reweight-by-utilization’ command for > handling OSDs with higher-than-average utilizations. > > We recommend that all hammer
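For reference, the command takes an optional utilization threshold as a percentage of the cluster average; 120 below is just a common starting point, and the dry-run variant may or may not be present in a given build:

    ceph osd df                                  # check the current utilization spread first
    ceph osd test-reweight-by-utilization 120    # dry run, if available in this build
    ceph osd reweight-by-utilization 120         # only touch osds more than 20% over average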

[ceph-users] Journal flushed on osd clean shutdown?

2018-06-13 Thread Chris Dunlop
Hi, Is the osd journal flushed completely on a clean shutdown? In this case, with Jewel, and FileStore osds, and a "clean shutdown" being: systemctl stop ceph-osd@${osd} I understand it's documented practice to issue a --flush-journal after shutting down an osd if you're intending to
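The documented manual flush, for the record; osd id 3 is a placeholder, this applies to FileStore only, and the osd must be stopped first:

    systemctl stop ceph-osd@3
    ceph-osd -i 3 --flush-journal
    systemctl start ceph-osd@3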

Re: [ceph-users] Journal flushed on osd clean shutdown?

2018-06-13 Thread Chris Dunlop
Excellent news - tks! On Wed, Jun 13, 2018 at 11:50:15AM +0200, Wido den Hollander wrote: On 06/13/2018 11:39 AM, Chris Dunlop wrote: Hi, Is the osd journal flushed completely on a clean shutdown? In this case, with Jewel, and FileStore osds, and a "clean shutdown" being: It i