OSD crash during repair

2013-09-05 Thread Chris Dunlop
G'day, I'm getting an OSD crash on 0.56.7-1~bpo70+1 whilst trying to repair an OSD: http://tracker.ceph.com/issues/6233 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33) 1: /usr/bin/ceph-osd() [0x8530a2] 2: (()+0xf030) [0x7f541ca39030] 3: (gsignal()+0x35) [0x7f541b132475]

Re: OSD crash during repair

2013-09-05 Thread Chris Dunlop
is. Thanks! sage On Fri, 6 Sep 2013, Chris Dunlop wrote: G'day, I'm getting an OSD crash on 0.56.7-1~bpo70+1 whilst trying to repair an OSD: http://tracker.ceph.com/issues/6233 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33) 1: /usr/bin/ceph-osd() [0x8530a2

Re: OSD crash during repair

2013-09-05 Thread Chris Dunlop
On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote: On Fri, 6 Sep 2013, Chris Dunlop wrote: Hi Sage, Does this answer your question? 2013-09-06 09:30:19.813811 7f0ae8cbc700 0 log [INF] : applying configuration change: internal_safe_to_start_threads = 'true' 2013-09-06 09:33

Re: OSD crash during repair

2013-09-05 Thread Chris Dunlop
On Fri, Sep 06, 2013 at 01:12:21PM +1000, Chris Dunlop wrote: On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote: On Fri, 6 Sep 2013, Chris Dunlop wrote: Hi Sage, Does this answer your question? 2013-09-06 09:30:19.813811 7f0ae8cbc700 0 log [INF] : applying configuration change

OSD repair: on disk size does not match object info size

2013-09-09 Thread Chris Dunlop
G'day, On 0.56.7-1~bpo70+1 I'm getting: # ceph pg dump | grep inconsistent 013-09-10-08:39:59 2.bc27760 0 0 11521799680 162063 162063 active+clean+inconsistent 2013-09-10 08:38:38.482302 20512'69987720360'13461026 [6,0] [6,0] 20512'699877
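The `ceph pg dump | grep inconsistent` step above can be made a little more precise by matching only the state field. This is a minimal sketch. It assumes, hypothetically, that each PG row starts with the PG id and that the state is a single '+'-joined field such as `active+clean+inconsistent`. The sample rows are made up and shortened, not taken from the thread.

```python
from typing import List


def inconsistent_pgs(dump_text: str) -> List[str]:
    """Return PG ids whose state field contains the word 'inconsistent'.

    Assumes (hypothetically) that each PG line starts with the PG id and that
    the state is one '+'-joined field such as 'active+clean+inconsistent'.
    """
    pgs = []
    for line in dump_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Match whole '+'-separated state words instead of grepping the line,
        # so a substring elsewhere on the line is not mistaken for the state.
        if any("inconsistent" in field.split("+") for field in fields[1:]):
            pgs.append(fields[0])
    return pgs


if __name__ == "__main__":
    # Hypothetical, shortened rows; real `ceph pg dump` rows carry more columns.
    sample = (
        "1.2a 10 0 0 4096 1 1 active+clean+inconsistent\n"
        "1.2b 10 0 0 4096 1 1 active+clean\n"
    )
    print(inconsistent_pgs(sample))  # ['1.2a']
```

Each id found this way is the argument that `ceph pg repair <pgid>` takes.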

Re: OSD repair: on disk size does not match object info size

2013-09-09 Thread Chris Dunlop
On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote: On Tue, 10 Sep 2013, Chris Dunlop wrote: G'day, On 0.56.7-1~bpo70+1 I'm getting: # ceph pg dump | grep inconsistent 013-09-10-08:39:59 2.bc27760 0 0 11521799680 162063 162063 active+clean

Re: OSD repair: on disk size does not match object info size

2013-09-09 Thread Chris Dunlop
On Mon, Sep 09, 2013 at 05:14:14PM -0700, Sage Weil wrote: On Tue, 10 Sep 2013, Chris Dunlop wrote: On Mon, Sep 09, 2013 at 04:30:33PM -0700, Sage Weil wrote: On Tue, 10 Sep 2013, Chris Dunlop wrote: G'day, On 0.56.7-1~bpo70+1 I'm getting: # ceph pg dump | grep inconsistent 013-09-10-08

Bobtail to dumpling (was: OSD crash during repair)

2013-09-10 Thread Chris Dunlop
On Fri, Sep 06, 2013 at 08:21:07AM -0700, Sage Weil wrote: On Fri, 6 Sep 2013, Chris Dunlop wrote: On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote: Also, you should upgrade to dumpling. :) I've been considering it. It was initially a little scary with the various issues that were

Re: [ceph-users] I have PGs that I can't deep-scrub

2014-07-10 Thread Chris Dunlop
Hi Craig, On Thu, Jul 10, 2014 at 03:09:51PM -0700, Craig Lewis wrote: I fixed this issue by reformatting all of the OSDs. I changed the mkfs options from [osd] osd mkfs type = xfs osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096 to [osd] osd mkfs type
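The replacement mkfs options are cut off in this excerpt, so the sketch below uses only the original `[osd]` settings quoted above. It shows how the space-containing ceph.conf keys map onto an mkfs argument list. Two assumptions apply: that Python's `configparser` reads this INI-like fragment the same way Ceph does (not guaranteed for all ceph.conf syntax), and that `/dev/sdX` is a placeholder device. How Ceph's own tooling assembled the final mkfs command is not shown in the thread.

```python
import configparser
import shlex
from typing import List

# The pre-change [osd] settings quoted in the thread. Lines are deliberately
# not indented: configparser would treat indented lines as continuations.
CEPH_CONF = """\
[osd]
osd mkfs type = xfs
osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096
"""


def mkfs_argv(conf_text: str, device: str) -> List[str]:
    """Build an mkfs argument list from 'osd mkfs type' and 'osd mkfs options <type>'."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    osd = parser["osd"]
    fs_type = osd["osd mkfs type"]
    # Split the option string the way a shell would, keeping 'size=...' intact.
    options = shlex.split(osd.get("osd mkfs options " + fs_type, ""))
    return ["mkfs." + fs_type] + options + [device]


if __name__ == "__main__":
    # /dev/sdX is a placeholder, not a device from the thread.
    print(" ".join(mkfs_argv(CEPH_CONF, "/dev/sdX")))
```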

slow requests, hunting for new mon

2013-02-11 Thread Chris Dunlop
Hi, What are likely causes for slow requests and monclient: hunting for new mon messages? E.g.: 2013-02-12 16:27:07.318943 7f9c0bc16700 0 monclient: hunting for new mon ... 2013-02-12 16:27:45.892314 7f9c13c26700 0 log [WRN] : 6 slow requests, 6 included below; oldest blocked for 30.383883
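When looking for a cause, it helps to line up the two messages quoted above by timestamp. The sketch below pulls the counts and the oldest-blocked time out of the slow-request warning and flags the monclient message. The regular expression is written against the two log lines in the question only, which is an assumption about their format. It does not diagnose anything by itself.

```python
import re
from typing import Optional, Tuple

# Pattern for the OSD warning quoted in the thread; a trailing "secs" is ignored.
SLOW_RE = re.compile(
    r"(?P<count>\d+) slow requests?, (?P<included>\d+) included below; "
    r"oldest blocked for (?P<oldest>\d+(?:\.\d+)?)"
)


def parse_slow_requests(line: str) -> Optional[Tuple[int, int, float]]:
    """Return (slow request count, number included, oldest blocked seconds) or None."""
    m = SLOW_RE.search(line)
    if m is None:
        return None
    return int(m["count"]), int(m["included"]), float(m["oldest"])


def is_mon_hunt(line: str) -> bool:
    """True for the client-side 'hunting for new mon' message."""
    return "monclient: hunting for new mon" in line


if __name__ == "__main__":
    log = [
        "2013-02-12 16:27:07.318943 7f9c0bc16700 0 monclient: hunting for new mon",
        "2013-02-12 16:27:45.892314 7f9c13c26700 0 log [WRN] : 6 slow requests, "
        "6 included below; oldest blocked for 30.383883",
    ]
    for entry in log:
        # The first 26 characters are the timestamp.
        print(entry[:26], is_mon_hunt(entry), parse_slow_requests(entry))
```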

Re: slow requests, hunting for new mon

2013-02-12 Thread Chris Dunlop
On Tue, Feb 12, 2013 at 06:28:15PM +1100, Chris Dunlop wrote: Hi, What are likely causes for slow requests and monclient: hunting for new mon messages? E.g.: 2013-02-12 16:27:07.318943 7f9c0bc16700 0 monclient: hunting for new mon ... 2013-02-12 16:27:45.892314 7f9c13c26700 0 log [WRN

Re: slow requests, hunting for new mon

2013-02-14 Thread Chris Dunlop
On 2013-02-12, Chris Dunlop ch...@onthe.net.au wrote: Hi, What are likely causes for slow requests and monclient: hunting for new mon messages? E.g.: 2013-02-12 16:27:07.318943 7f9c0bc16700 0 monclient: hunting for new mon ... 2013-02-12 16:27:45.892314 7f9c13c26700 0 log [WRN] : 6 slow

Mon losing touch with OSDs

2013-02-14 Thread Chris Dunlop
G'day, In an otherwise seemingly healthy cluster (ceph 0.56.2), what might cause the mons to lose touch with the osds? I imagine a network glitch could cause it, but I can't see any issues in any other system logs on any of the machines on the network. Having (mostly?) resolved my previous slow

Re: Mon losing touch with OSDs

2013-02-15 Thread Chris Dunlop
G'day Sage, On Thu, Feb 14, 2013 at 08:57:11PM -0800, Sage Weil wrote: On Fri, 15 Feb 2013, Chris Dunlop wrote: In an otherwise seemingly healthy cluster (ceph 0.56.2), what might cause the mons to lose touch with the osds? Can you enable 'debug ms = 1' on the mons and leave them that way

Re: Mon losing touch with OSDs

2013-02-18 Thread Chris Dunlop
On Sun, Feb 17, 2013 at 05:44:29PM -0800, Sage Weil wrote: On Mon, 18 Feb 2013, Chris Dunlop wrote: On Sat, Feb 16, 2013 at 09:05:21AM +1100, Chris Dunlop wrote: On Thu, Feb 14, 2013 at 08:57:11PM -0800, Sage Weil wrote: On Fri, 15 Feb 2013, Chris Dunlop wrote: In an otherwise seemingly

Re: Mon losing touch with OSDs

2013-02-19 Thread Chris Dunlop
On Tue, Feb 19, 2013 at 02:02:03PM +1100, Chris Dunlop wrote: On Sun, Feb 17, 2013 at 05:44:29PM -0800, Sage Weil wrote: On Mon, 18 Feb 2013, Chris Dunlop wrote: On Sat, Feb 16, 2013 at 09:05:21AM +1100, Chris Dunlop wrote: On Thu, Feb 14, 2013 at 08:57:11PM -0800, Sage Weil wrote: On Fri, 15

Re: Mon losing touch with OSDs

2013-02-22 Thread Chris Dunlop
On Fri, Feb 22, 2013 at 03:43:22PM -0800, Sage Weil wrote: On Sat, 23 Feb 2013, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 01:57:32PM -0800, Sage Weil wrote: On Fri, 22 Feb 2013, Chris Dunlop wrote: G'day, It seems there might be two issues here: the first being the delayed receipt

Re: Mon losing touch with OSDs

2013-02-28 Thread Chris Dunlop
On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote: On Sat, 23 Feb 2013, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 05:30:04PM -0800, Sage Weil wrote: On Sat, 23 Feb 2013, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 04:13

Re: Mon losing touch with OSDs

2013-03-07 Thread Chris Dunlop
On Thu, Feb 28, 2013 at 09:00:24PM -0800, Sage Weil wrote: On Fri, 1 Mar 2013, Chris Dunlop wrote: On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote: On Sat, 23 Feb 2013, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 05:30

Re: Mon losing touch with OSDs

2013-03-08 Thread Chris Dunlop
On Fri, Mar 08, 2013 at 02:12:40PM +1100, Chris Dunlop wrote: On Thu, Feb 28, 2013 at 09:00:24PM -0800, Sage Weil wrote: On Fri, 1 Mar 2013, Chris Dunlop wrote: On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote: On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote: On Sat, 23

Speed up 'rbd rm'

2013-05-29 Thread Chris Dunlop
Hi, I know 'rbd rm' is notoriously slow, even on never-written devices: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6740 http://tracker.ceph.com/issues/2256 I fat fingered an 'rbd create' and accidentally created a 1.5 PB device, and it's going to take some considerable time to
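A rough way to see why removing a never-written image can still take a long time: if removal issues a delete for every object the image could occupy (as the linked discussions describe for rbd of this era), the work scales with image size rather than with data written. The sketch assumes the default object size of 4 MiB (order 22) and reads "1.5 PB" as 1.5 PiB. The deletion rate is a made-up figure for illustration.

```python
def rbd_object_count(image_size_bytes: int, order: int = 22) -> int:
    """Objects an image can span when each object is 2**order bytes (22 -> 4 MiB)."""
    object_size = 1 << order
    return -(-image_size_bytes // object_size)  # ceiling division on ints


def removal_seconds(object_count: int, removes_per_second: float) -> float:
    """Wall-clock time if every possible object gets one delete attempt."""
    return object_count / removes_per_second


if __name__ == "__main__":
    size = 3 * 2**49  # 1.5 PiB, assuming binary units
    count = rbd_object_count(size)
    print(count)  # 402653184
    # The rate below is hypothetical, purely for illustration.
    print(round(removal_seconds(count, 1000.0) / 86400, 2), "days at 1000 deletes/s")
```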

Re: Speed up 'rbd rm'

2013-05-29 Thread Chris Dunlop
On Wed, May 29, 2013 at 12:21:07PM -0700, Josh Durgin wrote: On 05/28/2013 10:59 PM, Chris Dunlop wrote: I see there's a new commit to speed up an 'rbd rm': http://tracker.ceph.com/projects/ceph/repository/revisions/40956410169709c32a282d9b872cb5f618a48926 Is it safe to cherry-pick

Re: Speed up 'rbd rm'

2013-06-03 Thread Chris Dunlop
On Thu, May 30, 2013 at 07:04:28PM -0700, Josh Durgin wrote: On 05/30/2013 06:40 PM, Chris Dunlop wrote: On Thu, May 30, 2013 at 01:50:14PM -0700, Josh Durgin wrote: On 05/29/2013 07:23 PM, Chris Dunlop wrote: On Wed, May 29, 2013 at 12:21:07PM -0700, Josh Durgin wrote: On 05/28/2013 10:59 PM

krbd + format=2 ?

2013-06-03 Thread Chris Dunlop
G'day, Sage's recent pull message to Linus said: Please pull the following Ceph patches from git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus This is a big pull. Most of it is culmination of Alex's work to implement RBD image layering, which is now complete

Re: krbd + format=2 ?

2013-06-07 Thread Chris Dunlop
On Fri, Jun 07, 2013 at 11:54:20AM -0500, Alex Elder wrote: On 06/03/2013 04:24 AM, Chris Dunlop wrote: I pulled the for-linus branch (@ 3abef3b) on top of 3.10.0-rc4, and it's letting me map a format=2 image (created under bobtail), however reading from the block device returns zeros rather

Re: krbd + format=2 ?

2013-06-11 Thread Chris Dunlop
On Sat, Jun 08, 2013 at 12:48:52PM +1000, Chris Dunlop wrote: On Fri, Jun 07, 2013 at 11:54:20AM -0500, Alex Elder wrote: On 06/03/2013 04:24 AM, Chris Dunlop wrote: I pulled the for-linus branch (@ 3abef3b) on top of 3.10.0-rc4, and it's letting me map a format=2 image (created under bobtail

Re: krbd + format=2 ?

2013-06-13 Thread Chris Dunlop
On Wed, Jun 12, 2013 at 08:56:50PM -0700, Josh Durgin wrote: On 06/11/2013 09:59 PM, Chris Dunlop wrote: Looks like the kernel rbd and librbd aren't compatible, as at 3.10.0-rc4+ceph-client/for-linus@3abef3b vs librbd1 0.56.6-1~bpo70+1. Thanks for the detailed report Chris. The kernel client

libceph: error -2 building auth method x request

2012-10-14 Thread Chris Dunlop
G'day, In case anyone else might be getting the $subject error, and to make the issue visible to a google search... When trying to map an rbd block device whilst using cephx authentication, if you get kernel messages like: [79683.055935] libceph: client0 fsid
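Kernel messages like the one in this subject report failures as negative errno values, so "error -2" is -ENOENT. The excerpt is cut off before the cause, so the sketch below only decodes the number. It does not say what was missing.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Decode a kernel-style negative return value, e.g. -2, into name and text."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return "{}: {}".format(name, os.strerror(num))


if __name__ == "__main__":
    print(describe_kernel_error(-2))  # ENOENT: No such file or directory (on Linux)
```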

kernel rbd format=2

2012-12-17 Thread Chris Dunlop
Hi, Format 2 images (and attendant layering support) are not yet supported by the kernel rbd client, according to: http://ceph.com/docs/master/rbd/rbd-snapshot/#layering When might this support be available? Cheers, Chris

Re: Simple test system not working

2011-01-31 Thread Chris Dunlop
Chris Dunlop chris at onthe.net.au writes: The ceph filesystem is created using the attached ceph.conf and: Sorry, no attachments. ceph.conf: -- [global] pid file = /var/run/ceph/$name.pid debug ms = 20 debug mon = 20 [mon

Speling fixes

2011-02-01 Thread Chris Dunlop
Trivial spelling fixes... diff --git a/man/mount.ceph.8 b/man/mount.ceph.8 index d5fe62a..cfcf8f8 100644 --- a/man/mount.ceph.8 +++ b/man/mount.ceph.8 @@ -134,7 +134,7 @@ Report rbytes for st_size on directories. Default: on norbytes .IP -Do not report rbytse for st_size. +Do not report

Improve mount.ceph.8

2011-02-01 Thread Chris Dunlop
Seeing as I was wondering what rbytes might be, I guess others might too... diff --git a/man/mount.ceph.8 b/man/mount.ceph.8 index d5fe62a..f2cdd86 100644 --- a/man/mount.ceph.8 +++ b/man/mount.ceph.8 @@ -129,12 +129,12 @@ no funky `cat dirname` for stats rbytes .IP -Report rbytes for st_size
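In CephFS, rbytes is a directory's recursive byte count: the total size of all files beneath it, which the rbytes mount option reports as the directory's st_size. CephFS maintains that value itself. The sketch below only recomputes the same quantity by walking a local tree, to illustrate what the number means. Skipping symlinks is a choice made for this sketch.

```python
import os


def recursive_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (symlinks are not followed)."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    print(recursive_bytes("."))
```

On a CephFS mount with rbytes on, `os.stat(dirname).st_size` would report this kind of total directly, without the walk.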

Re: cmon: PGMonitor::encode_pending() assert failure

2011-02-03 Thread Chris Dunlop
On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote: Hi Chris, This is an interesting one. Would it be possible for you to tar up your mondata directory on the failed node and post it somewhere I can get at it? From the looks of things the pgmap incremental state file is truncated,

Hard links (was: Fixing NFS)

2011-02-03 Thread Chris Dunlop
On 2011-02-03, Sage Weil s...@newdream.net wrote: There are a couple of levels of difficulty. The main problem is that the only truly stable information in the NFS fh is the inode number, and Ceph's architecture simply doesn't support lookup-by-ino. (It uses an extra table to support it

Re: cmon: PGMonitor::encode_pending() assert failure

2011-02-03 Thread Chris Dunlop
On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote: On Fri, 4 Feb 2011, Chris Dunlop wrote: On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote: Hi Chris, This is an interesting one. Would it be possible for you to tar up your mondata directory on the failed node and post

Re: cmon: PGMonitor::encode_pending() assert failure

2011-02-06 Thread Chris Dunlop
On Thu, Feb 03, 2011 at 02:24:43PM -0800, Sage Weil wrote: On Fri, 4 Feb 2011, Chris Dunlop wrote: On Thu, Feb 03, 2011 at 01:03:17PM -0800, Sage Weil wrote: http://tracker.newdream.net/issues/762 I can revert back to my previous install and run the same workload to see if it crops up

Re: WARNING: at fs/btrfs/inode.c:2143 btrfs_orphan_commit_root+0x7f/0x9b

2011-02-07 Thread Chris Dunlop
On Mon, Feb 07, 2011 at 06:39:11PM +1100, Chris Dunlop wrote: On Mon, Feb 07, 2011 at 05:31:02PM +1100, Chris Dunlop wrote: G'day, Using Josef's btrfs-work bacae123 (+ ceph-client 9aae8faf), I can consistently reproduce the following btrfs warning by simply creating and starting a new btrfs

Re: WARNING: at fs/btrfs/inode.c:2143 btrfs_orphan_commit_root+0x7f/0x9b

2011-02-08 Thread Chris Dunlop
On Tue, Feb 08, 2011 at 07:19:36AM -0800, Sage Weil wrote: On Tue, 8 Feb 2011, Chris Dunlop wrote: On Mon, Feb 07, 2011 at 06:39:11PM +1100, Chris Dunlop wrote: On Mon, Feb 07, 2011 at 05:31:02PM +1100, Chris Dunlop wrote: [ 549.767234] [ cut here ] [ 549.767276

Re: [ceph-users] All pgs stuck peering

2015-12-14 Thread Chris Dunlop
On Mon, Dec 14, 2015 at 09:29:20PM +0800, Jaze Lee wrote: > Should we add big packet test in heartbeat? Right now the heartbeat > only test the little packet. If the MTU is mismatched, the heartbeat > can not find that. It would certainly have saved me a great deal of stress! I imagine you
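A mismatched MTU can be checked by hand with full-size packets that are not allowed to fragment. This sketch computes the largest ICMP echo payload for a given MTU (MTU minus a 20-byte IPv4 header without options and an 8-byte ICMP header) and builds the Linux iputils `ping` command line. The peer address and the 9000-byte jumbo MTU are hypothetical.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host: str, mtu: int, count: int = 3) -> List[str]:
    """Linux iputils ping: '-M do' forbids fragmentation, '-s' sets payload bytes."""
    return ["ping", "-M", "do", "-s", str(df_ping_payload(mtu)), "-c", str(count), host]


if __name__ == "__main__":
    cmd = df_ping_command("192.0.2.10", 9000)  # hypothetical peer and jumbo MTU
    print(" ".join(cmd))  # ping -M do -s 8972 -c 3 192.0.2.10
```

Running the printed command between each pair of hosts (for example via `subprocess.run(cmd)`) and getting a non-zero exit status suggests that packets of the full configured size are not getting through somewhere on the path, even when small heartbeat packets do.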