Re: [ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
Hi, it seems I found the cause. The disk array was used for ZFS before and was not wiped. I zapped the disks with sgdisk and via ceph, but "zfs_member" was still somewhere on the disk. Wiping the disk (wipefs -a -f /dev/mapper/mpatha), "ceph osd create --zap-disk" twice until entry in "df" and
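(For anyone hitting the same thing: a minimal sketch of the wipe sequence described above, assuming a multipath device at /dev/mapper/mpatha; adjust the device path to your setup.)

  # list leftover signatures (e.g. zfs_member) without touching them
  wipefs /dev/mapper/mpatha
  # erase all filesystem/RAID signatures, then clear the GPT/MBR structures
  wipefs --all --force /dev/mapper/mpatha
  sgdisk --zap-all /dev/mapper/mpatha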

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Jason Dillaman
It looks like it's just a ping message in that capture. Are you saying that you restarted OSD 46 and the problem persisted? On Tue, May 16, 2017 at 4:02 PM, Stefan Priebe - Profihost AG wrote: > Hello, > > while reproducing the problem, objecter_requests looks like this:

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Stefan Priebe - Profihost AG
Hello, while reproducing the problem, objecter_requests looks like this: { "ops": [ { "tid": 42029, "pg": "5.bd9616ad", "osd": 46, "object_id": "rbd_data.e10ca56b8b4567.311c", "object_locator": "@5",
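(For reference, a hedged sketch of how such a dump is pulled from a guest's librbd admin socket; the socket path is an assumption and depends on the admin_socket setting in ceph.conf.)

  # dump in-flight requests from the client-side objecter
  ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok objecter_requests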

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Jason Dillaman
On Tue, May 16, 2017 at 3:37 PM, Stefan Priebe - Profihost AG wrote: > We've enabled the op tracker for performance reasons while using SSD > only storage ;-( Disabled, you mean? > Can I enable the op tracker using ceph osd tell? Then reproduce the > problem. Check what has
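(A hedged sketch of turning the op tracker back on at runtime; osd_enable_op_tracker is the relevant option name as of Jewel, so verify it against your release before relying on it.)

  # re-enable the op tracker on osd.46 only, or on all OSDs, without a restart
  ceph tell osd.46 injectargs '--osd_enable_op_tracker=true'
  ceph tell osd.* injectargs '--osd_enable_op_tracker=true'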

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Stefan Priebe - Profihost AG
Hello Jason, On 16.05.2017 at 21:32, Jason Dillaman wrote: > Thanks for the update. In the ops dump provided, the objecter is > saying that OSD 46 hasn't responded to the deletion request of object > rbd_data.e10ca56b8b4567.311c. > > Perhaps run "ceph daemon osd.46
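(The suggested command is cut off above; the usual admin-socket queries in this situation, run on the node hosting osd.46, look roughly like this sketch.)

  # show requests currently stuck in the OSD, and recently completed slow ones
  ceph daemon osd.46 dump_ops_in_flight
  ceph daemon osd.46 dump_historic_ops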

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Stefan Priebe - Profihost AG
Hello Jason, I'm happy to tell you that I currently have one VM where I can reproduce the problem. > The best option would be to run "gcore" against the running VM whose > IO is stuck, compress the dump, and use "ceph-post-file" to > provide the dump. I could then look at all the Ceph data
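(A minimal sketch of the capture/upload procedure Jason describes, assuming the stuck VM's qemu PID is known; file names are placeholders.)

  # dump a core of the stuck qemu process without stopping it
  gcore -o /tmp/stuck-vm <qemu-pid>
  # compress it and hand it to the Ceph developers
  xz /tmp/stuck-vm.<qemu-pid>
  ceph-post-file -d "rbd hang: qemu core dump" /tmp/stuck-vm.<qemu-pid>.xz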

[ceph-users] Hammer to Jewel upgrade questions

2017-05-16 Thread Shain Miley
Hello, I am going to be upgrading our production Ceph cluster from Hammer/Ubuntu 14.04 to Jewel/Ubuntu 16.04, and I wanted to ask a question and sanity-check my upgrade plan. Here are the steps I am planning to take during the upgrade: 1) Upgrade to the latest Hammer on the current cluster 2) Remove
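(The plan is cut off above; as a point of comparison, a hedged outline of the commands usually involved in a Hammer to Jewel upgrade, not Shain's actual steps.)

  ceph osd set noout                      # avoid rebalancing while daemons restart
  # switch the repos to jewel, then upgrade/restart monitors first, OSDs second,
  # one node at a time (upstart on 14.04, systemd after the move to 16.04)
  apt-get update && apt-get dist-upgrade
  ceph osd unset noout
  # once every daemon reports jewel:
  ceph osd set sortbitwise
  ceph osd set require_jewel_osds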

[ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
Hi! Currently I am deploying a small cluster with two nodes. I installed ceph jewel on all nodes and made a basic deployment. After "ceph osd create..." I am now getting "Failed to start Ceph disk activation: /dev/dm-18" on boot. None of the 28 OSDs has ever been active. This server has a 14-disk JBOD with
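(A hedged sketch of where to look first; the exact systemd unit name depends on how the device path gets escaped.)

  # find the failed activation unit and its log
  systemctl list-units --all 'ceph-disk@*'
  journalctl -u 'ceph-disk@*'
  # show how ceph-disk classifies the multipath members
  ceph-disk list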

[ceph-users] S3 API with Keystone auth

2017-05-16 Thread Mārtiņš Jakubovičs
Hello all, I have just entered the object storage world and set up a working cluster with RadosGW and authentication via OpenStack Keystone. The Swift API works great, but how do I test the S3 API? I mean, I found a way to test with python boto, but it looks like I am missing the aws_access_key_id; how do I get it? Or
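(One way to get S3-style credentials for testing, sketched below: create a native RGW user with radosgw-admin and use its access_key/secret_key in boto; the uid and display name are placeholders. Keystone EC2 credentials are a separate setup.)

  # create a RadosGW user; the output includes access_key and secret_key
  radosgw-admin user create --uid=s3test --display-name="S3 test user"
  # show the keys again later
  radosgw-admin user info --uid=s3test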

Re: [ceph-users] Odd cyclical cluster performance

2017-05-16 Thread Patrick Dinnen
Hi Greg, It's definitely not scrub or deep-scrub, as those are switched off for testing. Anything else you'd look at as a possible culprit here? Thanks, Patrick On Mon, May 15, 2017 at 5:51 PM, Gregory Farnum wrote: > Did you try correlating it with PG scrubbing or other

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Jason Dillaman
On Tue, May 16, 2017 at 2:12 AM, Stefan Priebe - Profihost AG wrote: > 3.) it still happens on pre-jewel images even when they got restarted / > killed and reinitialized. In that case they have the asok socket available > for now. Should I issue any command to the socket to
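(If useful to others: a hedged sketch of admin-socket commands worth trying against a hung client; the socket path is a placeholder.)

  # raise client-side logging on the fly and dump the in-memory log
  ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok config set debug_rbd 20
  ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok config set debug_objecter 20
  ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok log dump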

[ceph-users] sortbitwise warning broken on Ceph Jewel?

2017-05-16 Thread Fabian Grünbichler
The Kraken release notes[1] contain the following note about the sortbitwise flag and upgrading from <= Jewel to > Jewel: The sortbitwise flag must be set on the Jewel cluster before upgrading to Kraken. The latest Jewel (10.2.4+) releases issue a health warning if the flag is not set, so this is
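(The flag itself is set with a single command on the Jewel cluster, sketched here for completeness.)

  # set on the jewel cluster before upgrading any daemon to kraken
  ceph osd set sortbitwise
  # confirm it is listed among the osdmap flags
  ceph osd dump | grep flags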

Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-16 Thread nick
Thanks for the explanation. I will create a ticket on the tracker then. Cheers Nick On Tuesday, May 16, 2017 08:16:33 AM Jason Dillaman wrote: > Sorry, I haven't had a chance to attempt to reproduce. > > I do know that the librbd in-memory cache does not restrict incoming > IO to the cache size

Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-16 Thread Jason Dillaman
Sorry, I haven't had a chance to attempt to reproduce. I do know that the librbd in-memory cache does not restrict incoming IO to the cache size while in-flight. Therefore, if you are performing 4MB writes with a queue depth of 256, you might see up to 1GB of memory allocated from the heap for
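(A back-of-the-envelope illustration of that worst case, using the numbers above and the default cache size as an assumption.)

  # 256 in-flight writes x 4 MiB each = 1024 MiB ~= 1 GiB of transient heap,
  # on top of (not bounded by) the configured cache, e.g. in ceph.conf:
  #   [client]
  #   rbd cache size = 33554432    # default 32 MiB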

Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-16 Thread nick
Hi Jason, did you have some time to check if you can reproduce the high memory usage? I am not sure if I should create a bug report for this or if this is expected behaviour. Cheers Nick On Monday, May 08, 2017 08:55:55 AM you wrote: > Thanks. One more question: was the image a clone or a

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Stefan Priebe - Profihost AG
> 3.) it still happens on pre-jewel images even when they got restarted > / killed and reinitialized. In that case they have the asok socket > available > for now. Should I issue any command to the socket to get logs out of > the hanging VM? Qemu is still responding, just ceph / disk I/O gets >

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-16 Thread Stefan Priebe - Profihost AG
Hello Jason, I got some further hints. Please see below. On 15.05.2017 at 22:25, Jason Dillaman wrote: > On Mon, May 15, 2017 at 3:54 PM, Stefan Priebe - Profihost AG > wrote: >> Would it be possible that the problem is the same one you fixed? > > No, I would not expect it