Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-17 Thread Dan Jakubiec
Also worth pointing out something a bit obvious: this kind of faster/destructive migration should only be attempted if all of your pools are at least 3x replicated. For example, if you had a 1x replicated pool, you would lose data using this approach. -- Dan > On Jan 11, 2018, at 14:24, Reed

Re: [ceph-users] CephFS log jam prevention

2017-12-05 Thread Dan Jakubiec
To add a little color here... we started an rsync last night to copy about 4TB worth of files to CephFS. Paused it this morning because CephFS was unresponsive on the machine (e.g. can't cat a file from the filesystem). Been waiting about 3 hours for the log jam to clear. Slow requests have

Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Dan Jakubiec
> On Dec 2, 2016, at 10:48, Sage Weil <s...@newdream.net> wrote: > > On Fri, 2 Dec 2016, Dan Jakubiec wrote: >> For what it's worth... this sounds like the condition we hit when we >> re-enabled scrub on our 16 OSDs (after 6 to 8 weeks of noscrub). They >> fla

Re: [ceph-users] How to pick the number of PGs for a CephFS metadata pool?

2016-11-08 Thread Dan Jakubiec
solute data you're less worried about balancing the data precisely > evenly and more concerned about not accidentally driving all IO to one > of 7 disks because you have 8 PGs, and all your supposedly-parallel > ops are contending. ;) > -Greg > >> >> Thanks, >>

[ceph-users] How to pick the number of PGs for a CephFS metadata pool?

2016-11-08 Thread Dan Jakubiec
Hello, Picking the number of PGs for the CephFS data pool seems straightforward, but how does one do this for the metadata pool? Any rules of thumb or recommendations? Thanks, -- Dan Jakubiec

[ceph-users] Multi-tenancy and sharing CephFS data pools with other RADOS users

2016-11-02 Thread Dan Jakubiec
We currently have one master RADOS pool in our cluster that is shared among many applications. All objects stored in the pool are currently stored using specific namespaces -- nothing is stored in the default namespace. We would like to add a CephFS filesystem to our cluster, and would like to
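
For context on the namespace approach above, here is a minimal librados C sketch of how a plain RADOS client confines its I/O to one namespace with rados_ioctx_set_namespace(). The pool name "master" and namespace "app1" are placeholders for illustration; this shows the existing per-namespace access pattern, not how CephFS itself would be wired into the shared pool.

    /* Sketch: scope librados I/O to a single namespace (names are placeholders). */
    #include <rados/librados.h>
    #include <stdio.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;

        if (rados_create(&cluster, NULL) < 0)          /* NULL id = default client */
            return 1;
        rados_conf_read_file(cluster, NULL);           /* default ceph.conf search path */
        if (rados_connect(cluster) < 0)
            return 1;
        if (rados_ioctx_create(cluster, "master", &io) < 0) {
            rados_shutdown(cluster);
            return 1;
        }

        /* Every op on this ioctx is now confined to the "app1" namespace. */
        rados_ioctx_set_namespace(io, "app1");

        const char data[] = "hello from app1";
        int ret = rados_write_full(io, "greeting", data, sizeof(data));
        printf("write_full returned %d\n", ret);

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return ret < 0 ? 1 : 0;
    }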

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-24 Thread Dan Jakubiec
Thanks Kostis, great read. We also had a Ceph disaster back in August and a lot of this experience looked familiar. Sadly, in the end we were not able to recover our cluster, but I'm glad to hear that you were successful. LevelDB corruptions were one of our big problems. Your note below about

Re: [ceph-users] Recovery/Backfill Speedup

2016-10-05 Thread Dan Jakubiec
toring how many pg's are backfilling and the load on machines and network. kind regards Ronny Aasen -- Dan Jakubiec VP Development Focus VQ LLC

[ceph-users] Is rados_write_op_* any more efficient than issuing the commands individually?

2016-09-06 Thread Dan Jakubiec
Hello, I need to issue the following commands on millions of objects: rados_write_full(oid1, ...) rados_setxattr(oid1, "attr1", ...) rados_setxattr(oid1, "attr2", ...) Would it be any faster if I combined all 3 of these into a single rados_write_op and issued them "together" as a single operation?
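
For what it's worth, here is a hedged sketch of what the combined form looks like with the librados C write-op API (the object name and attribute buffers below are placeholders). The three updates are queued on one rados_write_op_t and submitted with a single rados_write_op_operate() call, so the client pays one round trip per object and the OSD applies the batch as a single transaction, which is the main reason it should be faster than three separate synchronous calls.

    /* Sketch: batch write_full + two setxattrs into one compound write op.
     * Object name and attribute values are placeholders. */
    #include <rados/librados.h>

    int write_with_xattrs(rados_ioctx_t io, const char *oid,
                          const char *data, size_t data_len,
                          const char *attr1, size_t attr1_len,
                          const char *attr2, size_t attr2_len)
    {
        rados_write_op_t op = rados_create_write_op();

        rados_write_op_write_full(op, data, data_len);          /* replaces object contents */
        rados_write_op_setxattr(op, "attr1", attr1, attr1_len);
        rados_write_op_setxattr(op, "attr2", attr2, attr2_len);

        /* One round trip; the OSD applies the whole op atomically. */
        int ret = rados_write_op_operate(op, io, oid, NULL, 0);

        rados_release_write_op(op);
        return ret;
    }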

Re: [ceph-users] OSD daemon randomly stops

2016-09-03 Thread Dan Jakubiec
Hi Brad, thank you very much for the response: > On Sep 3, 2016, at 17:05, Brad Hubbard <bhubb...@redhat.com> wrote: > > > > On Sun, Sep 4, 2016 at 6:21 AM, Dan Jakubiec <dan.jakub...@gmail.com> wrote: > >

Re: [ceph-users] OSD daemon randomly stops

2016-09-03 Thread Dan Jakubiec
Hi Samuel, Here is another assert, but this time with debug filestore = 20. Does this reveal anything? 2016-09-03 16:12:44.122451 7fec728c9700 20 list_by_hash_bitwise prefix 08F3 2016-09-03 16:12:44.123046 7fec728c9700 20 list_by_hash_bitwise prefix 08F30042 2016-09-03 16:12:44.123068

Re: [ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-03 Thread Dan Jakubiec
mand to remove the old OSD, I think our next step will be to bring up a new/real/empty OSD.8 and see if that will clear the log jam. But it seems like there should be a tool to deal with this kind of thing? Thanks, -- Dan > On Sep 2, 2016, at 15:01, Dan Jakubiec <dan.jakub...@gmail.com>

[ceph-users] Can someone explain the strange leftover OSD devices in CRUSH map -- renamed from osd.N to deviceN?

2016-09-02 Thread Dan Jakubiec
A while back we removed two damaged OSDs from our cluster, osd.0 and osd.8. They are now gone from most Ceph commands, but are still showing up in the CRUSH map with weird device names: ... # devices device 0 device0 device 1 osd.1 device 2 osd.2 device 3 osd.3 device 4 osd.4 device 5 osd.5

[ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-02 Thread Dan Jakubiec
Re-packaging this question which was buried in a larger, less-specific thread from a couple of days ago. Hoping this will be more useful here. We have been working on restoring our Ceph cluster after losing a large number of OSDs. We have all PGs active now except for 80 PGs that are stuck in

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thank you for all the help, Wido: > On Sep 1, 2016, at 14:03, Wido den Hollander wrote: > > You have to mark those OSDs as lost and also force create the incomplete PGs. > This might be the root of our problems. We didn't mark the parent OSD as "lost" before we removed it.

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thanks Wido. Reed and I have been working together to try to restore this cluster for about 3 weeks now. I have been accumulating a number of failure modes that I am hoping to share with the Ceph group soon, but have been holding off a bit until we see the full picture clearly so that we can

Re: [ceph-users] librados Java support for rados_lock_exclusive()

2016-08-25 Thread Dan Jakubiec
with librados. > > You are more than welcome to send a Pull Request though! > https://github.com/ceph/rados-java/pulls > > Wido > >> On 24 August 2016 at 21:58, Dan Jakubiec <dan.jakub...@gmail.com> wrote: >> >> >> Hello, >> >>

[ceph-users] librados Java support for rados_lock_exclusive()

2016-08-24 Thread Dan Jakubiec
Hello, Is anyone planning to implement support for Rados locks in the Java API anytime soon? Thanks, -- Dan J
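
In case it helps anyone weighing a rados-java pull request, this is roughly the underlying librados C surface such a binding would wrap; the lock name, cookie, and 30-second duration below are arbitrary examples, not part of any existing Java API.

    /* Sketch of the C calls behind exclusive object locks (names/timeouts are examples). */
    #include <rados/librados.h>
    #include <sys/time.h>
    #include <stdio.h>

    int lock_then_unlock(rados_ioctx_t io, const char *oid)
    {
        struct timeval timeout = { .tv_sec = 30, .tv_usec = 0 };  /* auto-release after 30s */

        int ret = rados_lock_exclusive(io, oid, "mylock", "cookie-1",
                                       "example lock", &timeout, 0);
        if (ret < 0) {
            fprintf(stderr, "lock failed: %d\n", ret);  /* e.g. -EBUSY if held elsewhere */
            return ret;
        }

        /* ... do exclusive work on the object ... */

        return rados_unlock(io, oid, "mylock", "cookie-1");
    }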

Re: [ceph-users] How can we repair OSD leveldb?

2016-08-17 Thread Dan Jakubiec
Hi Wido, Thank you for the response: > On Aug 17, 2016, at 16:25, Wido den Hollander <w...@42on.com> wrote: > > >> On 17 August 2016 at 17:44, Dan Jakubiec <dan.jakub...@gmail.com> wrote: >> >> >> Hello, we have a Ceph cluster with 8 OSD t

[ceph-users] How can we repair OSD leveldb?

2016-08-17 Thread Dan Jakubiec
Hello, we have a Ceph cluster with 8 OSDs that recently lost power to all 8 machines. We've managed to recover the XFS filesystems on 7 of the machines, but the OSD service is only starting on 1 of them. The other 5 machines all have complaints similar to the following: 2016-08-17