[ceph-users] Deleted a pool - when will a PG be removed from the OSD?

2017-04-19 Thread Daniel Marks
Hi all, I am wondering when the PGs for a deleted pool get removed from their OSDs. http://docs.ceph.com/docs/master/dev/osd_internals/pg_removal/ says that it is happening asynchronously, but what is the trigger? I deleted the p
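For what it's worth, a minimal way to watch the asynchronous cleanup, assuming a throwaway pool named testpool: the PGs vanish from the PG map immediately, while per-OSD space drains over time.

    # pool name must be given twice plus the safety flag
    ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
    # watch per-OSD utilisation drop as PG data is removed in the background
    watch -n 10 ceph osd df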

Re: [ceph-users] rbd kernel client fencing

2017-04-19 Thread Chaofan Yu
Thank you so much. The blacklist entries are stored in the osd map, which is supposed to be tiny and clean, so we are doing similar cleanups after reboot. I’m quite interested in how the host commits suicide and reboots; can you successfully unmount the folder and unmap the rbd block device after it
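A sketch of that cleanup, with the client address and nonce as assumptions:

    # list current blacklist entries in the osdmap
    ceph osd blacklist ls
    # remove the entry for the host that has since rebooted
    ceph osd blacklist rm 10.0.0.5:0/3710147553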

Re: [ceph-users] bluestore object overhead

2017-04-19 Thread Jason Dillaman
Does the bluestore min alloc size apply for 4k block-size files [1]? [1] https://github.com/ceph/ceph/blob/master/src/common/config_opts.h#L1063 On Wed, Apr 19, 2017 at 4:51 PM, Gregory Farnum wrote: > On Wed, Apr 19, 2017 at 1:49 PM, Pavel Shub wrote: >> On Wed, Apr 19, 2017 at 4:33 PM, Gregor
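For reference, the options behind that line are per-device-class OSD settings, baked in when the OSD is created; a sketch of overriding them in ceph.conf, with the values shown purely illustrative:

    [osd]
    # bluestore rounds each object's allocation up to a multiple of these sizes (bytes)
    bluestore min alloc size hdd = 65536
    bluestore min alloc size ssd = 16384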

[ceph-users] chooseleaf updates

2017-04-19 Thread Donny Davis
In reading the docs, I am curious whether I can change the chooseleaf parameter as my cluster expands. I currently have only one node and used this parameter in ceph.conf: osd crush chooseleaf type = 0 Can this be changed after I add nodes? The other two nodes are currently on gluster, but moving to
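For what it's worth, the chooseleaf type in ceph.conf only shapes the default rule at cluster creation; afterwards the failure domain is changed by editing the CRUSH rule itself, roughly as sketched below (expect data movement when the rule changes):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # in crushmap.txt change the replicated rule's step from
    #   step chooseleaf firstn 0 type osd
    # to
    #   step chooseleaf firstn 0 type host
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new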

Re: [ceph-users] Extremely high OSD memory utilization on Kraken 11.2.0 (with XFS -or- bluestore)

2017-04-19 Thread Aaron Ten Clay
I'm new to doing this all via systemd and systemd-coredump, but I appear to have gotten cores from two OSD processes. When xzipped they are < 2 MiB each, but I threw them on my webserver to avoid polluting the mailing list. This seems oddly small, so if I've botched the process somehow, let me know :
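For the record, a sketch of pulling those cores out of systemd-coredump (the PID is an assumption):

    # list cores captured for ceph-osd
    coredumpctl list ceph-osd
    # write one core to a file, then compress it for sharing
    coredumpctl dump 12345 -o /tmp/ceph-osd.12345.core
    xz /tmp/ceph-osd.12345.core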

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
In this case the spinners have their journals on an NVMe drive, 3 OSD : 1 NVMe Journal. Will be trying tomorrow to get some benchmarks and compare some hdd/ssd/hybrid workloads to see performance differences across the three backing layers. Most client traffic is read oriented to begin with, so
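For a quick first-order comparison of the backing layers, rados bench per pool is often enough; a sketch assuming a pool named rbd-hdd (repeat for the ssd/hybrid pools):

    # 60 s of writes, kept so the read test has objects, then sequential reads and cleanup
    rados bench -p rbd-hdd 60 write --no-cleanup
    rados bench -p rbd-hdd 60 seq
    rados -p rbd-hdd cleanup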

Re: [ceph-users] Creating journal on needed partition

2017-04-19 Thread Ben Hines
This is my experience. For creating new OSDs, I just created Rundeck jobs that run ceph-deploy. It's relatively rare that new OSDs are created, so it is fine. Originally I was automating them with configuration management tools, but that tended to encounter edge cases and problems that ceph-deploy a
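For context, the ceph-deploy call being wrapped by those Rundeck jobs takes host, data disk and journal device in one colon-separated argument (device names here are assumptions):

    # data on sdb, journal on a pre-created partition of the shared NVMe
    ceph-deploy osd create osd-node1:sdb:/dev/nvme0n1p1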

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Anthony D'Atri
Re ratio, I think you’re right. Write performance depends for sure on what the journal devices are. If the journals are colo’d on spinners, then for sure the affinity game isn’t going to help writes massively. My understanding of write latency is that min_size journals have to be written befo
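For completeness, primary affinity itself is just a per-OSD value between 0 and 1; a sketch of steering primaries away from the spinners (OSD ids are assumptions, and older releases may also need mon osd allow primary affinity = true on the monitors):

    # make the HDD-backed OSD very unlikely to be chosen as primary
    ceph osd primary-affinity osd.12 0
    # leave SSD-backed OSDs at the default
    ceph osd primary-affinity osd.3 1.0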

Re: [ceph-users] rbd kernel client fencing

2017-04-19 Thread Kjetil Jørgensen
Hi, As long as you blacklist the old owner by IP, you should be fine. Do note that rbd lock remove also blacklists implicitly unless you pass it the --rbd_blacklist_on_break_lock=false option. (That is, I think "ceph osd blacklist add a.b.c.d interval" translates into blacklisting
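A sketch of the behaviour being described, with image, lock id, locker and address as assumptions:

    # breaking a lock blacklists the old holder by default
    rbd lock remove rbd/myimage mylock client.4567
    # the same, but without the implicit blacklist
    rbd lock remove rbd/myimage mylock client.4567 --rbd_blacklist_on_break_lock=false
    # the manual equivalent mentioned above, with a one-hour expiry
    ceph osd blacklist add 10.0.0.5:0/3710147553 3600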

Re: [ceph-users] bluestore object overhead

2017-04-19 Thread Gregory Farnum
On Wed, Apr 19, 2017 at 1:49 PM, Pavel Shub wrote: > On Wed, Apr 19, 2017 at 4:33 PM, Gregory Farnum wrote: >> On Wed, Apr 19, 2017 at 1:26 PM, Pavel Shub wrote: >>> Hey All, >>> >>> I'm running a test of bluestore in a small VM and seeing 2x overhead >>> for each object in cephfs. Here's the ou

Re: [ceph-users] bluestore object overhead

2017-04-19 Thread Pavel Shub
On Wed, Apr 19, 2017 at 4:33 PM, Gregory Farnum wrote: > On Wed, Apr 19, 2017 at 1:26 PM, Pavel Shub wrote: >> Hey All, >> >> I'm running a test of bluestore in a small VM and seeing 2x overhead >> for each object in cephfs. Here's the output of df detail >> https://gist.github.com/pavel-citymaps

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-19 Thread Gregory Farnum
On Tue, Apr 18, 2017 at 11:34 AM, Peter Maloney wrote: > On 04/18/17 11:44, Jogi Hofmüller wrote: > > Hi, > > Am Dienstag, den 18.04.2017, 13:02 +0200 schrieb mj: > > On 04/18/2017 11:24 AM, Jogi Hofmüller wrote: > > This might have been true for hammer and older versions of ceph. > From > what I

Re: [ceph-users] librbd::ImageCtx: error reading immutable metadata: (2) No such file or directory

2017-04-19 Thread Gregory Farnum
On Tue, Apr 18, 2017 at 4:27 AM, Frode Nordahl wrote: > Hello all, > > A while ago I came across a Ceph cluster with a RBD volume missing the > header object describing the characteristics of the volume, making it > impossible to attach or perform any operations on said volume. > > As a courtesy t

Re: [ceph-users] bluestore object overhead

2017-04-19 Thread Gregory Farnum
On Wed, Apr 19, 2017 at 1:26 PM, Pavel Shub wrote: > Hey All, > > I'm running a test of bluestore in a small VM and seeing 2x overhead > for each object in cephfs. Here's the output of df detail > https://gist.github.com/pavel-citymaps/868a7c4b1c43cea9ab86cdf2e79198ee > > This is on a VM with all

[ceph-users] bluestore object overhead

2017-04-19 Thread Pavel Shub
Hey All, I'm running a test of bluestore in a small VM and seeing 2x overhead for each object in cephfs. Here's the output of df detail: https://gist.github.com/pavel-citymaps/868a7c4b1c43cea9ab86cdf2e79198ee This is on a VM with all daemons and a 20 GB disk; all pools are of size 1. Is this the expect

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
Hi Maxime, This is a very interesting concept. Instead of primary affinity being used to choose an SSD for the primary copy, you set a crush rule to first choose an OSD in the ‘ssd-root’, then the ‘hdd-root’ for the second set. And with 'step chooseleaf firstn {num}’ > If {num} > 0 && < pool-num-repl
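A hedged sketch of that rule, assuming roots named ssd-root and hdd-root already exist in the CRUSH map; the first take/emit supplies the primary from the SSD root and the second fills the remaining replicas from the HDD root. A pool is then pointed at it with ceph osd pool set <pool> crush_ruleset 1 (pre-Luminous option name).

    rule hybrid {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd-root
        step chooseleaf firstn 1 type host
        step emit
        step take hdd-root
        step chooseleaf firstn -1 type host
        step emit
    }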

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Maxime Guyot
Hi, >> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio, > 1:4-5 is common but depends on your needs and the devices in question, i.e. > assuming LFF drives and that you aren’t using crummy journals. You might be speaking about different ratios here. I think that Anthony is

[ceph-users] Sharing SSD journals and SSD drive choice

2017-04-19 Thread Adam Carheden
Does anyone know if XFS uses a single thread to write to its journal? I'm evaluating SSDs to buy as journal devices. I plan to have multiple OSDs share a single SSD for journals. I'm benchmarking several brands as described here: https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-yo
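The test from the linked post boils down to a single-job, O_DSYNC, 4k sequential write; a sketch (the device name is an assumption, and the run overwrites data on it):

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test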

Re: [ceph-users] Adding a new rack to crush map without pain?

2017-04-19 Thread Maxime Guyot
Hi Matthew, I would expect the osd_crush_location parameter to take effect from the OSD activation. Maybe ceph-ansible would have info there? A workaround might be to “set noin”, restart all the OSDs once ceph.conf includes the crush location, and enjoy the automatic CRUSH map update (if you ha
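A rough sketch of that workaround, under the assumption that the new rack is labelled rack3, that osd crush update on start is left at its default of true, and that the $host metavariable expands per node (all labels are assumptions):

    ceph osd set noin
    # on each OSD host, add the location to ceph.conf before restarting:
    #   [osd]
    #   osd crush location = host=$host rack=rack3 root=default
    systemctl restart ceph-osd.target
    # verify the tree looks right, then allow OSDs in again
    ceph osd tree
    ceph osd unset noin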

Re: [ceph-users] Ceph extension - how to equilibrate ?

2017-04-19 Thread Maxime Guyot
Hi Pascal, I ran into the same situation some time ago: a small cluster, adding a node with HDDs double the size of the existing ones. I wrote about it here: http://ceph.com/planet/the-schrodinger-ceph-cluster/ When adding OSDs to a cluster, rebalancing/data movement is unavoidable in most
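A sketch of limiting the impact of that data movement, with OSD ids, weights and step sizes as assumptions: throttle backfill and bring the big OSDs up to their full CRUSH weight gradually.

    # throttle recovery while rebalancing
    ceph tell osd.* injectargs '--osd-max-backfills 1'
    # raise the new 8 TB OSD's CRUSH weight in steps, waiting for HEALTH_OK between them
    ceph osd crush reweight osd.24 1.0
    ceph osd crush reweight osd.24 4.0
    ceph osd crush reweight osd.24 7.27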

Re: [ceph-users] Adding a new rack to crush map without pain?

2017-04-19 Thread Matthew Vernon
Hi, > How many OSDs are we talking about? We're at about 500 now, and even > adding another 2000-3000 is a 5-minute cut/paste job of editing the > CRUSH map. If you really are adding racks and racks of OSDs every week, > you should have found the crush location hook a long time ago. We have 540 a
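For anyone looking for it, the hook is just an executable named in ceph.conf that prints the CRUSH location for the OSD being started; a minimal sketch, where the script path and the hostname-to-rack convention are assumptions:

    # ceph.conf
    [osd]
    osd crush location hook = /usr/local/bin/ceph-crush-location

    # /usr/local/bin/ceph-crush-location (must be executable)
    #!/bin/sh
    # derive the rack from a hostname like ceph-r12-n03 -> rack12
    rack=$(hostname -s | sed 's/^ceph-r\([0-9]*\)-.*/rack\1/')
    echo "host=$(hostname -s) rack=${rack} root=default"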

Re: [ceph-users] Ceph extension - how to equilibrate ?

2017-04-19 Thread pascal.pu...@pci-conseil.net
[...] I hope those aren't SMR disks... make sure they're not, or it will be very slow, to the point where *osds will time out and die*. Hopefully not: DELL 8TB 7.2K RPM NLSAS 12Gbps 512e 3.5in Hot-plug Hard Drive, sync :) This is not for performance, just for cold data. ceph osd crush move osd.X h
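A sketch of that placement command; create-or-move is the more forgiving variant when relocating a device rather than a bucket (id, weight and bucket names are assumptions):

    # put osd.X under the cold-storage host; weight roughly equals the device size in TiB
    ceph osd crush create-or-move osd.12 7.27 host=coldnode1 root=default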

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Richard Hesketh
On 18/04/17 22:28, Anthony D'Atri wrote: > I get digests, so please forgive me if this has been covered already. > >> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio, > > 1:4-5 is common but depends on your needs and the devices in question, ie. > assuming LFF drives an

Re: [ceph-users] Re: Does cephfs guarantee client cache consistency for file data?

2017-04-19 Thread John Spray
On Wed, Apr 19, 2017 at 11:03 AM, 许雪寒 wrote: > Thanks, everyone:-) > > I'm still not very clear. Do these cache "capabilities" only apply to > metadata operations or both metadata and data? Both metadata and data are consistent between clients. If a client has the capability to buffer data for

[ceph-users] Re: Why is there no data backup mechanism in the rados layer?

2017-04-19 Thread 许雪寒
Thanks :-) If there were a mechanism that could provide replication at the granularity of individual objects, that is, rados doing replication of objects on behalf of higher-layer features while leaving other objects unreplicated if those features decide some objects need to be replicated to

[ceph-users] Re: Does cephfs guarantee client cache consistency for file data?

2017-04-19 Thread 许雪寒
Thanks, everyone :-) I'm still not very clear. Do these cache "capabilities" only apply to metadata operations or to both metadata and data? -Original Message- From: David Disseldorp [mailto:dd...@suse.de] Sent: 2017-04-19 16:46 To: 许雪寒 Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Does cephfs gua

[ceph-users] rbd kernel client fencing

2017-04-19 Thread Chaofan Yu
Hi list, I wonder if someone can help with rbd kernel client fencing (aimed at avoiding simultaneous rbd map on different hosts). I know the exclusive-lock rbd image feature was added later to avoid manual rbd lock CLIs, but I want to know about the previous blacklist solution. The official workflow I’ve got is l
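For reference, the pre-exclusive-lock fencing recipe generally looked something like the sketch below; the image name, lock id, locker and client address are all assumptions:

    # on the host taking over the image
    rbd lock list rbd/myimage
    # fence the previous holder before breaking its lock
    ceph osd blacklist add 10.0.0.5:0/3710147553
    rbd lock remove rbd/myimage mylock client.4567
    # take the lock and map the device on the new host
    rbd lock add rbd/myimage mylock
    rbd map rbd/myimage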

Re: [ceph-users] Does cephfs guarantee client cache consistency for file data?

2017-04-19 Thread David Disseldorp
Hi, On Wed, 19 Apr 2017 08:19:50 +0000, 许雪寒 wrote: > I’m new to cephfs. I wonder whether cephfs guarantees client cache consistency > for file content. For example, if client A read some data of file X, then > client B modified X’s content in the range that A read, will A be > notified of t

[ceph-users] Does cephfs guarantee client cache consistency for file data?

2017-04-19 Thread 许雪寒
Hi, everyone. I’m new to cephfs. I wonder whether cephfs guarantees client cache consistency for file content. For example, if client A reads some data of file X, then client B modifies X’s content in the range that A read, will A be notified of the modification?

Re: [ceph-users] OSD disk concern

2017-04-19 Thread Peter Maloney
On 04/19/17 07:42, gjprabu wrote: > Hi Shuresh, > > Thanks for your reply. Is it OK to have the OS on a normal SATA > hard drive, and the volume and journal on the same SSD? Mainly we are asking for this > suggestion for performance purposes. > For performance, it's always best to make it as parallel as pos
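A sketch of the two layouts being weighed, using the ceph-disk tooling of that era (device names are assumptions): journal colocated on the same SSD as the OSD data versus a journal partition on a separate device.

    # colocate OSD data and journal on the same SSD (ceph-disk partitions it itself)
    ceph-disk prepare /dev/sdc
    # or keep the journal on a partition of a different SSD
    ceph-disk prepare /dev/sdc /dev/sdd1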

Re: [ceph-users] OSD disk concern

2017-04-19 Thread gjprabu
Hi Shuresh, Thanks for your reply. Is it OK to have the OS on a normal SATA hard drive, and the volume and journal on the same SSD? Mainly we are asking for this suggestion for performance purposes. Regards, Prabu GJ On Wed, 19 Apr 2017 11:54:04 +0530 Shuresh