Re: [ceph-users] Failed to read JournalPointer - MDS error (mds rank 0 is damaged)

2017-05-02 Thread Patrick Donnelly
Looks like: http://tracker.ceph.com/issues/17236 The fix is in v10.2.6. -- Patrick Donnelly

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Willem Jan Withagen
On 02-05-17 19:54, David Turner wrote: > Are you guys talking about 5Mbytes/sec to each journal device? Even if > you had 8 OSDs per journal and had 2000 osds... you would need a > sustained 1.25 Gbytes/sec to average 5Mbytes/sec per journal device. I'm not sure I'm following this... But I'm

Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-02 Thread Jason Dillaman
Can you share the fio job file that you utilized so I can attempt to repeat locally? On Tue, May 2, 2017 at 2:51 AM, nick wrote: > Hi Jason, > thanks for your feedback. I have now done some tests over the weekend to verify the > memory overhead. > I was using qemu 2.8 (taken from the

Re: [ceph-users] RBD behavior for reads to a volume with no data written

2017-05-02 Thread Jason Dillaman
If the RBD object map feature is enabled, the read request would never even be sent to the OSD if the client knows the backing object doesn't exist. However, if the object map feature is disabled, the read request will be sent to the OSD. The OSD isn't my area of expertise, but I can try to
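As a side note, a minimal sketch of checking and enabling the feature with the rbd CLI (pool/image names are placeholders, not from the thread; object-map depends on exclusive-lock, and enabling it on an existing image is usually followed by a rebuild):

    $ rbd info rbd/myimage                                    # the "features:" line shows whether object-map is enabled
    $ rbd feature enable rbd/myimage object-map fast-diff     # requires exclusive-lock to already be enabled
    $ rbd object-map rebuild rbd/myimage                      # populate the map for an existing image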

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread David Turner
Are you guys talking about 5Mbytes/sec to each journal device? Even if you had 8 OSDs per journal and had 2000 osds... you would need a sustained 1.25 Gbytes/sec to average 5Mbytes/sec per journal device. On Tue, May 2, 2017 at 1:47 PM Willem Jan Withagen wrote: > On 02-05-17
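For reference, a quick sanity check of that arithmetic using the figures quoted above:

    2000 OSDs / 8 OSDs per journal = 250 journal devices
    250 journals x 5 MB/s          = 1250 MB/s ~= 1.25 GB/s sustained, cluster-wide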

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Willem Jan Withagen
On 02-05-17 19:16, Дробышевский, Владимир wrote: > Willem, > > please note that you use 1.6TB Intel S3520 endurance rating in your > calculations but then compare prices with 480GB model, which has only > 945TBW or 1.1DWPD ( >

Re: [ceph-users] RBD behavior for reads to a volume with no data written

2017-05-02 Thread Prashant Murthy
I wanted to add that I was particularly interested in the behavior with filestore, but was also curious how this works on bluestore. Prashant On Mon, May 1, 2017 at 10:04 PM, Prashant Murthy wrote: > Hi all, > > I was wondering what happens when reads are issued to

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Дробышевский, Владимир
Willem, please note that you use the 1.6TB Intel S3520 endurance rating in your calculations but then compare prices with the 480GB model, which has only 945TBW or 1.1DWPD ( https://ark.intel.com/products/93026/Intel-SSD-DC-S3520-Series-480GB-2_5in-SATA-6Gbs-3D1-MLC ). It is also worth noting that
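For context, a rough sketch of how those endurance numbers relate (assuming Intel's usual 5-year warranty period for the DC series, which is not stated in the thread):

    DWPD ~= TBW / (capacity in TB x warranty period in days)
    945 TB / (0.48 TB x 5 x 365 days) ~= 1.08, i.e. the quoted ~1.1 DWPD for the 480GB model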

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-02 Thread Patrick Dinnen
Hi George, Also, I should have mentioned this before: the results I shared were with a lowered cache pressure value (in an attempt to keep inodes in cache), vm.vfs_cache_pressure = 10 (down from the default of 100). The results were a little ambiguous, but it seemed like that did help somewhat. We
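For anyone wanting to reproduce that tuning, a minimal sketch (the sysctl.d file name here is an arbitrary choice):

    $ sudo sysctl vm.vfs_cache_pressure=10                                          # apply at runtime
    $ echo 'vm.vfs_cache_pressure = 10' | sudo tee /etc/sysctl.d/90-vfs-cache.conf  # persist across reboots

Lower values make the kernel more reluctant to reclaim dentry/inode cache; 100 is the default.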

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-02 Thread Patrick Dinnen
That's interesting Mark. It would be great if anyone has a definitive answer on the potential syncfs-related downside of caching a lot of inodes. A lot of our testing so far has been on the assumption that more cached inodes is a pure good. On Tue, May 2, 2017 at 9:19 AM, Mark Nelson

Re: [ceph-users] Ceph CBT simulate down OSDs

2017-05-02 Thread Henry Ngo
Mark, Thanks for the detailed explanation and example. This is exactly what I was looking for. Best, Henry Ngo On Tue, May 2, 2017 at 9:29 AM, Mark Nelson wrote: > Hi Henry, > > The recovery test mechanism is basically a state machine launched in > another thread that

Re: [ceph-users] Ceph CBT simulate down OSDs

2017-05-02 Thread Mark Nelson
Hi Henry, The recovery test mechanism is basically a state machine launched in another thread that runs concurrently during whatever benchmark you want to run. The basic premise is that it waits for a configurable amount of "pre" time to let the benchmarks get started, then marks osd

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Willem Jan Withagen
On 27-4-2017 20:46, Alexandre DERUMIER wrote: > Hi, > >>> What I'm trying to get from the list is /why/ the "enterprise" drives >>> are important. Performance? Reliability? Something else? > > performance, for sure (for SYNC write, >

Re: [ceph-users] cephfs metadata damage and scrub error

2017-05-02 Thread David Zafman
James, You have an omap corruption. It is likely caused by a bug which has already been identified. A fix for that problem is available but it is still pending backport for the next Jewel point release. All 4 of your replicas have different "omap_digest" values. Instead of the
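As a hedged aside (the PG id is the one from James's report; this only inspects the inconsistency, it does not decide which copy is correct):

    $ rados list-inconsistent-obj 2.9 --format=json-pretty   # Jewel: shows each replica's omap_digest for the inconsistent object(s)

With every replica reporting a different omap_digest there is no obviously authoritative copy, so a plain 'ceph pg repair' should be used with caution here.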

[ceph-users] Ceph CBT simulate down OSDs

2017-05-02 Thread Henry Ngo
Hi all, CBT documentation states that this can be achieved. If so, how do I set it up? What do I add in the yaml file? Below is an EC example. Thanks.
cluster:
  head: "ceph@head"
  clients: ["ceph@client"]
  osds: ["ceph@osd"]
  mons: ["ceph@mon"]
  osds_per_node: 1
  fs: xfs
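To make the question concrete, a rough sketch only: the recovery test is normally driven by an extra section inside the cluster block, but the key names below are an assumption from memory of CBT examples and should be verified against the CBT docs/source.

    cluster:
      ...
      recovery_test:              # assumed section name
        osds: [0, 1]              # assumed key: OSD ids CBT marks down/out during the run, then brings back

Mark's reply above describes the underlying mechanism: CBT waits a configurable "pre" period so the benchmark gets going, then marks the listed OSDs down so recovery overlaps the workload.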

[ceph-users] Ceph FS installation issue on ubuntu 16.04

2017-05-02 Thread dheeraj dubey
Hi, I'm getting the following error while installing the "ceph-fs-common" package on Ubuntu 16.04: $ sudo apt-get install ceph-fs-common Reading package lists... Building dependency tree... Reading state information... You might want to run 'apt-get -f install' to correct these: The following packages have
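A hedged note for anyone hitting the same thing: this message usually means apt is in a broken/half-configured dependency state, and the step apt itself suggests is:

    $ sudo apt-get -f install            # let apt repair the broken dependency state
    $ sudo apt-get install ceph-fs-common

If the conflict is between the Ubuntu-shipped ceph packages and the download.ceph.com repo, making sure all ceph packages come from the same repository (e.g. via apt pinning) is usually the real fix.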

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-02 Thread Mark Nelson
I used to advocate that users favor the dentry/inode cache, but it turns out that it's not necessarily a good idea if you are also using syncfs. When syncfs is used, the kernel will iterate through all cached inodes, rather than just dirty inodes. With high numbers of cached

Re: [ceph-users] ceph-deploy to a particular version

2017-05-02 Thread German Anders
I think you can do $ ceph-deploy install --release --repo-url http://download.ceph.com/... Also, you can replace the --release flag with --dev or --testing and specify the version. I've done it with the release and dev flags and it works great :) Hope it helps. Best,
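To make that concrete, a hedged sketch (the release name, repo path and hostname are placeholders, not taken from the thread):

    $ ceph-deploy install --release jewel --repo-url http://download.ceph.com/rpm-jewel/el7 node1
    $ ceph-deploy install --dev jewel node1        # dev/testing builds instead of a named release

Note that --release/--repo-url select a repository rather than an exact build, so landing on a specific point release such as 10.2.3 may still require pinning the package version in yum/apt on the node.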

Re: [ceph-users] SSD Primary Affinity

2017-05-02 Thread David Turner
You would need to have 1TB of SSDs for every 2TB of HDDs used this way. If you set up your cluster with those ratios, you would fill up evenly. On Tue, May 2, 2017, 8:37 AM George Mihaiescu wrote: > One problem that I can see with this setup is that you will fill up the >
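For reference, a minimal sketch of the mechanism being discussed (OSD ids are placeholders; on Jewel-era clusters the mon option mon_osd_allow_primary_affinity has to be enabled first):

    $ ceph osd primary-affinity osd.10 0     # HDD OSD: avoid being chosen as primary
    $ ceph osd primary-affinity osd.3 1      # SSD OSD: preferred as primary

With size=3 and primaries steered to the SSDs, the SSDs carry one replica and the HDDs the other two, which is presumably where the 1TB of SSD per 2TB of HDD ratio comes from.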

[ceph-users] ceph-deploy to a particular version

2017-05-02 Thread Puff, Jonathon
From what I can find, ceph-deploy only allows installs for a release, i.e. Jewel, which is giving me 10.2.7, but I’d like to specify the particular update. For instance, I want to go to 10.2.3. Do I need to avoid ceph-deploy entirely to do this, or can I install the correct version via yum then

Re: [ceph-users] SSD Primary Affinity

2017-05-02 Thread George Mihaiescu
One problem that I can see with this setup is that you will fill up the SSDs holding the primary replica before the HDD ones, if they are much different in size. Other than that, it's a very inventive solution to increase read speeds without using a possibly buggy cache configuration. > On

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-02 Thread George Mihaiescu
Hi Patrick, You could add more RAM to the servers, which will probably not increase the cost too much. You could change the swappiness value or use something like https://hoytech.com/vmtouch/ to pre-cache inode entries. You could maybe tarball the smaller files before loading them into Ceph.
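As an illustration of the vmtouch suggestion (the path is a placeholder for wherever the relevant data lives; walking the tree also warms the dentry/inode caches as a side effect):

    $ vmtouch -t /var/lib/ceph/osd/ceph-0/current     # read the files so their pages become resident
    $ vmtouch -v /var/lib/ceph/osd/ceph-0/current     # report how much of the tree is currently cached

Keeping data resident under memory pressure would need vmtouch's lock options (-l/-L), which pin memory and so need to be sized carefully.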

Re: [ceph-users] Maintaining write performance under a steady intake of small objects

2017-05-02 Thread Mark Nelson
On 05/02/2017 01:32 AM, Frédéric Nass wrote: On 28/04/2017 at 17:03, Mark Nelson wrote: On 04/28/2017 08:23 AM, Frédéric Nass wrote: On 28/04/2017 at 15:19, Frédéric Nass wrote: Hi Florian, Wido, That's interesting. I ran some bluestore benchmarks a few weeks ago on Luminous dev (1st

[ceph-users] cephfs metadata damage and scrub error

2017-05-02 Thread James Eckersall
Hi, I'm having some issues with a ceph cluster. It's an 8 node cluster running Jewel ceph-10.2.7-0.el7.x86_64 on CentOS 7. This cluster provides RBDs and a CephFS filesystem to a number of clients. ceph health detail is showing the following errors: pg 2.9 is active+clean+inconsistent, acting

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-05-02 Thread Eneko Lacunza
Hi, Has anyone used the new S3520's? They are 1 DWPD and so much closer to the S3610 than previous S35x0's. Cheers On 01/05/17 at 17:41, David Turner wrote: I can attest to this. I had a cluster that used 3510's for the first rack and then switched to 3710's after that. We had 3TB

Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-02 Thread nick
Hi Jason, thanks for your feedback. I have now done some tests over the weekend to verify the memory overhead. I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the cause of the overhead, so I just generated a lot
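For anyone trying to separate the librbd cache from other sources of RSS growth, a sketch of the relevant client-side ceph.conf options (the values shown are the usual defaults, not a recommendation):

    [client]
        rbd cache = true
        rbd cache size = 33554432        # 32 MB write-back cache per image by default
        rbd cache max dirty = 25165824

Setting rbd cache = false (or using cache=none on the QEMU drive) removes that per-volume allocation, which helps isolate where the remaining overhead comes from.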

[ceph-users] Increase PG or reweight OSDs?

2017-05-02 Thread M Ranga Swami Reddy
Hello, I have added 5 new Ceph OSD nodes to my ceph cluster. Here, I want to increase the PG/PGP numbers of the pools based on the new OSD count. At the same time, I need to increase the newly added OSDs' weight from 0 -> 1. My question is: Do I need to do the PG/PGP num increase first and then reweight the OSDs? Or
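Whichever order turns out to be right, both steps are plain CLI operations; a hedged sketch (pool name, pg count and OSD ids/weights are placeholders):

    $ ceph osd pool set mypool pg_num 2048
    $ ceph osd pool set mypool pgp_num 2048      # data only rebalances once pgp_num follows pg_num
    $ ceph osd crush reweight osd.42 1.0         # raise a newly added OSD from crush weight 0 to its target

Raising the weight gradually (e.g. in several steps) and letting the cluster settle in between is a common way to limit concurrent backfill.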