[ceph-users] SSD OSD's Dual Use

2017-06-21 Thread Ashley Merrick
Hello, Currently have a pool of SSDs running as a cache in front of an EC pool. The cache is very under-used and the SSDs spend most of their time idle, so I would like to create a small SSD pool for a selection of very small RBD disks to use as scratch disks within the OS; should I expect any issues running
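
For reference, carving a small replicated pool out of those idle SSDs and creating the scratch RBD images on it would look roughly like the sketch below; the pool name, PG count, rule id and image size are examples only, and on a pre-Luminous cluster the SSD-only placement still needs a CRUSH rule that selects the SSD OSDs.

    # minimal sketch -- names, sizes and rule id are placeholders
    ceph osd pool create ssd-scratch 128 128          # small replicated pool
    ceph osd pool set ssd-scratch crush_ruleset 1     # assumes rule 1 maps to the SSD OSDs
    rbd create ssd-scratch/vm01-scratch --size 10240  # 10 GB scratch image for one VM
    rbd map ssd-scratch/vm01-scratch                  # attach it on the client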

[ceph-users] Can't start ceph-mon through systemctl start ceph-mon@.service after upgrading from Hammer to Jewel

2017-06-21 Thread 许雪寒
Hi, everyone. I upgraded one of our ceph clusters from Hammer to Jewel. After upgrading, I can’t start ceph-mon through “systemctl start ceph-mon@ceph1”, while, on the other hand, I can start ceph-mon, either as user ceph or root, if I directly call “/usr/bin/ceph-mon --cluster ceph --id ceph1
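
This is the classic Hammer-to-Jewel permissions problem: the Jewel systemd units run the daemons as user ceph, while data written under Hammer is still owned by root, so the unit fails even though a manual start as root works. A commonly applied fix, assuming the default /var/lib/ceph layout, is sketched below.

    # hand the monitor's data over to the ceph user, then retry the unit
    systemctl stop ceph-mon@ceph1
    chown -R ceph:ceph /var/lib/ceph
    systemctl start ceph-mon@ceph1
    journalctl -u ceph-mon@ceph1 -e    # inspect the unit log if it still fails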

Re: [ceph-users] Kernel RBD client talking to multiple storage clusters

2017-06-21 Thread Alex Gorbachev
On Mon, Jun 19, 2017 at 3:12 AM Wido den Hollander wrote: > > > Op 19 juni 2017 om 5:15 schreef Alex Gorbachev : > > > > > > Has anyone run into such config where a single client consumes storage > from > > several ceph clusters, unrelated to each other
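
For the record, one way to let a single client talk to two unrelated clusters is a per-cluster conf/keyring pair under /etc/ceph selected with --cluster; the second cluster name and the pool/image names below are made up for the example.

    # /etc/ceph holds one conf + keyring per cluster name:
    #   ceph.conf    ceph.client.admin.keyring
    #   backup.conf  backup.client.admin.keyring
    rbd --cluster ceph   map rbd/vol1    # krbd map from the default cluster
    rbd --cluster backup map rbd/vol2    # krbd map from the second cluster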

Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals

2017-06-21 Thread Christian Balzer
Hello, Hmm, gmail client not grokking quoting these days? On Wed, 21 Jun 2017 20:40:48 -0500 Brady Deetz wrote: > On Jun 21, 2017 8:15 PM, "Christian Balzer" wrote: > > On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote: > > > Hello, > > I'm expanding my 288 OSD, primarily

Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals

2017-06-21 Thread Brady Deetz
On Jun 21, 2017 8:15 PM, "Christian Balzer" wrote: On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote: > Hello, > I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I have 12 > osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe > drives

Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals

2017-06-21 Thread Christian Balzer
On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote: > Hello, > I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I have 12 > osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe > drives providing 10GB journals for groups of 12 6TB spinning rust drives > and 2x

[ceph-users] Transitioning to Intel P4600 from P3700 Journals

2017-06-21 Thread Brady Deetz
Hello, I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I have 12 osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe drives providing 10GB journals for groups of 12 6TB spinning rust drives and 2x lacp 40gbps ethernet. Our hardware provider is recommending
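
For context, the journal sizing described above is set in ceph.conf (in MB), and the per-device maths is straightforward; a sketch:

    # ceph.conf, per the setup described (value is in MB)
    [osd]
    osd journal size = 10240    # 10 GB per journal
    # per NVMe: 12 spinners x 10 GB = 120 GB of journal space,
    # comfortably inside a 400 GB P3700 and a small fraction of a larger P4600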

Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-21 Thread Christian Balzer
Hello, On Wed, 21 Jun 2017 11:15:26 +0200 Fabian Grünbichler wrote: > On Wed, Jun 21, 2017 at 05:30:02PM +0900, Christian Balzer wrote: > > > > Hello, > > > > On Wed, 21 Jun 2017 09:47:08 +0200 (CEST) Alexandre DERUMIER wrote: > > > > > Hi, > > > > > > Proxmox is maintening a

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-21 Thread David Turner
You can specify an option in ceph-deploy to tell it which release of ceph to install, jewel, kraken, hammer, etc. `ceph-deploy --release jewel` would pin the command to using jewel instead of kraken. While running a mixed environment is supported, it should always be tested before assuming it
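
As an example, the release pinning mentioned above is applied on the install step (hostnames are placeholders):

    ceph-deploy install --release jewel mon1 osd1 osd2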

Re: [ceph-users] Mon Create currently at the state of probing

2017-06-21 Thread Jim Forde
David, Thanks for the reply. The scenario: a monitor node fails for whatever reason - bad blocks in the HD, motherboard failure, whatever. Procedure: remove the monitor from the cluster, replace the hardware, reinstall the OS and add the monitor back to the cluster. That is exactly what I did. However, my ceph-deploy

Re: [ceph-users] red IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Jason Dillaman
Do your VMs or OSDs show blocked requests? If you disable scrub or restart the blocked OSD, does the issue go away? If yes, it most likely is this issue [1]. [1] http://tracker.ceph.com/issues/20041 On Wed, Jun 21, 2017 at 3:33 PM, Hall, Eric wrote: > The VMs are using
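
A quick way to check for blocked requests and to rule scrubbing in or out, along the lines suggested above (the OSD id is an example):

    ceph health detail              # lists OSDs with blocked/slow requests
    ceph osd set noscrub
    ceph osd set nodeep-scrub       # pause (deep-)scrubbing temporarily
    systemctl restart ceph-osd@12   # or restart the blocked OSD (example id)
    ceph osd unset noscrub; ceph osd unset nodeep-scrub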

Re: [ceph-users] red IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Hall, Eric
The VMs are using stock Ubuntu14/16 images so yes, there is the default “/sbin/fstrim --all” in /etc/cron.weekly/fstrim. -- Eric On 6/21/17, 1:58 PM, "Jason Dillaman" wrote: Are some or many of your VMs issuing periodic fstrims to discard unused extents?

Re: [ceph-users] red IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Jason Dillaman
Are some or many of your VMs issuing periodic fstrims to discard unused extents? On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric wrote: > After following/changing all suggested items (turning off exclusive-lock > (and associated object-map and fast-diff), changing host

Re: [ceph-users] red IO hang (was disk timeouts in libvirt/qemu VMs...)

2017-06-21 Thread Hall, Eric
After following/changing all suggested items (turning off exclusive-lock (and associated object-map and fast-diff), changing host cache behavior, etc.) this is still a blocking issue for many uses of our OpenStack/Ceph installation. We have upgraded Ceph to 10.2.7, are running 4.4.0-62 or later
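
For reference, the feature changes mentioned above (exclusive-lock plus the dependent object-map and fast-diff) are applied per image with something like the following; pool and image names are placeholders.

    # dependent features go first (or are listed together)
    rbd feature disable volumes/vm-disk-01 fast-diff object-map exclusive-lock
    rbd info volumes/vm-disk-01 | grep features    # confirm what is still enabled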

[ceph-users] OSD returns back and recovery process

2017-06-21 Thread Дмитрий Глушенок
Hello! It is clear what happens after an OSD goes OUT - PGs are backfilled to other OSDs, and PGs whose primary copies were on the lost OSD get new primary OSDs. But when the OSD comes back, it looks like all the data for which that OSD was holding primary copies is read from it and re-written

Re: [ceph-users] risk mitigation in 2 replica clusters

2017-06-21 Thread ceph
You have a point; it depends on your needs. Based on recovery time and usage, I may find it acceptable to lock writes during recovery. Thank you for that insight. On 21/06/2017 18:47, David Turner wrote: > I disagree that Replica 2 will ever truly be sane if you care about your > data. The biggest issue

Re: [ceph-users] risk mitigation in 2 replica clusters

2017-06-21 Thread David Turner
I disagree that Replica 2 will ever truly be sane if you care about your data. The biggest issue with replica 2 has nothing to do with drive failures, restarting osds/nodes, power outages, etc. The biggest issue with replica 2 is the min_size. If you set min_size to 2, then your data is locked
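
The min_size trade-off being described is a per-pool setting; a sketch with a placeholder pool name, where min_size 1 keeps I/O flowing on a single surviving copy (with the risks discussed in this thread) and min_size 2 blocks writes whenever only one copy is up:

    ceph osd pool set rbd-2r size 2        # two replicas (pool name is an example)
    ceph osd pool set rbd-2r min_size 1    # keep serving I/O with one copy (risky)
    # ceph osd pool set rbd-2r min_size 2  # alternative: block I/O until both copies are back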

Re: [ceph-users] risk mitigation in 2 replica clusters

2017-06-21 Thread ceph
2r on filestore == "I do not care about my data". This is not because of the OSDs' failure chance. When you have a write error (i.e. data is badly written to the disk, without any error reported), your data is just corrupted, with no hope of redemption. Just as you expect your drives to die, expect your

[ceph-users] risk mitigation in 2 replica clusters

2017-06-21 Thread Blair Bethwaite
Hi all, I'm doing some work to evaluate the risks involved in running 2r storage pools. On the face of it my naive disk failure calculations give me 4-5 nines for a 2r pool of 100 OSDs (no copyset awareness, i.e., secondary disk failure based purely on chance of any 1 of the remaining 99 OSDs
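
As a rough illustration of that naive calculation (every number below is an assumption made for the example: a 4%/year drive AFR, a 24-hour recovery window, independent failures, no copyset awareness):

    # P(any of the remaining 99 OSDs fails during the recovery window)
    #   ~= 99 * AFR * (recovery_hours / hours_per_year)
    echo '99 * 0.04 * (24 / 8760)' | bc -l    # ~= 0.011, i.e. about 1% per primary-disk failure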

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Peter Maloney
On 06/14/17 11:59, Dan van der Ster wrote: > Dear ceph users, > > Today we had O(100) slow requests which were caused by deep-scrubbing > of the metadata log: > > 2017-06-14 11:07:55.373184 osd.155 > [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d > deep-scrub starts > ... >

[ceph-users] Degraded objects while OSD is being added/filled

2017-06-21 Thread Andras Pataki
Hi cephers, I noticed something I don't understand about ceph's behavior when adding an OSD. When I start with a clean cluster (all PG's active+clean) and add an OSD (via ceph-deploy for example), the crush map gets updated and PGs get reassigned to different OSDs, and the new OSD starts

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Sage Weil
On Wed, 21 Jun 2017, Piotr Dałek wrote: > > > > > I tested on few of our production images and it seems that about 30% > > > > > is > > > > > sparse. This will be lost on any cluster wide event (add/remove nodes, > > > > > PG grow, recovery). > > > > > > > > > > How this is/will be handled in

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Casey Bodley
That patch looks reasonable. You could also try raising the values of osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on that osd in order to trim more at a time. On 06/21/2017 09:27 AM, Dan van der Ster wrote: Hi Casey, I managed to trim up all shards except for that
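
Raising those two timeouts on just the affected OSD can be done at runtime; a sketch (the values are examples, and injectargs settings do not survive an OSD restart):

    ceph tell osd.155 injectargs '--osd-op-thread-suicide-timeout 600 --filestore-op-thread-suicide-timeout 600'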

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Piotr Dałek
On 17-06-21 03:24 PM, Sage Weil wrote: On Wed, 21 Jun 2017, Piotr Dałek wrote: On 17-06-14 03:44 PM, Sage Weil wrote: On Wed, 14 Jun 2017, Paweł Sadowski wrote: On 04/13/2017 04:23 PM, Piotr Dałek wrote: On 04/06/2017 03:25 PM, Sage Weil wrote: On Thu, 6 Apr 2017, Piotr Dałek wrote: [snip]

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Jason Dillaman
On Wed, Jun 21, 2017 at 3:05 AM, Piotr Dałek wrote: > I saw that RBD (librbd) does that - replacing writes with discards when > buffer contains only zeros. Some code that does the same in librados could > be added and it shouldn't impact performance much, current

Re: [ceph-users] radosgw: scrub causing slow requests in the md log

2017-06-21 Thread Dan van der Ster
Hi Casey, I managed to trim up all shards except for that big #54. The others all trimmed within a few seconds. But 54 is proving difficult. It's still going after several days, and now I see that the 1000-key trim is indeed causing osd timeouts. I've manually compacted the relevant osd

Re: [ceph-users] Flash for mon nodes ?

2017-06-21 Thread Wido den Hollander
> Op 21 juni 2017 om 12:38 schreef Osama Hasebou : > > > Hi Guys, > > Has anyone used flash SSD drives for nodes hosting Monitor nodes only? > > If yes, any major benefits against just using SAS drives ? > Yes: - Less latency - Faster store compacting - More
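
The store in question lives in the monitor's data directory, which is where the flash latency pays off; a quick look at its size and a manual compaction (the mon name is a placeholder):

    du -sh /var/lib/ceph/mon/ceph-mon1/store.db   # size of the monitor's key/value store
    ceph tell mon.mon1 compact                    # trigger a store compaction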

Re: [ceph-users] Flash for mon nodes ?

2017-06-21 Thread Paweł Sadowski
On 06/21/2017 12:38 PM, Osama Hasebou wrote: > Hi Guys, > > Has anyone used flash SSD drives for nodes hosting Monitor nodes only? > > If yes, any major benefits against just using SAS drives ? We are using such setup for big (>500 OSDs) clusters. It makes it less painful when such cluster

Re: [ceph-users] Flash for mon nodes ?

2017-06-21 Thread Ashley Merrick
If you just mean normal DC-rated SSDs, then that’s what I am running across a ~120 OSD cluster. When checking, they are very unbusy with minimal use; however, I can imagine the lower random latency will always help. So if you can, I would. ,Ashley Sent from my iPhone On 21 Jun 2017, at 6:39 PM,

Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-21 Thread Piotr Dałek
On 17-06-20 02:44 PM, Richard Hesketh wrote: Is there a way, either by individual PG or by OSD, I can prioritise backfill/recovery on a set of PGs which are currently particularly important to me? For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node basis - mark out
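
Jewel has no per-PG force-recovery command (that arrived later), so the usual workaround is to bias the recovery throttles per OSD at runtime; the OSD id and values below are examples only.

    # throttle everyone first, then favour the OSDs you are refilling
    ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    ceph tell osd.10 injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'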

[ceph-users] Flash for mon nodes ?

2017-06-21 Thread Osama Hasebou
Hi Guys, Has anyone used flash SSD drives for nodes hosting Monitor nodes only? If yes, any major benefits against just using SAS drives ? Thanks. Regards, Ossi

Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-21 Thread Fabian Grünbichler
On Wed, Jun 21, 2017 at 05:30:02PM +0900, Christian Balzer wrote: > > Hello, > > On Wed, 21 Jun 2017 09:47:08 +0200 (CEST) Alexandre DERUMIER wrote: > > > Hi, > > > > Proxmox is maintening a ceph-luminous repo for stretch > > > > http://download.proxmox.com/debian/ceph-luminous/ > > > > > >

Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-21 Thread Christian Balzer
Hello, On Wed, 21 Jun 2017 09:47:08 +0200 (CEST) Alexandre DERUMIER wrote: > Hi, > > Proxmox is maintening a ceph-luminous repo for stretch > > http://download.proxmox.com/debian/ceph-luminous/ > > > git is here, with patches and modifications to get it work >

Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-21 Thread Alexandre DERUMIER
Hi, Proxmox is maintaining a ceph-luminous repo for stretch http://download.proxmox.com/debian/ceph-luminous/ git is here, with patches and modifications to get it working https://git.proxmox.com/?p=ceph.git;a=summary - Mail original - De: "Alfredo Deza" À: "Christian
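
For anyone wanting to try that repository, the apt side is the usual one-liner; the suite and component names below are a guess and should be checked against the repo's dists/ directory, and the Proxmox release key also has to be trusted by apt.

    # /etc/apt/sources.list.d/ceph-luminous.list  (suite/component assumed, verify first)
    deb http://download.proxmox.com/debian/ceph-luminous stretch main
    # then:
    apt update && apt install ceph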

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Piotr Dałek
On 17-06-14 03:44 PM, Sage Weil wrote: On Wed, 14 Jun 2017, Paweł Sadowski wrote: On 04/13/2017 04:23 PM, Piotr Dałek wrote: On 04/06/2017 03:25 PM, Sage Weil wrote: On Thu, 6 Apr 2017, Piotr Dałek wrote: [snip] I think the solution here is to use sparse_read during recovery. The PushOp