Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-18 Thread Nick Fisk
Hi Sam, Updated with some more info. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Samuel Just > Sent: 17 November 2016 19:02 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] After OSD Flap - FAI

Re: [ceph-users] Intel P3700 SSD for journals

2016-11-18 Thread Nick Fisk
I'm using the 400GB models as a journal for 12 drives. I know this is probably pushing it a little bit, but it seems to work fine. I'm guessing the reason may relate to the TBW figure being higher on the more expensive models; maybe they don't want to have to replace worn NVMe's under warranty?

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-18 Thread Nick Fisk
, but sooner or later it will bite you in the arse and it won't be pretty. From: "Brian ::" Sent: 18 Nov 2016 11:52 p.m. To: sj...@redhat.com Cc: Craig Chi; ceph-users@lists.ceph.com; Nick Fisk Subject: Re: [ceph-users] how possible is that ceph cl

[ceph-users] Replace OSD Disk with Ansible

2016-11-21 Thread Nick Fisk
Hi All, I need to rebuild an OSD which is failing to start. The cluster was built with Ansible and I wish to use it to re-create the OSD as well. I get that I need to zap the OSD device to blank it and also do a osd rm and osd auth del to clean the osd device, but I'm a little confused about the
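
A rough sketch of the manual clean-up the Ansible play would otherwise wrap, assuming a Jewel-era ceph-disk deployment; osd.12 and /dev/sdX are placeholders for the failed OSD and its device:

  # take the OSD out and remove it from the cluster maps
  ceph osd out osd.12
  systemctl stop ceph-osd@12        # run on the OSD host
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm osd.12
  # blank the device so it can be re-deployed as a fresh OSD
  ceph-disk zap /dev/sdX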

Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Eugen Block > Sent: 22 November 2016 09:55 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] deep-scrubbing has large impact on performance > > Hi list, > > I've been searching the mai
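
A hedged sketch of the scrub-throttling knobs usually suggested in these threads (option names from the Jewel/Luminous era; the values are examples only):

  # slow scrubbing down and shrink the chunks it works on
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_scrub_chunk_max 5'
  # confine scrubs to a quiet window
  ceph tell osd.* injectargs '--osd_scrub_begin_hour 22 --osd_scrub_end_hour 6'
  # or temporarily stop deep scrubs altogether while investigating
  ceph osd set nodeep-scrub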

Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Eugen Block > Sent: 22 November 2016 10:11 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] deep-scrubbing has large impact on performance >

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-23 Thread Nick Fisk
Hi Daznis, I'm not sure how much help I can be, but I will try my best. I think the post-split stats error is probably benign, although I think this suggests you also increased the number of PG's in your cache pool? If so did you do this before or after you added the extra OSD's? This may have

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-23 Thread Nick Fisk
you probably want to do a scrub to try and clean up the stats, which may then stop this happening when the hitset comes round to being trimmed again. > > > On Wed, Nov 23, 2016 at 12:04 PM, Nick Fisk wrote: > > Hi Daznis, > > > > I'm not sure how much h

Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-23 Thread Nick Fisk
Thanks for the tip Robert, much appreciated. > -Original Message- > From: Robert LeBlanc [mailto:rob...@leblancnet.us] > Sent: 23 November 2016 00:54 > To: Eugen Block > Cc: Nick Fisk ; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] deep-scrubbing has large imp

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-23 Thread Nick Fisk
ding the OSD. > -Original Message- > From: Daznis [mailto:daz...@gmail.com] > Sent: 23 November 2016 12:55 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD. > > Thank you. That helped quite a lot. Now I'm j

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-23 Thread Nick Fisk
0711 s=2 pgs=647 cs=5 l=0 c=0x42798c0).fault, initiating reconnect I do not manage to identify anything obvious in the logs. Thanks for your help … Thomas From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: jeudi 17 novembre 2016 11:02 To: Thomas Danan; n...@fisk.me.uk <mailto:n

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-23 Thread Nick Fisk
; Samuel Just > Sent: 19 November 2016 00:31 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] how possible is that ceph cluster crash > > Many reasons: > > 1) You will eventually get a DC wide power event anyway at which point > probably most of

Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-23 Thread Nick Fisk
Actually this might suggest that caution should be taken before enabling this at the moment http://tracker.ceph.com/issues/15774 > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick > Fisk > Sent: 23 November 2016 11:17 &

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-23 Thread Nick Fisk
t not be one which also has unfound objects, otherwise you are likely to have to get heavily involved in recovering objects with the object store tool. > -Original Message- > From: Daznis [mailto:daz...@gmail.com] > Sent: 23 November 2016 13:56 > To: Nick Fisk > Cc: ceph-

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Nick Fisk
Hi Craig, From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Craig Chi Sent: 24 November 2016 08:34 To: ceph-users@lists.ceph.com Subject: [ceph-users] Ceph OSDs cause kernel unresponsive Hi Cephers, We have encountered kernel hanging issue on our Ceph cluster. Just

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Burkhard Linke > Sent: 24 November 2016 14:06 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Stalling IO with cache tier > > Hi, > > > *snipsnap* > > > >> # ceph osd tier a

Re: [ceph-users] Stalling IO with cache tier

2016-11-24 Thread Nick Fisk
yes, it would make sense to add this to > the documentation. Yes, if your keys in use in Openstack only grant permission to the base pool, then it will not be able to access the cache pool when enabled. > > Cheers, > Kees > > On 24-11-16 15:12, Nick Fisk wrote: > > I
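
A minimal sketch of granting the extra capability, assuming hypothetical client and pool names (client.cinder, volumes, volumes-cache):

  ceph auth get client.cinder        # inspect the current caps
  ceph auth caps client.cinder \
      mon 'allow r' \
      osd 'allow rwx pool=volumes, allow rwx pool=volumes-cache'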

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-24 Thread Nick Fisk
Can you add them with different ID's? It won't look pretty, but it might get you out of this situation. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Daznis > Sent: 24 November 2016 15:43 > To: Nick Fisk > C

[ceph-users] metrics.ceph.com

2016-11-24 Thread Nick Fisk
Who is responsible for the metrics.ceph.com site? I noticed that the mailing list stats are still trying to retrieve data from the gmane archives, which are no longer active. Nick

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-24 Thread Nick Fisk
10:37 To: Nick Fisk Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Ceph OSDs cause kernel unresponsive Hi Nick, Thank you for your helpful information. I knew that Ceph recommends 1GB/1TB RAM, but we are not going to change the hardware architecture now. Are there any methods

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-25 Thread Nick Fisk
> From: Daznis [mailto:daz...@gmail.com] > Sent: 24 November 2016 19:44 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD. > > I will try it, but I wanna see if it stays stable for a few days. Not sure if > I should report

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-25 Thread Nick Fisk
. From: Craig Chi [mailto:craig...@synology.com] Sent: 25 November 2016 01:46 To: Brad Hubbard Cc: Nick Fisk ; Ceph Users Subject: Re: [ceph-users] Ceph OSDs cause kernel unresponsive Hi Nick, I have seen the report before, if I understand correctly, the osd_map_cache_size generally

Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-25 Thread Nick Fisk
3:59 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD. > > I think it's because of these errors: > > 2016-11-25 14:51:25.644495 7fb73eef8700 -1 log_channel(cluster) log [ERR] : > 14.28 deep-scrub stat mismatch,

Re: [ceph-users] High ops/s with kRBD and "--object-size 32M"

2016-11-29 Thread Nick Fisk
- our workload is 100% VMWare VMs running replicated databases. Now with NFS, but likely still a lot of small IO. I wonder if we are a corner case. But with 16 MB objects, with both the iSCSI gateway as well as NFS, we saw a clear improvement in latency and throughput. I will reach out to our pe

Re: [ceph-users] Regarding loss of heartbeats

2016-11-29 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Trygve Vea > Sent: 29 November 2016 14:07 > To: ceph-users > Subject: [ceph-users] Regarding loss of heartbeats > > Since Jewel, we've seen quite a bit of funky behaviour in Ceph. I've wr

Re: [ceph-users] Regarding loss of heartbeats

2016-11-29 Thread Nick Fisk
> -Original Message- > From: Trygve Vea [mailto:trygve@redpill-linpro.com] > Sent: 29 November 2016 14:36 > To: n...@fisk.me.uk > Cc: ceph-users > Subject: Re: Regarding loss of heartbeats > > - Den 29.nov.2016 15:20 skrev Nick Fisk n...@fisk.me.uk: &g

Re: [ceph-users] - cluster stuck and undersized if at least one osd is down

2016-11-30 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Piotr Dzionek > Sent: 30 November 2016 11:04 > To: Brad Hubbard > Cc: Ceph Users > Subject: Re: [ceph-users] - cluster stuck and undersized if at least one osd > is down > > Hi, > > Ok, b

Re: [ceph-users] osd crash

2016-12-01 Thread Nick Fisk
Are you using Ubuntu 16.04 (guessing from your kernel version)? There was a NUMA bug in early kernels; try updating to the latest in the 4.4 series. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of VELARTIS Philipp Dürhammer Sent: 01 December 2016 12:04 To: 'ceph-us...

Re: [ceph-users] Ceph QoS user stories

2016-12-02 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage > Weil > Sent: 02 December 2016 19:02 > To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com > Subject: [ceph-users] Ceph QoS user stories > > Hi all, > > We're working on getting infrastu

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-12-05 Thread Nick Fisk
ion it has increased overall ceph cluster performances and reduced block ops occurrences. I don’t think this is the end of our issue but it seems it helped to limit its impact. Thomas From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: mercredi 23 novembre 2016 14:09 To: Thomas Danan; 

Re: [ceph-users] Reusing journal partitions when using ceph-deploy/ceph-disk --dmcrypt

2016-12-05 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex > Gorbachev > Sent: 05 December 2016 15:39 > To: Pierre BLONDEAU > Cc: ceph-users > Subject: Re: [ceph-users] Reusing journal partitions when using > ceph-deploy/ceph-disk --dmcrypt > >

Re: [ceph-users] Ceph Blog Articles

2016-12-05 Thread Nick Fisk
cluster. > > Greetings > -Sascha- > > Am 11.11.2016 um 20:33 schrieb Nick Fisk: > > Hi All, > > > > I've recently put together some articles around some of the performance > > testing I have been doing. > > > > The first explores the hig

[ceph-users] PG's become undersize+degraded if OSD's restart during backfill

2016-12-05 Thread Nick Fisk
Hi, I had recently re-added some old OSD's by zapping them and reintroducing them into the cluster as new OSD's. I'm using Ansible to add the OSD's, and because there was an outstanding config change, it restarted all OSD's on the host where I was adding the OSD's at the end of the play. I noticed s

Re: [ceph-users] PG's become undersize+degraded if OSD's restart during backfill

2016-12-05 Thread Nick Fisk
opying of this message is prohibited. _ From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Nick Fisk [n...@fisk.me.uk] Sent: Monday, December 05, 2016 9:38 AM To: 'ceph-users' Subject: [ceph-users] PG's become undersize+degraded if OSD's restart during backfill Hi

Re: [ceph-users] Ceph Blog Articles

2016-12-06 Thread Nick Fisk
> same partition on the NVMes, each one has 4 partitions, so > no file based journal but raw partition) > > Greetings > -Sascha- > > Am 05.12.2016 um 17:16 schrieb Nick Fisk: > > Hi Sascha, > > > > Here is what I used > > > > [global] > > ioen

Re: [ceph-users] Ceph Blog Articles

2016-12-07 Thread Nick Fisk
and > if I specify the clientname as ceph.client.admin, client.admin > or admin. Any pointer what I might be missing here? > > Greetings > -Sascha- > > Am 06.12.2016 um 15:49 schrieb Nick Fisk: > > Hi Sascha, > > > > Have you got any write back caching enabl

Re: [ceph-users] RBD: Failed to map rbd device with data pool enabled.

2016-12-07 Thread Nick Fisk
Hi Aravind, I also saw this merge on Monday and tried to create an RBD on an EC pool, which also failed, although I ended up with all my OSD's crashing and refusing to restart. I'm going to rebuild the cluster and try again. Have you tried using the rbd-nbd driver or benchmarking directly

Re: [ceph-users] RBD: Failed to map rbd device with data pool enabled.

2016-12-08 Thread Nick Fisk
ailing for the image which was created with -data-pool option, so I can't run fio or any IO on it. Aravind From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: Wednesday, December 07, 2016 6:23 PM To: Aravind Ramesh ; ceph-users@lists.ceph.com Subject: RE: RBD: Failed to map rbd device wi

[ceph-users] [Fixed] OS-Prober In Ubuntu Xenial causes journal errors

2016-12-14 Thread Nick Fisk
Hi All, For all those who have been hit by the bug in Ubuntu where update-grub causes your OSD's to crash out when it probes the partition, a fixed package has been released to xenial-proposed. I have installed it and can confirm it fixes the problem. https://launchpad.net/ubun

Re: [ceph-users] OSD will not start after heartbeatsuicide timeout, assert error from PGLog

2016-12-22 Thread Nick Fisk
Hi, I hit this a few weeks ago, here is the related tracker. You might want to update it to reflect your case and upload logs. http://tracker.ceph.com/issues/17916 Nick > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Trygve Vea > Sent:

Re: [ceph-users] How can I debug "rbd list" hang?

2016-12-22 Thread Nick Fisk
I think you have probably just answered your previous question. I would guess pauserd and pausewr pause read and write IO, hence your command to list is being blocked on reads. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stéphane Klein Sent: 22 December 2016 17
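
A quick way to confirm and clear this, assuming the flags really are set cluster-wide:

  ceph osd dump | grep flags     # look for pauserd,pausewr in the flag list
  ceph osd unset pause           # clears both pauserd and pausewr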

Re: [ceph-users] How can I debug "rbd list" hang?

2016-12-22 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stéphane Klein Sent: 22 December 2016 17:10 To: n...@fisk.me.uk Cc: ceph-users Subject: Re: [ceph-users] How can I debug "rbd list" hang? 2016-12-22 18:07 GMT+01:00 Nick Fisk mailto:n...@fisk.me.uk>>:

Re: [ceph-users] How to know if an object is stored in clients?

2016-12-30 Thread Nick Fisk
Just to add, the rados command will block until the objects are stored on a sufficient number of OSD's, and you should also be able to check the return code to confirm that there weren't any errors. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jaemyoun Lee Sen
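
For example, a small sketch using a hypothetical pool and object name:

  rados -p testpool put myobject ./payload.bin
  echo $?                          # 0 only if the write was acknowledged
  rados -p testpool stat myobject  # confirm the object exists and its size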

Re: [ceph-users] performance with/without dmcrypt OSD

2017-01-03 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kent Borg Sent: 03 January 2017 12:47 To: M Ranga Swami Reddy Cc: ceph-users Subject: Re: [ceph-users] performance with/without dmcrypt OSD On 01/03/2017 06:42 AM, M Ranga Swami Reddy wrote: On Tue, Jan 3, 2017

Re: [ceph-users] cephfs AND rbds

2017-01-07 Thread Nick Fisk
Technically I think there is no reason why you couldn’t do this, but I think it is unadvisable. There was a similar thread a while back where somebody had done this and it caused problems when he was trying to do maintenance/recovery further down the line. I’m assuming you want to do this b

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-07 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of kevin parrikar Sent: 07 January 2017 13:11 To: Lionel Bouton Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release Thanks for your

Re: [ceph-users] cephfs AND rbds

2017-01-08 Thread Nick Fisk
nd delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited. _ _____ From: Nick Fisk [n...@fisk.me.uk] Sent: Saturday, January 07, 2017 3:21 PM To: David Turner; ceph-users@lists.ceph.com Subject: RE: cephfs AND rbds Tec

Re: [ceph-users] Write back cache removal

2017-01-10 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido > den Hollander > Sent: 10 January 2017 07:54 > To: ceph new ; Stuart Harland > > Subject: Re: [ceph-users] Write back cache removal > > > > Op 9 januari 2017 om 13:02 schreef Stuart Ha

Re: [ceph-users] help needed

2018-09-06 Thread Nick Fisk
If it helps, I’m seeing about 3GB of DB usage for a 3TB OSD that is about 60% full. This is with a pure RBD workload; I believe this can vary depending on what your Ceph use case is. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David Turner Sent: 06 September 2018 14:09 To
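
The figure above can be read per OSD from the BlueFS perf counters; a sketch assuming access to the admin socket on the OSD host and that jq is installed (osd.0 is a placeholder):

  ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'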

[ceph-users] Tiering stats are blank on Bluestore OSD's

2018-09-10 Thread Nick Fisk
After upgrading a number of OSD's to Bluestore I have noticed that the cache tier OSD's which have so far been upgraded are no longer logging tier_* stats "tier_promote": 0, "tier_flush": 0, "tier_flush_fail": 0, "tier_try_flush": 0, "tier_try_flush_fail":

[ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSD's the maths don't work. E
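
For reference, a rough sketch of where the numbers being compared come from, assuming osd.0; the per-PG column layout can differ between releases:

  ceph daemon osd.0 perf dump | grep -i onode   # BlueStore onode counters
  ceph pg ls-by-osd 0                           # per-PG object counts held by this OSD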

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
2 PM, Igor Fedotov wrote: > > > Hi Nick. > > > > > > On 9/10/2018 1:30 PM, Nick Fisk wrote: > >> If anybody has 5 minutes could they just clarify a couple of things > >> for me > >> > >> 1. onode count, should this be equal to the number of

[ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-18 Thread Nick Fisk
Hi, Ceph Version = 12.2.8 8TB spinner with 20G SSD partition Perf dump shows the following: "bluefs": { "gift_bytes": 0, "reclaim_bytes": 0, "db_total_bytes": 21472731136, "db_used_bytes": 3467640832, "wal_total_bytes": 0, "wal_used_bytes": 0,

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> > On 10/18/2018 7:49 PM, Nick Fisk wrote: > > Hi, > > > > Ceph Version = 12.2.8 > > 8TB spinner with 20G SSD partition > > > > Perf dump shows the following: > > > > "bluefs": { > > "gift_bytes": 0,

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> -Original Message- > From: Nick Fisk [mailto:n...@fisk.me.uk] > Sent: 19 October 2018 08:15 > To: 'Igor Fedotov' ; ceph-users@lists.ceph.com > Subject: RE: [ceph-users] slow_used_bytes - SlowDB being used despite lots of > space free in BlockDB on SSD?

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-01 Thread Nick Fisk
4.16 required? https://www.phoronix.com/scan.php?page=news_item&px=Skylake-X-P-State-Linux-4.16 -Original Message- From: ceph-users On Behalf Of Blair Bethwaite Sent: 01 May 2018 16:46 To: Wido den Hollander Cc: ceph-users ; Nick Fisk Subject: Re: [ceph-users] Intel Xeon Scalable

[ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-01 Thread Nick Fisk
Hi all, Slowly getting round to migrating clusters to Bluestore but I am interested in how people are handling the potential change in write latency coming from Filestore? Or maybe nobody is really seeing much difference? As we all know, in Bluestore, writes are not double written and in mo

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
-Original Message- From: Alex Gorbachev Sent: 02 May 2018 22:05 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences Hi Nick, On Tue, May 1, 2018 at 4:50 PM, Nick Fisk wrote: > Hi all, > > > > Slowly getting rou

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
Hi Nick, On 5/1/2018 11:50 PM, Nick Fisk wrote: Hi all, Slowly getting round to migrating clusters to Bluestore but I am interested in how people are handling the potential change in write latency coming from Filestore? Or maybe nobody is really seeing much difference? As we all know, in

Re: [ceph-users] Bluestore on HDD+SSD sync write latency experiences

2018-05-03 Thread Nick Fisk
case writing the IO's through the NVME first seems to help by quite a large margin. I'm curious what was the original rationale for 32kB? Cheers, Dan On Tue, May 1, 2018 at 10:50 PM, Nick Fisk wrote: Hi all, Slowly getting round to migrating clusters to Bluestore but I am i

[ceph-users] Scrubbing impacting write latency since Luminous

2018-05-10 Thread Nick Fisk
Hi All, I've just upgraded our main cluster to Luminous and have noticed that, whereas before the upgrade the 64k write latency was always hovering around 2ms regardless of what scrubbing was going on, since the upgrade to Luminous scrubbing takes the average latency up to around 5-10ms and deep scrubbi

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-14 Thread Nick Fisk
Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs On 05/01/2018 10:19 PM, Nick Fisk wrote: > 4.16 required? > https://www.phoronix.com/scan.php?page=news_item&px=Skylake-X-P-State- > Linux- > 4.16 > I've been trying with the 4.16 kernel for the last

[ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
Hi, After an RBD snapshot was removed, I seem to be having OSD's assert when they try and recover pg 1.2ca. The issue seems to follow the PG around as OSD's fail. I've seen this bug tracker and associated mailing list post, but would appreciate it if anyone can give any pointers. https://tracker.cep

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
" snapshot object and then allow things to backfill? -Original Message- From: ceph-users On Behalf Of Nick Fisk Sent: 05 June 2018 16:43 To: 'ceph-users' Subject: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) Hi, After a RBD snapshot was removed, I

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-05 Thread Nick Fisk
From: ceph-users On Behalf Of Paul Emmerich Sent: 05 June 2018 17:02 To: n...@fisk.me.uk Cc: ceph-users Subject: Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) 2018-06-05 17:42 GMT+02:00 Nick Fisk mailto:n...@fisk.me.uk> >: Hi, After a RBD snapsh

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-07 Thread Nick Fisk
sing the object-store-tool, but not sure if I want to clean the clone metadata or try and remove the actual snapshot object. -Original Message- From: ceph-users On Behalf Of Nick Fisk Sent: 05 June 2018 17:22 To: 'ceph-users' Subject: Re: [ceph-users] FAILED assert(p != recover

Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-08 Thread Nick Fisk
http://docs.ceph.com/docs/master/ceph-volume/simple/ ? From: ceph-users On Behalf Of Konstantin Shalygin Sent: 08 June 2018 11:11 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access) Wh

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Nick Fisk
I’ve seen things like this happen if you end up with extreme weighting towards a small set of OSD’s. Crush tries a slightly different combination of OSD’s at each attempt, but with an extremely lopsided weighting it can run out of attempts before it finds a set of OSD’s which mat
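
For anyone wanting to experiment, the relevant tunable can be raised by round-tripping the CRUSH map; a hedged sketch (50 is the usual default for choose_total_tries):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # in crushmap.txt raise:  tunable choose_total_tries 100
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new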

Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end())

2018-06-14 Thread Nick Fisk
ts.ceph.com] On Behalf Of Nick Fisk Sent: 07 June 2018 14:01 To: 'ceph-users' Subject: Re: [ceph-users] FAILED assert(p != recovery_info.ss.clone_snaps.end()) So I've recompiled a 12.2.5 ceph-osd binary with the fix included in https://github.com/ceph/ceph/pull/22396 The OSD has resta

[ceph-users] CephFS+NFS For VMWare

2018-06-29 Thread Nick Fisk
This is for us peeps using Ceph with VMWare. My current favoured solution for consuming Ceph in VMWare is via RBD's formatted with XFS and exported via NFS to ESXi. This seems to perform better than iSCSI+VMFS, which doesn't appear to play nicely with Ceph's PG contention issues, particularly if wor
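
For context, the RBD+XFS+NFS arrangement described here looks roughly like this; pool, image, export path and ESXi host name are all placeholders:

  rbd create vmware/datastore1 --size 10485760    # size in MB (10 TiB here)
  rbd map vmware/datastore1
  mkfs.xfs /dev/rbd/vmware/datastore1
  mount /dev/rbd/vmware/datastore1 /export/datastore1
  # /etc/exports entry, then re-export:
  #   /export/datastore1  esxi01(rw,no_root_squash,sync)
  exportfs -ra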

Re: [ceph-users] CephFS+NFS For VMWare

2018-06-30 Thread Nick Fisk
greater concern. Thanks, Nick From: Paul Emmerich [mailto:paul.emmer...@croit.io] Sent: 29 June 2018 17:57 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] CephFS+NFS For VMWare VMWare can be quite picky about NFS servers. Some things that you should test before deploying

Re: [ceph-users] CephFS+NFS For VMWare

2018-07-02 Thread Nick Fisk
Quoting Ilya Dryomov : On Fri, Jun 29, 2018 at 8:08 PM Nick Fisk wrote: This is for us peeps using Ceph with VMWare. My current favoured solution for consuming Ceph in VMWare is via RBD’s formatted with XFS and exported via NFS to ESXi. This seems to perform better than iSCSI+VMFS

Re: [ceph-users] Write back cache removal

2017-01-10 Thread Nick Fisk
esholds to higher than your hit set counts, this will abuse the tiering logic but should also stop anything getting promoted into your cache tier. On 10 Jan 2017, at 09:52, Wido den Hollander mailto:w...@42on.com> > wrote: Op 10 januari 2017 om 9:52 schreef Nick Fisk mailto
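
A sketch of that workaround, assuming a cache pool called hot-pool; the recency values just need to exceed hit_set_count:

  ceph osd pool get hot-pool hit_set_count
  ceph osd pool set hot-pool min_read_recency_for_promote 100
  ceph osd pool set hot-pool min_write_recency_for_promote 100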

Re: [ceph-users] Ceph cache tier removal.

2017-01-10 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Daznis > Sent: 09 January 2017 12:54 > To: ceph-users > Subject: [ceph-users] Ceph cache tier removal. > > Hello, > > > I'm running preliminary test on cache tier removal on a live cluster

[ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-12 Thread Nick Fisk
Hi, I had been testing some higher values with the osd_snap_trim_sleep variable to try and reduce the impact of removing RBD snapshots on our cluster and I have come across what I believe to be a possible unintended consequence. The value of the sleep seems to keep the lock on the PG open so tha
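
For anyone reproducing this, the value can be changed at runtime; the option name is real, the value below is only an example, and given the behaviour described here a large sleep may make the blocking worse rather than better:

  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
  ceph daemon osd.0 config get osd_snap_trim_sleep   # confirm on one OSD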

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-13 Thread Nick Fisk
1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html > -Original Message- > From: Dan van der Ster [mailto:d...@vanderster.com] > Sent: 13 January 2017 10:28 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps lo

Re: [ceph-users] Mixing disks

2017-01-14 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marc > Roos > Sent: 14 January 2017 12:56 > To: ceph-users > Subject: [ceph-users] Mixing disks > > > For a test cluster, we like to use some 5400rpm and 7200rpm drives, is it > advisable to

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-19 Thread Nick Fisk
ticable. It might be that this option shouldn't be used with Jewel+? > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick > Fisk > Sent: 13 January 2017 20:38 > To: 'Dan van der Ster' > Cc: 'ceph-users

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-19 Thread Nick Fisk
I know you are working on fixing a bug with that. Nick > -Original Message- > From: Samuel Just [mailto:sj...@redhat.com] > Sent: 19 January 2017 15:47 > To: Dan van der Ster > Cc: Nick Fisk ; ceph-users > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG d

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-19 Thread Nick Fisk
uel Just [mailto:sj...@redhat.com] > Sent: 19 January 2017 18:58 > To: Nick Fisk > Cc: Dan van der Ster ; ceph-users > > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? > > Have you also tried setting osd_snap_trim_cost to

Re: [ceph-users] [Ceph-community] Consultation about ceph storage cluster architecture

2017-01-20 Thread Nick Fisk
I think he needs the “gateway” servers because he wishes to expose the storage to clients which won’t speak Ceph natively. I’m not sure I would entirely trust that windows port of CephFS and there are also security concerns with allowing end users to talk directly to Ceph. There’s also future st

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-01-20 Thread Nick Fisk
Hi Sam, I have a test cluster, albeit small. I’m happy to run tests + graph results with a wip branch and work out reasonable settings…etc From: Samuel Just [mailto:sj...@redhat.com] Sent: 19 January 2017 23:23 To: David Turner Cc: Nick Fisk ; ceph-users Subject: Re: [ceph-users

Re: [ceph-users] Crash on startup

2017-02-01 Thread Nick Fisk
Can you check to see if you have any disk errors in your kernel log and/or SMART errors? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Hans van den Bogert Sent: 01 February 2017 16:01 To: ceph-us...@ceph.com Subject: [ceph-users] Crash on startup Hi All, I'm clu
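
Something along these lines should show it, with /dev/sdX standing in for the OSD's data disk:

  dmesg -T | egrep -i 'error|ata[0-9]|blk_update|i/o'
  smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect|overall-health'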

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-03 Thread Nick Fisk
...@redhat.com] Sent: 03 February 2017 18:24 To: David Turner Cc: Nick Fisk ; ceph-users Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? They do seem to exist in Jewel. -Sam On Fri, Feb 3, 2017 at 10:12 AM, David Turner mailto:david.tur...@storagecraft.com>

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Nick Fisk
Hi Steve, From what I understand, the issue is not with the queueing in Ceph, which is correctly moving client IO to the front of the queue. The problem lies below what Ceph controls, i.e. the scheduler and disk layer in Linux. Once the IO’s leave Ceph it’s a bit of a free for all and the
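
To illustrate the layer below Ceph being discussed, this is the sort of thing that was tweaked at the time; a hedged sketch, noting the osd_disk_thread_ioprio_* options only take effect with the CFQ scheduler:

  cat /sys/block/sdX/queue/scheduler         # e.g. noop deadline [cfq]
  echo cfq > /sys/block/sdX/queue/scheduler
  ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'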

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Nick Fisk
sed that any dissemination or copying of this message is prohibited. _____ From: Nick Fisk [mailto:n...@fisk.me.uk] Sent: Tuesday, February 7, 2017 10:25 AM To: Steve Taylor ; ceph-users@lists.ceph.com Subject: RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? Hi Stev

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Nick Fisk
Hi, I would also be interested in whether there is a way to determine if this is happening. I'm not sure if it's related, but when I updated a number of OSD nodes to kernel 4.7 from 4.4, I started seeing lots of random alerts from OSD's saying that other OSD's were not responding. The load wasn't particu

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-09 Thread Nick Fisk
Building now From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Samuel Just Sent: 09 February 2017 19:22 To: Nick Fisk Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? Ok, https://github.com/athanatos/ceph/tree

Re: [ceph-users] To backup or not to backup the classic way - How to backup hundreds of TB?

2017-02-14 Thread Nick Fisk
Hardware failures are just one possible cause. If you value your data you will have a backup, preferably going to some sort of removable media that can be taken offsite, like those things that everybody keeps saying are dead... what are they called... oh yeah, tapes. :) An online copy of your data

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-14 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Dongsheng Yang > Sent: 14 February 2017 09:01 > To: Sage Weil > Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com > Subject: [ceph-users] bcache vs flashcache vs cache tiering > > Hi

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-14 Thread Nick Fisk
> -Original Message- > From: Wido den Hollander [mailto:w...@42on.com] > Sent: 14 February 2017 16:25 > To: Dongsheng Yang ; n...@fisk.me.uk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] bcache vs flashcache vs cache tiering > > > > Op 14 febru

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-14 Thread Nick Fisk
> -Original Message- > From: Gregory Farnum [mailto:gfar...@redhat.com] > Sent: 14 February 2017 21:05 > To: Wido den Hollander > Cc: Dongsheng Yang ; Nick Fisk > ; Ceph Users > Subject: Re: [ceph-users] bcache vs flashcache vs cache tiering > > On Tue, Fe

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-15 Thread Nick Fisk
> -Original Message- > From: Christian Balzer [mailto:ch...@gol.com] > Sent: 15 February 2017 01:42 > To: 'Ceph Users' > Cc: Nick Fisk ; 'Gregory Farnum' > Subject: Re: [ceph-users] bcache vs flashcache vs cache tiering > > On Tue

[ceph-users] Passing LUA script via python rados execute

2017-02-15 Thread Nick Fisk
Hi Noah, I'm trying to follow your example where you can pass a LUA script as JSON when calling the rados execute function in Python. However, I'm getting a rados permission denied error saying it failed to read the test object I have placed in the pool. I have also tried calling the cls_hello ob

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-15 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Nick Fisk > Sent: 15 February 2017 09:53 > To: 'Christian Balzer' ; 'Ceph Users' us...@lists.ceph.com> > Subject: Re: [ceph-users] bcache vs flashca

Re: [ceph-users] bcache vs flashcache vs cache tiering

2017-02-15 Thread Nick Fisk
> On Wed, 15 Feb 2017, Nick Fisk wrote: > > Just an update. I spoke to Sage today and the general consensus is > > that something like bcache or dmcache is probably the long term goal, > > but work needs to be done before its ready for prime time. The current > > tie

Re: [ceph-users] How safe is ceph pg repair these days?

2017-02-18 Thread Nick Fisk
From what I understand, in Jewel+ Ceph has the concept of an authoritative shard, so in the case of a 3x replica pool it will notice that two replicas match and one doesn't, and use one of the good replicas. However, in a 2x pool you're out of luck. However, if someone could confirm my suspicions that
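
A minimal sketch of checking before repairing, with 2.1f as a placeholder PG id; list-inconsistent-obj shows which shard the scrub flagged:

  rados list-inconsistent-obj 2.1f --format=json-pretty
  ceph pg repair 2.1f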

Re: [ceph-users] Passing LUA script via python rados execute

2017-02-18 Thread Nick Fisk
" object is in the same pool as the objects to be worked on, then in theory could someone modify the script object to do something nasty, intentional or not. Nick > -Original Message- > From: Noah Watkins [mailto:noahwatk...@gmail.com] > Sent: 18 February 2017 19:56 > To:

Re: [ceph-users] How safe is ceph pg repair these days?

2017-02-21 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Gregory Farnum > Sent: 20 February 2017 22:13 > To: Nick Fisk ; David Zafman > Cc: ceph-users > Subject: Re: [ceph-users] How safe is ceph pg repair these days? > &

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-21 Thread Nick Fisk
Yep sure, will try and present some figures at tomorrow’s meeting again. From: Samuel Just [mailto:sj...@redhat.com] Sent: 21 February 2017 18:14 To: Nick Fisk Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep? Ok, I've added exp
