Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
We upgraded to 14.2.4 back in October and this week to v14.2.6. But I don't think the cluster had a network outage until yesterday, so I wouldn't have thought this is a .6 regression. If it happens again I'll look for the waiting for map message. -- dan On Thu, Jan 16, 2020 at 12:08 PM Nick

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Nick Fisk
On Thursday, January 16, 2020 09:15 GMT, Dan van der Ster wrote: > Hi Nick, > > We saw the exact same problem yesterday after a network outage -- a few of > our down OSDs were stuck down until we restarted their processes. > > -- Dan > > > On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote:

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
Hi Nick, We saw the exact same problem yesterday after a network outage -- a few of our down OSDs were stuck down until we restarted their processes. -- Dan On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote: > Hi All, > > Running 14.2.5, currently experiencing some network blips isolated to a

[ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Stefan Priebe - Profihost AG
Hello, does anybody know a fix for this ASSERT / crash? 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16 02:02:31.304993 /build/ceph/src/os/bluestore/BlueStore.cc: 8808:

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
And I confirm that a repair is not useful. As far as I can see it simply "cleans" the error (without modifying the big object), but the error of course reappears when the deep scrub runs again on that PG. Cheers, Massimo On Thu, Jan 16, 2020 at 9:35 AM Massimo Sgaravatto <

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
In my cluster I saw that the problematic objects have been uploaded by a specific application (onedata), which I think used to upload the files doing something like: rados --pool put Now (since Luminous ?) the default object size is 128MB but if I am not wrong it was 100GB before. This would
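For illustration, a minimal sketch of how such an oversized object could have been written and how its size can be checked with the rados CLI (pool and object names here are placeholders):
  rados -p <pool> put <objname> /path/to/bigfile   # stores the whole file as a single RADOS object
  rados -p <pool> stat <objname>                   # reports the resulting object size in bytes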

[ceph-users] Mon crashes virtual void LogMonitor::update_from_paxos(bool*)

2020-01-15 Thread Kevin Hrpcek
Hey all, One of my mons has been having a rough time for the last day or so. It started with a crash and restart I didn't notice about a day ago and now it won't start. Where it crashes has changed over time but it is now stuck on the last error below. I've tried to get some more information

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Liam Monahan
I just changed my max object size to 256MB and scrubbed and the errors went away. I’m not sure what can be done to reduce the size of these objects, though, if it really is a problem. Our cluster has dynamic bucket index resharding turned on, but that sharding process shouldn’t help it if

Re: [ceph-users] OSD's hang after network blip

2020-01-15 Thread Nick Fisk
On Wednesday, January 15, 2020 14:37 GMT, "Nick Fisk" wrote: > Hi All, > > Running 14.2.5, currently experiencing some network blips isolated to a > single rack which is under investigation. However, it appears following a > network blip, random OSD's in unaffected racks are sometimes not

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Massimo Sgaravatto
I never changed the default value for that attribute, so I am missing why I have such big objects around. I am also wondering what a pg repair would do in such a case. On Wed, 15 Jan 2020 at 16:18, Liam Monahan wrote: > Thanks for that link. > > Do you have a default osd max object size of 128M?

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Liam Monahan
Thanks for that link. Do you have a default osd max object size of 128M? I’m thinking about doubling that limit to 256MB on our cluster. Our largest object is only about 10% over that limit. > On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto > wrote: > > I guess this is coming from: > >
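For anyone following along, a minimal sketch of that change on Nautilus (268435456 bytes = 256 MiB; the PG id is only an example taken from the logs further down this thread):
  ceph config set osd osd_max_object_size 268435456   # raise the limit from the 128 MiB default
  ceph pg deep-scrub 13.4                             # re-scrub an affected PG to clear the error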

[ceph-users] OSD's hang after network blip

2020-01-15 Thread Nick Fisk
Hi All, Running 14.2.5, currently experiencing some network blips isolated to a single rack which is under investigation. However, it appears following a network blip, random OSD's in unaffected racks are sometimes not recovering from the incident and are left running in a zombie
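A hedged sketch of checks that can help on such a stuck OSD (osd.12 is illustrative); the admin socket shows which osdmap epoch the daemon has and whether ops are blocked, e.g. waiting for a newer map:
  ceph daemon osd.12 status             # state plus oldest/newest map epochs known to the daemon
  ceph daemon osd.12 dump_blocked_ops   # ops stuck inside the OSD, if any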

[ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-15 Thread Stefan Bauer
Folks, i would like to thank you again for your help regarding performance speedup of our ceph cluster. Customer just reports that the database is around 40% faster than before without changing any hardware. This really kicks ass now! :) We measured the subop_latency - avgtime on our
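As a rough sketch, that counter can be read straight from an OSD admin socket (osd.0 and the jq filter are illustrative):
  ceph daemon osd.0 perf dump | jq '.osd.subop_latency'
  # returns avgcount (number of replica ops), sum (total seconds) and avgtime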

[ceph-users] Weird mount issue (Ubuntu 18.04, Ceph 14.2.5 & 14.2.6)

2020-01-15 Thread Aaron
Seeing a weird mount issue. Some info: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.3 LTS Release: 18.04 Codename: bionic Ubuntu 18.04.3 with kernel 4.15.0-74-generic Ceph 14.2.5 & 14.2.6 With ceph-common, ceph-base, etc installed: ceph/stable,now

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Massimo Sgaravatto
I guess this is coming from: https://github.com/ceph/ceph/pull/30783 introduced in Nautilus 14.2.5 On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto < massimo.sgarava...@gmail.com> wrote: > As I wrote here: > > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html > >

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-14 Thread Massimo Sgaravatto
As I wrote here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html I saw the same after an update from Luminous to Nautilus 14.2.6 Cheers, Massimo On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan wrote: > Hi, > > I am getting one inconsistent object on our cluster with

[ceph-users] One lost cephfs data object

2020-01-14 Thread Andrew Denton
Hi all, I'm on 13.2.6. My cephfs has managed to lose one single object from its data pool. All the cephfs docs I'm finding show me how to recover from an entire lost PG, but the rest of the PG checks out as far as I can tell. How can I track down which file that object belongs to? I'm

Re: [ceph-users] units of metrics

2020-01-14 Thread Stefan Kooman
Quoting Robert LeBlanc (rob...@leblancnet.us): > > req_create > req_getattr > req_readdir > req_lookupino > req_open > req_unlink > > We were graphing these as ops, but using the new avgcount, we are getting > very different values, so I'm wondering if we are choosing the wrong new > value, or

Re: [ceph-users] Pool Max Avail and Ceph Dashboard Pool Useage on Nautilus giving different percentages

2020-01-14 Thread ceph
Does anyone know if this is also respecting the nearfull values? Thank you in advance Mehmet On 14 January 2020 15:20:39 CET, Stephan Mueller wrote: >Hi, >I sent out this message on the 19th of December and somehow it didn't >get into the list and I just noticed it now. Sorry for the delay. >I

[ceph-users] PG inconsistent with error "size_too_large"

2020-01-14 Thread Liam Monahan
Hi, I am getting one inconsistent object on our cluster with an inconsistency error that I haven’t seen before. This started happening during a rolling upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s related. I was hoping to know what the error means before trying a
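A minimal sketch of how the details of such an inconsistency can be pulled up (the PG id is a placeholder):
  ceph health detail | grep inconsistent                    # find the affected PG(s)
  rados list-inconsistent-obj <pgid> --format=json-pretty   # show per-object errors, e.g. size_too_large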

Re: [ceph-users] units of metrics

2020-01-14 Thread Robert LeBlanc
On Tue, Jan 14, 2020 at 12:30 AM Stefan Kooman wrote: > Quoting Robert LeBlanc (rob...@leblancnet.us): > > The link that you referenced above is no longer available, do you have a > > new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all > > changed, so I'm trying to map the old

Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread vitalif
Hi Philip, I'm not sure if we're talking about the same thing but I was also confused when I didn't see 100% OSD drive utilization during my first RBD write benchmark. Since then I collect all my confusion here https://yourcmc.ru/wiki/Ceph_performance :) 100% RBD utilization means that

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Thank you all, performance is indeed better now. Can now go back to sleep ;) KR Stefan -Original Message- From: Виталий Филиппов  Sent: Tuesday, 14 January 2020 10:28 To: Wido den Hollander ; Stefan Bauer CC: ceph-users@lists.ceph.com Subject: Re: [ceph-users] low io

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread vitalif
Yes, that's it, see the end of the article. You'll have to disable signature checks, too. cephx_require_signatures = false cephx_cluster_require_signatures = false cephx_sign_messages = false Hi Vitaliy, thank you for your time. Do you mean cephx sign messages = false with "disable
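Collected into a ceph.conf fragment, the settings mentioned above would look roughly like this (only advisable on a trusted network, since it weakens cephx):
  [global]
  cephx_require_signatures = false
  cephx_cluster_require_signatures = false
  cephx_sign_messages = false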

Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Philip Brown
Also.. "It seems like your RBD can't flush it's I/O fast enough" implies that there is some particular measure of "fast enough", that is a tunable value somewhere. If my network cards arent blocked, and my OSDs arent blocked... then doesnt that mean that I can and should "turn that knob" up?

Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Philip Brown
The odd thing is: the network interfaces on the gateways don't seem to be at 100% capacity and the OSD disks don't seem to be at 100% utilization, so I'm confused where this could be getting held up. - Original Message - From: "Wido den Hollander" To: "Philip Brown" , "ceph-users"

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Hi Vitaliy, thank you for your time. Do you mean cephx sign messages = false with "disable signatures"? KR Stefan -Original Message- From: Виталий Филиппов  Sent: Tuesday, 14 January 2020 10:28 To: Wido den Hollander ; Stefan Bauer CC: ceph-users@lists.ceph.com

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Lars Fenneberg
Hi Konstantin! Quoting Konstantin Shalygin (k0...@k0ste.ru): > >Is there any recommandation of how many osds a single flash device can > >serve? The optane ones can do 2000MB/s write + 500.000 iop/s. > > Any sizes of db, except 3/30/300 is useless. I have this from Mattia Belluco in my notes

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Stefan Bauer
Hi Stefan, thank you for your time. "temporary write through" does not seem to be a legit parameter. However write through is already set: root@proxmox61:~# echo "temporary write through" > /sys/block/sdb/device/scsi_disk/*/cache_type root@proxmox61:~# cat

Re: [ceph-users] PGs inconsistents because of "size_too_large"

2020-01-14 Thread Massimo Sgaravatto
This is what I see in the OSD.54 log file 2020-01-14 10:35:04.986 7f0c20dca700 -1 log_channel(cluster) log [ERR] : 13.4 soid 13:20fbec66:::%2fhbWPh36KajAKcJUlCjG9XdqLGQMzkwn3NDrrLDi_mTM%2ffile2:head : size 385888256 > 134217728 is too large 2020-01-14 10:35:08.534 7f0c20dca700 -1

[ceph-users] PGs inconsistents because of "size_too_large"

2020-01-14 Thread Massimo Sgaravatto
I have just finished the update of a ceph cluster from luminous to nautilus. Everything seems to be running, but I keep receiving notifications (about ~10 so far, involving different PGs and different OSDs) of PGs in an inconsistent state. rados list-inconsistent-obj pg-id --format=json-pretty (an

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Konstantin Shalygin
i'm planning to split the block db to a separate flash device which i also would like to use as an OSD for erasure coding metadata for rbd devices. If i want to use 14x 14TB HDDs per Node https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing recommends a minimum

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Виталий Филиппов
...disable signatures and rbd cache. I didn't mention it in the email to not repeat myself. But I have it in the article :-) -- With best regards, Vitaliy Filippov___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Xiaoxi Chen
One tricky thing is that each layer of RocksDB is 100% on SSD or 100% on HDD, so either you need to tweak the rocksdb configuration, or there will be a huge waste; e.g. a 20GB DB partition makes no difference compared to a 3GB one (under the default rocksdb configuration). Janne Johansson wrote on Tue, 14 Jan 2020:

Re: [ceph-users] Hardware selection for ceph backup on ceph

2020-01-14 Thread Wido den Hollander
On 1/10/20 5:32 PM, Stefan Priebe - Profihost AG wrote: > Hi, > > we‘re currently in the process of building a new ceph cluster to backup rbd > images from multiple ceph clusters. > > We would like to start with just a single ceph cluster to backup which is > about 50tb. Compression ratio of

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Janne Johansson
(sorry for empty mail just before) > i'm plannung to split the block db to a seperate flash device which i >> also would like to use as an OSD for erasure coding metadata for rbd >> devices. >> >> If i want to use 14x 14TB HDDs per Node >> >>

Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Wido den Hollander
On 1/10/20 7:43 PM, Philip Brown wrote: > Surprisingly, a google search didn't seem to find the answer on this, so guess > I should ask here: > > what determines if an rbd is "100% busy"? > > I have some backend OSDs, and an iSCSI gateway, serving out some RBDs. > > iostat on the gateway

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Janne Johansson
On Mon, 13 Jan 2020 at 08:09, Stefan Priebe - Profihost AG < s.pri...@profihost.ag> wrote: > Hello, > > i'm planning to split the block db to a separate flash device which i > also would like to use as an OSD for erasure coding metadata for rbd > devices. > > If i want to use 14x 14TB HDDs per

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-14 Thread Wido den Hollander
On 1/13/20 6:37 PM, vita...@yourcmc.ru wrote: >> Hi, >> >> we're playing around with ceph but are not quite happy with the IOs. >> on average 5000 iops / write >> on average 13000 iops / read >> >> We're expecting more. :( any ideas or is that all we can expect? > > With server SSD you can

Re: [ceph-users] block db sizing and calculation

2020-01-14 Thread Stefan Priebe - Profihost AG
Hello, does anybody have real-life experience with an external block db? Greets, Stefan On 13.01.20 at 08:09, Stefan Priebe - Profihost AG wrote: > Hello, > > i'm planning to split the block db to a separate flash device which i > also would like to use as an OSD for erasure coding metadata for

Re: [ceph-users] units of metrics

2020-01-14 Thread Stefan Kooman
Quoting Robert LeBlanc (rob...@leblancnet.us): > The link that you referenced above is no longer available, do you have a > new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all > changed, so I'm trying to map the old values to the new values. Might just > have to look in the code.

[ceph-users] Slow Performance - Sequential IO

2020-01-13 Thread Anthony Brandelli (abrandel)
I have a newly setup test cluster that is giving some surprising numbers when running fio against an RBD. The end goal here is to see how viable a Ceph based iSCSI SAN of sorts is for VMware clusters, which require a bunch of random IO. Hardware: 2x E5-2630L v2 (2.4GHz, 6 core) 256GB RAM 2x

Re: [ceph-users] Acting sets sometimes may violate crush rule ?

2020-01-13 Thread Dan van der Ster
Hi, One way this can happen is if you change the crush rule of a pool after the balancer has been running awhile. This is because the balancer upmaps are only validated when they are initially created. ceph osd dump | grep upmap Does it explain your issue? .. Dan On Tue, 14 Jan 2020, 04:17
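A minimal sketch of checking for such stale upmap exceptions and, assuming they do turn out to violate the new rule, removing them one PG at a time:
  ceph osd dump | grep pg_upmap_items   # list current upmap exceptions
  ceph osd rm-pg-upmap-items <pgid>     # drop the exception for one PG so CRUSH places it again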

[ceph-users] Acting sets sometimes may violate crush rule ?

2020-01-13 Thread Yi-Cian Pu
Hi all, We sometimes can observe that acting set seems to violate crush rule. For example, we had an environment before: [root@Ann-per-R7-3 /]# ceph -s cluster: id: 248ce880-f57b-4a4c-a53a-3fc2b3eb142a health: HEALTH_WARN 34/8019 objects misplaced (0.424%) services:

Re: [ceph-users] units of metrics

2020-01-13 Thread Robert LeBlanc
The link that you referenced above is no longer available, do you have a new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all changed, so I'm trying to map the old values to the new values. Might just have to look in the code. :( Thanks! Robert LeBlanc PGP

[ceph-users] January Ceph Science Group Virtual Meeting

2020-01-13 Thread Kevin Hrpcek
Hello, We will be having a Ceph science/research/big cluster call on Wednesday January 22nd. If anyone wants to discuss something specific they can add it to the pad linked below. If you have questions or comments you can contact me. This is an informal open call of community members mostly

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread vitalif
Hi, we're playing around with ceph but are not quite happy with the IOs. on average 5000 iops / write on average 13000 iops / read We're expecting more. :( any ideas or is that all we can expect? With server SSD you can expect up to ~1 write / ~25000 read iops per a single client.

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread Stefan Priebe - Profihost AG
Hi Stefan, Am 13.01.20 um 17:09 schrieb Stefan Bauer: > Hi, > > > we're playing around with ceph but are not quite happy with the IOs. > > > 3 node ceph / proxmox cluster with each: > > > LSI HBA 3008 controller > > 4 x MZILT960HAHQ/007 Samsung SSD > > Transport protocol:   SAS (SPL-3) >

Re: [ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread John Petrini
Do those SSD's have capacitors (aka power loss protection)? I took a look at the spec sheet on samsung's site and I don't see it mentioned. If that's the case it could certainly explain the performance you're seeing. Not all enterprise SSD's have it and it's a must have for Ceph since it syncs

[ceph-users] low io with enterprise SSDs ceph luminous - can we expect more? [klartext]

2020-01-13 Thread Stefan Bauer
Hi, we're playing around with ceph but are not quite happy with the IOs. 3 node ceph / proxmox cluster with each: LSI HBA 3008 controller 4 x MZILT960HAHQ/007 Samsung SSD Transport protocol:   SAS (SPL-3) 40G fibre Intel 520 Network controller on Unifi Switch Ping roundtrip to partner

[ceph-users] block db sizing and calculation

2020-01-12 Thread Stefan Priebe - Profihost AG
Hello, i'm planning to split the block db to a separate flash device which i also would like to use as an OSD for erasure coding metadata for rbd devices. If i want to use 14x 14TB HDDs per Node https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing recommends a
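For illustration, a hedged sketch of how one such OSD might be created with ceph-volume (device paths are placeholders; the DB partition would live on the shared flash device):
  ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1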

[ceph-users] One Mon out of Quorum

2020-01-12 Thread nokia ceph
Hi, When installing Nautilus on a five node cluster, we tried to install one node first and then the remaining four nodes. After that we saw that the fifth node is out of quorum and we found that the fsid was different in 5th node. When we replaced the ceph.conf file from the four nodes to the

Re: [ceph-users] Hardware selection for ceph backup on ceph

2020-01-12 Thread Martin Verges
Hello Stefan, AMD EPYC > great choice! Has anybody experience with the drives? some of our customers have different toshiba MG06SCA drives and they work great according to them. Can't say for MG07ACA but to be honest, I don't think there should be a huge difference. -- Martin Verges Managing

Re: [ceph-users] OSD Marked down unable to restart continuously failing

2020-01-11 Thread Eugen Block
Hi, you say the daemons are locally up and running but restarting fails? Which one is it? Do you see any messages suggesting flapping OSDs? After 5 retries within 10 minutes the OSDs would be marked out. What is the result of your checks for iostat etc.? Anything pointing to a high load on

Re: [ceph-users] OSD Marked down unable to restart continuously failing

2020-01-10 Thread Radhakrishnan2 S
Can someone please help to respond to the below query ? Regards Radha Krishnan S TCS Enterprise Cloud Practice Tata Consultancy Services Cell:- +1 848 466 4870 Mailto: radhakrishnan...@tcs.com Website: http://www.tcs.com Experience certainty. IT

[ceph-users] where does 100% RBD utilization come from?

2020-01-10 Thread Philip Brown
Surprisingly, a google search didn't seem to find the answer on this, so I guess I should ask here: what determines if an rbd is "100% busy"? I have some backend OSDs, and an iSCSI gateway, serving out some RBDs. iostat on the gateway says the rbd is 100% utilized; iostat on individual OSDs only goes

Re: [ceph-users] Dashboard RBD Image listing takes forever

2020-01-10 Thread Ernesto Puerta
Hi Lenz, That PR will need a lot of rebasing, as there's been later changes to the rbd controller. Nevertheless, while working on that I found a few quick wins that could be easily implemented (I'll try to come back at this in the next weeks): - Caching object instances and using flyweight

[ceph-users] Hardware selection for ceph backup on ceph

2020-01-10 Thread Stefan Priebe - Profihost AG
Hi, we're currently in the process of building a new ceph cluster to back up rbd images from multiple ceph clusters. We would like to start with just a single ceph cluster to back up, which is about 50tb. The compression ratio of the data is around 30% while using zlib. We need to scale the backup

Re: [ceph-users] ceph (jewel) unable to recover after node failure

2020-01-10 Thread Eugen Block
Hi, A. will ceph be able to recover over time? I am afraid that the 14 PGs that are down will not recover. if all OSDs come back (stable) the recovery should eventually finish. B. what caused the OSDs going down and up during recovery after the failed OSD node came back online? (step 2

Re: [ceph-users] HEALTH_WARN, 3 daemons have recently crashed

2020-01-10 Thread Simon Oosthoek
On 10/01/2020 10:41, Ashley Merrick wrote: > Once you have fixed the issue your need to mark / archive the crash > entry's as seen here: https://docs.ceph.com/docs/master/mgr/crash/ Hi Ashley, thanks, I didn't know this before... It turned out there were quite a few old crashes (since I never

Re: [ceph-users] Looking for experience

2020-01-10 Thread Stefan Priebe - Profihost AG
> On 10.01.2020 at 07:10, Mainor Daly wrote: > >  > Hi Stefan, > > before I give some suggestions, can you first describe your usecase for which > you wanna use that setup? Also which aspects are important for you. It’s just the backup target of another ceph cluster to sync snapshots

Re: [ceph-users] HEALTH_WARN, 3 daemons have recently crashed

2020-01-10 Thread Ashley Merrick
Once you have fixed the issue you need to mark / archive the crash entries as seen here: https://docs.ceph.com/docs/master/mgr/crash/ On Fri, 10 Jan 2020 17:37:47 +0800 Simon Oosthoek wrote Hi, last week I upgraded our ceph to 14.2.5 (from 14.2.4) and either during the
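The crash-module commands behind that link, as a quick sketch:
  ceph crash ls            # list new and archived crash reports
  ceph crash info <id>     # details for a single crash
  ceph crash archive <id>  # acknowledge one crash
  ceph crash archive-all   # acknowledge all of them and clear the warning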

[ceph-users] HEALTH_WARN, 3 daemons have recently crashed

2020-01-10 Thread Simon Oosthoek
Hi, last week I upgraded our ceph to 14.2.5 (from 14.2.4) and either during the procedure or shortly after that, some osds crashed. I re-initialised them and that should be enough to fix everything, I thought. I looked a bit further and I do see a lot of lines like this (which are worrying I

[ceph-users] Near Perfect PG distrubtion apart from two OSD

2020-01-09 Thread Ashley Merrick
Hey, I have a cluster of 30 OSD's with near perfect distribution apart from two OSD's. I am running ceph version 14.2.6, however it has been the same for the previous versions. I have the balancer module enabled in upmap mode and it says no improvements; I have also tried crush mode. ceph
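A minimal sketch of the commands typically used to inspect this kind of imbalance (no cluster-specific values assumed):
  ceph osd df tree       # per-OSD PG counts and utilisation
  ceph balancer status   # current mode and whether a plan is active
  ceph balancer eval     # score of the current distribution (lower is better)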

Re: [ceph-users] Looking for experience

2020-01-09 Thread Mainor Daly
Hi Stefan, before I give some suggestions, can you first describe your usecase for which you wanna use that setup? Also which aspects are important for you. Stefan Priebe - Profihost AG < s.pri...@profihost.ag> hat am 9. Januar 2020 um

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread JC Lopez
Hi, you can actually specify the features you want to enable at creation time, so there is no need to remove the feature afterwards. To illustrate Ilya’s message: rbd create rbd/test --size=128M --image-feature=layering,striping --stripe-count=8 --stripe-unit=4K The object size is hereby left to the
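As a follow-up, the resulting layout can be verified on the freshly created image, assuming the same name:
  rbd info rbd/test   # shows the enabled features plus stripe_unit and stripe_count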

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Kyriazis, George
On Jan 9, 2020, at 2:16 PM, Ilya Dryomov <idryo...@gmail.com> wrote: On Thu, Jan 9, 2020 at 2:52 PM Kyriazis, George <george.kyria...@intel.com> wrote: Hello ceph-users! My setup is that I’d like to use RBD images as a replication target of a FreeNAS zfs pool. I have a 2nd

Re: [ceph-users] Looking for experience

2020-01-09 Thread Ed Kalk
It sounds like an I/O bottleneck (either max IOPS or max throughput) in the making. If you are looking for cold storage archival data only, then it may be ok (if it doesn't matter how long it takes to write the data). If this is production data with any sort of IOPS load or data change rate,

Re: [ceph-users] Looking for experience

2020-01-09 Thread Stefan Priebe - Profihost AG
As a starting point the current idea is to use something like: 4-6 nodes with 12x 12tb disks each 128G Memory AMD EPYC 7302P 3GHz, 16C/32T 128GB RAM Something to discuss is - EC or go with 3 replicas. We'll use bluestore with compression. - Do we need something like Intel Optane for WAL / DB or

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Ilya Dryomov
On Thu, Jan 9, 2020 at 2:52 PM Kyriazis, George wrote: > > Hello ceph-users! > > My setup is that I’d like to use RBD images as a replication target of a > FreeNAS zfs pool. I have a 2nd FreeNAS (in a VM) to act as a backup target > in which I mount the RBD image. All this (except the source

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Stefan Kooman
Quoting Kyriazis, George (george.kyria...@intel.com): > > Hmm, I meant you can use large block size for the large files and small > block size for the small files. > > Sure, but how to do that. As far as I know block size is a property of the > pool, not a single file. recordsize:
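A minimal sketch of setting that property per dataset (pool/dataset names are placeholders; recordsize only affects blocks written after the change):
  zfs set recordsize=1M tank/backup/large    # datasets dominated by big files
  zfs set recordsize=8K tank/backup/small    # datasets with many small files
  zfs get recordsize tank/backup/large       # verify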

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Kyriazis, George
On Jan 9, 2020, at 9:27 AM, Stefan Kooman <ste...@bit.nl> wrote: Quoting Kyriazis, George (george.kyria...@intel.com): On Jan 9, 2020, at 8:00 AM, Stefan Kooman <ste...@bit.nl> wrote: Quoting Kyriazis, George

Re: [ceph-users] Looking for experience

2020-01-09 Thread Stefan Priebe - Profihost AG
> Am 09.01.2020 um 16:10 schrieb Wido den Hollander : > >  > >> On 1/9/20 2:27 PM, Stefan Priebe - Profihost AG wrote: >> Hi Wido, >>> Am 09.01.20 um 14:18 schrieb Wido den Hollander: >>> >>> >>> On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote: Am 09.01.20 um 13:39 schrieb

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Stefan Kooman
Quoting Kyriazis, George (george.kyria...@intel.com): > > > > On Jan 9, 2020, at 8:00 AM, Stefan Kooman wrote: > > > > Quoting Kyriazis, George (george.kyria...@intel.com): > > > >> The source pool has mainly big files, but there are quite a few > >> smaller (<4KB) files that I’m afraid will

Re: [ceph-users] Looking for experience

2020-01-09 Thread Joachim Kraftmayer
I would try to scale horizontally with smaller ceph nodes, so you have the advantage of being able to choose an EC profile that does not require too much overhead and you can use failure domain host. Joachim Am 09.01.2020 um 15:31 schrieb Wido den Hollander: On 1/9/20 2:27 PM, Stefan

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Kyriazis, George
> On Jan 9, 2020, at 8:00 AM, Stefan Kooman wrote: > > Quoting Kyriazis, George (george.kyria...@intel.com): > >> The source pool has mainly big files, but there are quite a few >> smaller (<4KB) files that I’m afraid will create waste if I create the >> destination zpool with ashift > 12

Re: [ceph-users] Looking for experience

2020-01-09 Thread Wido den Hollander
On 1/9/20 2:27 PM, Stefan Priebe - Profihost AG wrote: > Hi Wido, > Am 09.01.20 um 14:18 schrieb Wido den Hollander: >> >> >> On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote: >>> >>> Am 09.01.20 um 13:39 schrieb Janne Johansson: I'm currently trying to workout a concept for

Re: [ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Stefan Kooman
Quoting Kyriazis, George (george.kyria...@intel.com): > The source pool has mainly big files, but there are quite a few > smaller (<4KB) files that I’m afraid will create waste if I create the > destination zpool with ashift > 12 (>4K blocks). I am not sure, > though, if ZFS will actually write

[ceph-users] RBD EC images for a ZFS pool

2020-01-09 Thread Kyriazis, George
Hello ceph-users! My setup is that I’d like to use RBD images as a replication target of a FreeNAS zfs pool. I have a 2nd FreeNAS (in a VM) to act as a backup target in which I mount the RBD image. All this (except the source FreeNAS server) is in Proxmox. Since I am using RBD as a backup

Re: [ceph-users] monitor ghosted

2020-01-09 Thread Peter Eisch
As oddly as it drifted away it came back. Next time, should there be a next time, I will snag logs as suggested by Sascha. The window for all this was, local time: 9:02 am - disassociated; 11:20 pm - associated. No changes were made, I did reboot the mon02 host at 1 pm. No other network or

[ceph-users] OSD Marked down unable to restart continuously failing

2020-01-09 Thread Radhakrishnan2 S
Hello Everyone, One OSD node out of 16 has 12 OSD's with bcache on NVMe; locally those osd daemons seem to be up and running, while ceph osd tree shows them as down. Logs show that the OSD's have stuck IO for over 4096 sec. I tried checking iostat, netstat, ceph -w along with

Re: [ceph-users] Looking for experience

2020-01-09 Thread Stefan Priebe - Profihost AG
Hi Wido, Am 09.01.20 um 14:18 schrieb Wido den Hollander: > > > On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote: >> >> Am 09.01.20 um 13:39 schrieb Janne Johansson: >>> >>> I'm currently trying to workout a concept for a ceph cluster which can >>> be used as a target for backups

Re: [ceph-users] Looking for experience

2020-01-09 Thread Wido den Hollander
On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote: > > Am 09.01.20 um 13:39 schrieb Janne Johansson: >> >> I'm currently trying to workout a concept for a ceph cluster which can >> be used as a target for backups which satisfies the following >> requirements: >> >> -

Re: [ceph-users] Looking for experience

2020-01-09 Thread Daniel Aberger - Profihost AG
Am 09.01.20 um 13:39 schrieb Janne Johansson: > > I'm currently trying to workout a concept for a ceph cluster which can > be used as a target for backups which satisfies the following > requirements: > > - approx. write speed of 40.000 IOP/s and 2500 Mbyte/s > > > You might

Re: [ceph-users] Looking for experience

2020-01-09 Thread Janne Johansson
> > > I'm currently trying to workout a concept for a ceph cluster which can > be used as a target for backups which satisfies the following requirements: > > - approx. write speed of 40.000 IOP/s and 2500 Mbyte/s > You might need to have a large (at least non-1) number of writers to get to that

[ceph-users] Looking for experience

2020-01-09 Thread Daniel Aberger - Profihost AG
Hello, I'm currently trying to workout a concept for a ceph cluster which can be used as a target for backups which satisfies the following requirements: - approx. write speed of 40.000 IOP/s and 2500 Mbyte/s - 500 Tbyte total available space Does anyone we have experience with a ceph cluster

Re: [ceph-users] Install specific version using ansible

2020-01-09 Thread Konstantin Shalygin
Hello all! I'm trying to install a specific version of luminous (12.2.4). In group_vars/all.yml I can specify the luminous release, but I didn't find a place where I can be more specific about the version. Ansible installs the latest version (12.2.12 at this time). I'm using

Re: [ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-09 Thread Stefan Kooman
Quoting Sean Matheny (s.math...@auckland.ac.nz): > I tested this out by setting norebalance and norecover, moving the host > buckets under the rack buckets (all of them), and then unsetting. Ceph starts > melting down with escalating slow requests, even with backfill and recovery > parameters

Re: [ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-08 Thread Sean Matheny
I tested this out by setting norebalance and norecover, moving the host buckets under the rack buckets (all of them), and then unsetting. Ceph starts melting down with escalating slow requests, even with backfill and recovery parameters set to throttle. I moved the host buckets back to the

Re: [ceph-users] monitor ghosted

2020-01-08 Thread Brad Hubbard
On Thu, Jan 9, 2020 at 5:48 AM Peter Eisch wrote: > Hi, > > This morning one of my three monitor hosts got booted from the Nautilus > 14.2.4 cluster and it won’t regain. There haven’t been any changes, or > events at this site at all. The conf file is the [unchanged] and the same > as the other

Re: [ceph-users] monitor ghosted

2020-01-08 Thread sascha a.
What does ceph mon dump say? If I run into problems like this I reprovision the monitor and re-add it from scratch; this works, but I don't know if it is best practice. Peter Eisch wrote on Wed, 8 Jan 2020, 20:48: > Hi, > > This morning one of my three monitor hosts got booted from

[ceph-users] monitor ghosted

2020-01-08 Thread Peter Eisch
Hi, This morning one of my three monitor hosts got booted from the Nautilus 14.2.4 cluster and it won’t regain. There haven’t been any changes, or events at this site at all. The conf file is the [unchanged] and the same as the other two monitors. The host is also running the MDS and MGR

Re: [ceph-users] Log format in Ceph

2020-01-08 Thread Sinan Polat
Hi Stefan, I do not want to know the reason. I want to parse Ceph logs (and use them in Elastic). But without knowing the log format I can’t parse them. I know that the first and second ‘words’ are date + timestamp, but what about the 3rd-5th words of a log line? Sinan > On 8 Jan 2020 at 09:48

Re: [ceph-users] Log format in Ceph

2020-01-08 Thread Stefan Kooman
Quoting Sinan Polat (si...@turka.nl): > Hi, > > > I couldn't find any documentation or information regarding the log format in > Ceph. For example, I have 2 log lines (see below). For each 'word' I would > like > to know what it is/means. > > As far as I know, I can break the log lines into: >

[ceph-users] Log format in Ceph

2020-01-08 Thread Sinan Polat
Hi, I couldn't find any documentation or information regarding the log format in Ceph. For example, I have 2 log lines (see below). For each 'word' I would like to know what it is/means. As far as I know, I can break the log lines into: [date] [timestamp] [unknown] [unknown] [unknown] [pthread]

[ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-07 Thread Sean Matheny
We’re adding in a CRUSH hierarchy retrospectively in preparation for a big expansion. Previously we only had host and osd buckets, and now we’ve added in rack buckets. I’ve got sensible settings to limit rebalancing set, at least what has worked in the past: osd_max_backfills = 1
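For comparison, a hedged sketch of the host-by-host variant (rack and host names are placeholders):
  ceph osd set norebalance
  ceph osd crush move ceph-host01 rack=rack1   # move a single host bucket under its rack
  ceph osd unset norebalance
  # wait for the resulting backfill to finish before moving the next host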

Re: [ceph-users] Infiniband backend OSD communication

2020-01-07 Thread Nathan Stratton
Ok, so ipoib is required... ><> nathan stratton On Mon, Jan 6, 2020 at 4:45 AM Wei Zhao wrote: > From my understanding, the basic idea is that ceph exchanges rdma > information (qp, gid and so on) through the ip address on the rdma device, and then > the daemons communicate with each other through rdma. But in my
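For context, a hedged ceph.conf sketch of the async+rdma messenger settings involved; the device name is an assumption (a Mellanox port), and IPoIB addresses are still needed on that interface for the initial exchange described above:
  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx5_0   # assumed RDMA device; adjust to the local HCA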

[ceph-users] ceph (jewel) unable to recover after node failure

2020-01-07 Thread Hanspeter Kunz
here is the output of ceph health detail: HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 134 pgs backfill_wait; 11 pgs backfilling; 69 pgs degraded; 14 pgs down; 2 pgs incomplete; 14 pgs peering; 6 pgs recovery_wait; 69 pgs stuck degraded; 16 pgs stuck inactive; 167 pgs stuck

[ceph-users] ceph (jewel) unable to recover after node failure

2020-01-07 Thread Hanspeter Kunz
Hi, after a node failure ceph is unable to recover, i.e. unable to reintegrate the failed node back into the cluster. what happened? 1. a node with 11 osds crashed, the remaining 4 nodes (also with 11 osds each) re-balanced, although reporting the following error condition: too many PGs per OSD

Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-07 Thread Stefan Kooman
Quoting Paul Emmerich (paul.emmer...@croit.io): > We've also seen some problems with FileStore on newer kernels; 4.9 is the > last kernel that worked reliably with FileStore in my experience. > > But I haven't seen problems with BlueStore related to the kernel version > (well, except for that
