Re: [ceph-users] Beginner questions

2020-01-16 Thread Bastiaan Visser
There is no difference in allocation between replication and EC. If the failure domain is host, one OSD per host is used for a PG. So if you use a 2+1 EC profile with a host failure domain, you need 3 hosts for a healthy cluster. The pool will go read-only when you have a failure (host or disk), or
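As a rough illustration (not part of the original message), a 2+1 profile with a host failure domain could be created along these lines; the profile and pool names are made up:

  # k=2 data chunks + m=1 coding chunk, one chunk per host
  ceph osd erasure-code-profile set ec-2-1 k=2 m=1 crush-failure-domain=host
  ceph osd pool create ecpool 128 128 erasure ec-2-1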

Re: [ceph-users] Beginner questions

2020-01-16 Thread Dave Hall
Bastiaan, Regarding EC pools: our concern at 3 nodes is that 2-way replication seems risky - if the two copies don't match, which one is corrupted? However, 3-way replication on a 3-node cluster triples the price per TB. Doing EC pools that are the equivalent of RAID-5 2+1 seems like
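For a rough sense of the trade-off on the 288TB of raw capacity mentioned elsewhere in this thread (back-of-the-envelope arithmetic, not the poster's numbers):

  # 3-way replication: 288 TB raw / 3   =  96 TB usable (3.0x overhead)
  # EC 2+1:            288 TB raw * 2/3 = 192 TB usable (1.5x overhead)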

Re: [ceph-users] Weird mount issue (Ubuntu 18.04, Ceph 14.2.5 & 14.2.6)

2020-01-16 Thread Aaron
This debugging started because the ceph-provisioner from k8s was making those users... but what we found was that doing something similar by hand caused the same issue. Just surprised no one else using k8s and Ceph-backed PVC/PVs ran into this issue. Thanks again for all your help! Cheers Aaron On

Re: [ceph-users] Weird mount issue (Ubuntu 18.04, Ceph 14.2.5 & 14.2.6)

2020-01-16 Thread Aaron
No worries, can definitely do that. Cheers Aaron On Thu, Jan 16, 2020 at 8:08 PM Jeff Layton wrote: > On Thu, 2020-01-16 at 18:42 -0500, Jeff Layton wrote: > > On Wed, 2020-01-15 at 08:05 -0500, Aaron wrote: > > > Seeing a weird mount issue. Some info: > > > > > > No LSB modules are

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread DHilsbos
Paul; So is the 3/30/300GB a limit of RocksDB, or of Bluestore? The percentages you list, are they used DB / used data? If so... Where do you get the used DB data from? Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc.
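If it helps, one way to read per-OSD DB usage is via the BlueFS counters on the admin socket (a sketch; osd.0 is just an example ID):

  # db_used_bytes / db_total_bytes for this OSD
  ceph daemon osd.0 perf dump bluefs
  # recent releases also show per-OSD metadata usage in the META column
  ceph osd df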

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread Paul Emmerich
Discussing DB size requirements without knowing the exact cluster requirements doesn't work. Here are some real-world examples: cluster1: CephFS, mostly large files, replicated x3 0.2% used for metadata cluster2: radosgw, mix between replicated and erasure, mixed file sizes (lots of tiny files,

[ceph-users] Snapshots and Backup from Horizon to ceph s3 buckets

2020-01-16 Thread Radhakrishnan2 S
Hello, We are trying to route backups & snapshots of Cinder volumes and Nova instances into the S3 buckets hosted on Ceph. Currently Ceph is the block storage target as well. What we want to achieve: 1. all snapshots of Cinder volumes / Nova instances to be routed to the S3 buckets of that

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread Bastiaan Visser
Dave made a good point: WAL + DB might end up a little over 60G, so I would probably go with ~70GB partitions/LVs per OSD in your case (if the NVMe drive is smart enough to spread the writes over all available capacity; most recent NVMes are). I have not yet seen a WAL larger or even close to
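A minimal sketch of carving ~70G LVs on the shared NVMe and attaching them as block.db (device, VG and LV names are hypothetical):

  # one ~70G logical volume per OSD on the NVMe
  lvcreate -L 70G -n db-0 vg-nvme
  # OSD data on the HDD, DB (and WAL, which follows the DB by default) on the LV
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db vg-nvme/db-0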

Re: [ceph-users] ceph nautilus cluster name

2020-01-16 Thread Ignazio Cassano
Hello Stefan, but if I want to use rbd mirroring I must have site-a.conf and site-b.conf on one of my nodes, probably one of the mon nodes. Is it only a configuration on the Ceph client side? Thanks Ignazio On Thu, 16 Jan 2020, 22:13 Stefan Kooman wrote: > Quoting Ignazio Cassano
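For what it's worth, the client tools pick their config by cluster name, so having both /etc/ceph/site-a.conf and /etc/ceph/site-b.conf on the node running rbd-mirror is enough; a sketch with example pool and peer names:

  # --cluster NAME makes rbd read /etc/ceph/NAME.conf and NAME.client.*.keyring
  rbd --cluster site-a mirror pool enable rbd pool
  rbd --cluster site-b mirror pool enable rbd pool
  rbd --cluster site-a mirror pool peer add rbd client.rbd-mirror-peer@site-b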

Re: [ceph-users] ceph nautilus cluster name

2020-01-16 Thread Stefan Kooman
Quoting Ignazio Cassano (ignaziocass...@gmail.com): > Hello, I just deployed nautilus with ceph-deploy. > I did not find any option to give a cluster name to my ceph so its name is > "ceph". > Please, how can I change my cluster name without reinstalling? > > Please, how can I set the cluster
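On the client side a "cluster name" is effectively just a config/keyring naming convention, so something along these lines works without renaming anything on the cluster itself (paths are examples):

  # point the client at an alternative config explicitly
  ceph --conf /etc/ceph/site-b.conf --keyring /etc/ceph/site-b.client.admin.keyring -s
  # or via the cluster shorthand, which expands to the same file names
  ceph --cluster site-b -s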

[ceph-users] ceph nautilus cluster name

2020-01-16 Thread Ignazio Cassano
Hello, I just deployed Nautilus with ceph-deploy. I did not find any option to give a cluster name to my ceph, so its name is "ceph". Please, how can I change my cluster name without reinstalling? Please, how can I set the cluster name in the installation phase? Many thanks for the help Ignazio

Re: [ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Stefan Priebe - Profihost AG
Hi Igor, answers inline. On 16.01.20 at 21:34, Igor Fedotov wrote: > you may want to run fsck against failing OSDs. Hopefully it will shed > some light. fsck just says everything is fine: # ceph-bluestore-tool --command fsck --path /var/lib/ceph/osd/ceph-27/ fsck success > Also wondering if
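If the metadata-only fsck is clean, a deep fsck (which also reads object data) may be more telling; a sketch, assuming the tool's --deep option:

  # reads and checksums object data as well, so it takes much longer
  ceph-bluestore-tool --command fsck --deep 1 --path /var/lib/ceph/osd/ceph-27/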

Re: [ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Igor Fedotov
Stefan, you may want to run fsck against the failing OSDs. Hopefully it will shed some light. Also wondering if the OSD is able to recover (start up and proceed working) after facing the issue? If so, do you have any that failed multiple times? Do you have logs for these occurrences? Also

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread DHilsbos
Dave; I don't like reading inline responses, so... I have zero experience with EC pools, so I won't pretend to give advice in that area. I would think that a small NVMe for the DB would be better than nothing, but I don't know. Once I got the hang of building clusters, it was relatively easy to

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread Dave Hall
Dominic, We ended up with a 1.6TB PCIe NVMe in each node.  For 8 drives this worked out to a DB size of something like 163GB per OSD. Allowing for expansion to 12 drives brings it down to 124GB. So maybe just put the WALs on NVMe and leave the DBs on the platters? Understood that we will
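If only the WALs go to the NVMe, the ceph-volume call would look roughly like this (device and LV names are made up; WAL logical volumes can stay small, a few GB each):

  lvcreate -L 4G -n wal-0 vg-nvme
  ceph-volume lvm create --bluestore --data /dev/sdb --block.wal vg-nvme/wal-0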

Re: [ceph-users] Beginner questions

2020-01-16 Thread DHilsbos
Dave; I'd like to expand on this answer, briefly... The information in the docs is wrong. There have been many discussions about changing it, but no good alternative has been suggested, thus it hasn't been changed. The 3rd party project that Ceph's BlueStore uses for its database (RocksDB),
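The often-quoted 3/30/300GB figures come from RocksDB's level sizing under BlueStore's default options; roughly (a summary of the usual reasoning, not a quote from this message):

  # max_bytes_for_level_base = 256MB, max_bytes_for_level_multiplier = 10
  # L1 ~ 0.25GB, L2 ~ 2.5GB, L3 ~ 25GB, L4 ~ 250GB
  # a DB device only helps for levels it can hold completely, hence the
  # ~3GB / ~30GB / ~300GB sweet spots (sum of levels so far plus headroom)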

Re: [ceph-users] [External Email] Re: Beginner questions

2020-01-16 Thread Dave Hall
Paul, Bastiaan, Thank you for your responses and for alleviating my concerns about Nautilus.  The good news is that I can still easily move up to Debian 10.  BTW, I assume that this is still with the 4.19 kernel? Also, I'd like to inject additional customizations into my Debian configs via

Re: [ceph-users] Beginner questions

2020-01-16 Thread Paul Emmerich
Don't use Mimic; support for it is far worse than for Nautilus or Luminous. I think we were the only company that built a product around Mimic; both Red Hat and SUSE enterprise storage were on Luminous and then Nautilus, skipping Mimic entirely. We only offered Mimic as a default for a limited time and

Re: [ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Stefan Priebe - Profihost AG
Hi Igor, ouch sorry. Here we go: -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction: Put( Prefix = M key =

Re: [ceph-users] Beginner questions

2020-01-16 Thread Bastiaan Visser
I would definitely go for Nautilus. There are quite a few optimizations that went in after Mimic. BlueStore DB size usually ends up at either 30 or 60 GB; 30 GB is one of the sweet spots during normal operation. But during compaction, Ceph writes the new data before removing the old, hence the

[ceph-users] Beginner questions

2020-01-16 Thread Dave Hall
Hello all. Sorry for the beginner questions... I am in the process of setting up a small (3 nodes, 288TB) Ceph cluster to store some research data.  It is expected that this cluster will grow significantly in the next year, possibly to multiple petabytes and 10s of nodes.  At this time I'm

Re: [ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Igor Fedotov
Hi Stefan, would you please share a log snippet prior to the assertions? Looks like RocksDB is failing during transaction submission... Thanks, Igor On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote: Hello, does anybody know a fix for this ASSERT / crash? 2020-01-16 02:02:31.316394

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
We upgraded to 14.2.4 back in October and this week to v14.2.6. But I don't think the cluster had a network outage until yesterday, so I wouldn't have thought this is a .6 regression. If it happens again I'll look for the "waiting for map" message. -- dan On Thu, Jan 16, 2020 at 12:08 PM Nick
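For anyone hitting the same thing, a rough way to spot and clear a stuck OSD (the ID is a placeholder):

  # list OSDs the cluster currently marks down
  ceph osd tree | grep -w down
  # check whether the daemon's in-flight ops are stuck waiting for a new map
  ceph daemon osd.12 dump_ops_in_flight
  # restarting the process is what recovered the OSDs in this thread
  systemctl restart ceph-osd@12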

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Nick Fisk
On Thursday, January 16, 2020 09:15 GMT, Dan van der Ster wrote: > Hi Nick, > > We saw the exact same problem yesterday after a network outage -- a few of > our down OSDs were stuck down until we restarted their processes. > > -- Dan > > > On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote:

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
Hi Nick, We saw the exact same problem yesterday after a network outage -- a few of our down OSDs were stuck down until we restarted their processes. -- Dan On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote: > Hi All, > > Running 14.2.5, currently experiencing some network blips isolated to a

[ceph-users] Luminous Bluestore OSDs crashing with ASSERT

2020-01-16 Thread Stefan Priebe - Profihost AG
Hello, does anybody know a fix for this ASSERT / crash? 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1 /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time 2020-01-16 02:02:31.304993 /build/ceph/src/os/bluestore/BlueStore.cc: 8808:

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
And I confirm that a repair is not useful. As far as I can see, it simply "cleans" the error (without modifying the big object), but the error of course reappears when the deep scrub runs again on that PG. Cheers, Massimo On Thu, Jan 16, 2020 at 9:35 AM Massimo Sgaravatto <

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
In my cluster I saw that the problematic objects have been uploaded by a specific application (onedata), which I think used to upload the files by doing something like: rados --pool put Now (since Luminous?) the default maximum object size is 128MB, but if I am not wrong it was 100GB before. This would
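To check which objects trip the check and what limit the OSDs enforce, something like the following should work (pool, object and PG names are placeholders):

  # size of the object as stored
  rados -p mypool stat myobject
  # the limit the scrub compares against (default 128MB since Luminous)
  ceph daemon osd.0 config get osd_max_object_size
  # list the inconsistent objects reported for a given PG
  rados list-inconsistent-obj 2.1a --format=json-pretty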