[ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Uwe Sauter
Hi folks, I'm currently chewing on an issue regarding "slow requests are blocked". I'd like to identify the OSD that is causing those events once the cluster is back to HEALTH_OK (as I have no monitoring yet that would get this info in realtime). Collecting this information could help identify
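One way to dig this out after the fact is to grep the cluster log on a monitor host. A minimal sketch, assuming the default log location and Luminous-style health messages (the exact wording differs between releases):

    # list which OSDs were implicated in REQUEST_SLOW health updates
    grep -i "slow request" /var/log/ceph/ceph.log \
        | grep -io "implicated osds [0-9,]*" | sort | uniq -c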

Re: [ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-16 Thread Oliver Freyermuth
Hi David, did you already manage to check your librados2 version and pin down the issue? Cheers, Oliver On 11.05.2018 at 17:15, Oliver Freyermuth wrote: > Hi David, > > On 11.05.2018 at 16:55, David C wrote: >> Hi Oliver >> >> Thanks for the detailed response! I've

Re: [ceph-users] multi site with cephfs

2018-05-16 Thread David Turner
Object storage multi-site is very specific to using object storage. It uses the RGW APIs to sync S3 uploads between each site. For CephFS you might be able to do a sync of the RADOS pools, but I don't think that's actually a thing yet. RBD mirror is also a layer on top of things to sync

Re: [ceph-users] a big cluster or several small

2018-05-16 Thread Alexandre DERUMIER
Hi, >>Our main reason for using multiple clusters is that Ceph has a bad >>reliability history when scaling up and even now there are many issues >>unresolved (https://tracker.ceph.com/issues/21761 for example) so by >>dividing single, large cluster into few smaller ones, we reduce the impact

Re: [ceph-users] Single ceph cluster for the object storage service of 2 OpenStack clouds

2018-05-16 Thread Massimo Sgaravatto
Thanks a lot ! On Tue, May 15, 2018 at 7:44 PM, David Turner wrote: > Yeah, that's how we do multiple zones. I find following the documentation > for multi-site (but not actually setting up a second site) to work well for > setting up multiple realms in a single cluster.
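For reference, a minimal sketch of carving out an extra realm/zonegroup/zone in an existing cluster, following the multi-site docs without adding a second site; all names below are placeholders, not taken from the thread:

    radosgw-admin realm create --rgw-realm=cloud2
    radosgw-admin zonegroup create --rgw-zonegroup=cloud2-zg --rgw-realm=cloud2 --master
    radosgw-admin zone create --rgw-zonegroup=cloud2-zg --rgw-zone=cloud2-zone --master
    radosgw-admin period update --rgw-realm=cloud2 --commit
    # then point a dedicated RGW instance at it via rgw_realm / rgw_zonegroup / rgw_zone in ceph.conf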

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread Alexandre DERUMIER
Hi, I'm able to get a fixed frequency with intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1 Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz # cat /proc/cpuinfo | grep MHz cpu MHz : 3400.002 cpu MHz : 3399.994 cpu MHz : 3399.995 cpu MHz : 3399.994 cpu
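To make those settings survive a reboot, a minimal sketch for a GRUB2-based distribution (paths differ between EL and Debian/Ubuntu):

    # append to the existing GRUB_CMDLINE_LINUX in /etc/default/grub:
    #   intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1
    grub2-mkconfig -o /boot/grub2/grub.cfg   # EL7
    update-grub                              # Debian/Ubuntu
    # reboot, then verify with: cat /proc/cpuinfo | grep MHz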

Re: [ceph-users] slow requests are blocked

2018-05-16 Thread Paul Emmerich
By looking at the operations that are slow in your dump_*_ops output. We've found that it's best to move all the metadata stuff for RGW onto SSDs, i.e., all pools except the actual data pool. But that depends on your use case and whether the slow requests you are seeing are actually a problem for
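A minimal sketch of what moving the RGW metadata pools onto SSD-backed OSDs can look like, assuming Luminous device classes and the default zone's pool names (adjust to your own pool list):

    ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
    for p in default.rgw.buckets.index default.rgw.log default.rgw.meta; do
        ceph osd pool set $p crush_rule rgw-meta-ssd
    done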

Re: [ceph-users] multi site with cephfs

2018-05-16 Thread John Hearns
The answer given at the seminar yesterday was that a practical limit was around 60km. I don't think 100km is that much longer. I defer to the experts here. On 16 May 2018 at 15:24, Up Safe wrote: > Hi, > > About a 100 km. > I have a 2-4ms latency between them. > >

Re: [ceph-users] multi site with cephfs

2018-05-16 Thread Up Safe
But this is not the question here. The question is whether I can configure multi site for CephFS. Will I be able to do so by following the guide to set up the multi site for object storage? Thanks On Wed, May 16, 2018, 16:45 John Hearns wrote: > The answer given at the

Re: [ceph-users] Public network faster than cluster network

2018-05-16 Thread Gandalf Corvotempesta
No more advice for a new cluster? Sorry for these multiple posts but I had some trouble with the ML. I'm getting "Access Denied". On Fri, 11 May 2018 at 10:21, Gandalf Corvotempesta < gandalf.corvotempe...@gmail.com> wrote: > no more advice for a new cluster? > On Thu, 10

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread Wido den Hollander
On 05/16/2018 01:22 PM, Blair Bethwaite wrote: > On 15 May 2018 at 08:45, Wido den Hollander > wrote: > > > We've got some Skylake Ubuntu based hypervisors that we can look at to > > compare tomorrow... > > > > Awesome! > > > Ok, so

Re: [ceph-users] multi site with cephfs

2018-05-16 Thread Up Safe
Hi, About a 100 km. I have a 2-4ms latency between them. Leon On Wed, May 16, 2018, 16:13 John Hearns wrote: > Leon, > I was at a Lenovo/SuSE seminar yesterday and asked a similar question > regarding separated sites. > How far apart are these two geographical

Re: [ceph-users] slow requests are blocked

2018-05-16 Thread Grigory Murashov
Hello Paul! Thanks for your answer. How did you figure out that it's RGW metadata stuff? No, I don't use any SSDs. Where can I find out more about metadata pools, using SSDs, etc.? Thanks. Grigory Murashov Voximplant On 15.05.2018 at 23:42, Paul Emmerich wrote: Looks like it's mostly RGW metadata

Re: [ceph-users] multi site with cephfs

2018-05-16 Thread John Hearns
Leon, I was at a Lenovo/SuSE seminar yesterday and asked a similar question regarding separated sites. How far apart are these two geographical locations? It does matter. On 16 May 2018 at 15:07, Up Safe wrote: > Hi, > > I'm trying to build a multi site setup. > But the

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread John Hearns
Blair, methinks someone is doing bitcoin mining on your systems when they are idle :-) I WAS going to say that maybe the cpupower utility needs an update to cope with that generation of CPUs. But /proc/cpuinfo never lies (does it?) On 16 May 2018 at 13:22, Blair Bethwaite

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Uwe Sauter
Hi Mohamad, >> I'm currently chewing on an issue regarding "slow requests are blocked". I'd >> like to identify the OSD that is causing those events >> once the cluster is back to HEALTH_OK (as I have no monitoring yet that >> would get this info in realtime). >> >> Collecting this information

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Mohamad Gebai
On 05/16/2018 07:18 AM, Uwe Sauter wrote: > Hi Mohamad, > >> >> I think this is what you're looking for: >> >> $> ceph daemon osd.X dump_historic_slow_ops >> >> which gives you recent slow operations, as opposed to >> >> $> ceph daemon osd.X dump_blocked_ops >> >> which returns current blocked
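Both commands talk to the OSD's admin socket, so they have to be run on the host carrying that OSD; a small sketch to sweep every local OSD:

    # iterate over all admin sockets on this OSD host
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo "== $sock =="
        ceph daemon $sock dump_historic_slow_ops
    done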

[ceph-users] multi site with cephfs

2018-05-16 Thread Up Safe
Hi, I'm trying to build a multi site setup. But the only guides I've found on the net were about building it with object storage or rbd. What I need is cephfs. I.e. I need to have 2 synced file storages at 2 geographical locations. Is this possible? Also, if I understand correctly - cephfs is

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Mohamad Gebai
Hi, On 05/16/2018 04:16 AM, Uwe Sauter wrote: > Hi folks, > > I'm currently chewing on an issue regarding "slow requests are blocked". I'd > like to identify the OSD that is causing those events > once the cluster is back to HEALTH_OK (as I have no monitoring yet that would > get this info in

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread Blair Bethwaite
Possibly, but I think they'd be using the V100s rather than the CPUs. For reference: rr42-03:~$ sudo cpupower monitor -l Monitor "Nehalem" (4 states) - Might overflow after 92200 s C3 [C] -> Processor Core C3 C6 [C] -> Processor Core C6 PC3 [P] -> Processor Package C3 PC6

[ceph-users] ceph as storage for docker registry

2018-05-16 Thread Tomasz Płaza
Hi All, We are running ceph 12.2.3 as storage for a docker registry with the Swift API. This is the only workload with the Swift API on our ceph cluster. We need to run radosgw-admin bucket check --fix --check-objects --bucket docker-registry from time to time to fix the issue described

[ceph-users] RBD features and feature journaling performance

2018-05-16 Thread Jorge Pinilla López
I'm trying to better understand RBD features but I have only found the information on the RBD page. Is there any further RBD feature information and implementation detail? Also, I would like to know about the journaling feature; it seems to destroy RBD performance. Without the journaling feature: rbd bench
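For comparing the two cases, a minimal sketch using a throwaway image (pool/image names are placeholders; exclusive-lock, which journaling depends on, is part of the default feature set on recent releases):

    rbd create rbd/bench-img --size 10240                                  # size in MB
    rbd bench --io-type write --io-size 4096 --io-total 1G rbd/bench-img   # baseline

    rbd feature enable rbd/bench-img journaling
    rbd bench --io-type write --io-size 4096 --io-total 1G rbd/bench-img   # with journaling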

Re: [ceph-users] Poor CentOS 7.5 client performance

2018-05-16 Thread Donald "Mac" McCarthy
CephFS. 8 core atom C2758, 16 GB ram, 256GB ssd, 2.5 GB NIC (supermicro microblade node). Read test: dd if=/ceph/1GB.test of=/dev/null bs=1M Write dd if=/dev/zero of=/ceph/out.test bs=1M count=1024 The tests are identical on both kernels - the results... well that is a different story.
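One thing worth double-checking with dd-based read tests: unless the page cache is dropped (or the file is larger than RAM), repeated runs can be served from the client's cache rather than from CephFS. A small sketch, run as root:

    sync; echo 3 > /proc/sys/vm/drop_caches
    dd if=/ceph/1GB.test of=/dev/null bs=1M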

[ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
I'm sending this message to both the dovecot and ceph-users MLs, so please don't mind if something seems too obvious to you. Hi, I have a question for both the dovecot and ceph lists and below I'll explain what's going on. Regarding the dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when using

Re: [ceph-users] a big cluster or several small

2018-05-16 Thread Matthew Vernon
Hi, On 14/05/18 17:49, Marc Boisis wrote: > Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients > only, 1 single pool (size=3). That's not a large cluster. > We want to divide this cluster into several to minimize the risk in case > of failure/crash. > For example, a

Re: [ceph-users] a big cluster or several small

2018-05-16 Thread Jack
For what it worth, yahoo published their setup some years ago: https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at 54 nodes per cluster for 3.2PB of raw storage, I guess this leads to 16 * 4TB hdd per node, thus 896 per cluster (they may have used ssd as

Re: [ceph-users] Poor CentOS 7.5 client performance

2018-05-16 Thread Jason Dillaman
What is your client (librbd, krbd, CephFS, ceph-client, ...) and how are you testing performance? On Wed, May 16, 2018 at 11:14 AM, Donald "Mac" McCarthy wrote: > Recently upgraded a CEPH client to CentOS 7.5. Upon doing so read and write > performance became intolerably

[ceph-users] Increasing number of PGs by not a factor of two?

2018-05-16 Thread Oliver Schulz
Dear all, we have a Ceph cluster that has slowly evolved over several years and Ceph versions (started with 18 OSDs and 54 TB in 2013, now about 200 OSDs and 1.5 PB, still the same cluster, with data continuity). So there are some "early sins" in the cluster configuration, left over from the
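Mechanically the increase is the same whether or not the target is a power of two; a minimal sketch (pool name and counts are placeholders):

    ceph osd pool get rbd pg_num          # current value
    ceph osd pool set rbd pg_num 1024     # split PGs
    ceph osd pool set rbd pgp_num 1024    # allow the new PGs to be placed/rebalanced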

[ceph-users] Poor CentOS 7.5 client performance

2018-05-16 Thread Donald "Mac" McCarthy
Recently upgraded a CEPH client to CentOS 7.5. Upon doing so read and write performance became intolerably slow. ~2.5 MB/s. When booted back to a CentOS 7.4 kernel, performance went back to a normal 200 MB/s read and write. I have not seen any mention of this issue in all of the normal

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Brad Hubbard
On Wed, May 16, 2018 at 6:16 PM, Uwe Sauter wrote: > Hi folks, > > I'm currently chewing on an issue regarding "slow requests are blocked". I'd > like to identify the OSD that is causing those events > once the cluster is back to HEALTH_OK (as I have no monitoring yet

Re: [ceph-users] RBD features and feature journaling performance

2018-05-16 Thread Konstantin Shalygin
I'm trying to better understand rbd features but I have only found the information on the RBD page, is there any further RBD feature information and implementation? http://tracker.ceph.com/issues/15000 k ___ ceph-users mailing list

Re: [ceph-users] Intepreting reason for blocked request

2018-05-16 Thread Gregory Farnum
On Sat, May 12, 2018 at 3:22 PM Bryan Henderson wrote: > I recently had some requests blocked indefinitely; I eventually cleared it > up by recycling the OSDs, but I'd like some help interpreting the log > messages > that supposedly give clue as to what caused the

Re: [ceph-users] ceph-volume and systemd troubles

2018-05-16 Thread Andras Pataki
Done: tracker #24152 Thanks, Andras On 05/16/2018 04:58 PM, Alfredo Deza wrote: On Wed, May 16, 2018 at 4:50 PM, Andras Pataki wrote: Dear ceph users, I've been experimenting with setting up a new node with ceph-volume and bluestore. Most of the setup works

Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-16 Thread Gregory Farnum
On Wed, May 16, 2018 at 6:49 AM Siegfried Höllrigl < siegfried.hoellr...@xidras.com> wrote: > Hi Greg ! > > Thank you for your fast reply. > > We have now deleted the PG on OSD.130 like you suggested and started it : > > ceph-s-06 # ceph-objectstore-tool --data-path > /var/lib/ceph/osd/ceph-130/
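For context, a minimal sketch of the export-then-remove workflow being discussed, with a placeholder PG id and the OSD stopped first (newer releases may additionally require --force on the remove step):

    systemctl stop ceph-osd@130
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 1.2f3 --op export --file /root/pg-1.2f3.export   # keep a copy first
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 1.2f3 --op remove
    systemctl start ceph-osd@130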

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Jack
Hi, Many (most?) filesystems do not store multiple files in the same block. Thus, with sdbox, every single mail (you know, that kind of mail with 10 lines in it) will eat an inode and a block (4k here); mdbox is more compact in this way. Another difference: sdbox removes the message, mdbox

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Hello Jack, yes, I imagine I'll have to do some work on tuning the block size on cephfs. Thanks for the advice. I knew that using mdbox, messages are not removed, but I thought that was true in sdbox too. Thanks again. We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore backend.
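For the CephFS side, file layouts (object size, stripe unit, data pool) are set per directory via virtual xattrs and only affect files created afterwards; a minimal sketch with placeholder paths and sizes:

    getfattr -n ceph.dir.layout /mnt/cephfs/mail    # only present if a layout was set explicitly
    setfattr -n ceph.dir.layout.object_size -v 1048576 /mnt/cephfs/mail
    setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /mnt/cephfs/mail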

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Danny Al-Gaaf
Hi, some time back we had similar discussions when we, as an email provider, discussed to move away from traditional NAS/NFS storage to Ceph. The problem with POSIX file systems and dovecot is that e.g. with mdbox only around ~20% of the IO operations are READ/WRITE, the rest are metadata IOs.

Re: [ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-16 Thread Oliver Freyermuth
Hi David, thanks for the reply! Interesting that the package was not installed - it was for us, but the machines we run the nfs-ganesha servers on are also OSDs, so it might have been pulled in via the ceph packages for us. In any case, I'd say this means librados2 as a dependency is missing

Re: [ceph-users] ceph-volume and systemd troubles

2018-05-16 Thread Alfredo Deza
On Wed, May 16, 2018 at 4:50 PM, Andras Pataki wrote: > Dear ceph users, > > I've been experimenting with setting up a new node with ceph-volume and > bluestore. Most of the setup works right, but I'm running into a strange > interaction between ceph-volume and systemd

Re: [ceph-users] Nfs-ganesha 2.6 packages in ceph repo

2018-05-16 Thread David C
Hi Oliver Thanks for following up. I just picked this up again today and it was indeed librados2...the package wasn't installed! It's working now, haven't tested much but I haven't noticed any problems yet. This is with nfs-ganesha-2.6.1-0.1.el7.x86_64, libcephfs2-12.2.5-0.el7.x86_64 and
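For anyone hitting the same thing, a quick sanity check on an EL7 ganesha node (package names as cited in this thread):

    rpm -q nfs-ganesha libcephfs2 librados2
    yum install librados2    # if it turns out to be missing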

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Hello Danny, I actually saw that thread and I was very excited about it. I thank you all for that idea and all the effort being put in it. I haven't yet tried to play around with your plugin but I intend to, and to contribute back. I think when it's ready for production it will be unbeatable. I

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Jack
On 05/16/2018 09:35 PM, Webert de Souza Lima wrote: > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore > backend. > We'll have to do some work on how to simulate user traffic, for writes > and reads. That seems troublesome. I would appreciate seeing these results! >

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Thanks Jack. That's good to know. It is definitely something to consider. In a distributed storage scenario we might build a dedicated pool for that and tune the pool as more capacity or performance is needed. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC

[ceph-users] ceph-volume and systemd troubles

2018-05-16 Thread Andras Pataki
Dear ceph users, I've been experimenting with setting up a new node with ceph-volume and bluestore. Most of the setup works right, but I'm running into a strange interaction between ceph-volume and systemd when starting OSDs. After preparing/activating the OSD, a systemd unit instance is created
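A minimal sketch of the flow being described, with placeholder devices and IDs; the unit instance name encodes the OSD id and fsid:

    ceph-volume lvm prepare --bluestore --data /dev/sdb
    ceph-volume lvm list                       # shows the assigned osd id and osd fsid
    ceph-volume lvm activate <osd-id> <osd-fsid>
    systemctl status ceph-volume@lvm-<osd-id>-<osd-fsid>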

[ceph-users] OpenStack Summit Vancouver 2018

2018-05-16 Thread Leonardo Vaz
Hey Cephers, As many of you know the OpenStack Summit Vancouver starts on next Monday, May 21st and the vibrant Ceph Community will be present! We created the following pad to organize the Ceph activities during the conference: http://pad.ceph.com/p/openstack-summit-vancouver-2018 If you're