Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-08 Thread Martin Verges
Hello Vlad, Ceph clients connect to the primary OSD of each PG. If you create a crush rule for building1 and one for building2 that takes an OSD from the same building as the first one, your reads to the pool will always stay in the same building (if the cluster is healthy) and only write requests get
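
A minimal sketch of what such a rule could look like, assuming the CRUSH map already contains buckets named building1 and building2 (rule name and id are illustrative, not Martin's actual rules):

    # Rule for a pool serving clients in building1: the first replica (and
    # therefore the primary, where reads go) is chosen from building1, the
    # remaining replicas from building2.
    rule building1_primary {
        id 10
        type replicated
        min_size 1
        max_size 10
        step take building1
        step chooseleaf firstn 1 type host
        step emit
        step take building2
        step chooseleaf firstn -1 type host
        step emit
    }

A mirror-image rule (step take building2 first) would back the pool used by clients in building2.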

Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

2018-11-08 Thread Alexandre DERUMIER
>>If you're using kernel client for cephfs, I strongly advise to have the client on the same subnet as the ceph public one, i.e. all traffic should be on the same subnet/VLAN. Even if your firewall situation is good, if you have to cross subnets or VLANs, you will run into weird problems l

Re: [ceph-users] [Ceph-community] Pool broke after increase pg_num

2018-11-08 Thread Ashley Merrick
Are you sure the down OSD didn't happen to have any data required for the rebalance to complete? How long has the down (now removed) OSD been out? Was it removed before or after you increased the PG count? If you do "ceph health detail" and then pick a stuck PG, what does "ceph pg PG query" output? Has your ceph -s

[ceph-users] How to repair rstats mismatch

2018-11-08 Thread Bryan Henderson
How does one repair an rstats mismatch detected by 'scrub_path' (caused by a previous failure to write the journal)? And how bad is an rstats mismatch? What are rstats used for? One thing the mismatch apparently does is make it impossible to delete the directory, as CephFS says it isn't
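
For reference, scrub_path runs through the MDS admin socket and accepts a repair flag; a sketch of the kind of invocation one might try (mds.a and the path are placeholders, and whether it fixes this particular mismatch is exactly the open question):

    # Re-run the forward scrub on the flagged directory and ask the MDS to
    # repair the recursive stats it finds to be wrong; results land in the MDS log.
    ceph daemon mds.a scrub_path /path/flagged/by/scrub recursive repair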

[ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-08 Thread Vlad Kopylov
I am trying to test replicated ceph with servers in different buildings, and I have a read problem. Reads from one building go to OSDs in the other building and vice versa, making reads slower than writes! This makes reads as slow as the slowest node. Is there a way to - disable parallel read (so it reads onl

Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

2018-11-08 Thread Linh Vu
If you're using kernel client for cephfs, I strongly advise to have the client on the same subnet as the ceph public one, i.e. all traffic should be on the same subnet/VLAN. Even if your firewall situation is good, if you have to cross subnets or VLANs, you will run into weird problems later. Fuse

Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

2018-11-08 Thread Alexandre DERUMIER
Ok, it seems to come from the firewall; I'm seeing dropped sessions exactly 15 min before the log entries. The dropped sessions are the sessions to the OSDs; the sessions to the mon && mds are ok. It seems that keepalive2 is used to monitor the mon session https://patchwork.kernel.org/patch/7105641/ but I'm not sure about the osd sessions

Re: [ceph-users] CephFS kernel client versions - pg-upmap

2018-11-08 Thread Linh Vu
Kernel 4.13+ (I tested up to 4.18) is missing some non-essential feature (explained by a Ceph dev on this ML) that was added in Luminous, so these clients show up as Jewel, but otherwise they're fully compatible with upmap. We have a few hundred nodes on the kernel client with CephFS, and we also run the balancer with
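
For anyone wanting to reproduce that setup, the mgr balancer side looks roughly like this (a sketch; it assumes the cluster has already been forced to require luminous clients as discussed elsewhere in this thread):

    # Enable the balancer module and switch it to upmap mode (Luminous+):
    ceph mgr module enable balancer
    ceph balancer mode upmap
    ceph balancer on
    # Check what it decided to do:
    ceph balancer status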

Re: [ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

2018-11-08 Thread Alexandre DERUMIER
To be more precise, the log entries occur when the hang is finished. I have looked at stats on 10 different hangs, and the duration is always around 15 minutes. Maybe related to: ms tcp read timeout. Description: If a client or daemon makes a request to another Ceph daemon and does not drop an un
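
The option mentioned above defaults to 900 seconds, which would line up with the ~15 minute hangs. A sketch of lowering it on the client in ceph.conf (the value 60 is purely illustrative, not a recommendation from this thread):

    [global]
    # Default is 900 s (15 minutes); lower it to fail faster on half-dead
    # TCP sessions to OSDs. Illustrative value only.
    ms tcp read timeout = 60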

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Graham Allan
Thanks for the extra info, but I find the question more nuanced than that. For example in my case I ended up with 12.2.9 on my last handful of newly-installed servers (and this is simply using yum rather than explicitly ceph-deploy). We are replacing hardware nodes so the cluster is actively

Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon
Hi, On 08/11/2018 22:38, Ken Dreyer wrote: What's the full apt-get command you're running? I wasn't using apt-get, because the ceph repository has the broken 12.2.9 packages in it (and I didn't want to install them, obviously); so I downloaded all the .debs I needed, installed the dependenc

Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Ken Dreyer
Hi Matthew, What's the full apt-get command you're running? On Thu, Nov 8, 2018 at 9:31 AM Matthew Vernon wrote: > > Hi, > > in Jewel, /etc/bash_completion.d/radosgw-admin is in the radosgw package > In Luminous, /etc/bash_completion.d/radosgw-admin is in the ceph-common > package > > ...so if yo

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Ricardo J. Barberis
On Thursday 08/11/2018 at 06:17, Marc Roos wrote: > And that is why I don't like ceph-deploy. Unless you have maybe hundreds > of disks, I don’t see why you cannot install it "manually". We do have another cluster with 600+ disks; this one has 91 so far. We actually started using ceph-deploy

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Neha Ojha
On Thu, Nov 8, 2018 at 12:16 PM, Ricardo J. Barberis wrote: > Hi Neha, thank you for the info. > > I'd like to clarify that we didn't actually upgrade to 12.2.9, we just > installed 4 more OSD servers and those got 12.2.9, so we have a mixture > of 12.2.9 and 12.2.8. > > Should we: > - keep as is

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Ricardo J. Barberis
Hi Neha, thank you for the info. I'd like to clarify that we didn't actually upgrade to 12.2.9, we just installed 4 more OSD servers and those got 12.2.9, so we have a mixture of 12.2.9 and 12.2.8. Should we: - keep as is and wait for 12.2.10+ before proceeding? - downgrade our newest OSDs from 1
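
To see exactly which daemons ended up on which release before deciding, the version reporting added in Luminous is handy (commands are generic, not specific to this cluster):

    # Summary of running release versions per daemon type:
    ceph versions
    # Ask every OSD individually, if you want the per-daemon detail:
    ceph tell osd.* version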

Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon
On 08/11/2018 16:31, Matthew Vernon wrote: Hi, in Jewel, /etc/bash_completion.d/radosgw-admin is in the radosgw package In Luminous, /etc/bash_completion.d/radosgw-admin is in the ceph-common package ...so if you try and upgrade, you get: Unpacking ceph-common (12.2.8-1xenial) over (10.2.9-0

[ceph-users] cephfs kernel, hang with libceph: osdx X.X.X.X socket closed (con state OPEN)

2018-11-08 Thread Alexandre DERUMIER
Hi, we are currently testing cephfs with the kernel module (4.17 and 4.18) instead of fuse (which worked fine), and we have hangs; iowait jumps like crazy for around 20 min. The client is a qemu 2.12 VM with a virtio-net interface. In the client logs, we are seeing this kind of message: [jeu. nov. 8 12:20:18 2018] lib

Re: [ceph-users] CephFS kernel client versions - pg-upmap

2018-11-08 Thread Ilya Dryomov
On Thu, Nov 8, 2018 at 5:10 PM Stefan Kooman wrote: > > Quoting Stefan Kooman (ste...@bit.nl): > > I'm pretty sure it isn't. I'm trying to do the same (force luminous > > clients only) but ran into the same issue. Even when running 4.19 kernel > > it's interpreted as a jewel client. Here is the li

Re: [ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon
On 08/11/2018 16:31, Matthew Vernon wrote: The exact versioning would depend on when the move was made (I presume either Jewel -> Kraken or Kraken -> Luminous). Does anyone know? To answer my own question, this went into 12.0.3 via https://github.com/ceph/ceph/commit/9fd30b93f7281fad70b93512f0

[ceph-users] Packaging bug breaks Jewel -> Luminous upgrade

2018-11-08 Thread Matthew Vernon
Hi, in Jewel, /etc/bash_completion.d/radosgw-admin is in the radosgw package In Luminous, /etc/bash_completion.d/radosgw-admin is in the ceph-common package ...so if you try and upgrade, you get: Unpacking ceph-common (12.2.8-1xenial) over (10.2.9-0ubuntu0.16.04.1) ... dpkg: error processing
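
Until the packages carry the right Breaks/Replaces relationship, the usual dpkg escape hatch for a file that moved between packages is --force-overwrite; a sketch of the workaround (the package filename is illustrative, and forcing should be limited to this known conflict):

    # Let ceph-common overwrite the bash-completion file still owned by radosgw:
    apt-get -o Dpkg::Options::="--force-overwrite" install ceph-common
    # or, when installing downloaded .debs by hand:
    dpkg -i --force-overwrite ceph-common_12.2.8-1xenial_amd64.deb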

Re: [ceph-users] CephFS kernel client versions - pg-upmap

2018-11-08 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl): > I'm pretty sure it isn't. I'm trying to do the same (force luminous > clients only) but ran into the same issue. Even when running 4.19 kernel > it's interpreted as a jewel client. Here is the list I made so far: > > Kernel 4.13 / 4.15: > "featu
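
For context, the commands involved in forcing luminous-only clients look like this; the override flag is what is needed when kernel clients still advertise jewel-era feature bits even though they support upmap (a sketch, not a recommendation to run it blindly):

    # Show which feature bits / release names the connected clients report:
    ceph features
    # Require luminous clients; the override is needed because the kernel
    # clients above are reported as jewel:
    ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it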

[ceph-users] [Ceph-community] Pool broke after increase pg_num

2018-11-08 Thread Gesiel Galvão Bernardes
On Thu, 8 Nov 2018 at 10:00, Joao Eduardo Luis wrote: > Hello Gesiel, > > Welcome to Ceph! > > In the future, you may want to address the ceph-users list > (`ceph-users@lists.ceph.com`) for this sort of issue. > > Thank you, I will do. On 11/08/2018 11:18 AM, Gesiel Galvão Bernardes wr

Re: [ceph-users] CephFS kernel client versions - pg-upmap

2018-11-08 Thread Ilya Dryomov
On Thu, Nov 8, 2018 at 2:15 PM Stefan Kooman wrote: > > Quoting Ilya Dryomov (idryo...@gmail.com): > > On Sat, Nov 3, 2018 at 10:41 AM wrote: > > > > > > Hi. > > > > > > I tried to enable the "new smart balancing" - backend are on RH luminous > > > clients are Ubuntu 4.15 kernel. > [cut] > > > ok

Re: [ceph-users] CephFS kernel client versions - pg-upmap

2018-11-08 Thread Stefan Kooman
Quoting Ilya Dryomov (idryo...@gmail.com): > On Sat, Nov 3, 2018 at 10:41 AM wrote: > > > > Hi. > > > > I tried to enable the "new smart balancing" - backend are on RH luminous > > clients are Ubuntu 4.15 kernel. [cut] > > ok, so 4.15 kernel connects as a "hammer" (<1.0) client? Is there a > > hu

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Alfredo Deza
On Thu, Nov 8, 2018 at 3:02 AM Janne Johansson wrote: > > Den ons 7 nov. 2018 kl 18:43 skrev David Turner : > > > > My big question is that we've had a few of these releases this year that > > are bugged and shouldn't be upgraded to... They don't have any release > > notes or announcement and th

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
On 11/8/18 12:28 PM, Hector Martin wrote: > On 11/8/18 5:52 PM, Wido den Hollander wrote: >> [osd] >> bluestore_cache_size_ssd = 1G >> >> The BlueStore Cache size for SSD has been set to 1GB, so the OSDs >> shouldn't use more than that. >> >> When dumping the mem pools each OSD claims to be usin

Re: [ceph-users] mount rbd read only

2018-11-08 Thread Wido den Hollander
On 11/8/18 1:05 PM, ST Wong (ITSC) wrote: > Hi, > >   > > We created a testing rbd block device image as following: > >   > > - cut here --- > > # rbd create 4copy/foo --size 10G > > # rbd feature disable 4copy/foo object-map fast-diff deep-flatten > > # rbd --image 4copy/foo info

Re: [ceph-users] mount rbd read only

2018-11-08 Thread Ashley Merrick
What command are you using to mount the /dev/rbd0 to start with? You seem to have missed that on your copy and paste. On Thu, Nov 8, 2018 at 8:06 PM ST Wong (ITSC) wrote: > Hi, > > > > We created a testing rbd block device image as following: > > > > - cut here --- > > # rbd create 4copy

[ceph-users] mount rbd read only

2018-11-08 Thread ST Wong (ITSC)
Hi, We created a testing rbd block device image as follows: - cut here --- # rbd create 4copy/foo --size 10G # rbd feature disable 4copy/foo object-map fast-diff deep-flatten # rbd --image 4copy/foo info rbd image 'foo': size 10 GiB in 2560 objects order 22 (4 MiB object
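
Assuming the goal is a read-only mount of that image, one way to do it (mount point is illustrative, not taken from the thread) is to map the image read-only and mount read-only as well:

    # Map the image without write access, then mount the filesystem read-only:
    rbd map 4copy/foo --read-only
    mount -o ro /dev/rbd0 /mnt/foo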

[ceph-users] Effects of restoring a cluster's mon from an older backup

2018-11-08 Thread Hector Martin
I'm experimenting with single-host Ceph use cases, where HA is not important but data durability is. How does a Ceph cluster react to its (sole) mon being rolled back to an earlier state? The idea here is that the mon storage may not be redundant but would be (atomically, e.g. lvm snapshot and dum
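
As an illustration of the kind of atomic mon-store backup being described (volume and path names are hypothetical; whether rolling the mon back to such a snapshot is safe is exactly the open question here):

    # Snapshot the LV holding /var/lib/ceph/mon atomically, then archive it:
    lvcreate --snapshot --size 1G --name monstore-snap /dev/vg0/monstore
    mount -o ro /dev/vg0/monstore-snap /mnt/monstore-snap
    tar -C /mnt/monstore-snap -czf /backup/monstore-$(date +%F).tar.gz .
    umount /mnt/monstore-snap
    lvremove -y /dev/vg0/monstore-snap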

Re: [ceph-users] [Ceph-community] Pool broke after increase pg_num

2018-11-08 Thread Joao Eduardo Luis
Hello Gesiel, Welcome to Ceph! In the future, you may want to address the ceph-users list (`ceph-users@lists.ceph.com`) for this sort of issue. On 11/08/2018 11:18 AM, Gesiel Galvão Bernardes wrote: > Hi everyone, > > I am a beginner in Ceph. I made an increase of pg_num in a pool, and > after 

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Hector Martin
On 11/8/18 5:52 PM, Wido den Hollander wrote: > [osd] > bluestore_cache_size_ssd = 1G > > The BlueStore Cache size for SSD has been set to 1GB, so the OSDs > shouldn't use more than that. > > When dumping the mem pools each OSD claims to be using between 1.8GB and > 2.2GB of memory. > > $ ceph d

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Martin Verges
Hello Marc, > - You can use this separate from the commandline? Yes, we don't take away any feature or possible way, but we don't recommend it. > - And if I modify something from the commandline, these changes are visible > in the webinterface? Yes, we just ask Ceph/Linux for its current state

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
On 11/8/18 11:34 AM, Stefan Kooman wrote: > Quoting Wido den Hollander (w...@42on.com): >> Hi, >> >> Recently I've seen a Ceph cluster experience a few outages due to memory >> issues. >> >> The machines: >> >> - Intel Xeon E3 CPU >> - 32GB Memory >> - 8x 1.92TB SSD >> - Ubuntu 16.04 >> - Ceph 1

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Stefan Kooman
Quoting Wido den Hollander (w...@42on.com): > Hi, > > Recently I've seen a Ceph cluster experience a few outages due to memory > issues. > > The machines: > > - Intel Xeon E3 CPU > - 32GB Memory > - 8x 1.92TB SSD > - Ubuntu 16.04 > - Ceph 12.2.8 What kernel version is running? What network card

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Marc Roos
Hmmm, interesting maybe. - You can use this separate from the commandline? - And if I modify something from the commandline, are these changes visible in the webinterface? - Can I easily remove/add this webinterface? I mean, sometimes you have these tools that just customize the whole enviro

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Martin Verges
Sorry to say this, but that's why there's the croit management interface (free community edition feature). You don't have to worry about problems that are absolutely critical for reliable and stable operation. It doesn't matter if you run a cluster with 10 or 1000 hard disks, it just has to run! O

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Marc Roos
I know ceph is meant to operate at scale, that’s why we are all here. But if you have a 180-disk cluster you have 6-9 nodes, and adding a node is nothing. I would just do the manual install, especially with a production environment, considering all the 'little' bugs surfacing here. I do

[ceph-users] ERR scrub mismatch

2018-11-08 Thread Marco Aroldi
Hello, Since the upgrade from Jewel to Luminous 12.2.8, some errors related to "scrub mismatch" are reported in the logs, every day at the same time. I have 5 mons (from mon.0 to mon.4) and I need help to identify and recover from this problem. This is the log: 2018-11-07 15:13:53.808128 [ERR] mon.4

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Matthew Vernon
On 08/11/2018 09:17, Marc Roos wrote: And that is why I don't like ceph-deploy. Unless you have maybe hundreds of disks, I don’t see why you cannot install it "manually". ...as the recent ceph survey showed, plenty of people have hundreds of disks! Ceph is meant to be operated at scale, whi

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Simon Ironside
On 08/11/2018 09:17, Marc Roos wrote: And that is why I don't like ceph-deploy. Unless you have maybe hundreds of disks, I don’t see why you cannot install it "manually". On 07/11/2018 22:22, Ricardo J. Barberis wrote: Also relevant: if you use ceph-deploy like I do con CentOS 7, it ins

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Marc Roos
And that is why I don't like ceph-deploy. Unless you have maybe hundreds of disks, I don’t see why you cannot install it "manually". -Original Message- From: Ricardo J. Barberis [mailto:rica...@palmtx.com.ar] Sent: woensdag 7 november 2018 23:23 To: ceph-users@lists.ceph.com Subject

[ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
Hi, Recently I've seen a Ceph cluster experience a few outages due to memory issues. The machines: - Intel Xeon E3 CPU - 32GB Memory - 8x 1.92TB SSD - Ubuntu 16.04 - Ceph 12.2.8 Looking at one of the machines: root@ceph22:~# free -h  total  used  free  shared  buff
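
For anyone comparing numbers, the per-OSD figures quoted in this thread come from the OSD admin socket; a sketch of the commands involved (osd.0 is just an example id):

    # What each OSD believes it is using, broken down by mempool:
    ceph daemon osd.0 dump_mempools
    # tcmalloc heap statistics, useful to compare against the RSS seen in free/top:
    ceph tell osd.0 heap stats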

Re: [ceph-users] [bug] mount.ceph man description is wrong

2018-11-08 Thread xiang . dai
Sure. It seems there is a bug in the test itself: https://jenkins.ceph.com/job/ceph-pull-requests-arm64/25498/console Best Wishes - Original Message - From: "Ilya Dryomov" To: "xiang.dai" Cc: "ceph-users" Sent: Wednesday, November 7, 2018 10:40:13 PM Subject: Re: [ceph-users] [bug] mount.c

[ceph-users] Automated Deep Scrub always inconsistent

2018-11-08 Thread Ashley Merrick
In the past few days I have noticed that every single automated deep scrub comes back as inconsistent; once I run a manual deep-scrub it finishes fine and the PG is marked as clean. I am running the latest mimic, but have noticed someone else under luminous is facing the same issue: http://lists.cep
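
For reference, a typical way to inspect one of the affected PGs before kicking off the manual deep scrub (the PG id 2.1f is a placeholder):

    # Find the inconsistent PGs and see what the failed scrub recorded:
    ceph health detail | grep inconsistent
    rados list-inconsistent-obj 2.1f --format=json-pretty
    # Re-run the deep scrub on that PG by hand:
    ceph pg deep-scrub 2.1f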

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Valmar Kuristik
I'll second that. We are in the process of upgrading after just receiving new hardware, and it looks like, right now, digging up information on what to do and exactly how to do it will take literally hundreds of times more time and effort than the upgrade itself, once you know. On 08.11.2018 10:02, Janne

Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Janne Johansson
Den ons 7 nov. 2018 kl 18:43 skrev David Turner : > > My big question is that we've had a few of these releases this year that are > bugged and shouldn't be upgraded to... They don't have any release notes or > announcement and the only time this comes out is when users finally ask about > it we