Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Анатолий Фуников
It's strange, but the parted output for this disk (/dev/sdf) shows me that it's GPT:

    (parted) print
    Model: ATA HGST HUS726020AL (scsi)
    Disk /dev/sdf: 2000GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: gpt
    Number  Start   End     Size    Type    File system  Flags
     2      1049kB
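
For readers hitting the same situation, a minimal sketch of how one might cross-check the partition label and its PARTUUID (assuming /dev/sdf and /dev/sdf1 as in this thread; sgdisk is an extra tool not mentioned above):

    parted /dev/sdf print                   # "Partition Table: gpt" confirms a GPT label
    blkid -s PARTUUID -o value /dev/sdf1    # a GPT partition should print a PARTUUID here
    sgdisk --print /dev/sdf                 # sgdisk only reads valid GPT, so it is a useful cross-check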

Re: [ceph-users] Prioritize recovery over backfilling

2019-02-20 Thread Frédéric Nass
Hi Sage, Would be nice to have this one backported to Luminous if easy. Cheers, Frédéric. > On 7 June 2018 at 13:33, Sage Weil wrote: > > On Wed, 6 Jun 2018, Caspar Smit wrote: >> Hi all, >> >> We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a node >> to it. >> >>

Re: [ceph-users] Prioritize recovery over backfilling

2019-02-20 Thread Frédéric Nass
Hi, Please keep in mind that setting the ‘nodown' flag will prevent PGs from becoming degraded, but it will also prevent clients' requests from being served by the other OSDs that would, without the ‘nodown’ flag, have taken over from the non-responsive one in a healthy manner. And this for the whole time the OSD
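
For reference, the flag being discussed is set and cleared with the standard CLI; a minimal sketch (use only during a short, supervised maintenance window):

    ceph osd set nodown         # OSDs will not be marked down while this flag is set
    ceph osd dump | grep flags  # verify which cluster flags are currently active
    ceph osd unset nodown       # clear it as soon as the work is done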

Re: [ceph-users] Urgent: Reduced data availability / All pgs inactive

2019-02-20 Thread Irek Fasikhov
Hi, You have problems with MGR. http://docs.ceph.com/docs/master/rados/operations/pg-states/ *The ceph-mgr hasn’t yet received any information about the PG’s state from an OSD since mgr started up.* Thu, 21 Feb 2019 at 09:04, Irek Fasikhov: > Hi, > > You have problems with MGR. >
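
A minimal sketch of how one might confirm and recover from this state, assuming a systemd deployment where the mgr unit is named after the host (adjust the unit name to your mgr id):

    ceph -s                                    # check the "mgr:" line under services
    systemctl restart ceph-mgr@$(hostname -s)  # assumed unit name
    ceph pg stat                               # PG states should populate once the mgr reports again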

[ceph-users] ccache not supported in ceph?

2019-02-20 Thread ddu
Hi, When enabling ccache for Ceph, an error occurs:

    ccache: invalid option -- 'E'
    ...
    Unable to determine C++ standard library, got .

This is because the variable "CXX_STDLIB" was
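
The error suggests the compiler variable was overridden with something like "ccache g++", which the standard-library detection cannot parse. A sketch of an alternative way to wire in ccache, assuming a CMake-based build (the do_cmake.sh / WITH_CCACHE switch may differ between Ceph versions):

    # let CMake launch ccache instead of overriding CXX
    cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache ..
    # or, if your tree exposes it:
    ./do_cmake.sh -DWITH_CCACHE=ON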

Re: [ceph-users] faster switch to another mds

2019-02-20 Thread David Turner
If I'm not mistaken, if you stop them at the same time during a reboot on a node with both mds and mon, the mons might receive it, but wait to finish their own election vote before doing anything about it. If you're trying to keep optimal uptime for your mds, then stopping it first and on its own
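
A minimal sketch of the order being suggested, assuming systemd units named after the host (adjust the MDS id to your setup):

    systemctl stop ceph-mds@$(hostname -s)   # fail the MDS over first, on its own
    ceph mds stat                            # wait until a standby has taken over
    systemctl reboot                         # then reboot, taking the mon down with the node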

Re: [ceph-users] faster switch to another mds

2019-02-20 Thread Patrick Donnelly
On Tue, Feb 19, 2019 at 11:39 AM Fyodor Ustinov wrote: > > Hi! > > From documentation: > > mds beacon grace > Description: The interval without beacons before Ceph declares an MDS > laggy (and possibly replace it). > Type: Float > Default: 15 > > I do not understand, 15 - are is
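
For reference, a sketch of how these values are typically tuned in ceph.conf; the numbers below are examples only, and which daemons need the setting (mon and/or mds) is worth double-checking for your release:

    [global]
        mds beacon interval = 4    # how often the MDS sends a beacon (default 4 s)
        mds beacon grace    = 10   # seconds without beacons before it is declared laggy (default 15)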

[ceph-users] Urgent: Reduced data availability / All pgs inactive

2019-02-20 Thread Ranjan Ghosh
Hi all, hope someone can help me. After restarting a node of my 2-node cluster, suddenly I get this:

    root@yak2 /var/www/projects # ceph -s
      cluster:
        id:     749b2473-9300-4535-97a6-ee6d55008a1b
        health: HEALTH_WARN
                Reduced data availability: 200 pgs inactive
      services:
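
A minimal sketch of the usual first diagnostic steps for inactive PGs (standard CLI, nothing specific to this cluster):

    ceph health detail           # lists the inactive PGs and the reason
    ceph pg dump_stuck inactive  # shows which PGs are stuck and which OSDs they map to
    ceph osd tree                # confirm the restarted node's OSDs are back up/in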

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Marco Gaiarin
Mandi! Alfredo Deza, in that message you wrote... > > Ahem, how can I add a GPT label to a non-GPT partition (even losing > > data)? > If you are coming from ceph-disk (or something else custom-made) and > don't care about losing data, why not fully migrate to the > new OSDs? >
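
If one really does want to wipe and redeploy as suggested, a sketch might look like the following (this destroys all data on the device; /dev/sdX is a placeholder):

    sgdisk --zap-all /dev/sdX                 # wipe the old partition table, GPT or not
    ceph-volume lvm create --data /dev/sdX    # redeploy the disk as a fresh LVM-based OSD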

Re: [ceph-users] Replicating CephFS between clusters

2019-02-20 Thread Balazs Soltesz
Hi everyone, Thank you all for the quick replies! So making snapshots and rsyncing them between clusters should work, I'll be sure to check that out. Snapshot mirroring is what we'd need, but I couldn't find any release date for Nautilus, and we don’t really have time to wait for its release.
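
For anyone trying the snapshot-plus-rsync approach, a rough sketch; the mount point, snapshot name and destination host are placeholders, and snapshots must be enabled on the filesystem first:

    mkdir /mnt/cephfs/.snap/backup-2019-02-20            # creating a dir under .snap takes a snapshot
    rsync -a /mnt/cephfs/.snap/backup-2019-02-20/ \
          backuphost:/mnt/cephfs-dr/                     # ship the frozen view to the other cluster
    rmdir /mnt/cephfs/.snap/backup-2019-02-20            # remove the snapshot when done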

Re: [ceph-users] Access to cephfs from two different networks

2019-02-20 Thread Andrés Rojas Guerrero
Ohh, sorry for the question, now I understand why we need to define different public networks in Ceph. I understand that clients contact the mon only in order to obtain the cluster map; from the documentation: "When a Ceph Client binds to a Ceph Monitor, it retrieves the latest copy of

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Alexandru Cucu
Hi, I would decrease the max active recovery processes per OSD and increase the recovery sleep: osd recovery max active = 1 (default is 3); osd recovery sleep = 1 (default is 0 or 0.1). osd max backfills defaults to 1, so that should be OK if he's using the default :D Disabling scrubbing during
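
A sketch of applying those values at runtime, plus the scrub flags hinted at in the last sentence (remember to revert the settings and unset the flags once recovery has finished):

    ceph tell osd.* injectargs '--osd_recovery_max_active 1 --osd_recovery_sleep 1'
    ceph osd set noscrub
    ceph osd set nodeep-scrub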

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Darius Kasparavičius
Hello, Check your CPU usage when you are doing those kinds of operations. We had a similar issue where our CPU monitoring was reporting fine (< 40% usage), but the load on the nodes was high, mid 60-80. If it's possible, try disabling HT and see the actual CPU usage. If you are hitting CPU limits you
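
Two quick checks that may help compare load against the real (physical) core count; plain Linux tooling, nothing Ceph-specific:

    lscpu | grep -iE 'socket|core|thread'   # "Thread(s) per core: 2" means HT/SMT is enabled
    grep -c ^processor /proc/cpuinfo        # logical CPUs that the load average is measured against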

Re: [ceph-users] RBD image format v1 EOL ...

2019-02-20 Thread Mykola Golub
On Wed, Feb 20, 2019 at 10:22:47AM +0100, Jan Kasprzak wrote: > If I read the parallel thread about pool migration in ceph-users@ > correctly, the ability to migrate to v2 would still require stopping the client > before the "rbd migration prepare" can be executed. Note, even if rbd
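
For context, the live-migration workflow referenced here looks roughly like this (pool and image names are placeholders; the source image must not be in use when prepare runs):

    rbd migration prepare mypool/old-image mypool/new-image
    rbd migration execute mypool/new-image
    rbd migration commit mypool/new-image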

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Alfredo Deza
On Wed, Feb 20, 2019 at 10:21 AM Marco Gaiarin wrote: > > Mandi! Alfredo Deza, > in that message you wrote... > > > I think this is what happens with a non-GPT partition. GPT labels will > > use a PARTUUID to identify the partition, and I just confirmed that > > ceph-volume will enforce looking

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Marco Gaiarin
Mandi! Alfredo Deza, in that message you wrote... > I think this is what happens with a non-GPT partition. GPT labels will > use a PARTUUID to identify the partition, and I just confirmed that > ceph-volume will enforce looking for PARTUUID if the JSON > identified a partition (vs. an LV). > From

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread M Ranga Swami Reddy
That's expected from Ceph by design. But in our case, we are using all recommendations like rack failure domain, replication n/w, etc., and still face client I/O performance issues when one OSD is down. On Tue, Feb 19, 2019 at 10:56 PM David Turner wrote: > > With a RACK failure domain, you should be

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Alfredo Deza
On Wed, Feb 20, 2019 at 8:40 AM Анатолий Фуников wrote: > > Thanks for the reply. > blkid -s PARTUUID -o value /dev/sdf1 shows me nothing, but blkid /dev/sdf1 > shows me this: /dev/sdf1: UUID="b03810e4-dcc1-46c2-bc31-a1e558904750" > TYPE="xfs" I think this is what happens with a non-gpt

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-20 Thread Alexandre DERUMIER
On osd.8, at 01:20 when latency begins to increase, I have a scrub running:

    2019-02-20 01:16:08.851 7f84d24d9700  0 log_channel(cluster) log [DBG] : 5.52 scrub starts
    2019-02-20 01:17:18.019 7f84ce4d1700  0 log_channel(cluster) log [DBG] : 5.52 scrub ok
    2019-02-20 01:20:31.944 7f84f036e700  0 --

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Анатолий Фуников
Thanks for the reply. blkid -s PARTUUID -o value /dev/sdf1 shows me nothing, but blkid /dev/sdf1 shows me this: /dev/sdf1: UUID="b03810e4-dcc1-46c2-bc31-a1e558904750" TYPE="xfs" Wed, 20 Feb 2019 at 16:27, Alfredo Deza: > On Wed, Feb 20, 2019 at 8:16 AM Анатолий Фуников > wrote: > > > >

Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Alfredo Deza
On Wed, Feb 20, 2019 at 8:16 AM Анатолий Фуников wrote: > > Hello. I need to bring the OSDs back up on the node after reinstalling the OS; some > OSDs were created a long time ago, not even with ceph-disk, but with a set of scripts. > There was an idea to get their configuration in JSON via ceph-volume simple >

[ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Анатолий Фуников
Hello. I need to bring the OSDs back up on the node after reinstalling the OS; some OSDs were created a long time ago, not even with ceph-disk, but with a set of scripts. There was an idea to get their configuration in JSON via ceph-volume simple scan, and then on the fresh system I can do a ceph-volume simple
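
A rough sketch of the intended workflow, assuming the JSON files produced by the scan are copied over to the reinstalled node (the paths shown are the ceph-volume defaults as far as I recall):

    ceph-volume simple scan /dev/sdf1      # writes a JSON description to /etc/ceph/osd/
    ceph-volume simple activate --all      # on the fresh OS, re-create mounts/units from those JSON files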

Re: [ceph-users] Access to cephfs from two different networks

2019-02-20 Thread Wido den Hollander
On 2/20/19 1:03 PM, Andrés Rojas Guerrero wrote: > Hi all, sorry, we are newbies in Ceph and we have a newbie question > about it. We have a Ceph cluster with three mons and two public networks: > > public network = 10.100.100.0/23,10.100.101.0/21 > > We have seen that ceph-mon is listening on

[ceph-users] Access to cephfs from two different networks

2019-02-20 Thread Andrés Rojas Guerrero
Hi all, sorry, we are newbies in Ceph and we have a newbie question about it. We have a Ceph cluster with three mons and two public networks: public network = 10.100.100.0/23,10.100.101.0/21 We have seen that ceph-mon is listening on only one of these networks: tcp 0 0 10.100.100.9:6789
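
For illustration, a sketch of the relevant ceph.conf pieces: listing two subnets under public network only tells daemons which networks are acceptable to bind to, while each mon still binds to exactly one address (the addresses are taken from this thread; the section name is an assumption):

    [global]
        public network = 10.100.100.0/23, 10.100.101.0/21
    [mon.mon1]
        mon addr = 10.100.100.9:6789   # this single address is the one the mon listens on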

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-20 Thread Alexandre DERUMIER
Something interesting: when I restarted osd.8 at 11:20, I saw another OSD, osd.1, where latency decreased at exactly the same time (without a restart of this OSD). http://odisoweb1.odiso.net/osd1.png onodes and cache_other are also going down for osd.1 at this time. - Mail

Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-20 Thread Igor Fedotov
You're right - the WAL/DB expansion capability is present in Luminous+ releases. But David meant the volume migration feature, which appeared in Nautilus; see: https://github.com/ceph/ceph/pull/23103 Thanks, Igor On 2/20/2019 9:22 AM, Konstantin Shalygin wrote: On 2/19/19 11:46 PM, David Turner

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-20 Thread Alexandre DERUMIER
Hi, I have hit the bug again, but this time only on 1 OSD. Here are some graphs: http://odisoweb1.odiso.net/osd8.png Latency was good until 01:00. Then I'm seeing onode misses; the bluestore onode count is increasing (seems to be normal), after that latency is slowly increasing from 1ms to 3-5ms after
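
Two commands that may help correlate such graphs with what the OSD itself reports (the second must run on the host carrying osd.8 and assumes the admin socket is available):

    ceph osd perf                                 # commit/apply latency per OSD, cluster-wide
    ceph daemon osd.8 perf dump | grep -i onode   # onode cache counters for this one OSD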

Re: [ceph-users] min_size vs. K in erasure coded pools

2019-02-20 Thread Eugen Block
Hi, I see that as a security feature ;-) You can prevent data loss as long as k chunks are intact, but you don't want to operate with only the minimum required number of chunks. In a disaster scenario you can reduce min_size to k temporarily, but the main goal should always be to get the OSDs back up. For
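
A small illustration with a hypothetical k=4, m=2 pool named "ecpool": min_size normally sits at k+1, and lowering it to k is the temporary disaster-recovery measure described above:

    ceph osd pool get ecpool min_size    # typically reports 5 for k=4, m=2
    ceph osd pool set ecpool min_size 4  # temporary only; raise it back once the OSDs are up again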

[ceph-users] min_size vs. K in erasure coded pools

2019-02-20 Thread Clausen , Jörn
Hi! While trying to understand erasure coded pools, I would have expected that "min_size" of a pool is equal to the "K" parameter. But it turns out that it is always K+1. Isn't the description of erasure coding misleading then? In a K+M setup, I would expect to be good (in the sense of "no

Re: [ceph-users] RBD image format v1 EOL ...

2019-02-20 Thread Jan Kasprzak
Hello, Jason Dillaman wrote: : For the future Ceph Octopus release, I would like to remove all : remaining support for RBD image format v1 images barring any : substantial pushback. : : The image format for new images has been defaulted to the v2 image : format since Infernalis, the v1
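
To check whether any old-format images are still around before that removal lands, something like this should do (pool and image names are placeholders):

    rbd info mypool/myimage | grep format   # "format: 1" marks an image still using the v1 layout
    # or scan a whole pool for v1 images
    for img in $(rbd ls mypool); do
        rbd info "mypool/$img" | grep -q 'format: 1' && echo "$img"
    done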

Re: [ceph-users] How to change/enable/activate a different osd_memory_target value

2019-02-20 Thread Konstantin Shalygin
We run into some OSD node freezes with out-of-memory and eating all swap too. Till we get more physical RAM I’d like to reduce the osd_memory_target, but can’t find where and how to set it. We have 24 BlueStore disks in 64 GB CentOS nodes with Luminous v12.2.11. Just set the value for
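
A sketch of what such a setting could look like for 24 OSDs on a 64 GB node; the 2 GiB figure is only an example, and headroom must be left for the OS and other daemons:

    [osd]
        osd memory target = 2147483648   # 2 GiB per OSD -> roughly 48 GiB for 24 OSDs

followed by a restart of the OSDs on that node, e.g. systemctl restart ceph-osd.target.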