[ceph-users] Help on diag needed : heartbeat_failed

2019-11-26 Thread Vincent Godin
We encounter strange behavior on our Mimic 13.2.6 cluster. At any time, and without any load, some OSDs become unreachable from only some hosts. It lasts 10 minutes and then the problem vanishes. It's not always the same OSDs or the same hosts. There is no network failure on any of the hosts (because

Re: [ceph-users] Impact of a small DB size with Bluestore

2019-11-26 Thread Vincent Godin
The documentation says to size the DB to 4% of the disk data, i.e. 240GB for a 6 TB disk. Please give more explanation when your answer disagrees with the documentation! On Mon, 25 Nov 2019 at 11:00, Konstantin Shalygin wrote: > > I have a Ceph cluster which was designed for FileStore. Each

[ceph-users] Impact of a small DB size with Bluestore

2019-11-25 Thread Vincent Godin
I have a Ceph cluster which was designed for FileStore. Each host has 5 write-intensive SSDs of 400GB and 20 HDDs of 6TB, so each HDD has a 5 GB WAL on SSD. If I want to put BlueStore on this cluster, I can only allocate ~75GB of WAL and DB on SSD for each HDD, which is far below the 4% limit
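For reference, a back-of-the-envelope check of the figures above (only the numbers quoted in the post are used here):

  # SSD space available per HDD on the hosts described above
  echo $(( 5 * 400 / 20 ))     # -> 100 GB of SSD per HDD, before any WAL overhead
  # DB size implied by the documentation's 4% rule for a 6 TB HDD
  echo $(( 6000 * 4 / 100 ))   # -> 240 GB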

Re: [ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Vincent Godin
complete manual On Mon, 15 Oct 2018 at 14:26, Matthew Vernon wrote: > > Hi, > > On 15/10/18 11:44, Vincent Godin wrote: > > Does a man page exist for ceph-objectstore-tool? If yes, where can I find it? > > No, but there is some --help output: > > root@sto-1-1:~
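For what it's worth, the built-in help is the closest thing to a manual; it can be dumped with the tool's standard option:

  ceph-objectstore-tool --help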

[ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Vincent Godin
Does a man page exist for ceph-objectstore-tool? If yes, where can I find it? Thanks

[ceph-users] Strange Ceph host behaviour

2018-10-02 Thread Vincent Godin
Ceph cluster on Jewel 10.2.11. Mons & hosts are on CentOS 7.5.1804, kernel 3.10.0-862.6.3.el7.x86_64. Every day, we can see in ceph.log on the monitor a lot of logs like these: 2018-10-02 16:07:08.882374 osd.478 192.168.1.232:6838/7689 386 : cluster [WRN] map e612590 wrongly marked me down 2018-10-02

[ceph-users] Release for production

2018-09-07 Thread Vincent Godin
Hello Cephers, if I had to go for production today, which release should I choose: Luminous or Mimic?

[ceph-users] Recovery time is very long till we have a double tree in the crushmap

2018-05-22 Thread Vincent Godin
Two months ago, we had a simple crushmap: one root, one region, two datacenters, one room per datacenter, two pools per room (one SATA and one SSD), hosts in the SATA pool only, OSDs in the hosts. So we created a Ceph pool at the SATA level on each site. After some disk problems which impacted

Re: [ceph-users] New Ceph cluster design

2018-03-10 Thread Vincent Godin
Hi, As I understand it, you'll have one RAID1 of two SSDs for 12 HDDs. The WAL is used for all writes on your host. If you have good SSDs, they can handle 450-550 MB/s. Your 12 SATA HDDs can handle 12 x 100 MB/s, that is to say 1,200 MB/s. So your RAID1 will be the bottleneck with this design. A

[ceph-users] Re Two datacenter resilient design with a quorum site

2018-01-17 Thread Vincent Godin
Hello Alex, We have a similar design: two datacenters at short distance (sharing the same level-2 network) and one datacenter far away (more than 100 km) for our Ceph cluster. Let's call these sites A1, A2 and B. We set 2 mons on A1, 2 mons on A2 and 1 mon on B. A1 and A2 share the same level

Re: [ceph-users] One object degraded cause all ceph requests hang - Jewel 10.2.6 (rbd + radosgw)

2018-01-11 Thread Vincent Godin
As no response was given, I will explain what I found; maybe it could help other people. The .dirXXX object is an index marker with a data size of 0. The metadata associated with this object (located in the LevelDB of the OSDs currently holding this marker) is the index of the bucket corresponding to

[ceph-users] How to get the usage of an indexless-bucket

2018-01-11 Thread Vincent Godin
How can we know the usage of an indexless bucket? We need this information for our billing process.

[ceph-users] help needed after an outage - Is it possible to rebuild a bucket index ?

2018-01-04 Thread Vincent Godin
Yesterday we had an outage on our Ceph cluster. One OSD was looping on "[call rgw.bucket_complete_op] snapc 0=[] ack+ondisk+write+known_if_redirected e359833) currently waiting for degraded object" for hours, blocking all the requests to this OSD, and then ... We had to delete the degraded

[ceph-users] One object degraded cause all ceph requests hang - Jewel 10.2.6 (rbd + radosgw)

2018-01-04 Thread Vincent Godin
Yesterday we just encountered this bug. One OSD was looping on "2018-01-03 16:20:59.148121 7f011a6a1700 0 log_channel(cluster) log [WRN] : slow request 30.254269 seconds old, received at 2018-01-03 16:20:28.883837: osd_op(client.48285929.0:14601958 35.8abfc02e

[ceph-users] How to raise priority for a pg repair

2017-12-15 Thread Vincent Godin
We have some scrub errors on our cluster. A "ceph pg repair x.xxx" is taken into account only after hours. It seems to be linked to the deep-scrubs which are running at the same time; it looks like the repair has to wait for a slot before being launched. I then have two questions: is it possible to launch
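Two diagnostics that may help here (a hedged sketch, not from the thread; osd_max_scrubs defaults to 1, which is why a repair tends to queue behind running deep-scrubs):

  # list PGs currently scrubbing or deep-scrubbing
  ceph pg dump pgs_brief | grep -i scrub
  # temporarily allow more concurrent scrubs/repairs per OSD (revert afterwards)
  ceph tell osd.* injectargs '--osd-max-scrubs 2'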

Re: [ceph-users] Ceph OSD on Hardware RAID

2017-10-02 Thread Vincent Godin
In addition to the points that you made: I noticed on RAID0 disks that read I/O errors are not always trapped by Ceph, leading to unexpected behaviour of the impacted OSD daemon. On both RAID0 and non-RAID disks, an I/O error is logged in /var/log/messages: Oct 2 15:20:37 os-ceph05 kernel: sd

Re: [ceph-users] erasure code profile

2017-09-25 Thread Vincent Godin
If you have at least 2 hosts per room, you can use k=3 and m=3 and place 2 shards per room (one on each host). You'll need only 3 shards to read the data, so you can lose a room plus one host in the two other rooms and still get your data. It covers a double fault, which is better. It will take more
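A sketch of what such a profile could look like (names are illustrative; the 2-shards-per-room placement itself needs a custom crush rule on top of the profile):

  ceph osd erasure-code-profile set ec-3-3 k=3 m=3 crush-failure-domain=host
  ceph osd pool create ecpool 256 256 erasure ec-3-3
  # then switch the pool to a crush rule that picks 3 rooms and
  # 2 hosts (hence 2 shards) in each room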

Re: [ceph-users] Bug in OSD Maps

2017-05-29 Thread Vincent Godin
We had a similar problem a few months ago when migrating from Hammer to Jewel. We encountered some old bugs (which were declared closed on Hammer!). We had some OSDs refusing to start because of a missing pg map like yours, and some others which were completely busy and started declaring valid OSDs lost =>

Re: [ceph-users] Creating journal on needed partition

2017-04-18 Thread Vincent Godin
Hi, if you're using ceph-deploy, just run the command: ceph-deploy osd prepare --overwrite-conf {your_host}:/dev/sdaa:/dev/sdaf2

[ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Vincent Godin
When we use a replicated pool of size 3, for example, each piece of data, a block of 4MB, is written to one PG which is distributed across 3 hosts (by default). The OSD holding the primary copy will send the block to the OSDs holding the second and third copies. With erasure code, let's take a RAID5-like schema with k=2 and
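To make the k=2 example concrete (an illustration consistent with how Ceph splits erasure-coded objects, not a quote from the thread):

  # with k=2, m=1 a 4 MB object is split into 2 data chunks of 2 MB each,
  # plus 1 coding chunk of 2 MB; each chunk goes to a different shard of the
  # same PG (and therefore to a different OSD/host), so the pool stores
  # 6 MB for 4 MB of data instead of 12 MB with 3x replication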

[ceph-users] idea about optimize an osd rebuild

2017-03-21 Thread Vincent Godin
When you replace a failed OSD, it has to recover all of its PGs, so it is pretty busy. Is it possible to tell the OSD not to become primary for any of its already synchronized PGs until all of its PGs have recovered? It should accelerate the rebuild process because the OSD won't have to
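For what it's worth, primary affinity already allows something close to this by hand (a hedged workaround, not a change to the recovery logic; older releases may also need "mon osd allow primary affinity = true"):

  # keep the rebuilding OSD from being elected primary (osd id illustrative)
  ceph osd primary-affinity osd.42 0
  # restore it once recovery is finished
  ceph osd primary-affinity osd.42 1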

Re: [ceph-users] [Jewel] upgrade 10.2.3 => 10.2.5 KO : first OSD server freeze every two days :)

2017-03-09 Thread Vincent Godin
First of all, don't do a Ceph upgrade while your cluster is in a warning or error state. An upgrade must be done from a clean cluster. Don't stay with a replica count of 2; the majority of problems come from that point, just look at the advice given by experienced users of the list. You should set a

[ceph-users] S3 Radosgw : how to grant a user within a tenant

2017-02-17 Thread Vincent Godin
I created 2 users, jack & bob, inside tenant_A. jack created a bucket named BUCKET_A and wants to give read access to the user bob. With s3cmd, I can grant to a user without a tenant easily: s3cmd setacl --acl-grant=read:user s3://BUCKET_A, but with an explicit tenant I tried: --acl-grant=read:bob
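radosgw usually addresses tenanted users as "tenant$user", so the grant would presumably look like the line below (untested guess, and the '$' has to be protected from the shell):

  s3cmd setacl --acl-grant=read:'tenant_A$bob' s3://BUCKET_A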

Re: [ceph-users] mkfs.ext4 hang on RBD volume

2017-01-17 Thread Vincent Godin
read > deadlock within librbd. > > On Mon, Jan 16, 2017 at 1:12 PM, Vincent Godin <vince.ml...@gmail.com> > wrote: > > We are using librbd on a host with CentOS 7.2 via virtio-blk. This server > > hosts the VMs on which we are doing our tests. But we have exactly th

Re: [ceph-users] mkfs.ext4 hang on RBD volume

2017-01-16 Thread Vincent Godin
Ceph version is Jewel 10.2.3 > > Ceph clients, mons and servers have the kernel > 3.10.0-327.36.3.el7.x86_64 > > on CentOS 7.2 > > > > 2017-01-13 20:07 GMT+01:00 Jason Dillaman <jdill...@redhat.com>: > >> > >> You might be hitting this issue [1] where mk

Re: [ceph-users] mkfs.ext4 hang on RBD volume

2017-01-16 Thread Vincent Godin
com>: > You might be hitting this issue [1] where mkfs is issuing lots of > discard operations. If you get a chance, can you retest w/ the "-E > nodiscard" option? > > Thanks > > [1] http://tracker.ceph.com/issues/16689 > > On Fri, Jan 13, 2017 at 12:5
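The retest Jason suggests would look something like this (device name is illustrative):

  mkfs.ext4 -E nodiscard /dev/vdb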

[ceph-users] Questions about rbd image features

2017-01-13 Thread Vincent Godin
We are using a production cluster which started on Firefly, then moved to Giant, Hammer and finally Jewel, so our images have different features corresponding to the value of "rbd_default_features" of the version when they were created. We currently have three sets of features activated: image
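Two commands that help when sorting images by feature set (standard rbd subcommands; the pool and image names are placeholders):

  rbd info rbd/myimage                            # shows the features the image was created with
  rbd feature enable rbd/myimage exclusive-lock   # some features can be turned on after creation
  rbd feature disable rbd/myimage deep-flatten    # others can only be turned off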

Re: [ceph-users] Performance issues on Jewel 10.2.2

2016-12-15 Thread Vincent Godin
Hello, I didn't look at your video but I can already give you some leads: 1 - there is a bug in 10.2.2 which makes the client cache not work: the cache behaves as if it never received a flush, so it stays in writethrough mode. This bug is fixed in 10.2.3. 2 - 2 SSDs in JBOD and 12 x 4TB
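For reference, the client-side cache options involved (a ceph.conf sketch; whether the writethrough-until-flush safeguard behaves correctly depends on the 10.2.2 bug mentioned above):

  [client]
  rbd cache = true
  rbd cache writethrough until flush = true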

[ceph-users] Problems after upgrade to Jewel

2016-11-23 Thread Vincent Godin
Hello, we had our cluster fail again this morning. It took almost the whole day to stabilize. Here are some problems we encountered in the OSD logs: *Some OSDs refused to start:* -1> 2016-11-23 15:50:49.507588 7f5f5b7a5800 -1 osd.27 196774 load_pgs: have pgid 9.268 at epoch 196874, but missing

[ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Vincent Godin
Hello, we now have a full cluster (mons, OSDs & clients) on Jewel 10.2.2 (initially Hammer 0.94.5) but we still have some big problems in our production environment: - some Ceph filesystems are not mounted at startup and we have to mount them with the "/bin/sh -c 'flock /var/lock/ceph-disk

[ceph-users] Big problems encoutered during upgrade from hammer 0.94.5 to jewel 10.2.3

2016-11-13 Thread Vincent Godin
After a test in a non-production environment, we decided to upgrade our running cluster to Jewel 10.2.3. Our cluster has 3 monitors and 8 nodes of 20 disks each. The cluster is on Hammer 0.94.5 with tunables set to "bobtail". As the cluster is in production and it wasn't possible to upgrade the Ceph client

[ceph-users] Upgrade from Hammer to Jewel

2016-10-25 Thread Vincent Godin
We have an OpenStack deployment which uses Ceph for Cinder and Glance. Ceph is on the Hammer release and we need to upgrade to Jewel. My question is: are Hammer clients compatible with Jewel servers? (upgrading the mons then the Ceph servers first) As the upgrade of the Ceph clients needs a reboot of all the

[ceph-users] Modify placement group pg and pgp in production environment

2016-10-13 Thread Vincent Godin
When you increase your PG number, the new PGs will have to peer first and during this time they will be unreachable, so you need to put the cluster in maintenance mode for this operation. The way to increase the PG and PGP numbers of a running cluster is: - First, it's very important to

Re: [ceph-users] Increase PG number

2016-09-20 Thread Vincent Godin
Hi, in fact, when you increase your PG number, the new PGs will have to peer first and during this time a lot of PGs will be unreachable. The best way to increase the number of PGs of a cluster (you'll need to adjust the number of PGPs too) is: - Don't forget to apply Goncalo's advice to keep
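The two commands at the heart of the procedure (standard commands; the pool name and target count below are placeholders, and the values should be raised in small steps):

  ceph osd pool set mypool pg_num 2048     # creates the new PGs, which then peer
  ceph osd pool set mypool pgp_num 2048    # lets data actually rebalance onto them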

Re: [ceph-users] 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???

2016-07-25 Thread Vincent Godin
I restarted osd.80 and so far: no backfill_toofull anymore. 2016-07-25 17:46 GMT+02:00 M Ranga Swami Reddy <swamire...@gmail.com>: > can you restart osd.80 and check to see if the recovery proceeds? > > Thanks > Swami > > On Mon, Jul 25, 2016 at 9:05 PM, Vincent Godin <vin

[ceph-users] Fwd: 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???

2016-07-25 Thread Vincent Godin
The OSD 140 is 73.61% used and its backfill_full_ratio is 0.85 too -- Forwarded message -- From: Vincent Godin <vince.ml...@gmail.com> Date: 2016-07-25 17:35 GMT+02:00 Subject: 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ??? To: ceph-users@lists.ce

[ceph-users] 1 active+undersized+degraded+remapped+wait_backfill+backfill_toofull ???

2016-07-25 Thread Vincent Godin
Hi, I'm facing this problem. The cluster is on Hammer 0.94.5. When I do a ceph health detail, I can see: pg 8.c1 is stuck unclean for 21691.555742, current state active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last acting [140] pg 8.c1 is stuck undersized for 21327.027365,
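Two checks that usually clarify a backfill_toofull (hedged diagnostics; the OSD id is taken from the "last acting" set above):

  ceph osd df                                                # per-OSD utilisation and weights
  ceph daemon osd.140 config get osd_backfill_full_ratio     # run on the node hosting osd.140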

Re: [ceph-users] multiple journals on SSD

2016-07-12 Thread Vincent Godin
Hello. I've been testing an Intel 3500 as a journal store for a few HDD-based OSDs. I stumbled on issues with multiple partitions (>4) and udev (sda5, sda6, etc. sometimes do not appear after partition creation). And I'm thinking that partitions are not that useful for OSD management, because Linux does not allow
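A hedged workaround sketch (not from the thread): with a GPT label the 4-primary-partition limit does not apply, and forcing a re-read of the partition table usually makes the missing device nodes appear:

  sgdisk --new=6:0:+20G --change-name=6:'ceph journal' /dev/sda   # create a 6th partition
  partprobe /dev/sda                                              # ask the kernel/udev to re-read the table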

[ceph-users] Ceph Cache Tier

2016-06-08 Thread Vincent Godin
Is there now a stable version of Ceph in Hammer and/or Infernalis with which we can safely use a cache tier in writeback mode? I saw a post a few months ago saying that we had to wait for a future release to use it safely.

Re: [ceph-users] How to set a new Crushmap in production

2016-01-21 Thread Vincent Godin
u learn something new everyday. > > > [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg26017.html > - > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > > On Wed, Jan 20, 2016 at 7:11 AM, Vincent Godin

[ceph-users] How to set a new Crushmap in production

2016-01-20 Thread Vincent Godin
Hi, I need to import a new crushmap in production (the old one is the default one) to define two datacenters and to isolate SSDs from SATA disks. What is the best way to do this without starting a hurricane on the platform? Until now, I was just using hosts (SATA OSDs) in one datacenter with the
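The usual decompile/edit/recompile cycle for this (standard crushtool/ceph commands; file names are illustrative, and data movement starts at the last step):

  ceph osd getcrushmap -o crush.bin        # export the current (binary) map
  crushtool -d crush.bin -o crush.txt      # decompile it to text
  # edit crush.txt: add the datacenter buckets and the SSD/SATA roots and rules
  crushtool -c crush.txt -o crush.new      # recompile
  ceph osd setcrushmap -i crush.new        # inject the new map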