Re: [ceph-users] v12.2.11 Luminous released

2019-02-07 Thread Dan van der Ster
On Fri, Feb 1, 2019 at 10:18 PM Neha Ojha wrote: > > On Fri, Feb 1, 2019 at 1:09 PM Robert Sander > wrote: > > > > Am 01.02.19 um 19:06 schrieb Neha Ojha: > > > > > If you would have hit the bug, you should have seen failures like > > > https://tracker.ceph.com/issues/36686. > > > Yes,

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread M Ranga Swami Reddy
Alternatively, we will increase the mon_data_size to 30G (from 15G). Thanks Swami On Thu, Feb 7, 2019 at 8:44 PM Dan van der Ster wrote: > > On Thu, Feb 7, 2019 at 4:12 PM M Ranga Swami Reddy > wrote: > > > > >Compaction isn't necessary -- you should only need to restart all > > >peons, then
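
For anyone wanting to apply this, the threshold is the mon_data_size_warn option (default 15 GiB); a minimal sketch of raising it to 30 GiB at runtime, assuming a Luminous cluster where injectargs is available:

ceph tell mon.* injectargs '--mon_data_size_warn=32212254720'   # 30 GiB expressed in bytes

To make it persistent, set the same value under the [mon] section of ceph.conf on the monitor hosts.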

Re: [ceph-users] Luminous to Mimic: MON upgrade requires "full luminous scrub". What is that?

2019-02-07 Thread Paul Emmerich
You need to run a full deep scrub before continuing the upgrade; the reason for this is that the deep scrub migrates the format of some snapshot-related on-disk data structures. It looks like you only tried a normal scrub, not a deep scrub. Paul -- Paul Emmerich Looking for help with your Ceph
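
A minimal sketch of how one might trigger the required deep scrubs across the whole cluster, assuming the extra I/O is acceptable; the loop simply asks every OSD to deep-scrub the PGs it holds:

for osd in $(ceph osd ls); do
    ceph osd deep-scrub "$osd"
done
ceph -s    # watch until the deep-scrub activity drains away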

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Igor Fedotov
On 2/7/2019 6:06 PM, Eugen Block wrote: First, you should upgrade to 12.2.11 (or bring the mentioned patch in by other means) to fix the rename procedure, which will avoid the appearance of new inconsistent objects in the DB. This should at least reduce the OSD crash frequency. We'll have to wait until

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
> On 07/02/2019 17:07, jes...@krogh.cc wrote: > Thanks for your explanation. In your case, you have low concurrency > requirements, so focusing on latency rather than total iops is your > goal. Your current setup gives 1.9 ms latency for writes and 0.6 ms for > read. These are considered good, it

Re: [ceph-users] Luminous to Mimic: MON upgrade requires "full luminous scrub". What is that?

2019-02-07 Thread Eugen Block
Hi, could it be a missing 'ceph osd require-osd-release luminous' on your cluster? When I check a luminous cluster I get this: host1:~ # ceph osd dump | grep recovery flags sortbitwise,recovery_deletes,purged_snapdirs The flags in the code you quote seem related to that. Can you check that
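
To reproduce Eugen's check, something along these lines should do (the commented output is what a healthy luminous cluster is expected to report); only run the second command if the flag really is missing and every OSD already runs luminous:

ceph osd dump | grep -E 'flags|require'
# flags sortbitwise,recovery_deletes,purged_snapdirs
# require_osd_release luminous
ceph osd require-osd-release luminous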

[ceph-users] best practices for EC pools

2019-02-07 Thread Scheurer François
Dear All, We created an erasure coded pool with k=4 m=2 and failure-domain=host but have only 6 OSD nodes. Is it correct that recovery will be forbidden by the CRUSH rule if a node is down? After rebooting all nodes we noticed that the recovery was slow, maybe half an hour, but all
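
For reference, a k=4/m=2 pool with host failure domain like the one described is typically created along these lines; profile name, pool name and PG count are only placeholders:

ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec42
ceph osd pool set ecpool allow_ec_overwrites true   # only needed for RBD/CephFS on bluestore

Note that k+m=6 exactly matches the 6 hosts, so there is no spare host to recover a failed host's chunks onto.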

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Виталий Филиппов
rados bench is garbage, it creates and benches a very small number of objects. If you want to test RBD, better to test it with the fio rbd ioengine. On 7 February 2019 15:16:11 GMT+03:00, Ryan wrote: >I just ran your test on a cluster with 5 hosts 2x Intel 6130, 12x 860 >Evo >2TB SSD per host (6 per SAS3008), 2x
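
A hedged example of the fio rbd-engine run Vitaliy suggests, with placeholder pool and image names (the image must already exist); iodepth=1 measures latency, higher iodepth/numjobs probes total iops instead:

fio --name=rbd-latency --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=1 \
    --runtime=60 --time_based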

Re: [ceph-users] v12.2.11 Luminous released

2019-02-07 Thread Neha Ojha
On Thu, Feb 7, 2019 at 10:50 AM Dan van der Ster wrote: > > On Fri, Feb 1, 2019 at 10:18 PM Neha Ojha wrote: > > > > On Fri, Feb 1, 2019 at 1:09 PM Robert Sander > > wrote: > > > > > > Am 01.02.19 um 19:06 schrieb Neha Ojha: > > > > > > > If you would have hit the bug, you should have seen

Re: [ceph-users] Luminous to Mimic: MON upgrade requires "full luminous scrub". What is that?

2019-02-07 Thread Andrew Bruce
Hi All. I was on luminous 12.2.0 as I do *not* enable repo updates for critical software (e.g. openstack / ceph). Upgrades need to occur on an intentional basis! So I first upgraded to luminous 12.2.11 following the guide and release notes. [root@lvtncephx110 ~]# ceph version ceph version

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Maged Mokhtar
On 07/02/2019 17:07, jes...@krogh.cc wrote: Hi Maged Thanks for your reply. 6k is low as a max write iops value, even for a single client. For a cluster of 3 nodes, we see from 10k to 60k write iops depending on hardware. Can you increase your threads to 64 or 128 via the -t parameter? I can
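
For completeness, -t sets the number of concurrent operations in rados bench; a sketch against a placeholder pool:

rados bench -p testpool -b 4096 -t 64 10 write --no-cleanup
rados bench -p testpool -b 4096 -t 128 10 write --no-cleanup
rados -p testpool cleanup    # remove the benchmark objects afterwards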

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
> That's a useful conclusion to take back. Last question - We have our SSD pool set to 3x replication; Micron states that NVMe is good at 2x - is this "taste and safety" or are there any general thoughts about SSD robustness in a Ceph setup? Jesper
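
For context, the replication factor being discussed is just a per-pool setting; a sketch with a placeholder pool name (size 2 keeps only one redundant copy, which most operators consider too risky for production regardless of media):

ceph osd pool set ssdpool size 3
ceph osd pool set ssdpool min_size 2
ceph osd pool get ssdpool size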

Re: [ceph-users] best practices for EC pools

2019-02-07 Thread Eugen Block
Hi Francois, Is it correct that recovery will be forbidden by the crush rule if a node is down? Yes, that is correct; failure-domain=host means no two chunks of the same PG can be on the same host. So if your PG is divided into 6 chunks, they're all on different hosts, no recovery is

Re: [ceph-users] best practices for EC pools

2019-02-07 Thread Alan Johnson
Just to add that a more general formula is that the number of nodes should be greater than or equal to k+m+m, i.e. N >= k+m+m, for full recovery. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eugen Block Sent: Thursday, February 7, 2019 8:47 AM
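
Working the formula through for the profile in this thread: with k=4 and m=2, N >= k+m+m = 4+2+2 = 8 hosts. Six hosts are enough to place the k+m = 6 chunks and (with the default min_size) typically keep serving I/O while one host is down, but they leave no spare host for CRUSH to rebuild the missing chunks onto, which matches the behaviour described above.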

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Vitaliy Filippov
Ceph is a massive overhead, so it seems it maxes out at ~1 (at most 15000) write iops per one ssd with queue depth of 128 and ~1000 iops with queue depth of 1 (1ms latency). Or maybe 2000-2500 write iops (0.4-0.5ms) with best possible hardware. Micron has only squeezed ~8750 iops from

[ceph-users] Downsizing a cephfs pool

2019-02-07 Thread Brian Topping
Hi all, I created a problem when moving data to Ceph and I would be grateful for some guidance before I do something dumb. I started with the 4x 6TB source disks that came together as a single XFS filesystem via software RAID. The goal is to have the same data on a cephfs volume, but with

Re: [ceph-users] change OSD IP it uses

2019-02-07 Thread Wido den Hollander
On 2/8/19 8:38 AM, Ashley Merrick wrote: > So I was adding a new host using ceph-deploy, for the first OSD I > accidentally run it against the hostname of the external IP and not the > internal network. > > I stopped / deleted the OSD from the new host and then re-created the > OSD using the

Re: [ceph-users] Failed to load ceph-mgr modules: telemetry

2019-02-07 Thread Wido den Hollander
On 2/8/19 8:13 AM, Ashley Merrick wrote: > I have had issues on a mimic cluster (latest release) where the > dashboard does not display any read or write ops under the pool's > section on the main dashboard page. > > I have just noticed during restarting the mgr service the following > shows

Re: [ceph-users] Failed to load ceph-mgr modules: telemetry

2019-02-07 Thread Ashley Merrick
The message went away, but obviously I still don't get the stats showing in the dashboard (I am guessing this isn't a known bug currently?) and that they should be working. Everything works fine apart from the dashboard not showing the live I/O stats. Nothing is mentioned in the mgr logs at the default

[ceph-users] Failed to load ceph-mgr modules: telemetry

2019-02-07 Thread Ashley Merrick
I have had issues on a mimic cluster (latest release) where the dashboard does not display any read or write ops under the pools section on the main dashboard page. I have just noticed that during a restart of the mgr service the following shows under "Cluster Logs", nothing else, just the following:
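
In case others hit the same log message, a quick sketch for inspecting and toggling the module on mimic (module name taken from the log):

ceph mgr module ls                  # lists enabled and available mgr modules
ceph mgr module disable telemetry   # silence the failing module
ceph mgr module enable telemetry    # re-enable it after fixing the cause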

[ceph-users] change OSD IP it uses

2019-02-07 Thread Ashley Merrick
So I was adding a new host using ceph-deploy, and for the first OSD I accidentally ran it against the hostname of the external IP and not the internal network. I stopped / deleted the OSD from the new host and then re-created the OSD using the internal hostname along with the rest of the OSDs. They
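
For reference, which address an OSD binds to is driven by the networks defined in ceph.conf, so pinning them avoids this class of mistake; a minimal ceph.conf fragment with placeholder subnets (OSDs pick up the addresses on restart):

[global]
public network  = 192.168.10.0/24
cluster network = 10.10.10.0/24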

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Marc Roos
I did your rados bench test on our sm863a pool 3x rep, got similar results. [@]# rados bench -p fs_data.ssd -b 4096 10 write --no-cleanup hints = 1 Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects Object prefix:

Re: [ceph-users] CephFS overwrite/truncate performance hit

2019-02-07 Thread Marc Roos
Is this difference not related to caching? And are you filling up some cache/queue at some point? If you do a sync after each write, do you still have the same results? -Original Message- From: Hector Martin [mailto:hec...@marcansoft.com] Sent: 07 February 2019 06:51 To:

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread M Ranga Swami Reddy
Hi Sage, Sure, we will increase the mon_data_size to 30G to avoid this type of warning. And currently we are using a 500G disk here, which I guess should be good enough. Thanks Swami On Wed, Feb 6, 2019 at 5:56 PM Sage Weil wrote: > > Hi Swami > > The limit is somewhat arbitrary, based on

Re: [ceph-users] ceph OSD cache ration usage

2019-02-07 Thread M Ranga Swami Reddy
Hello - We are using ceph OSD nodes with a controller cache of 1G. Are there any recommendations for using the cache for read and write? Here we are using HDDs with colocated journals. For the SSD journal - 0% cache and 100% write. Thanks On Mon, Feb 4, 2019 at 6:07 PM M Ranga Swami

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Marc Roos
4x nodes, around 100GB, 2x 2660, 10Gbit, 2x LSI Logic SAS2308. Thanks for the confirmation Marc. Can you give a bit more hardware/network details? Jesper ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Hector Martin
On 07/02/2019 18:17, Marc Roos wrote: 250~1,2252~1,2254~1,2256~1,2258~1,225a~1,225c~1,225e~1,2260~1,2262~1,226 4~1,2266~1,2268~1,226a~1,226c~1,226e~1,2270~1,2272~1,2274~1,2276~1,2278~ 1,227a~1,227c~1,227e~1,2280~1,2282~1,2284~1,2286~1,2288~1,228a~1,228c~1,

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
Thanks for the confirmation Marc. Can you give a bit more hardware/network details? Jesper ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Marc Roos
Hmmm, I have a daily cron job creating these on only maybe 100 directories. I am removing the snapshot, if it exists, with a rmdir. Should I do this differently? Maybe e.g. use snap-20190101, snap-20190102, snap-20190103, then I will always create unique directories and the ones removed
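
A small sketch of the dated-snapshot rotation being described, with a hypothetical mount path and a 7-day retention; CephFS snapshots are just mkdir/rmdir inside the special .snap directory (date -d is GNU date):

DIR=/mnt/cephfs/somedir
TODAY=$(date +%Y%m%d)
EXPIRED=$(date -d '7 days ago' +%Y%m%d)
mkdir "$DIR/.snap/snap-$TODAY"                                        # create today's snapshot
[ -d "$DIR/.snap/snap-$EXPIRED" ] && rmdir "$DIR/.snap/snap-$EXPIRED" # drop the expired one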

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Hector Martin
On 07/02/2019 19:19, Marc Roos wrote: Hmmm, I am having a daily cron job creating these only on maybe 100 directories. I am removing the snapshot if it exists with a rmdir. Should I do this differently? Maybe eg use snap-20190101, snap-20190102, snap-20190103 then I will always create unique

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread M Ranga Swami Reddy
Hi Dan, >During backfilling scenarios, the mons keep old maps and grow quite >quickly. So if you have balancing, pg splitting, etc. ongoing for >awhile, the mon stores will eventually trigger that 15GB alarm. >But the intended behavior is that once the PGs are all active+clean, >the old maps

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Marc Roos
> >> >> >> Hmmm, I am having a daily cron job creating these only on maybe 100 >> directories. I am removing the snapshot if it exists with a rmdir. >> Should I do this differently? Maybe eg use snap-20190101, snap-20190102, >> snap-20190103 then I will always create unique directories

Re: [ceph-users] CephFS overwrite/truncate performance hit

2019-02-07 Thread Hector Martin
On 07/02/2019 19:47, Marc Roos wrote: Is this difference not related to caching? And are you filling up some cache/queue at some point? If you do a sync after each write, do you still have the same results? No, the slow operations are slow from the very beginning. It's not about filling a

[ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Marc Roos
ceph osd pool ls detail pool 20 'fs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 29032 flags hashpspool stripe_width 0 application cephfs removed_snaps [3~1,5~31,37~768,7a0~3,7a4~b10,12b5~3,12b9~3,12bd~22c,14ea~22e,1719~b04,

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Marc Roos
Also on pools that are empty, looks like on all cephfs data pools. pool 55 'fs_data.ec21.ssd' erasure size 3 min_size 3 crush_rule 6 object_hash rjenkins pg_num 8 pgp_num 8 last_change 29032 flags hashpspool,ec_overwrites stripe_width 8192 application cephfs removed_snaps

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
> On Thu, 7 Feb 2019 08:17:20 +0100 jes...@krogh.cc wrote: >> Hi List >> >> We are in the process of moving to the next usecase for our ceph cluster >> (Bulk, cheap, slow, erasurecoded, cephfs) storage was the first - and >> that works fine. >> >> We're currently on luminous / bluestore, if

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Wido den Hollander
On 2/7/19 8:41 AM, Brett Chancellor wrote: > This seems right. You are doing a single benchmark from a single client. > Your limiting factor will be the network latency. For most networks this > is between 0.2 and 0.3 ms. If you're trying to test the potential of > your cluster, you'll need

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
> On 2/7/19 8:41 AM, Brett Chancellor wrote: >> This seems right. You are doing a single benchmark from a single client. >> Your limiting factor will be the network latency. For most networks this >> is between 0.2 and 0.3 ms. If you're trying to test the potential of >> your cluster, you'll need

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Ryan
I just ran your test on a cluster with 5 hosts 2x Intel 6130, 12x 860 Evo 2TB SSD per host (6 per SAS3008), 2x bonded 10GB NIC, 2x Arista switches. Pool with 3x replication rados bench -p scbench -b 4096 10 write --no-cleanup hints = 1 Maintaining 16 concurrent writes of 4096 bytes to objects of

Re: [ceph-users] I get weird ls pool detail output 12.2.11

2019-02-07 Thread Hector Martin
On 07/02/2019 20:21, Marc Roos wrote: I also do not know exactly how many I have. It is sort of a test setup and the bash script creates a snapshot every day. So with 100 dirs it will be a maximum of 700. But the script first checks if there is any data with getfattr --only-values --absolute-names
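
The check Marc refers to presumably looks something like this (the path is a placeholder); ceph.dir.rbytes is the recursive byte count the MDS maintains for a directory, so a value of 0 means the snapshot can be skipped:

getfattr --only-values --absolute-names -n ceph.dir.rbytes /mnt/cephfs/somedir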

[ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
Hi list, I found this thread [1] about crashing SSD OSDs, although that was about an upgrade to 12.2.7, we just hit (probably) the same issue after our update to 12.2.10 two days ago in a production cluster. Just half an hour ago I saw one OSD (SSD) crashing (for the first time):

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread Dan van der Ster
On Thu, Feb 7, 2019 at 12:17 PM M Ranga Swami Reddy wrote: > > Hi Dan, > >During backfilling scenarios, the mons keep old maps and grow quite > >quickly. So if you have balancing, pg splitting, etc. ongoing for > >awhile, the mon stores will eventually trigger that 15GB alarm. > >But the intended

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Igor Fedotov
Hi Eugen, looks like this isn't [1] but rather https://tracker.ceph.com/issues/38049 and https://tracker.ceph.com/issues/36541 (= https://tracker.ceph.com/issues/36638 for luminous). Hence it's not fixed in 12.2.10; the target release is 12.2.11. Also please note the patch allows avoiding new

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
Hi Igor, thanks for the quick response! Just to make sure I don't misunderstand, and because it's a production cluster: before anything else I should run fsck on that OSD? Depending on the result we'll decide how to continue, right? Is there anything else to be enabled for that command or
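
For reference, the fsck Igor suggests is run with the OSD stopped, via ceph-bluestore-tool; a sketch with a placeholder OSD id (the --deep variant reads all object data and is much slower):

systemctl stop ceph-osd@12
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
ceph-bluestore-tool fsck --deep --path /var/lib/ceph/osd/ceph-12
systemctl start ceph-osd@12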

[ceph-users] Cephfs strays increasing and using hardlinks

2019-02-07 Thread Marc Roos
I read here [0] that to get strays removed, you have to 'touch' them or 'getattr on all the remote links'. Is this still necessary in luminous 12.2.11? Or is there meanwhile a manual option to force purging of strays? [@~]# ceph daemon mds.c perf dump | grep strays "num_strays":

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Igor Fedotov
Eugen, First, you should upgrade to 12.2.11 (or bring the mentioned patch in by other means) to fix the rename procedure, which will avoid the appearance of new inconsistent objects in the DB. This should at least reduce the OSD crash frequency. Second, theoretically previous crashes could result

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Maged Mokhtar
On 07/02/2019 09:17, jes...@krogh.cc wrote: Hi List We are in the process of moving to the next usecase for our ceph cluster (Bulk, cheap, slow, erasurecoded, cephfs) storage was the first - and that works fine. We're currently on luminous / bluestore, if upgrading is deemed to change what

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
First, you should upgrade to 12.2.11 (or bring the mentioned patch in by other means) to fix the rename procedure, which will avoid the appearance of new inconsistent objects in the DB. This should at least reduce the OSD crash frequency. We'll have to wait until 12.2.11 is available for openSUSE, I'm

Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread jesper
Hi Maged Thanks for your reply. > 6k is low as a max write iops value, even for a single client. For a cluster > of 3 nodes, we see from 10k to 60k write iops depending on hardware. > > Can you increase your threads to 64 or 128 via the -t parameter? I can absolutely get it higher by increasing the

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread M Ranga Swami Reddy
>Compaction isn't necessary -- you should only need to restart all >peons, then the leader. A few minutes later the DBs should start >trimming. As we are on a production cluster, it may not be safe to restart the ceph-mons; instead we prefer to do the compaction on the non-leader mons. Is this OK? Thanks
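
For the record, a monitor store can also be compacted online without a restart; a sketch with a placeholder mon id (compaction briefly stalls that mon, so do one peon at a time):

ceph tell mon.ceph-mon-1 compact
du -sh /var/lib/ceph/mon/*/store.db   # check the store size afterwards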

Re: [ceph-users] ceph mon_data_size_warn limits for large cluster

2019-02-07 Thread Dan van der Ster
On Thu, Feb 7, 2019 at 4:12 PM M Ranga Swami Reddy wrote: > > >Compaction isn't necessary -- you should only need to restart all > >peon's then the leader. A few minutes later the db's should start > >trimming. > > As we on production cluster, which may not be safe to restart the > ceph-mon,

[ceph-users] Luminous to Mimic: MON upgrade requires "full luminous scrub". What is that?

2019-02-07 Thread Andrew Bruce
Hello All! Yesterday I started the upgrade from luminous to mimic with one of my 3 MONs. After applying the mimic yum repo and updating, a restart reports the following error in the MON log file: ==> /var/log/ceph/ceph-mon.lvtncephx121.log <== 2019-02-07 10:02:40.110 7fc8283ed700 -1