[ceph-users] Re: OSD Memory usage

2020-11-26 Thread Seena Fallah
This is what happens with my cluster (screenshots attached). At 10:11 I turned on bluefs_buffered_io on all my OSDs and latency recovered, but throughput decreased. I had these configs on all OSDs during recovery: osd-max-backfills 1, osd-recovery-max-active 1, osd-recovery-op-priority 1. Do you have any
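A minimal sketch of applying those settings at runtime with standard Nautilus-era tooling (option names as quoted in the message; depending on the release, the OSDs may need a restart for bluefs_buffered_io to actually take effect):

  # persist bluefs_buffered_io for all OSDs in the mon config database
  ceph config set osd bluefs_buffered_io true
  # throttle recovery/backfill on the running OSDs
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'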

[ceph-users] Re: replace osd with Octopus

2020-11-26 Thread Anthony D'Atri
>> When replacing an osd, there will be no PG remapping, and backfill >>> will restore the data on the new disk, right? >> >> That depends on how you decide to go through the replacement process. >> Usually without your intervention (e.g. setting the appropriate OSD >> flags) the remapping will

[ceph-users] Re: Advice on SSD choices for WAL/DB?

2020-11-26 Thread Christian Wuerdig
I think it's time to start pointing out that the 3/30/300 logic no longer really holds true post-Octopus: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/CKRCB3HUR7UDRLHQGC7XXZPWCWNJSBNT/ On Thu, 2 Jul 2020 at 00:09, Burkhard Linke <
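Rather than sizing purely by the 3/30/300 rule, actual DB usage can be checked per OSD; a sketch assuming admin-socket access on the OSD host (osd.0 is a placeholder):

  # db_used_bytes vs db_total_bytes, plus any spillover to the slow device
  ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'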

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
OK, cool! Will try to reproduce this locally tomorrow... Thanks, Igor On 11/26/2020 10:19 PM, Dan van der Ster wrote: Those osds are intentionally out, yes. (They were drained to be replaced). I have fixed 2 clusters' stats already with this method ... both had up but out osds, and

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
Those osds are intentionally out, yes. (They were drained to be replaced). I have fixed 2 clusters' stats already with this method ... both had up but out osds, and stopping the up/out osd fixed the stats. I opened a tracker for this: https://tracker.ceph.com/issues/48385 -- dan On Thu, Nov

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
hmm... I would suspect some issue in OSD-MON communication. The first question is whether this "broken" OSD set is constant or whether it changes over time. Do any of these OSDs back the 'foo' PG? Igor On 11/26/2020 10:02 PM, Dan van der Ster wrote: There are a couple gaps, yes:

[ceph-users] Re: DB sizing for lots of large files

2020-11-26 Thread Christian Wuerdig
Sorry, I replied to the wrong email thread before, so reposting this: I think it's time to start pointing out that the 3/30/300 logic no longer really holds true post-Octopus: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/CKRCB3HUR7UDRLHQGC7XXZPWCWNJSBNT/ Although I suppose

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
Also wondering whether you have the same "gap" OSDs on the other cluster(s) that show stats improperly? On 11/26/2020 10:08 PM, Dan van der Ster wrote: Hey that's it! I stopped the up but out OSDs (100 and 177), and now the stats are correct! # ceph df RAW STORAGE: CLASS SIZE

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
Were they intentionally marked as 'out', or was this caused by something unknown? On 11/26/2020 10:08 PM, Dan van der Ster wrote: Hey that's it! I stopped the up but out OSDs (100 and 177), and now the stats are correct! # ceph df RAW STORAGE: CLASS SIZE AVAIL USED

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
Hey that's it! I stopped the up but out OSDs (100 and 177), and now the stats are correct! # ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 5.5 PiB 1.2 PiB 4.3 PiB 4.3 PiB 78.62 TOTAL 5.5 PiB 1.2 PiB

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
There are a couple gaps, yes: https://termbin.com/9mx1 What should I do? -- dan On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov wrote: > > Does "ceph osd df tree" show stats properly (I mean there are no evident > gaps like unexpected zero values) for all the daemons? > > > > 1. Anyway, I found

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
Does "ceph osd df tree" show stats properly (I mean there are no evident gaps like unexpected zero values) for all the daemons? 1. Anyway, I found something weird... I created a new 1-PG pool "foo" on a different cluster and wrote some data to it. The stored and used are equal. Thu 26 Nov

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
0. OK, not sure this is informative: 2020-11-26 19:40:01.726 7fd10d09b700 20 bluestore(/var/lib/ceph/osd/ceph-252) statfs store_statfs(0x1839137/0x227b0/0x7470280, data 0x5c0ef8b8393/0x5c14998, compress 0x0/0x0/0x0, omap 0x9163, meta 0x227af6e9d) 2020-11-26 19:40:02.078

[ceph-users] Re: replace osd with Octopus

2020-11-26 Thread Tony Liu
Hi, > > When replacing an osd, there will be no PG remapping, and backfill > > will restore the data on the new disk, right? > > That depends on how you decide to go through the replacement process. > Usually without your intervention (e.g. setting the appropriate OSD > flags) the remapping will

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
For a specific BlueStore instance you can obtain the relevant statfs output by setting debug_bluestore to 20 and leaving the OSD running for 5-10 seconds (or maybe a couple of minutes; I don't remember the exact statfs poll period). Then grep the osd log for "statfs" and/or "pool_statfs" and get the output formatted
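A sketch of that procedure for a single OSD (osd.252 from this thread; the log path is the default packaging layout and may differ):

  ceph tell osd.252 config set debug_bluestore 20
  sleep 120   # wait for at least one statfs poll
  grep statfs /var/log/ceph/ceph-osd.252.log | tail -n 5
  ceph tell osd.252 config set debug_bluestore '1/5'   # restore the default verbosity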

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
Here's a different cluster we upgraded luminous -> nautilus in October: 2020-10-14 13:22:51.860 7f78e3d20a80 0 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable), process ceph-osd, pid 119714 ... 2020-10-14 13:27:50.368 7f78e3d20a80 1

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
Hi Igor, No BLUESTORE_LEGACY_STATFS warning, and bluestore_warn_on_legacy_statfs is the default true on this (and all) clusters. I'm quite sure we did the statfs conversion during one of the recent upgrades (I forget which one exactly). # ceph tell osd.* config get

[ceph-users] Re: DB sizing for lots of large files

2020-11-26 Thread Eugen Block
Also, you don't need to create the volume groups and logical volumes for the OSDs; ceph-volume can do that for you and handles raw disks, too. Check out the man page for all the options: https://docs.ceph.com/en/latest/man/8/ceph-volume/ Quoting Richard Thornton: Hi, Sorry to bother
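A sketch of letting ceph-volume do the carving itself (device paths are placeholders; verify the plan with --report before running it for real):

  # dry run: show how the two HDDs and the NVMe DB device would be split
  ceph-volume lvm batch --report --bluestore /dev/sda /dev/sdb --db-devices /dev/nvme0n1
  # rerun without --report to actually create the OSDs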

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Igor Fedotov
Hi Dan, don't you have the BLUESTORE_LEGACY_STATFS alert raised (it might be silenced by the bluestore_warn_on_legacy_statfs param) on the older cluster? Thanks, Igor On 11/26/2020 7:29 PM, Dan van der Ster wrote: Hi, Depending on which cluster I look at (all running v14.2.11), the bytes_used is
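A quick way to check both the alert and the silencing option (a sketch; on older releases 'ceph config get' may only show values stored in the mon config database, so the per-daemon query is included as well):

  ceph health detail | grep -i statfs
  ceph config get osd bluestore_warn_on_legacy_statfs
  # or per daemon:
  ceph tell osd.* config get bluestore_warn_on_legacy_statfs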

[ceph-users] octopus: stalled I/O during recovery

2020-11-26 Thread Peter Lieven
Hi, I am currently evaluating ceph and stumbled across an odd issue when an osd comes back online. The osd was taken offline, but is still "in" and is brought back online before it is marked "out". As a test I run a fio job with 4k rand I/O on a 10G rbd volume during the OSD down and up
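For context, a sketch of the kind of fio job described, using fio's librbd engine (pool, image, and client names are placeholders):

  fio --name=rand4k --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test10g \
      --rw=randrw --bs=4k --iodepth=32 --time_based --runtime=600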

[ceph-users] ceph df: pool stored vs bytes_used -- raw or not?

2020-11-26 Thread Dan van der Ster
Hi, Depending on which cluster I look at (all running v14.2.11), the bytes_used is reporting raw space or stored bytes variably. Here's a 7 year old cluster: # ceph df -f json | jq .pools[0] { "name": "volumes", "id": 4, "stats": { "stored": 1229308190855881, "objects": 294401604,
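One way to compare the two fields across pools (a sketch; the stats field names are as shown in the JSON above):

  # name, stored, bytes_used per pool; the ratio should roughly match the replica/EC overhead
  ceph df -f json | jq -r '.pools[] | [.name, .stats.stored, .stats.bytes_used] | @tsv'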

[ceph-users] Re: DB sizing for lots of large files

2020-11-26 Thread Burkhard Linke
Hi, On 11/26/20 12:45 PM, Richard Thornton wrote: Hi, Sorry to bother you all. It’s a home server setup. Three nodes (ODROID-H2+ with 32GB RAM and dual 2.5Gbit NICs), two 14TB 7200rpm SATA drives and an Optane 118GB NVMe in each node (OS boots from eMMC). *snipsnap* Is there a rough

[ceph-users] Re: Public Swift yielding errors since 14.2.12

2020-11-26 Thread Vladimir Sigunov
Hi Jukka, In my case, public Swift buckets are working as expected for RGW Nautilus 14.2.12-14.2.14 with OpenStack Rocky. However, Octopus 15.2.5, which should have this fix according to the change log, still fails. Do you have anything interesting in the rgw debug log (debug rgw = 20) or in keystone
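For reference, a sketch of raising the RGW log level for one gateway (the instance and unit names are placeholders following the standard packaging layout):

  # ceph.conf on the radosgw host
  [client.rgw.gateway1]
      debug rgw = 20
  # then restart the gateway to pick it up
  systemctl restart ceph-radosgw@rgw.gateway1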

[ceph-users] Re: replace osd with Octopus

2020-11-26 Thread Eugen Block
Hi, Thank you Eugen for pointing it out. Yes, OSDs are deployed by cephadm with drive_group. It seems that the orch module simplifies the process to make it easier for users. Yes, it does; you can also manage your OSDs via the dashboard, which makes it easier for some users. When
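For the cephadm-managed case, a sketch of the replace flow (OSD id 1 is a placeholder; assumes an OSD service spec / drive_group that will pick up the new disk):

  # drain and remove the OSD but keep its id reserved for the replacement
  ceph orch osd rm 1 --replace
  ceph orch osd rm status        # watch the draining progress
  # after the physical disk swap, the existing spec redeploys an OSD reusing id 1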

[ceph-users] snap permission denied

2020-11-26 Thread vcjouni
Hi, I have k8s cephfs-provisioner.yaml and storageclass.yaml:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
  namespace: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: 10.32.121.51:6789,10.32.121.52:6789,10.32.121.53:6789
  adminId: admin

[ceph-users] DB sizing for lots of large files

2020-11-26 Thread Richard Thornton
Hi, Sorry to bother you all. It’s a home server setup. Three nodes (ODROID-H2+ with 32GB RAM and dual 2.5Gbit NICs), two 14TB 7200rpm SATA drives and an Optane 118GB NVMe in each node (OS boots from eMMC). Only CephFS, I'm anticipating having 50-200K files when the 50TB (4+2 EC) is full. I'm

[ceph-users] Re: [Suspicious newsletter] Re: Unable to reshard bucket

2020-11-26 Thread Szabo, Istvan (Agoda)
Hi Eric, Thank you for picking up my question. Correct me if I'm wrong, please, regarding sharding and indexes. The flow: when the user puts an object into the cluster, it will create 1 object in the index pool that will hold, let's say, the location of the file in the data pool. 1 index entry is for 1
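A sketch of how to inspect and fix the shard situation for a bucket (bucket name and target shard count are placeholders):

  # per-bucket shard fill status (num_objects, num_shards, objects_per_shard)
  radosgw-admin bucket limit check
  # queue and run a manual reshard for an over-full bucket
  radosgw-admin reshard add --bucket=mybucket --num-shards=101
  radosgw-admin reshard process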

[ceph-users] Re: high memory usage in osd_pglog

2020-11-26 Thread Kalle Happonen
Hi Robert, This sounds very much like a big problem we had 2 weeks back. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EWPPEMPAJQT6GGYSHM7GIM3BZWS2PSUY/ Are you running EC? Which version are you running? It would fit our narrative if you use EC and recently updated to 14.2.11+
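To see how much memory the pg log actually accounts for on an affected OSD, a sketch using the mempool dump (run on the OSD host; osd.0 is a placeholder, and the JSON layout is as in recent Nautilus builds):

  # items and bytes currently held by the osd_pglog mempool
  ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'
  # the per-PG log length is bounded by these options
  ceph config get osd osd_min_pg_log_entries
  ceph config get osd osd_max_pg_log_entries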