Re: [ceph-users] ceph-osd not starting after network related issues

2019-07-03 Thread Ian Coetzee
Hi All, Some feedback on my end. I managed to recover the "lost data" from one of the other OSDs. It seems my initial summary was a bit off, in that the PGs were replicated and Ceph just wanted to confirm that the objects were still relevant. For future reference, I basically marked the OSD as

[ceph-users] pgs not deep-scrubbed in time

2019-07-03 Thread Alexander Walker
Hi, My cluster has been showing me this message for the last two weeks. Ceph version (ceph -v): root@heku1 ~ # ceph -v ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) All pgs are active+clean: root@heku1 ~ # ceph -s   cluster:     id: 0839c91a-f3ca-4119-853b-eb10904cf322   
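For reference, a quick way to see which PGs are overdue and to kick one off by hand (the pgid below is just an example, not one from this cluster):

    ceph health detail | grep 'not deep-scrubbed'   # lists the affected PGs
    ceph pg deep-scrub 2.1f                         # manually deep-scrub one of them (example pgid)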

Re: [ceph-users] troubleshooting space usage

2019-07-03 Thread Andrei Mikhailovsky
Hi Igor, Many thanks for your reply. Here are the details about the cluster: 1. Ceph version - 13.2.5-1xenial (installed from the Ceph repository for Ubuntu 16.04) 2. Main devices for the radosgw pool - HDD; we do use a few SSDs for the other pool, but it is not used by radosgw 3. we use

Re: [ceph-users] details about cloning objects using librados

2019-07-03 Thread nokia ceph
Hi Greg, Can you please share the API details for COPY_FROM or any reference document? Thanks, Muthu On Wed, Jul 3, 2019 at 4:12 AM Brad Hubbard wrote: > On Wed, Jul 3, 2019 at 4:25 AM Gregory Farnum wrote: > > > > I'm not sure how or why you'd get an object class involved in doing > >

[ceph-users] slow requests due to scrubbing of very small pg

2019-07-03 Thread Luk
Hello, I have a strange problem with scrubbing. When scrubbing starts on a PG which belongs to the default.rgw.buckets.index pool, I can see that this OSD is very busy (see attachment) and starts showing many slow requests; after the scrubbing of this PG stops, the slow requests stop immediately.
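A sketch of how one might confirm what is blocked on that OSD and throttle scrubbing while investigating (osd.12 and the sleep value are placeholders, not values from this thread):

    ceph daemon osd.12 dump_blocked_ops                    # show requests currently blocked on this OSD
    ceph tell osd.12 injectargs '--osd_scrub_sleep 0.1'    # slow scrubbing down at runtime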

[ceph-users] cephfs size

2019-07-03 Thread ST Wong (ITSC)
Hi, Mounted a CephFS through the kernel module or FUSE. Both work, except that when we do a "df -h", the "Avail" value shown is the MAX AVAIL of the data pool in "ceph df". I'm expecting it to match the max_bytes quota of the data pool. An RBD mount doesn't show similar behaviour. Is this normal? Thanks
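For what it's worth, if a quota is set on the mounted directory, df reports that quota instead of the pool's MAX AVAIL; a minimal sketch, assuming ceph-fuse or a recent kernel client and a placeholder path/size:

    setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/mydir   # 100 GiB quota on the directory
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/mydir                   # verify it
    df -h /mnt/cephfs/mydir                                              # Size/Avail now follow the quota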

Re: [ceph-users] Nautilus - cephfs auth caps problem?

2019-07-03 Thread Paul Emmerich
Your cephfs was probably created with a buggy version that didn't set the metadata tags on the data pools correctly. IIRC there still isn't any automated migration of old broken pools. See https://github.com/ceph/ceph/pull/24125 Fix: ceph osd pool application set cephfs data cephfs_data2 Paul
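The general shape of the command, with placeholder pool and filesystem names (adjust to your own setup):

    ceph osd pool application set <data-pool> cephfs data <fs-name>   # tag the data pool for this filesystem
    ceph osd pool application get <data-pool>                         # verify the tag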

Re: [ceph-users] troubleshooting space usage

2019-07-03 Thread Igor Fedotov
Hi Andrei, Additionally I'd like to see performance counters dump for a couple of HDD OSDs (obtained through 'ceph daemon osd.N perf dump' command). W.r.t average object size - I was thinking that you might know what objects had been uploaded... If not then you might want to estimate it by
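A minimal way to pull just the BlueStore space counters from one OSD's admin socket (osd.12 is a placeholder; run it on the host where that OSD lives):

    ceph daemon osd.12 perf dump | grep -E 'bluestore_(allocated|stored|compressed)'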

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Janne Johansson
On Wed, 3 Jul 2019 at 05:41, Bryan Henderson wrote: > I may need to modify the above, though, now that I know how Ceph works, > because I've seen storage server products that use Ceph inside. However, I'll > bet the people who buy those are not aware that it's designed never to go down >

Re: [ceph-users] troubleshooting space usage

2019-07-03 Thread Igor Fedotov
Looks fine - comparing bluestore_allocated vs. bluestore_stored shows only a small difference. So that's not the allocation overhead. What about comparing object counts reported by the ceph and radosgw tools? Igor. On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote: Thanks Igor, Here is a link to
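One way to compare the counts, assuming the bucket data lives in the .rgw.buckets pool (pool name is a placeholder here):

    ceph df detail | grep rgw.buckets               # per-pool OBJECTS as seen by ceph
    rados ls -p .rgw.buckets | wc -l                # raw object count in the pool
    radosgw-admin bucket stats | grep num_objects   # per-bucket counts as seen by radosgw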

Re: [ceph-users] slow requests due to scrubbing of very small pg

2019-07-03 Thread Janne Johansson
On Wed, 3 Jul 2019 at 09:01, Luk wrote: > Hello, > > I have strange problem with scrubbing. > > When scrubbing starts on PG which belong to default.rgw.buckets.index > pool, I can see that this OSD is very busy (see attachment), and starts > showing many > slow request, after the

Re: [ceph-users] ceph-osd not starting after network related issues

2019-07-03 Thread Paul Emmerich
For anyone reading this in the future from a google search: please don't set osd_find_best_info_ignore_history_les unless you know exactly what you are doing. That's a really dangerous option and should be a last resort. It will almost definitely lead to some data loss or inconsistencies (lost

Re: [ceph-users] pgs not deep-scrubbed in time

2019-07-03 Thread Paul Emmerich
Are you running with auto repair enabled? There's a bug that sometimes resets the scrub timestamps to 0 in this configuration. Paul
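To check whether that option is actually enabled on an OSD (osd.0 is a placeholder; the daemon command must be run on the OSD's host):

    ceph daemon osd.0 config get osd_scrub_auto_repair
    ceph config dump | grep osd_scrub_auto_repair      # if the centralized config is in use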

Re: [ceph-users] Cinder pool inaccessible after Nautilus upgrade

2019-07-03 Thread Adrien Georget
Hi, With --debug-objecter=20, I found that the rados ls command hangs, looping on laggy messages: 2019-07-03 13:33:24.913 7efc402f5700 10 client.21363886.objecter _op_submit op 0x7efc3800dc10 2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter _calc_target epoch 13146

[ceph-users] Octopus release target: March 1 2020

2019-07-03 Thread Sage Weil
Hi everyone, The target release date for Octopus is March 1, 2020. The freeze will be January 1, 2020. As a practical matter, that means any features need to be in before people leave for the holidays, ensuring the features get in in time and also that we can run tests over the holidays

Re: [ceph-users] 3 corrupted OSDs

2019-07-03 Thread Paul Emmerich
Hi, RocksDB in BlueStore should be opened with ceph-kvstore-tool using the "bluestore-kv" store type, instead of just "rocksdb", which is for RocksDB on a plain file system. Paul
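For example, with the OSD stopped first (the OSD id and path are placeholders):

    systemctl stop ceph-osd@80
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list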

Re: [ceph-users] slow requests due to scrubbing of very small pg

2019-07-03 Thread Paul Emmerich
On Wed, Jul 3, 2019 at 4:47 PM Luk wrote: > > > this pool is that 'big' : > > [root@ceph-mon-01 ~]# rados df | grep -e index -e WR > POOL_NAME USED OBJECTS CLONES COPIES > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR > > default.rgw.buckets.index

Re: [ceph-users] troubleshooting space usage

2019-07-03 Thread Andrei Mikhailovsky
Hi Igor. The numbers are identical it seems: .rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934 # cat /root/ceph-rgw.buckets-rados-ls-all |wc -l 8786934 Cheers > From: "Igor Fedotov" > To: "andrei" > Cc: "ceph-users" > Sent: Wednesday, 3 July, 2019 13:49:02 > Subject: Re: [ceph-users]

Re: [ceph-users] troubleshooting space usage

2019-07-03 Thread Andrei Mikhailovsky
Thanks Igor, Here is a link to the ceph perf data on several OSDs: https://paste.ee/p/IzDMy In terms of the object sizes, we use rgw to back up data from various workstations and servers, so the sizes would range from a few KB to a few GB per individual file. Cheers > From: "Igor

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Sage Weil
On Sun, 30 Jun 2019, Bryan Henderson wrote: > > I'm not sure why the monitor did not mark it _out_ after 600 seconds > > (default) > > Well, that part I understand. The monitor didn't mark the OSD out because the > monitor still considered the OSD up. No reason to mark an up OSD out. > > I
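The settings involved, with their usual defaults (worth double-checking against your own release), are roughly:

    osd_heartbeat_grace = 20          # peers report an OSD down after this many seconds of missed heartbeats
    mon_osd_report_timeout = 900      # mons mark an OSD down themselves if it stops reporting for this long
    mon_osd_down_out_interval = 600   # a down OSD is marked out after this many seconds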

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Janne Johansson
On Wed, 3 Jul 2019 at 20:51, Austin Workman wrote: > > But a very strange number shows up in the active sections of the pg's > that's the same number roughly as 2147483648. This seems very odd, > and maybe the value got lodged somewhere it doesn't belong which is causing > an issue. > >

Re: [ceph-users] details about cloning objects using librados

2019-07-03 Thread Gregory Farnum
Well, the RADOS interface doesn't have a great deal of documentation so I don't know if I can point you at much. But if you look at Objecter.h, you see that the ObjectOperation has this function: void copy_from(object_t src, snapid_t snapid, object_locator_t src_oloc, version_t src_version,

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Bryan Henderson
> I'm a bit confused about what happened here, though: that 600 second > interval is only important if *every* OSD in the system is down. If you > reboot the data center, why didn't *any* OSD daemons start? (And even if > none did, having the ceph -s report all OSDs down instead of up isn't

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
That makes more sense. Setting min_size = 4 on the EC pool allows data to flow again (kind of, not really, because of the 22 other PGs that are still missing). Maybe this was automatically raised to 5 when I adjusted the EC pool originally? That is, outside of the 21 unknown and 1 down PGs, which are probably depending on
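For reference, the knob being discussed (pool name is a placeholder; lowering min_size reduces the safety margin, so treat it as a temporary measure):

    ceph osd pool get <ec-pool> min_size     # current value
    ceph osd pool set <ec-pool> min_size 4   # what was done above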

[ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
So several events unfolded that may have led to this situation. Some of them, in hindsight, were probably not the smartest decisions around adjusting the EC pool and restarting the OSDs several times during these migrations. 1. Added a new 6th OSD with ceph-ansible 1. Hung during restart

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
Something very curious is that I was adjusting the configuration for osd memory target via ceph-ansible and had at one point set 2147483648, which is exactly 2 GiB. Currently it's set to 1610612736, but strangely in the config file it wrote 1963336226. But a very strange number shows up in the
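To see what value each OSD actually picked up at runtime (osd.0 is a placeholder; run against each OSD's admin socket):

    ceph daemon osd.0 config get osd_memory_target
    ceph config dump | grep osd_memory_target      # centralized config, if in use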

[ceph-users] 3 OSDs stopped and unable to restart

2019-07-03 Thread Brett Chancellor
Hi All! Today I've had 3 OSDs stop themselves, and they are unable to restart, all with the same error. These OSDs are all on different hosts. All are running 14.2.1. I did try the following two commands - ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys ## This failed with the

Re: [ceph-users] Nautilus - cephfs auth caps problem?

2019-07-03 Thread Nigel Williams
Thanks for the tip, I did wonder about that, checked it at one point, and assumed it was OK. root@cnx-11:~# ceph osd pool application get cephfs_data { "cephfs": { "data": "cephfs" } } root@cnx-11:~# ceph osd pool application get cephfs_data2 { "cephfs": {

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
After some creative PG surgery, everything is coming back online cleanly. I went through them one at a time (80-90 PGs) on the least-filled OSD (new osd.5) and export-remove'd each PG that was causing the assertion failures after testing starting the OSD. # tail -f /var/log/ceph/ceph-osd.5.log | grep -A1
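The kind of command used for that surgery, sketched with placeholder ids (the OSD must be stopped, and keep the exported copy in case the PG is needed later):

    systemctl stop ceph-osd@5
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --pgid 7.1a --op export-remove --file /root/pg-7.1a.export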