Re: [ceph-users] Cannot delete a pool

2018-03-01 Thread Eugen Block
…nl/2015/04/protecting-your-ceph-pools-against-removal-or-property-changes/ Kind regards, Ronny Aasen …
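
The linked post covers guarding pools against accidental removal; a minimal sketch of toggling that guard and deleting a pool (pool name hypothetical, flag as documented for Luminous and later):

  # allow pool deletion temporarily, then re-enable the guard
  ceph tell mon.* injectargs '--mon-allow-pool-delete=true'
  ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
  ceph tell mon.* injectargs '--mon-allow-pool-delete=false'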

[ceph-users] Scrubbing for RocksDB

2018-04-09 Thread Eugen Block
…ng like a journal-scrub, maybe. Has someone experienced similar issues and can shed some light on this? Any insights would be very helpful. Regards, Eugen [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024913.html

Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2018-04-10 Thread Eugen Block
…o much if you want to deploy a separate block.db from a bluestore made without block.db. Kind regards, Ronny Aasen …

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Eugen Block
Hi, I'm not sure if this is deprecated or something, but I usually have to execute an additional "ceph auth del <osd-id>" before recreating an OSD. Otherwise the OSD fails to start. Maybe this is a missing step. Regards, Eugen Quoting Gary Molenkamp: Good morning all, last week I started co…
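
A minimal sketch of the extra step described above, assuming a hypothetical OSD id 12 and device /dev/sdc:

  ceph auth del osd.12                                 # drop the stale cephx key
  ceph-volume lvm create --bluestore --data /dev/sdc   # then recreate the OSD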

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Eugen Block
…enkamp: Thanks Eugen. The OSDs will start immediately after completing the "ceph-volume prepare", but they won't start on a clean reboot. It seems that the "prepare" is mounting the /var/lib/ceph/osd/ceph-osdX path/structure, but this is missing now in my boot process. Ga…
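
The usual fix for OSDs that were prepared but don't come back after a reboot is activation, which sets up the systemd units and tmpfs mounts; a sketch (id/fsid placeholders hypothetical):

  ceph-volume lvm activate --all
  # or a single OSD:
  ceph-volume lvm activate <osd-id> <osd-fsid>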

Re: [ceph-users] Deleting an rbd image hangs

2018-05-08 Thread Eugen Block
Hi, I have a similar issue and would also need some advice on how to get rid of the already deleted files. Ceph is our OpenStack backend and there was a nova clone without parent information. Apparently, the base image had been deleted without a warning or anything, although there were existi…

Re: [ceph-users] Ceph - Xen accessing RBDs through libvirt

2018-05-22 Thread Eugen Block
Hi, So "somthing" goes wrong: # cat /var/log/libvirt/libxl/libxl-driver.log -> ... 2018-05-20 15:28:15.270+: libxl: libxl_bootloader.c:634:bootloader_finished: bootloader failed - consult logfile /var/log/xen/bootloader.7.log 2018-05-20 15:28:15.270+: libxl: libxl_exec.c:118:libxl_repor

[ceph-users] Different disk sizes after Luminous upgrade 12.2.2 --> 12.2.5

2018-05-25 Thread Eugen Block
Hi list, we have a Luminous bluestore cluster with separate block.db/block.wal on SSDs. We were running version 12.2.2 and upgraded yesterday to 12.2.5. The upgrade went smoothly, but since the restart of the OSDs I noticed that 'ceph osd df' shows a different total disk size: ---cut here…

Re: [ceph-users] Different disk sizes after Luminous upgrade 12.2.2 --> 12.2.5

2018-05-25 Thread Eugen Block
On 5/25/2018 2:22 PM, Eugen Block wrote: Hi list, we have a Luminous bluestore cluster with separate block.db/block.wal on SSDs. We were running version 12.2.2 and upgraded yesterday to 12.2.5. The upgrade went smoothly, but since the restart of the OSDs I noticed that 'ceph osd df'…

Re: [ceph-users] Mimic EPERM doing rm pool

2018-05-29 Thread Eugen Block
Hi, [root@n1 ~]# ceph osd pool rm mytestpool mytestpool --yes-i-really-mean-it Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored. If the command you posted is complete, then you forgot one "really" in the --yes-i-really-really-mean-it option. Regards Quoting Steffen W…
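
The complete command as suggested in the reply (pool name taken from the quoted post):

  ceph osd pool rm mytestpool mytestpool --yes-i-really-really-mean-it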

Re: [ceph-users] Adding SSD-backed DB & WAL to existing HDD OSD

2018-07-03 Thread Eugen Block
Hi, we had to recreate some block.db's for some OSDs just a couple of weeks ago because our existing journal SSD had failed. This way we avoided rebalancing the whole cluster; just the OSD had to be filled up. Maybe this will help you too: http://heiterbiswolkig.blogs.nde.ag/2018/04/08/r…

[ceph-users] slow requests break performance

2017-01-11 Thread Eugen Block
…the other nodes? How do you deal with these slow requests? Thanks for any help! Regards, Eugen

Re: [ceph-users] slow requests break performance

2017-01-11 Thread Eugen Block
…

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Eugen Block
8", "age": 112.118210, "duration": 26.452526, They also contain many "waiting for rw locks" messages, but not as much as the dump from the reporting OSD. To me it seems that because two OSDs take a lot of time to process their requests (on

Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block
mpty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 577, "hit_set_history": { "current_last_update": "0'0", "history": []

Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block
…didn't have these inconsistencies since we increased the size to 3. Quoting Mio Vlahović: Hello, Eugen Block wrote: I had a similar issue recently, where I had a replication size of 2 (I changed that to 3 after th…

Re: [ceph-users] 1 pgs inconsistent 2 scrub errors

2017-01-26 Thread Eugen Block
Glad I could help! :-) Quoting Mio Vlahović: Eugen Block wrote: From what I understand, with a rep size of 2 the cluster can't decide which object is intact if one is broken, so the repair fails. If you had a size of 3, the cluster would see 2 intact objec…
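
A sketch of the remedy discussed in this thread (pool name and pgid hypothetical):

  ceph osd pool set rbd size 3   # three replicas let the cluster form a majority
  ceph pg repair 2.1f            # then repair the inconsistent PG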

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Eugen Block
…may help you nail it. I suspect, though, that it may come down to enabling debug logging and tracking a slow request through the logs. On Thu, Jan 12, 2017 at 8:41 PM, Eugen Block wrote: Hi, looking at the output of dump_historic_ops and dump_ops_in_flight, I waited for new slow request messa…
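
The admin-socket dumps referenced above are queried on the OSD's host; a sketch with a hypothetical OSD id:

  ceph daemon osd.1 dump_ops_in_flight
  ceph daemon osd.1 dump_historic_ops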

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Eugen Block
…scrubs in the evening, to avoid performance impact during working hours. Please let me know if I missed anything. I really appreciate you looking into this. Regards, Eugen Quoting Christian Balzer: Hello, on Wed, 01 Feb 2017 11:43:02 +0100 Eugen Block wrote: Hi, I haven't t…

Re: [ceph-users] slow requests break performance

2017-02-02 Thread Eugen Block
…lot. Eugen Quoting Christian Balzer: Hello, on Wed, 01 Feb 2017 15:16:15 +0100 Eugen Block wrote: > You've told us absolutely nothing about your cluster. You're right, I'll try to provide as much information as possible. Please note that we have kind of a "special"…

Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-10 Thread Eugen Block
…

[ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Eugen Block
…ut a full recovery. Or should I have deleted that PG instead of re-activating old OSDs? I'm not sure what the best practice would be in this case. Any help is appreciated! Regards, Eugen

Re: [ceph-users] 1 PG stuck unclean (active+remapped) after OSD replacement

2017-02-13 Thread Eugen Block
Mon, Feb 13, 2017 at 7:05 AM, Wido den Hollander wrote: > On 13 February 2017 at 16:03, Eugen Block wrote: > > Hi experts, > > I have a strange situation right now. We are re-organizing our 4-node > Hammer cluster from LVM-based OSDs to HDDs. When we did this on the …

Re: [ceph-users] Available tools for deploying ceph cluster as a backend storage ?

2017-05-18 Thread Eugen Block
…easier to use and give production deployment. Thanks, Shambhu Rajak

Re: [ceph-users] Openstack ceph - non bootable volumes

2018-12-19 Thread Eugen Block
Hi, can you explain in more detail what exactly goes wrong? In many cases it's an authentication error. Can you check if your specified user is allowed to create volumes in the respective pool? You could try something like this (from the compute node): rbd --user <user> -k /etc/ceph/ceph.client.OPENS…
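
A sketch of such an authentication check, assuming the usual OpenStack naming (client "cinder", pool "volumes"; both hypothetical here):

  rbd --id cinder --keyring /etc/ceph/ceph.client.cinder.keyring -p volumes ls
  ceph auth get client.cinder   # inspect the client's caps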

Re: [ceph-users] Openstack ceph - non bootable volumes

2018-12-20 Thread Eugen Block
…6-a36a-587cc50ed1ff volume-baa6c928-8ac1-4240-b189-32b444b434a3 volume-c23a69dc-d043-45f7-970d-1eec2ccb10cc volume-f1872ae6-48e3-4a62-9f46-bf157f079e7f On Wed, 19 Dec 2018 at 09:25, Eugen Block wrote: Hi, can you explain in more detail what exactly goes wrong? In many cases it's an auth…

[ceph-users] Clarification of mon osd communication

2019-01-10 Thread Eugen Block
Hello list, there are two config options for mon/osd interaction that I don't fully understand. Maybe one of you could clarify them for me. mon osd report timeout - the grace period in seconds before declaring unresponsive Ceph OSD daemons down (default 900). mon osd down out interval - the…

[ceph-users] Clarification of communication between mon and osd

2019-01-14 Thread Eugen Block
Hello list, I noticed my last post was displayed as a reply to a different thread, so I re-send my question; please excuse the noise. There are two config options for mon/osd interaction that I don't fully understand. Maybe one of you could clarify them for me. mon osd report timeout - the…

Re: [ceph-users] Clarification of communication between mon and osd

2019-01-14 Thread Eugen Block
…On Mon, Jan 14, 2019 at 10:17 AM Eugen Block wrote: Hello list, I noticed my last post was displayed as a reply to a different thread, so I re-send…

Re: [ceph-users] block.db on a LV? (Re: Mixed SSD+HDD OSD setup recommendation)

2019-01-18 Thread Eugen Block
Hi Jan, I think you're running into an issue that has been reported a couple of times. For the use of LVM you have to specify the names of the volume group and the respective logical volume instead of the path, e.g. ceph-volume lvm prepare --bluestore --block.db ssd_vg/ssd00 --data /dev/sda Regards, Eugen
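
Side by side, the failing and the working invocation from this thread (VG/LV names as quoted):

  # fails: path notation for the db LV
  ceph-volume lvm prepare --bluestore --block.db /dev/ssd_vg/ssd00 --data /dev/sda
  # works: vg/lv notation
  ceph-volume lvm prepare --bluestore --block.db ssd_vg/ssd00 --data /dev/sda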

Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools

2019-01-22 Thread Eugen Block
Hi Thomas, "What is the best practice for creating pools & images? Should I create multiple pools, meaning one pool per database? Or should I create a single pool 'backup' and use a namespace when writing data in the pool?" I don't think one pool per DB is reasonable. If the number of DBs increase…

Re: [ceph-users] The OSD can be “down” but still “in”.

2019-01-23 Thread Eugen Block
Hi, "If the OSD represents the primary one for a PG, then all IO will be stopped, which may lead to application failure." No, that's not how it works. You have an acting set of OSDs for a PG, typically 3 OSDs in a replicated pool. If the primary OSD goes down, the secondary becomes the prim…
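
The acting set mentioned above can be inspected per PG; a sketch with a hypothetical pgid (the epoch and OSD ids in the output are illustrative):

  ceph pg map 2.5
  # -> osdmap e123 pg 2.5 (2.5) -> up [3,1,4] acting [3,1,4]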

Re: [ceph-users] Creating a block device user with restricted access to image

2019-01-25 Thread Eugen Block
Hi, I replied to your thread a couple of days ago, maybe you didn't notice: restricting user access is possible on the rbd image level. You can grant read/write access for one client and only read access for other clients; you have to create different clients for that, see [1] for more details…

Re: [ceph-users] Creating a block device user with restricted access to image

2019-01-25 Thread Eugen Block
…:ImageState: 0x5643668a5700 failed to open image: (1) Operation not permitted rbd: error opening image isa: (1) Operation not permitted In some cases useful info is found in syslog - try "dmesg | tail". rbd: map failed: (1) Operation not permitted Regards, Thomas On 25.01.2019 at 1…

Re: [ceph-users] Creating a block device user with restricted access to image

2019-01-25 Thread Eugen Block
You can check all objects of that pool to see if your caps match: rados -p backup ls | grep rbd_id Quoting Eugen Block: caps osd = "allow pool backup object_prefix rbd_data.18102d6b8b4567; allow rwx pool backup object_prefix rbd_header.18102d6b8b4567; allow rx pool backup object_p…
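
A sketch of creating a client restricted to one image via object prefixes, mirroring the caps quoted above (client name hypothetical):

  ceph auth get-or-create client.restricted mon 'allow r' \
    osd 'allow rwx pool backup object_prefix rbd_data.18102d6b8b4567; allow rwx pool backup object_prefix rbd_header.18102d6b8b4567; allow rx pool backup object_prefix rbd_id'
  rados -p backup ls | grep rbd_id   # verify the prefixes actually match objects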

[ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
Hi list, I found this thread [1] about crashing SSD OSDs; although that was about an upgrade to 12.2.7, we just hit (probably) the same issue after our update to 12.2.10 two days ago in a production cluster. Just half an hour ago I saw one OSD (SSD) crashing (for the first time): 2019-02-07…

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
…t earlier are still present in the DB. And the assertion might still happen (hopefully with less frequency). So could you please run fsck for OSDs that were broken once and share the results? Then we can decide if it makes sense to proceed with the repair. Thanks, Igor On 2/7/2019 3:37 PM, Eu…

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-02-07 Thread Eugen Block
…properly. We'll see; let's get the fsck report first. W.r.t. running ceph-bluestore-tool, you might want to specify a log file and increase the log level to 20 using the --log-file and --log-level options. On 2/7/2019 4:45 PM, Eugen Block wrote: Hi Igor, thanks for the quick response! Just to m…
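
The fsck invocation Igor asks for, with the logging options he mentions (OSD path and log file hypothetical; the OSD must be stopped first):

  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12 \
    --log-file /tmp/fsck.log --log-level 20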

Re: [ceph-users] Luminous to Mimic: MON upgrade requires "full luminous scrub". What is that?

2019-02-07 Thread Eugen Block
Hi, could it be a missing 'ceph osd require-osd-release luminous' on your cluster? When I check a Luminous cluster I get this: host1:~ # ceph osd dump | grep recovery -> flags sortbitwise,recovery_deletes,purged_snapdirs The flags in the code you quote seem related to that. Can you check that out…

Re: [ceph-users] best practices for EC pools

2019-02-07 Thread Eugen Block
Hi Francois, "Is it correct that recovery will be forbidden by the crush rule if a node is down?" Yes, that is correct: failure-domain=host means no two chunks of the same PG can be on the same host. So if your PG is divided into 6 chunks, they're all on different hosts, and no recovery is po…
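
A sketch of a profile that yields the 6 chunks described above (assuming k=4, m=2; profile and pool names hypothetical):

  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd pool create ecpool 128 128 erasure ec42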

Re: [ceph-users] will crush rule be used during object relocation in OSD failure ?

2019-02-12 Thread Eugen Block
Hi, I came to the same conclusion after doing various tests with rooms and failure domains. I agree with Maged and suggest using size=4, min_size=2 for replicated pools. It's more overhead, but you can survive the loss of one room and even one more OSD (of the affected PG) without losing…
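
The suggested settings as commands (pool name hypothetical):

  ceph osd pool set mypool size 4
  ceph osd pool set mypool min_size 2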

Re: [ceph-users] will crush rule be used during object relocation in OSD failure ?

2019-02-12 Thread Eugen Block
…ction CEPH... Regards, /st …

[ceph-users] How to control automatic deep-scrubs

2019-02-13 Thread Eugen Block
Hi cephers, I'm struggling a little with the deep-scrubs. I know this has been discussed multiple times (e.g. in [1]) and we also use a known crontab script in a Luminous cluster (12.2.10) to start the deep-scrubbing manually (a quarter of all PGs 4 times a week). The script works just fi…
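
A minimal sketch of such a cron-driven manual deep-scrub; the "every 4th PG" selection is an assumption, not the script from the thread:

  ceph pg dump pgs_brief 2>/dev/null \
    | awk '/^[0-9]+\./ {print $1}' \
    | awk 'NR % 4 == 0' \
    | while read -r pg; do ceph pg deep-scrub "$pg"; done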

Re: [ceph-users] How to control automatic deep-scrubs

2019-02-13 Thread Eugen Block
Thank you, Konstantin, I'll give that a try. Do you have any comment on osd_deep_mon_scrub_interval? Eugen Quoting Konstantin Shalygin: The expectation was to prevent the automatic deep-scrubs, but they are started anyway. You can disable deep-scrubs per pool via `ceph osd pool set node…

Re: [ceph-users] How to control automatic deep-scrubs

2019-02-13 Thread Eugen Block
"My Ceph Luminous doesn't know anything about this option: # ceph daemon osd.7 config help osd_deep_mon_scrub_interval { "error": "Setting not found: 'osd_deep_mon_scrub_interval'" }" Exactly, it's also not available in a Mimic test cluster. But it's mentioned in the docs for L and M (I didn'…

Re: [ceph-users] How to control automatic deep-scrubs

2019-02-13 Thread Eugen Block
…2:16 PM, Eugen Block wrote: Exactly, it's also not available in a Mimic test cluster. But it's mentioned in the docs for L and M (I didn't check the docs for other releases), that's what I was wondering about. Can you provide ur…

Re: [ceph-users] How to control automatic deep-scrubs

2019-02-13 Thread Eugen Block
I created http://tracker.ceph.com/issues/38310 for this. Regards, Eugen Quoting Konstantin Shalygin: On 2/14/19 2:21 PM, Eugen Block wrote: Already did, but now with highlighting ;-) http://docs.ceph.com/docs/luminous/rados/operations/health-checks/?highlight=osd_deep_mon_scrub_interval…

Re: [ceph-users] Ceph Nautilus Release T-shirt Design

2019-02-15 Thread Eugen Block
I have no issues opening that site from Germany. Quoting Dan van der Ster: On Fri, Feb 15, 2019 at 11:40 AM Willem Jan Withagen wrote: On 15/02/2019 10:39, Ilya Dryomov wrote: > On Fri, Feb 15, 2019 at 12:05 AM Mike Perez wrote: >> Hi Marc, >> you can see previous designs on the C…

Re: [ceph-users] Placing replaced disks to correct buckets.

2019-02-18 Thread Eugen Block
Hi, we skipped stage 1 and replaced the UUIDs of the old disks with the new ones in the policy.cfg. We ran salt '*' pillar.items and confirmed that the output was correct; it showed the new UUIDs in the correct places. Next we ran salt-run state.orch ceph.stage.3. PS: All of the above ran successfull…

Re: [ceph-users] min_size vs. K in erasure coded pools

2019-02-20 Thread Eugen Block
Hi, I see that as a security feature ;-) You can prevent data loss if k chunks are intact, but you don't want to work with the least required amount of chunks. In a disaster scenario you can reduce min_size to k temporarily, but the main goal should always be to get the OSDs back up. For ex…
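
A sketch of the temporary disaster-mode change described above (pool name hypothetical, k=4 assumed):

  ceph osd pool set ecpool min_size 4   # = k; revert to k+1 once the OSDs are back up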

Re: [ceph-users] ceph migration

2019-02-25 Thread Eugen Block
I just moved a (virtual lab) cluster to a different network; it worked like a charm. In an offline method you need to: - set osd noout, ensure there are no OSDs up - change the MONs' IPs, see the bottom of [1] "CHANGING A MONITOR'S IP ADDRESS"; MONs are the only ones really sticky with the…
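
A sketch of the monmap edit referenced in [1] (MON name and new IP hypothetical; all MONs stopped while injecting):

  ceph osd set noout
  ceph-mon -i mon1 --extract-monmap /tmp/monmap
  monmaptool --rm mon1 /tmp/monmap
  monmaptool --add mon1 192.168.100.11:6789 /tmp/monmap
  ceph-mon -i mon1 --inject-monmap /tmp/monmap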

Re: [ceph-users] ceph migration

2019-02-26 Thread Eugen Block
…only a test cluster. Regards, Eugen Quoting Janne Johansson: On Mon, 25 Feb 2019 at 13:40, Eugen Block wrote: I just moved a (virtual lab) cluster to a different network; it worked like a charm. In an offline method you need to: - set osd noout, ensure there are no OSDs up - change the…

Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-03-11 Thread Eugen Block
…earlier are still present in the DB. And the assertion might still happen (hopefully with less frequency). So could you please run fsck for OSDs that were broken once and share the results? Then we can decide if it makes sense to proceed with the repair. Thanks, Igor On 2/7/2019 3:37 PM, Eugen Block…

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Eugen Block
Hi, my first guess would be a network issue. Double-check your connections and make sure the network setup works as expected. Check syslogs, dmesg, switches etc. for hints that a network interruption may have occurred. Regards, Eugen Quoting Zhenshi Zhou: Hi, I deployed a ceph clus…

Re: [ceph-users] ceph-volume lvm batch OSD replacement

2019-03-21 Thread Eugen Block
Hi Dan, I don't know about keeping the osd-id, but I just partially recreated your scenario. I wiped one OSD and recreated it. You are trying to re-use the existing block.db LV with the device path (--block.db /dev/vg-name/lv-name) instead of the lv notation (--block.db vg-name/lv-name): #…

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Eugen Block
Hi, If you run "rbd snap ls --all", you should see a snapshot in the "trash" namespace. I just tried the command "rbd snap ls --all" on a lab cluster (nautilus) and get this error: ceph-2:~ # rbd snap ls --all rbd: image name was not specified Are there any requirements I haven't noticed?

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Eugen Block
--long" command. Thanks for the clarification. Eugen Zitat von Jason Dillaman : On Tue, Apr 2, 2019 at 8:42 AM Eugen Block wrote: Hi, > If you run "rbd snap ls --all", you should see a snapshot in > the "trash" namespace. I just tried the command "rbd

Re: [ceph-users] showing active config settings

2019-04-09 Thread Eugen Block
Hi, I haven't used the --show-config option until now, but if you ask your OSD daemon directly, your change should have been applied: host1:~ # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4' host1:~ # ceph daemon osd.1 config show | grep osd_recovery_max_active "osd_recovery…

Re: [ceph-users] showing active config settings

2019-04-10 Thread Eugen Block
…grep osd_recovery_max_active osd_recovery_max_active = 3 Quoting Janne Johansson: On Wed, 10 Apr 2019 at 13:31, Eugen Block wrote: While --show-config still shows: host1:~ # ceph --show-config | grep osd_recovery_max_active osd_recovery_max_active = 3 It seems as if --show-config is not really up-to…

Re: [ceph-users] showing active config settings

2019-04-10 Thread Eugen Block
…'ll keep it that way. ;-) Quoting Janne Johansson: On Wed, 10 Apr 2019 at 13:37, Eugen Block wrote: > If you don't specify which daemon to talk to, it tells you what the defaults would be for a random daemon started just now using the same config as you have in /etc/…

Re: [ceph-users] Fwd: HW failure cause client IO drops

2019-04-15 Thread Eugen Block
Good morning, the OSDs are usually marked out after 10 minutes; that's when rebalancing starts. But the I/O should not drop during that time, so this could be related to your pool configuration. If you have a replicated pool of size 3 and also set min_size to 3, the I/O would pause if a node…

Re: [ceph-users] Is it possible to get list of all the PGs assigned to an OSD?

2019-04-29 Thread Eugen Block
Sure there is: ceph pg ls-by-osd <osd-id> Regards, Eugen Quoting Igor Podlesny: Or is there no direct way to accomplish that? What workarounds can be used then?

Re: [ceph-users] inconsistent number of pools

2019-05-20 Thread Eugen Block
Hi, have you tried 'ceph health detail'? Quoting Lars Täuber: Hi everybody, with the status report I get a HEALTH_WARN I don't know how to get rid of. It may be connected to recently removed pools. # ceph -s cluster: id: 6cba13d1-b814-489c-9aac-9c04aaf78720 health: HEALTH_WAR…

Re: [ceph-users] Is a not active mds doing something?

2019-05-21 Thread Eugen Block
Hi Marc, have you configured the other MDS to be standby-replay for the active MDS? I have three MDS servers: one is active, the second is active-standby, and the third just standby. If the active one fails, the second takes over within seconds. This is what I have in my ceph.conf: [mds.<name>] mds_…
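
A sketch of such a ceph.conf section (MDS name hypothetical; these options apply up to Mimic, while Nautilus replaced them with 'ceph fs set <fs> allow_standby_replay true'):

  [mds.mds2]
  mds_standby_replay = true
  mds_standby_for_rank = 0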

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Eugen Block
Hi, this question comes up regularly and is being discussed just now: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html Regards, Eugen Quoting Yoann Moulin: Dear all, I am doing some tests with Nautilus and cephfs on an erasure coding pool. I noticed something strang…

Re: [ceph-users] Failed Disk simulation question

2019-05-22 Thread Eugen Block
Hi Alex, the cluster has been idle at the moment, being new and all. I noticed some disk-related errors in dmesg, but that was about it. It looked to me as if the failure was not detected for the next 20-30 minutes. All OSDs were up and in and health was OK. OSD logs had no smoking gun eit…

Re: [ceph-users] osd daemon cluster_fsid not reflecting actual cluster_fsid

2019-06-18 Thread Eugen Block
Hi, this OSD must have been part of a previous cluster, I assume. I would remove it from crush if it's still there (check just to make sure), wipe the disk, remove any traces like logical volumes (if it was a ceph-volume lvm OSD) and, if possible, reboot the node. Regards, Eugen Quoting…
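
A sketch of the cleanup described above (OSD id and device hypothetical):

  ceph osd crush remove osd.7              # if a stale entry is still in the tree
  ceph-volume lvm zap /dev/sdX --destroy   # wipe the disk including LVM traces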

Re: [ceph-users] osd daemon cluster_fsid not reflecting actual cluster_fsid

2019-06-20 Thread Eugen Block
, "bluefs": "1", "ceph_fsid": "173b6382-504b-421f-aa4d-52526fa80dfa", "kv_backend": "rocksdb", "magic": "ceph osd volume v026", "mkfs_done": "yes", "osd_key": "AQBXwwddy5OEAxAAS4AidvOF0kl+k

Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread Eugen Block
Hi, some more information about the cluster status would be helpful, such as: ceph -s, ceph osd tree, service status of all MONs, MDSs, MGRs. Are all services up? Did you configure the spare MDS as standby for rank 0 so that a failover can happen? Regards, Eugen Quoting dhils...@performair…

Re: [ceph-users] MGR Logs after Failure Testing

2019-06-28 Thread Eugen Block
…

Re: [ceph-users] PGs allocated to osd with weights 0

2019-07-02 Thread Eugen Block
Hi, "I can't get data flushed out of OSDs with weights set to 0. Is there any way of checking the tasks queued for PG remapping? Thank you." Can you give some more details about your cluster (replicated or EC pools, applied rules etc.)? My first guess would be that the other OSDs are either…

Re: [ceph-users] Cinder pool inaccessible after Nautilus upgrade

2019-07-02 Thread Eugen Block
Hi, did you try to use rbd and rados commands with the cinder keyring, not the admin keyring? Did you check if the caps for that client are still valid (do the caps differ between the two cinder pools)? Are the ceph versions on your hypervisors also Nautilus? Regards, Eugen Quoting Adr…

Re: [ceph-users] ceph-volume failed after replacing disk

2019-07-05 Thread Eugen Block
Hi, did you also remove that OSD from crush and also from auth before recreating it? ceph osd crush remove osd.71 ceph auth del osd.71 Regards, Eugen Quoting "ST Wong (ITSC)": Hi all, we replaced a faulty disk out of N OSDs and tried to follow the steps according to "Replacing an OSD"…

[ceph-users] OSD replacement causes slow requests

2019-07-18 Thread Eugen Block
Hi list, we're facing an unexpected recovery behavior of an upgraded cluster (Luminous -> Nautilus). We added new servers with Nautilus to the existing Luminous cluster, so we could first replace the MONs step by step. Then we moved the old servers to a new root in the crush map and then…

Re: [ceph-users] OSD replacement causes slow requests

2019-07-24 Thread Eugen Block
…ess hours, maybe that's why he needs his vacation. ;-) Regards, Eugen Quoting Wido den Hollander: On 7/18/19 12:21 PM, Eugen Block wrote: Hi list, we're facing an unexpected recovery behavior of an upgraded cluster (Luminous -> Nautilus). We added new servers with Nautilus to…

[ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread Eugen Block
Hi all, we just upgraded our cluster to: ceph version 14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f) nautilus (stable) When clicking through the dashboard to see what's new, we noticed that the crushmap viewer only shows the first root of our crushmap (we have two roots…

Re: [ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread Eugen Block
Thank you very much! Quoting EDH - Manuel Rios Fernandez: Hi Eugen, yes, it's solved; we reported it in 14.2.1 and the team fixed it in 14.2.2. Regards, Manuel

Re: [ceph-users] ceph device list empty

2019-08-15 Thread Eugen Block
Hi, are the OSD nodes on Nautilus already? We upgraded from Luminous to Nautilus recently and the commands return valid output, except for those OSDs that haven't been upgraded yet. Quoting Gary Molenkamp: I've had no luck in tracing this down. I've tried setting debugging and log ch…

Re: [ceph-users] Howto add DB (aka RockDB) device to existing OSD on HDD

2019-08-29 Thread Eugen Block
Hi, "Then I tried to move the DB to a new device (SSD) that is not formatted: root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path /var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk too many positional options have been specified on the command line" I think you're trying the wrong option. (Note also that the quoted command contains an en-dash instead of a double hyphen before "path", which by itself would explain the "too many positional options" error.)

Re: [ceph-users] Howto add DB (aka RockDB) device to existing OSD on HDD

2019-08-29 Thread Eugen Block
Sorry, I misread; your option is correct, of course, since there was no external db device. This worked for me: ceph-2:~ # CEPH_ARGS="--bluestore-block-db-size 1048576" ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 bluefs-bdev-new-db --dev-target /dev/sdb inferring bluefs devices from…

Re: [ceph-users] ceph stats on the logs

2019-10-08 Thread Eugen Block
Hi, there is also /var/log/ceph/ceph.log on the MONs; it has the stats you're asking for. Does this answer your question? Regards, Eugen Quoting nokia ceph: Hi Team, with default log settings, the ceph stats will be logged like: cluster [INF] pgmap v30410386: 8192 pgs: 8192 active+cl…

Re: [ceph-users] clust recovery stuck

2019-10-21 Thread Eugen Block
Hi, can you share `ceph osd tree`? What crush rules are in use in your cluster? I assume that the two failed OSDs prevent the remapping because the rules can't be applied. Regards, Eugen Quoting Philipp Schwaha: Hi, I have a problem with a cluster being stuck in recovery after osd…

Re: [ceph-users] clust recovery stuck

2019-10-23 Thread Eugen Block
…_max_pg_per_osd to, say, 400 and see if that helps. This would allow the recovery to proceed, but you should consider adding OSDs (or at least increasing the memory allocated to the OSDs above the defaults). Andras On 10/22/19 3:02 PM, Philipp Schwaha wrote: Hi, On 2019-10-22 08:05, Eugen Bloc…

Re: [ceph-users] Is deepscrub Part of PG increase?

2019-11-03 Thread Eugen Block
Hi, deep-scrubs can also be configured per pool, so even if you have adjusted the general deep-scrub interval, deep-scrubs will still happen. To disable deep-scrubs per pool you need to set: ceph osd pool set <pool> nodeep-scrub true Regards, Eugen Quoting c...@elchaka.de: Hello, I have a N…

Re: [ceph-users] Command ceph osd df hangs

2019-11-21 Thread Eugen Block
Hi, check if the active MGR is hanging. I had this when testing pg_autoscaler; after some time every command would hang. Restarting the MGR helped for a short period of time, then I disabled pg_autoscaler. This is an upgraded cluster, currently on Nautilus. Regards, Eugen Quoting Thom…
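
A sketch of the two remedies mentioned above (MGR unit name hypothetical):

  systemctl restart ceph-mgr@host1.service
  ceph mgr module disable pg_autoscaler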

Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread Eugen Block
Hi, can you provide more details? ceph daemon mds.<name> cache status ceph config show mds.<name> | grep mds_cache_memory_limit Regards, Eugen Quoting Ranjan Ghosh: Okay, now, after I settled the issue with the oneshot service thanks to the amazing help of Paul and Richard (thanks again!), I still…
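
If the cache limit turns out to be too small, it can be raised; a sketch with an assumed 4 GiB value:

  ceph config set mds mds_cache_memory_limit 4294967296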

Re: [ceph-users] Cluster in ERR status when rebalancing

2019-12-09 Thread Eugen Block
Hi, since we upgraded our cluster to Nautilus we also see those messages sometimes when it's rebalancing. There are several reports about this [1] [2]; we didn't see it in Luminous. But eventually the rebalancing finished and the error message cleared, so I'd say there's (probably) nothin…

Re: [ceph-users] pgs backfill_toofull after removing OSD from CRUSH map

2019-12-19 Thread Eugen Block
Hi Kristof, setting the OSD "out" doesn't change the crush weight of that OSD, but removing it from the tree does; that's why the cluster started to rebalance. Regards, Eugen Quoting Kristof Coucke: Hi all, we are facing a strange symptom here. We're testing our recovery procedures…
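
The distinction described above, as commands (OSD id hypothetical):

  ceph osd out osd.5            # reweight drops to 0, crush weight stays: data drains off this OSD only
  ceph osd crush remove osd.5   # host bucket weight changes: additional cluster-wide rebalancing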

Re: [ceph-users] ceph (jewel) unable to recover after node failure

2020-01-10 Thread Eugen Block
Hi, "A. Will ceph be able to recover over time? I am afraid that the 14 PGs that are down will not recover." If all OSDs come back (stable), the recovery should eventually finish. "B. What caused the OSDs going down and up during recovery after the failed OSD node came back online? (step 2 above…

Re: [ceph-users] OSD Marked down unable to restart continuously failing

2020-01-11 Thread Eugen Block
Hi, you say the daemons are locally up and running but restarting fails? Which one is it? Do you see any messages suggesting flapping OSDs? After 5 retries within 10 minutes the OSDs would be marked out. What is the result of your checks for iostat etc.? Anything pointing to a high load on…
