[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
and disable any caches if necessary. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 01 November 2020 14:37:41 To: Sagara Wijetunga; ceph-users@ceph.io Subject: [ceph-users] Re: How to recover

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-01 Thread Frank Schilder
sorry: *badblocks* can force remappings of broken sectors (non-destructive read-write check) = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 01 November 2020 14:35:35 To: Sagara Wijetunga; ceph-users
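A minimal sketch of such a non-destructive read-write check with badblocks (the device name is a placeholder; stop the OSD and double-check the device before touching it):

    # -n = non-destructive read-write mode, -s = show progress, -v = verbose
    badblocks -nsv /dev/sdX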

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-01 Thread Frank Schilder
will be changed to this one automatically. Note that this may lead to data loss on objects that were in the undefined state. As far as I can see, its only 1 object and probably possible to recover from (backup, snapshot). Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-01 Thread Frank Schilder
st regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Sagara Wijetunga Sent: 01 November 2020 13:16:08 To: ceph-users@ceph.io Subject: [ceph-users] How to recover from active+clean+inconsistent+failed_repair? Hi all I have a Ceph cluster

[ceph-users] Re: Very high read IO during backfilling

2020-10-30 Thread Frank Schilder
Are you a victim of bluefs_buffered_io=false: https://www.mail-archive.com/ceph-users@ceph.io/msg05550.html ? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Kamil Szczygieł Sent: 27 October 2020 21:39:22
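For context, a sketch of how bluefs_buffered_io can be inspected and changed, assuming a release with the central config database (option name as in the linked thread; verify on your version):

    # check the value a running OSD actually uses (run on the OSD host)
    ceph daemon osd.0 config get bluefs_buffered_io
    # set it cluster-wide for all OSDs
    ceph config set osd bluefs_buffered_io true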

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Frank Schilder
umount + mount worked. Thanks! Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 30 October 2020 10:22:38 To: Frank Schilder Cc: ceph-users Subject: Re: [ceph-users] MDS_CLIENT_LATE_RELEASE

[ceph-users] MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Frank Schilder
": "3.10.0-957.12.2.el7.x86_64", "root": "/hpc/home" } }, -- { "id": 30749150, "num_leases": 0, "num_caps": 44, "state": "open", "request_load_avg&qu

[ceph-users] Re: frequent Monitor down

2020-10-30 Thread Frank Schilder
, the reality is not that simple. There is apparently some kind of subtlety that has more to do with the physical set-up that makes 4 mons worse than 3 (more likely to lead to loss of service). I do not remember the thread, but it was within the last year. Best regards, = Frank

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Frank Schilder
e ceph cluster from backup might be the fastest option. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 28 October 2020 07:23:09 To: Ing. Luis Felipe Domínguez Vega Cc: Ceph Users Subject: [ceph-use

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
be it gives a clue what is going on. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Zhenshi Zhou Sent: 29 October 2020 09:44:14 To: Frank Schilder Cc: ceph-users Subject: Re: [ceph-users] monitor sst files continue growing I reset the

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
He he. > It will prevent OSDs from being marked out if you shut them down or the . ... down or the MONs lose heartbeats due to high network load during peering. ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schil

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
is not critical as far as I can see. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Johnson Sent: 29 October 2020 08:58:15 To: ceph-users@ceph.io; Frank Schilder Subject: Re: pgs stuck backfill_toofull

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
This does not explain incomplete and inactive PGs. Are you hitting https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not recover from OSD restart"? In that case, temporarily stopping and restarting all new OSDs might help. Best regards, ===== Frank Schilde
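A sketch of the stop/restart suggestion, assuming package-based OSDs managed by systemd (the OSD ID is a placeholder):

    # on each host with newly added OSDs
    systemctl stop ceph-osd@288
    # wait for peering to settle, then bring it back
    systemctl start ceph-osd@288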

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
ill need some rebalancing, because you run a bit low on available space. As a final note, running with size 2 min size 1 is a serious data redundancy risk. You should get another server and upgrade to 3(2). Best regards, ===== Frank Schilder AIT Risø Campus Bygning

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
did your cluster end up like this? It looks like all OSDs are up and in. You need to find out - why there are inactive PGs - why there are incomplete PGs This usually happens when OSDs go missing. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
data is in the pools right now and what the future plan is. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Johnson Sent: 29 October 2020 06:55:55 To: ceph-users@ceph.io Subject: [ceph-users] pgs stuck backfil

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Frank Schilder
ur crush map and rules, you either stop placing stuff at 2 sites, or you create a proper 2-site set-up and copy data over. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Ing. Luis Felipe Domínguez Vega Sent: 28 Oct

[ceph-users] Re: The feasibility of mixed SSD and HDD replicated pool

2020-10-27 Thread Frank Schilder
a specialized crush rule will work exactly as intended and is long-term stable. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: 胡 玮文 Sent: 26 October 2020 17:19 To: Frank Schilder Cc: Anthony D'Atri; ceph-users@ceph.io

[ceph-users] Re: Question about expansion existing Ceph cluster - adding OSDs

2020-10-26 Thread Frank Schilder
Hi Kristof, I missed that: why do you need to do manual compaction? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Kristof Coucke Sent: 26 October 2020 11:33:52 To: Frank Schilder; a.jazdzew...@googlemail.com

[ceph-users] Re: The feasibility of mixed SSD and HDD replicated pool

2020-10-26 Thread Frank Schilder
reasing capacity, one needs to take care of adding disks to hdd_np class and set their primary affinity to 0 * somewhat increased admin effort, but fully working solution Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: The feasibility of mixed SSD and HDD replicated pool

2020-10-25 Thread Frank Schilder
ice. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 25 October 2020 15:03:16 To: 胡 玮文; Alexander E. Patrakov Cc: ceph-users@ceph.io Subject: [ceph-users] Re: The feasibility of mixed SSD and HDD

[ceph-users] Re: The feasibility of mixed SSD and HDD replicated pool

2020-10-25 Thread Frank Schilder
and, if everything is up, need only 1 copy in the SSD cache, which means that you have 3 times the cache capacity. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: 胡 玮文 Sent: 25 October 2020 13:40:55 To: Alexander E

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-23 Thread Frank Schilder
uption. Maybe there are low-level commands to repair it. You should wait with trying to clean up the unfound objects until this is resolved. Not sure about adding further storage either. To me, this sounds quite serious. Best regards and good luck! = Frank Schilder AIT Risø

[ceph-users] Re: Urgent help needed please - MDS offline

2020-10-22 Thread Frank Schilder
case. Good luck! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Urgent help needed please - MDS offline

2020-10-22 Thread Frank Schilder
, but might be faster than getting more RAM and will not lose data. Your clients will not be able to do much, if anything, during recovery though. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 22 October 2020 09:32:07 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Sounds good. Did you re-create the pool

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Frank Schilder
to you. I hope I find time today to look at the incomplete PG. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 21 October 2020 22:58:47 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph

[ceph-users] Re: Question about expansion existing Ceph cluster - adding OSDs

2020-10-21 Thread Frank Schilder
this to some degree by using force_recovery commands on PGs on the fullest OSDs. Best regards and good luck, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Kristof Coucke Sent: 21 October 2020 13:29:00 To: ceph-users@ceph.io
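The force_recovery commands referred to are roughly the following (Luminous or newer; the PG ID is a placeholder):

    # prioritize recovery/backfill of PGs sitting on the fullest OSDs
    ceph pg force-recovery 7.39d
    ceph pg force-backfill 7.39d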

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-21 Thread Frank Schilder
the issue (but tell the user :). Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 20 October 2020 23:48:36 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD cras

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-20 Thread Frank Schilder
d post the contents of file crush.txt. Did the slow MDS request complete by now? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 Contents of previous messages removed. ___ ceph-users mailing list -- ceph-users@ceph.io T

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
t. The crush rules and crush tree look OK to me. I can't really see why the missing OSDs are not assigned to the two PGs 1.0 and 7.39d. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 Octob

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
ops" or "ceph daemon osd.ID dump_historic_slow_ops" and check what type of operations get stuck? I'm wondering if its administrative, like peering attempts. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Fr

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
SDs assigned. I need to look a bit longer at the data you uploaded to find out why. I can't see anything obvious. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 16 October 2020 02:08:01 To:

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-14 Thread Frank Schilder
"? I don't expect to get the incomplete PG resolved with the above, but it will move some issues out of the way before proceeding. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 14 O

[ceph-users] Long heartbeat ping times

2020-10-12 Thread Frank Schilder
related messages in any OSD log and the messages I find in /var/log/messages do not contain IP addresses or OSD IDs. Is there a way to find out which OSDs/hosts were the problem after health status is back to healthy? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Frank Schilder
. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 01 October 2020 15:33:53 To: ceph-users@ceph.io Subject: [ceph-users] Re: cephfs tag not working Hi, I have a one-node-cluster (also 15.2.4

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-10-01 Thread Frank Schilder
/#4KY3OW7PTOODLQVYKARZLGE5FZUNQOER . Maybe there is/are regressions with crush placement computations (and others)? I will add this to the list of tests before considering to upgrade from mimic. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-09-30 Thread Frank Schilder
location and sit on their own switches. The fs and RBD only share the MONs/MGRs. I'm not entirely sure if we observed something real or only a network blip. However, nagios went crazy on our VM environment for a few minutes. Maybe there is another issue that causes unexpected cross-dependencies t

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-09-30 Thread Frank Schilder
s. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nico Schottelius Sent: 30 September 2020 09:12:49 To: Frank Schilder Cc: Eugen Block; Marc Roos; ceph-users@ceph.io Subject: Re: [ceph-users] Re: hdd pg's migrating when converting

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-09-30 Thread Frank Schilder
This is how my crush tree including shadow hierarchies looks like (a mess :): https://pastebin.com/iCLbi4Up Every device class has its own tree. Starting with mimic, this is automatic when creating new device classes. Best regards, = Frank Schilder AIT Risø Campus Bygning 109
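The shadow hierarchies can be displayed with a single command (one ~class-suffixed copy of the tree per device class):

    # show per-device-class shadow buckets such as host~hdd, host~ssd
    ceph osd crush tree --show-shadow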

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-09-29 Thread Frank Schilder
it tomorrow. I can check tomorrow how our crush tree unfolds. Basically, for every device class there is a full copy (shadow hierarchy) for each device class with its own weights etc. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: hdd pg's migrating when converting ssd class osd's

2020-09-29 Thread Frank Schilder
are independent of each other. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Marc Roos Sent: 29 September 2020 20:54:48 To: eblock Cc: ceph-users Subject: [ceph-users] Re: hdd pg's migrating when converting ssd class

[ceph-users] Re: samba vfs_ceph: client_mds_namespace not working?

2020-09-23 Thread Frank Schilder
and this should show up in the client session as root '/shares/FOLDER-NAME'. Starts looking like a bug in vfs_ceph.c . Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 23 September 2020 11:49:29

[ceph-users] Re: Documentation broken

2020-09-23 Thread Frank Schilder
Hi Lenz, thanks for that, this should do. Please retain the copy until all is migrated :) Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Lenz Grimmer Sent: 23 September 2020 10:55:13 To: ceph-users@ceph.io

[ceph-users] Re: samba vfs_ceph: client_mds_namespace not working?

2020-09-23 Thread Frank Schilder
ect failed: NT_STATUS_UNSUCCESSFUL Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 23 September 2020 11:00:50 To: ceph-users Subject: [ceph-users] samba vfs_ceph: client_mds_namespace not working?

[ceph-users] samba vfs_ceph: client_mds_namespace not working?

2020-09-23 Thread Frank Schilder
get a page not found error. My last resort is now to ceph fs set-default CEPH-FS-NAME to the fs to be used and live with the implied restrictions and ugliness. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___

[ceph-users] Re: Unknown PGs after osd move

2020-09-22 Thread Frank Schilder
will go to less redundant storage for a while. The first method takes longer, but there is no redundancy degradation along the way. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nico Schottelius Sent: 22

[ceph-users] Re: Unknown PGs after osd move

2020-09-22 Thread Frank Schilder
cts is going on. There is a long-standing issue that causes placement information to be lost again and one would need to repeat the procedure. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nico Schottelius Sent

[ceph-users] Documentation broken

2020-09-22 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Setting up a small experimental CEPH network

2020-09-21 Thread Frank Schilder
, not with much detail. I have not seen a possibility to use IP:PORT for hashing to a switch port. I have no experience with bonding mode 6 (ALB) that might provide a per-connection hashing. Would be interested to hear how it performs. Best regards, = Frank Schilder AIT Risø Campus Bygning

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
;park" the problem of cluster health for later fixing. Best regads, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: 18 September 2020 15:38:51 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: mu

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Sorry that I can't be of more help here. However, if you figure out a solution (ideally non-destructive), please post it here. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 18 September 20

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
the way of recovery or when OSDs scrub or check for objects that can be deleted. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 17 September 2020 22:27:47 To: Frank Schilder; ceph-users@ceph.io

[ceph-users] vfs_ceph for CentOS 8

2020-09-17 Thread Frank Schilder
is part of it. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-16 Thread Frank Schilder
to my question in the tracker. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 16 September 2020 01:27:19 To: ceph-users@ceph.io Subject: [ceph-users] multiple OSD crash, unfound objects Over

[ceph-users] Re: The confusing output of ceph df command

2020-09-10 Thread Frank Schilder
all on our ceph fs pool. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: norman Sent: 10 September 2020 08:34:42 To: ceph-users@ceph.io Subject: [ceph-users] Re: The confusing output of ceph df command Anyon

[ceph-users] Re: OSD memory leak?

2020-08-31 Thread Frank Schilder
Looks like the image attachment got removed. Please find it here: https://imgur.com/a/3tabzCN = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 31 August 2020 14:42 To: Mark Nelson; Dan van der Ster; ceph

[ceph-users] Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

2020-08-31 Thread Frank Schilder
search the ceph-user list, you will find detailed instructions and also links to explanations and typical benchmarks. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: VELARTIS Philipp Dürhammer Sent: 31 August 2020 14

[ceph-users] How to query status of scheduled commands.

2020-08-31 Thread Frank Schilder
) Operation is running. 3) Operation has completed. 4) Exit code and error messages if applicable. Many thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email

[ceph-users] Re: Can 16 server grade ssd's be slower then 60 hdds? (no extra journals)

2020-08-31 Thread Frank Schilder
Yes, they can - if volatile write cache is not disabled. There are many threads on this, also recent. Search for "disable write cache" and/or "disable volatile write cache". You will also find different methods of doing this automatically. Best regards, ====
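A minimal sketch of disabling the volatile write cache manually (device names are placeholders; the setting is usually not persistent across reboots, hence the automated methods mentioned above):

    # SATA drives
    hdparm -W 0 /dev/sdX
    # SAS drives
    sdparm --set=WCE=0 /dev/sdX
    # verify what the kernel reports ("write through" means the volatile cache is off)
    cat /sys/block/sdX/queue/write_cache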

[ceph-users] Re: OSD memory leak?

2020-08-20 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Nelson Sent: 20 August 2020 21:52 To: Frank Schilder; Dan van der Ster; ceph-users Subject: Re: [ceph-users] Re: OSD memory leak? Hi Frank, I downloaded but haven't had time

[ceph-users] Re: OSD memory leak?

2020-08-20 Thread Frank Schilder
Hi Dan, no worries. I checked and osd_map_dedup is set to true, the default value. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 20 August 2020 09:41 To: Frank Schilder Cc: Mark Nelson

[ceph-users] Re: OSD memory leak?

2020-08-20 Thread Frank Schilder
I increased the cache min sizes on all OSDs. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 17 August 2020 09:37 To: Dan van der Ster Cc: ceph-users Subject: [ceph-users] Re: OSD memory leak

[ceph-users] Re: OSD memory leak?

2020-08-17 Thread Frank Schilder
, I can do that. Please let me know if I should do all files and with what option (eg. against a base etc.). Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 14 August 2020 10:38:57 To: Frank S

[ceph-users] Re: OSD memory leak?

2020-08-11 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 21 July 2020 12:57:32 To: Mark Nelson; Dan van der Ster Cc: ceph-users Subject: [ceph-users] Re: OSD memory leak? Quick question: Is there a way to change the frequency

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-04 Thread Frank Schilder
.log - log of "old" OSD trimmed to day of restart logs/ceph-osd.288.log - entire log of "new" OSD Hope this helps. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eric Smith Sent: 04

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-04 Thread Frank Schilder
roduce with a standard crush map where host-bucket=physical host and I would, in fact, expect that this scenario is part of the integration test. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eric Smith Sent: 04

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-04 Thread Frank Schilder
lay the crush move game with 300+ OSDs. This unnecessary redundancy degradation on OSD restart cannot possibly be expected behaviour, or do I misunderstand something here? Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-04 Thread Frank Schilder
SD is located. ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eric Smith Sent: 04 August 2020 12:47:12 To: Frank Schilder; ceph-users Subject: RE: Ceph does not recover from OSD restart Have you adjusted the min_size for pool sr

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
Sorry for the many small e-mails: requested IDs in the commands, 288-296. One new OSD per host. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 03 August 2020 16:59:04 To: Eric Smith; ceph

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eric Smith Sent: 03 August 2020 16:45:28 To: Frank Schilder; ceph-users Subject: RE: Ceph does not recover from OSD restart You said you had to move some OSDs out and back in for Ceph

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
6764 v186764)", "description": "osd_failure(failed timeout osd.287 192.168.32.68:6804/3353324 for 37sec e186764 v186764)", "num_ops": 129 = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
}, { "rule_id": 9, "rule_name": "sr-rbd-data-one-hdd", "ruleset": 9, "type": 3, "min_size": 3, "max_size": 8, "steps": [ { "o

[ceph-users] Re: Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
is it not finding the objects by itself? A power outage of 3 hosts will halt everything for no reason until manual intervention. How can I avoid this problem? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent

[ceph-users] Ceph does not recover from OSD restart

2020-08-03 Thread Frank Schilder
ld receive the missing OSD IDs, everything is up exactly as it was before the reboot. Thanks for your help and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubs

[ceph-users] Re: mimic: much more raw used than reported

2020-08-03 Thread Frank Schilder
nately, the tool we are using does not have an option to change that. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 01 August 2020 10:53:29 To: Frank Schilder; ceph-users Subject: Re: [ceph-users]

[ceph-users] Re: mimic: much more raw used than reported

2020-07-31 Thread Frank Schilder
could do this without (much) downtime of VMs and it might get us through this migration. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 30 July 2020 15:40 To: Frank Schilder; ceph-users Subject

[ceph-users] Re: mimic: much more raw used than reported

2020-07-30 Thread Frank Schilder
matters a lot, but it becomes a bit unclear now why. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 29 July 2020 16:25:36 To: Frank Schilder; ceph-users Subject: Re: [ceph-users] mimic

[ceph-users] Re: mimic: much more raw used than reported

2020-07-29 Thread Frank Schilder
tacenter ServerRoom/ {o=1} (o==1 && $2=="hdd") {s+=$5;u+=$7;printf("%4s %5s %5s\n", $1, $5, $7)} f==0 {printf("%4s %5s %5s\n", $1, $5, $6);f=1} END {printf("%4s %5.1f %5.1f\n", "SUM", s, u)}')" OSDS=( $(echo "$df_tr

[ceph-users] Re: mimic: much more raw used than reported

2020-07-27 Thread Frank Schilder
st regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 27 July 2020 12:54:02 To: Frank Schilder; ceph-users Subject: Re: [ceph-users] mimic: much more raw used than reported Hi Frank, you might be being hi

[ceph-users] mimic: much more raw used than reported

2020-07-26 Thread Frank Schilder
GiB 3.3 TiB 62.46 1.91 109 osd.85 87 hdd 8.90999 1.0 8.9 TiB 5.0 TiB 5.0 TiB 189 MiB 16 GiB 3.9 TiB 55.91 1.71 102 osd.87 Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 _

[ceph-users] Re: OSD memory leak?

2020-07-21 Thread Frank Schilder
to adjust this? If not, can I change the dump path? It's likely to overrun my log partition quickly if I cannot adjust either of the two. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 20

[ceph-users] Re: OSD memory leak?

2020-07-20 Thread Frank Schilder
Dear Mark, thank you very much for the very helpful answers. I will raise osd_memory_cache_min, leave everything else alone and watch what happens. I will report back here. Thanks also for raising this as an issue. Best regards, = Frank Schilder AIT Risø Campus Bygning 109

[ceph-users] Re: OSD memory leak?

2020-07-20 Thread Frank Schilder
uot;: { "items": 9956, "bytes": 189640 }, "buffer_anon": { "items": 293298, "bytes": 59950954 }, "buffer_meta": { "i

[ceph-users] Re: OSD memory leak?

2020-07-16 Thread Frank Schilder
provide. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Nelson Sent: 15 July 2020 18:36:06 To: Dan van der Ster Cc: ceph-users Subject: [ceph-users] Re: OSD memory leak? On 7/15/20 9:58 AM, Dan van der Ster wrote

[ceph-users] Re: mon_osd_down_out_subtree_limit not working?

2020-07-15 Thread Frank Schilder
han "file" as is the case). Should I open a tracker ticket? I tested a shutdown of all OSDs on a host and it works now as expected and desired. Thanks! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Frank Schilder Sent: 1

[ceph-users] Re: mon_osd_down_out_subtree_limit not working?

2020-07-15 Thread Frank Schilder
Setting it in ceph.conf is exactly what I wanted to avoid :). I will give it a try though. I guess this should become an issue in the tracker? Is it, by any chance, required to restart *all* daemons or should MONs be enough? Best regards, = Frank Schilder AIT Risø Campus
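The ceph.conf workaround discussed here would look roughly like this on the MON hosts, followed by a MON restart (whether all daemons need a restart is exactly the open question):

    [mon]
        mon_osd_down_out_subtree_limit = host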

[ceph-users] Re: mon_osd_down_out_subtree_limit not working?

2020-07-15 Thread Frank Schilder
IGNORES mon_osd_down_out_subtree_limit rackdefault mon so the setting in the config data base is still ignored. Any ideas? I cannot shut down the entire cluster for something that simple. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-15 Thread Frank Schilder
or less got it sorted. Hints in this thread helped pinpointing issues. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 13 July 2020 15:38:58 To: André Gemünd; ceph-users Subject

[ceph-users] Re: OSD memory leak?

2020-07-14 Thread Frank Schilder
st regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: 14 July 2020 17:29 To: ceph-users@ceph.io Subject: [ceph-users] Re: OSD memory leak? >> In the past, the minimum recommendation was 1GB RAM per HD

[ceph-users] Re: OSD memory leak?

2020-07-14 Thread Frank Schilder
st regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mark Nelson Sent: 14 July 2020 14:48:36 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: OSD memory leak? Hi Frank, These might help: https://docs.cep

[ceph-users] Re: mon_osd_down_out_subtree_limit not working?

2020-07-14 Thread Frank Schilder
t mon mon_osd_reporter_subtree_level datacenter mon The default overrides the mon config database setting. What is going on here? I restarted all 3 monitors. Best regards and thanks for your help, ===== Frank Schilder AIT Risø Campus

[ceph-users] Re: OSD memory leak?

2020-07-14 Thread Frank Schilder
"items": 117697, "bytes": 2080280 }, "osdmap_mapping": { "items": 0, "bytes": 0 }, "pgmap": {

[ceph-users] Re: mon_osd_down_out_subtree_limit not working?

2020-07-14 Thread Frank Schilder
host. Unfortunately, in my case these two settings behave differently. If I understand the documentation correctly, the OSDs should not get marked out automatically. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
driver = "raw" , cache = "none"] which translates to in the XML. We have no qemu settings in the ceph.conf. Looks like caching is disabled. Not sure if this is the recommended way though and why caching is disabled by default. Best regards, ===== Fran

[ceph-users] OSD memory leak?

2020-07-13 Thread Frank Schilder
63936 (  1188.1 MiB) Actual memory used (physical + swap)
MALLOC: +     20488192 (    19.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =   1266352128 (  1207.7 MiB) Virtual address space used
MALLOC:
MALLOC:          54160  Spans in use
MALLOC:             33  Thread heaps in use
MALLOC:           8192  Tcmalloc page size
Am I looking at a memory leak here or are these heap stats expected? I don't mind the swap usage, it doesn't have impact. I'm just wondering if I need to restart OSDs regularly. The "leakage" above occurred within only 2 months. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
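The heap statistics above are the kind of output produced by the tcmalloc commands below, which can also push unused pages back to the OS without an OSD restart (the OSD ID is a placeholder):

    # dump tcmalloc heap statistics for one OSD
    ceph tell osd.0 heap stats
    # return freed-but-unmapped memory to the operating system
    ceph tell osd.0 heap release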

[ceph-users] mon_osd_down_out_subtree_limit not working?

2020-07-13 Thread Frank Schilder
and, unfortunately, the OSDs do get marked as out. Ceph status was showing 1 host down as expected. Am I doing something wrong or misreading the documentation? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
ists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/ . We are very sure that it is not related to other processes writing to disk, we monitor that too. There is also no competition on the RBD pool at the time of testing. Best regards, = Frank

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-07-13 Thread Frank Schilder
servers where possible. If anyone has a hint what we can do, please let us know. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users

[ceph-users] Re: Are there 'tuned profiles' for various ceph scenarios?

2020-07-02 Thread Frank Schilder
to tune properly, I don't see a way around reading the entire kernel tuning parameter references. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Harry G. Coin Sent: 01 July 2020 21:26:59 To: ceph-users@ceph.io
