[ceph-users] nfs-ganesha FSAL CephFS: nfs_health :DBUS :WARN :Health status is unhealthy
Hi! Today one of our nfs-ganesha gateways experienced an outage and has since crashed every time the client behind it tries to access its data. This is a Ceph Mimic cluster with nfs-ganesha from the Ceph repos: nfs-ganesha-2.6.2-0.1.el7.x86_64 and nfs-ganesha-ceph-2.6.2-0.1.el7.x86_64. There were fixes for this problem in 2.6.3: https://github.com/nfs-ganesha/nfs-ganesha/issues/339 Could the build in the repos be updated to this bugfix release? Thank you very much. Kind regards, Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] omap vs. xattr in librados
Hi, I'm interested in writing a relatively simple application that would use librados for storage. Are there recommendations for when to use omap as opposed to xattrs? In theory, you could use either a set of xattrs or an omap as a kv store associated with a specific object. Are there recommendations for what kind of data xattrs and omaps are each intended to store? Just for background, I have some metadata I'd like to associate with each object (total size of all kv pairs in object metadata is ~250k; some values are a few bytes, while others are 10-20k). The object will store the actual data (a relatively large FP array) as a binary blob (~3-5 MB). Thanks, Ben
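For anyone wanting to poke at the two interfaces before committing to an API design, both can be exercised from the rados CLI; a rough sketch (the pool name `testpool`, object name and file name here are made up, and a running cluster with such a pool is assumed):

```shell
# Large binary payload (e.g. the FP array) goes in the object data itself.
rados -p testpool put myobject ./fp_array.bin

# Small metadata as an xattr; xattrs travel with the object metadata,
# so they are best kept small.
rados -p testpool setxattr myobject version v1
rados -p testpool getxattr myobject version

# Larger or more numerous kv pairs as omap entries; omap lives in the
# OSD's key/value database and scales to many keys per object.
rados -p testpool setomapval myobject meta.owner ben
rados -p testpool setomapval myobject meta.notes "some 10-20k value"
rados -p testpool listomapvals myobject
```

These commands need a live cluster, so treat this as an experimentation aid rather than a recommendation; the general rule of thumb discussed on this list is xattrs for a handful of small values, omap for anything bigger or enumerable.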
Re: [ceph-users] Bluestore DB size and onode count
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark > Nelson > Sent: 10 September 2018 18:27 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Bluestore DB size and onode count > > On 09/10/2018 12:22 PM, Igor Fedotov wrote: > > > Hi Nick. > > > > > > On 9/10/2018 1:30 PM, Nick Fisk wrote: > >> If anybody has 5 minutes could they just clarify a couple of things > >> for me > >> > >> 1. onode count, should this be equal to the number of objects stored > >> on the OSD? > >> Through reading several posts, there seems to be a general indication > >> that this is the case, but looking at my OSD's the maths don't > >> work. > > onode_count is the number of onodes in the cache, not the total number > > of onodes at an OSD. > > Hence the difference... Ok, thanks, that makes sense. I assume there isn't actually a counter which gives you the total number of objects on an OSD then? > >> > >> Eg. > >> ceph osd df > >> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS > >> 0 hdd 2.73679 1.0 2802G 1347G 1454G 48.09 0.69 115 > >> > >> So 3TB OSD, roughly half full.
This is pure RBD workload (no > >> snapshots or anything clever) so let's assume worst-case scenario of > >> 4MB objects (Compression is on however, which would only mean more > >> objects for given size) > >> 1347000/4=~336750 expected objects > >> > >> sudo ceph daemon osd.0 perf dump | grep blue > >> "bluefs": { > >> "bluestore": { > >> "bluestore_allocated": 1437813964800, > >> "bluestore_stored": 2326118994003, > >> "bluestore_compressed": 445228558486, > >> "bluestore_compressed_allocated": 547649159168, > >> "bluestore_compressed_original": 1437773843456, > >> "bluestore_onodes": 99022, > >> "bluestore_onode_hits": 18151499, > >> "bluestore_onode_misses": 4539604, > >> "bluestore_onode_shard_hits": 10596780, > >> "bluestore_onode_shard_misses": 4632238, > >> "bluestore_extents": 896365, > >> "bluestore_blobs": 861495, > >> > >> 99022 onodes, anyone care to enlighten me? > >> > >> 2. block.db Size > >> sudo ceph daemon osd.0 perf dump | grep db > >> "db_total_bytes": 8587829248, > >> "db_used_bytes": 2375024640, > >> > >> 2.3GB=0.17% of data size. This seems a lot lower than the 1% > >> recommendation (10GB for every 1TB) or 4% given in the official docs. I > >> know that different workloads will have differing overheads and > >> potentially smaller objects. But am I understanding these figures > >> correctly as they seem dramatically lower? > > Just in case - is slow_used_bytes equal to 0? Some DB data might > > reside at slow device if spill over has happened. Which doesn't > > require full DB volume to happen - that's by RocksDB's design. > > > > And recommended numbers are a bit... speculative. So it's quite > > possible that your numbers are absolutely adequate. > > FWIW, these are the numbers I came up with after examining the SST files > generated under different workloads: > > https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing > Thanks for your input Mark and Igor.
Mark I can see your RBD figures aren't too far off mine, so all looks to be as expected then. > >> > >> Regards, > >> Nick
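To illustrate Igor's point about `bluestore_onodes` being cache-only, a back-of-the-envelope check of the counters quoted above (a sketch; it assumes full 4 MiB RBD objects and uses `bluestore_stored` as the logical data size, which with compression enabled is larger than the bytes actually allocated):

```shell
# Numbers copied from the perf dump quoted above.
stored=2326118994003   # "bluestore_stored": logical bytes on this OSD
onodes=99022           # "bluestore_onodes": onodes currently in cache

# Rough expected object count, assuming every object is a full 4 MiB.
expected=$(( stored / (4 * 1024 * 1024) ))
echo "expected objects on OSD: ~${expected}"
echo "onodes held in cache:     ${onodes}"
```

Whether you estimate from used bytes (~336750, as in the mail) or from logical stored bytes (~554589), the cached-onode figure of 99022 sits far below any plausible total object count, which is exactly what a cache-resident counter should do.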
Re: [ceph-users] Bluestore DB size and onode count
On 9/10/2018 8:26 PM, Mark Nelson wrote: On 09/10/2018 12:22 PM, Igor Fedotov wrote: Just in case - is slow_used_bytes equal to 0? Some DB data might reside at slow device if spill over has happened. Which doesn't require full DB volume to happen - that's by RocksDB's design. And recommended numbers are a bit... speculative. So it's quite possible that your numbers are absolutely adequate. FWIW, these are the numbers I came up with after examining the SST files generated under different workloads: https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing Sorry, Mark. "Speculative" is a bit too strong a word... I meant that a two-parameter sizing model describing such a complex system as Ceph might tend to produce quite inaccurate results often enough... Regards, Nick
Re: [ceph-users] rbd-nbd on CentOS
On Mon, Sep 10, 2018 at 7:46 PM David Turner wrote: > > Now that you mention it, I remember those threads on the ML. What happens if > you use --yes-i-really-mean-it to do those things and then later you try to > map an RBD with an older kernel for CentOS 7.3 or 7.4? Will that mapping > fail because of the min-client-version of luminous set on the cluster while > allowing CentOS 7.5 clients to map RBDs? Yes, more or less. If you _just_ set the require-min-compat-client setting, nothing will change. It's there to prevent you from accidentally locking out older clients by enabling some new feature. You will continue to be able to map images with both old and new kernels. If you then go ahead and install an upmap exception (manually or via the balancer module), you will no longer be able to map images with old kernels. This applies to all RADOS clients, not just the kernel client. Thanks, Ilya
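For reference, the sequence being discussed looks roughly like this (a sketch, not a recommendation; run it only after confirming that no pre-luminous clients remain connected):

```shell
# Inspect which feature level currently connected clients report.
ceph features

# Opt in to luminous-or-newer clients (required before pg-upmap is used).
ceph osd set-require-min-compat-client luminous

# Enable the balancer in upmap mode; once upmap exceptions exist,
# pre-4.13 kernels and other pre-luminous clients can no longer map images.
ceph balancer mode upmap
ceph balancer on
```

As Ilya notes, the first command alone changes nothing observable; it is the upmap exceptions created afterwards that lock older clients out.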
Re: [ceph-users] data corruption issue with "rbd export-diff/import-diff"
On 2018-09-10 11:04:20-07:00 Jason Dillaman wrote: On Mon, Sep 10, 2018 at 1:35 PM wrote: > We utilize Ceph RBDs for our users' storage and need to keep data > synchronized across data centres. For this we rely on 'rbd export-diff / > import-diff'. Lately we have been noticing cases in which the file system on > the 'destination RBD' is corrupt. We have been trying to isolate the issue, > which may or may not be due to Ceph. We suspect the problem could be in 'rbd > export-diff / import-diff' and are wondering if people have been seeing > issues with these tools. Let me explain our use case and issue in more detail. > We have a number of data centres each with a Ceph cluster storing tens of > thousands of RBDs. We maintain extra copies of each RBD in other data > centres. After we are 'done' using a RBD, we create a snapshot and use 'rbd > export-diff' to create a diff between the most recent 'common' snapshot at > the other data center. We send the data over the network, and use 'rbd > import-diff' on the destination. When we apply a diff to a destination RBD we > can guarantee its 'HEAD' is clean. Of course we guarantee that an RBD is only > used in one data centre at a time. > We noticed corruption at the destination RBD based on fsck failures, further > investigation showed that checksums on the RBD mismatch as well. Somehow the > data is sometimes getting corrupted either by our software or 'rbd > export-diff / import-diff'. Our investigation suggests that the problem > is in 'rbd export-diff/import-diff'. The main evidence of this is that > occasionally we sync an RBD between multiple data centres. Each sync is a > separate job with its own 'rbd export-diff'. We noticed that both destination > locations have the same corruption (and the same checksum) and the source is > healthy. Any chance you are using OSD tiering on your RBD pool?
The export-diffs from a cache tier pool are almost guaranteed to be corrupt if that's the case since the cache tier provides incorrect object diff stats [1]. No, we are not using any OSD tiering in our pools. > In addition to this, we are seeing a similar type of corruption in another > use case when we migrate RBDs and snapshots across pools. In this case we > clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd > export-diff/import-diff' to restore the last 3 snapshots on top. Here too we > see cases of fsck and RBD checksum failures. > We maintain various metrics and logs. Looking back at our data we have seen > the issue at a small scale for a while on Jewel, but the frequency increased > recently. The timing may have coincided with a move to Luminous, but this may > be coincidence. We are currently on Ceph 12.2.5. > We are wondering if people are experiencing similar issues with 'rbd > export-diff / import-diff'. I'm sure many people use it to keep backups in > sync. Since it is backups, many people may not inspect the data often. In our > use case, we use this mechanism to keep data in sync and actually need the > data in the other location often. We are wondering if anyone else has > encountered any issues; it's quite possible that many people have this > issue but simply don't realize. We are likely hitting it much more > frequently due to the scale of our operation (tens of thousands of syncs a > day). If you are able to recreate this reliably without tiering, it would assist in debugging if you could capture RBD debug logs during the export along w/ the LBA of the filesystem corruption to compare against. We haven't been able to reproduce this reliably yet; we haven't figured out the exact conditions that trigger it, we have just been seeing it on some percentage of export/import-diff operations.
We will look into capturing debug logs of the export operations, along with the LBAs of the filesystem corruption when it occurs.
Re: [ceph-users] data corruption issue with "rbd export-diff/import-diff"
On Mon, Sep 10, 2018 at 1:35 PM wrote: > > Hi, > We utilize Ceph RBDs for our users' storage and need to keep data > synchronized across data centres. For this we rely on 'rbd export-diff / > import-diff'. Lately we have been noticing cases in which the file system on > the 'destination RBD' is corrupt. We have been trying to isolate the issue, > which may or may not be due to Ceph. We suspect the problem could be in 'rbd > export-diff / import-diff' and are wondering if people have been seeing > issues with these tools. Let me explain our use case and issue in more detail. > We have a number of data centres each with a Ceph cluster storing tens of > thousands of RBDs. We maintain extra copies of each RBD in other data > centres. After we are 'done' using a RBD, we create a snapshot and use 'rbd > export-diff' to create a diff between the most recent 'common' snapshot at > the other data center. We send the data over the network, and use 'rbd > import-diff' on the destination. When we apply a diff to a destination RBD we > can guarantee its 'HEAD' is clean. Of course we guarantee that an RBD is only > used in one data centre at a time. > We noticed corruption at the destination RBD based on fsck failures, further > investigation showed that checksums on the RBD mismatch as well. Somehow the > data is sometimes getting corrupted either by our software or 'rbd > export-diff / import-diff'. Our investigation suggests that the problem > is in 'rbd export-diff/import-diff'. The main evidence of this is that > occasionally we sync an RBD between multiple data centres. Each sync is a > separate job with its own 'rbd export-diff'. We noticed that both destination > locations have the same corruption (and the same checksum) and the source is > healthy. Any chance you are using OSD tiering on your RBD pool? The export-diffs from a cache tier pool are almost guaranteed to be corrupt if that's the case since the cache tier provides incorrect object diff stats [1].
> In addition to this, we are seeing a similar type of corruption in another > use case when we migrate RBDs and snapshots across pools. In this case we > clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd > export-diff/import-diff' to restore the last 3 snapshots on top. Here too we > see cases of fsck and RBD checksum failures. > We maintain various metrics and logs. Looking back at our data we have seen > the issue at a small scale for a while on Jewel, but the frequency increased > recently. The timing may have coincided with a move to Luminous, but this may > be coincidence. We are currently on Ceph 12.2.5. > We are wondering if people are experiencing similar issues with 'rbd > export-diff / import-diff'. I'm sure many people use it to keep backups in > sync. Since it is backups, many people may not inspect the data often. In our > use case, we use this mechanism to keep data in sync and actually need the > data in the other location often. We are wondering if anyone else has > encountered any issues; it's quite possible that many people have this > issue but simply don't realize. We are likely hitting it much more > frequently due to the scale of our operation (tens of thousands of syncs a > day). If you are able to recreate this reliably without tiering, it would assist in debugging if you could capture RBD debug logs during the export along w/ the LBA of the filesystem corruption to compare against. [1] http://tracker.ceph.com/issues/20896 -- Jason
[ceph-users] data corruption issue with "rbd export-diff/import-diff"
Hi, We utilize Ceph RBDs for our users' storage and need to keep data synchronized across data centres. For this we rely on 'rbd export-diff / import-diff'. Lately we have been noticing cases in which the file system on the 'destination RBD' is corrupt. We have been trying to isolate the issue, which may or may not be due to Ceph. We suspect the problem could be in 'rbd export-diff / import-diff' and are wondering if people have been seeing issues with these tools. Let me explain our use case and issue in more detail. We have a number of data centres each with a Ceph cluster storing tens of thousands of RBDs. We maintain extra copies of each RBD in other data centres. After we are 'done' using a RBD, we create a snapshot and use 'rbd export-diff' to create a diff between the most recent 'common' snapshot at the other data center. We send the data over the network, and use 'rbd import-diff' on the destination. When we apply a diff to a destination RBD we can guarantee its 'HEAD' is clean. Of course we guarantee that an RBD is only used in one data centre at a time. We noticed corruption at the destination RBD based on fsck failures, further investigation showed that checksums on the RBD mismatch as well. Somehow the data is sometimes getting corrupted either by our software or 'rbd export-diff / import-diff'. Our investigation suggests that the problem is in 'rbd export-diff/import-diff'. The main evidence of this is that occasionally we sync an RBD between multiple data centres. Each sync is a separate job with its own 'rbd export-diff'. We noticed that both destination locations have the same corruption (and the same checksum) and the source is healthy. In addition to this, we are seeing a similar type of corruption in another use case when we migrate RBDs and snapshots across pools. In this case we clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd export-diff/import-diff' to restore the last 3 snapshots on top.
Here too we see cases of fsck and RBD checksum failures. We maintain various metrics and logs. Looking back at our data we have seen the issue at a small scale for a while on Jewel, but the frequency increased recently. The timing may have coincided with a move to Luminous, but this may be coincidence. We are currently on Ceph 12.2.5. We are wondering if people are experiencing similar issues with 'rbd export-diff / import-diff'. I'm sure many people use it to keep backups in sync. Since it is backups, many people may not inspect the data often. In our use case, we use this mechanism to keep data in sync and actually need the data in the other location often. We are wondering if anyone else has encountered any issues; it's quite possible that many people have this issue but simply don't realize. We are likely hitting it much more frequently due to the scale of our operation (tens of thousands of syncs a day).
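For anyone wanting to reproduce or audit such a pipeline, a minimal sketch of one sync cycle (pool, image, snapshot and host names are placeholders, not the poster's actual setup):

```shell
# SNAP_PREV is the most recent snapshot common to both sites;
# take a new snapshot on the source to capture the latest state.
rbd snap create pool/image@SNAP_NEW

# Stream the delta between the two snapshots into the destination cluster.
rbd export-diff --from-snap SNAP_PREV pool/image@SNAP_NEW - \
  | ssh dest-site rbd import-diff - pool/image

# Cheap integrity check: full-image checksums of the same snapshot
# should match on both sides.
rbd export pool/image@SNAP_NEW - | sha256sum
ssh dest-site 'rbd export pool/image@SNAP_NEW - | sha256sum'
```

The checksum step is expensive at scale (it reads the whole image), but running it on a sampled subset of syncs is one way to catch the kind of silent corruption described here.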
Re: [ceph-users] rbd-nbd on CentOS
Now that you mention it, I remember those threads on the ML. What happens if you use --yes-i-really-mean-it to do those things and then later you try to map an RBD with an older kernel for CentOS 7.3 or 7.4? Will that mapping fail because of the min-client-version of luminous set on the cluster while allowing CentOS 7.5 clients to map RBDs? On Mon, Sep 10, 2018 at 1:33 PM Ilya Dryomov wrote: > On Mon, Sep 10, 2018 at 7:19 PM David Turner > wrote: > > > > I haven't found any mention of this on the ML and Google's results are > all about compiling your own kernel to use NBD on CentOS. Is everyone > that's using rbd-nbd on CentOS honestly compiling their own kernels for the > clients? This feels like something that shouldn't be necessary anymore. > > > > I would like to use the balancer module with upmap, but can't do that > with kRBD because even the latest kernels still register as Jewel. What > have y'all done to use rbd-nbd on CentOS? I'm hoping I'm missing something > and not that I'll need to compile a kernel to use on all of the hosts that > I want to map RBDs to. > > FWIW upmap is fully supported since 4.13 and RHEL 7.5: > > https://www.spinics.net/lists/ceph-users/msg45071.html > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029105.html > > Thanks, > > Ilya
Re: [ceph-users] rbd-nbd on CentOS
On Mon, Sep 10, 2018 at 7:19 PM David Turner wrote: > > I haven't found any mention of this on the ML and Google's results are all > about compiling your own kernel to use NBD on CentOS. Is everyone that's > using rbd-nbd on CentOS honestly compiling their own kernels for the clients? > This feels like something that shouldn't be necessary anymore. > > I would like to use the balancer module with upmap, but can't do that with > kRBD because even the latest kernels still register as Jewel. What have y'all > done to use rbd-nbd on CentOS? I'm hoping I'm missing something and not that > I'll need to compile a kernel to use on all of the hosts that I want to map > RBDs to. FWIW upmap is fully supported since 4.13 and RHEL 7.5: https://www.spinics.net/lists/ceph-users/msg45071.html http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029105.html Thanks, Ilya
Re: [ceph-users] Bluestore DB size and onode count
On 09/10/2018 12:22 PM, Igor Fedotov wrote: Hi Nick. On 9/10/2018 1:30 PM, Nick Fisk wrote: If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSD's the maths don't work. onode_count is the number of onodes in the cache, not the total number of onodes at an OSD. Hence the difference... Eg. ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 0 hdd 2.73679 1.0 2802G 1347G 1454G 48.09 0.69 115 So 3TB OSD, roughly half full. This is pure RBD workload (no snapshots or anything clever) so let's assume worst-case scenario of 4MB objects (Compression is on however, which would only mean more objects for given size) 1347000/4=~336750 expected objects sudo ceph daemon osd.0 perf dump | grep blue "bluefs": { "bluestore": { "bluestore_allocated": 1437813964800, "bluestore_stored": 2326118994003, "bluestore_compressed": 445228558486, "bluestore_compressed_allocated": 547649159168, "bluestore_compressed_original": 1437773843456, "bluestore_onodes": 99022, "bluestore_onode_hits": 18151499, "bluestore_onode_misses": 4539604, "bluestore_onode_shard_hits": 10596780, "bluestore_onode_shard_misses": 4632238, "bluestore_extents": 896365, "bluestore_blobs": 861495, 99022 onodes, anyone care to enlighten me? 2. block.db Size sudo ceph daemon osd.0 perf dump | grep db "db_total_bytes": 8587829248, "db_used_bytes": 2375024640, 2.3GB=0.17% of data size. This seems a lot lower than the 1% recommendation (10GB for every 1TB) or 4% given in the official docs. I know that different workloads will have differing overheads and potentially smaller objects. But am I understanding these figures correctly as they seem dramatically lower? Just in case - is slow_used_bytes equal to 0? Some DB data might reside at slow device if spill over has happened.
Which doesn't require full DB volume to happen - that's by RocksDB's design. And recommended numbers are a bit... speculative. So it's quite possible that your numbers are absolutely adequate. FWIW, these are the numbers I came up with after examining the SST files generated under different workloads: https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing Regards, Nick
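The 0.17% figure discussed in this thread can be reproduced directly from the quoted counters; a small sketch (it divides db_used_bytes by the ~1347 GB of used space shown in `ceph osd df`, using awk for the floating-point division):

```shell
# Counters quoted earlier in the thread.
db_used=2375024640                          # "db_used_bytes"
osd_used=$(( 1347 * 1024 * 1024 * 1024 ))   # ~1347G used, from `ceph osd df`

# Percentage of used data consumed by block.db.
pct=$(awk -v db="$db_used" -v used="$osd_used" \
      'BEGIN { printf "%.2f", 100 * db / used }')
echo "block.db holds ${pct}% of the bytes used on the OSD"
```

Compare that against the rules of thumb mentioned above: the 1% guideline would expect ~13 GiB of DB for this OSD and the 4% figure ~54 GiB, both well above the ~2.3 GB actually used here, which supports Igor's point that the published sizing numbers are conservative for RBD-like workloads.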
Re: [ceph-users] Bluestore DB size and onode count
Hi Nick. On 9/10/2018 1:30 PM, Nick Fisk wrote: If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSD's the maths don't work. onode_count is the number of onodes in the cache, not the total number of onodes at an OSD. Hence the difference... Eg. ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 0 hdd 2.73679 1.0 2802G 1347G 1454G 48.09 0.69 115 So 3TB OSD, roughly half full. This is pure RBD workload (no snapshots or anything clever) so let's assume worst-case scenario of 4MB objects (Compression is on however, which would only mean more objects for given size) 1347000/4=~336750 expected objects sudo ceph daemon osd.0 perf dump | grep blue "bluefs": { "bluestore": { "bluestore_allocated": 1437813964800, "bluestore_stored": 2326118994003, "bluestore_compressed": 445228558486, "bluestore_compressed_allocated": 547649159168, "bluestore_compressed_original": 1437773843456, "bluestore_onodes": 99022, "bluestore_onode_hits": 18151499, "bluestore_onode_misses": 4539604, "bluestore_onode_shard_hits": 10596780, "bluestore_onode_shard_misses": 4632238, "bluestore_extents": 896365, "bluestore_blobs": 861495, 99022 onodes, anyone care to enlighten me? 2. block.db Size sudo ceph daemon osd.0 perf dump | grep db "db_total_bytes": 8587829248, "db_used_bytes": 2375024640, 2.3GB=0.17% of data size. This seems a lot lower than the 1% recommendation (10GB for every 1TB) or 4% given in the official docs. I know that different workloads will have differing overheads and potentially smaller objects. But am I understanding these figures correctly as they seem dramatically lower? Just in case - is slow_used_bytes equal to 0? Some DB data might reside at slow device if spill over has happened.
Which doesn't require full DB volume to happen - that's by RocksDB's design. And recommended numbers are a bit... speculative. So it's quite possible that your numbers are absolutely adequate. Regards, Nick
[ceph-users] rbd-nbd on CentOS
I haven't found any mention of this on the ML and Google's results are all about compiling your own kernel to use NBD on CentOS. Is everyone that's using rbd-nbd on CentOS honestly compiling their own kernels for the clients? This feels like something that shouldn't be necessary anymore. I would like to use the balancer module with upmap, but can't do that with kRBD because even the latest kernels still register as Jewel. What have y'all done to use rbd-nbd on CentOS? I'm hoping I'm missing something and not that I'll need to compile a kernel to use on all of the hosts that I want to map RBDs to. Alternatively there's rbd-fuse, but in its current state it's too slow for me. There's a [1] PR for an update to rbd-fuse that is promising. I have seen the custom version of this rbd-fuse in action and it's really impressive on speed. It can pretty much keep pace with the kernel client. However, even if that does get merged, it'll be quite a while before it's back-ported into a release. [1] https://github.com/ceph/ceph/pull/23270
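For completeness, the rbd-nbd workflow itself is short once the package and the nbd kernel module are available (a sketch; the pool/image names are placeholders, and /dev/nbd0 assumes the first free nbd device was assigned):

```shell
# Map an image through the NBD driver. librbd runs in userspace here,
# so features like upmap don't depend on the kernel's RADOS client.
rbd-nbd map testpool/testimage

# List current mappings, and unmap when finished.
rbd-nbd list-mapped
rbd-nbd unmap /dev/nbd0
```

The catch David is describing is exactly the module availability: stock CentOS kernels of that era did not ship nbd.ko, which is why the map step fails without a rebuilt or newer kernel.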
[ceph-users] tier monitoring
Hi! Does anyone have a recipe for monitoring a tiering pool? I'm interested in parameters such as fullness, flush/evict/promote statistics and so on. WBR, Fyodor.
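Not a full recipe, but a starting point: most of these numbers can be scraped from the standard CLI and the OSD admin sockets. A sketch (the pool name `cachepool` and `osd.0` are placeholders for your cache-tier pool and its OSDs):

```shell
# Pool-level fullness and per-pool I/O rates, including the cache pool.
ceph df detail
ceph osd pool stats cachepool

# The tiering thresholds configured on the pool, for context.
ceph osd pool get cachepool target_max_bytes
ceph osd pool get cachepool cache_target_dirty_ratio

# Per-OSD flush/evict/promote counters from the admin socket
# (run on each host backing the cache tier).
ceph daemon osd.0 perf dump | grep -E '"tier_(promote|flush|evict)'
```

Polling the perf dump counters periodically and graphing the deltas (e.g. via collectd/Prometheus exporters) gives flush/evict/promote rates over time.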
[ceph-users] Need a procedure for corrupted pg_log repair using ceph-kvstore-tool
Can someone provide information about what to look for (and how to modify the related leveldb keys) in the case of such an error leading to an OSD crash? -5> 2018-09-10 14:46:30.896130 7efff657dd00 20 read_log_and_missing 712021'566147 (656569'562061) delete 28:b2d84df6:::rbd_data.423c863d6f7d13.071c:head by client.442854982.0:26349 2018-08-22 12:45:48.366430 0 -4> 2018-09-10 14:46:30.896135 7efff657dd00 20 read_log_and_missing 712021'566148 (396232'430937) modify 28:b2a8dfc4:::rbd_data.1279a2016dd7ff07.1715:head by client.375380018.0:66926373 2018-08-22 13:53:42.891543 0 -3> 2018-09-10 14:46:30.896140 7efff657dd00 20 read_log_and_missing 712021'566149 (455388'436624) modify 28:b2e5c03b:::rbd_data.c3b0cd3fe98040.0dd1:head by client.357924238.0:32177266 2018-08-22 12:40:20.290431 0 -2> 2018-09-10 14:46:30.896145 7efff657dd00 20 read_log_and_missing 712021'566150 (455452'436627) modify 28:b2be4e96:::rbd_data.c3b0cd3fe98040.0e8e:head by client.357924238.0:32178303 2018-08-22 13:51:03.149459 0 -1> 2018-09-10 14:46:30.896153 7efff657dd00 20 read_log_and_missing 714416'1 (0'0) error 28:b2b68805:::rbd_data.516e3914fdc210.1993:head by client.441544789.0:109624 0.00 -2 0> 2018-09-10 14:46:30.897918 7efff657dd00 -1 /build/ceph-12.2.7/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, PGLog::IndexedLog&, missing_type&, bool, std::ostringstream&, bool, bool*, const DoutPrefixProvider*, std::set >*, bool) [with missing_type = pg_missing_set; std::ostringstream = std::basic_ostringstream]' thread 7efff657dd00 time 2018-09-10 14:46:30.896158 /build/ceph-12.2.7/src/osd/PGLog.h: 1354: FAILED assert(last_e.version.version < e.version.version) The ceph version is 12.2.7, and the current problem is a consequence of multiple crashes of numerous OSDs due to some other ceph error.
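Not a tested procedure, but the usual starting points for inspecting the pg_log offline are ceph-objectstore-tool and ceph-kvstore-tool (a sketch; the OSD path and PG id are placeholders, the OSD must be stopped first, and a full copy of the store should be taken before touching any keys):

```shell
# Dump the pg_log of the affected PG to find the out-of-order entry
# (the assert above fires when entry versions are not strictly increasing).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
  --pgid 28.xx --op log > pg-28.xx-log.json

# On a FileStore OSD the omap lives in leveldb under current/omap;
# keys can be listed (and, with great care, removed) with ceph-kvstore-tool.
ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-0/current/omap list \
  > omap-keys.txt
```

Editing pg_log keys by hand is risky; if the data is replicated elsewhere, removing and recreating the broken OSD is usually the safer path.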
Re: [ceph-users] Need help
(adding list back) The "clients failing to respond to capability release" messages can sometimes indicate a bug in the client code, so it's a good idea to make sure you've got the most recent fixes before investigating further. It's also useful to compare kernel vs. fuse clients to see if the issue occurs in one but not the other. The guidance on client choice and kernel versions is here: http://docs.ceph.com/docs/master/cephfs/best-practices/#which-client If you're happy running a non-LTS distro like Fedora, then I'd suggest running the latest Fedora release (28). John On Mon, Sep 10, 2018 at 3:17 PM marc-antoine desrochers wrote: > > What Is the advantages of using ceph-fuse ? and if I stay on kernel client > what kind of distro/kernel are you suggesting ? > > -Message d'origine- > De : John Spray [mailto:jsp...@redhat.com] > Envoyé : 10 septembre 2018 10:08 > À : marc-antoine.desroch...@sogetel.com > Cc : ceph-users@lists.ceph.com > Objet : Re: [ceph-users] Need help > > On Mon, Sep 10, 2018 at 1:40 PM marc-antoine desrochers > wrote: > > > > Hi, > > > > > > > > I am currently running a ceph cluster running in CEPHFS with 3 nodes each > > have 6 osd’s except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 > > mon. 
> > > > > > > > [root@ceph-n1 ~]# ceph -s > > > > cluster: > > > > id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb > > > > health: HEALTH_WARN > > > > 3 clients failing to respond to capability release > > > > 2 MDSs report slow requests > > > > > > > > services: > > > > mon: 3 daemons, quorum ceph-n1,ceph-n2,ceph-n3 > > > > mgr: ceph-n1(active), standbys: ceph-n2, ceph-n3 > > > > mds: cephfs-2/2/2 up {0=ceph-n1=up:active,1=ceph-n2=up:active}, 1 > > up:standby > > > > osd: 17 osds: 17 up, 17 in > > > > > > > > data: > > > > pools: 2 pools, 1024 pgs > > > > objects: 541k objects, 42006 MB > > > > usage: 143 GB used, 6825 GB / 6969 GB avail > > > > pgs: 1024 active+clean > > > > > > > > io: > > > > client: 32980 B/s rd, 77295 B/s wr, 5 op/s rd, 14 op/s wr > > > > > > > > I'm using CephFS as mail storage. I currently have 3500 mailboxes; > > some of them are IMAP, the others POP3. The goal is to be able to > > migrate all mailboxes from my old > > > > infrastructure, so around 30,000 mailboxes. > > > > > > > > I'm now facing a problem: > > > > MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability > > release > > > > mds ceph-n1(mds.0): Client mda3.sogetel.net failing to respond to > > capability release. client_id: 1134426 > > > > mds ceph-n1(mds.0): Client mda2.sogetel.net failing to respond to > > capability release. client_id: 1172391 > > > > mds ceph-n2(mds.1): Client mda3.sogetel.net failing to respond to > > capability release. client_id: 1134426 > > > > MDS_SLOW_REQUEST 2 MDSs report slow requests > > > > mds ceph-n1(mds.0): 112 slow requests are blocked > 30 sec > > > > mds ceph-n2(mds.1): 323 slow requests are blocked > 30 sec > > > > > > > > I can't figure out how to fix this... > > > > > > > > > > Here is some information about my cluster: > > > > I'm running ceph luminous 12.2.5 on my 3 ceph nodes: ceph-n1, ceph-n2, > > ceph-n3.
> > > > > > I have 3 client identical : > > > > LSB Version::core-4.1-amd64:core-4.1-noarch > > > > Distributor ID: Fedora > > > > Description:Fedora release 25 (Twenty Five) > > > > Release:25 > > > > Codename: TwentyFive > > > > I can't say for sure whether it would help, but I'd definitely suggest > upgrading those nodes to latest Fedora if you're using the kernel client -- > Fedora 25 hasn't received updates for quite some time. > > John > > > > > My ceph nodes : > > > > > > > > CentOS Linux release 7.5.1804 (Core) > > > > NAME="CentOS Linux" > > > > VERSION="7 (Core)" > > > > ID="centos" > > > > ID_LIKE="rhel fedora" > > > > VERSION_ID="7" > > > > PRETTY_NAME="CentOS Linux 7 (Core)" > > > > ANSI_COLOR="0;31" > > > > CPE_NAME="cpe:/o:centos:centos:7" > > > > HOME_URL="https://www.centos.org/"; > > > > BUG_REPORT_URL="https://bugs.centos.org/"; > > > > > > > > CENTOS_MANTISBT_PROJECT="CentOS-7" > > > > CENTOS_MANTISBT_PROJECT_VERSION="7" > > > > REDHAT_SUPPORT_PRODUCT="centos" > > > > REDHAT_SUPPORT_PRODUCT_VERSION="7" > > > > > > > > CentOS Linux release 7.5.1804 (Core) > > > > CentOS Linux release 7.5.1804 (Core) > > > > > > > > ceph daemon mds.ceph-n1 perf dump mds : > > > > > > > > > > > > "mds": { > > > > "request": 21968558, > > > > "reply": 21954801, > > > > "reply_latency": { > > > > "avgcount": 21954801, > > > > "sum": 100879.560315258, > > > > "avgtime": 0.004594874 > > > > }, > > > > "forward": 13627, > > > > "dir_fetch": 3327, > > > > "dir_commit": 162830, > > > > "dir_split": 1, > > > > "dir_merge": 0, > > > > "inode_max": 2147483647, > > > > "inodes"
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hi list, A little update: meanwhile we added a new node consisting of Hammer OSDs to ensure sufficient cluster capacity. The upgraded node with Infernalis OSDs is completely removed from the CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet). At the moment we're still running using flags noout,nobackfill,noscrub,nodeep-scrub. Although now only Hammer OSDs reside, we still experience OSD crashes on backfilling so we're unable to achieve HEALTH_OK state. Using debug 20 level we're (mostly my coworker Willem Jan is) figuring out why the crashes happen exactly. Hopefully we'll figure it out. To be continued... Regards, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help
On Mon, Sep 10, 2018 at 1:40 PM marc-antoine desrochers wrote: > > Hi, > > > > I am currently running a ceph cluster running in CEPHFS with 3 nodes each > have 6 osd’s except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. > > > > > > [root@ceph-n1 ~]# ceph -s > > cluster: > > id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb > > health: HEALTH_WARN > > 3 clients failing to respond to capability release > > 2 MDSs report slow requests > > > > services: > > mon: 3 daemons, quorum ceph-n1,ceph-n2,ceph-n3 > > mgr: ceph-n1(active), standbys: ceph-n2, ceph-n3 > > mds: cephfs-2/2/2 up {0=ceph-n1=up:active,1=ceph-n2=up:active}, 1 > up:standby > > osd: 17 osds: 17 up, 17 in > > > > data: > > pools: 2 pools, 1024 pgs > > objects: 541k objects, 42006 MB > > usage: 143 GB used, 6825 GB / 6969 GB avail > > pgs: 1024 active+clean > > > > io: > > client: 32980 B/s rd, 77295 B/s wr, 5 op/s rd, 14 op/s wr > > > > I’m using the cephFs as a mail storage. I currently have 3500 mailbox some of > them are IMAP the others are POP3 the goal is to be able to migrate all > mailbox from my old > > > > infrastructure so around 30 000 mailbox. > > > > I’m now facing a problem : > > MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release > > mdsceph-n1(mds.0): Client mda3.sogetel.net failing to respond to > capability releaseclient_id: 1134426 > > mdsceph-n1(mds.0): Client mda2.sogetel.net failing to respond to > capability releaseclient_id: 1172391 > > mdsceph-n2(mds.1): Client mda3.sogetel.net failing to respond to > capability releaseclient_id: 1134426 > > MDS_SLOW_REQUEST 2 MDSs report slow requests > > mdsceph-n1(mds.0): 112 slow requests are blocked > 30 sec > > mdsceph-n2(mds.1): 323 slow requests are blocked > 30 sec > > > > I can’t figure out how to fix this… > > > > > Here some information’s about my cluster : > > I’m running ceph luminous 12.2.5 on my 3 ceph nodes : ceph-n1, ceph-n2, > ceph-n3. 
> > > I have 3 client identical : > > LSB Version::core-4.1-amd64:core-4.1-noarch > > Distributor ID: Fedora > > Description:Fedora release 25 (Twenty Five) > > Release:25 > > Codename: TwentyFive > I can't say for sure whether it would help, but I'd definitely suggest upgrading those nodes to latest Fedora if you're using the kernel client -- Fedora 25 hasn't received updates for quite some time. John > > My ceph nodes : > > > > CentOS Linux release 7.5.1804 (Core) > > NAME="CentOS Linux" > > VERSION="7 (Core)" > > ID="centos" > > ID_LIKE="rhel fedora" > > VERSION_ID="7" > > PRETTY_NAME="CentOS Linux 7 (Core)" > > ANSI_COLOR="0;31" > > CPE_NAME="cpe:/o:centos:centos:7" > > HOME_URL="https://www.centos.org/"; > > BUG_REPORT_URL="https://bugs.centos.org/"; > > > > CENTOS_MANTISBT_PROJECT="CentOS-7" > > CENTOS_MANTISBT_PROJECT_VERSION="7" > > REDHAT_SUPPORT_PRODUCT="centos" > > REDHAT_SUPPORT_PRODUCT_VERSION="7" > > > > CentOS Linux release 7.5.1804 (Core) > > CentOS Linux release 7.5.1804 (Core) > > > > ceph daemon mds.ceph-n1 perf dump mds : > > > > > > "mds": { > > "request": 21968558, > > "reply": 21954801, > > "reply_latency": { > > "avgcount": 21954801, > > "sum": 100879.560315258, > > "avgtime": 0.004594874 > > }, > > "forward": 13627, > > "dir_fetch": 3327, > > "dir_commit": 162830, > > "dir_split": 1, > > "dir_merge": 0, > > "inode_max": 2147483647, > > "inodes": 68767, > > "inodes_top": 4524, > > "inodes_bottom": 56697, > > "inodes_pin_tail": 7546, > > "inodes_pinned": 62304, > > "inodes_expired": 1640159, > > "inodes_with_caps": 62192, > > "caps": 114126, > > "subtrees": 14, > > "traverse": 38309963, > > "traverse_hit": 37606227, > > "traverse_forward": 12189, > > "traverse_discover": 6634, > > "traverse_dir_fetch": 1769, > > "traverse_remote_ino": 6, > > "traverse_lock": 7731, > > "load_cent": 2196856701, > > "q": 0, > > "exported": 143, > > "exported_inodes": 291372, > > "imported": 125, > > "imported_inodes": 176509 > > > > > > Thanks for your help… > > 
> > Regards > > > > Marc-Antoine > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Need help
Hi, On 09/10/2018 02:40 PM, marc-antoine desrochers wrote: Hi, I am currently running a ceph cluster running in CEPHFS with 3 nodes each have 6 osd's except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. [root@ceph-n1 ~]# ceph -s cluster: id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb health: HEALTH_WARN 3 clients failing to respond to capability release 2 MDSs report slow requests *snipsnap* I'm now facing a problem : MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release mdsceph-n1(mds.0): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 mdsceph-n1(mds.0): Client mda2.sogetel.net failing to respond to capability releaseclient_id: 1172391 mdsceph-n2(mds.1): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 MDS_SLOW_REQUEST 2 MDSs report slow requests mdsceph-n1(mds.0): 112 slow requests are blocked > 30 sec mdsceph-n2(mds.1): 323 slow requests are blocked > 30 sec The messages indicate that clients do not release capabilities for opened/cached files. These files are either accessed by other clients (and thus these other clients need to acquire the capabilities), or the MDS runs out of memory and tries to reduce the number of capabilities in his book keeping to reduce the memory footprint. In both cases the client request to open a file is blocked. In case of the second problem, you can increase the mds cache size to allow it to store more inode and capability entries (mds_cache_memory_limit in ceph.conf). You should also try to figure out why the clients do not release the capabilities, e.g. whether they really have a large number of open/cached files. Do you use ceph-fuse or the kernel based implementation? Regards, Burkhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
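Burkhard's suggestion to raise the MDS cache can be tried without a restart; a minimal sketch, assuming the daemon name mds.ceph-n1 from the status output above and an illustrative 4 GiB value rather than a tuned recommendation:

```shell
# Persistent setting in ceph.conf on the MDS nodes:
# [mds]
# mds_cache_memory_limit = 4294967296   # 4 GiB (illustrative value)

# Or change it at runtime on the running daemon:
ceph daemon mds.ceph-n1 config set mds_cache_memory_limit 4294967296

# Inspect per-client sessions to spot clients holding unusually many caps:
ceph daemon mds.ceph-n1 session ls
```

Raising the limit only helps if the MDS is evicting cache entries under memory pressure; it does not fix a client that genuinely refuses to release caps.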
Re: [ceph-users] Need help
I guess good luck. Maybe you can ask these guys to hurry up and get something production ready. https://github.com/ceph-dovecot/dovecot-ceph-plugin -Original Message- From: marc-antoine desrochers [mailto:marc-antoine.desroch...@sogetel.com] Sent: maandag 10 september 2018 14:40 To: ceph-users@lists.ceph.com Subject: [ceph-users] Need help Hi, I am currently running a ceph cluster running in CEPHFS with 3 nodes each have 6 osd’s except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. [root@ceph-n1 ~]# ceph -s cluster: id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb health: HEALTH_WARN 3 clients failing to respond to capability release 2 MDSs report slow requests services: mon: 3 daemons, quorum ceph-n1,ceph-n2,ceph-n3 mgr: ceph-n1(active), standbys: ceph-n2, ceph-n3 mds: cephfs-2/2/2 up {0=ceph-n1=up:active,1=ceph-n2=up:active}, 1 up:standby osd: 17 osds: 17 up, 17 in data: pools: 2 pools, 1024 pgs objects: 541k objects, 42006 MB usage: 143 GB used, 6825 GB / 6969 GB avail pgs: 1024 active+clean io: client: 32980 B/s rd, 77295 B/s wr, 5 op/s rd, 14 op/s wr I’m using the cephFs as a mail storage. I currently have 3500 mailbox some of them are IMAP the others are POP3 the goal is to be able to migrate all mailbox from my old infrastructure so around 30 000 mailbox. 
I’m now facing a problem : MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release mdsceph-n1(mds.0): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 mdsceph-n1(mds.0): Client mda2.sogetel.net failing to respond to capability releaseclient_id: 1172391 mdsceph-n2(mds.1): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 MDS_SLOW_REQUEST 2 MDSs report slow requests mdsceph-n1(mds.0): 112 slow requests are blocked > 30 sec mdsceph-n2(mds.1): 323 slow requests are blocked > 30 sec I can’t figure out how to fix this… Here some information’s about my cluster : I’m running ceph luminous 12.2.5 on my 3 ceph nodes : ceph-n1, ceph-n2, ceph-n3. I have 3 client identical : LSB Version::core-4.1-amd64:core-4.1-noarch Distributor ID: Fedora Description:Fedora release 25 (Twenty Five) Release:25 Codename: TwentyFive My ceph nodes : CentOS Linux release 7.5.1804 (Core) NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/"; BUG_REPORT_URL="https://bugs.centos.org/"; CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7" CentOS Linux release 7.5.1804 (Core) CentOS Linux release 7.5.1804 (Core) ceph daemon mds.ceph-n1 perf dump mds : "mds": { "request": 21968558, "reply": 21954801, "reply_latency": { "avgcount": 21954801, "sum": 100879.560315258, "avgtime": 0.004594874 }, "forward": 13627, "dir_fetch": 3327, "dir_commit": 162830, "dir_split": 1, "dir_merge": 0, "inode_max": 2147483647, "inodes": 68767, "inodes_top": 4524, "inodes_bottom": 56697, "inodes_pin_tail": 7546, "inodes_pinned": 62304, "inodes_expired": 1640159, "inodes_with_caps": 62192, "caps": 114126, "subtrees": 14, "traverse": 38309963, "traverse_hit": 37606227, "traverse_forward": 12189, 
"traverse_discover": 6634, "traverse_dir_fetch": 1769, "traverse_remote_ino": 6, "traverse_lock": 7731, "load_cent": 2196856701, "q": 0, "exported": 143, "exported_inodes": 291372, "imported": 125, "imported_inodes": 176509 Thanks for your help… Regards Marc-Antoine ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] upgrade jewel to luminous with ec + cache pool
Yes, migrating to 12.2.8 is fine. Migrating away from the cache tier is as simple as changing the EC pool to allow EC overwrites, changing the cache tier mode to forward, flushing the tier, and removing it. Basically, once you have EC overwrites enabled, just follow the steps in the docs for removing a cache tier. On Mon, Sep 10, 2018, 7:29 AM Markus Hickel wrote: > Dear all, > > i am running a cephfs cluster (jewel 10.2.10) with a ec + cache pool. > There is a thread in the ML that states skipping 10.2.11 and going to > 12.2.8 is possible, does this work with ec + cache pool as well? > > I also wanted to ask if there is a recommended migration path from cephfs > with ec + cache pool to cephfs with ec pool only ? Creating a second cephfs > and moving the files would come to my mind, but maybe there is a smarter > way ? > > Cheers, > Markus > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
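The steps David describes can be sketched as a command sequence; the pool names cephfs_data_ec (EC base pool) and cephfs_cache (cache pool) are placeholders, and EC overwrites require an all-BlueStore, Luminous-or-newer cluster:

```shell
# 1. Allow overwrites on the erasure-coded base pool
ceph osd pool set cephfs_data_ec allow_ec_overwrites true

# 2. Switch the cache tier to forward mode so new I/O bypasses it
ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it

# 3. Flush and evict every object still held in the tier
rados -p cephfs_cache cache-flush-evict-all

# 4. Remove the overlay and detach the tier from the base pool
ceph osd tier remove-overlay cephfs_data_ec
ceph osd tier remove cephfs_data_ec cephfs_cache
```

Step 3 is best run while clients are quiet; objects pinned open by clients can block the flush.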
[ceph-users] Need help
Hi, I am currently running a ceph cluster running in CEPHFS with 3 nodes each have 6 osd's except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. [root@ceph-n1 ~]# ceph -s cluster: id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb health: HEALTH_WARN 3 clients failing to respond to capability release 2 MDSs report slow requests services: mon: 3 daemons, quorum ceph-n1,ceph-n2,ceph-n3 mgr: ceph-n1(active), standbys: ceph-n2, ceph-n3 mds: cephfs-2/2/2 up {0=ceph-n1=up:active,1=ceph-n2=up:active}, 1 up:standby osd: 17 osds: 17 up, 17 in data: pools: 2 pools, 1024 pgs objects: 541k objects, 42006 MB usage: 143 GB used, 6825 GB / 6969 GB avail pgs: 1024 active+clean io: client: 32980 B/s rd, 77295 B/s wr, 5 op/s rd, 14 op/s wr I'm using the cephFs as a mail storage. I currently have 3500 mailbox some of them are IMAP the others are POP3 the goal is to be able to migrate all mailbox from my old infrastructure so around 30 000 mailbox. I'm now facing a problem : MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release mdsceph-n1(mds.0): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 mdsceph-n1(mds.0): Client mda2.sogetel.net failing to respond to capability releaseclient_id: 1172391 mdsceph-n2(mds.1): Client mda3.sogetel.net failing to respond to capability releaseclient_id: 1134426 MDS_SLOW_REQUEST 2 MDSs report slow requests mdsceph-n1(mds.0): 112 slow requests are blocked > 30 sec mdsceph-n2(mds.1): 323 slow requests are blocked > 30 sec I can't figure out how to fix this. Here some information's about my cluster : I'm running ceph luminous 12.2.5 on my 3 ceph nodes : ceph-n1, ceph-n2, ceph-n3. 
I have 3 client identical : LSB Version::core-4.1-amd64:core-4.1-noarch Distributor ID: Fedora Description:Fedora release 25 (Twenty Five) Release:25 Codename: TwentyFive My ceph nodes : CentOS Linux release 7.5.1804 (Core) NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/"; BUG_REPORT_URL="https://bugs.centos.org/"; CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7" CentOS Linux release 7.5.1804 (Core) CentOS Linux release 7.5.1804 (Core) ceph daemon mds.ceph-n1 perf dump mds : "mds": { "request": 21968558, "reply": 21954801, "reply_latency": { "avgcount": 21954801, "sum": 100879.560315258, "avgtime": 0.004594874 }, "forward": 13627, "dir_fetch": 3327, "dir_commit": 162830, "dir_split": 1, "dir_merge": 0, "inode_max": 2147483647, "inodes": 68767, "inodes_top": 4524, "inodes_bottom": 56697, "inodes_pin_tail": 7546, "inodes_pinned": 62304, "inodes_expired": 1640159, "inodes_with_caps": 62192, "caps": 114126, "subtrees": 14, "traverse": 38309963, "traverse_hit": 37606227, "traverse_forward": 12189, "traverse_discover": 6634, "traverse_dir_fetch": 1769, "traverse_remote_ino": 6, "traverse_lock": 7731, "load_cent": 2196856701, "q": 0, "exported": 143, "exported_inodes": 291372, "imported": 125, "imported_inodes": 176509 Thanks for your help. Regards Marc-Antoine ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] tcmu-runner could not find handler
On Mon, Sep 10, 2018 at 6:36 AM 展荣臻 wrote: > > hi!everyone: > > I want to export ceph rbd via iscsi。 > ceph version is 10.2.11,centos 7.5 kernel 3.10.0-862.el7.x86_64, > and i also installed > tcmu-runner、targetcli-fb、python-rtslib、ceph-iscsi-config、 ceph-iscsi-cli。 > but when i lanuch "create pool=rbd image=disk_1 size=10G" with gwcli it > says "Failed : 500 INTERNAL SERVER ERROR". > below is content of /var/log/tcmu-runner.log: > 2018-09-10 09:29:38.856 14279 [INFO] dyn_config_start:425: event->mask: 0x800 > 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x4 > 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x400 > 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: > 0x8000 2018-09-10 09:29:38.857 14279 [WARN] tcmu_conf_set_options:156: The > logdir option is not supported by dynamic reloading for now! 2018-09-10 > 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x20 2018-09-10 > 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x1 2018-09-10 > 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x10 2018-09-10 > 10:22:38.449 14279 [DEBUG] handle_netlink:207: cmd 1. Got header version 2. > Supported 2. 2018-09-10 10:22:38.450 14279 [ERROR] add_device:485: could not > find handler for uio0 2018-09-10 18:05:23.720 14279 [DEBUG] > handle_netlink:207: cmd 1. Got header version 2. Supported 2. 2018-09-10 > 18:05:23.721 14279 [ERROR] add_device:485: could not find handler for uio0 > 2018-09-10 18:18:24.393 14279 [DEBUG] handle_netlink:207: cmd 1. Got header > version 2. Supported 2. 2018-09-10 18:18:24.393 14279 [ERROR] add_device:485: > could not find handler for uio0 > > in http://docs.ceph.com/docs/master/rbd/iscsi-overview/, it said required > Ceph Luminous or newer。 > Can someone tell me how to get lio to support jewel? Jewel is not supported for iSCSI (it's actually EOLed as well). I presume that you built your own tcmu-runner? 
I think it's basically saying that it cannot find the "/usr/lib64/tcmu-runner/handler_rbd.so" plugin for tcmu-runner, which would make sense if it failed to compile in your build environment. > I am from China, my English is not very good, I hope you can understand。 > Thanks for any help anyone can provide!!! > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
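Jason's diagnosis can be checked directly on the gateway; the package name below is an assumption for CentOS 7, based on the fact that the rbd handler is only compiled when the RBD development headers are present at build time:

```shell
# The handler tcmu-runner is failing to find:
ls -l /usr/lib64/tcmu-runner/
# handler_rbd.so should be listed; if it is missing, the build skipped it.

# Install the RBD headers, then rebuild tcmu-runner from source:
yum install -y librbd1-devel
```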
Re: [ceph-users] Mimic upgrade failure
I took a look at the mon log you sent. A few things I noticed: - The frequent mon elections seem to get only 2/3 mons about half of the time. - The messages coming in are mostly osd_failure, and half of those seem to be recoveries (cancellation of the failure message). It does smell a bit like a networking issue, or some tunable that relates to the messaging layer. It might be worth looking at an OSD log for an osd that reported a failure and seeing what error code is coming up on the failed ping connection? That might provide a useful hint (e.g., ECONNREFUSED vs EMFILE or something). I'd also confirm that with nodown set the mon quorum stabilizes... sage On Mon, 10 Sep 2018, Kevin Hrpcek wrote: > Update for the list archive. > > I went ahead and finished the mimic upgrade with the osds in a fluctuating > state of up and down. The cluster did start to normalize a lot easier after > everything was on mimic since the random mass OSD heartbeat failures stopped > and the constant mon election problem went away. I'm still battling with the > cluster reacting poorly to host reboots or small map changes, but I feel like > my current pg:osd ratio may be playing a factor in that since we are 2x normal > pg count while migrating data to new EC pools. > > I'm not sure of the root cause but it seems like the mix of luminous and mimic > did not play well together for some reason. Maybe it has to do with the scale > of my cluster, 871 osd, or maybe I've missed some tuning as my cluster > has scaled to this size. > > Kevin > > > On 09/09/2018 12:49 PM, Kevin Hrpcek wrote: > > Nothing too crazy for non default settings. Some of those osd settings were > > in place while I was testing recovery speeds and need to be brought back > > closer to defaults. I was setting nodown before but it seems to mask the > > problem. 
While its good to stop the osdmap changes, OSDs would come up, get > > marked up, but at some point go down again (but the process is still > > running) and still stay up in the map. Then when I'd unset nodown the > > cluster would immediately mark 250+ osd down again and i'd be back where I > > started. > > > > This morning I went ahead and finished the osd upgrades to mimic to remove > > that variable. I've looked for networking problems but haven't found any. 2 > > of the mons are on the same switch. I've also tried combinations of shutting > > down a mon to see if a single one was the problem, but they keep electing no > > matter the mix of them that are up. Part of it feels like a networking > > problem but I haven't been able to find a culprit yet as everything was > > working normally before starting the upgrade. Other than the constant mon > > elections, yesterday I had the cluster 95% healthy 3 or 4 times, but it > > doesn't last long since at some point the OSDs start trying to fail each > > other through their heartbeats. > > 2018-09-09 17:37:29.079 7eff774f5700 1 mon.sephmon1@0(leader).osd e991282 > > prepare_failure osd.39 10.1.9.2:6802/168438 from osd.49 10.1.9.3:6884/317908 > > is reporting failure:1 > > 2018-09-09 17:37:29.079 7eff774f5700 0 log_channel(cluster) log [DBG] : > > osd.39 10.1.9.2:6802/168438 reported failed by osd.49 10.1.9.3:6884/317908 > > 2018-09-09 17:37:29.083 7eff774f5700 1 mon.sephmon1@0(leader).osd e991282 > > prepare_failure osd.93 10.1.9.9:6853/287469 from osd.372 > > 10.1.9.13:6801/275806 is reporting failure:1 > > > > I'm working on getting things mostly good again with everything on mimic and > > will see if it behaves better. > > > > Thanks for your input on this David. 
> > > > > > [global] > > mon_initial_members = sephmon1, sephmon2, sephmon3 > > mon_host = 10.1.9.201,10.1.9.202,10.1.9.203 > > auth_cluster_required = cephx > > auth_service_required = cephx > > auth_client_required = cephx > > filestore_xattr_use_omap = true > > public_network = 10.1.0.0/16 > > osd backfill full ratio = 0.92 > > osd failsafe nearfull ratio = 0.90 > > osd max object size = 21474836480 > > mon max pg per osd = 350 > > > > [mon] > > mon warn on legacy crush tunables = false > > mon pg warn max per osd = 300 > > mon osd down out subtree limit = host > > mon osd nearfull ratio = 0.90 > > mon osd full ratio = 0.97 > > mon health preluminous compat warning = false > > osd heartbeat grace = 60 > > rocksdb cache size = 1342177280 > > > > [mds] > > mds log max segments = 100 > > mds log max expiring = 40 > > mds bal fragment size max = 20 > > mds cache memory limit = 4294967296 > > > > [osd] > > osd mkfs options xfs = -i size=2048 -d su=512k,sw=1 > > osd recovery delay start = 30 > > osd recovery max active = 5 > > osd max backfills = 3 > > osd recovery threads = 2 > > osd crush initial weight = 0 > > osd heartbeat interval = 30 > > osd heartbeat grace = 60 > > > > > > On 09/08/2018 11:24 PM, David Turner wrote: > > > What osd/mon/etc config settings do you have that are not default? It > > > might be worth utilizing nodown to stop osds from marking
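Sage's two suggestions above (pin OSDs up while debugging, then read the errno off a failed ping connection) can be sketched as follows; osd.49 and the log path are placeholders for whichever daemon reported a failure:

```shell
# Keep flapping OSDs from being marked down while debugging:
ceph osd set nodown

# On the reporting host, look for the error on the failed heartbeat
# connection (exact log wording varies by release):
grep -iE 'heartbeat|connect|refused|too many open files' /var/log/ceph/ceph-osd.49.log

# Revert once the quorum has stabilized:
ceph osd unset nodown
```

An ECONNREFUSED here points at networking or a dead peer process, while EMFILE points at a file-descriptor limit on the reporting daemon.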
[ceph-users] upgrade jewel to luminous with ec + cache pool
Dear all, I am running a cephfs cluster (jewel 10.2.10) with an ec + cache pool. There is a thread in the ML that states skipping 10.2.11 and going to 12.2.8 is possible; does this work with an ec + cache pool as well? I also wanted to ask if there is a recommended migration path from cephfs with an ec + cache pool to cephfs with an ec pool only? Creating a second cephfs and moving the files would come to my mind, but maybe there is a smarter way? Cheers, Markus ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] tcmu-runner could not find handler
Hi everyone: I want to export Ceph RBD via iSCSI. The Ceph version is 10.2.11, on CentOS 7.5 with kernel 3.10.0-862.el7.x86_64, and I also installed tcmu-runner, targetcli-fb, python-rtslib, ceph-iscsi-config, and ceph-iscsi-cli. But when I launch "create pool=rbd image=disk_1 size=10G" with gwcli it says "Failed : 500 INTERNAL SERVER ERROR". Below is the content of /var/log/tcmu-runner.log: 2018-09-10 09:29:38.856 14279 [INFO] dyn_config_start:425: event->mask: 0x800 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x4 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x400 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x8000 2018-09-10 09:29:38.857 14279 [WARN] tcmu_conf_set_options:156: The logdir option is not supported by dynamic reloading for now! 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x20 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x1 2018-09-10 09:29:38.857 14279 [INFO] dyn_config_start:425: event->mask: 0x10 2018-09-10 10:22:38.449 14279 [DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2. 2018-09-10 10:22:38.450 14279 [ERROR] add_device:485: could not find handler for uio0 2018-09-10 18:05:23.720 14279 [DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2. 2018-09-10 18:05:23.721 14279 [ERROR] add_device:485: could not find handler for uio0 2018-09-10 18:18:24.393 14279 [DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2. 2018-09-10 18:18:24.393 14279 [ERROR] add_device:485: could not find handler for uio0 In http://docs.ceph.com/docs/master/rbd/iscsi-overview/ it says Ceph Luminous or newer is required. Can someone tell me how to get LIO to support Jewel? I am from China and my English is not very good; I hope you can understand. Thanks for any help anyone can provide!!! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Bluestore DB size and onode count
If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSD's the maths don't work. Eg. ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USEAVAIL %USE VAR PGS 0 hdd 2.73679 1.0 2802G 1347G 1454G 48.09 0.69 115 So 3TB OSD, roughly half full. This is pure RBD workload (no snapshots or anything clever) so let's assume worse case scenario of 4MB objects (Compression is on however, which would only mean more objects for given size) 1347000/4=~336750 expected objects sudo ceph daemon osd.0 perf dump | grep blue "bluefs": { "bluestore": { "bluestore_allocated": 1437813964800, "bluestore_stored": 2326118994003, "bluestore_compressed": 445228558486, "bluestore_compressed_allocated": 547649159168, "bluestore_compressed_original": 1437773843456, "bluestore_onodes": 99022, "bluestore_onode_hits": 18151499, "bluestore_onode_misses": 4539604, "bluestore_onode_shard_hits": 10596780, "bluestore_onode_shard_misses": 4632238, "bluestore_extents": 896365, "bluestore_blobs": 861495, 99022 onodes, anyone care to enlighten me? 2. block.db Size sudo ceph daemon osd.0 perf dump | grep db "db_total_bytes": 8587829248, "db_used_bytes": 2375024640, 2.3GB=0.17% of data size. This seems a lot lower than the 1% recommendation (10GB for every 1TB) or 4% given in the official docs. I know that different workloads will have differing overheads and potentially smaller objects. But am I understanding these figures correctly as they seem dramatically lower? Regards, Nick ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
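Nick's back-of-the-envelope check above can be written out explicitly; this just reproduces his numbers (1347 GB used, default 4 MB RBD objects) and illustrates why the cached-onode counter is not a total object count:

```shell
used_mb=1347000          # ~1347 GB used, expressed in decimal MB
object_size_mb=4         # default RBD object size
expected_objects=$(( used_mb / object_size_mb ))
echo "expected objects: ${expected_objects}"   # prints 336750

cached_onodes=99022      # bluestore_onodes from the perf dump
echo "onodes in cache:  ${cached_onodes}"      # cache occupancy, not a total
```

As Igor noted, the two figures differ because bluestore_onodes only counts onodes currently held in the BlueStore cache.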
Re: [ceph-users] Force unmap of RBD image
Thanks for the suggestions, and will future check for LVM volumes, etc... the kernel version is the following 3.10.0-327.4.4.el7.x86_64 and the OS is CentOS 7.2.1511 (Core) Best, Martin On Mon, Sep 10, 2018 at 12:23 PM Ilya Dryomov wrote: > > On Mon, Sep 10, 2018 at 10:46 AM Martin Palma wrote: > > > > We are trying to unmap an rbd image form a host for deletion and > > hitting the following error: > > > > rbd: sysfs write failed > > rbd: unmap failed: (16) Device or resource busy > > > > We used commands like "lsof" and "fuser" but nothing is reported to > > use the device. Also checked for watcher with "rados -p pool > > listwatchers image.rbd" but there aren't any listed. > > The device is still open by someone. Check for LVM volumes, multipath, > loop devices etc. None of those typically show up in lsof. > > > > > By investigating `/sys/kernel/debug/ceph//osdc` we get: > > > > 160460241osd15019.b2af34image.rbd > > 231954'1271503593144320watch > > Which kernel is that? > > > > > Our goal is to unmap the image for deletion so if the unmap process > > should destroy the image is for us OK. > > > > Any help/suggestions? > > On newer kernels you could do "rbd umap -o force ", but it > looks like you are running an older kernel. > > Thanks, > > Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Force unmap of RBD image
On Mon, Sep 10, 2018 at 10:46 AM Martin Palma wrote: > > We are trying to unmap an rbd image from a host for deletion and > hitting the following error: > > rbd: sysfs write failed > rbd: unmap failed: (16) Device or resource busy > > We used commands like "lsof" and "fuser" but nothing is reported to > use the device. Also checked for watcher with "rados -p pool > listwatchers image.rbd" but there aren't any listed. The device is still open by someone. Check for LVM volumes, multipath, loop devices etc. None of those typically show up in lsof. > > By investigating `/sys/kernel/debug/ceph//osdc` we get: > > 160460241osd15019.b2af34image.rbd > 231954'1271503593144320watch Which kernel is that? > > Our goal is to unmap the image for deletion so if the unmap process > should destroy the image is for us OK. > > Any help/suggestions? On newer kernels you could do "rbd unmap -o force ", but it looks like you are running an older kernel. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
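On kernels new enough to support it, the force unmap Ilya mentions looks like the sketch below; /dev/rbd0 is a placeholder, and on older kernels the holder has to be found and released instead:

```shell
# Force the unmap even though the device is held open (newer kernels only):
rbd unmap -o force /dev/rbd0

# On older kernels, hunt for the holder -- these show users that
# lsof/fuser typically miss:
lsblk /dev/rbd0        # partitions or LVM PVs stacked on the device
dmsetup ls --tree      # device-mapper stacks (LVM, multipath)
losetup -a             # loop devices backed by files on a mount
```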
[ceph-users] Tiering stats are blank on Bluestore OSD's
After upgrading a number of OSD's to Bluestore I have noticed that the cache tier OSD's which have so far been upgraded are no longer logging tier_* stats "tier_promote": 0, "tier_flush": 0, "tier_flush_fail": 0, "tier_try_flush": 0, "tier_try_flush_fail": 0, "tier_evict": 0, "tier_whiteout": 0, "tier_dirty": 0, "tier_clean": 0, "tier_delay": 0, "tier_proxy_read": 0, "tier_proxy_write": 0, "osd_tier_flush_lat": { "osd_tier_promote_lat": { "osd_tier_r_lat": { Example from Filestore OSD (both are running 12.2.8) "tier_promote": 265140, "tier_flush": 0, "tier_flush_fail": 0, "tier_try_flush": 88942, "tier_try_flush_fail": 0, "tier_evict": 264773, "tier_whiteout": 35, "tier_dirty": 89314, "tier_clean": 89207, "tier_delay": 0, "tier_proxy_read": 1446068, "tier_proxy_write": 10957517, "osd_tier_flush_lat": { "osd_tier_promote_lat": { "osd_tier_r_lat": { "New Issue" button on tracker seems to cause a 500 error btw Nick ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
-----Original message-----
> From: Alwin Antreich
> Sent: Thursday 6th September 2018 18:36
> To: ceph-users
> Cc: Menno Zonneveld ; Marc Roos
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than
> expected performance
>
> On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> >
> > It is idle, testing still, running backups at night on it.
> > How do you fill up the cluster so you can test between empty and full?
> > Do you have a "ceph df" from empty and full?
> >
> > I have done another test disabling new scrubs on the rbd.ssd pool (but
> > still 3 on hdd) with:
> > ceph tell osd.* injectargs --osd_max_backfills=0
> > Again getting slower towards the end.
> > Bandwidth (MB/sec): 395.749
> > Average Latency(s): 0.161713
>
> In the results you both had, the latency is twice as high as in our
> tests [1]. That can already make quite some difference. Depending on the
> actual hardware used, there may or may not be the possibility for good
> optimisation.
>
> As a start, you could test the disks with fio, as shown in our benchmark
> paper, to get some results for comparison. The forum thread [1] has
> some benchmarks from other users for comparison.
>
> [1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

Thanks for the suggestion. I redid the fio test and one server seems to
be causing trouble. When I initially tested our SSDs according to the
benchmark paper, our Intel SSDs performed more or less equal to the
Samsung SSDs used.

From fio.log:

fio: (groupid=0, jobs=1): err= 0: pid=3606315: Mon Sep 10 11:12:36 2018
  write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec
    slat (usec): min=5, max=252, avg= 5.76, stdev= 0.66
    clat (usec): min=6, max=949, avg=51.72, stdev= 9.54
     lat (usec): min=54, max=955, avg=57.48, stdev= 9.56

However, one of the other machines (with identical SSDs) now performs
poorly compared to the others, with these results:

fio: (groupid=0, jobs=1): err= 0: pid=3893600: Mon Sep 10 11:15:17 2018
  write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec
    slat (usec): min=5, max=259, avg= 6.17, stdev= 0.78
    clat (usec): min=53, max=857, avg=69.77, stdev=13.11
     lat (usec): min=70, max=863, avg=75.93, stdev=13.17

I'll first resolve the slower machine before doing more testing, as this
surely won't help overall performance.

> --
> Cheers,
> Alwin

Thanks!,

Menno
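For comparing runs like the two above, a small sketch. The fio invocation is reconstructed from memory of the Proxmox benchmark paper, so its exact flags are an assumption (and it is destructive if pointed at a raw device); the awk helper just lines up bandwidth and iops from fio's classic output.

```shell
# The single-job 4k sync-write test, roughly as in the benchmark paper
# (flag set is an assumption; DESTRUCTIVE against a raw device):
#   fio --name=ssd-test --filename=/dev/sdX --direct=1 --sync=1 \
#       --rw=write --bs=4k --numjobs=1 --iodepth=1 \
#       --runtime=60 --time_based

# Quick side-by-side of bw and iops from fio's classic output,
# fed here with the two result lines quoted in this thread:
extract() { awk -F'[=,]' '/write: io=/ { gsub(/ /, ""); print $4, $6 }'; }

printf '%s\n' \
  ' write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec' \
  ' write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec' | extract
# -> 68366KB/s 17091
# -> 51801KB/s 12950
```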
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
I filled up the cluster by accident by not cleaning up after the write
benchmark (--no-cleanup); I'm sure there must be a better way to do
that, though.

I've run the tests again when the cluster was 'empty' (I have a few test
VMs stored on Ceph) and then let it fill up again. Performance goes up
from 276.812 to 433.859 MB/sec and latency goes down from 0.231178 to
0.147433.

I do have to mention I did find a problem with the cluster thanks to
Alwin's suggestion to (re)do the fio benchmarks: one server with
identical SSDs is performing poorly compared to the others. I'll resolve
this first before continuing other benchmarks.

When empty:

# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    3784G  2488G  1295G     34.24
POOLS:
    NAME      ID  USED  %USED  MAX AVAIL  OBJECTS
    ssd       1   431G  37.33  723G       110984
    rbdbench  76   0     0     723G       0

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.223580
Total writes made:      12472
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     276.812
Stddev Bandwidth:       66.2295
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 112
Average IOPS:           69
Stddev IOPS:            16
Max IOPS:               131
Min IOPS:               28
Average Latency(s):     0.231178
Stddev Latency(s):      0.19153
Max latency(s):         1.16432
Min latency(s):         0.022585

And after a few benchmarks, when I hit Ceph's near-full warning:

# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    3784G  751G   3032G     80.13
POOLS:
    NAME      ID  USED  %USED  MAX AVAIL  OBJECTS
    ssd       1   431G  82.93  90858M     110984
    rbdbench  76  579G  86.73  90858M     148467

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.233495
Total writes made:      19549
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     433.859
Stddev Bandwidth:       73.0601
Max bandwidth (MB/sec): 584
Min bandwidth (MB/sec): 220
Average IOPS:           108
Stddev IOPS:            18
Max IOPS:               146
Min IOPS:               55
Average Latency(s):     0.147433
Stddev Latency(s):      0.103518
Max latency(s):         1.08162
Min latency(s):         0.0218688

-----Original message-----
> From: Marc Roos
> Sent: Thursday 6th September 2018 17:15
> To: ceph-users ; Menno Zonneveld
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than
> expected performance
>
> It is idle, testing still, running backups at night on it.
> How do you fill up the cluster so you can test between empty and full?
> Do you have a "ceph df" from empty and full?
>
> I have done another test disabling new scrubs on the rbd.ssd pool (but
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
>
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 16:56
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than
> expected performance
>
> The benchmark does fluctuate quite a bit; that's why I run it for 180
> seconds now, as then I do get consistent results.
>
> Your performance seems on par with what I'm getting with 3 nodes and 9
> OSDs; not sure what to make of that.
>
> Are your machines actively used, perhaps? Mine are mostly idle as it's
> still a test setup.
>
> -----Original message-----
> > From: Marc Roos
> > Sent: Thursday 6th September 2018 16:23
> > To: ceph-users ; Menno Zonneveld
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> > than expected performance
> >
> > I am on 4 nodes, mostly hdds, and 4x Samsung SM863 480GB, 2x E5-2660,
> > 2x LSI SAS2308, 1x dual-port 10Gbit (one used, and shared between
> > cluster/client vlans).
> >
> > I have 5 pgs scrubbing, but I am not sure if there are any on the ssd
> > pool. I am noticing a drop in the performance at the end of the test.
> > Maybe some caching on the ssd?
> >
> > rados bench -p rbd.ssd 60 write -b 4M -t 16
> > Bandwidth (MB/sec): 448.465
> > Average Latency(s): 0.142671
> >
> > rados bench -p rbd.ssd 180 write -b 4M -t 16
> > Bandwidth (MB/sec): 381.998
> > Average Latency(s): 0.167524
> >
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:52
> > To: Marc Roos; ceph-users
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> > than expected performance
> >
> > ah yes, 3x replicated with minimal 2.
> >
> >
> > my ceph.conf is pretty bare, just in case it might be relevant
> >
> > [global]
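On the "there must be a better way" point above: rados bench tags its objects with a benchmark_data prefix, and the rados tool can remove them without deleting the pool. A sketch; the pool name is the one from this thread, and the commands touching the cluster are commented out.

```shell
# Remove the objects that a "rados bench ... --no-cleanup" run left
# behind (pool name taken from this thread):
#   rados -p rbdbench cleanup
# Or inspect what bench left behind first:
#   rados -p rbdbench ls | grep '^benchmark_data' | wc -l

# Demo of the prefix match on sample object names (the second name is
# a made-up rbd data object for contrast):
printf '%s\n' \
  'benchmark_data_host1_12345_object0' \
  'rbd_data.102f74b0dc51.0000000000000000' | grep -c '^benchmark_data'
# -> 1
```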
[ceph-users] Force unmap of RBD image
We are trying to unmap an rbd image from a host for deletion and are
hitting the following error:

rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

We used commands like "lsof" and "fuser" but nothing is reported to use
the device. We also checked for watchers with "rados -p pool
listwatchers image.rbd" but there aren't any listed.

By investigating `/sys/kernel/debug/ceph//osdc` we get:

160460241osd15019.b2af34image.rbd 231954'1271503593144320watch

Our goal is to unmap the image for deletion, so if the unmap process
destroys the image, that is OK for us.

Any help/suggestions?

Best,

Martin
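A hedged note on the listwatchers check: `image.rbd` is only the header object for format 1 images. For format 2 images the watch sits on `rbd_header.<id>`, so an empty result from `image.rbd` does not rule out a watcher. A sketch (pool/image names are placeholders; cluster-touching commands commented out):

```shell
# format 1 header object:
#   rados -p pool listwatchers image.rbd
# format 2 header object (the id comes from the block_name_prefix):
#   ID=$(rbd info pool/image | awk -F. '/block_name_prefix/ { print $NF }')
#   rados -p pool listwatchers "rbd_header.$ID"
# or simply, on recent releases:
#   rbd status pool/image

# Demo of the id extraction on a sample "rbd info" output line:
printf '\tblock_name_prefix: rbd_data.102f74b0dc51\n' | \
    awk -F. '/block_name_prefix/ { print $NF }'
# -> 102f74b0dc51
```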
Re: [ceph-users] Mimic upgrade failure
On Mon, Sep 10, 2018 at 08:10 Kevin Hrpcek wrote:
>
> Update for the list archive.
>
> I went ahead and finished the mimic upgrade with the osds in a
> fluctuating state of up and down. The cluster did start to normalize a
> lot easier after everything was on mimic, since the random mass OSD
> heartbeat failures stopped and the constant mon election problem went
> away. I'm still battling with the cluster reacting poorly to host
> reboots or small map changes, but I feel like my current pg:osd ratio
> may be playing a factor in that, since we are at 2x the normal pg count
> while migrating data to new EC pools.

We found a setting that helped us when we had constant re-elections,
though ours were far more frequent and not related to Mimic in the
least: bumping the time between elections allowed our cluster to at
least start. It voted, decided on a master, the master started
(re)playing transactions, got so busy that the others called for a new
election, the same mon won again, restarted the job, and the cycle
repeated. Bumping the election lease to 30s instead of the default (5?)
allowed the mon to finish working through its backlog and start replying
to heartbeats as expected, and then it went smoother from there.

mon_lease = 30 for future reference.

--
May the most significant bit of your life be positive.
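For the archive, the setting above as a ceph.conf fragment. The 5.0-second figure is the documented default; as noted, raising it trades monitor failover speed for election stability on a busy leader.

```ini
[mon]
# default is 5.0 seconds; a longer lease gives a busy leader time to
# work through its backlog before the peons call a new election
mon_lease = 30
```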